Zulip Chat Archive
Stream: Machine Learning for Theorem Proving
Topic: AlphaGeometry
Floris van Doorn (Jan 17 2024 at 16:05):
Google DeepMind has created an AI, AlphaGeometry, that can solve 25 out of 30 recent IMO geometry problems: https://www.technologyreview.com/2024/01/17/1086722/google-deepmind-alphageometry/
Floris van Doorn (Jan 17 2024 at 16:09):
Note that these are informal-to-informal proofs, although they do use some kind of domain-specific geometry language that is (semi?-)rigorous.
Floris van Doorn (Jan 17 2024 at 16:10):
I got an early peek at the paper yesterday and was asked for comments. The actual paper does not seem to be available on the Nature website yet.
Floris van Doorn (Jan 17 2024 at 16:19):
The paper is now also online: https://www.nature.com/articles/s41586-023-06747-5 [EDIT: linkfix]
David Renshaw (Jan 17 2024 at 16:20):
that's the same link as the one above?
Junyan Xu (Jan 17 2024 at 16:26):
It completed 25 within the time limit. The previous state-of-the-art system, developed by the Chinese mathematician Wen-Tsün Wu in 1978, completed only 10.
Interesting comparison between neural/statistical and symbolic/mechanical hybrid systems.
To train AlphaGeometry's language model, the researchers had to create their own training data to compensate for the scarcity of existing geometric data. They generated nearly half a billion random geometric diagrams and fed them to the symbolic engine. This engine analyzed each diagram and produced statements about their properties. These statements were organized into 100 million synthetic proofs to train the language model.
Synthetic data!
Floris van Doorn (Jan 17 2024 at 16:28):
David Renshaw said:
that's the same link as the one above?
fixed
Junyan Xu (Jan 17 2024 at 16:33):
Surprisingly few authors (only five). Apparently they're all from the original Google Brain (US-based) rather than DeepMind London, and this was probably done before Tony Wu left Google ...
P.S. The MIT article mentions "Thang Wang" as a coauthor, but I can only find Thang Luong in the Nature paper ...
David Renshaw (Jan 17 2024 at 16:36):
We are open-sourcing the AlphaGeometry code and model, and hope that together with other tools and approaches in synthetic data generation and training, it helps open up new possibilities across mathematics, science, and AI.
https://deepmind.google/discover/blog/alphageometry-an-olympiad-level-ai-system-for-geometry/
Junyan Xu (Jan 17 2024 at 16:38):
It makes perfect sense to me now that researchers in AI are trying their hand at the IMO geometry problems first, because finding solutions for them works a little bit like chess in the sense that there is a rather small number of sensible moves at every step. But I still find it stunning that they could make it work. It's an impressive achievement.
Fields medalist Ngô Bảo Châu is quoted in the DeepMind blog post (which is an obvious choice as there are three Vietnamese names among the authors; the other two are Chinese names).
See also: https://www.nytimes.com/2024/01/17/science/ai-computers-mathematics-olympiad.html
NYT back story:
For four years, the computer scientist Trieu Trinh has been consumed with something of a meta-math problem: how to build an A.I. model that solves geometry problems from the International Mathematical Olympiad, the annual competition for the world’s most mathematically attuned high-school students.
Last week Dr. Trinh successfully defended his doctoral dissertation on this topic at New York University; this week, he described the result of his labors in the journal Nature.
While developing the project, Dr. Trinh pitched it to two research scientists at Google, and they brought him on as a resident from 2021 to 2023.
The paper’s co-authors are Dr. Trinh’s doctoral adviser, He He, at New York University; Yuhuai Wu, known as Tony, a co-founder of xAI (formerly at Google) who in 2019 had independently started exploring a similar idea; Thang Luong, the principal investigator, and Quoc Le, both from Google DeepMind.
Jason Rute (Jan 17 2024 at 17:09):
Haven't read it yet. I'm hoping they carefully account for data leakage. Seeing a previous IMO problem could greatly distort the pass rate. But other than that concern, this looks exciting.
Junyan Xu (Jan 17 2024 at 17:10):
Because language models excel at identifying general patterns and relationships in data, they can quickly predict potentially useful constructs, but often lack the ability to reason rigorously or explain their decisions. Symbolic deduction engines, on the other hand, are based on formal logic and use clear rules to arrive at conclusions. They are rational and explainable, but they can be “slow” and inflexible - especially when dealing with large, complex problems on their own.
Interesting characterization of symbolic methods as "slow". Maybe their neural guidance has become accurate enough that less search is required, compensating for the slowness of NN inference?
Olympiad geometry problems are based on diagrams that need new geometric constructs to be added before they can be solved, such as points, lines or circles. AlphaGeometry’s language model predicts which new constructs would be most useful to add, from an infinite number of possibilities. These clues help fill in the gaps and allow the symbolic engine to make further deductions about the diagram and close in on the solution.
From the example, the symbolic engine processes concepts like midpoint, collinear, tangent, etc. In a sense these are APIs between the neural and symbolic parts. Should we develop / do we have parallel mathlib APIs?
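As a concrete, highly simplified picture of the loop the quote above describes, here is a minimal Python sketch. Every name in it is hypothetical rather than from the released code, and the real system runs a beam search over language-model proposals rather than a single greedy loop:

def solve(premises, goal, language_model, symbolic_engine, max_steps=16):
    # Hypothetical sketch of the neuro-symbolic loop described above.
    known = set(premises)
    for _ in range(max_steps):
        # Symbolic part: forward-chain with the deduction rules until
        # no new facts appear.
        known |= symbolic_engine.deduce_closure(known)
        if goal in known:
            return True  # tracing back through the deductions yields the proof
        # Neural part: when the engine is stuck, the language model
        # proposes an auxiliary construct, e.g. a new point such as
        # "D, the midpoint of B and C", out of infinitely many options.
        construct = language_model.propose(known, goal)
        known |= symbolic_engine.apply_construction(known, construct)
    return False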
The symbolic engine needs definitions and deduction rules to operate. These definitions and rules are provided in two text files, defs.txt and rules.txt.
See also "Extended Data Table 1 | List of actions to construct the random premises" in Nature paper.
Method:
Humans can learn geometry using a pen and paper, examining diagrams and using existing knowledge to uncover new, more sophisticated geometric properties and relationships. Our synthetic data generation approach emulates this knowledge-building process at scale, allowing us to train AlphaGeometry from scratch, without any human demonstrations.
Using highly parallelized computing, the system started by generating one billion random diagrams of geometric objects and exhaustively derived all the relationships between the points and lines in each diagram. AlphaGeometry found all the proofs contained in each diagram, then worked backwards to find out what additional constructs, if any, were needed to arrive at those proofs. We call this process “symbolic deduction and traceback”.
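Taken literally, the quoted pipeline amounts to something like the sketch below; the names are illustrative only, not the actual implementation:

def generate_training_example(sampler, engine, rng):
    # Sample a random diagram and exhaustively deduce its properties.
    premises = sampler.random_premises(rng)
    facts = engine.deduce_closure(premises)
    # Pick any derived fact as the conclusion of a synthetic problem.
    conclusion = rng.choice(sorted(facts - set(premises)))
    # Work backwards to the minimal subproof of that conclusion.
    proof = engine.traceback(facts, conclusion)
    # Points the proof uses but the problem statement doesn't mention
    # (statement_points is a hypothetical helper) are the auxiliary
    # constructions the language model is trained to predict.
    aux = proof.points_used() - statement_points(premises, conclusion)
    return (premises, conclusion), (aux, proof)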
Solutions are human-like:
Chen said: “AlphaGeometry's output is impressive because it's both verifiable and clean. Past AI solutions to proof-based competition problems have sometimes been hit-or-miss (outputs are only correct sometimes and need human checks). AlphaGeometry doesn't have this weakness: its solutions have machine-verifiable structure. Yet despite this, its output is still human-readable. One could have imagined a computer program that solved geometry problems by brute-force coordinate systems: think pages and pages of tedious algebra calculation. AlphaGeometry is not that. It uses classical geometry rules with angles and similar triangles just as students do.”
Bronze medal with geometry alone:
As each Olympiad features six problems, only two of which are typically focused on geometry, AlphaGeometry can only be applied to one-third of the problems at a given Olympiad. Nevertheless, its geometry capability alone makes it the first AI model in the world capable of passing the bronze medal threshold of the IMO in 2000 and 2015.
In geometry, our system approaches the standard of an IMO gold-medalist, but we have our eye on an even bigger prize: advancing reasoning for next-generation AI systems. Given the wider potential of training AI systems from scratch with large-scale synthetic data, this approach could shape how the AI systems of the future discover new knowledge, in math and beyond.
Junyan Xu (Jan 17 2024 at 17:11):
Jason Rute said:
Haven't read it yet. I'm hoping they carefully account for data leakage. Seeing a previous IMO problem could greatly distort the pass rate. But other than that concern, this looks exciting.
They are touting "train from scratch, without any human demonstrations" ...
Our synthetic data generation approach emulates this knowledge-building process at scale, allowing us to train AlphaGeometry from scratch, without any human demonstrations.
I understand that the language model doesn't see any actual IMO problems, only those generated by the synthesis pipeline. However, the way the researchers designed the synthesis method may have been guided by existing IMO problems.
Pietro Monticone (Jan 17 2024 at 17:46):
Main References:
- Nature Article: https://doi.org/10.1038/s41586-023-06747-5
- GitHub Repository: https://github.com/google-deepmind/alphageometry
- YouTube Video Explainer: https://youtu.be/TuZhU1CiC0k
- DeepMind Blog Post: https://deepmind.google/discover/blog/alphageometry-an-olympiad-level-ai-system-for-geometry/
Kevin Buzzard (Jan 17 2024 at 17:58):
Jason Rute said:
Haven't read it yet. I'm hoping they carefully account for data leakage. Seeing a previous IMO problem could greatly distort the pass rate. But other than that concern, this looks exciting.
My understanding is that they generated 100 million synthetic problems from some basic axioms, so didn't train on "the internet" at all.
Junyan Xu (Jan 17 2024 at 18:27):
I wonder how the system fares against the "adventitious angle" type of problems. Is there even a way to input numerical angles into the system?
Floris van Doorn (Jan 17 2024 at 19:10):
I expect that this falls outside the scope of the problems they try to solve. Geometric inequalities (where you have an inequality on lengths or angles in either a hypothesis or the conclusion) also fall outside the scope of the problems they considered.
Junyan Xu (Jan 17 2024 at 19:29):
Actually it should be possible to describe all rational multiples of π using a combination of congruence conditions (eqangle in defs.txt, and the system certainly knows about the primitive notion perp, i.e. π/2). For example, angle_bisector and trisector in defs.txt are defined using this primitive notion eqangle. Testing out-of-domain generalization with problems it's not designed to solve is definitely valuable and fun!
Joseph Myers (Jan 17 2024 at 20:35):
Junyan Xu said:
From the example, the symbolic engine processes concepts like midpoint, collinear, tangent, etc. In a sense these are APIs between the neural and symbolic parts. Should we develop / do we have parallel mathlib APIs?
docs#midpoint docs#Collinear (both defined in an affine space context, in accordance with the usual mathlib principles of generality). We don't have a definition of tangents yet, but it would be easy to add one. (I'd suggest two definitions: one taking a docs#EuclideanGeometry.Sphere and a point and returning the affine subspace through that point and orthogonal to the radius vector (which is the tangent space if the point is on the sphere) and one that's a predicate for an affine subspace being tangent to a sphere at a given point (the point lies on both sphere and affine subspace and the affine subspace lies inside the tangent affine subspace). When you get onto two spheres being internally or externally tangent, that adds some more definitions.)
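For concreteness, here is a rough Lean sketch of the two suggested definitions. The names Sphere.tangentSpaceAt and Sphere.IsTangentAt and the exact typeclass assumptions are hypothetical, not existing mathlib API:

import Mathlib

namespace EuclideanGeometry

variable {V P : Type*} [NormedAddCommGroup V] [InnerProductSpace ℝ V]
  [MetricSpace P] [NormedAddTorsor V P]

/-- Hypothetical: the affine subspace through `p` orthogonal to the radius
vector from the center of `s`; this is the tangent space when `p` is on `s`. -/
def Sphere.tangentSpaceAt (s : Sphere P) (p : P) : AffineSubspace ℝ P :=
  AffineSubspace.mk' p (ℝ ∙ (p -ᵥ s.center))ᗮ

/-- Hypothetical: `t` is tangent to `s` at `p` if `p` lies on both `s` and
`t`, and `t` lies inside the tangent space at `p`. -/
def Sphere.IsTangentAt (s : Sphere P) (t : AffineSubspace ℝ P) (p : P) : Prop :=
  p ∈ s ∧ p ∈ t ∧ t ≤ s.tangentSpaceAt p

end EuclideanGeometry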
Joseph Myers (Jan 17 2024 at 20:36):
Tangency is a case where it probably does make sense to have some specifically Euclidean definitions specifically for spheres and affine subspaces, and then later link them to concepts of tangency with more general surfaces.
Joseph Myers (Jan 17 2024 at 20:39):
I think the paper is exaggerating the difficulty of geometry formalization when it talks about "translation difficulties unique to geometry" or how it requires "substantial research". There's a large body of knowledge that needs encoding and lies outside the mainstream of undergraduate mathematics or mathematical research (hence less work done on encoding it), but not any particular difficulties unique to geometry in formalizing it.
Joseph Myers (Jan 17 2024 at 20:44):
Combinatorial geometry, on the other hand, is much closer to some mainstream research than classical Euclidean geometry is. (I still intend to formalize both aperiodic monotile papers - starting with the purely discrete and combinatorial parts which largely avoid dependencies on big pieces of theory missing from mathlib - though I don't expect to have much time for that until after IMO 2024.)
Junyan Xu (Jan 17 2024 at 20:57):
To better understand what "combinatorial geometry" means, I asked Perplexity to give me some examples :)
Heather Macbeth (Jan 17 2024 at 21:18):
I also got to read it a little early.
Quick summary: I quibble a bit with both the numerator and the denominator in their topline claim ("25/30 classical geometry IMO problems since 2000"), and would call it more like 21/32 .... but this is still impressive! This is a pipeline which will generally solve the easy geometry P1/P4 on an IMO.
Note: pre-existing automated reasoning engines also solved 4-10 problems, which is already better than previously reported AI results on IMO problems ... so part of the work is "just" putting the benchmark together and trying them with appropriate computational resources.
Numerator quibbles:
- IMO 2003, problem 4: they drop one direction of an "if and only if"
- IMO 2004, problem 5: they drop one direction of an "if and only if"
- IMO 2005, problem 5: it is an existential, and they tell the bot what the witness is
- IMO 2008, problem 1: they solve one of the two components of the problem and count it as a full problem
Denominator quibbles:
It is unclear why the following classical geometry problems were left out of the benchmark; it seems plausible that they can be expressed in the restricted language:
- IMO 2005, problem 1
- IMO 2007, problem 2
- IMO 2013, problem 3
- IMO 2014, problem 3
- IMO 2018, problem 6
Alex Kontorovich (Jan 17 2024 at 21:18):
I see the statement: "General-purpose formal languages such as Lean still require a large amount of groundwork to describe most IMO geometry problems at present." This groundwork is exactly what @André Hernández-Espiet (Rutgers)'s #7300 is trying to accomplish...
Eric Wieser (Jan 17 2024 at 21:29):
One big problem with #7300 is that it attempts to build all of the groundwork from scratch, without any attempt to interface with the other geometric foundations already in mathlib. This would be fine for a standalone project, but interconnectedness is really important for mathlib. I think that this is fixable without a total rewrite, but it will require some work in review.
Joseph Myers (Jan 17 2024 at 22:33):
The geometry I set up is deliberately following the usual conventions of interconnectedness. But since mathlib covers known mathematics in general, and results such as "these sets of axioms for geometry are equivalent" or "given these axioms for geometry without the parallel postulate, all models are either Euclidean or hyperbolic spaces and those do satisfy the axioms" are certainly known mathematics (and would allow us to complete Freek theorem 12), I think it's appropriate also to have Avigad, Tarski, Hilbert, ... geometry, each following a specific set of axioms referenced to the literature rather than more interconnected, and each developing geometry at least as far as is needed to prove those consequences about how those axioms relate to geometry set up in other ways.
Joseph Myers (Jan 17 2024 at 22:41):
Junyan Xu said:
To better understand what "combinatorial geometry" means, I asked Perplexity to give me some examples :)
The first example there is a shortlist problem that didn't get chosen for the IMO, as is the second. The third example starts with one sentence based on IMO 2013 problem 2, then switches to IMO 2011 problem 2. I guess that sort of thing is par for the course not just as an example of LLM output but also considering how AI papers etc. working in this area quite often make a similar mess themselves with claiming something was an IMO problem that wasn't (at least the present paper under discussion is using problems that are at worst slightly simplified versions of real IMO problems).
Joseph Myers (Jan 17 2024 at 22:50):
For genuine combinatorial geometry on the IMO, consider IMO 2020 P6, IMO 2017 P3, IMO 2015 P1, IMO 2014 P6, IMO 2013 P2, IMO 2011 P2, IMO 2006 P6, IMO 2002 P6, ... - you can argue that some of those are more like combinatorics problems with limited geometrical elements, but they are at least real IMO problems.
Ning DY (Jan 18 2024 at 00:29):
I wonder if we could dramatically affect the model by adjusting the sequence of letters or using unconventional naming methods. For example, use A, B, C to name the centers of circles, and o1, o2, o3 to name the vertices of a triangle, since LLMs can easily be affected just by adjusting the expressions.
Ning DY (Jan 18 2024 at 03:16):
Is it my imagination or not? I feel like AlphaGeometry makes far fewer auxiliary constructions to solve problems than humans do, especially in complex problems like P3 or P6. For example, in 2019 P6 AlphaGeometry only made 2 auxiliary constructions (not counting joining two existing points with a line). So in general, does AlphaGeometry tend to rely more on deduction than on construction, compared to humans?
Joseph Myers (Jan 18 2024 at 13:54):
I think there's something odd with the autoinformalization of their translated problem statements.
Take IMO 2002 P2 as an example. This is a problem where the "angle less than 120 degrees" condition in the original statement was genuinely needed for the point to be the incentre rather than an excentre, and if you failed to use that condition to justify getting the incentre rather than the excentre, you got at most 6 points on that problem at the IMO. There's nothing in the translated problem statements that looks like that condition (I don't think the language used can actually express such angle inequalities), and nothing in the machine-generated solutions that looks like using it, so AlphaGeometry should not claim more than 6 points here.
The supplementary information with problem statements and proofs lists IMO 2002 P2A and P2B, with no explanation of why there are two such variants or what the difference between them, or between them and the original problem, is meant to be. (I think any serious machine proving benchmark based on human olympiad problems, that's not pure informal-to-informal with original problem statements, should pay much more attention to giving detailed descriptions of the choices made in translating problems to the language read by the machine solver and how those translations are intended to differ from the original problem.)
The informal versions of P2A and P2B in the supplementary information look very strange. P2A says "Define point D as the circumcenter of triangle BAO."; for D in the original problem, that would only be true for an angle equal to 120 degrees. P2B says "Define point E as the circumcenter of triangle BAO. Define point F as the circumcenter of triangle BAO.", so defining both as the same point. Now, looking at https://github.com/google-deepmind/alphageometry/blob/main/imo_ag_30.txt I don't see such issues with the statements of these two problems there (though I don't know the language being used in that file), so I suspect an autoinformalization issue. (But the issue that the solver probably isn't distinguishing the incentre from the excentres, in which case it can't get more than 6 points, seems like a real one.)
Floris van Doorn (Jan 18 2024 at 14:01):
Interesting. I was already wondering exactly what language they are using, and how it deals with configurations, and whether the logic is sound.
The paper states the following, but I don't know where to find a good source for that specialized language.
To sidestep this barrier, we instead adopted a more specialized language used in GEX, JGEX, MMP, Geometer and GeoLogic
It seems that the logic (or at least their version of it) is not sound, and that at least that proof makes a configuration mistake.
Notification Bot (Jan 18 2024 at 15:20):
Caglar Gulcehre has marked this topic as resolved.
Notification Bot (Jan 18 2024 at 15:22):
Jason Rute has marked this topic as unresolved.
Trieu H. Trinh (Jan 18 2024 at 15:55):
Hi everyone, Trieu (1st author) here. I'm responsible for collecting these problems and manually translating them to imo_ag_30. The supplementary PDF is generated from imo_ag_30, so it seems the generation of this PDF has some errors.
For IMO 2002 P2, the rationale is that to prove incenter, there are two subproblems, that the point lies on each of the two bisectors, so there are 2002_2A and 2002_2B. This should reflect correctly in imo_ag_30.
Deductive database (DD) rules cannot express inequality statements (e.g. < 120°), so the translated problem is proving that the point is either an incenter or excenter, which shows in the corresponding figure for the problem.
A team member noticed a similar situation for IMO 2004 P1, where the translated version asks that P lies on line BC, but not that P is between B and C (an inequality). Figure 5 shows this difference between the original and translated problem.
Regarding any claims that are IMO-grading-related, such as "7/7 points" or "passing the bronze medal threshold", I tried to put extra scrutiny on them by asking a human expert to evaluate (IMO 2000 and 2015 problems). Even with that, the implied comparison to humans is not perfect because the system is not taking in natural language but only a translated version. Figure 2 states this caveat and mentions that the IMO has 3 other domains.
With this Figure 2, my purpose is to get an intuitive sense of where AlphaGeometry stands relative to humans, but I definitely think a truly fair comparison must adopt the AIMO convention.
Other than that, all comparisons in the work are with other machine solvers, which use the same problem statements, and their outcome is either success/fail rather than a grade out of 7.
Thanks for the feedback here, I'm taking it in and revising the supplementary.
edit: tbh, it is unfortunate that Fig 2 got so much focus in the press instead of Table 1 (our own doing...)
I also hope to do more analysis on the training data and will try to release it later.
I also think this youtube vid I made explains the ideas better: https://www.youtube.com/watch?v=TuZhU1CiC0k
Thanks all for the discussions, hope to learn more from the people who are olympiad experts here (I'm not qualified for the IMO)
Oliver Nash (Jan 18 2024 at 16:18):
Congratulations @Trieu H. Trinh: minor corrections aside, this is great work!
Floris van Doorn (Jan 18 2024 at 16:35):
Thank you for dropping by. I agree that this is truly impressive work!
Is there a full description of the logic / deduction rules of the specialized language for geometry that you used?
Ning DY (Jan 18 2024 at 17:07):
Trieu H. Trinh said:
Hi everyone, Trieu (1st author) here. I'm responsible for collecting these problems and manually translating them to imo_ag_30. The supplementary PDF is generated from imo_ag_30, so it seems the generation of this PDF has some errors.
For IMO 2002 P2, the rationale is that to prove incenter, there are two subproblems, that the point lies on each of the two bisectors, so there are 2002_2A and 2002_2B. This should reflect correctly in imo_ag_30.
Deductive database (DD) rules cannot express inequality statements (e.g. < 120°), so the translated problem is proving that the point is either an incenter or excenter, which shows in the corresponding figure for the problem.
A team member noticed a similar situation for IMO 2004 P1, where the translated version asks that P lies on line BC, but not that P is between B and C (an inequality). Figure 5 shows this difference between the original and translated problem.
Regarding any claims that are IMO-grading-related, such as "7/7 points" or "passing the bronze medal threshold", I tried to put extra scrutiny on them by asking a human expert to evaluate (IMO 2000 and 2015 problems). Even with that, the implied comparison to humans is not perfect because the system is not taking in natural language but only a translated version. Figure 2 states this caveat and mentions that the IMO has 3 other domains.
With this Figure 2, my purpose is to get an intuitive sense of where AlphaGeometry stands relative to humans, but I definitely think a truly fair comparison must adopt the AIMO convention.
Other than that, all comparisons in the work are with other machine solvers, which use the same problem statements, and their outcome is either success/fail rather than a grade out of 7.
Thanks for the feedback here, I'm taking it in and revising the supplementary.
edit: tbh, it is unfortunate that Fig 2 got so much focus in the press instead of Table 1 (our own doing...)
I also hope to do more analysis on the training data and will try to release it later.
I also think this youtube vid I made explains the ideas better: https://www.youtube.com/watch?v=TuZhU1CiC0k
Thanks all for the discussions, hope to learn more from the people who are olympiad experts here (I'm not qualified for the IMO)
@Trieu H. Trinh
Congratulations to you and your team. That is really impressive work. But I have a little question.
I have read the auxiliary constructions in the supplementary information and compared them to Table 1. The construction methods applied by the machine seem to be exactly the actions used to construct the random premises, or their direct combinations. So did the machine make any creative or uncommon auxiliary constructions, unrelated to the preset actions used to generate the training data, when solving the problems? Did your team find any when analyzing its performance? Thank you.
Joseph Myers (Jan 19 2024 at 00:21):
Trieu H. Trinh said:
For IMO 2002 P2, the rationale is that to prove incenter, there are two subproblems, that the point lies on each of the two bisectors, so there are 2002_2A and 2002_2B. This should reflect correctly in imo_ag_30.
Thanks for the explanation. It would seem preferable if the language used had a concept of "incenter_or_excenter", at least, to allow a single problem statement to be closer to the informal statement. (My view for Lean formalizations is that it's much better to add vocabulary to mathlib to talk about the concepts used in informal IMO problem statements, than to create more artificial problem statements because of lack of vocabulary in mathlib. But then, much of the point of IMO formalizations is to help build out the associated theory in mathlib proper.)
In https://github.com/leanprover-community/mathlib4/blob/master/Archive/Imo/Imo2019Q2.lean I included a detailed library note describing conventions for turning an informal IMO geometry problem statement into a formal Lean statement (most IMO geometry problems can't yet be stated in a natural way in Lean because they require other geometry not yet in mathlib - building up things needed for that one problem took 200 PRs). I feel it would be helpful if benchmarks generally for computer solving of mathematical problems originally set for humans, when based on a translation of the problems to some machine language (whether formal such as Lean, or a domain-specific language as in this case), came with a similarly detailed description of conventions used for the translation, along with notes for each individual problem on any further choices that needed to be made to translate it to machine language.
Joseph Myers (Jan 19 2024 at 00:29):
Trieu H. Trinh said:
Regarding any claims that are IMO-grading-related, such as "7/7 points" or "passing the bronze medal threshold", I tried to put extra scrutiny on them by asking a human expert to evaluate (IMO 2000 and 2015 problems). Even with that, the implied comparison to humans is not perfect because the system is not taking in natural language but only a translated version. Figure 2 states this caveat and mentions that the IMO has 3 other domains.
With this Figure 2, my purpose is to get an intuitive sense of where AlphaGeometry stands relative to humans, but I definitely think a truly fair comparison must adopt the AIMO convention.
When I briefly talked to the AIMO people, I suggested that they should ask the IMO Board about publishing the mark schemes used at the IMO, so that they could be used for self-assessment by people working on AI solvers. I don't know whether they have asked the IMO Board about that, or whether the Board or Jury would be supportive of publishing mark schemes, or how many mark schemes from past IMOs could readily be collected, but personally I think that publishing mark schemes would be a good idea.
(Mark schemes do accumulate case law during coordination, but just having the original scheme without such case law should be sufficient for most purposes for self-assessment of AI solutions - it would tell you whether human contestants needed to prove a particular bit of geometrical configuration information to get full marks, for example - and hopefully the AIMO grand prize would be marked by people who are actually coordinators on each problem and know the case law for it that developed while marking solutions by human contestants.)
Siddhartha Gadgil (Jan 19 2024 at 15:21):
Does anyone have thoughts on integrating this with Lean? For instance, the DSL they use can be written in Lean, and hopefully generated proofs can be at least post-facto verified. Or are there some obstructions that make this hard?
Joseph Myers (Jan 20 2024 at 01:21):
If the deductive system is sound, then the definitions and theorems could be expressed in Lean and used to convert the proofs into Lean.
Being sound means, for example, that if there's ever a requirement for a nondegeneracy condition for a theorem to be true (two points need to be different, two lines need to not be parallel, three points need to not be collinear, ...), or for a construction of an extra point (or line, or circle, or ...) to be valid (two lines need to not be parallel for them to intersect, two circles or a circle and a line need to be appropriately positioned for them to intersect, ...), those conditions must be appropriately proved (or present as hypotheses in the original problem). I found that most of the Lean formalization I did of an IMO geometry problem was taken up with proving such nondegeneracy conditions, but the (informalized) proofs from AlphaGeometry in the supplementary information don't seem to contain any intermediate results of such nondegeneracy form. I don't know how AlphaGeometry avoids needing such results, or whether they are all somehow included in the DSL statements of the problems (whether or not that's properly part of the informal statement - in my formalization I gave detailed conventions for when such a nondegeneracy condition should be read as implicit in the informal statement, such as informal mathematics not referring to an angle ABC if B is equal to A or C).
Of course many times human IMO contestants don't actually need to prove such conditions because they are sufficiently obvious to humans, but sometimes they do need to prove them. (IMO 2023 P6 had several places where failing to prove a nondegeneracy condition, such as that six points were not concyclic, could lose points.)
Andy Jiang (Jan 22 2024 at 00:54):
Curious to hear if people have any ideas about parts of math (let's say preferably with active research) where similar techniques can be applied? Thinking maybe such an area needs to have a very limited language/set of techniques.
Siddhartha Gadgil (Jan 22 2024 at 05:57):
I would think that, as a tool for research mathematics, there are many cases where the similar idea of generating synthetic data for an inverse problem, training on it, and coupling with a rule-based system to verify and complete proofs should help.
Indeed, François Charton has very interesting work in this vein: finding solutions of differential equations, solving recurrences, and a few other things. Similarly, words in free groups and/or group presentations, and symbolic representations of knots/manifolds (e.g. Kirby diagrams) seem promising targets.
Mirek Olšák (Jan 22 2024 at 11:51):
To my knowledge, AlphaGeometry's logic is similar to my GeoLogic in the sense that it relies on the numerical model for topological statements. That means that its proofs cannot be translated directly to formal mathematics.
On the other hand, I am somewhat interested in building a widget similar to GeoLogic for Lean; then we will need to deal with the full formalization too. There are also other unrelated issues that we would need to deal with:
- To my knowledge, Lean currently doesn't have automation for angle-chasing modulo pi (equivalent to solving systems of linear equations in a general abelian group; contrary to standard Gaussian elimination, you are not allowed to divide). A sketch of this division-free elimination follows after this message.
- I am not yet sure how to do the widget integration well. The internal state of the system should be recorded (so it doesn't disappear when you restart the file), it should be cached (so it doesn't need to recalculate everything every time you make a step), and it is a bit beyond a standard tactic state (it has a fast-access database of known facts about the diagram).
In summary, would anyone be interested in actually working on making a GeoLogic-like environment for Lean? If we do this, adding AlphaGeometry suggestions / solutions on top of that should be doable.
lean-geologic.png
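To illustrate the first bullet above, here is a self-contained Python toy (in no way GeoLogic's actual implementation; it handles only the integer-linear part and ignores the torsion coming from the relation π = 0): deciding whether a target angle equation is an integer combination of the hypotheses, using only gcd-style row operations and never dividing.

def zspan_member(rows, target):
    # Decide whether `target` is an integer linear combination of `rows`:
    # Gaussian elimination over the integers, where the only allowed row
    # operations are Euclidean (gcd) steps, never division.
    pool = [list(r) for r in rows]
    t = list(target)
    basis = []  # (pivot column, row), pivot columns strictly increasing
    for col in range(len(t)):
        live = [r for r in pool if r[col] != 0]
        pool = [r for r in pool if r[col] == 0]
        if not live:
            continue
        pivot = live.pop()
        for row in live:
            a, b = pivot, row
            while b[col] != 0:  # Euclid's algorithm on the column entries
                q = a[col] // b[col]
                a = [x - q * y for x, y in zip(a, b)]
                a, b = b, a
            pivot = a
            pool.append(b)  # b now has a zero in this column
        basis.append((col, pivot))
    for col, pivot in basis:  # reduce the target against the echelon basis
        if t[col] % pivot[col] != 0:
            return False  # we would have to divide: not derivable
        q = t[col] // pivot[col]
        t = [x - q * y for x, y in zip(t, pivot)]
    return all(x == 0 for x in t)

# Mirek's example, with columns = coefficients of (α, β, γ) modulo π:
# from α = β + γ and β = α + γ one can derive 2α = 2β, but not α = β.
hyps = [(1, -1, -1), (-1, 1, -1)]
print(zspan_member(hyps, (2, -2, 0)))  # True:  2α = 2β is derivable
print(zspan_member(hyps, (1, -1, 0)))  # False: α = β would need division by 2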
Eric Rodriguez (Jan 22 2024 at 12:04):
Mirek Olšák said:
To my knowledge, Lean currently doesn't have automation for angle-chasing modulo pi (equivalent to solving systems of linear equations in a general abelian group; contrary to standard Gaussian elimination, you are not allowed to divide)
could this be done via polyrith?
Mirek Olšák (Jan 22 2024 at 12:24):
In the right formulation, polyrith succeeded:

example (α β γ : Complex) :
    α ≠ 0 → β ≠ 0 → γ ≠ 0 → α = β * γ → β = α * γ → α ^ 2 = β ^ 2 := by
  intros
  polyrith

On the other hand, I want this to be something fast, running in the background at every step, not something taking time and needing an internet connection. And the natural formulation would be this one:

example (α β γ : Real.Angle) :
    α = β + γ → β = α + γ → 2 • α = 2 • β := by
  intros
  solve_abel_equations
Eric Wieser (Jan 22 2024 at 12:26):
polyrith requires a ring, not a module
Eric Wieser (Jan 22 2024 at 12:27):
not something taking time and needing an internet connection
Making polyrith run locally would be straightforward, but asking new Lean users to install Sage instead would not be.
Mirek Olšák (Jan 22 2024 at 12:46):
If it is possible to ask Lean users to install npm... :smile:. But I also prefer less crazy dependencies.
Mirek Olšák (Jan 22 2024 at 12:49):
And I am not sure how well polyrith scales, since angle-chasing is in principle a much easier task than solving polynomial equations.
Jason Rute (Jan 22 2024 at 13:16):
Andy Jiang said:
Curious to hear if people have any ideas about parts of math (let's say preferably with active research) where similar techniques can be applied? Thinking maybe such an area needs to have a very limited language/set of techniques.
I think this is the most important follow-up question to this work. I haven't looked at the paper in great detail, but from the explainer video I think it comes down to the following properties:
- They have a formal system for checking solutions in this area of math.
- They have an efficient way to generate millions of problems and their solutions.
- The solutions often contain extra information (like a reference to a midpoint of some line segment) that is very useful for the proofs but not needed to state the problem. They have an automatic way to find that extra information, so they can remove it from the problem statements.
- They can train a model to predict this auxiliary information.
I think many areas of math could possibly fit into this paradigm, but it might take some work to put it into that form. We have already seen work done using somewhat similar approaches for symbolic integration. In the integration work the auxiliary information they tried to predict is the final integral. After that, checking the solution is largely routine with automatic symbolic differentiation and equality checking.
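As a toy illustration of that last point, the integration setup can be mimicked in a few lines of sympy; this is illustrative only (the actual work used large random expression trees and a seq2seq model):

import random
import sympy as sp

x = sp.symbols('x')

def random_expression(depth=2):
    # Tiny stand-in for the expression sampler: training pairs are built
    # backwards, by differentiating a known expression.
    atoms = [x, x**2, sp.sin(x), sp.exp(x), sp.log(1 + x**2)]
    e = random.choice(atoms)
    for _ in range(depth):
        e = random.choice([e + random.choice(atoms), e * random.choice(atoms)])
    return e

def make_training_pair():
    F = random_expression()
    return sp.diff(F, x), F  # problem: integrate F'; target answer: F

def check(problem, candidate):
    # Verification is routine: differentiate the candidate and compare.
    return sp.simplify(sp.diff(candidate, x) - problem) == 0

prob, sol = make_training_pair()
print(check(prob, sol))  # True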
Eric Rodriguez (Jan 22 2024 at 13:35):
Mirek Olšák said:
And I am not sure how well polyrith scales, since angle-chasing is in principle a much easier task than solving polynomial equations.
It's seemed pretty fast most times I've used it; I just worry there's no need to reinvent the wheel.
Julien Narboux (Jan 22 2024 at 13:42):
Floris van Doorn said:
Interesting. I was already wondering exactly what language they are using, and how it deals with configurations, and whether the logic is sound.
The paper states the following, but I don't know where to find a good source for that specialized language.
To sidestep this barrier, we instead adopted a more specialized language used in GEX, JGEX, MMP, Geometer and GeoLogic
It seems that the logic (or at least their version of it) is not sound, and that at least that proof makes a configuration mistake.
In JGEX and MMP/Geometer there is no logic or language for proofs, only a language for describing statements (mainly some geometric predicates).
Julien Narboux (Jan 22 2024 at 13:51):
Joseph Myers said:
If the deductive system is sound, then the definitions and theorems could be expressed in Lean and used to convert the proofs into Lean.
Being sound means, for example, that if there's ever a requirement for a nondegeneracy condition for a theorem to be true (two points need to be different, two lines need to not be parallel, three points need to not be collinear, ...), or for a construction of an extra point (or line, or circle, or ...) to be valid (two lines need to not be parallel for them to intersect, two circles or a circle and a line need to be appropriately positioned for them to intersect, ...), those conditions must be appropriately proved (or present as hypotheses in the original problem). I found that most of the Lean formalization I did of an IMO geometry problem was taken up with proving such nondegeneracy conditions, but the (informalized) proofs from AlphaGeometry in the supplementary information don't seem to contain any intermediate results of such nondegeneracy form. I don't know how AlphaGeometry avoids needing such results, or whether they are all somehow included in the DSL statements of the problems (whether or not that's properly part of the informal statement - in my formalization I gave detailed conventions for when such a nondegeneracy condition should be read as implicit in the informal statement, such as informal mathematics not referring to an angle ABC if B is equal to A or C).
Of course many times human IMO contestants don't actually need to prove such conditions because they are sufficiently obvious to humans, but sometimes they do need to prove them. (IMO 2023 P6 had several places where failing to prove a nondegeneracy condition, such as that six points were not concyclic, could lose points.)
I have exactly the same question. For example, in line 7 of https://github.com/google-deepmind/alphageometry/blob/main/rules.txt you have the midpoint theorem: midp E A B, midp F A C => para E F B C. This should assume that B is different from C. I guess one assumes somehow that points are in general position in the statement of the problem, but one should also prove the ndgs (non-degeneracy conditions) for the constructed points, and this can lead to proofs by case distinction. @Trieu H. Trinh do you deal with the non-degeneracy conditions?
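For illustration, here is that rule as a Lean statement (statement only, with the proof left as sorry) with the non-degeneracy hypothesis made explicit. Without hBC, the two affine spans can degenerate to points, a case the DSL's para predicate implicitly excludes:

import Mathlib

-- Rule 7 of rules.txt, "midp E A B, midp F A C => para E F B C", with
-- the implicit non-degeneracy hypothesis `hBC : B ≠ C` spelled out.
example (A B C : EuclideanSpace ℝ (Fin 2)) (hBC : B ≠ C) :
    (line[ℝ, midpoint ℝ A B, midpoint ℝ A C]).Parallel line[ℝ, B, C] := by
  sorry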
Julien Narboux (Jan 22 2024 at 14:24):
Congratulations to @Trieu H. Trinh, it is a very interesting paper; the contribution is not only the machine learning part but also the extension of the deductive database method.
Julien Narboux (Jan 22 2024 at 14:34):
I wonder which implementation you used to compare with Wu's method. Are you using a version in a modern computer algebra system? I think it may have an impact, because polynomial computations have progressed a lot since the days Dongming Wang had an implementation (https://wang.cc4cm.org/epsilon/), but I don't know if it works with modern Maple.
Junyan Xu (Jan 22 2024 at 16:36):
The paper says:
Geometry theorem provers in the literature fall into two categories. The first category is computer algebra methods, which treats geometry statements as polynomial equations of its point coordinates. Proving is accomplished with specialized transformations of large polynomials. Gröbner bases and Wu’s method are representative approaches in this category, ... Because these methods often have large time and memory complexity, especially when processing IMO-sized problems, we report their result by assigning success to any problem that can be decided within 48 h using one of their existing implementations [17]. AlphaGeometry belongs to the second category of solvers, often described as search/axiomatic or sometimes ‘synthetic’ methods.
where [17] is the JGEX paper, which says:
The Proving and Reasoning Part. Beside the traditional algebraic methods such as Wu’s method and the Gröbner basis method, we have also implemented the full-angle method and the deductive database method [3,6] for generating short, elegant synthetic proofs.
Notice that the same volume includes a paper co-authored by Dongming Wang which mentions Epsilon.
Siddhartha Gadgil (Feb 22 2024 at 11:54):
I have been trawling through the source and have questions I am asking here because I do not know where else to ask. There are relations simtri* and contri* which are variants of simtri (similar triangles) and contri (congruent triangles). But I cannot figure out the difference, i.e., why the star.
Siddhartha Gadgil (Feb 22 2024 at 12:14):
Siddhartha Gadgil said:
I have been trawling through the source and have questions I am asking here because I do not know where else to ask. There are relations simtri* and contri* which are variants of simtri (similar triangles) and contri (congruent triangles). But I cannot figure out the difference, i.e., why the star.
@Yuhuai Tony Wu could you clarify (I believe you are an author)?
Jason Rute (Feb 22 2024 at 12:39):
Tony once told me he doesn’t check this Zulip often (even when DMed). But @Trieu H. Trinh seems to have already answered some questions about the paper, so he might be willing to answer this one.
Junyan Xu (Mar 13 2024 at 18:56):
Trieu H. Trinh will speak on:
Solving olympiad geometry without human demonstrations
image.png
https://cmsa.fas.harvard.edu/event/nt-32024/
Trieu will speak on AlphaGeometry, the state of the art neurosymbolic theorem proving system which can solve International Mathematical Olympiad level problems with performance approaching that of an IMO gold medalist.
Kevin Buzzard (Mar 13 2024 at 22:40):
Warning for those unwise enough to look at the poster time rather than the time zone-specific time Junyan posted: we're at that funny time of year when the Americans have put their clocks forward but the Europeans still haven't, so NYC is only 4 hours off London time right now, and only 5 off CET.
Last updated: May 02 2025 at 03:31 UTC