Zulip Chat Archive
Stream: Machine Learning for Theorem Proving
Topic: AlphaGeometry doesn't have a secure foundation
(deleted) (Jul 24 2025 at 10:39):
- Orientation checking is done by generating a numerical diagram. After a thorough research of the literature, I find that there is no rigorous basis behind this.
- There are wrong rules in the deduction database. This is the most egregious part. For example,
cyclic A B P Q => eqangle P A P B Q A Q B, wherecyclicjust means there is a circle that passes through A, B, P and Q.
This shows that there has yet to be any rigorous AI tool for high school geometry. More development is needed. We need a tool that solves high school geometry, in Lean.
The language of AlphaGeometry is very limited and it is incapable of stating many geometry problems.
Jason Rute (Jul 24 2025 at 11:44):
I can't speak to the logical flaws, but could one formally verify (or falsify) all the AlphaGeometry rules in Lean, or at least turn them into tactics (that work when the rules make sense), and then transpile the AlphaGeometry proofs to Lean (or some other formal system)?
(deleted) (Jul 24 2025 at 11:45):
Literally what I'm doing.
(deleted) (Jul 24 2025 at 11:46):
Turns out because of all the flaws it's harder than expected.
Joseph Myers (Jul 24 2025 at 11:59):
Have you read https://arxiv.org/abs/2411.11938 ?
I do think that a specialized geometry AI such as AlphaGeometry should most naturally end up as a tool used by more general AIs working with a fully formal system such as Lean (in the same way as such an AI might call out to a CAS, for example).
I also encourage reviewing geometry related PRs that I and others have open to help contribute to improving the Euclidean geometry support in mathlib. I do intend to formalize a solution to IMO 2024 P4 (thereby completing the set of IMO 2024 solutions in the mathlib archive) once I've got enough incenter theory into mathlib.
(deleted) (Jul 24 2025 at 12:00):
I'm building upon their code.
(deleted) (Jul 24 2025 at 12:00):
It's very good as the predicates are clearly explained
Joseph Myers (Jul 24 2025 at 12:00):
"In more extreme cases, for example, in the proof of Problem 2 of the IMO 2009 exam, given
in the Supplementary Material to the AlphaGeometry paper Trinh et al. [2024], page 29-30, the
very correctness of the proof depends on a choice made at random by the software"
(deleted) (Jul 24 2025 at 12:03):
Thank you. Yeah it's really extreme. Proving the orientation of points is a nontrivial task...
(deleted) (Jul 24 2025 at 12:03):
An SMT solver might be needed. But I don't even know how to encode the problem for the SMT solver to solve.
Alex Kontorovich (Jul 24 2025 at 16:53):
FYI : #7300
(deleted) (Jul 25 2025 at 00:54):
I originally chose "built upon sand" as the title because of the biblical imagery. But now I changed the title.
(deleted) (Jul 25 2025 at 00:59):
Language aside, I hope to be able to finish my work in the coming weeks.
(deleted) (Jul 25 2025 at 04:51):
I'll make a comprehensive list of wrong AlphaGeometry rules. These are the wrong rules that I found. I'll edit this message as I find more wrong rules.
-
cong O A O B → ncoll O A B → eqangle O A A B A B O B
Corrected:cong O A O B → ncoll O A B → eqangle B A A O O B B A -
cyclic A B P Q → eqangle P A P B Q A Q B
(deleted) (Jul 26 2025 at 11:40):
https://github.com/LMCRC/Newclid/issues/25#issuecomment-3121664768
(deleted) (Jul 26 2025 at 11:41):
Looks like the way I interpreted the eqangle predicate is wrong
(deleted) (Jul 26 2025 at 11:43):
My point about the orientation checks still remains. And looks like this is the hardest part.
friederrr (Jul 26 2025 at 12:04):
The issue GitHub has already been clarified I see :)
Newclid fixes issues with the original AG codebase, but it's not perfect.
Keen to see your work on Lean :)
(deleted) (Aug 01 2025 at 06:55):
I'm releasing this little benchmark for anyone willing to test their AI. I'm licensing this under Apache 2.0, the same license as Newclid.
(deleted) (Aug 01 2025 at 06:57):
This is a surprisingly good benchmark. Many off the shelf models just fail on this benchmark.
(deleted) (Aug 01 2025 at 07:00):
Meanwhile, I'll prove these rules by hand and keep the proofs a secret. At least until Numina releases another dataset.
Bas Spitters (Aug 02 2025 at 20:02):
@Huỳnh Trần Khanh how does your work compare to https://geocoq.github.io/GeoCoq/
(deleted) (Aug 03 2025 at 00:30):
The main purpose of my work is to translate proofs generated by Newclid, which is based on AlphaGeometry, into Lean. The API is for machine consumption, not for humans.
(deleted) (Aug 03 2025 at 00:31):
My code makes extensive use of complex numbers. When something can't be proved with the 41 rules algebraic manipulation of complex numbers is expected.
Gavin Zhao (Sep 16 2025 at 23:42):
I'm taking a course this semester on making our own AlphaGeometry (i.e. reproducing it). The default (and recommended) language to do this is obviously Python, but I think it'd be so cool to implement it in Lean and then I might be able to prove that every proof that (my) AlphaGeometry produces is based on the 5 axioms that form Euclidean Geometry.
Wang Jingting (Sep 17 2025 at 01:50):
Gavin Zhao said:
I'm taking a course this semester on making our own AlphaGeometry (i.e. reproducing it). The default (and recommended) language to do this is obviously Python, but I think it'd be so cool to implement it in Lean and then I might be able to prove that every proof that (my) AlphaGeometry produces is based on the 5 axioms that form Euclidean Geometry.
I think that unfortunately you might not be able to do that? As far as I know, the non-degeneracy conditions (like ncoll) in Alphageometry is checked by directly looking at an internal graph, and actually reasoning about these conditions are trickier than someone might imagine (and they are also ignored most of the times when a human is doing geometric problems).
Last updated: Dec 20 2025 at 21:32 UTC