Zulip Chat Archive

Stream: Machine Learning for Theorem Proving

Topic: AxiomProver solves Putnam 2025

Gavin Zhao (Jan 10 2026 at 16:24):

Putnam 2025, the world's hardest college-level math test, ended December 6th. By the end of the competition, AxiomProver had solved 8 out of 12 problems. In the following days, it solved the remaining 4. AxiomProver is an autonomous multi-agent ensemble theorem prover for Lean 4.21.0, developed by Axiom Math.

Their Lean solution is here and claims to be verified by SafeVerify.

Their blog post is an interesting read.

Gavin Zhao (Jan 10 2026 at 16:25):

There’s Still Hope on Combinatorics, And Geometry Engine Is Not A Must-Have

Problem B1 is perhaps even more amusing. The statement involves the geometric notion of a circumcenter, and the system produced a solution that was genuinely geometric in flavor. Our mathematicians, upon reading it, found it difficult to follow without a diagram, which is somewhat ironic, since the machine never drew one. In the end, they had to sketch the configuration by hand to understand what was going on:

~~AlphaGeometry team in shambles rn~~

I think Axiom dropped down into analytical geometry, thereby avoiding using the Euclidean geometry style proof that AlphaGeometry did.

Joseph Myers (Jan 11 2026 at 02:35):

I think this blog post, looking into human versus AI approaches to different problems, helps illustrate my previous point that it's scientifically valuable to assess AIs on new competition problems even at a difficulty level where the AI can be expected to solve all the problems with high probability. One of my reasons in the previous discussion was the ability to look into the solution approaches taken by the AIs and compare with human approaches. (It would be even better, given sufficient resources, for each of several AIs to make many independent attempts at each problem rather than just producing a single solution, so you can start to do statistics on how many times the AI found which solution approaches and how long it took to do so each time - it can help there if you already have a taxonomy of the known essentially different solution approaches to each problem based on what was found by human entrants, so you're comparing a range of solutions from AIs with the range of solutions found by humans, not one AI solution with one human solution.)

Joseph Myers (Jan 11 2026 at 02:42):

As for doing geometry without a geometry engine, if an AI aims to be able to find formal proofs in arbitrary areas of mathematics that might not have much pre-existing formalized material and where there might be API gaps in mathlib - which is common for much of mathematics - and if (per the "bitter lesson") it's to do that by general methods, without needing a custom engine for each such area of mathematics - then ability to do geometry without a geometry engine could be a useful measure of the extent to which the AI has built up general methods to enable working in parts of mathematics without much API or formalized material. (Though Euclidean geometry is still much closer to the core of elementary material in mathlib than a lot of more advanced mathematics is, so being able to do things in that area doesn't guarantee being able to do them at a greater distance from basic mathlib material.)

Last updated: Feb 28 2026 at 14:05 UTC