Zulip Chat Archive
Stream: Machine Learning for Theorem Proving
Topic: ashme/aime in lean in < 10GB weights ?
TongKe Xue (Dec 08 2025 at 10:28):
[Context: I'm still not convinced that using LLMs to generate tactics is the best/right way to solve IMO/Putnam problems.]
Question: Are there any systems that:
- use Lean 4 to solve AHSME/AIME problems, and
- fit the program plus all weights in <= 10 GB?
Now, one might point out: AHSME is multiple choice, and AIME answers are integers from 0 to 999. Neither involves writing proofs, so why would you want to use Lean 4?
Answer: it forces correct reasoning from step n to step n+1. Even if the final answer is just an integer in 0-999 or a multiple-choice letter, it is useful to force every intermediate step to be mathematically correct.
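As a minimal sketch of what "every intermediate step must be correct" looks like (assuming Mathlib for `norm_num`; the arithmetic is made up for illustration, not an actual AIME problem): even though the answer is just a number, the kernel rejects the proof if any line of the `calc` chain is wrong.

```lean
import Mathlib.Tactic

-- Toy example: the "answer" is just the number 0, but every step of the
-- calc chain has to be proved, so an arithmetic slip anywhere is caught.
example : (3 ^ 4 + 2 * 5) % 7 = 0 := by
  calc (3 ^ 4 + 2 * 5) % 7 = (81 + 10) % 7 := by norm_num
    _ = 91 % 7 := by norm_num
    _ = 0 := by norm_num
```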
Question: are there systems, using Lean 4 plus a mix of good old-fashioned symbolic AI and very small neural networks, that can solve AHSME/AIME problems at > 90% accuracy?
The intuition here is that I'm particularly interested in systems that lean heavily on Lean 4 tactics and good old-fashioned symbolic AI, with less reliance on neural networks / LLMs.
Thanks!
Leni Aniva (Dec 09 2025 at 02:08):
I'm working on one such system and I'll try to get it published next year
Gerben Koopman (Dec 09 2025 at 13:22):
How would this work given that Mathlib itself is already ~32 GB? You need to load it for premises, no? Unless you prove every theorem from the ground up.
Gerben Koopman (Dec 09 2025 at 14:41):
I agree with your feeling, though, that next-token generation seems ill-suited to generating tactics. It's just that the alternatives I can think of are no better. But I'm curious to see what other people have come up with.
TongKe Xue (Dec 09 2025 at 19:58):
- Great question. Sorry for not being clear. You get all of Mathlib, CSLib, and any other public open-source Lean 4 database of formalized lemmas/theorems for free.
- The 10GB limit was meant mainly to focus on "weights/blobs of f32s".
- Taking a neural network's weights and "encoding" them somehow in a Lean4 db of theorems would be considered cheating. :-)
One reason I consider this might be possible is this:
- IMO-style problems often require some form of inspiration, i.e. a step that says "Consider the object/strategy XYZ", where XYZ is a somewhat complicated expression that we need to come up with out of thin air.
- AHSME/AIME problems are "more direct": after formalization, many of them just look like plug-and-play applications of standard formulas, the 2D/3D geometry problems can often be solved with "coordinates", combinatorics problems by case analysis, etc. (a toy sketch of this is below).
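For instance, a completely neural-network-free way to settle a finite counting question is to let a decision procedure enumerate the cases. This is only a toy sketch in core Lean 4 (the question is made up, not an actual AHSME/AIME problem):

```lean
-- "How many integers n with 1 ≤ n ≤ 100 are divisible by both 4 and 6?"
-- That is the number of multiples of 12 in the range, i.e. 8.
-- `decide` checks this by exhaustively evaluating the filter in the kernel,
-- with no search guidance at all.
example :
    ((List.range 101).filter (fun n => n % 4 == 0 && n % 6 == 0 && n != 0)).length = 8 := by
  decide
```

Real contest problems obviously need more than this, but a lot of them reduce to bounded computations of this flavor once formalized.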
Concretely, a problem that shows up on the IMO but never on the AHSME/AIME is something like this: "Consider [this weird two-person game]. State which player has a winning strategy. Prove it."
So a solution to this involves the "inspiration step" of "wtf is the strategy; what are the invariants after each player makes their move", i.e. coming up with that expression/strategy out of thin air. In contrast, the AHSME/AIME feels a bit more brute-force-able.
Thus, I too am curious what others are doing with this.
Last updated: Dec 20 2025 at 21:32 UTC