Zulip Chat Archive

Stream: Machine Learning for Theorem Proving

Topic: ashme/aime in lean in < 10GB weights ?


TongKe Xue (Dec 08 2025 at 10:28):

[Context: I'm still not convinced using LLM to generate tactics is the best/right way to solve IMO/Putnam problems.]

Question: Are there any systems that:

  1. uses lean4 to solve Ashme/AIME problems
  2. program + all weights is <= 10 GB ?

Now, one might point out: ASHME is multiple choice, AIME is 0 - 999. Neither involves writing proofs. Why would you want to use Lean4?
Answer: it forces correct reasoning from step n to step n+1. Even if the final answer is 0-999 or multiple choice, it is useful for us to force every intermediate step to be mathematically correct.

Question: are there systems, using Lean4, a mix of good old fashioned symbolic AI + very small weight neural network, that can solve ASHME + AIME problems at > 90% accuracy?

The intuition here is I'm particularly interested in systems that "leans heavily" on lean4 tactics, good old fashioned symbolic ai, and less reliance on neural networks / LLMs.

Thanks!

Leni Aniva (Dec 09 2025 at 02:08):

I'm working on one such system and I'll try to get it published next year

Gerben Koopman (Dec 09 2025 at 13:22):

How would this work given that Mathlib itself is already ~32GB? You need to load this for premises no? Unless you prove every theorem ground up.

Gerben Koopman (Dec 09 2025 at 14:41):

I agree with your feeling though, that next-token generation seems ill-suited to generating tactics. It's just that the alternatives I can think of are also not better. But curious to see what other people came up with.

TongKe Xue (Dec 09 2025 at 19:58):

  1. Great question. Sorry for not being clear. You get all of MathLib, CSLib, any other public open source Lean4 db of formalized lemmas/theorems for free.
  2. The 10GB limit was meant mainly to focus on "weights/blobs of f32s".
  3. Taking a neural network's weights and "encoding" them somehow in a Lean4 db of theorems would be considered cheating. :-)

One reason I consider this might be possible is this:

  1. IMO style problems often require some form of inspiration, i.e. a step that says Consider the object/strategy XYZ where XYZ is a somewhat complicated expr that we need to come up with out of thin air.
  2. ASHME/AIME problems are "more direct"; after formalization, many of them just look like plug & play standard formulas, the 2d/3d geometry problems can often solved by "coordinates", combinatorics problems by case analysis, etc ...

Concretely, a problem that shows up on IMO but never on ASHME/AIME is something like this: "Consider [this weird 2 person game]. State which player has a winning strategy. Prove it."

So a solution to this involves this "inspiration step" of "wtf is the strategy; what are the invariants after each player makes their move"; this coming up of this expr/strategy from thin air. In contrast, the ASHME/AIME feels a bit more "bruteforce-able".

Thus, I too am curious what others are doing with this.


Last updated: Dec 20 2025 at 21:32 UTC