Zulip Chat Archive

Stream: Machine Learning for Theorem Proving

Topic: Diverse Inference and Verification for Advanced Reasoning

Opt (May 23 2025 at 16:47):

Not sure this was posted here before but seems like these people were able to solve one of the combinatorics problems that AlphaProof could not on IMO 2024 with just Agent based approaches using O1/O3-mini as black box - https://arxiv.org/abs/2502.09955

Edit: code is here - https://github.com/codelion/optillm/tree/main/optillm

Eric Wieser (May 23 2025 at 17:15):

It's not immediately clear to me that they are claiming a lean proof (of IMO 2024 P5)

Simon Sorg (May 23 2025 at 17:27):

The paragraph on Autoformalization in Lean on page 5 details the method, which comes down to autoformalization, compiling in Lean, and formalizing back and having an LLM compare the original vs backtranslated. This might be at least a bit error-prone.

They also have a formalization of IMO P5 in the appendix (page 106). Note for example exactly_one_monster_per_row, which looks sketchy? I can't get it to compile (yet), but this might just be formatting.

Opt (May 23 2025 at 17:42):

Eric Wieser said:

It's not immediately clear to me that they are claiming a lean proof (of IMO 2024 P5)

In the conclusions section they claim a correct combinatorics proof and are using Lean. End of Sec 3.1 they also say they got the proof evaluated by a mathematician who has evaluated math olympiad proofs before.

Eric Wieser (May 23 2025 at 17:43):

Simon Sorg said:

They also have a formalization of IMO P5 in the appendix (page 106).

This is only a formalization of the statement, not the proof

Eric Wieser (May 23 2025 at 17:44):

Maybe they're asking the LLM to translate via a lean statement even if they're not asking for a lean solution, to force it to state the problem precisely?

Simon Sorg (May 23 2025 at 17:57):

Eric Wieser said:

Maybe they're asking the LLM to translate via a lean statement even if they're not asking for a lean solution, to force it to state the problem precisely?

They mention translating an informal proof to a Lean one in section 2.4, but it's not in the appendix.

Joseph Myers (May 23 2025 at 20:21):

The definition of TurboHasStrategy looks wrong to me; it's doing "for all monster placements, there exists a strategy" when you need to do that the other way round. And it doesn't actually refer to the monster placement at all; attempt_reaches_last_row doesn't depend on the monsters. And surely the IsGreatest in their final problem statement should be IsLeast. So any Lean proof they might have produced for this formal statement is worthless.

Joseph Myers (May 23 2025 at 20:24):

The claim "tipping performance to a gold medal level" based on adding their solution to one problem to solutions previously produced by completely separate AIs also seems incredibly dodgy to me (they make no claims that I can see to have attempted any non-combinatorics problems themselves). An AI entrant might use different techniques or languages for different problems (AlphaProof and AlphaGeometry) just as a human might, but that doesn't mean you can reasonably aggregate credit for any solution ever produced by any AI!

Last updated: Feb 28 2026 at 14:05 UTC