Zulip Chat Archive

Stream: Machine Learning for Theorem Proving

Topic: InternLM-Step-Prover and Lean-Github


InternLM-Math (Jul 25 2024 at 02:02):

Introducing InternLM-Step-Prover and Lean-Github: a state-of-the-art math prover for Lean 4, achieving SOTA results on the MiniF2F, ProofNet, and Putnam benchmarks. Our model solved 3 IMO problems in MiniF2F, including one (IMO 1983 P6) that had never been solved in Lean before.

Here is how our model proved the IMO problem and how we collected our training dataset, Lean-Github:
[image: 147-Repos.png]

We open-source our model and our training dataset here:

arXiv: https://arxiv.org/abs/2407.17227
Model: https://huggingface.co/internlm/internlm2-step-prover
Dataset: https://huggingface.co/datasets/internlm/Lean-Github
GitHub: https://github.com/InternLM/InternLM-Math

Kim Morrison (Jul 25 2024 at 03:43):

Could you post the IMO1983P6 statement and (found) proof?

InternLM-Math (Jul 25 2024 at 03:47):

Kim Morrison said:

Could you post the IMO1983P6 statement and (found) proof?

theorem imo_1983_p6 (a b c : ℝ) (h₀ : 0 < a ∧ 0 < b ∧ 0 < c) (h₁ : c < a + b) (h₂ : b < a + c)
    (h₃ : a < b + c) : 0 ≤ a ^ 2 * b * (a - b) + b ^ 2 * c * (b - c) + c ^ 2 * a * (c - a) :=
by
    ring_nf
    have h₄ : 0 < a + b + c := by linarith
    simp only [add_assoc]
    have h₅ : 0 ≤ (a - b) ^ 2 * (a + b - c) := by nlinarith
    have h₆ : 0 ≤ (a - c) ^ 2 * (a + c - b) := by nlinarith
    have h₇ : 0 ≤ (b - c) ^ 2 * (b + c - a) := by nlinarith
    nlinarith [h₀.1, h₀.2.1, h₀.2.2, h₁, h₂, h₃, h₄, h₅, h₆, h₇]

Kim Morrison (Jul 25 2024 at 03:51):

Interesting. The first three lines of the proof don't actually contribute anything. I wonder where it "gets the idea" for the next three haves.

InternLM-Math (Jul 25 2024 at 03:54):

Kim Morrison said:

Interesting. The first three lines of the proof don't actually contribute anything. I wonder where it "gets the idea" for the next three haves.

We also have no idea how it guesses these haves. :rolling_on_the_floor_laughing: We train on Lean-Github and do expert iteration on a massive set of olympiad-level problems; it may have picked this up during that process.
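
For context, "expert iteration" here means the usual loop of sampling candidate proofs, keeping only those that Lean verifies, and retraining on them. Below is only a minimal generic sketch of that loop in Lean 4; the names (Model, sampleProof, leanCheck, finetune) are hypothetical placeholders, not our actual pipeline:

-- Generic expert-iteration loop (illustrative sketch only).
-- `sampleProof`, `leanCheck`, and `finetune` are hypothetical stand-ins
-- for the prover, the Lean verifier, and the fine-tuning step.
def expertIteration {Model : Type} (model : Model) (statements : List String)
    (trainData : List (String × String))
    (sampleProof : Model → String → String)
    (leanCheck : String → String → Bool)
    (finetune : Model → List (String × String) → Model)
    (rounds : Nat := 3) : Model := Id.run do
  let mut m := model
  let mut data := trainData
  for _ in [0:rounds] do
    let mut newProofs : List (String × String) := []
    for stmt in statements do
      let proof := sampleProof m stmt      -- the model proposes a candidate proof
      if leanCheck stmt proof then         -- keep it only if Lean accepts it
        newProofs := (stmt, proof) :: newProofs
    data := data ++ newProofs              -- grow the training set with verified proofs
    m := finetune m data                   -- retrain on the enlarged set
  return m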

Michael Stoll (Jul 25 2024 at 11:30):

Without h₄, the proof is actually a bit faster (in each of three trials). Maybe nlinarith is slowed down by the presence of the unnecessary fact in the context... (EDIT: One also has to remove h₄ from the list in the last nlinarith.)
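
Combining this with Kim's observation that the first three lines do nothing, the trimmed proof would presumably be the following (just a sketch of the shortened version; I have not re-checked it):

theorem imo_1983_p6 (a b c : ℝ) (h₀ : 0 < a ∧ 0 < b ∧ 0 < c) (h₁ : c < a + b) (h₂ : b < a + c)
    (h₃ : a < b + c) : 0 ≤ a ^ 2 * b * (a - b) + b ^ 2 * c * (b - c) + c ^ 2 * a * (c - a) :=
by
    have h₅ : 0 ≤ (a - b) ^ 2 * (a + b - c) := by nlinarith
    have h₆ : 0 ≤ (a - c) ^ 2 * (a + c - b) := by nlinarith
    have h₇ : 0 ≤ (b - c) ^ 2 * (b + c - a) := by nlinarith
    nlinarith [h₀.1, h₀.2.1, h₀.2.2, h₁, h₂, h₃, h₅, h₆, h₇]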

Eric Wieser (Jul 25 2024 at 11:43):

According to https://huggingface.co/datasets/internlm/Lean-Github/viewer/default/train?q=compfiles, @David Renshaw's "Compfiles" is in the dataset, which probably has overlap with minif2f. Did you filter this from the data you actually used when training your model?

InternLM-Math (Jul 25 2024 at 14:15):

Michael Stoll said:

Without h₄, the proof is actually a bit faster (in each of three trials). Maybe nlinarith is slowed down by the presence of the unnecessary fact in the context... (EDIT: One also has to remove h₄ from the list in the last nlinarith.)

We pasted the exact proof our model found. We have found that many of our proofs contain redundant steps.

InternLM-Math (Jul 25 2024 at 14:17):

Eric Wieser said:

According to https://huggingface.co/datasets/internlm/Lean-Github/viewer/default/train?q=compfiles, David Renshaw's "Compfiles" is in the dataset, which probably has overlap with minif2f. Did you filter this from the data you actually used when training your model?

IMO 1983 P6 is not solved in Compfiles (https://github.com/dwrensha/compfiles/blob/main/Compfiles/Imo1983P6.lean). We did not remove Compfiles from our training data, since olympiad-level math solutions are so scarce and it contains many useful lemmas. The Putnam benchmark and IMO 1983 P6 have no chance of leaking.


Last updated: May 02 2025 at 03:31 UTC