Zulip Chat Archive
Stream: Machine Learning for Theorem Proving
Topic: InternLM-Step-Prover and Lean-Github
InternLM-Math (Jul 25 2024 at 02:02):
Introducing InternLM-Step-Prover and Lean-Github. A state-of-the-art math prover with LEAN 4 and obtain SOTA on MiniF2F, Proofnet, and Putnam benchmarks. Our model solved 3 IMO problems in MiniF2F, and one problem had never been solved by LEAN before (IMO1983P6).
Here is how our model proved IMO and how we collect our training dataset Lean-Github:
/user_uploads/3121/AOBIbrT3s7hq1xwx4T9e3QT9/147-Repos.png
We open source our model and our training dataset here.
:place_holder:Arxiv:https://arxiv.org/abs/2407.17227
:hug:Model:https://huggingface.co/internlm/internlm2-step-prover
:bar_chart:Dataset: https://huggingface.co/datasets/internlm/Lean-Github
:computer:Github: https://github.com/InternLM/InternLM-Math
Kim Morrison (Jul 25 2024 at 03:43):
Could you post the IMO1983P6 statement and (found) proof?
InternLM-Math (Jul 25 2024 at 03:47):
Kim Morrison said:
Could you post the IMO1983P6 statement and (found) proof?
theorem imo_1983_p6 (a b c : ℝ) (h₀ : 0 < a ∧ 0 < b ∧ 0 < c) (h₁ : c < a + b) (h₂ : b < a + c)
(h₃ : a < b+c) : 0 ≤ a^2 * b * (a-b) + b ^ 2 * c * (b - c) + c ^ 2 * a * (c - a) :=
by
ring_nf
have h₄ : 0 < a+b+c := by linarith
simp only [add_assoc]
have h₅ : 0 ≤ (a-b)^2 * (a+b-c) := by nlinarith
have h₆ : 0 ≤ (a-c)^2 * (a+c-b) := by nlinarith
have h₇ : 0 ≤ (b-c)^2 * (b+c-a) := by nlinarith
nlinarith [h₀.1, h₀.2.1, h₀.2.2, h₁, h₂, h₃, h₄, h₅, h₆, h₇]
Kim Morrison (Jul 25 2024 at 03:51):
Interesting. The first three lines of the proof don't actually contribute anything. I wonder where it "gets the idea" for the next three haves.
InternLM-Math (Jul 25 2024 at 03:54):
Kim Morrison said:
Interesting. The first three lines of the proof don't actually contribute anything. I wonder where it "gets the idea" for the next three haves.
We also have no idea how it guess these haves. :rolling_on_the_floor_laughing: We are training on Github-Lean and expert iteration on massive olympic-level problems. It may learn during this process.
Michael Stoll (Jul 25 2024 at 11:30):
Without h₄
, the proof actually is three times a bit faster. Maybe nlinarith
is slowed down by the presence of the unnecessary fact in the context... (EDIT: One has to remove h₄
from the list in the last nlinarith
.)
Eric Wieser (Jul 25 2024 at 11:43):
According to https://huggingface.co/datasets/internlm/Lean-Github/viewer/default/train?q=compfiles, @David Renshaw's "Compfiles" is in the dataset, which probably has overlap with minif2f. Did you filter this from the data you actually used when training your model?
InternLM-Math (Jul 25 2024 at 14:15):
Michael Stoll said:
Without
h₄
, the proof actually isthree timesa bit faster. Maybenlinarith
is slowed down by the presence of the unnecessary fact in the context... (EDIT: One has to removeh₄
from the list in the lastnlinarith
.)
We paste the exact proof our model found. We found many of our proof contain redundant steps.
InternLM-Math (Jul 25 2024 at 14:17):
Eric Wieser said:
According to https://huggingface.co/datasets/internlm/Lean-Github/viewer/default/train?q=compfiles, David Renshaw's "Compfiles" is in the dataset, which probably has overlap with minif2f. Did you filter this from the data you actually used when training your model?
Imo1983p6 is not solved by compfiles (https://github.com/dwrensha/compfiles/blob/main/Compfiles/Imo1983P6.lean). We do not remove compfiles in our training, since olympic level math solutions are so scarce and it contains many useful lemmas. Putnam Benchmark and Imo1983P6 have no chance of leaking.
Last updated: May 02 2025 at 03:31 UTC