Zulip Chat Archive
Stream: Machine Learning for Theorem Proving
Topic: Can this GPT 5.1 be simplified?
Jonathan Crall (Nov 29 2025 at 20:35):
I've been learning lean on the side, and I've finally been able to get a working proof for a small result that I found: https://arxiv.org/abs/2305.00594
On one hand, I'm amazed that GPT 5.1 was finally able to get the formalization of the proof, I've been struggling with it. Previous GPT models 4o and 5 were not able to get there. I think this is a neat example of something simple, but somewhat non-trivial being proven using AI.
Simply stated, given:
/-- Precision / PPV = TP / (TP + FP) -/
def PPV (TP FP : ℝ) : ℝ :=
TP / (TP + FP)
/-- Recall / TPR = TP / (TP + FN) -/
def TPR (TP FN : ℝ) : ℝ :=
TP / (TP + FN)
/-- Fowlkes–Mallows index: geometric mean of precision and recall. -/
def FM (TP FP FN : ℝ) : ℝ :=
Real.sqrt (PPV TP FP * TPR TP FN)
/-- Matthews Correlation Coefficient (MCC). We treat all entries as reals. -/
def MCC (TP TN FP FN : ℝ) : ℝ :=
let num := TP * TN - FP * FN
let den := Real.sqrt ((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
num / den
I have the proof for the statement:
/--
Main target theorem:
lim_{TN→∞} MCC(TP,TN,FP,FN) = FM(TP,FP,FN)
under non-degeneracy and `0 ≤ TP`.
The definition of the main theroem (given in full later) is written as:
-/
theorem tendsto_MCC_atTop_eq_FM
{TP FP FN : ℝ}
(hTPFPpos : 0 < TP + FP)
(hTPFNpos : 0 < TP + FN)
(hTP_nonneg : 0 ≤ TP)
(hFP_nonneg : 0 ≤ FP)
(hFN_nonneg : 0 ≤ FN) :
Tendsto (fun TN : ℝ => MCC TP TN FP FN) atTop (𝓝 (FM TP FP FN))
The working proof is: https://github.com/Erotemic/paper-g1-and-mcc/blob/main/attempt6/Attemp6/Basic.lean
But the proof is 654 lines, which is a lot longer than I thought it would need to be. On one hand: great, that explains why I was having such a hard time getting this, but on the other hand, I suspect that given that GPT struggled to get it right, it probably isn't going about it optimally.
My questions are:
- How typical is a length like this for a statement of this complexity?
- Can anyone else do better?
Kevin Buzzard (Nov 29 2025 at 21:02):
It's best to ask questions as a #mwe so here is one:
import Mathlib.Tactic
noncomputable section
/-- Precision / PPV = TP / (TP + FP) -/
def PPV (TP FP : ℝ) : ℝ :=
TP / (TP + FP)
/-- Recall / TPR = TP / (TP + FN) -/
def TPR (TP FN : ℝ) : ℝ :=
TP / (TP + FN)
/-- Fowlkes–Mallows index: geometric mean of precision and recall. -/
def FM (TP FP FN : ℝ) : ℝ :=
Real.sqrt (PPV TP FP * TPR TP FN)
/-- Matthews Correlation Coefficient (MCC). We treat all entries as reals. -/
def MCC (TP TN FP FN : ℝ) : ℝ :=
let num := TP * TN - FP * FN
let den := Real.sqrt ((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
num / den
end
open Filter Topology
/--
Main target theorem:
lim_{TN→∞} MCC(TP,TN,FP,FN) = FM(TP,FP,FN)
under non-degeneracy and `0 ≤ TP`.
The definition of the main theroem (given in full later) is written as:
-/
theorem tendsto_MCC_atTop_eq_FM
{TP FP FN : ℝ}
(hTPFPpos : 0 < TP + FP)
(hTPFNpos : 0 < TP + FN)
(hTP_nonneg : 0 ≤ TP)
(hFP_nonneg : 0 ≤ FP)
(hFN_nonneg : 0 ≤ FN) :
Tendsto (fun TN : ℝ => MCC TP TN FP FN) atTop (𝓝 (FM TP FP FN)) := by
sorry
Kevin Buzzard (Nov 29 2025 at 21:08):
Mathematically I think the question is to prove that the limit as tends to of is
Kevin Buzzard (Nov 29 2025 at 21:09):
I would estimate that, as is currently par for the course with AI (edit: by which I mean "generic LLMs such as ChatGPT"; I should not have said "AI" here, as is clear from the below post), your vibe-coded proof is at least a factor of 10 larger than a human expert proof.
Lawrence Wu (llllvvuu) (Nov 30 2025 at 00:45):
This seems like the kind of thing that there should be a tactic like asymp_simp that would just close it. But in the meantime, here's Aristotle's 18-line proof which was returned from our CLI in 6 minutes:
import Mathlib.Tactic
noncomputable section
/-- Precision / PPV = TP / (TP + FP) -/
def PPV (TP FP : ℝ) : ℝ :=
TP / (TP + FP)
/-- Recall / TPR = TP / (TP + FN) -/
def TPR (TP FN : ℝ) : ℝ :=
TP / (TP + FN)
/-- Fowlkes–Mallows index: geometric mean of precision and recall. -/
def FM (TP FP FN : ℝ) : ℝ :=
Real.sqrt (PPV TP FP * TPR TP FN)
/-- Matthews Correlation Coefficient (MCC). We treat all entries as reals. -/
def MCC (TP TN FP FN : ℝ) : ℝ :=
let num := TP * TN - FP * FN
let den := Real.sqrt ((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
num / den
end
open Filter Topology
/--
Main target theorem:
lim_{TN→∞} MCC(TP,TN,FP,FN) = FM(TP,FP,FN)
under non-degeneracy and `0 ≤ TP`.
The definition of the main theroem (given in full later) is written as:
-/
theorem tendsto_MCC_atTop_eq_FM
{TP FP FN : ℝ}
(hTPFPpos : 0 < TP + FP)
(hTPFNpos : 0 < TP + FN)
(hTP_nonneg : 0 ≤ TP)
(hFP_nonneg : 0 ≤ FP)
(hFN_nonneg : 0 ≤ FN) :
Tendsto (fun TN : ℝ => MCC TP TN FP FN) atTop (𝓝 (FM TP FP FN)) := by
-- Simplify the expression for the MCC as TN approaches infinity.
have h_simplify : Filter.Tendsto (fun TN : ℝ => (TP * TN - FP * FN) / Real.sqrt ((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))) Filter.atTop (𝓝 (TP / Real.sqrt ((TP + FP) * (TP + FN)))) := by
-- We divide numerator and denominator by $TN$:
suffices h_suff'' : Filter.Tendsto (fun TN : ℝ => (TP - FP * FN / TN) / Real.sqrt ((TP + FP) * (TP + FN) * (1 + FP / TN) * (1 + FN / TN))) Filter.atTop (𝓝 (TP / Real.sqrt ((TP + FP) * (TP + FN)))) by
-- By simplifying, we can see that the two expressions are equivalent for large TN.
have h_eq : ∀ TN : ℝ, TN > 0 → (TP * TN - FP * FN) / Real.sqrt ((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN)) = (TP - FP * FN / TN) / Real.sqrt ((TP + FP) * (TP + FN) * (1 + FP / TN) * (1 + FN / TN)) := by
intro TN hTNpos
field_simp [hTNpos];
rw [ Real.sqrt_div ( by positivity ), Real.sqrt_sq hTNpos.le ] ; ring;
-- By simplifying, we can see that the two expressions are equal.
field_simp [hTNpos]
ring;
exact h_suff''.congr' ( by filter_upwards [ Filter.eventually_gt_atTop 0 ] with TN hTN using h_eq TN hTN ▸ rfl );
exact le_trans ( Filter.Tendsto.div ( tendsto_const_nhds.sub ( tendsto_const_nhds.div_atTop Filter.tendsto_id ) ) ( Filter.Tendsto.sqrt <| Filter.Tendsto.mul ( Filter.Tendsto.mul ( tendsto_const_nhds.mul <| tendsto_const_nhds ) <| tendsto_const_nhds.add <| tendsto_const_nhds.div_atTop Filter.tendsto_id ) <| tendsto_const_nhds.add <| tendsto_const_nhds.div_atTop Filter.tendsto_id ) <| ne_of_gt <| Real.sqrt_pos.mpr <| by positivity ) <| by norm_num;
convert h_simplify using 1;
unfold FM PPV TPR; ring;
field_simp;
rw [ Real.sqrt_div ( by positivity ), Real.sqrt_sq ( by positivity ) ] ; ring
Once cleaned up / simplified by a human it should be even shorter.
TongKe Xue (Nov 30 2025 at 20:55):
- This proof is longer than the Aristotle proof, but shorter than the ChatGPT proof; but I think the proof "follows the paper" fairly accurately.
- This problem turned out to be more "little pebbles" than I ever imagined. (dealing with division in Reals, sqrt, filter definition of limits).
- I have a short video walk through at https://www.youtube.com/watch?v=SdZyIvnACnE
-
I was running into MathLib issues on lean4.26.0-rc2; so I had to downgrade everything to lean4.25.2. No what I misconfigured to cause the pain.
Basic.lean -
Great learning experience for reading up on parts of MathLib. This one problem hits: limit / Tendsto, Sqrt, GroupWithZero, Div
The core "pebbles" in formalizing this were:
- The human proof informally uses "a * T / T = a". We can't do this i nLean4 because LHS is not defined when "T = 0", so we have to carry around "T \ne 0" proofs everywhere. How do we get "T \ne 0"? We exploit the neighborhood definition, to say we only consider the "TN >= 1" which implies "TN > 0" which implies "TN \ne 0".
- Lean4 is very strict on sqrt. So anything involving sqrt involves
0 <= xproofs. Luckily, this is generally dispatched via "by positivity" - (This could just be my fault for misreading things). I don't know what to make of Tendsto. On one hand, it's pretty intuitive; on the other hand, many of the lemmas for Tendsto seems to be spread out across multiple files, so it was not clear "what can we use" at first glance.
- (not a fault of Lean4): LLMs were hit & miss. Sometimes they were brilliant and taught me new tactics. Othertimes, they gave me lemma's that did not exist. The problem with the latter is I didn't know when (1) the LLM was hallucinating and (2) when it was a misconfiguration on my end. Especially fun when it gave me stuff from
Nat.*when all my variables were\Real.
Besides that, everything else was fairly straight forward.
TongKe Xue (Nov 30 2025 at 21:00):
Lawrence Wu (llllvvuu) said:
This seems like the kind of thing that there should be a tactic like
asymp_simpthat would just close it.
Do you have a "test set" in mind for this?
- After manually writing the proof, I think having an
asymp_simpwould be very useful. - I'm also interested in learning how to write Lean4 tactics
- Do you have a road map / test dataset for this ?
- Intuitively, I feel having a "bigger bank" of lemmas for manipulating sqrt / frac / trig / ... under limit/Tendsto would go a long way to simplifying alot of these proofs, and also let the tactic "operate at a higher level". [In particular, a more automated system for tracking >= 0, neq 0, ... ]
Last updated: Dec 20 2025 at 21:32 UTC