Zulip Chat Archive
Stream: Machine Learning for Theorem Proving
Topic: AlphaProof Paper (discussion)
Eric Wieser (Nov 12 2025 at 16:02):
Discussion thread for the AlphaProof Paper announcement.
Mauricio Collares (Nov 12 2025 at 16:32):
See also the accompanying news article at https://www.nature.com/articles/d41586-025-03585-5
Justin Asher (Nov 12 2025 at 16:50):
Thanks for publishing the paper on the AlphaProof system! This is awesome! Will the preprint be put on the arXiv as well?
Notification Bot (Nov 12 2025 at 17:15):
A message was moved from this topic to #Machine Learning for Theorem Proving > formal-imo (discussion) by Eric Wieser.
Eric Wieser (Nov 12 2025 at 17:40):
For now we have no plans to put the preprint on arXiv, but I've updated the original post with a link that should work without a Nature subscription
Justin Asher (Nov 12 2025 at 17:55):
@Eric Wieser Thank you!
Alfredo Moreira-Rosa (Nov 12 2025 at 19:58):
Thank you.
So at IMO 2024, DeepMind used AlphaProof to reach silver-medal level. And this year, from what I understood, you did not use it, but used Gemini directly, without Lean as a checker.
Does that mean the AlphaProof approach is lacking compared to using a general model? Did you try AlphaProof on the 2025 IMO and just not share the results? Is AlphaProof meant to evolve like what you did with the AlphaZero family?
Timothy Chow (Nov 15 2025 at 04:43):
I just tried to copy and paste one of the proofs from the paper (Supplemental Data Figure 6, on the last page of the supplement) and I'm getting an error at "field_simp only". Am I doing something wrong, or has something in Lean changed since the proof was originally generated?
import Mathlib
open Function Polynomial
theorem putnam_2018_b5 (f : (Fin 2 → ℝ) → (Fin 2 → ℝ)) (h₁ : ContDiff ℝ 1 f)
(h₂ : ∀ x i j, 0 < fderiv ℝ f x (Pi.single i 1) j)
(h₃ : ∀ x, 0 < fderiv ℝ f x ![1, 0] 0 * fderiv ℝ f x ![0, 1] 1 - (1 / 4) * (fderiv ℝ f x ![1, 0] 1 + fderiv ℝ f x ![0, 1] 0) ^ 2) : Injective f := by
refine'λa B b=>by_contra (λ (x :) => _)
let R M:= f$ AffineMap.lineMap B a (M: ℝ)
suffices(M):deriv R M 0 * ( a 0-B 0)+deriv R M (1) * ( a (1)-B (1)) >(0)
· suffices:StrictMonoOn (λM=> R M 0*(a 0-B 0)+ R M (1) * ( a (1)-B (1))) (.Icc ↑0 1)
· use (by norm_num[b,R]:¬ _)$ this unitInterval.zero_mem unitInterval.one_mem
suffices(M):DifferentiableAt ℝ R M
· apply ((strictMono_of_deriv_pos) (‹∀ (x : ℝ),_ <_› ·|>.trans_eq (by ·norm_num [differentiableAt_pi.1 ↑((this) _),deriv_pi, $(id)deriv_mul_const]))).strictMonoOn
norm_num[differentiableAt_of_deriv_ne_zero (mt (.▸this M) _),id]
field_simp only [ R, X, false_iff_true, AffineMap.lineMap_apply,deriv]at*
refine fderiv.comp M (h₁.differentiable ↑le_rfl _) ((differentiableAt_id'.smul_const _).add_const _)▸(? _)
suffices:a-B=∑b,(a b-B b) •1 •Pi.single b 1
· norm_num[this▸map_sum _ _ _,deriv_add_const _,deriv_smul_const]
by_cases h: a ↑0 =B 0
· norm_num[mul_right_comm, mul_self_pos.2 (x ∘ _), Fin.forall_fin_two, sub_eq_zero,funext_iff,h₂,h]
specialize h₃$ M •(a-B)+B
simp_rw [ (by ·norm_num [funext_iff, Fin.forall_fin_two]:![0,01]= Pi.single @1 (1:Real) ∧![(1 : ℝ),0]=Pi.single @0 1)] at h₃
specialize h₂ (M • (a-B)+B) 1 1
revert h₂ h₃ h
generalize T:(fderiv ℝ f _)=m
use (λ (i R L) =>by nlinarith[sq (m (Pi.single 1 1) (1)*2*(a (1)-B (1))+(m (Pi.single 0 1) (1) + m ↑(Pi.single 01 1) (0))*(a 0-B 0)),sq_pos_of_ne_zero $ sub_ne_zero.mpr i])
exact (funext (by norm_num[Pi.single_apply, abs]))
Alex Meiburg (Nov 15 2025 at 04:45):
There was a change to field_simp; that tactic is substantially different now. Generally simp [field] is a closer approximation, so I would try simp only [field] to see if that works.
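For concreteness, the substitution would look like this on the line that errors (a sketch only; whether the rest of the lemma list carries over unchanged on current Mathlib is an assumption):
-- Old form, from the pasted proof (Mathlib version used in the paper):
field_simp only [ R, X, false_iff_true, AffineMap.lineMap_apply,deriv]at*
-- Suggested replacement on current Mathlib:
simp only [field, R, X, false_iff_true, AffineMap.lineMap_apply, deriv] at *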
Eric Wieser (Nov 15 2025 at 09:25):
You can get the Lean version in the paper by looking at the earliest commit of the formal-imo repository
Eric Wieser (Nov 15 2025 at 09:26):
We deliberately did not release plaintext (as opposed to PDF) Putnam proofs, since in the past @George Tsoukalas has been worried about leakage.
Eric Wieser (Nov 15 2025 at 09:39):
We do also link the precise PutnamBench version (implying a Lean / Mathlib version) near the end of the paper
Oliver Tacmon (Nov 17 2025 at 08:43):
I’ve recently read the paper “Olympiad-level formal mathematical reasoning with reinforcement learning”.
Looking at AlphaProof’s formal proof of IMO 2024 Problem 2, I noticed that the problem asks us to “find all pairs (a,b)”. The answer is of course the singleton set {(1,1)}, and the Lean code in the paper simply proves that this set is indeed the complete solution set.
However, the formal proof does not capture the “discovery” phase—how one might guess that (1,1) is the only pair in the first place.
My question is: can this “guessing” or discovery process itself be expressed and formalised in Lean?
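To make this concrete: as I understand it, such problems are stated in Lean with the answer already baked in, roughly like the sketch below (a placeholder condition, not the paper's actual statement of IMO 2024 Problem 2):
import Mathlib

-- Sketch: the answer set {(1, 1)} is supplied up front; the theorem only
-- certifies that it equals the solution set, not how it was guessed.
example : {p : ℕ × ℕ | p.1 = 1 ∧ p.2 = 1} = {(1, 1)} := by
  ext ⟨a, b⟩
  simp [Prod.ext_iff]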
Kim Morrison (Nov 17 2025 at 23:09):
This has been discussed at length before in this channel. Please see if you can find (and post!) some links. :-)
Eric Wieser (Nov 17 2025 at 23:50):
Note that the IMO code samples in the paper are effectively the same as the ones we released last year; the only changes are the removal of the human comments (to make them fit on a page) and, obviously, the de-verso-ification, since putting Verso in a PDF would surely cause a nightmare for the journal!
Oliver Tacmon (Nov 18 2025 at 04:56):
I apologize for not reading the paper carefully enough. I have now noticed the description on page 8: 'Several problems (P1, P2, P5, P6) required identifying the answer (e.g., “Find all functions…”) before constructing a proof. For these, hundreds of candidate answers were generated by querying the public Gemini 1.5 Pro model with Python tool-use enabled and a collection of hand-written examples given as a few-shot prompt [27], successfully guessing the correct answers for all applicable problems.' So I guess I'm no longer that confused.
Oliver Tacmon (Nov 18 2025 at 04:59):
Got it, thank you for the heads-up! I'll search the channel for relevant links :smiling_face_with_hearts:
Justin Asher (Nov 20 2025 at 05:24):
The TTRL is great! I am curious how long until we have enough compute to effectively apply TTRL to LLMs, which seems like a step towards continuous learning.
Last updated: Dec 20 2025 at 21:32 UTC