Zulip Chat Archive
Stream: new members
Topic: Solving with ChatGPT
Mr Proof (Jul 16 2025 at 10:40):
Hi,
I am using ChatGPT and LLMs to solve (known) proofs. The way I'm doing this is basically asking it to formalise the problem in Lean 4, and then split the problem into parts. First assume each part is an axiom.
Then take each axiom and ask GPT to split that into smaller parts then this becomes a theorem and so on.
We do this recursively until the smaller parts can be found in Mathlib.
I can show it the various math papers.
This top down recursive approach seems to work well.
I haven't automated this yet, but it could be done.
This will work best with proofs whose initial definition can be formalised simply with mathlib functions. It may fail for more esoteric proofs because what it actually proves will depend on the definitions it creates for each of the terms in the description. (e.g. it may prove something but not quite the thing you wanted!)
Can anyone suggest a list of proofs, known but yet unformalised but with simple definitions? (Someone challenge me with a unformalised yet do-able proof to formalise.)
Anyone else using AI to formalise proofs?
Kenny Lau (Jul 16 2025 at 10:45):
Mr Proof said:
Anyone else using AI to formalise proofs?
Yeah, like 10 other companies.
Mr Proof (Jul 16 2025 at 10:45):
I'm not a company. Just doing it for fun. Is there money in formalising proofs? Strange business model.
Well, saying that I know someone who got a million dollars to formalise Fermat.
Here is an example output (just fill in the blanks :grinning_face_with_smiling_eyes: ):
import Mathlib.Data.Nat.Basic
import Mathlib.Data.Rat.Init
axiom elliptic_curve : Type → Type
axiom semistable : elliptic_curve ℚ → Prop
axiom modular : elliptic_curve ℚ → Prop
axiom frey_curve (a b c : ℤ) (n : Nat) : elliptic_curve ℚ
axiom frey_like : elliptic_curve ℚ → Prop
axiom ribet_theorem :
∀ (E : elliptic_curve ℚ),
frey_like E → ¬ modular E
axiom modularity_theorem :
∀ (E : elliptic_curve ℚ), semistable E → modular E
theorem fermat_last_theorem
(n : Nat) (a b c : ℤ)
(hn : n > 2) (ha : a ≠ 0) (hb : b ≠ 0) (hc : c ≠ 0)
(h : a ^ n + b ^ n = c ^ n) :
False :=
let E := frey_curve a b c n
have frey_cond : frey_like E := sorry
have not_modular := ribet_theorem E frey_cond
have semistable_cond : semistable E := sorry
have modular_cond := modularity_theorem E semistable_cond
not_modular modular_cond
Klaus Gy (Jul 16 2025 at 11:02):
i feel that chatgpt is not very good with proofs. however, i use it to find the right lemma or definition in a large mathlib file, to use in my proof, in this case its mostly reliable.
Kenny Lau (Jul 16 2025 at 11:04):
@Klaus Gy do you know about loogle, leansearch, and leanexplore?
Klaus Gy (Jul 16 2025 at 11:08):
no, not yet!
Mr Proof (Jul 16 2025 at 11:13):
Yes, from what I've seen such as Terrance's lectures on Lean. People are mostly using AI to quickly find Mathlib lemmas etc.
Although, I think that since LLMs are great at translation. They should be ideally suited to translating already proved theorems into Lean language. Since, at least in principle this doesn't involve any creativity. (Some may disagree - perhaps there is an art in it). I think the trick is in breaking it into sub-tasks, so it doesn't try to bite off more than it can chew all at once.
I think the parts it might struggle with are actually ironically going to be the computational bits. e.g. if the proof involves multiplying two large numbers or expanding out a large polynomial.
Kenny Lau (Jul 16 2025 at 11:15):
by norm_num
Mr Proof (Jul 16 2025 at 11:16):
That's true. And I suppose if the proof is already solved it will have the answer already. OK I take that back.
suhr (Jul 16 2025 at 11:45):
Mr Proof said:
Since, at least in principle this doesn't involve any creativity.
It does though since a paper proof often lacks necessary details or is just poorly compatible with Lean.
(deleted) (Jul 16 2025 at 12:33):
Yeah there's a lot of money in formalizing proofs. Unfortunately plugging in a random LLM is unlikely to work well, though.
(deleted) (Jul 16 2025 at 12:34):
You have to be a lot more sophisticated.
Mr Proof (Jul 16 2025 at 12:37):
I'm currently trying my method on "sum of 4 squares problem". So far this is where I got to after about a half an hour. It got stuck in trying to convert the "axiom_prime_factorization_exists" into a provable theorem. Which is surprising as I would have thought this would just be in mathlib.
You have to coax it along and tell it "Don't use Sorry, always break things down into axioms which I'll prove later" and so forth.
import Mathlib.Data.Nat.Prime.Basic
import Mathlib.Data.Nat.Factors
import Mathlib.Data.List.Basic
import Mathlib.Tactic
import Mathlib.Data.Nat.Factorization.Defs
import Mathlib.Data.Nat.Factorization.Basic
def is_sum_of_four_squares (n : Int) : Prop :=
n > 0 ∧ ∃ a b c d : Int, n = a*a + b*b + c*c + d*d
-- Axiom S1: Every prime number (as Nat) is a sum of four squares (cast to Int)
axiom axiom_prime_sum_of_four_squares :
∀ p : Nat, Nat.Prime p → is_sum_of_four_squares (p : Int)
-- Axiom S2: Multiplicativity of sum of four squares representation
theorem product_of_sum_of_four_squares
(m n : Int)
(hm : is_sum_of_four_squares m)
(hn : is_sum_of_four_squares n) :
is_sum_of_four_squares (m * n) := by
rcases hm with ⟨hm_pos, ⟨a, b, c, d, hm_eq⟩⟩
rcases hn with ⟨hn_pos, ⟨e, f, g, h, hn_eq⟩⟩
let A := a*e - b*f - c*g - d*h
let B := a*f + b*e + c*h - d*g
let C := a*g - b*h + c*e + d*f
let D := a*h + b*g - c*f + d*e
have prod_pos : 0 < m * n := mul_pos hm_pos hn_pos
use prod_pos, A, B, C, D
calc
m * n = (a*a + b*b + c*c + d*d) * (e*e + f*f + g*g + h*h) := by rw [hm_eq, hn_eq]
_ = A*A + B*B + C*C + D*D := by ring
-- Axiom S3: One is a sum of four squares
theorem one_is_sum_of_four_squares : is_sum_of_four_squares 1 :=
⟨by norm_num, ⟨1, 0, 0, 0, by rfl⟩⟩
-- Axiom S4: Every positive Nat has a prime factorization
axiom axiom_prime_factorization_exists :
∀ n : Nat, n > 0 → ∃ factors : List Nat,
(∀ p ∈ factors, Nat.Prime p) ∧ factors.prod = n
-- Axiom S5: Combining S1 to S4 implies every positive integer (as Int) is sum of four squares
axiom axiom_combination_implies_four_squares :
(∀ p : Nat, Nat.Prime p → is_sum_of_four_squares (p : Int)) →
(∀ m n : Int, is_sum_of_four_squares m → is_sum_of_four_squares n → is_sum_of_four_squares (m * n)) →
is_sum_of_four_squares 1 →
(∀ n : Nat, n > 0 → ∃ factors : List Nat,
(∀ p ∈ factors, Nat.Prime p) ∧ factors.prod = n) →
∀ n : Int, n > 0 → is_sum_of_four_squares n
-- Main theorem
theorem lagrange_four_squares (n : Int) (hn : n > 0) :
is_sum_of_four_squares n :=
axiom_combination_implies_four_squares
axiom_prime_sum_of_four_squares
product_of_sum_of_four_squares
one_is_sum_of_four_squares
axiom_prime_factorization_exists
n
hn
(deleted) (Jul 16 2025 at 12:37):
Right now the best LLM for Lean 4 is DeepSeek Prover v2.
(deleted) (Jul 16 2025 at 12:37):
Honestly this code looks like garbage to me...
Mr Proof (Jul 16 2025 at 12:38):
What do you mean garbage?
Mr Proof (Jul 16 2025 at 12:38):
It has broken it down into steps. Which I'll prove one by one.
(deleted) (Jul 16 2025 at 12:39):
ChatGPT is outright useless at Lean. In the past I thought I could use ChatGPT to break a theorem down into multiple parts, but after months of repeated failures I decided to do that by hand.
(deleted) (Jul 16 2025 at 12:39):
And rely on DeepSeek Prover v2 to prove parts I feel the LLM can handle.
Mr Proof (Jul 16 2025 at 12:40):
Yes, we really need a model trained on Lean directly.
Last updated: Dec 20 2025 at 21:32 UTC