Zulip Chat Archive
Stream: Machine Learning for Theorem Proving
Topic: Autoformalization of the probabilistic abc-conjecture
Srivatsa Srinivas (Jun 12 2025 at 16:22):
https://github.com/morph-labs/lean-abc-true-almost-always?tab=readme-ov-file
Pretty interesting. I wonder how much "Ouija boarding" went into generating the "blueprint".
Srivatsa Srinivas (Jun 12 2025 at 16:28):
By "Ouija boarding" I mean the following; the formalization system takes in a human generated "blueprint" (or detailed proof strategy). If the proof strategy was edited many times to get the desired autoformalization result, one can drum up an analogy to "Ouija boards"
Srivatsa Srinivas (Jun 12 2025 at 16:33):
But since Lean is type-checked, "automated Ouija boarding" can be a valid path towards autoformalization.
Martin Dvořák (Jun 12 2025 at 16:44):
For somebody who just wants a summary of what the repo proves:
def rad (n : ℕ) : ℕ := if n = 0 then 0 else n.primeFactors.prod id

def exceptionalSet (N : ℕ) (ε : ℝ) : Set (ℕ × ℕ × ℕ) :=
  { (a, b, c) |
    1 ≤ a ∧ a ≤ N ∧
    1 ≤ b ∧ b ≤ N ∧
    1 ≤ c ∧ c ≤ N ∧
    a + b = c ∧
    Nat.Coprime a b ∧
    (rad (a * b * c) : ℝ) < (c : ℝ) ^ (1 - ε) }

theorem theorem67 (ε : ℝ) (hε : ε > 0) (hε_small : ε < 1/100) :
    ∃ (C : ℝ), C > 0 ∧ ∃ (N₀ : ℕ), ∀ (N : ℕ) [Fintype ↑(exceptionalSet N ε)], N ≥ N₀ →
      ((exceptionalSet N ε).toFinset.card : ℝ) ≤ C * (N : ℝ) ^ ((2: ℝ) / 3)
And it does build on my computer, with the result:
'theorem67' depends on axioms: [propext, Classical.choice, Quot.sound]
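(That line is what Lean's #print axioms command reports once the file elaborates, assuming theorem67 is in scope:)

#print axioms theorem67
-- 'theorem67' depends on axioms: [propext, Classical.choice, Quot.sound]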
Martin Dvořák (Jun 12 2025 at 16:49):
The nontrivial Mathlib definitions that it depends on are:
def Nat.minFacAux (n : ℕ) : ℕ → ℕ
  | k =>
    if n < k * k then n
    else if k ∣ n then k
    else minFacAux n (k + 2)

def Nat.minFac (n : ℕ) : ℕ :=
  if 2 ∣ n then 2 else minFacAux n 3

def Nat.primeFactorsList : ℕ → List ℕ
  | 0 => []
  | 1 => []
  | k + 2 =>
    let m := minFac (k + 2)
    m :: primeFactorsList ((k + 2) / m)

def Nat.primeFactors (n : ℕ) : Finset ℕ := n.primeFactorsList.toFinset
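A quick sanity check of these (a minimal sketch, assuming Mathlib is imported and the rad defined above is in scope):

#eval Nat.minFac 91                   -- 7
#eval (360 : ℕ).primeFactors.prod id  -- 30
#eval rad 360                         -- 30 = 2 * 3 * 5
#eval rad 0                           -- 0, by the `if n = 0` branch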
Jason Rute (Jun 12 2025 at 16:53):
It seems to me more like using natural language LaTeX to code in Lean, probably with a lot of back-and-forth (and some good Lean understanding). Probably way too expensive for a real user, but quite impressive at first glance. Love to see the details/fine-print.
Martin Dvořák (Jun 12 2025 at 16:53):
Don't get confused by the arrow in ↑(exceptionalSet N ε): it is just the coercion Set.Elem, and the arrow needn't even be written explicitly.
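As a minimal illustration (not from the repo):

example (S : Set ℕ) : (↑S : Type) = {x // x ∈ S} := rfl
-- the coercion from Set to Type is Set.Elem, i.e. the subtype of elements of S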
Patrick Massot (Jun 12 2025 at 16:54):
You can watch the announcement on the Big Proof meeting website for more details.
Srivatsa Srinivas (Jun 12 2025 at 16:54):
It would make a killer tactic!
Jason Rute (Jun 12 2025 at 16:55):
@Patrick Massot Link?
Patrick Massot (Jun 12 2025 at 16:56):
Jason Rute (Jun 12 2025 at 16:57):
Is this talk too new to be up?
Patrick Massot (Jun 12 2025 at 16:58):
https://youtu.be/zLqh1FidFRc?t=3159 is more precise
Patrick Massot (Jun 12 2025 at 16:59):
Although I encourage you to watch the full talk by Bhavik.
Jason Rute (Jun 12 2025 at 17:01):
Oh, you mean it is happening right now live?
Patrick Massot (Jun 12 2025 at 17:04):
It happened one hour ago.
Patrick Massot (Jun 12 2025 at 17:05):
But you can watch the recording.
Andy Jiang (Jun 12 2025 at 18:14):
It seems like on the website https://www.morph.so/blog/trinity there's a GitHub repo with instructions on how to run the autoformalization program yourself? Is this correct? Sorry, I don't have time to look at it carefully right now.
Justin Asher (Jun 12 2025 at 18:15):
@Andy Jiang I thought they just open sourced the Lean code and not the autoformalization program?
Andy Jiang (Jun 12 2025 at 18:17):
Sorry, I think you're right; their GitHub link seems just to be a way to run a Lean server on their cloud service: https://github.com/morph-labs/morphcloud-examples-public/tree/main/lean-server
Jason Rute (Jun 12 2025 at 18:18):
The blog makes it sound like they started from the paper alone and automatically formalized the paper. The repo makes it seem like they started with a human-written blueprint. Does anyone know? (I haven’t had time to watch the talk.)
Yakov Pechersky (Jun 12 2025 at 18:23):
Eric Wieser (Jun 12 2025 at 20:20):
Jason Rute said:
The blog makes it sound like they started from the paper alone and automatically formalized the paper. The repo makes it seem like they started with a human-written blueprint. Does anyone know? (I haven’t had time to watch the talk.)
They said the blueprint was human written
Eric Wieser (Jun 12 2025 at 20:21):
And they said that they did extend the blueprint in response to the AI being unable to prove certain lemmas in it; but this is much the same process that humans managing blueprints use when their contributors need more detail, so I don't think this detracts from the result.
Eric Wieser (Jun 12 2025 at 20:21):
(the implication was that they didn't do this much)
Kevin Buzzard (Jun 12 2025 at 21:24):
My understanding was that this was not a blueprint in the traditional sense -- Jared broke the proof down into a bunch of statements and did not supply the proofs but merely said, for each statement, which of the previous statements it used in its proof.
Justin Asher (Jun 12 2025 at 22:00):
That would make sense because their Infinibranch is meant for parallelized tactic execution, e.g., last mile search.
Bhavik Mehta (Jun 13 2025 at 02:15):
Martin Dvořák said:
For somebody who just wants a summary what the repo proves:
This is indeed what the repo proves. As I mentioned in my talk though, the idea of formalising things related to this was around before any AI involvement, from myself and Arend, and one of the first things I did was check that the AI-generated definitions made sense.
In particular, that the AI-generated definitions matched the ones that Arend and I had written (again, before any AI was involved): I showed some of these results in the talk.
The details of these are also available in this branch of the ABC-Exceptions project: https://github.com/b-mehta/ABC-Exceptions/tree/other. The compat section shows that the definition of the radical matches (for positive n) the one in Mathlib, and that the definition of the exceptional set matches the one Arend and I were working with. Theorems theorem67' and theorem67'' are restatements of the (AI-formalised) theorem67: the first avoids the weird [Fintype _] assumption, the second uses filter and asymptotic language. The rest of the file demonstrates the actual definitions and theorem statements; these were written by me (before AI involvement), but the proof of deBruijn goes by converting my definitions to Trinity's and then using its (monster) proof to close the goal.
And I'll add that a room with many Lean experts seemed convinced at least that the definitions I had written weren't crazy!
The second thing I did, after having Lean confirm that the AI-generated definitions were in alignment with the existing ones, was run the code through lean4checker, which passed. I encourage others to clone that branch of the repo and test this for themselves.
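For concreteness, a minimal sketch of the kind of restatement that avoids the [Fintype _] assumption: one option is to phrase the count with Set.ncard, which needs no instance argument (hypothetical name below, not necessarily the exact theorem67' in that branch; this assumes the exceptionalSet definition quoted earlier in this thread):

def theorem67_statement_ncard : Prop :=
  ∀ ε : ℝ, 0 < ε → ε < 1/100 →
    ∃ C : ℝ, 0 < C ∧ ∃ N₀ : ℕ, ∀ N : ℕ, N ≥ N₀ →
      ((exceptionalSet N ε).ncard : ℝ) ≤ C * (N : ℝ) ^ ((2 : ℝ) / 3)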
Justin Asher (Jun 13 2025 at 02:44):
@Bhavik Mehta How do Trinity's proofs and its approach to the formalization compare to your own process? Do they feel verbose? Clever? Intuitive?
Luigi Massacci (Jun 17 2025 at 09:38):
The video appears to be down, so I can't check whether this was said in the presentation, but is the published code exactly what the model produced, or was it "humanized" afterwards (fixing any weird formatting, etc.)? In particular, are the comments also generated by the model?
Bhavik Mehta (Jun 17 2025 at 09:59):
The published code is exactly what the model produced, including comments. This is why there are long lines, unused hypotheses, and other lint failures
Luigi Massacci (Jun 17 2025 at 11:23):
Cool, thank you
Bolton Bailey (Jun 17 2025 at 15:37):
Here's the blueprint graph.
[image: blueprint dependency graph]
Why is lemma 32 disconnected from the rest of the graph? Is this something from the original proof that turns out not to be needed?
Bhavik Mehta (Jun 18 2025 at 00:07):
I believe that's correct, but I'll need to check to make sure. Recall that the blueprint graph was human-generated, so the other option is that it was just a mistake.
Bhavik Mehta (Jun 18 2025 at 11:34):
I don't yet have a complete answer, but I have a partial one. Lemma 32 is a binary version of lemma 33, which is finitary: if lemma 33 were to be proved by induction, then lemma 32 might be a helpful component. The statement of lemma 33 is written inline in the original proof (it's the middle equality in equation 2.2 near the top of page 3), but lemma 32 is not written in any form in the original LaTeX document.
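Purely as an illustration of the binary-versus-finitary distinction (these are not the actual lemmas 32/33), here is a two-term fact alongside its Finset analogue, the kind of thing one would derive from the binary fact by induction (both are existing Mathlib lemmas):

example (a b : ℕ) : a ∣ a * b := dvd_mul_right a b

example (s : Finset ℕ) (f : ℕ → ℕ) {a : ℕ} (ha : a ∈ s) :
    f a ∣ ∏ i ∈ s, f i :=
  Finset.dvd_prod_of_mem f ha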
Yakov Pechersky (Jun 18 2025 at 12:13):
I often have side lemmas that I never end up using, but proving them helped me discover the API. So I imagine that the RL tool was able to discover API by proving a simpler lemma, and then extrapolate from that to the API it needed for the real goal.
Last updated: Dec 20 2025 at 21:32 UTC