Zulip Chat Archive

Stream: general

Topic: Erdos 707


Junyan Xu (Oct 22 2025 at 01:27):

https://x.com/SebastienBubeck/status/1980804267524116569

Kim Morrison (Oct 22 2025 at 01:43):

Okay, make everyone you know read the introduction and section 7 of this paper.

Kim Morrison (Oct 22 2025 at 01:45):

If anyone is suffering from tl;dr:

  • The authors found a counterexample to a long-standing "open" Erdos problem (in fact one with a $1000 prize -- that's a lot for Erdos problems!)
  • While writing it up, they discovered an ancient counterexample apparently even pre-dating Erdos' interest in the problem, that everyone had forgotten about (despite the paper containing it being cited).
  • They decided this was weird, and they better write a formal proof of both their counterexample and the old one.
  • But they didn't know any Lean.
  • But that was okay, because they got ChatGPT to do it for them.

From the acknowledgements:

Our formalization would have been much more difficult without the proof assistant Lean. The authors are especially grateful to the Mathlib community for their unified library of mathematics in Lean4. Even with this extensive library, we found it surprisingly painful to successfully write proofs in Lean, so we also thank OpenAI for producing a large language model (namely, ChatGPT with GPT-5) that is capable of meeting a human mathematician halfway (or at least partway).

From the intro:

The resulting formal proof is thousands of lines, nearly all of which were written by ChatGPT. Accordingly, we believe that ChatGPT is properly an author of the formal proof accompanying this paper. Unfortunately, large language models are known to hallucinate and otherwise produce incorrect results, so we would not be able to trust this proof unless it was in a formal language. Therefore, we believe Lean is also an author, or perhaps ChatGPT and Lean may be credited together

Screenshot 2025-10-22 at 12.38.47 PM.png

And they note that you don't have to fear that they vibe-proved the wrong theorem, because the theorem statement they are interested in has already been formalized in GDM's "Formal Conjectures" repository!

Adam Topaz (Oct 22 2025 at 01:48):

we didn’t know Lean, and it isn’t terribly user-friendly.

:frown:… but fair. The blue check marks in front of the theorems is a nice touch!

Bryan Gin-ge Chen (Oct 22 2025 at 01:51):

Their Lean work is contained a single file, here: https://borisalexeev.com/papers/Erdos707.lean (I haven't tried to run it myself.)

Adam Topaz (Oct 22 2025 at 01:54):

It typechecks for me!

Aaron Liu (Oct 22 2025 at 01:55):

oh it's so long

Adam Topaz (Oct 22 2025 at 01:55):

(I didn’t cross check with the formal conjectures statement though)

Andrew Yang (Oct 22 2025 at 01:58):

Quoting the paper:

The Formal Conjectures [...] are attempting to translate all of Erdős’s problems, and we were lucky that this particular Erdős problem had already been translated. [...] However, the corresponding statement in Lean was in fact incorrect for a subtle reason!

Jason Rute (Oct 22 2025 at 02:51):

That file's unicode is corrupted for me in my browsers.

Jason Rute (Oct 22 2025 at 02:52):

Beside checking the statements carefully, it would be nice practice to also check the code carefully too with:

  • print axioms
  • lean4checker
  • SafeVerify: In particular, you can make two versions of the file, one with just the theorem statements and the definitions needed to state the theorems, and then the whole file. SafeVerify will compare them, checking that the statements have the same terms and that the proofs/axioms are correct. If SafeVerify is happy, then you only need to check the smaller file for mathematical correctness of the definitions and statements.

Kim Morrison (Oct 22 2025 at 04:51):

Jason Rute said:

That file's unicode is corrupted for me in my browsers.

Downloading it with curl seems fine. Here's a clean copy, or a link to live.

David Michael Roberts (Oct 22 2025 at 05:05):

A non-X link to the actual paper would be nice.....

Terence Tao (Oct 22 2025 at 05:06):

https://borisalexeev.com/pdf/erdos707.pdf (with some more links at https://borisalexeev.com/papers/erdos707.html )

David Michael Roberts (Oct 22 2025 at 05:07):

Thanks, @Terence Tao ^_^ appreciate the abstract page link

David Michael Roberts (Oct 22 2025 at 05:17):

For us, however, convincing ChatGPT to formally prove the result that “if f is a fixed point–free
involution on a finite set S, then S has even cardinality” was a multi-day struggle! Furthermore, the
resulting argument is 250 lines of code, many of which deal with entirely trivial claims.

Surely one would expect a nice short proof of this? (responses in a different channel or topic, as appropriate, I guess; the authors also suspect this to be the case)

Terence Tao (Oct 22 2025 at 05:20):

I actually just worked this out as an exercise for myself: https://github.com/teorth/analysis/blob/main/analysis/Analysis/Misc/erdos_707.lean (lines 89-101). (It was the only step that really required some mathematical thought, everything else could be done by "routine" Lean tactic use and tool search.) The trick is to use the Burnside lemma docs#AddAction.sum_card_fixedBy_eq_card_orbits_mul_card_addGroup .

Kim Morrison (Oct 22 2025 at 05:27):

So is that a 30-fold improvement in the de Bruijn factor? :-)

Thomas Bloom (Oct 22 2025 at 05:29):

@Bhavik Mehta pointed out this is actually already in mathlib:
Equiv.Perm.two_dvd_card_support

Terence Tao (Oct 22 2025 at 05:29):

I didn't try to prove their most difficult result, instead focusing on the easier half-page proof of Theorem 8 on page 5 of that paper. I only formalized the main core of that proof (roughly speaking, the final four paragraphs of their six paragraph proof), as I could visually see how the first two paragraphs were routine for humans (but a little bit finicky for Lean, and I started running out of steam)).

Terence Tao (Oct 22 2025 at 05:31):

Thomas Bloom said:

Bhavik Mehta pointed out this is actually already in mathlib:

https://leanprover-community.github.io/mathlib4_docs/find/?pattern=Equiv.Perm.two_dvd_card_support#doc

Dang, I tried searching for this, but I made the tactical error of only looking for tools that involved Function.Involutive and so missed this one. Maybe finding the right Mathlib tools for a task is the next frontier in LLM-assisted Lean formalization.

Thomas Bloom (Oct 22 2025 at 05:34):

Terence Tao said:

Dang, I tried searching for this, but I made the tactical error of only looking for tools that involved Function.Involutive and so missed this one. Maybe finding the right Mathlib tools for a task is the next frontier in LLM-assisted Lean formalization.

I'm surprised that GPT didn't find that already, given that it had clearly absorbed enough of Mathlib to produce a working proof of the final result. Or perhaps it did 'know' that was there, but got sidetracked into trying to prove it itself as an exercise...

Terence Tao (Oct 22 2025 at 05:36):

By the way, grind was a very effective tool in speeding up the little formalization exercise I just did. I'm liking the proof style of just dumping all relevant facts into the proof context and closing with a call to grind.

Bryan Gin-ge Chen (Oct 22 2025 at 05:42):

In case there are folks who want to check / play around with the original Lean file locally, I went ahead and put it into a Lean project here (I'm sure several other folks have already done this as well!): https://github.com/bryangingechen/erdos707

After making a new Lean project with lake new and tweaking the lakefile, etc., I saved the original Lean file into the project and made sure it built, and then somewhat arbitrarily split up the earlier parts of the file (everything before the MAIN THEOREMS section) into ~12 smaller files in a subdirectory, using #min_imports to only import what was needed in each step. (If you're just interested in the original file before my split, you can check out the first commit in the repo.)

Johan Commelin (Oct 22 2025 at 07:16):

It seems that both human authors are on Zulip, btw.

Also, can an admin split this to discussion into a new Erdos707 thread? (I'm on mobile.)

Notification Bot (Oct 22 2025 at 07:16):

25 messages were moved here from #general > Lean in the wild by Michael Rothgang.

Johan Commelin (Oct 22 2025 at 07:29):

Cc @Dustin Mixon @Boris Alexeev

Oliver Nash (Oct 22 2025 at 09:17):

David Michael Roberts said:

A non-X link to the actual paper would be nice.....

For those of us who block X / Twitter in their hosts file:
https://xcancel.com/search?f=tweets&q=1980804267524116569

Bhavik Mehta (Oct 22 2025 at 10:30):

For fun, I wrote a proof of section 3 of the paper, which they didn't formalise: they instead formalised a different proof (given in the other sections) of this result though. The main thing I wanted to see here was what the length would look like; I've made no attempt to golf this. https://gist.github.com/b-mehta/de70748a09186df53417804717ad6ef9

Henrik Böving (Oct 22 2025 at 13:02):

Jason Rute said:

Beside checking the statements carefully, it would be nice practice to also check the code carefully too with:

  • print axioms
  • lean4checker
  • SafeVerify: In particular, you can make two versions of the file, one with just the theorem statements and the definitions needed to state the theorems, and then the whole file. SafeVerify will compare them, checking that the statements have the same terms and that the proofs/axioms are correct. If SafeVerify is happy, then you only need to check the smaller file for mathematical correctness of the definitions and statements.

The verifier that I've been developing at the FRO says yes :)

Jason Rute (Oct 22 2025 at 13:03):

@Henrik Böving I assume this verifier is not yet public? Can I still ask you more about it (in a seperate post)?

Asei Inoue (Oct 22 2025 at 13:07):

@Henrik Böving I’m also interested in that verifier. How is it different from existing verifiers?

Henrik Böving (Oct 22 2025 at 13:07):

Sure

Boris Alexeev (Oct 22 2025 at 14:21):

Hi all, thanks for reading and type-checking our paper.

Boris Alexeev (Oct 22 2025 at 14:29):

It was a fun experience writing this paper and working on the Lean code. (Discovering that the problem had already been solved ~80 years ago wasn't my favorite, but it was also the primary drive for the formalization, so I have mixed feelings there.)

Dustin and I were indeed already registered here (though I don't think we've ever messaged?). Accordingly, I knew all along that there was a nice and helpful group of humans in the Lean Zulip that would certainly be able to help us.

But we were specifically curious whether it was possible to vibe code the proof using ChatGPT, and we did not ask any humans for help (nor attempt to learn any Lean, nor write code by hand). Now that it's complete, I'm happy to actually learn some Lean and see how we should/could have done things.

Boris Alexeev (Oct 22 2025 at 14:39):

Thomas Bloom said:

Bhavik Mehta pointed out this is actually already in mathlib:
Equiv.Perm.two_dvd_card_support

Thanks for sending this along. ChatGPT indeed didn't find this. It did find card_modEq_card_fixedPoints which says that the cardinality of the fixed point set and the set are equal for a p-group action (which is what was used in the end, for an action of the two-element group).

But for this approach, we likely would still have had some trouble with ChatGPT hooking up all the pipes, as we did with card_modEq_card_fixedPoints. Our hypotheses were of the form

{S : Type*} (f : S  S) (hff :  x : S, f (f x) = x)
  (hno :  x : S, f x  x)

and ChatGPT wasn't the best at translating that into other forms (like the p-group). I do think Perm wouldn't be too bad. A result with Involutive would have been good. I think it says something that the result wasn't easy to find.

(One thing I also tried was looking at https://gist.github.com/alexjbest/1fcc06a67f4d56275f99c44037bb1b6d because I knew the mod-2 result was used within Zagier's proof but that was in Lean 3 and ChatGPT couldn't translate into Lean 4 correctly.)

Jason Rute (Oct 22 2025 at 14:41):

Discussion about new lean verifier here: #lean4 > FRO's new verifier @ 💬

Boris Alexeev (Oct 22 2025 at 14:42):

Bhavik Mehta said:

For fun, I wrote a proof of section 3 of the paper, which they didn't formalise: they instead formalised a different proof (given in the other sections) of this result though. The main thing I wanted to see here was what the length would look like; I've made no attempt to golf this. https://gist.github.com/b-mehta/de70748a09186df53417804717ad6ef9

Thanks for doing this. It's incredible to see what well-written Lean looks like. Just proving the results in the first two paragraphs of the proof (basic "counting", as in IsPerfectDifferenceSetMod.card_eq) took us sooo long.

Boris Alexeev (Oct 22 2025 at 14:45):

I tried and failed to prove Claim 10 from our paper (that corresponds to the minor mis-formalization issue in the formal-conjectures project, namely the "infinite" case v=0). Can anyone formalize that?

Artie Khovanov (Oct 22 2025 at 14:48):

Terence Tao said:

By the way, grind was a very effective tool in speeding up the little formalization exercise I just did. I'm liking the proof style of just dumping all relevant facts into the proof context and closing with a call to grind.

This is how a lot of people work with Sledgehammer in Isabelle (so clearly grind is doing its job to some extent)

Terence Tao (Oct 22 2025 at 14:49):

One minor suggestion for the abstract: I would more explicitly acknowledge the risks of over-reliance on either Lean formalization or "vibe coding" (which should probably still be in quotes, due to its recent coining), and state how one is addressing them. (In general, I am thinking it should be an emerging best practice to have a AI risk mitigation section in any AI-assisted paper enumerating the potential risks of relying on AI tools, and the steps taken within the paper to address these.)

Boris Alexeev (Oct 22 2025 at 16:47):

Terence Tao said:

By the way, grind was a very effective tool in speeding up the little formalization exercise I just did. I'm liking the proof style of just dumping all relevant facts into the proof context and closing with a call to grind.

As a data point, I have never heard of grind. ChatGPT did not suggest it a single time, and based on my understanding right now, it would have helped a lot.

Jovan Gerbscheid (Oct 22 2025 at 16:58):

Yes, this is because grind was released only a few months ago, and chatGPT isn't up to date!

Kevin Buzzard (Oct 22 2025 at 18:53):

My guess is that you can tell the LLM about grind in the prompt if you know what you're doing?

Bhavik Mehta (Oct 22 2025 at 19:55):

Boris Alexeev said:

I tried and failed to prove Claim 10 from our paper (that corresponds to the minor mis-formalization issue in the formal-conjectures project, namely the "infinite" case v=0). Can anyone formalize that?

I've added a proof of that here: https://gist.github.com/b-mehta/de70748a09186df53417804717ad6ef9#file-infinite-lean

Terence Tao (Oct 22 2025 at 21:52):

We (= @Mario Carneiro mostly) actually developed a bunch of formalism for greedy algorithms as part of the Equational Theories project (in particular equational#exists_greedy_chain). Not sure whether we ever ported this to Mathlib

Boris Alexeev (Oct 23 2025 at 21:49):

The Formal Conjectures project includes an "example/special case" for embedding 3-element sets: https://github.com/google-deepmind/formal-conjectures/blob/7ff1f8533e5ed7a63a065ac1941f402de7f5b50c/FormalConjectures/ErdosProblems/707.lean#L156

Boris Alexeev (Oct 23 2025 at 21:49):

We asked about it on Math Overflow: https://mathoverflow.net/questions/501983/can-every-sidon-set-of-size-3-be-extended-to-a-finite-perfect-difference-set

Boris Alexeev (Oct 23 2025 at 21:50):

Will Sawin sketches a proof that the result is true in that case.
But I don't think I'll be formalizing his proof ..

Timothy Chow (Nov 09 2025 at 03:32):

Terence Tao said:

I'm liking the proof style of just dumping all relevant facts into the proof context and closing with a call to grind.

I'm late to this discussion, but when I did the "odd cardinality implies fixed point" formalization exercise, I fumbled around for an embarrassingly long time before realizing that grind could save me something like 30 lines of code. You can see my code here.


Last updated: Dec 20 2025 at 21:32 UTC