Zulip Chat Archive

Stream: Machine Learning for Theorem Proving

Topic: Erdos 124 AI Solution


Anh Nguyễn (Nov 30 2025 at 08:53):

https://x.com/SebastienBubeck/status/1994946303546331508

Anh Nguyễn (Nov 30 2025 at 08:53):

This is proved (in Lean) by Aristotle (Harmonic) all by itself

Anh Nguyễn (Nov 30 2025 at 08:54):

Is the result a big deal?

Kevin Buzzard (Nov 30 2025 at 09:34):

The first thing we need, before everyone goes crazy, is for someone who cares about Erdos problems (e.g. Tom Bloom or someone from DeepMind involved with a formalization of the statement) to confirm that we've got the statement correct here. I haven't read carefully but again there seems to be some issue with DeepMind's formal statement of the problem.

But solving an Erdos problem (assuming it's solved), even a problem which not many people have thought about, is definitely a bigger deal than solving a random problem which someone raises in an obscure maths paper and perhaps nobody other than the author has thought about.

Kevin Buzzard (Nov 30 2025 at 09:35):

Another thing one needs to check is that the problem is not already solved in the literature (as has happened before with the Erdos problems website; this is not a criticism of the website, this is merely the observation that it's hard to keep track of the literature).

Kevin Buzzard (Nov 30 2025 at 09:38):

But if those basic boxes are ticked (problem not already solved in the literature, Lean statement of problem faithfully represents Erdos' idea) then this is a very rare event and in my mind just reinforces how mathlib needs to get its act together so it can state interesting problems at the boundary of research areas of mathematics where the objects are more technical than those studied in combinatorics. To give a random example: the Birch and Swinnerton-Dyer conjecture (a Clay Millenium problem like P=NP) is a statement that two invariants attached to elliptic curves over the rationals are equal, and mathlib has elliptic curves over the rationals, but mathlib does not have a definition of either invariant.

Kevin Buzzard (Nov 30 2025 at 09:42):

Note also that the solution is very short. It is unlikely that there is a very short solution to most of the interesting problems in mathematics; indeed a theorem of logic guarantees that for every B there is a provable mathematical statement whose shortest proof is B times the length of the statement. Maths is hard -- in general.

TongKe Xue (Nov 30 2025 at 09:58):

  1. Congrats Harmonic team!

  2. In situations like this, given all the research labs keep their training data private, is there a way to measure "how close/far is the output proof from 'closest training data points'" ?

I.e. for some definition of similarity/closeness, I'm really curious what the closest/most-similar proofs in the training data (to the proof of this particular Erdos problem) looks like.

Notification Bot (Nov 30 2025 at 10:29):

8 messages were moved here from #Machine Learning for Theorem Proving > Discussion: AI-written mathematical proofs by Jason Rute.

Jason Rute (Nov 30 2025 at 10:31):

Direct link to the erdos problem thread: https://www.erdosproblems.com/forum/thread/124#post-1892

Kevin Buzzard (Nov 30 2025 at 11:53):

Bloom's comments on X here put the work in some persepective. https://x.com/thomasfbloom/status/1995094668879462466

Thomas Bloom (Nov 30 2025 at 12:01):

I will copy my comments from X here for discussion. In short, yes this is a correct proof/formalisation of a problem asked by Erdős in at least one source, although the problem turns out to be very easy, and I doubt it is one that people have seriously thought much about until now.

(Anyone interested in this should go and look at the human description of the proof provided by tsaf on the forum, which is very short, and you can judge for yourself how 'hard' this problem really was.)

  • The solution turns out to be quite easy and short in hindsight, making the problem on the level of the problems in mathematical competitions, where we've seen AI have great success. So this should be viewed on a par with this, rather than anything new.

  • There are two different versions of this problem asked by Erdős. The one AI solved is the easier one. The one asked in the original paper is the harder one, which the AI solution (appears to) say nothing about.

  • I would not be surprised if the solved problem had in fact appeared in some mathematical competition (perhaps without knowing the Erdős connection), so it could have been part of some training data, and the solution not quite as novel as it seems.

  • Just because the problem has, in a sense, been 'open for 30 years', this does not mean that anyone has seriously tried to solve it in this time. For example, I did not, just writing it on the site and then moving on. The easy version of the problem which was solved was asked by Erdős alone in two reasonably obscure problem collections in 1997. Indeed, it is possible that Erdős was actually thinking of the original 'hard' version of the problem, being aware of this 'easy' solution, and simply miswrote in these problem collections. This is pure speculation though (and it is strange that Burr, Erdős, Graham, and Li did not note this easy solution in their original 1996 paper - its absence is weak evidence that they had simply overlooked this easy argument).

  • The Erdős problems website tries to collect all problems of Erdős. Many of these will turn out to be easy, or not important, and may have remained unsolved just because nobody has looked at them in decades.

James E Hanson (Nov 30 2025 at 18:33):

Kevin Buzzard said:

Note also that the solution is very short. It is unlikely that there is a very short solution to most of the interesting problems in mathematics; indeed a theorem of logic guarantees that for every B there is a provable mathematical statement whose shortest proof is B times the length of the statement. Maths is hard -- in general.

But the statements that come from Gödel's speed-up theorem are things that I feel like you would be quick to call unnatural or 'manufactured' in a different context. There's no theorem that says that there are arbitrarily hard-to-prove mathematically interesting theorems. We can't rule out the possibility that there's an entirely elementary three-page proof of Fermat's last theorem, for instance.

I'm not saying that this Erdős problem is an example of this—it seems pretty reasonable to guess that the statement was just written down wrong—but I just feel like it's important to recognize that you are making a jump in your reasoning here.

James E Hanson (Nov 30 2025 at 18:41):

In the context of Go there was a drastic change in the metagame after computers got good enough to regularly beat human players. I'm not really that familiar with it myself, but my understanding is that the whole character of Go openings changed because humans had missed better strategies for literally thousands of years.

Deleted User 968128 (Nov 30 2025 at 19:29):

Thomas Bloom said:

I will copy my comments from X here for discussion. In short, yes this is a correct proof/formalisation of a problem asked by Erdős in at least one source, although the problem turns out to be very easy, and I doubt it is one that people have seriously thought much about until now.

This is one of the largest advantages of AI, especially since it already exists. Egolessly and tirelessly solving problems that for reasons people will not apply effort to.

However, it's unfortunate there isn't a more automated way of determining value and re-use, or at least a rubric that could be applied without having an apriori skillset to solve these problems.

If there were, the advantage above could be more strategically executed.

That said, there is one utility Aristotle has performed that is impossible to deny - narrowing the problem set to a greater density of interesting problems.

Deleted User 968128 (Nov 30 2025 at 19:59):

TongKe Xue said:

I.e. for some definition of similarity/closeness, I'm really curious what the closest/most-similar proofs in the training data (to the proof of this particular Erdos problem) looks like.

If LLMs, by 'stochastically parroting' can significantly accelerate the speed at which humans can productively and collectively collaborate on difficult problems, that seems like a measurable improvement and one that should be celebrated and encouraged rather than disparaged (not that anyone here is doing the latter).

AI having better skills in citing would be appreciated though.

TongKe Xue (Nov 30 2025 at 20:34):

  1. The first sentence of my post was "Congrats to the Harmonic team!". The "edit distance to closest proof" comment was not in anyway meant to be disparaging. I am also not accusing you of accusing me of making a disparaging comment. :-)

  2. The motivation behind my question is as follows: I recall playing with an earlier [unnamed] open weight model that did NOT use Lean4. This model had a fairly high score on the AIME benchmark, and I started playing with making random problems for it to solve. After a few days of giving it problems it could and could not solve, the impression I got was: this model has a very large memory bank of "problem template -> solution template" pairs; and if I hit this distribution, it was great; and if I hit outside this distribution, it tried to combine stuff, resulting in incorrect hallucinated answers. My take away from this experiment was: much of the "intelligence" of this non-lean4 model, on AIME problems, comes from having a vast vast vast (and probably expensive) input training dataset of many AIME and AIME-like problems. (As an aside, I'm not disparaging having large training data sets -- assembling a large data training set, whether by human labor or synthetic data, is IMHO an impressive feat of engineering -- I tried and failed at this myself.)

  3. My "edit distance" comment was in no way meant to "disparage" the achievement of the Harmonic team -- it is quite impressive they have solved this particular problem no other Human + LLM model has. What I am trying to get at here is this: where did this extra "boost" in power come from ? Is it

3a. The Aristotle model is just better at everything else at solving Lean4 proofs ? OR
3b. The Aristotle model has a better auto-formalization tech, managed to crawl more of springer-textbooks / Arxiv-papers, built a larger database of "facts humans have proved" ?

Now, both 3a & 3b are impressive feats of engineering; but they're different, 3a is about better at "grinding tactics", 3b is about better at "indexing + auto formalizing all known math knowledge"; and I'm just curious if we're in world 3a or 3b.

No hard feelings. Just wanted to clarify motivation. :-)

Deleted User 968128 (Nov 30 2025 at 20:47):

More powerful stochastic parrots could significantly improve and cement a global culture of collaboration. Especially if these parrots could cite and credit more effectively.

Imagine, for a moment, doing realtime math research and some theorem you prove incidentally in your session could be realtime suggested and applied to some adjacent effort someone else is working on across the world.

A globally coordinated AI session on the Erdos problems itself, for example.

Unfortunately, everyone disdains the parrot and is so focused on AI replacing rather than empowering and enhancing and it is unlikely this will ever get built.

Deleted User 968128 (Nov 30 2025 at 21:21):

That said, it might be interesting to build an agentic flow tracking all of the Erdos artifacts (git, forums, etc) in realtime and synthesizing what might be applicable and re-used. In a way, if you read the forums, you can see Terrence Tao is doing this manually (making frequent followups with GPT and Gemini)

Deleted User 968128 (Dec 01 2025 at 17:52):

There is https://openai.com/index/group-chats-in-chatgpt/ as well, but would require folks to get accounts and use it. A bit inflexible (maxes out at 20 people, miss out on cross pollination, extrinsic re-use) but still an interesting experiment, imho. Having a separate chat per erdos problem, perhaps? You'd want to use Pro, ofc.

Maybe folks might reach out to https://x.com/SebastienBubeck and see if OpenAI might sponsor this with credits. If it does indeed accelerate progress it would be a huge win for everyone.

Andy Jiang (Dec 02 2025 at 03:06):

There seems to be a new problem which is also independently solved by the same system. https://x.com/llllvvuu/status/1995596080042721740 But it's unclear to me why they are not running the entire collection of Erdos problems through Aristotle (is it for cost reasons or is there some human work involved in inputing the problems?) Not sure on specifics of the system.

Anh Nguyễn (Dec 02 2025 at 03:46):

Yes it is #481 Erdos

Anh Nguyễn (Dec 02 2025 at 04:09):

https://x.com/HarmonicMath/status/1995236102001815947

Thomas Bloom (Dec 02 2025 at 06:17):

I expect people are right now. I am personally am not doing any, but I'd be surprised if e.g. Boris Alexeev wasn't trying at least quite a few. One thing is that there are lots of problems and each takes non-trivial time to run (e.g. a few hours).

But, for almost all the problems, the AI (at the moment) won't be able to solve them, or say anything new about them. They vary widely in difficulty, and the two solved ones are definitely on the easy end. (Not to diminish the accomplishments, I just mean by comparison to the likely difficulty of most of the others.)

Joseph Myers (Dec 02 2025 at 12:55):

The advent of AI has helped to illustrate how there are many different forms of intelligence, and AI can have different abilities, and make progress at different rates, in different areas - so that questions about possible "AGI" now seem to be over-simplified, but we can now ask better questions based on a better understanding of different forms of intelligence.

Similarly, AI progress on theorem proving has helped to illustrate that there is no clear binary division between solved and unsolved problems; there can be a problem that was thought unsolved but where a solution actually existed in the literature from decades ago, or a problem that's unsolved but is almost immediate from a known result, or that's unsolved because no-one spent much time thinking about it (or because no-one thinking about it knew about the key result for solving it), or one where the literal statement is much easier than intended to prove or disprove but can be reformulated to keep the intended idea while eliminating the easy argument - and all of these can apply to Erdős problems. And that's apart from all the ways in which disputes over the nature of a claimed (informal) solution can affect that binary - where no consensus is reached on the validity of a solution or the significance of any mistakes in it.

So given all those issues, it's fairly clear that the difficulty of easier "unsolved" problems overlaps a lot with that of harder competition problems on which AI has had reasonable success (and also that of lemmas that come up in working on larger problems, for which the utility of AI assistance is obvious). And where we might previously have asked simply whether or when AI could solve unsolved problems, as AI actually contributes in that area we see that we should actually be asking better multidimensional questions, about the strengths and weaknesses of AI for problems at all levels from "student exercise" to "Millennium Prize Problem", formal and informal, in different areas of mathematics, both precisely formulated problems and problems where it's less clear what the shape of a proper answer would be.

Testing an AI on "everything in formal-conjectures as of [date]" (and reporting all successes, failures and problems where the AI showed up a misformalized statement, and repeating as the AIs improve and formal-conjectures grows) might give some interesting information, since certainly formal-conjectures must have a spread of difficulties. But it's hard to know what the actual spread of difficulties is for unsolved problems. And undoubtedly formal-conjectures is massively biased to those areas of mathematics for which the definitions exist in mathlib at present to be able to state the problems at all.

Timothy Chow (Dec 05 2025 at 19:38):

Kevin will not get into as much trouble for saying that maths is hard as Barbie did for saying something similar, but I think it's worth thinking about what makes hard math problems hard.

Proof length is part of the story, but not the whole story, and I don't think it's even the most important part of the story. The proof of the classification of finite simple groups is renowned for its length, and the classification was certainly a hard problem, but let's not forget that ultimately it was solved (I'm temporarily setting aside any lingering skeptical doubts about possible gaps in the proof), whereas there are presumably a lot of hard conjectures out there with undiscovered proofs that are much shorter.

I have sometimes half-jokingly said that successful mathematical research requires combining Wile E. Coyote with MacGyver with the Borg. Wile E. Coyote never gives up despite repeated setbacks. MacGyver ingeniously crafts miraculous solutions with whatever tools are at hand. The Borg assimilates everything and leverages the power of the collective.

Math competition problems emphasize the MacGyver angle, which is certainly important. I'll just add that problem-solving ability is important not just for elementary problems. There are many papers in highly technical areas of mathematics that solve difficult problems via a relatively short and ingenious argument using standard tools. Assuming one has put in the effort to learn the standard tools, the skills involved here are (I claim) not that different from the skills needed to solve challenging elementary problems.

I don't think Wile E. Coyote requires further comment, but the Borg angle is interesting. I strongly agree with Kevin Buzzard that building up mathlib, especially key definitions, is extremely important. But there's more to the power of the collective than simply documenting known results. One thing that made the classification of finite simple groups hard was that it wasn't something that could be attacked "frontally": it needed to be decomposed into modular pieces that could be attacked separately. If you like, the success of the classification illustrated the power of the blueprint.

In my opinion, for computers to truly excel at mathematical research, they will need to acquire the ability to develop large-scale plans of attack for hard problems. In particular, that means being able to formulate good conjectures. Automated generation of conjectures goes back to Graffiti if not earlier, but what I'm talking about here are not just isolated conjectures inferred from computational data, but conjectures whose solutions are likely to produce useful tools for solving other problems. A major reason that Erdős problems are so justly famous is that he had an uncanny ability for identifying problems whose solutions would not just answer one isolated question but would likely generate tools for overcoming a frequently-arising difficulty.

The IMO Grand Challenge has successfully stimulated the development of better silicon MacGyvers. It would be nice to formulate an analogous challenge to stimulate the development of a silicon Erdős or Langlands or [insert name of mathematician famous for brilliant conjectures]. That seems difficult, but maybe not impossible?


Last updated: Dec 20 2025 at 21:32 UTC