Zulip Chat Archive
Stream: Machine Learning for Theorem Proving
Topic: Danger of Autoformalizers (e.g. Gauss)
Jeremy Tan (Sep 11 2025 at 22:35):
Have you considered the grave danger inherent in autoformalisers as powerful as this one? Undergraduate maths students will no longer have an incentive to understand proofs; they'll just feed their questions to the autoformaliser and passively accept its results
Jeremy Tan (Sep 11 2025 at 22:37):
Also, if autoformalisers aren't open-sourced, we may end up living the old joke where a commercial company claims to have solved something like the Riemann hypothesis but never produces an independently verifiable proof
Jeremy Tan (Sep 11 2025 at 22:40):
The name Math, Inc is very ominous
Jeremy Tan (Sep 11 2025 at 22:57):
tl;dr For the sake of its community's health and its future, we cannot afford mathematical proof becoming a commodity
Jesse Michael Han (Sep 11 2025 at 23:29):
@Jeremy Tan we care a lot about making sure this technology is useful to mathematicians and moves mathematics as a human enterprise forward. historically, math advances when we can hide complexity, and autoformalization has the potential to allow us to work at higher levels instead of spelling out details all the time. a similar change happened when compilers were invented; it's true that the fraction of programmers familiar with low-level instructions and working really close to the machine dwindled, but moving to higher-level languages unlocked an entire universe of software. we're excited about what that could look like.
Jeremy Tan (Sep 11 2025 at 23:33):
I am not against Gauss or any autoformaliser; I am just warning you that such tools must be used correctly since they carry a lot of potential risk
Jeremy Tan (Sep 11 2025 at 23:34):
I would certainly not want "theorems for sale"
Ching-Tsun Chou (Sep 11 2025 at 23:35):
May I suggest that meta-discussions like the last few messages above be moved to a different thread? I am not sure that everyone is interested in such issues. Thanks!
Notification Bot (Sep 11 2025 at 23:40):
8 messages were moved here from #Machine Learning for Theorem Proving > Math, Inc, and Gauss discussion by Jason Rute.
Bryan Wang (Sep 11 2025 at 23:55):
Jeremy Tan said:
I would certainly not want "theorems for sale"
There's an infinite supply of theorems out there, so I personally think any business in the "market for theorems" would go out of business quite quickly, either due to lack of demand, or oversaturation of the market with theorems, or (most probably) both.
Bryan Wang (Sep 11 2025 at 23:57):
Regarding autoformalisation (and automation in general), I too am quite pessimistic about such things, but I've quickly realised that in this day and age, we just can't stop people out there from trying to automate anything. But the mathlib community (and the wider mathematical Lean community in general) puts in a lot of effort to ensure that we have good-quality, well-organised, and (at least locally) understandable code in mathlib (something that I think is not advertised strongly enough to mathematicians and the public). If autoformalisation really takes off, then I can see organisation and maintenance becoming the main job of formalisers, with autoformalisers being just one of many tools to speed up progress towards the frontiers of math. For what it's worth, there is no analogue of this (a central repository of all of math which is high-quality and well-organised), and no interest in it, in traditional mathematics! And while undergrad education is a separate issue which deserves separate attention, even currently, traditional mathematicians have no issue routinely citing big results without understanding most of the details.
Kevin Lacker (Sep 12 2025 at 00:29):
theorems for sale would be great. i would like to buy a theorem that proves the apps on my phone aren't going to steal my personal data
(deleted) (Sep 12 2025 at 00:39):
I was about to make a post saying LLMs are unlikely to be able to do formal verification. But I woke up and saw this post. :eyes: Gotta keep my eyes peeled
Yan Yablonovskiy 🇺🇦 (Sep 12 2025 at 01:20):
Unfortunately such dangers have already become a reality, if the concern is math students. Essentially all undergrads I teach already try to use LLMs excessively for natural language proofs or to explain such proofs, no matter what you say to them.
It just isn't in a formalised setting like Lean 4, so really autoformalisation is an improvement on the status quo.
It is as if a student always had a solution manual to everything, of sorts; it's up to the student to practise self-discipline. And typically courses try to wash that out via invigilated paper exams.
Bryan Wang (Sep 12 2025 at 02:31):
Yan Yablonovskiy 🇺🇦 said:
Unfortunately such dangers have already become a reality, if the concern is math students. Essentially all undergrads I teach already try to use LLMs excessively for natural language proofs or to explain such proofs, no matter what you say to them.
It just isn't in a formalised setting like Lean 4, so really autoformalisation is an improvement on the status quo.
Yes - and so I feel that, if anything, an inscrutable auto-generated Lean-verified proof of the Riemann hypothesis is orders of magnitude better than an inscrutable auto-generated natural language proof, because at least we can verify correctness. And at this point, there will always be people out there trying to do either of these (automate formal math, or automate natural language math), regardless of the dangers.
Jeremy Tan (Sep 12 2025 at 03:14):
Huỳnh Trần Khanh said:
I was about to make a post saying LLMs are unlikely to be able to do formal verification
Who said that was unlikely to happen?
Yao Liu (Sep 12 2025 at 03:52):
As a math student before the LLM era, I also was too lazy to study proofs. We all rely on things that others have proved, until the point where we have to teach them ourselves (perhaps multiple times, before we actually understand them). Those who like to learn proofs will continue to learn proofs.
(deleted) (Sep 12 2025 at 04:47):
Jeremy Tan said:
Huỳnh Trần Khanh said:
I was about to make a post saying LLMs are unlikely to be able to do formal verification
Who said that was unlikely to happen?
Me. Having seen repeated failures of GPT-5 and Claude Code. But too bad. In fact I was about to make a post saying that we should use AI to generate tools that can solve specific classes of problems, or to generate an auxiliary interactive theorem prover that is more limited and faster than Lean for rapid iteration
(deleted) (Sep 12 2025 at 05:08):
I'm not sure why theorems for sale is bad. I think it's a very good idea.
Chris Henson (Sep 12 2025 at 05:12):
Huỳnh Trần Khanh said:
I'm not sure why theorems for sale is bad. I think it's a very good idea.
To paraphrase the famous six-word story: "For sale: proven theorem, never understood"
Bryan Wang (Sep 12 2025 at 05:35):
One beauty of pure math is that the perceived value of a given theorem/result is directly proportional to (or at least strongly correlated with) the perceived difficulty of obtaining said result. Automation by nature decreases the difficulty of obtaining these results, and so the irony is that the better automation gets, the more it devalues (in the eyes of pure mathematicians) the results it can obtain. Therefore while formalisation and autoformalisation could change pure math in some other ways (e.g. changing what pure mathematicians value), I personally don't believe there is any profitable business to be found in pure math in the long run. I do agree though that perhaps applied math, on the other hand, could be a quite different story...
Yao Liu (Sep 12 2025 at 06:01):
I don't think they refer to anything specific when saying "theorems for sale"; it's just a hint from Szegedy's posts that he wants to have formal verification for superintelligence, whatever that means.
Michael Rothgang (Sep 12 2025 at 08:04):
Bryan Wang said:
Yes - and so I feel that, if anything, an inscrutable auto-generated Lean-verified proof of the Riemann hypothesis is orders of magnitude better than an inscrutable auto-generated natural language proof, because at least we can verify correctness.
I would argue that an inscrutable machine-checked proof of even the Riemann hypothesis on its own is not worth much: the point of a proof is to impart understanding, not a binary "is this correct" bit.
Michael Rothgang (Sep 12 2025 at 08:05):
(Though in practice, producing one will be hard without also producing a paper proof, which can hopefully be the basis for a clear exposition.)
Johan Commelin (Sep 12 2025 at 08:43):
I actually think that any inscrutable proof will turn out to give some hints as to where to look for understanding. So it will always give more info than just the "is this correct" binary signal. At the very minimum, it gives the info that "a proof of 37 GB exists", and hence the motivation to find a smaller proof.
Alfredo Moreira-Rosa (Sep 12 2025 at 09:15):
And what's more, the proof generated by Gauss is not inscrutable: it may not be clean, but once it's formalized, you can refactor. That's something developers do every day.
Because in reality, all programmers write junk code; the more senior ones guide them to refactor it so that it becomes understandable and maintainable.
And in the same vein, you can now ask LLMs to produce proofs and then ask them to refactor them with some senior mathematician's insight.
I don't see anything wrong with this.
Michael Rothgang (Sep 12 2025 at 09:19):
I am worried that LLMs will spit out more "not good" code than more experienced users can refactor. (But part of the answer could be to make LLMs generate better code. And human insight, of course.)
Bryan Wang (Sep 12 2025 at 09:30):
Michael Rothgang said:
Bryan Wang said:
Yes - and so I feel that, if anything, an inscrutable auto-generated Lean-verified proof of the Riemann hypothesis is orders of magnitude better than an inscrutable auto-generated natural language proof, because at least we can verify correctness.
I would argue that an inscrutable machine-checked proof of even the Riemann hypothesis on its own is not worth much: the point of a proof is to impart understanding, not a binary "is this correct" bit.
I agree completely, my point was just to compare it to a similarly-inscrutable natural language proof :wink:
Kevin Lacker (Sep 12 2025 at 15:28):
I think if we had an inscrutable auto-generated Lean-verified proof of the Riemann hypothesis that would be amazing. I mean, can you imagine how many people would be poring over it trying to figure out what was going on? Even if you only value human understanding, it would be such a great source of direction for where to look.
It would be like Renaissance scholars who learned Latin and Greek in order to interpret the wisdom of the ancients. Mathematicians could learn Lean in order to interpret the wisdom of the AIs.
Eric Wieser (Sep 12 2025 at 16:33):
I've seen people claim it would be simultaneously amazing and terrible
Andy Jiang (Sep 12 2025 at 21:43):
Kevin Lacker said:
I think if we had an inscrutable auto-generated Lean-verified proof of the Riemann hypothesis that would be amazing. I mean, can you imagine how many people would be poring over it trying to figure out what was going on? Even if you only value human understanding, it would be such a great source of direction for where to look.
It would be like Renaissance scholars who learned Latin and Greek in order to interpret the wisdom of the ancients. Mathematicians could learn Lean in order to interpret the wisdom of the AIs.
I would think translating from Lean to informal language would be fast. The hard part is that a superhuman math entity can probably reason directly about more complex objects, in ways that may not be easily simplified by abstractions or general theory; that may lead to proofs where it's inhumanly hard to understand why you are studying various objects until they all fit together.
Bryan Wang (Sep 16 2025 at 19:46):
Possibly the biggest problem/'danger' with autoformalization in the long term is how to integrate these massive projects, whose sizes may soon dwarf that of mathlib itself, into mathlib/the mathlib ecosystem in general. Perhaps this is something that the mathlib maintainers and community will have to think about in the not-so-distant future...
Terence Tao (Sep 16 2025 at 22:07):
I started a discussion on a related topic at #general > Minimally exporting a theorem from a repository. Hopefully there is some way to state the output of these projects as providing a term of some type X : Prop definable using some minimal subset of Mathlib; one could then import an axiom of type X in another Mathlib-derived project, hopefully using either an identical or nearly identical minimal subset of Mathlib so that one can easily match types.
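To make this concrete, here is a minimal Lean sketch of the pattern being described. All names here (`InfinitelyManyPrimes`, `bigProject.main`) are hypothetical illustrations, not anything from an actual project; the real export mechanism is what the linked thread is discussing.

```lean
-- A statement type definable from a minimal, shared subset of the
-- library (here just core ℕ and Nat.Prime):
def InfinitelyManyPrimes : Prop := ∀ n : ℕ, ∃ p, n ≤ p ∧ Nat.Prime p

-- In the large autoformalized project this would be proved as an
-- ordinary theorem:
--   theorem bigProject.main : InfinitelyManyPrimes := by ...

-- A downstream Mathlib-derived project that needs only the *statement*
-- can then take it as an axiom. Because InfinitelyManyPrimes mentions
-- only the shared minimal subset, the types match on both sides:
axiom bigProject.main : InfinitelyManyPrimes
```

The trade-off is that `axiom` extends the trusted base of the downstream project, so one would want tooling that checks the axiom's statement is literally identical to the exported theorem's statement.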
Timothy Chow (Nov 04 2025 at 19:31):
There seems to be something missing from the discussion of a hypothetical machine-generated proof of the Riemann hypothesis. The Riemann hypothesis, and certain other famous conjectures such as P != NP, are not just "random hard problems." In both cases, there are known barriers to proving these conjectures. The strong expectation is that there doesn't exist a proof that simply plows through a massive inscrutable computation one trivial step at a time and emerges at the other end with the goal accomplished.
By contrast, the proof that a Robbins algebra is a Boolean algebra, or the similarly-flavored Wolfram's theorem, is exactly the kind of result where it's unsurprising for there to exist a proof that just "chases equations" until the proof is complete. In such cases, there may or may not be an "idea" underlying the proof.
If we get access to a proof of the Riemann hypothesis or P != NP, then an immediate question is going to be: how were the known barriers circumvented? I would say that with high probability, there is going to be an answer to that question, even if it takes a huge effort to extract it from the machine-generated mess of a proof. (A starting point would be to identify which definitions get repeatedly reused.) Once the idea is extracted, it's going to reveal a new technique that can be used to solve other problems.
There is a small probability that despite our best efforts, no such idea can be extracted, and that the proof really does turn out to be a Robbins-algebra-type of proof, where every line of the proof looks virtually indistinguishable from every other line. That might be disappointing from an aesthetic point of view, but IMO it would be a revolutionary discovery in its own right to find that the Riemann hypothesis has a short (i.e., at most an exabyte or so) proof that requires no new definitions.
Either way, I see it as a massive win for mathematics.
Now, if the problems get resolved in the "unexpected" direction, then the win might not be as massive. If the computer just spits out a zero off the critical line, or we get some inscrutable Turing machine that solves SAT in time O(n^k) with k being Graham's number, then we might not get much generalizable insight from the proof. But it would still be nice to know these things.
Alex Meiburg (Nov 05 2025 at 20:09):
I think that a proof of P=NP, even with O(n^[something huge]), would still be a huge win. For starters, we would expect that exponent to be greatly reduced through further effort. Second, it would tell us something very surprising about the structure of NP-hard problems; the insights would surely be relevant all over. In particular, whatever structure they expose could probably be used to accelerate/complement existing algorithms.
For example, Reingold's proof that L = SL shows this with an O(n^(2^64)) algorithm. But the insights have still had applications in other places, especially graph theory
Jireh Loreaux (Nov 05 2025 at 20:32):
Likewise, MIP*=RE doesn't have much computational content (:laughing: pardon the pun), but still offers plenty of insight.
Ching-Tsun Chou (Nov 05 2025 at 20:35):
Perhaps it is worth pointing out that the Robbins algebra result was not proved using the current LLM-based methods. So far, has any LLM-based method produced a proof of an open problem, or even a new proof of a known result?
Timothy Chow (Nov 05 2025 at 20:43):
Ching-Tsun Chou said:
So far, has any LLM-based method produced a proof of an open problem or even a new proof of a known result?
This was a recent question on MathOverflow. None of the examples listed there is as compelling as the Robbins problem. One could also maybe cite Ken Ono's experience with a private version of o4-mini that apparently autonomously solved what Ono thought was a Ph.D.-level problem about elliptic curves. I think that at the time Ono thought up the problem and supplied it to the machine, it was open, although I think he managed to figure it out himself fairly quickly.
Kevin Buzzard (Nov 05 2025 at 20:48):
Was this "Ph.D.-level problem about elliptic curves" a question whose answer was a number (rather than a proof)? My experience with these is that the machine can get lucky and vibe their way to the number whilst not really understanding why various intermediate claims they need along the way are true.
Kevin Buzzard (Nov 05 2025 at 20:49):
Another issue with these number questions is that sometimes people keep asking the same question to the machine until it gets the number right; if it gets it right once then victory is declared and "machine answers hard research problem". This does not scale to a tool which is useful to mathematicians.
Timothy Chow (Nov 05 2025 at 21:16):
No, it wasn't just a number. The question was, "What is the 5th power moment of Tamagawa numbers of elliptic curves over Q?" You can read about it in Scientific American, or read Ono's summary of the most relevant interaction here. Ono said that o4-mini autonomously searched the literature, found a paper, correctly rejected it as not relevant, searched again, found the right paper, identified an obstacle, got around the obstacle, and finally produced a correct formula. To make sure o4-mini "understood" the formula, he then asked it to do a numerical computation, which it did correctly.
Timothy Chow (Nov 05 2025 at 21:46):
Some people have expressed skepticism that the model may not have acted as autonomously as Ono's report makes it seem, but even if we accept the report at face value (which I'm inclined to do), it still appears to be a cherry-picked outlier at this point in time. I had a brief email exchange with Ono on the topic, and he said that ChatGPT Instant (which is free) still can't solve the problem. ChatGPT Thinking (subscription) can, but probably because it has access to a record of the original conversation with Ono.
Ching-Tsun Chou (Nov 05 2025 at 21:46):
[rephrased] It truly annoys me that I had to log into Facebook to read a math discussion.
Michael Rothgang (Nov 05 2025 at 22:18):
See #butterfly to learn what this emoji means.
Notification Bot (Nov 06 2025 at 10:12):
3 messages were moved from this topic to #Machine Learning for Theorem Proving > How have you guys found GPT 5? by Oliver Nash.
Anh Nguyễn (Nov 06 2025 at 13:35):
Kevin Buzzard said:
Was this "Ph.D.-level problem about elliptic curves" a question whose answer was a number (rather than a proof)? My experience with these is that the machine can get lucky and vibe their way to the number whilst not really understanding why various intermediate claims they need along the way are true.
It is good at problems whose answer is a number. It can do well on FrontierMath questions
Anh Nguyễn (Nov 06 2025 at 13:37):
There are cases where it can get the right number even though the reasoning is wrong
Timothy Chow (Nov 06 2025 at 17:57):
You can listen to Ken Ono describing his interaction with o4-mini here. See also this tweet by Daniel Litt.
Anh Nguyễn (Nov 07 2025 at 01:54):
Timothy Chow said:
You can listen to Ken Ono describing his interaction with o4-mini here. See also this tweet by Daniel Litt.
Thank you
Anh Nguyễn (Nov 07 2025 at 11:03):
https://arxiv.org/pdf/2407.15360
Anh Nguyễn (Nov 07 2025 at 11:03):
There are people trying to make sense of what is happening inside a Transformer doing simple arithmetic
Anh Nguyễn (Nov 07 2025 at 11:42):
Maybe there is a reason that explains why AI models can reach the correct answer on FrontierMath questions
Anh Nguyễn (Nov 07 2025 at 11:53):
Anh Nguyễn said:
Maybe there is a reason that explains why AI models can reach the correct answer on FrontierMath questions
By training on problems/papers, they can develop their own ways of navigating through a problem
Timothy Chow (Nov 07 2025 at 15:47):
Alex Meiburg said:
I think that a proof of P=NP, even with O(n^[something huge]), would still be at least a huge a win.
It would win you $1 million. What's not so well known is that, according to the Millennium Prize Rules (in particular, Rule 5), disproving the Riemann Hypothesis won't necessarily win you the $1 million. Only P = NP and Navier-Stokes are guaranteed to win you the $1 million either way they are resolved.
Last updated: Dec 20 2025 at 21:32 UTC