Zulip Chat Archive
Stream: general
Topic: sloplib
Kevin Buzzard (Feb 04 2026 at 11:18):
Autoformalization is getting better. My impression is that most maintainers are kind of horrified at the idea of letting AI write mathlib. But what exactly is the problem with this?
There are a gazillion things about elliptic curves which I will need for FLT, and progress is slow. On the other hand I can imagine that current AI tools are now kind-of good enough that we could tell a carefully-chosen collection of them to develop theory X in Lean (e.g. "the Tate curve", a part of the theory of elliptic curves, which is well-documented in the literature) and the output would be some really poorly-written Lean code with the following properties:
- Proofs which are 10x as long as they would be if expertly-written;
- Intermediate and unnecessary definitions with no API;
- lemmas which definitely should not be in mathlib because they should be inlined;
- proofs which would be hard to maintain;
but perhaps also
- sorry-free proof of the main theorem.
Say then that we make a fork of mathlib where we simply add these proofs, let AI maintain it, let AI bump it, and let AI add more and more poorly-written Lean code to it.
What then happens to this library in the long term? How does it collapse, if it collapses at all? Assume that there is enough funding to keep the AI tools running.
Basically my impression is that lots of people think that this is a bad idea but I am realising that I don't have a coherent argument as to why. I am well aware that mathlib has an extremely high bar for code review but I am unclear as to how important this fact has been in mathlib's success and, more importantly, how important it will remain going forwards.
I am expecting to be firmly put in my place here!
Chris Birkbeck (Feb 04 2026 at 11:33):
After playing with Claude Code for this valence formula experiment I am doing, I am starting to think that something like this could be made (I thought it would be called methlib instead :rolling_on_the_floor_laughing: ). I've not yet tried asking Claude to bump a repo; I think it could probably do it, but it might take longer and longer as it has to clean up more and more slop. I think the interesting question here is whether we can get AI to clean up code to get us 50 or 60% of the way to mathlib standards. I think that would help a lot in keeping a sloplib updated with mathlib.
Robin Carlier (Feb 04 2026 at 11:33):
Let's take the experiment further, and only allow for AI reviews of the PRs to sloplib (or was this part of the "let AI maintain it"?)
Chris Birkbeck (Feb 04 2026 at 11:36):
The other thing that might be interesting is that we could probably brute-force a big chunk of maths into Lean, so it could also work as a testing ground for mathlib, say a mathlib beta, where we just test out definitions/infrastructure and see how it goes (with the AI doing everything quickly) and we then look at what's useful and put that into mathlib.
Anne Baanen (Feb 04 2026 at 11:39):
I assume such a repository would quickly need a lot of beefy machines for compilation, since Mathlib already has the 10 Hoskinson runners, and this approach would result in huge files.
Kevin Buzzard (Feb 04 2026 at 11:51):
Aha so one objection to this plan is cost. If we drop our standards and rely on machines to keep everything afloat then it might quickly start costing a lot of money, and compile times will probably go through the roof (I've seen plenty of machines writing set_option maxHeartbeats 100000000). But there are huge amounts of money in this area right now so maybe this can be solved.
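The escape hatch Kevin mentions looks like this in Lean (a toy illustration; the theorem name is made up): raising the heartbeat budget makes a slow proof compile instead of making it faster, which is exactly the kind of cost a sloplib would accumulate.

```lean
-- Illustrative only: a machine-written proof that papers over slowness by
-- raising the elaboration budget rather than finding a shorter argument.
-- The default budget is 200000 heartbeats; this multiplies it by 500.
set_option maxHeartbeats 100000000 in
theorem two_add_two : 2 + 2 = 4 := by
  decide
```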
Joseph Myers (Feb 04 2026 at 11:52):
If half the work on a mathlib PR comes after opening the PR, then "50 or 60% of the way to mathlib standards" should mean "at the level of a mathlib PR that's just been opened, before it's gone through review".
Andrew Yang (Feb 04 2026 at 11:53):
What is the value of a poorly written unreadable and unmaintainable sorry-free proof? Do you have doubts as to whether the statements are correct or not?
Kevin Buzzard (Feb 04 2026 at 11:54):
Of course another objection is "but mathlib is about finding the right abstractions and the art of developing them" and I'm well aware of this, but if the goal is just to get AI proving new theorems in all domains then maybe we can ignore this side of things -- after all I've seen some terribly-written human papers proving great theorems.
Kevin Buzzard (Feb 04 2026 at 11:54):
Andrew Yang said:
What is the value of a poorly written unreadable and unmaintainable sorry-free proof? Do you have doubts as to whether the statements are correct or not?
It's part of the route towards poorly-written but sorry-free AI proofs of new theorems which humans can't do and that current mathlib can't even state because it moves too slowly.
Joseph Myers (Feb 04 2026 at 11:55):
I suspect "10x as long" applies at multiple levels. The individual by blocks are 10x as long, there are 10x as many intermediate lemmas and definitions as are actually needed, and lack of appropriate structure and generality means larger pieces at various different levels keep getting duplicated 10x in slightly different variants. If so, it would be 10^k times as long for proofs of depth k.
Snir Broshi (Feb 04 2026 at 11:55):
As a start we could create m/leanprover on :lobster: moltbook and post a "complete this sorry" challenge for all the AIs out there, crowdsourcing AI compute
Kevin Buzzard (Feb 04 2026 at 12:19):
I am not suggesting that this actually happens -- I am just trying to get a better understanding of why it shouldn't happen.
The naive argument which led to the question is: "AI is solving Erdos problems in Lean, AI can't solve problems in the Langlands program in Lean because we can't state them, AI could probably accelerate the path towards stating them, but what is the cost of this?"
Bryan Wang (Feb 04 2026 at 12:19):
Kevin Buzzard said:
What then happens to this library in the long term? How does it collapse, if it collapses at all? Assume that there is enough funding to keep the AI tools running.
IMHO if this does collapse, it won't be for technical reasons, but because the various AI companies out there might not want to work with one another as nicely as the human contributors to mathlib do with one another. It will probably devolve into each company having their own private copy of such a repo, especially once there is some big prize (proof of big open problem) coming in sight. But one could argue that this is already exactly what is happening, because every repo being worked on by an AI company already includes mathlib as a dependency and so it already satisfies the criteria for a 'sloplib' of sorts. But this is just my speculation.
Yan Yablonovskiy 🇺🇦 (Feb 04 2026 at 12:22):
Bryan Wang said:
Kevin Buzzard said:
What then happens to this library in the long term? How does it collapse, if it collapses at all? Assume that there is enough funding to keep the AI tools running.
IMHO if this does collapse, it won't be for technical reasons, but because the various AI companies out there might not want to work with one another as nicely as the human contributors to mathlib do with one another. It will probably devolve into each company having their own private copy of such a repo, especially once there is some big prize (proof of big open problem) coming in sight. But one could argue that this is already exactly what is happening, because every repo being worked on by an AI company already includes mathlib as a dependency and so it already satisfies the criteria for a 'sloplib' of sorts. But this is just my speculation.
It depends, I guess: if you are using commercial AI tools on an open-source project, and not some sort of sponsorship, then it likely cannot happen that way.
Bryan Wang (Feb 04 2026 at 12:27):
That's true, but the big issue raised above was cost, that's why I began thinking in this direction.
Yan Yablonovskiy 🇺🇦 (Feb 04 2026 at 12:28):
Kevin Buzzard said:
but if the goal is just to get AI proving new theorems in all domains then maybe we can ignore this side of things -- after all I've seen some terribly-written human papers proving great theorems.
Would you be opposed to an AI written math journal, that would take sloplib lean discoveries and disambiguate them into natural language?
Joseph Myers (Feb 04 2026 at 12:49):
There should be lots of ways for AI to accelerate stating things other than building a sloplib. For example, AI could probably help with many of the more mechanical parts of PR review for mathlib (that aren't mechanical enough to turn into linters) and so reduce the amount of time maintainers need to spend on those when doing reviews for mathlib. People have written extremely detailed instructions to AIs on Linux kernel patch review, why shouldn't we have similarly detailed instructions on reviewing mathlib changes and a service that does such reviews on PRs (automatically, or when requested)?
Chris Henson (Feb 04 2026 at 12:50):
Kevin Buzzard said:
AI can't solve problems in the Langlands program in Lean because we can't state them
Ignoring for a moment engineering and communication issues of AI generated tactics, my view is that while we can rely on generated proofs from the viewpoint of proof irrelevance there remains an inherently human judgement when it comes to definitions actually corresponding to the appropriate informal math. In general I'm skeptical of AI producing useful definitions the further you drift away from existing material in Mathlib, but even if they are successful there is a required diligence in human review that I feel exceeds that required for proofs. The question to me seems to be if this shifting of work is still a net benefit.
Kevin Buzzard (Feb 04 2026 at 12:52):
Yes, important definitions need to be human-verified but this would not be particularly time-costly.
Chris Birkbeck (Feb 04 2026 at 12:55):
Is there a risk that a sloplib would harm mathlib by stopping people contributing to it? One could imagine the situation where something exists in sloplib and no one can be bothered to put in the effort to bring it up to mathlib's standards. Conversely, if people did spend the time cleaning up some sloplib, then they might not get proper recognition as "this was already done by an AI in sloplib".
Chris Henson (Feb 04 2026 at 13:02):
Kevin Buzzard said:
Yes, important definitions need to be human-verified but this would not be particularly time-costly.
I guess my intuition differs here, as I imagine this as costly. Anytime I see definitions where I know AI was involved, I spend a significant amount of effort confirming whether it corresponds to the informal version. (I also mostly look at CS and not math, so maybe some divergence here).
Kim Morrison (Feb 04 2026 at 13:13):
I haven't read all the messages above, and that will have to wait for morning, but this nevertheless seems like a good time to advertise the merely-true repository that Johan and I thought about late last year, but never launched.
Anatole Dedecker (Feb 04 2026 at 13:13):
I unfortunately don't have a lot of time to write all my thoughts on the subject, but I would like to quickly make a few points before disappearing from the discussion.
There are a gazillion things about elliptic curves which I will need for FLT, and progress is slow.
As a thought experiment, would that be the case if we had just a portion of the financial, human and technical power of AI companies? Of course we can't do much about this (except being grateful that we did, in fact, get a significant amount of money as well, and try to show that we can make better use of it than dumping it on AI). Basically, my claim is that, if your utilitarian goal is "mathematical progress" (whatever that means), AI is overall a bad investment of society's financial, environmental, and human resources.
Shreyas Srinivas (Feb 04 2026 at 13:19):
@Kevin Buzzard a point to note with getting AI to formalize definitions further away from what we know is the loss of human skill in understanding what definitions are useful.
Anatole Dedecker (Feb 04 2026 at 13:19):
Even among the most AI enthusiasts, I don't think anyone seriously suggested having AI design standard programming libraries, and I don't see how that would be different for Mathlib. This ties into the fact that Mathlib is not the AFP. In fact, I have trouble imagining that AI would be able to prove anything without the careful API design in Mathlib, and as we increase complexity of the objects involved, I would imagine AI gets in trouble when trying to use its own API (of course this can be counteracted by writing the statements in advance, but then you might as well write a lot of the proofs)
Shreyas Srinivas (Feb 04 2026 at 13:20):
When you have worked in area X, then in a small open neighbourhood of X, you might be skilled at identifying and debugging good definitions. The further one goes from X, the less insight one has to fix a mess created by AI.
Shreyas Srinivas (Feb 04 2026 at 13:21):
I have a negative data point to contribute. I recently tried to guide Codex (GPT-5.2) to formalize the Fischer-Lynch-Paterson theorem. This is fairly famous, known as the impossibility-of-consensus result. I followed a "vibe coding and prompting only" approach to see how far it works.
Shreyas Srinivas (Feb 04 2026 at 13:21):
The initial set up and definitions went smoothly and I even got reasonable API lemmas
Shreyas Srinivas (Feb 04 2026 at 13:22):
And then at some point the AI started messing up the formalization of informal proofs. First it tried to add ridiculously strong assumptions to important internal lemmas. Then, as I pointed out what it was doing, it started pushing these lemmas higher or changing definitions.
Shreyas Srinivas (Feb 04 2026 at 13:23):
By the end of the experiment, the AI was unable to even debug a simple issue (that it was calling rcases on an implication whose consequent could be destructed).
Shreyas Srinivas (Feb 04 2026 at 13:23):
And it was blatantly lying about fixing it and going into an infinite loop. And of course the AI couldn't prove the top-level theorems because it was getting stuck in a deep rabbit hole of strong assumptions in lower-level theorems that it simply couldn't prove. It left me with a big mess that I haven't bothered cleaning up since.
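The specific failure Shreyas describes (calling `rcases` on an implication whose consequent could be destructed) can be reproduced in a toy example; the hypothesis names here are made up, and applying the hypothesis first is the standard fix:

```lean
-- Toy reproduction of the bug: `h` is an implication, not a conjunction,
-- so it cannot be destructed directly with `rcases h with ⟨hq, hr⟩`.
-- Apply it to `hp` first, then destruct the resulting conjunction.
example (P Q R : Prop) (hp : P) (h : P → Q ∧ R) : Q := by
  obtain ⟨hq, hr⟩ := h hp
  exact hq
```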
Kevin Buzzard (Feb 04 2026 at 13:27):
Anatole Dedecker said:
In fact, I have trouble imagining that AI would be able to prove anything without the careful API design in Mathlib
Yes, so do I, and the question I am asking is whether my imagination is just being limited by prior experience (which may not be a guide to future performance) and a desire not to be knocked out of my comfort zone. We are seeing AI writing multi-thousand-line Lean developments now and I'm wondering at what point sloplib becomes a viable tool for accelerating progress in AI theorem proving, and indeed whether we are at that point already. I'm sure that plenty of AI-lovers who know nothing about mathlib's design would just claim that of course we are already there today, but I don't really care about what those people think. Conversely I'm sure that many mathlib-devotees would say that this could never work because we know what worked in the past and it was nothing like what I am proposing. But I am trying to understand better what will probably fail if someone had a big bag of money and chose to gamble with it in this way, given recent advances in AI proving and the desire of people to get AI to prove new theorems beyond Erdos problems.
Kevin Buzzard (Feb 04 2026 at 13:28):
Shreyas Srinivas said:
the AI couldn't prove the top level theorems because it was getting stuck
How do you know that the fix isn't "pay more money for a better AI"?
Kim Morrison (Feb 04 2026 at 13:32):
(please also assume it will break if you touch it --- at this point I've run it twice, ever)
Shreyas Srinivas (Feb 04 2026 at 13:35):
If resources are not a problem then we should probably try to run vampire or some other SMT solver too. I wonder how grind would work if we could throw masses of CPU power at it.
Shreyas Srinivas (Feb 04 2026 at 13:38):
One thing I have noticed in a few of these big autoformalization attempts in TCS (that haven't been brought up on this Zulip) is that they don't exactly prove what they claim on paper. They use super-specialized definitions and they only work when they stick close to mathlib content. This can lead to the AFP problem mentioned above: a jungle of poorly linked definitions and theorems.
Shreyas Srinivas (Feb 04 2026 at 13:39):
and once you have this messy jungle and AI starts learning from them, there may very well be diminishing returns from AI. After all the messy jungle will have more voluminous data than mathlib. It's basically the same story as AI learning art from other AI.
Ruben Van de Velde (Feb 04 2026 at 14:00):
Kevin Buzzard said:
Shreyas Srinivas said:
the AI couldn't prove the top level theorems because it was getting stuck
How do you know that the fix isn't "pay more money for a better AI"?
That seems hard (and expensive) to disprove conclusively
Justin Asher (Feb 04 2026 at 14:19):
One of the annoying things about programming nowadays (outside of Lean) is that people can write code much faster than they can review it, and this seems like a similar problem to having AI write mathematics in Lean. Though I suppose if there is a target theorem, this matters less.
In the past, I noticed that a lot of AI-produced Lean code was focused on the proof of a single main result, with little attention given to the utility of intermediate results and formalizations. I think autoformalization should be just as concerned with building a library as with proving a single thing.
To that end, I want to run a study on how to effectively write definitions and theorems in Mathlib, and then use that as a prompting template for AI. I think that in the past most of the focus was on proofs, and now we should be more fixated on how to get AI to produce Mathlib-ready definitions and theorems.
I think this is also why there have been no large conjectures proven yet by AI, even though it could be feasible with current tools: there is a much stronger fixation on individual proofs than there is on “theory building”, or what we would call “API design” in Lean.
Matthew Ballard (Feb 04 2026 at 14:41):
In the current agentic framework, the best results occur when context bloat is properly managed and tools are targeted and well scoped.
Giant slop proofs actively degrade the former (not to mention the increased overhead of actually running Lean). One can convincingly argue that AI will get better at formal math if it is incentivized toward tighter, more idiomatic code in proofs.
I think much better use of AI would be curating tactic-specs (perhaps domain specific), based both on natural language arguments and repeated patterns in libraries, and creating test suites for those potential tactics. Then set some models loose on those plus have people develop them. This would provide better tools.
Plan mode seems to be underutilized for design questions also.
Things also go a longer way when domain experts generate the context for things like skills.
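A tactic spec of the kind Matthew describes could start as nothing more than a file of `example`s the candidate tactic must close. A minimal sketch, using Lean's existing `omega` tactic as a stand-in for the tactic under development:

```lean
-- Sketch of a test suite for a candidate arithmetic tactic.
-- `omega` stands in for the tactic being developed; each `example`
-- records one behaviour the spec requires it to handle.
example (a b : Nat) : a + b = b + a := by omega
example (a : Nat) (h : a < 3) : a ≤ 2 := by omega
example (a b : Int) (h : a ≤ b) (h' : b ≤ a) : a = b := by omega
```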
Bryan Wang (Feb 04 2026 at 14:43):
Another issue that I just thought of while waiting for a Lean file to load: what if sloplib gets so big/inefficient that it can no longer fit on the average mathematician's personal computer, much less compile or be usable there? Would it still gain traction among the wider mathematical community? At least with current AI-generated proofs, they can all be inspected on one's personal computer. But if an AI-maintained sloplib is allowed to grow unchecked, this may no longer be the case.
Justin Asher (Feb 04 2026 at 14:45):
Matthew Ballard said:
the best results occur when context bloat is properly managed and tools are targeted and well scoped
I think using subagents effectively can help a lot here.
Yakov Pechersky (Feb 04 2026 at 14:46):
I've had success in using AI to optimize rate-limiting code paths in complex C++ code bases without the agent deteriorating. The "trick" I've used is to make sure that there are more work products during the engineering loop: it should write a log describing its attempts and where it succeeded/failed. This way it is able to offload its context management onto disk and not have to hold it all in the provider's API/model memory. It also helps for the human-in-the-loop re-guiding.
So for a mathematical context on autoformalization, I think the loop should spend more time writing non-lean code. It should write to disk/file what section/sentence it is focusing on, and what it tried.
I haven't tried Aristotle, but from what I've seen in copied output, the comments it leaves are very nice. But we have lost the "Thinking" output that could have been relevant on the next go around on some other relevant section.
Said differently, when we approach each task in isolation, you lose the benefit of the learning of the 1000s of failures and the context around it.
Matthew Ballard (Feb 04 2026 at 14:48):
For me, subagents are tools, but I recognize that the latter term has a precise meaning in, e.g., Claude Code.
Justin Asher (Feb 04 2026 at 14:53):
Joseph Myers said:
why shouldn't we have similarly detailed instructions on reviewing mathlib changes and a service that does such reviews on PRs (automatically, or when requested)?
I just created a repository to consolidate work in this area, if anyone wants to contribute.
https://github.com/justincasher/lean-prompts
Shreyas Srinivas (Feb 04 2026 at 15:31):
Shreyas Srinivas said:
And it was blatantly lying about fixing it and going into an infinite loop. And of course the AI couldn't prove the top-level theorems because it was getting stuck in a deep rabbit hole of strong assumptions in lower-level theorems that it simply couldn't prove. It left me with a big mess that I haven't bothered cleaning up since.
Here's the repo if anyone wants to finish it (or use AI to finish it). I vibe-coded this entire thing in under 3 hours late at night, and stopped when GPT stopped being helpful. I am not guaranteeing its correctness. It was vibe-coded with a lot of prompt-based guidance provided by me to catch and fix mistakes.
https://github.com/Shreyas4991/Vibe-Code-Experiment-Consensus
Wrenna Robson (Feb 04 2026 at 16:12):
I think if you're calling something sloplib, you already know the problem with it.
Wrenna Robson (Feb 04 2026 at 16:13):
I don't rule out that AI can write high-quality code. Having been experimenting with Claude Code recently, it's seriously powerful in what it can do. But it seems to do a lot better with a human guiding and steering it.
Wrenna Robson (Feb 04 2026 at 16:13):
Fundamentally, the question is "do you care about what is after the := in a theorem". I get the sense that some people don't (as long as it isn't a sorry). I do, I think. I think one can meaningfully talk about "good quality" proofs.
Wrenna Robson (Feb 04 2026 at 16:14):
Moreover, someone needs to review AI definitions etc. very carefully to make sure they haven't allowed for holes, and on the frontier I would expect a lot of definitions work. Effective API design is hard - I think AI can assist with it but I don't think it can do it by itself.
Wrenna Robson (Feb 04 2026 at 16:15):
I think Mathlib should not only contain many proofs - it should contain well-engineered proofs that are systematic and logically constructed and it should contain useful sets of theorems that provide a sensible API surface on which to build more things.
Wrenna Robson (Feb 04 2026 at 16:15):
Arguably it doesn't achieve this right now.
Wrenna Robson (Feb 04 2026 at 16:16):
Also I have to say the cost seems maybe prohibitive, especially longer term. We cannot rely on all this money sloshing around currently to remain in place. It's nice that it's there - I am very glad for the philanthropy that has existed.
Wrenna Robson (Feb 04 2026 at 16:19):
I realise I'm echoing a lot of what people have said above. I will say something that I have said before: the mathlib community's notion of theory building as API design is an interesting and semi-novel one. Obviously the observation that programs = proofs is not new, something something Church Turing. But actually an observation that someone made to me recently - at a Lean meetup, from someone interested in Lean but who was a trained software engineer - is that all this talk of API design mirrors discussions that happen in software engineering. I think it's genuinely interesting that it turns out that not only do programs = proofs, but software engineering = proof engineering = effective mathematical theory-building.
Wrenna Robson (Feb 04 2026 at 16:20):
I think you lose sight of this, which I continue to think is one of the real gems of this endeavour and that I have yet to see nicely written up in the literature, if you farm out things wholesale to AI. You'll just end up in the position we are now with pen and paper proofs, where verifying them is in some cases infeasible.
Wrenna Robson (Feb 04 2026 at 16:25):
Incidentally, using Opus 4.5 (admittedly relatively naively - I am only just learning to use subagents etc. properly) with Claude Code Pro, I get under 1 hour of work every four hours out of it.
Wrenna Robson (Feb 04 2026 at 16:25):
And any more than that is way outside my price range
Kevin Buzzard (Feb 04 2026 at 17:36):
Wrenna Robson said:
I think Mathlib should not only contain many proofs - it should contain well-engineered proofs that are systematic and logically constructed and it should contain useful sets of theorems that provide a sensible API surface on which to build more things.
I think mathlib is doing two distinct jobs. First, it is a repository where we strive to prove the cleanest theorems using the right abstractions. And then it is a repository where AI companies can build on it and e.g. formalize solutions to Erdos problems and maybe solve new problems in other areas of mathematics.
From the perspective of striving to be a beautiful thing, of course we should keep it a beautiful thing. But from the perspective of trying to get AI to do new mathematics so we can figure out what's true, one really could argue that we don't care about what's after the := in a proof.
My question is not about "why do we care about beautiful proofs", it is "what are the likely practical consequences if we sacrifice beauty for 10x more rapid growth, and rely on AI and money to solve the obvious problems such as longer compile times and more difficulty in refactoring".
In other words why do you think that mathlib should contain well-engineered proofs? If a mathlib-powered AI tells me new things about the strong Birch and Swinnerton-Dyer conjecture because we accelerated mathlib sloppily to the point where it can understand the statement, isn't this a win?
Alex Kontorovich (Feb 04 2026 at 17:38):
what are the things we do when formalizing a mathematical statement?
- check if it's already in the library. if not,
- think of what the main definitions should look like
- try some API lemmas to see if we got the definitions right. if not,
- refactor definitions until they feel usable.
- start proving the lemmas, building up to the main theorem. refactor definitions/API as necessary.
- Look over the whole formalization; is it in the right generality? did we end up not using assumptions we thought were necessary? refactor accordingly
- Golf. Extract abstractions, start over.
- PR, get feedback, refactor, etc.
Which of these things is something AI can't eventually get good at? We could start with one AI populating sloplib/methlib/merely-true, and have a "golf" agent go through and clean it up, and an "abstractify" agent clean up the structure, and a "mathlibify" agent resynthesize everything and put it in the right file and with the right theorem name, etc etc etc. Everything I've seen so far is pointing to such a future...
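Alex's step "try some API lemmas to see if we got the definitions right" can be made concrete in a toy example; `IsIdempotent` below is a made-up definition for illustration, not Mathlib's:

```lean
-- A toy definition plus the first API lemmas one would try against it.
def IsIdempotent (f : Nat → Nat) : Prop :=
  ∀ n, f (f n) = f n

-- If lemmas like these are painful to state or prove, that is a signal
-- the definition needs refactoring before building anything further on it.
theorem isIdempotent_id : IsIdempotent id :=
  fun _ => rfl

theorem IsIdempotent.apply {f : Nat → Nat} (h : IsIdempotent f) (n : Nat) :
    f (f n) = f n :=
  h n
```

This is the loop a "golf" or "abstractify" agent would have to run at scale: propose a definition, stress it with API lemmas, refactor when the lemmas resist.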
Alex Kontorovich (Feb 04 2026 at 17:39):
The technical debt question is real; sloplib as-is will surely hit computation limits. But if the cleaning bots do their jobs, it should be no worse (in fact, possibly better...) than current mathlib.
Alex Kontorovich (Feb 04 2026 at 17:44):
Then there's the question of getting the definitions/ theorem statements wrong. I had a long discussion with Christian Szegedy about this (see https://arxiv.org/abs/2510.15924), and his argument is: "Math is robust". That is, we human mathematicians, with all of our faults and errors, manage, in the long run to error-correct, because our first pass at definitions didn't do what it was supposed to and led to contradictions and wasn't general enough, etc etc. We already do our own mathematical error correction through the creation of more math. So why can't AI do the same? If it gets the definition of Group wrong, it'll hit snags, and it'll eventually try a refactor (just like us)... Same with theorem statements - theorems prove things for a reason. If theorems fail to be useful in consequences where they're expected, then they're the wrong theorems, and need to be refactored. So he says: don't worry if the first versions are wrong; in the long run, the AI will correct itself, just like we do.
Kevin Buzzard (Feb 04 2026 at 17:45):
I just talked to Alex about this on a call (Alex, as I'm sure you can guess, I asked this question precisely because I was thinking about what to say at the talk I just gave), and one thing that came out of the discussion is that one could actually try this experiment with a downstream repo. Idea: you take some theory you want (e.g. we need the theory of the Tate curve for FLT), you feed in the relevant sections of Silverman's books on elliptic curves and say "please formalize these". If autoformalization is good and cheap enough, you end up with a repo containing a sorry-free proof of the main result I need, in unreadable, horrible code. Then you just get an AI to bump it now and again, maybe humans tinker with the main definitions to make them good, and then an AI or a human slowly PRs the proofs to mathlib while I sorry the corresponding theorems in FLT and say we have slop proofs and they're not fit for human consumption.
What I'm trying to understand is the risks of allowing AI slop anywhere near a hopefully-respectable project.
Alex Kontorovich (Feb 04 2026 at 17:50):
In a sense, PNT+ is already running that experiment. Here's a sample page of closed PRs.
image.png
We label which are contributed by AI (and which AI), and which fix (human) misformalizations. This is exactly the workflow I envisioned after the DeepMind visit to IAS last May. (Now that Codex and the like are coming online and getting better, agentic workflows will just take off like mad...)
Shreyas Srinivas (Feb 04 2026 at 17:54):
Kevin Buzzard said:
I just talked to Alex about this on a call (Alex, as I'm sure you can guess, I asked this question precisely because I was thinking about what to say at the talk I just gave), and one thing that came out of the discussion is that one could actually try this experiment with a downstream repo. Idea: you take some theory you want (e.g. we need the theory of the Tate curve for FLT), you feed in the relevant sections of Silverman's books on elliptic curves and say "please formalize these". If autoformalization is good and cheap enough, you end up with a repo containing a sorry-free proof of the main result I need, in unreadable, horrible code. Then you just get an AI to bump it now and again, maybe humans tinker with the main definitions to make them good, and then an AI or a human slowly PRs the proofs to mathlib while I sorry the corresponding theorems in FLT and say we have slop proofs and they're not fit for human consumption.
What I'm trying to understand is the risks of allowing AI slop anywhere near a hopefully-respectable project.
You will get an AFP-like web of formalisations consisting of many variants of definitions of the same concept, and spend a lot of time figuring out what libraries to use and what “subtly different definitions” you want to avoid using. The added challenge is that now you need to do this on an industrial scale. And you can’t trust AI to do this correctly for you.
Shreyas Srinivas (Feb 04 2026 at 17:54):
Math may be robust. But formalisations are brittle. This is why proof repair, AI-driven or otherwise, remains an active research topic.
Shreyas Srinivas (Feb 04 2026 at 17:59):
Also, AI is far more capable of dropping you into a deep pit of despair inside a proof than of autoformalizing one. A state of despair that it cannot get you out of. It’s just that AI companies, and even users, are not advertising their failures as enthusiastically as their successes.
Shreyas Srinivas (Feb 04 2026 at 18:01):
And as Wrenna points out, what is the use of formalisation if being sloppy is fine? We could very well manage that with pen and paper, with all its disadvantages.
Justin Asher (Feb 04 2026 at 18:02):
I would imagine most of these AI induced bottlenecks will be resolved in the next year or so. I think things are moving quickly.
Just last summer we were using compiler loops with LLM calls, and now you can hook up Claude Code to some MCP tools and make decent progress.
Shreyas Srinivas (Feb 04 2026 at 18:03):
I keep hearing these claims with zero solid evidence. Just advertisements of successes.
Shreyas Srinivas (Feb 04 2026 at 18:03):
An AI is trustworthy when it can be rigorously reasoned about by humans, i.e. by rigorous theory and verification, neither of which exists yet. Until then it is not.
Bryan Wang (Feb 04 2026 at 18:42):
Alex Kontorovich said:
The technical debt question is real; sloplib as-is will surely hit computation limits. But if the cleaning bots do their jobs, it should be no worse (in fact, possibly better...) than current mathlib.
"I have discovered a truly marvelous proof of this theorem, which your computer is too narrow to contain." ;)
Kevin Buzzard (Feb 04 2026 at 19:07):
<puts mathlib maintainer hat on> I think that finite_abs_eval_lt_of_degree_lt should be deduced from the claim that if one function is little-o of another then the smaller one only beats the bigger one on a bounded set, rather than grinding it out in the specific case of polynomials as Aristotle seems to have done, so yes I would say there was an element of slop about it; it is not clear to me that you engaged your brain in thinking about the mathlib way to prove that lemma; you just took the AI output and attempted to golf the argument rather than asking if this was the right argument. In particular the lemma could have been broken down into some nice intermediate steps rather than that one 30-line proof. But this is off-topic.<removes hat>
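(For concreteness, the abstract claim being suggested could be stated along the following lines; this formulation and its name are illustrative only and have not been checked against Mathlib's Asymptotics API:)

```lean
import Mathlib

open Asymptotics Filter

-- Illustrative statement (not a real Mathlib lemma): if f is little-o of g
-- at infinity, then f can strictly dominate g only on a bounded set.
example {f g : ℝ → ℝ} (h : f =o[atTop] g) :
    ∃ C : ℝ, ∀ x, |g x| < |f x| → x ≤ C := by
  -- from `h`, eventually |f x| ≤ (1/2) * |g x|, which is incompatible
  -- with |g x| < |f x| for large x
  sorry
```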
Shreyas Srinivas (Feb 04 2026 at 19:32):
Actually this is very on topic. It’s the perfect illustration of issues that can arise even with experienced and exceptionally good contributors using AI. Because if AI spits out a huge detailed proof, then not everyone is likely to have the time, or to give the proof the bird's-eye-view analysis of the math that you gave.
Shreyas Srinivas (Feb 04 2026 at 19:35):
Depending on the size of the proof (like the tens of thousands of lines you mentioned in the beginning), it might not even be feasible.
Bolton Bailey (Feb 04 2026 at 19:37):
I have noticed that one of the prime issues with AI-generated proofs is that they are too long. This seems likely to be an artifact of AI systems not being trained with the goal of short proofs in mind, and is a source of a lot of their other issues, like unmaintainability. Simply asking AI to "factor out have statements into separate lemmas" often improves readability, and improves the usability of code as a library in a way that is not necessary when all you want is any proof of a single theorem. So I think "sloplib" like concepts would do well to emphasize refactoring proofs to be shorter and more split-up.
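As a toy illustration of the kind of factoring meant here (an invented example, not real AI output):

```lean
import Mathlib

-- Before: AI-typical style, with the key fact proved inline as a `have`.
theorem goal_v1 (n : ℕ) (hn : 2 ≤ n) : 4 ≤ n ^ 2 := by
  have key : n * n = n ^ 2 := by ring
  calc 4 = 2 * 2 := by norm_num
    _ ≤ n * n := Nat.mul_le_mul hn hn
    _ = n ^ 2 := key

-- After: the `have` extracted as a named lemma that other proofs can reuse.
lemma mul_self_eq_sq (n : ℕ) : n * n = n ^ 2 := by ring

theorem goal_v2 (n : ℕ) (hn : 2 ≤ n) : 4 ≤ n ^ 2 := by
  calc 4 = 2 * 2 := by norm_num
    _ ≤ n * n := Nat.mul_le_mul hn hn
    _ = n ^ 2 := mul_self_eq_sq n
```

In a real proof the inline `have` would be 30 lines rather than one, which is exactly when extracting it pays off.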
Kevin Buzzard (Feb 04 2026 at 20:31):
It's off-topic because with my maintainer hat on I'm pointing out that the code could be split into smaller pieces but what I'm arguing in this thread is that maybe AI is sufficiently good at grinding out proofs that we simply don't need to care about this sort of issue any more. If all we want is a formal verification that a theorem is true (which would be the case if for example we were thinking only about a tool for refereeing maths papers) then who cares about the art of proof?
Edward van de Meent (Feb 04 2026 at 20:34):
surely we want to be sure the method of proof used on paper is also what happens in the verification? Simply checking "the AI proves this too" doesn't seem to help much in that regard, no?
Kevin Buzzard (Feb 04 2026 at 20:35):
I think that already a tool which says "the main theorems are true" would have a terrific practical impact on mathematicians.
Yakov Pechersky (Feb 04 2026 at 20:37):
maybe AI is sufficiently good at grinding out proofs that we simply don't need to care about this sort of issue any more. If all we want is a formal verification that a theorem is true (which would be the case if for example we were thinking only about a tool for refereeing maths papers) then who cares about the art of proof?
Based on what I've skimmed from mathematicians reflecting on the status of Mathlib and PNT+, sometimes they're very surprised when an intermediate lemma that is critical for some larger proof isn't readily available. So if we rely on BigTheorem, great, but we probably rely on BigTheorem+, which means having properly accessible/discoverable intermediate results that the professional mathematician is implicitly relying on to be available when they think about BigTheory.
Yakov Pechersky (Feb 04 2026 at 20:38):
I am not yet ready to use AI to do global software package optimization. It too eagerly settles for local optima of one-off scripts, because that is effective and achieves the goal of the task at hand. And in general, we're not great at getting across an overarching vision and breaking it down into a coherent and consistent set of tasks.
Kevin Buzzard (Feb 04 2026 at 20:40):
Here's another way of looking at it. There's that time at the end of a long (weeks or months) formalisation when you may no longer care about mathlib-standard code, you just put your head down and go for it. I'm just saying that maybe we could "put our heads down", get the remaining 1000 or so definitions needed in order to state all the big new theorems in all areas of mathematics (I suspect we need less than 1000 if Annals is anything to go by) and then say "yay AI let's go, Lean now understands every mathematical question that humans can ask, so now go prove some". Right now we're not going to get to this point for several years at least but I'm arguing that we could be there in several months with sloplib.
Yakov Pechersky (Feb 04 2026 at 20:41):
I get it. You're saying, go as fast as possible to the top down approach. That will give a path and vision to what we actually need to factor out.
Rémy Degenne (Feb 04 2026 at 20:45):
We are seeing AIs that can prove theorems about objects defined in Mathlib, where the definition comes with nicely designed API. I'm curious about how much of that capability remains if we are talking about a theorem that depends on several definitions that were AI generated in a sloppy way, with slop-API and whatever generality the AI used at that point.
Martin Dvořák (Feb 04 2026 at 20:48):
I'd be happiest if the AI library was only adding new theorems about existing (Mathlib) definitions. Ideally if public definitions were forbidden. Then we also wouldn't have to worry about missing API.
Justin Asher (Feb 04 2026 at 20:52):
It seems like the main point here is we need more research into how to effectively autoformalize definitions, because we have largely solved proving things in Lean using AI when given enough structure.
Martin Dvořák (Feb 04 2026 at 20:53):
My personal opinion is that we shouldn't autoformalize definitions at all.
Bolton Bailey (Feb 04 2026 at 20:54):
Kevin Buzzard said:
I'm just saying that maybe we could "put our heads down", get the remaining 1000 or so definitions needed in order to state all the big new theorems in all areas of mathematics (I suspect we need less than 1000 if Annals is anything to go by) and then say "yay AI let's go, Lean now understands every mathematical question that humans can ask, so now go prove some".
I think this idea makes a lot of sense. I have thought about something like this in these terms: The most starred repository on reservoir is mathlib, which is only for theorems we know are true and have sorry-free proofs of. We also have a repository in formal-conjectures, which is the fifth-most starred on reservoir (I might put it as the second-most starred out of repos accepting generally any field of math), which is for conjectures we don't know are true, and therefore can only be sorrys. It feels like there should be a middle-ground library for theorems we know are true, but don't have sorry-free proofs of. Not least because these are the main theorems where today's AI could help us (proofs of various Erdős problems notwithstanding).
Kevin Buzzard (Feb 04 2026 at 21:11):
But that's exactly the repo I'm making with my RenPhil money -- formal statements of recent Annals papers. It's just not public yet because it's not big enough to be interesting. The main bottleneck is missing definitions.
Bolton Bailey (Feb 04 2026 at 21:25):
Well, I guess I think that your RenPhil project sounds great and I think and hope it will be a big success!
I think the only difference between your description of this project and what I usually envision when I think about a project like this is that I am not really familiar with Annals or big academic math journals in general, so I think more along the lines of "statements of theorems in the 1000+ theorems list" than "statements in the annals".
Shreyas Srinivas (Feb 04 2026 at 21:37):
Kevin Buzzard said:
If all we want is a formal verification that a theorem is true (which would be the case if for example we were thinking only about a tool for refereeing maths papers) then who cares about the art of proof?
@Kevin Buzzard , you and a couple of others were the ones from whom I learnt that there are more uses for formalisation than verification. I don’t have the specific link right now.
Rémy Degenne said:
We are seeing AIs that can prove theorems about objects defined in Mathlib, where the definition comes with nicely designed API. I'm curious about how much of that capability remains if we are talking about a theorem that depends on several definitions that were AI generated in a sloppy way, with slop-API and whatever generality the AI used at that point.
@Rémy Degenne : see an example in the repository I posted above. The tldr is, it is terrible.
Shreyas Srinivas (Feb 04 2026 at 21:38):
Here it is: https://leanprover.zulipchat.com/#narrow/channel/113488-general/topic/sloplib/near/571941942
The theorem is bog-standard in the area. GPT could even give a good explanation of the proof, up to some level of detail, informally.
Shreyas Srinivas (Feb 04 2026 at 21:40):
But the further it went away from library-level stuff, the more messed up the formalisation became. I never intended to publish this repo because it was a one-off experiment. But I think it’s a great example of how things work at the moment.
Michael Rothgang (Feb 04 2026 at 21:41):
Kevin Buzzard said:
I'm arguing that we could be there in several months with sloplib.
I find that hard to believe. In differential geometry, finding the right Lean definitions is the hardest part of the work. I wouldn't want to imagine AI winging one. There is a bottleneck at "this code is almost done, clean it up and chop it into nice PRs" (and there used to be one at "get the code reviewed", but that has improved), not at "let's write more code that's not ready yet".
Bryan Wang (Feb 04 2026 at 21:41):
Here is a tiny argument 'in favour' of some form of 'sloplib': one of the biggest advantages of mathlib is its centrality, which ensures compatibility and coherence. Assuming that people are going to produce the same amount of AI-generated Lean code with or without 'sloplib' anyway, it would be much better to put everything in one central repo (which also tracks mathlib as much as possible), rather than ending up with lots of different AI-generated repos which are all drifting further and further apart from one another and from mathlib.
Shreyas Srinivas (Feb 04 2026 at 21:42):
Even if AI could prove the theorem I stated tomorrow, let’s say it gave a 10000-line proof. Given the variability in AI performance, would you feel comfortable asking it to refactor the proof, knowing full well that it could mess things up again?
Shreyas Srinivas (Feb 04 2026 at 21:43):
Bryan Wang said:
Here is a tiny argument 'in favour' of some form of 'sloplib': one of the biggest advantages of mathlib is its centrality, which ensures compatibility and coherence. Assuming that people are going to produce the same amount of AI-generated Lean code with or without 'sloplib' anyway, it would be much better to put everything in one central repo (which also tracks mathlib as much as possible), rather than ending up with lots of different AI-generated repos which are all drifting further and further apart from one another and from mathlib.
The central repo idea works when code is entered into it following certain standards and after a certain level of review and effort to make it all work together. It doesn’t work when these elements become infeasible.
Ruben Van de Velde (Feb 04 2026 at 21:48):
Yeah, the advantage of mathlib as a monolithic library lies in there being a single formalism for "real numbers" or "group" or whatever - it's not a given that a centralized AI-generated library would achieve that (and that's as generous a phrasing as I can come up with)
Martin Dvořák (Feb 04 2026 at 21:55):
One way or the other, creating sloplib as an experiment will teach us something, and it may potentially have a huge payoff.
Ruben Van de Velde (Feb 04 2026 at 22:02):
I guess the main question is the opportunity cost - would or could the money and effort going into this library be better used for something else
Shreyas Srinivas (Feb 04 2026 at 22:02):
As long as they make it clear that it is sloplib. I wonder how this will influence AI training.
Shreyas Srinivas (Feb 04 2026 at 22:04):
For additional context this just landed yesterday:
https://arxiv.org/abs/2602.03837
In the future directions section there is talk of autoformalisation pipelines. Many of the authors are famous TCS researchers who aren’t actively into formalisation. We are about to see sloplib x 100, because very few people have worked with Lean and understand what trusting Lean means.
Bryan Wang (Feb 04 2026 at 22:05):
Shreyas Srinivas said:
The central repo idea works when code is entered into it following certain standards and after a certain level of review and effort to make it all work together. It doesn’t work when these elements become infeasible.
I guess that was the idea behind Kevin's original question (if I read correctly):
Kevin Buzzard said:
What then happens to this library in the long term? How does it collapse, if it collapses at all? Assume that there is enough funding to keep the AI tools running.
In other words, it could well collapse, but why not try? And my point above was that I realised it could well be worse if people are going to use AI to write Lean anyway, but without any sort of effort to make it all come together in the long run. We are going to have all sorts of 'hit-and-run' AI projects where some piece of maths X is formalised in some AI-generated repo somewhere, and that kills the interest in actually formalising X properly because 'it's already done and so I won't get proper credit for it', but the stuff in that repo never actually makes it anywhere useful and just dies over time. If this happens to sufficiently many X it could have a much worse impact on formal maths than 'sloplib' would.
Bryan Wang (Feb 04 2026 at 22:07):
In a way 'sloplib' is just choosing between the lesser of two evils..
Shreyas Srinivas (Feb 04 2026 at 22:11):
It is not the lesser of two evils. It’s the same evil with a brand name slapped on it that will give it more credibility than it deserves in the short run.
Bryan Wang (Feb 04 2026 at 22:23):
Btw I should say that I'm personally still not too optimistic about the idea of 'sloplib'. But if the name remains as 'sloplib' (or is remembered as such) then it might not be that much of a credible brand name :stuck_out_tongue:
Kim Morrison (Feb 04 2026 at 22:36):
Let's just use https://github.com/merely-true/merely-true
Kevin Buzzard (Feb 04 2026 at 22:59):
Shreyas Srinivas said:
@Kevin Buzzard , you and a couple of others were the ones from whom I learnt that there are more uses for formalisation than verification.
Yes absolutely. And this question is about "why don't we forget some of those for now" so your comment is kind of irrelevant. The question is about the thought experiment which involves forgetting this fact.
Kevin Buzzard (Feb 04 2026 at 23:00):
Michael Rothgang said:
In differential geometry, finding the right Lean definitions is the hardest part of the work.
Can you think of a super-annoying definition in diff geom which we don't have, and then perhaps someone can ask an AI to come up with a definition, so we can test the hypothesis that AI is no help in solving this hard problem?
Shreyas Srinivas (Feb 04 2026 at 23:01):
Okay that’s fair. But even in that thought universe there are several reasons to avoid sloplib which I and others have listed above. Even if those engineering catastrophes don’t concern you, consider that sloplib is likely to grow much faster than mathlib and that’s going to train the next generation of AI models.
Kevin Buzzard (Feb 04 2026 at 23:01):
Ruben Van de Velde said:
I guess the main question is the opportunity cost - would or could the money and effort going into this library be better used for something else
Right. Which is why my question isn't "who will give me money to do this stupid experiment", it is "can people please explain to me why this stupid experiment is doomed to failure? What are the probable failure points?"
Shreyas Srinivas (Feb 04 2026 at 23:02):
Already an Isabelle person mentioned to me that models output Lean code inside Isabelle because of the training data.
Kevin Buzzard (Feb 04 2026 at 23:03):
Kim Morrison said:
Let's just use https://github.com/merely-true/merely-true
Oh so you are 4 months ahead of me!
Shreyas Srinivas (Feb 04 2026 at 23:03):
That indicates the importance of both data quantity and quality.
Kevin Buzzard (Feb 04 2026 at 23:06):
OK great so perhaps a next step here is that @Michael Rothgang proposes some really annoying definitions in differential geometry, or perhaps I propose the definition of automorphic representation for a connected reductive group in Langlands, and then we watch AI fail to come up with definitions?
What's the easiest way to ask an AI to formalize the definition of a connected reductive group over a field? That's a place to start. If AI can't do this without serious help then one of the main motivations of sloplib (accelerate formalization of definitions) is dead in the water.
Kevin Buzzard (Feb 04 2026 at 23:07):
If it gives a reasonable definition of connected reductive group then the next question is automorphic form for a connected reductive group over a number field (and this is much harder).
Kim Morrison (Feb 04 2026 at 23:12):
@Kevin Buzzard, how about you write three different prompts.
- Just a couple of sentences saying the scope of what you want defined, but without explanation.
- That, plus some references to (ideally publicly available, but PDFs are fine too) standard explanations of that material.
- A couple of paragraphs of explanation (in prose) of how you would like it done.
I'm happy to then let loose Aristotle / Claude (with/without custom lean skills) / Codex on those prompts and show you what we get.
Bhavik Mehta (Feb 04 2026 at 23:38):
Kim Morrison said:
show you what we get
Could you also show the rest of us? :D I think the results of this experiment would be super interesting to see, for many people!
Kim Morrison (Feb 04 2026 at 23:39):
Of course! :-)
Artie Khovanov (Feb 05 2026 at 00:01):
I'd rather we had the money that would be invested into sloplib compute, instead invested into Mathlib review (ie paying more experts to maintain the library). However that's not a decision I (or indeed anyone here, I believe?) can actually take - it's up to the big tech companies and billionaires that fund us, and they usually (not always) prefer the AI approach.
Joseph Myers (Feb 05 2026 at 00:06):
Alex Kontorovich said:
what are the things we do when formalizing a mathematical statement?
- check if it's already in the library. if not,
- think of what the main definitions should look like
- try some API lemmas to see if we got the definitions right. if not,
- refactor definitions until they feel usable.
- start proving the lemmas, building up to the main theorem. refactor definitions/API as necessary.
- Look over the whole formalization; is it in the right generality? did we end up not using assumptions we thought were necessary? refactor accordingly
- Golf. Extract abstractions, start over.
- PR, get feedback, refactor, etc.
Which of these things is something AI can't eventually get good at? We could start with one AI populating sloplib/methlib/merely-true, and have a "golf" agent go through and clean it up, and an "abstractify" agent clean up the structure, and a "mathlibify" agent resynthesize everything and put it in the right file and with the right theorem name, etc etc etc. Everything I've seen so far is pointing to such a future...
For whatever reason, AI developers seem more interested in making AIs that prove things than in making AIs that do all these other software engineering pieces needed for good mathlib contributions. (Some ability to prove things, at least on a small scale and guided by an informal proof, is necessary as part of being able to do the rest of the above, but far from sufficient to get an AI contributing directly to mathlib.)
Luca Casagrande (Feb 05 2026 at 00:53):
I am basically living inside this scenario right now. I'm sure the code I created with a structure of cross-inferencing AIs is far from optimized, but it works with no shortcuts, and that, for me, is the beauty of it.
Kim Morrison (Feb 05 2026 at 01:05):
Luca Casagrande said:
I am basically living inside this scenario right now. I'm sure the code I created with a structure of cross-inferencing AIs is far from optimized, but it works with no shortcuts, and that, for me, is the beauty of it.
And indeed, I just suspended your account for a day for posting AI slop (an announcement about an AI-written repository with unpackaged Lean files, which I've junked)!
Moritz Firsching (Feb 05 2026 at 08:15):
Bolton Bailey said:
We also have a repository in formal-conjectures, which is the fifth-most starred on reservoir (I might put it as the second-most starred out of repos accepting generally any field of math), which is for conjectures we don't know are true, and therefore can only be sorrys. It feels like there should be a middle-ground library for theorems we know are true, but don't have sorry-free proofs of. Not least because these are the main theorems where today's AI could help us (proofs of various Erdős problems notwithstanding).
Formal Conjectures not only contains open problems, but also currently 588 statements of research problems which are solved, most of them with a sorry-proof. We include those either because they are variants of open problems, or part of a problem collection, or in some cases because they were still open when added to the repo but got solved in the meantime.
We don't want to include proofs for these, especially if it is thousands of lines of AI-generated code, because of the maintenance burden. However, @Paul Lezeau added a mechanism to point to those proofs: FC#1851.
Moritz Firsching (Feb 05 2026 at 08:23):
Another reason we include solved variants: they are potentially good tests for the unsolved variants
Ralf Stephan (Feb 05 2026 at 08:25):
Kevin Buzzard said:
What's the easiest way to ask an AI to formalize the definition of a connected reductive group over a field?
My procedure for this is first asking Gemini for the verbal definition with the condition that it is close to the research literature (you need Grounding for this, of course), and second a variation of this prompt: "Which definitions or structures are missing in lean4 mathlib in order to state the above definition in lean4 (just stating)?"
What I got with the above was a detailed list of what's missing (available on request). The summary:
To write the definition today, you would first need to define (or locate on a development branch) the following structures which are not in the main Mathlib library:
structure LinearAlgebraicGroup (k : Type*) [Field k] (extending Affine Scheme + Group).
def IsUnipotent (G : LinearAlgebraicGroup k) : Prop (requiring representation theory or matrix embeddings).
def unipotent_radical (G : LinearAlgebraicGroup k) : LinearAlgebraicGroup k.
def IsReductive (G : LinearAlgebraicGroup k) : Prop := (unipotent_radical (base_change G (algebraic_closure k))).is_trivial.
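Rendered as a Lean skeleton, the list above would look something like the following (purely illustrative: none of these declarations exist in Mathlib, all names come from the AI's summary, and the `sorry`s hide all the real content):

```lean
import Mathlib

-- Illustrative skeleton of the missing pieces listed above; nothing
-- here exists in Mathlib and the fields are placeholders.
structure LinearAlgebraicGroup (k : Type*) [Field k] where
  -- would extend "affine scheme over k" plus a group structure
  dummy : Unit

def IsUnipotent {k : Type*} [Field k] (G : LinearAlgebraicGroup k) : Prop :=
  sorry -- needs representation theory or matrix embeddings

def unipotentRadical {k : Type*} [Field k] (G : LinearAlgebraicGroup k) :
    LinearAlgebraicGroup k :=
  sorry

def IsReductive {k : Type*} [Field k] (G : LinearAlgebraicGroup k) : Prop :=
  sorry -- triviality of the unipotent radical after base change to kbar
```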
Kevin Buzzard (Feb 05 2026 at 08:35):
Kim Morrison said:
Kevin Buzzard, how about you write three different prompts...
- A couple of paragraphs of explanation (in prose) of how you would like it done.
Well that's the thing -- I don't know how I would like it done.
One thing's for sure -- a connected reductive group over a field is not a group. It's either a group variety (or a group scheme, although historically these things were well-understood before schemes because they're smooth and of finite type over a field, so one can get away with Weil's machinery, which of course we don't have because we only have Grothendieck's machinery), or it's a Hopf algebra (because it's an affine group variety over a field so everything can be done on the level of K-algebras). So already there is a huge design decision: should the basic object secretly be a Hopf algebra or a scheme?
This is as far as I got when I tried to do it with Andrew Yang sitting next to me a couple of weeks ago (for connected reductive groups over ℚ, although it's obvious how to generalise):
import Mathlib
open CategoryTheory AlgebraicGeometry
-- let G be an affine group scheme of finite type over ℚ
variable (G : Scheme) (φ : G ⟶ Spec <| CommRingCat.of ℚ)
[GrpObj (Over.mk φ)] [IsAffine G] [LocallyOfFiniteType φ]
-- can't say "geometrically connected" until a PR is merged
-- can't make the Hopf algebra until stuff from Toric is merged
and that was for "connected affine algebraic group of finite type over a field" if we go down the scheme route. Then there's the question of how to say "reductive". I asked Brian Conrad and his response was:
Note that the "there exists a completely reducible rep with finite kernel" definition is no good over imperfect fields: if k’/k is purely inseparable of degree p = char(k) > 0 then the Weil restriction G = R_{k’/k}(GL_1) acting on the affine space R_{k’/k}(G_a) = A^p_k is a faithful irreducible linear representation (though not absolutely irreducible), so completely reducible, but G is not reductive (it is pseudo-reductive, but that’s a separate can of worms).
So what is the best definition that works over all fields?
I think the only good definition, treating all fields on equal footing, is the one in Borel’s book: that G is k-smooth and the unipotent radical of G_{kbar} is trivial.
Writing this, I realize you may be unhappy since you don’t want to have to first formalize the notion of unipotent radical (you can’t object to bringing in kbar, just as well perfect closure if you wish, since that is needed to circumvent the issue in the example above). But you don’t: you could just say “G_{kbar} has no nontrivial smooth connected unipotent normal closed k-subgroup” and thereby postpone formalizing “k-unipotent radical” until later. :)
On “closed k-subgroup”: I was thinking we are limiting ourselves to the affine case (and thereby in my head invoked the fact that a monomorphism between affine group schemes of finite type over a field is always a closed immersion).
What's the correct definition of a unipotent subgroup?
Here are some equivalent notions (some equivalences proved in Raynaud’s exposé on unipotence in SGA3) for a group scheme G over a field k:
(i) G admits an isomorphism onto a closed k-subgroup scheme of the k-group U_n of strictly upper-triangular n×n matrices in GL_n (i.e., 1’s on the diagonal) for some n.
(i’) Same as (i) for G_{kbar}.
(ii) Every k-homomorphism G —> GL_n factors through U_n after some GL_n(k)-conjugation.
(iii) G admits a finite composition series by closed k-subgroup schemes (each normal in the next) for which each of the successive quotients is a closed k-subgroup scheme of G_a.
(iii’) For char(k) = p > 0, same as (iii) for G_{kbar} but the successive quotients are each among {alpha_p, Z/pZ, G_a}.
(iv) For G smooth, there is the condition from Borel’s book that everything in G(kbar) is unipotent (meaning that under some closed immersion of group schemes G —> GL_n over kbar the image on kbar-points consists entirely of unipotent matrices, or equivalently under all kbar-homomorphisms G —> GL_n, not necessarily with trivial kernel; equivalence proved in Borel’s book).
The official definition in SGA3 is (iii), but if you’re only going to focus on unipotence for smooth affine k-groups then (iv) may be more convenient to get going (though it is useful even for the study of smooth G that we have these equivalences allowing non-smooth cases, such as for work with non-smooth kernels).
Remark: Unlike for tori, there is no good notion of “unipotence” over a base that isn’t a field (or rather, a product of finitely many fields). It’s a bit of a miracle that for parabolic subgroup schemes P of reductive group schemes over a general base S there is a good relative notion of R_u(P) that has a variety of desired properties; this is discussed in my expository article on reductive group schemes. It’s also a miracle that there is a good notion of “torus” over a general base (with multiple equivalent characterizations)...
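(If we ever went this route, Conrad's suggested workaround might look schematically like this in Lean; every identifier below is invented, none of it exists in Mathlib, and it is a sketch rather than compiling code:)

```lean
-- Pseudocode sketch only: all names below are hypothetical, and the
-- encoding of "affine algebraic group over k" is left completely open.
def IsReductive (G : AffineAlgGroup k) : Prop :=
  IsSmooth G ∧
    -- Conrad's trick: quantify over closed subgroups of the base change
    -- to kbar, postponing any definition of "unipotent radical".
    ∀ H : ClosedSubgroup (baseChange G (AlgebraicClosure k)),
      IsSmooth H → IsConnected H → IsUnipotent H → H.Normal → H = ⊥
```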
Note that this is the easy part: the hard part is automorphic forms for these objects.
I read all this stuff now and I am immediately thinking "clearly AI is not yet ready for this". I am now fearing that if we keep going with this experiment then I'll have to review a bunch of slop to see if it correctly conforms to the definition and I'll be thinking "well it would have been easier just to write it myself" even though maybe I'm wrong because I keep not writing it myself.
Kevin Buzzard (Feb 05 2026 at 08:40):
I asked cheapo Claude to write me a definition and it came up with https://gist.github.com/kbuzzard/dc7f3f41af331b8c9ba0bdf063934a6d which uses Hopf algebras but completely skips the unipotent radical part (i.e. doesn't even try).
Kevin Buzzard (Feb 05 2026 at 09:00):
I need to think about my FLT lecture for a few hours now but perhaps it's worth splitting off the connected reductive group challenge into its own thread. Now that I've thought a bit more about this, I've realised that Michael Rothgang's comment is very pertinent -- sometimes for really complicated definitions it's not at all clear how best to formalise them. Several times in my life I have attempted to solve this by "just having a go" -- for example it took us 3 attempts to get schemes right and 3 attempts to get group cohomology right, and all the earlier failed attempts in each case were projects done by Imperial students which basically boiled down to me experimenting with design decisions, looking at what was problematic, and rerouting based on student feedback.
Rémy Degenne (Feb 05 2026 at 09:04):
The discussion of the PR is off topic here. Please move that discussion to the github page or to the PR reviews channel.
(edit for confused future readers: the discussion I mentioned has been moved)
Wrenna Robson (Feb 05 2026 at 09:16):
Kevin Buzzard said:
In other words why do you think that mathlib should contain well-engineered proofs? If a mathlib-powered AI tells me new things about the strong Birch and Swinnerton-Dyer conjecture because we accelerated mathlib sloppily to the point where it can understand the statement, isn't this a win?
Beautiful things good. But I am not a working mathematician so perhaps I have a different perspective on this. I do find proofs more meaningful when they are well-structured.
Wrenna Robson (Feb 05 2026 at 09:17):
Justin Asher said:
I would imagine most of these AI induced bottlenecks will be resolved in the next year or so. I think things are moving quickly.
Just last summer we were using compiler loops with LLM calls, and now you can hook up Claude Code to some MCP tools and make decent progress.
If you've got the money for it. I don't.
Arthur Paulino (Feb 05 2026 at 11:17):
Software is, well, malleable. Just because a theorem is first added with a messy and unreadable proof it doesn't mean it needs to freeze there. Humans can improve it. Or machines, even :shrug:
To me, a layman who barely knows basic group theory, the problem of machines doing math is convergence. Is the machine doing things that interest us, humans? If the direction is driven by humans, that's a solved case. I'm just not convinced of fully autonomous machines writing mathematics on their own.
Michal Buran (Feb 05 2026 at 11:56):
Shreyas Srinivas said:
Okay that’s fair. But even in that thought universe there are several reasons to avoid sloplib which I and others have listed above. Even if those engineering catastrophes don’t concern you, consider that sloplib is likely to grow much faster than mathlib and that’s going to train the next generation of AI models.
Not necessarily. A good curation of training data would suppress it.
Joseph Tooby-Smith (Feb 05 2026 at 11:58):
Let me add my two cents here, from the perspective of a different subject: Physics and with regard to the project PhysLean. Sorry if I am just adding noise and painting a bike shed.
Given the much earlier stage of this endeavour compared to Mathlib, I've often thought about whether it would be possible for a (not community-motivated) AI company to just spend a bunch of money to reproduce what is in PhysLean and overtake the human-centric community effort. Here I've come to similar conclusions as others above, but with a physicist's coating:
Lean is not verifying the physics content of results, it is not verifying "yes this really corresponds to quantum mechanics in the real world", it is not verifying that what is written is mainstream physics, it is not verifying whether a result is useful or interesting to the community (also true in maths), it is not verifying whether a given definition is intuitive to humans, or is easy to use, etc. Lean verifies the mathematical content of the results, and that is really it. The rest of this all needs to be verified by something other than Lean - and to me, in a world where we are writing Lean code for humans, the only way we can do this is via human input and human verification.
I personally am starting to care less about whether "this result from physics (or maths) has been formalised" - and more about "how have you formalised this", "what design choices did you make", "what can we learn from your formalisation", "how has it been designed to be used by others", "how does it fit in with other formalisations". These are questions I feel a sloplib would not answer.
Michal Buran (Feb 05 2026 at 12:03):
Kim Morrison said:
I haven't read all the messages above, and that will have to wait for morning, but this nevertheless seems like a good time to advertise the merely-true repository that Johan and I thought about late last year, but never launched.
Have you considered resurrecting it?
Justin Asher (Feb 05 2026 at 14:01):
Kevin Buzzard said:
What's the easiest way to ask an AI to formalize the definition of a connected reductive group over a field?
I tried this using a Claude Code setup I have on my computer, which utilizes the LeanExplore MCP tooling.
I got the following file:
Formalization of the definition of a reductive group over a field
It seems like there is (natural) difficulty. How does this look? Any clear problems with the way Claude approached this? Would this be fixable via better prompting or self-improvement loops?
Justin Asher (Feb 05 2026 at 14:05):
I see that it went for a stronger definition. I think that in order to build up things properly Claude would need a bit more structure from us.
Justin Asher (Feb 05 2026 at 14:09):
To that end, we should autoformalize the Stacks Project. :smile:
Markus de Medeiros (Feb 05 2026 at 14:41):
Joseph Tooby-Smith said:
I personally am starting to care less about whether "this result from physics (or maths) has been formalised" - and more about "how have you formalised this", "what design choices did you make", "what can we learn from your formalisation", "how has it been designed to be used by others", "how does it fit in with other formalisations". These are questions I feel a sloplib would not answer.
Yes, I agree with this. Recently in Iris-Lean there has been a person who has been speeding far ahead of the main porting effort by just asking AI to do it. Genuinely not to deride them or anything (I mean, if they are really getting something out of this endeavor then godspeed) but the code is extremely poor. They are starting from a full implementation in a very similar proof assistant and their code is like 10x longer. For comparison, the human-written parts of Iris-Lean are approximately as complex as the Rocq version.
What I find especially interesting is that this person did the AI porting process in stages, so their later PRs depend on the earlier (AI-generated) PRs. While some of the early stages look not that bad, the later ones all end up having these huge proof terms everywhere... I'm not even really sure the library is human-usable in that state. It's an interesting experiment in what happens when you just let these AIs do their thing with presumably very little supervision, though I can tell you that I will be redoing this AI code myself instead of trying to fix the slop version.
Jireh Loreaux (Feb 05 2026 at 16:48):
Kevin, I think there are a number of reasons that sloplib is a bad idea.
- It lacks purpose.
I'm sure you've read it, but I encourage you to reread Why formalize mathematics?. A sloplib repo would only tick the first box there: checking, and even that, it could be argued, it wouldn't be doing terribly well, as mathematicians still have to vet the definitions. It doesn't satisfy the other reasons listed therein to formalize mathematics, namely:
- explaining and learning: because digging into slop is a hindrance to both
- teaching: when you get into ridiculous goal states that are nigh on impossible to read because the machine doesn't care, that's of no use in the mind of a student
- creating: I'll just quote Patrick directly
This clear thinking is part of the creation process but it can also become part of the end result.
Indeed, formalized mathematics encourage, or even sometimes require, powerful abstractions.
I think it's clear that sloplib would be doing quite the opposite of requiring or encouraging powerful abstractions.
- collaborating and having fun: making an AI do this is the opposite of human activity.
- It's costly.
You might argue that AI companies have tons of money to throw around, and, while that seems to be true for the moment, it seems like a terrible idea to make any of our progress in mathematics dependent upon such a large influx of cash. In fact, I think we should celebrate the fact that mathematics is one of the least expensive disciplines in terms of the cost society pays to fund our profession. Moreover, I'll note that I think there is no reason to encourage wasteful spending by anyone, not only AI companies, especially as, at least in the US, our government has thrown tremendous money at them. So, at least in part, it's a waste of my money as a taxpayer (or, perhaps, it's a waste of the money of future generations who will suffer because of our insane spending habits at the moment). Of course, you could also argue here that it's an environmental hazard, but I won't make that a key point here since it applies more generally to any extremely computationally intensive endeavor (e.g., bitcoin), and I don't think it's necessarily the key point here.
- It doesn't encourage library design.
This is akin to the fact that it doesn't encourage powerful abstractions, but it's slightly different. When designing a library, as you well know, it's essential to do a number of things.
- choose the right generality: abstractions that are too general require too much glue code to be reasonably useful, and the number of lemmas true in that generality is lower. Concepts that are too narrow don't apply in enough situations and lead to code duplication or frustration when one formalization doesn't apply in another situation, even though it reasonably could do so.
- prioritize usability: writing something that compiles is at least an order of magnitude off from writing something that is readable, which is an order of magnitude off from writing something that is maintainable, which is an order of magnitude off from writing something that is usable for future endeavours.
- make it discoverable and searchable: arguably this is one of the weakest aspects of Mathlib as it currently stands, but we continually strive to improve it. I'll also note that this is the one place where Patrick explicitly mentioned that artificial intelligence (likely envisioned in a very different form than what we see today, given that it was last updated in 2021) may be useful to us. For me personally, I never use it for that, as Loogle generally provides me a much better searching experience.
I'll allow myself a digression into a personal matter. A few years ago, I decided to switch the focus of my career from proving new things about operator algebras to formalizing mathematics. Why? Well it certainly wasn't because I was bored. I did it because I thought my life's work would be of more use to the profession developing (even foundational) material in my field in Lean. Of course, that's not to say I don't enjoy it, I do, but I seriously considered my motivation before embarking on it.
- I think it will be unsuccessful.
The obvious retort to this is "you never know until you try." While true in the strictest sense, I think there have been plenty of points raised upstream to doubt the likelihood of success. Moreover, given that this point is in fact less important than the aforementioned ones, I think that there's no reason to even bother.
Call me a luddite if you want, but that's my two cents.
Martin Dvořák (Feb 05 2026 at 17:25):
Jireh Loreaux said:
A sloplib repo would only tick the first box there: checking, and even that, it could be argued, it wouldn't be doing terribly well, as mathematicians still have to vet the definitions.
My two cents:
- Even if it is only the first box, it is a very important box! Just knowing that something is true can be a huge deal.
- Mathematicians wouldn't have to vet the definitions if, as I keep suggesting, creating public definitions was banned (only adding more theorems about Mathlib definitions would be in the scope of the library).
Jireh Loreaux (Feb 05 2026 at 17:27):
Martin Dvořák said:
- Mathematicians wouldn't have to vet the definitions if, as I keep suggesting, creating public definitions was banned (only adding more theorems about Mathlib definitions would be in the scope of the library).
You can't get too far with that.
Shreyas Srinivas (Feb 05 2026 at 17:27):
Also Kevin is explicitly talking about definitions.
Martin Dvořák (Feb 05 2026 at 17:28):
Jireh Loreaux said:
You can't get too far with that.
True, but it would still give me a library that I could import and obtain exact? on steroids.
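(For anyone who hasn't used it, `exact?` is the Mathlib tactic that searches everything imported for a lemma closing the current goal; roughly:)

```lean
import Mathlib

-- `exact?` searches the imported library for a proof of the goal
-- and reports the lemma it found.
example (a b : ℕ) : a + b = b + a := by
  exact?  -- suggests something like `exact Nat.add_comm a b`
```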
Justin Asher (Feb 05 2026 at 18:12):
Jireh Loreaux said:
- It's costly.
I think these things are relatively cheap to run. I am not sure where this is coming from. Much cheaper than the cost of formalizing things by hand.
Kevin Buzzard (Feb 05 2026 at 18:40):
I'm definitely not calling you a luddite Jireh! As I say, I completely expected to be shot down; I asked because I wanted to get my head straight about this matter. I think you (as ever) have written down some very coherent thoughts!
I think that my question has morphed into the following two questions:
1) What if we only wanted an AI tool which could check that papers were correct, forget all the other visions of mathlib. Would a sloplib approach work here? At what point should we drop what we're doing and race to the AI tool? Do we really need to make the O(1000) missing definitions in mathlib first?
Perhaps a reasonable comment here is that even for a paper where all the concepts are in mathlib already, current AI tools are probably not able to check it, so we could wait for this problem to be solved first (by people like Math Inc) and by that point maybe we'll have all the missing definitions I'm fretting about.
2) Michael's comment made me realise that I know full well one of the problems here -- some definitions are hard. I think this is worth splitting out into a new thread; independent of the discussion here one could ask if AI tools could make definitions and I think we have a couple of interesting candidates.
Justin Asher (Feb 05 2026 at 18:48):
Kevin Buzzard said:
independent of the discussion here one could ask if AI tools could make definitions and I think we have a couple of interesting candidates.
Yes, we should do this. This is very important.
Bryan Wang (Feb 05 2026 at 18:53):
I actually think "an AI tool which could check that papers were correct" would drive more interest among mathematicians towards the kind of standards for Lean code that mathlib aspires to. Mathematicians prioritise beauty and elegance, and if the default way of doing mathematics becomes "produce some Lean code to show you are correct" then I think mathematicians will actually want to produce beautiful and elegant Lean code for their theorems rather than have their work turned into AI-slop Lean code. But this is just my personal feeling and different people will certainly have different thoughts. (Also, note this is orthogonal to the question of whether AI can eventually produce 'mathlib-standard' Lean code, which is always a possibility.)
Justin Asher (Feb 05 2026 at 18:54):
Here is the discussion for AI Lean definitions:
#general > AI Lean definitions
Floris van Doorn (Feb 05 2026 at 19:00):
I agree with most of the critics here, I don't think this is a good way to develop a large new library of formalized mathematics. However, I'm all in favor of small-scale experiments.
Imagine that Aristotle (or another AI tool) just has an option/button to PR successful results to merely-true. Hardly any extra compute will be spent, and we get some public AI proofs.
I expect that CI will be quite cheap: even though every file might be slow to compile, I expect that most contributions just add 1 (or a few) new files, not touching existing files, and that the import hierarchy will be almost completely flat. Therefore, besides bumping Mathlib, compute is just checking the results added by only that PR. And there will be no linting.
I also want to highlight the most genius rule from the merely-true repository:
Each time Mathlib moves to a new Lean toolchain, we will [...] delete all files which no longer compile!
I think this is actually very interesting:
- Most AI-generated files using existing capabilities will be removed: they are not at all robust, and will fail compilation on every Mathlib bump
- Maybe some kind of AI proof is more robust, or maybe some AI tool is able to write more robust proofs, and perhaps we can learn something from that
- Maybe someone will develop a fix-errors-after-bumping-mathlib-tool, which could be useful elsewhere!
- Maybe someone will write a cleanup-AI-proofs-tool, which could also be very useful.
- A repository like this could give these AI tools a benchmark: fixing deleted proofs, cleaning up existing or deleted proofs.
- Or very possibly someone will just write something that reprompts an AI tool to formalize deleted results from scratch, which would be less useful.
Ruben Van de Velde (Feb 05 2026 at 19:16):
Kevin Buzzard said:
1) What if we only wanted an AI tool which could check that papers were correct, forget all the other visions of mathlib. Would a sloplib approach work here?
I imagine that we'd get a bunch of files' worth of (purported) Lean code, but with many hours of human effort required to suss out whether the Lean matches the paper closely enough to be useful.
Joseph Myers (Feb 06 2026 at 00:10):
Joseph Tooby-Smith said:
I personally am starting to care less about whether "this result from physics (or maths) has been formalised" - and more about "how have you formalised this", "what design choices did you make", "what can we learn from your formalisation", "how has it been designed to be used by others", "how does it fit in with other formalisations". These are questions I feel a sloplib would not answer.
This fits in with my views of the 100-theorems and 1000-plus lists: (a) they should be more nuanced in the information they present rather than mainly presenting things as a binary formalized/not-formalized, and (b) there should be a very clear division between "formalized in an integrated and reusable way (in mathlib, or failing that the mathlib archive)" - which implies the underlying theory / lemmas are available for reuse, and indeed such lemmas may well be of more use than the final result - and "may or may not be formalized, but not in mathlib / the mathlib archive".
(This is not to say that's the only information to present in those lists. The existence of a lower-quality formalization may well be useful information to people interested in the result, even if they have to completely reimplement it to get something reusable for their purposes. Where unlisted formalizations exist, they ought to be mentioned on those lists in some form, just clearly distinguished from fully integrated and reusable ones.)
Daniel Morrison (Feb 06 2026 at 08:37):
There's been discussion about technical concerns, but I'm worried about messaging: What message does this send about Lean and formalization to the broader mathematical community?
One of my concerns is that using a repository of AI generated theorems that are unreadable to humans will push people away from wanting to learn about formalization efforts. Imagine if someone's first impression of Lean was a block of code that they cannot read and no one can explain to them. There is nothing for that person to engage with and I think you'll have a hard time convincing them of the value of formalization.
Another issue is that using that kind of work is implicitly saying that "correct" is the same as "compiles in Lean". My problem here is not with Lean, but that the concept of "correct" in mathematics is a community based definition. What I mean is that we don't call a theorem proved or a problem solved when one person believes they have a proof - they need to convince the mathematical community that it is true. For this reason, I think the answer to
Kevin Buzzard said:
1) What if we only wanted an AI tool which could check that papers were correct, forget all the other visions of mathlib. Would a sloplib approach work here?
is no, regardless of the ability to generate a proof. For example, what happens when the AI generates an incomprehensible proof that says that a paper is incorrect? You may believe the generated proof, but will the author? I'd argue that proof irrelevance is a technical concept and not a mathematical one, because in practice we very much do care about the quality of a proof and being able to read it, because that is exactly how we can learn from it and use it.
Gregory Wickham (Feb 06 2026 at 08:41):
And the same problem exists even when the computer verifies that a theorem is correct: proving that a theorem is true doesn't imply that the proof provided in the paper is correct.
David Michael Roberts (Feb 06 2026 at 10:15):
For example, what happens when the AI generates an incomprehensible proof that says that a paper is incorrect? You may believe the generated proof, but will the author?
What happens when the paper is also incomprehensible, and the author insists nobody understands because they are "profoundly ignorant"? :smirk:
Matteo Cipollina (Feb 06 2026 at 11:52):
Just two comments to add on the many:
- One risk is that sloplib would become like the Facebook of Lean, a place where people drop anything and where we all end up wasting a lot of time out of curiosity; time which would be more profitably spent elsewhere. Without an ultimate goal/standard I fear no one will have the necessary friction and incentive to sculpt their formalizations until they fulfill the minimal requirements of meaningfulness outlined by @Jireh Loreaux .
- While obvious, I think the optimal approach to accelerate formalisation (if I understood correctly the main practical problem which motivates this discussion) would be to channel the same funds that are apparently ready for purely agentic systems into many more (synchronized) well-thought-out projects run by motivated humans, which leverage AI but in a principled way; but this doesn't seem aligned with the current marketing of most AI companies, which seem to depict agents as replacements rather than extensions/power-ups.
Marcin Pilipczuk (Feb 06 2026 at 12:30):
Martin Dvořák said:
Jireh Loreaux said:
A sloplib repo would only tick the first box there: checking, and even that, it could be argued, it wouldn't be doing terribly well, as mathematicians still have to vet the definitions.
My two cents:
- Even if it is only the first box, it is a very important box! Just knowing that something is true can be a huge deal.
I second this: the first box is already a huge deal.
I had a number of coffee talks over the last few weeks where I was pitching an optimistic view: "In a decade, at TCS conferences we will mandate a Lean proof alongside the submission. People will generate such a proof automatically from a sufficiently detailed written manuscript in natural language." People were generally super excited about it. This would unclog the current reviewing system in TCS, which is way overloaded. In particular, it will solve the problem of the 50-page hastily written appendix proof that looks ok-ish, but that nobody will verify in the time frame of reviewing for top TCS conferences.
Joseph Tooby-Smith (Feb 06 2026 at 13:06):
Marcin Pilipczuk said:
... "In a decade, at TCS conferences we will mandate a Lean proof alongside the submission. People will generate such a proof automatically from a sufficiently detailed written manuscript in natural language." People were generally super excited about it. This would unclog the current reviewing system in TCS, which is way overloaded...
But are you sure the consequence of this is not going to be that instead of being able to get an expert in area A to review a paper on A, you instead need someone who is an expert on A and in Lean to check that the autoformalization does what it says it does and, as someone above said (sorry, I can't find the message), make sure the proof matches that in the paper? It seems to me this will limit the pool of potential reviewers.
Shreyas Srinivas (Feb 06 2026 at 13:29):
@Marcin Pilipczuk : This is something I have thought about as well. I have reviewed for a lot of distributed algorithms conferences. My feeling is that a large part of the mess comes from us repeating ourselves in these papers and definitions being ever so slightly inconsistent across papers. In our field, we should definitely not create a jungle of formalisations. We need to unify the field under definitions that we all agree on and make it clear where differences lie.
Shreyas Srinivas (Feb 06 2026 at 13:31):
One reason (among others) that some of us started distributed graph algorithms formalisation is that some of us discovered that when people are talking about "the XYZ model" there are actually a dozen different XYZs which are different in relevant ways.
Matthew Ballard (Feb 06 2026 at 13:34):
We are already on the path to GitHub becoming sloplib https://newsletter.semianalysis.com/p/claude-code-is-the-inflection-point and this doesn't count the people who want to hide AI authorship.
Tobias J. Osborne (Feb 06 2026 at 14:22):
disclaimer: I tried to read the whole thread, but I may have missed key relevant posts, I am sorry if I didn't acknowledge you for making similar points!
I have spent the past couple of months investing serious deliberate practice into perfecting a Claude Code/Lean workflow. I think I have it pretty much dialled in. On a good day I get 2K LOC of Lean code (with a single Claude Max subscription). Is it good quality? I confess I don't feel qualified to judge.
I have multiple motivations: So far I have been using lean to formalise LLM-generated physics/math questions because I just don't trust LLMs. For various reasons I have been formalising operator algebras (GNS theorem etc.), Jordan algebras and tensor categories.
I never considered submitting PRs to mathlib. Instead I have been collecting my various experiments in my own monorepo: It is completely clear to me that it would be indecent to inflict my growing codebase on human reviewers. The numbers don't add up.
I guess, if LLM-assisted code gen takes off (and there are reasons to think frontier AI labs might be interested in doing this), we will simply witness a Cambrian explosion of repos like mine.
But that is ok, right? I guess I have been thinking of mathlib as analogous to the Linux kernel: a massive coordinated and carefully curated effort to create a robust infrastructure that lays the foundation for vast numbers of creations built on top. Does that resonate?
[EDIT: BTW you guys are amazing! Mathlib is an extraordinary awesome creation!]
Chris Henson (Feb 06 2026 at 14:22):
Shreyas Srinivas said:
In our field, we should definitely not create a jungle of formalisations. We need to unify the field under definitions that we all agree on and make it clear where differences lie.
This problem is not unique to CS, but I think it is a bit more pronounced because, to an extent, we fundamentally want to compare different formalizations that do not always have an easily recognizable or ergonomic abstraction. Doing formalization abstracted over something as fundamental as binding, for instance (as in Tealeaves), is in the realm of active research.
Matthew Ballard (Feb 06 2026 at 14:31):
LOC are a very bad measure of quality here
Arthur Paulino (Feb 06 2026 at 14:45):
Tobias J. Osborne said:
I guess I have been thinking of mathlib as analogous to the Linux kernel: a massive coordinated and carefully curated effort to create a robust infrastructure that lays the foundation for vast numbers of creations built on top. Does that resonate?
Idk, I think there's a fundamental difference.
An OS kernel tries to solve a known set of well-defined problems. If one's able to use their hardware appropriately, that's pretty much it.
Mathematics, at least to me, is more like an open world with no clearly specified finality. So we'd like it to build up (indefinitely?). But two different definitions of Nat won't ever match in Lean, so the build-up goal can't be achieved if definitions branch off, even if they are isomorphic and functionally equivalent.
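(A minimal Lean sketch of this point; the `MyNat` name is just for illustration:)

```lean
-- Two structurally identical inductive types are still distinct to Lean.
inductive MyNat where
  | zero : MyNat
  | succ : MyNat → MyNat

-- A theorem proved about `MyNat` says nothing about `Nat`: the two
-- types do not unify, even though they are isomorphic, so every result
-- would have to be transported across the isomorphism by hand.
-- example : MyNat = Nat := rfl  -- fails: not definitionally equal
```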
Tobias J. Osborne (Feb 06 2026 at 14:56):
Arthur Paulino said:
Mathematics, at least to me, is more like an open world with no clearly specified finality. So we'd like it to build up (indefinitely?). But two different definitions of Nat won't ever match in Lean, so the build up goal can't be achieved if definitions branch off, even if isomorphically and functionally equivalent.
many thanks for your comment! I am trying to orient myself and my activities in a way that hopefully aligns with the mission and vision of the Lean community. (I certainly don't want to become a bad actor!) If I understand it correctly, the proliferation of individual repos, each building their own little incompatible version of mathlib, would be counterproductive? I do take your point. How would you advise that I, and similar persons (I know of several well-known physicists and mathematicians doing similar things), should proceed? I have been leaving my repos open so far. I am genuinely unsure.
Henrik Böving (Feb 06 2026 at 14:57):
An OS kernel tries to solve a known set of well-defined problems. If one's able to use their hardware appropriately, that's pretty much it.
Saying that "using the hardware appropriately" is well defined when all mainstream kernels have random sleeps in their NIC drivers because nobody actually understands why the hardware does things is a bit of a stretch I think :P
Chris Henson (Feb 06 2026 at 15:02):
As I like to joke, any day where Lean lets me forget that hardware actually physically exists is a success. Godspeed to those doing hardware verification :saluting_face:
Arthur Paulino (Feb 06 2026 at 15:10):
@Henrik Böving I said that the goal is clear, not that the solution is already here :joy_cat:
Also, whether or not drivers are considered as part of the "kernel" depends on the architecture as well as on the definition of "kernel" itself. But that's a whole other topic!
@Tobias J. Osborne no need to interrupt your learning journey! Lean is a free and open source system. You're not obliged, in any sense, to use it in order to build up a common math lib.
It's :100: okay to spin up isolated repos, as long as you're not expecting your work to be integrated into an ecosystem without extra effort (i.e. rewriting theorems in terms of other definitions).
Note: I don't speak for the community! I speak as a simple individual user of Lean
Oliver Nash (Feb 07 2026 at 17:02):
I think that trying to answer the question “would sloplib work” is too easy, and instead we should be trying to answer the question “would sloplib work, and if not, what would?”. It seems clear that AI has much to offer mathematics and formalisation. The AI crowd know this but they don’t really know what to offer us, and it’s up to us to ask. Two obvious tools are
- A fast, powerful hammer tactic
- A reliable autoformaliser
What else could we request? What other experiments could we run? If sloplib is not the right experiment what would be?
Over the last few years, time and time again AI has shown itself more capable than many assumed. Probably already today there are ways it can help us that we don’t know about. If sloplib is not the right experiment to run to try to figure these out, what is?
Filippo A. E. Nuccio (Feb 07 2026 at 20:04):
Kevin Buzzard said:
Basically my impression is that lots of people think that this is a bad idea but I am realising that I don't have a coherent argument as to why. I am well aware that mathlib has an extremely high bar for code review but I am unclear as to how important this fact has been in mathlib's success and, more importantly, how important it will remain going forwards.
My position here is quite similar to @Jireh Loreaux 's but mostly addressing the question "is there a risk, and if yes which one, in sloplib?". I believe that the very high risk is losing the common knowledge of how to build a library of formalised mathematics (so I'm not speaking about @Patrick Massot 's point that by formalising one learns the mathematics better, although I certainly share it: I'm really thinking of learning how to formalise it). This common knowledge is "a thing" that is very hard to value, because it is shared, spread at different levels in different places, and certainly not quantifiable. But I would argue that it is one of the strongest assets of Mathlib, even beyond the 2M LOC or the great theorems we have here; and extremely fragile at the same time, because here we all took the same path of climbing the steep ladder of getting to know Mathlib, and learned a great deal from the endeavour. The advantage of the high bar we have is that we're creating, albeit indirectly and not as our primary focus, a school of how to formalise mathematics, and this would be endangered by sloplib. The big problem is that once the school is gone, and with it the million small seeds of knowledge spread among hundreds of users, it will be very hard to come back, because there won't be a "steep ladder" to climb any more.
Michael Rothgang (Feb 07 2026 at 21:29):
I think a "clean-up AI" would be really valuable. Getting a PR of mine from "code is sorry-free" to "ready for a PR" is a substantial amount of work, which is often a useful exercise (think about good abstractions/intermediate lemmas, etc.) and at the same time more tedium than perhaps necessary. Getting help with that (and be it "move this single lemma to the right place"; "now repeat with that lemma") would be beneficial.
Patrick Massot (Feb 08 2026 at 22:01):
I’m sorry I’m very late to the party; I was traveling, advertising Lean, and then catching up with administration. I think all this discussion about the feasibility of sloplib is completely missing the point. I don’t mind assuming, for the sake of discussion, that one day AI could become capable of writing a better Mathlib than what humans can do. Let us even assume (this is really becoming pure science fiction, for the sake of a philosophical discussion) that it can do that without destroying our planet and funding fascist states. Then we’re talking about a horrifying dystopia, and I think we should stop and think about why we’re doing that. Here I’m not talking about software verification or any other real-world use of Lean. I’m talking about utterly useless fundamental mathematics such as the Langlands program or symplectic topology. Why are we doing that? We’re doing that because it gives immense intellectual pleasure to human beings to slowly understand those areas, communicate their understanding to other human beings, and participate in building something that is so much bigger than any individual mathematician and does not harm anybody. Formalization of those areas of mathematics is simply a new added direction in that story. It adds more ways to enjoy this pleasure. One can argue endlessly about how much new understanding we get. But I think the impact on the feeling of building something together is really undeniable. And I conjecture it’s the main motivation of all the mathematicians here who converted a lot of their research time from traditional math to formalized math. Mathlib is the first project where I feel I have 650 collaborators, beating my previous high score by a factor of 200. And it allowed me to have meaningful mathematical discussions with experts such as Kevin or Johan who are very far away from my corner of maths. Can anyone tell me why I should dream of having AI destroy all that?
Do we want to live our mathematical life in https://assets.newsie.social/media_attachments/files/116/013/035/609/022/920/original/f04277ee5ca87e06.jpg (drawing by Tjeerd Royaards).
Chris Henson (Feb 08 2026 at 22:42):
Patrick Massot said:
Here I’m not talking about software verification or any other real world use of Lean. I’m talking about utterly useless fundamental mathematics...
Even though I work primarily on CSLib, count me entirely in this camp. Earlier today we had some discussion about the AI porting of fundamental CS from other proof assistants. Some might question, for instance, my recent Lean Together talk on formalizing lambda calculi as something that was done 20 years ago in other proof assistants and could be done with AI much faster than I have progressed. But completely in line with what Patrick says, this misses the point. Even if AI could idiomatically match such porting (and I am very doubtful...), it misses that formalization is just as much a social and personal intellectual activity, and understanding is something that fundamentally cannot be automated. When I first ran into Lean 3, I loved it so much that I eventually quit my (much better paying!) finance job to do a PhD, and I have loved the experience that Lean has provided me in interacting with this community. I think that even if AI managed to produce some mangled version of the work I've done in CSLib the last few months, it would not be able to explore this space of what Lean has to offer formal metatheory with the same nuance or grant the understanding that careful study of fundamental material has provided me, and it certainly couldn't automate these years of my life.
Mirek Olšák (Feb 09 2026 at 00:09):
On the contrary to @Patrick Massot , I am often sad to see such anti-AI sentiment. We have had super-human chess engines for over 25 years. Did chess stop being a human intellectual endeavor that players find fulfilling? Did it stop people from studying chess and socializing around it?
One of the ways I futuristically promote Lean is as follows
AI is getting better at understanding programming languages. It is quite plausible that in the future it will be able to reason and code better than people, so that standard math papers and programming languages become obsolete. We will need a rich, precise language so that we can "talk" with it: to understand its results, and to have guarantees about what it is doing. Lean has the potential to become such a language.
If we get to such a point (and yes, it is sci-fi for now; for me, AI solving the IMO was sci-fi 5 years ago), anyone who works on Lean now could in the future feel the satisfaction of having contributed their part to the ultimate AI system that can eventually encompass mathematics, programming, and natural sciences. I understand I cannot force someone to stop worrying about the future, but I think a positive vision should also be heard.
Snir Broshi (Feb 09 2026 at 00:22):
Mirek Olšák said:
Did chess stop being a human intellectual endeavor that players find fulfilling? Did it stop people from studying chess and socializing around it?
For a game, if bots play better than me I can choose not to play against them and have fun playing against humans.
For an endeavor that has clear goals that can be completed, if a bot comes around and completes it better than I can I'm not going to say "okay let's uncheck the checkbox to let me enjoy completing it myself".
(and the fact that the amount of goals is potentially endless doesn't solve the issue)
(btw I don't necessarily identify with Patrick's sentiment, but I disagree with the chess analogy)
Shreyas Srinivas (Feb 09 2026 at 00:51):
Mirek Olšák said:
On the contrary to Patrick Massot , I am often sad to see such anti-AI sentiment. We have had super-human chess engines for over 25 years. Did chess stop being a human intellectual endeavor that players find fulfilling? Did it stop people from studying chess and socializing around it?
One of the ways I futuristically promote Lean is as follows
AI is getting better at understanding programming languages. It is quite plausible that in the future it will be able to reason and code better than people, so that standard math papers and programming languages become obsolete. We will need a rich, precise language so that we can "talk" with it: to understand its results, and to have guarantees about what it is doing. Lean has the potential to become such a language.
If we get to such a point (and yes, it is sci-fi for now; for me, AI solving the IMO was sci-fi 5 years ago), anyone who works on Lean now could in the future feel the satisfaction of having contributed their part to the ultimate AI system that can eventually encompass mathematics, programming, and natural sciences. I understand I cannot force someone to stop worrying about the future, but I think a positive vision should also be heard.
Chess is explicitly funded as a sport, i.e. as a form of entertainment, with the full understanding that it is socially beneficial but there are no immediate returns. Sciences are funded by public institutions with expectations of returns.
Shreyas Srinivas (Feb 09 2026 at 00:51):
So this comparison is pointless
Mirek Olšák (Feb 09 2026 at 00:56):
Shreyas Srinivas said:
Chess is explicitly funded as a sport i.e. as a form of entertainment with the full understanding that it is socially beneficial, but there are no immediate returns. Sciences are funded by public institutions with expectations of returns
I am sorry but this feels like some inconsistency here. On one hand, you are saying that there are expectations of returns, on the other hand, Patrick was talking about "useless math". So choose one.
- (a) You do it because you think it can be eventually good for something. Then we should be glad if AI brings more of this good to the society.
- (b) You do it because you enjoy it, as a form of game, or art. Then the analogy with chess is appropriate.
Shreyas Srinivas (Feb 09 2026 at 00:57):
It is not inconsistent if you understand that different institutions and people have different reasons for valuing an activity, and those who hold the purse strings often have to be pleased and satisfied w.r.t. their reasons.
Shreyas Srinivas (Feb 09 2026 at 01:02):
If you want more success with promoting AI, you need to address the social problems of our current setup of funding intellectual activity first. Secondly, at a technical level, AI needs to automate basic tasks like farming, mining, transportation, laundry, and dishwashing. What's that saying? "I want AI to do my dishes and my laundry...". Making breathtaking claims about how AI is going to solve a fun and intellectual task is not going to win you accolades.
Mirek Olšák (Feb 09 2026 at 01:20):
Shreyas Srinivas said:
If you want more success with promoting AI
I don't need success with promoting AI -- an AI future is now rather an undeniable reality. Just as a single sample point: the vast majority of computer science students here in Czechia have chosen AI as their specialization.
I am rather promoting Lean as something that fits into the AI future :wink:
Chris Henson (Feb 09 2026 at 01:24):
Mirek Olšák said:
- (a) You do it because you think it can be eventually good for something. Then we should be glad if AI brings more of this good to the society.
- (b) You do it because you enjoy it, as a form of game, or art. Then the analogy with chess is appropriate.
I can imagine some potential nuance about these points:
- Mathematics has a complex state of affairs regarding the interplay of theory and application. These are not clearly delineated cases, so you cannot conclude that producing more good for society through applications can be separated from areas where AI involvement may be questioned.
- Humans made the decision to continue playing chess, with chess engines relegated to very specific roles as supplemental tools. In formal mathematics there is similarly a variety of modes of AI use that need to be compared individually. The exact methods, and the changing nature of AI, are important considerations in forming an analogy here.
Yan Yablonovskiy 🇺🇦 (Feb 09 2026 at 05:06):
Mirek Olšák said:
Shreyas Srinivas said:
Chess is explicitly funded as a sport i.e. as a form of entertainment with the full understanding that it is socially beneficial, but there are no immediate returns. Sciences are funded by public institutions with expectations of returns
I am sorry but this feels like some inconsistency here. On one hand, you are saying that there are expectations of returns, on the other hand, Patrick was talking about "useless math". So choose one.
- (a) You do it because you think it can be eventually good for something. Then we should be glad if AI brings more of this good to the society.
- (b) You do it because you enjoy it, as a form of game, or art. Then the analogy with chess is appropriate.
There are more gamified elements of math: math competitions are plentiful, and you can talk more about an analogy to chess there. For math research, I think this analogy breaks down even if a similarity is "people do it for enjoyment".
"Useless" math is a misnomer; it's just a way to say it's hard to find a direct application that leads to returns in the current framework. It is very hard to delineate, as theoretical math has been a source of many discoveries seemingly useless at the time that found rich applications later. In fact, AI likely wouldn't exist in its current state without such endeavours a long time ago. So it's more like "capitalistic/societal applications pending" math (assuming we are talking about math of some depth and novelty).
I think it may be off topic, but I'd be curious what the students who pick AI as their specialisation would actually end up doing day to day at work after graduating, under the premise that AI is so good it can do Mathlib and math/coding better than humans?
Ralf Stephan (Feb 09 2026 at 08:23):
Floris van Doorn said:
I also want to highlight the most genius rule from the merely-true repository: "Each time Mathlib moves to a new Lean toolchain, we will [...] delete all files which no longer compile!"
I think this is actually very interesting:
- Most AI-generated files using existing capabilities will be removed: they are not at all robust, and will fail compilation on every Mathlib bump
You have missed the possibility of cryptographically certified proofs tied to specific compiler+mathlib versions. Or would you argue that a proof loses its validity just because the mathlib version changed?
First attempts to implement it:
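As a toy sketch only (illustrative; the toolchain strings are made up and this is not any actual implementation), pinning a checked proof artifact to a specific toolchain and Mathlib revision could be as simple as hashing all three together, so that anyone re-verifying with the same versions reproduces the same digest:

```python
import hashlib

def proof_certificate(artifact: bytes, toolchain: str, mathlib_rev: str) -> str:
    """Bind a compiled proof artifact to the exact toolchain and Mathlib
    revision it was checked against. Each component is length-prefixed
    before hashing so the encoding is unambiguous."""
    h = hashlib.sha256()
    for part in (toolchain.encode(), mathlib_rev.encode(), artifact):
        h.update(len(part).to_bytes(8, "big"))  # length prefix
        h.update(part)
    return h.hexdigest()

same = proof_certificate(b"(olean bytes)", "leanprover/lean4:v4.15.0", "deadbeef")
other = proof_certificate(b"(olean bytes)", "leanprover/lean4:v4.16.0", "deadbeef")
print(same != other)  # → True: a different toolchain yields a different certificate
```

The point being: such a certificate attests validity relative to the pinned versions, so the proof does not "lose validity" when Mathlib moves on; it simply remains certified against the old snapshot.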
Artie Khovanov (Feb 10 2026 at 14:51):
(message redacted by author)
Bryan Wang (Feb 10 2026 at 15:39):
I (and perhaps others) have expressed elsewhere on this zulip that as soon as one is able to fully automate pure mathematics, by which I mean maths that has no immediate applications in the foreseeable future, a lot of that 'cultural prestige' will be lost instantly (edit: referring to the previous message which is now removed). So while I agree that such endeavours may be scientifically interesting in their own right, I personally struggle to see any long-term sustainability down this path of automating pure mathematics, and one has to be clear about what our long-term goals are (compared to, say, the long-term goals of AI companies). I would very, very much love to be proven wrong however.
Maybe the next step is to create AI that can find real-world applications of previously useless mathematics, but I think this is even more far-fetched than what anyone is discussing right now.
Artie Khovanov (Feb 10 2026 at 20:38):
(message redacted by author)
Siddhartha Gadgil (Feb 12 2026 at 12:27):
Bryan Wang said:
I (and perhaps others) have expressed elsewhere on this zulip that as soon as one is able to fully automate pure mathematics, by which I mean maths that has no immediate applications in the foreseeable future, a lot of that 'cultural prestige' will be lost instantly (edit: referring to the previous message which is now removed). So while I agree that such endeavours may be scientifically interesting in their own right, I personally struggle to see any long-term sustainability down this path of automating pure mathematics, and one has to be clear about what our long-term goals are (compared to, say, the long-term goals of AI companies). I would very, very much love to be proven wrong however.
Maybe the next step is to create AI that can find real-world applications of previously useless mathematics, but I think this is even more far-fetched than what anyone is discussing right now.
Since this was raised here let me make a bold conjecture about what can happen: a lot of scientific/engineering problems which are mathematical in nature are instead approached using numerics, simulations, heuristics etc because the mathematics is too hard. If AI+Lean increases the power of mathematics then theorems can replace simulations in many domains, and theorems have many advantages.
Shreyas Srinivas (Feb 12 2026 at 16:51):
@Siddhartha Gadgil you know PDEs far better than me, but I think you would agree that most nonlinear PDEs that engineers use don't have closed-form solutions, so they turn to linearisation or simulations. I'll give you an example. There is this numerical tool called SPICE. It takes in transistor models and simulates entire circuits in analog. Each transistor model has a three-digit number of parameters. A circuit may have hundreds or even thousands of such transistors working in concert. If we could "prove" or "deduce" even a power-series solution (let alone a closed form) for the behaviour of such circuits, would it yield any meaningful insights that a simulation couldn't?
Shreyas Srinivas (Feb 12 2026 at 16:52):
At that scale would such solutions even be computationally better than numerical solutions?
Kim Morrison (Feb 12 2026 at 23:31):
@Shreyas Srinivas, I think that is straw-manning Siddhartha's point.
Siddhartha Gadgil (Feb 13 2026 at 01:29):
To clarify, it is not the cases where numerics work perfectly well that I am talking about, but the cases (above the critical exponent) where they don't, and for mathematically well-understood reasons. These include fluid dynamics to predict turbulence and weather, and the equations for plasmas to control fusion. People in the field also know these issues and do manage to work around them in many intelligent ways. But they are seeking more robust solutions (many fluid mechanics people are talking of using AI to relate scales, to cope with the effect of small scales on larger ones).
What I am talking about is not a "power series solution" or something like that but theory. For instance, an effective finite-time existence and regularity result for solutions; results about the nature of singularities and entropies/energies that determine these; effective controllability results (these say that boundary conditions can be varied to determine trajectories) etc. Such a theory, if possible, will certainly be of value.
The broader point is that there is a concern (and there should be) that problems that are important and difficult but possible today will become easy, and so lose their role in terms of purpose/prestige. But one should expect that the same technology will mean that things that are important but beyond our abilities today become feasible. So the scope of mathematics need not shrink but can shift (and may even expand). To me this is the great promise of Lean+AI.
Shreyas Srinivas (Feb 13 2026 at 01:33):
Okay I see what you mean. But this only works in a world where those with resources believe mathematics has more to offer and is worth funding. We appear to be in a world where those with resources believe AI will “solve mathematics” and human mathematicians will be obsolete. That’s the kind of scenario depicted in Patrick’s comic. This is not a hypothetical. We are seeing funding agencies already push limited grant money towards AI and away from other areas of study.
Siddhartha Gadgil (Feb 13 2026 at 01:58):
This is me being a mathematician, proving an existence theorem and declaring my job done :smile:
I agree there are economic, social, political factors here which I am not addressing - not because they are not important, but because they are so big that they would drown out the more limited scientific point I am making. But to partially address them: one only needs to convince those controlling some of the resources of the continued utility of mathematics (and the world has things like the Simons Foundation and Renaissance Philanthropy, for example). And to convince even some of them of the continued utility of mathematics as AI doing maths becomes powerful, one should have some vision of how mathematics is going to continue to be useful. I laid out one such vision, and perhaps others will have a much better one. But betting on AI not crossing some thresholds is risky.
But what I said is to me mainly simply a personal guide.
Bryan Wang (Feb 13 2026 at 02:05):
I completely agree but I also think a lot of these discussions will look wildly different depending on subfield!
Soltani Mehrez (Feb 13 2026 at 02:57):
I'm not a mathlib maintainer, just someone learning Lean by formalizing basic complex number identities.
I see a gap between:
- writing simp and hoping it works, and
- writing a clear calc proof where every step is justified.
When I do the second, I understand the theorem. When I do the first, I just pass the tactic.
If mathlib becomes a pile of AI-generated proofs that no human really understands, I'm not sure even an AI could maintain it long term. Because to fix broken code, you need to understand what it was supposed to do — and that's exactly what these bloated, non-idiomatic proofs hide.
So yes, AI can write fast. But if it writes code that even it doesn't comprehend, we're just accelerating toward unmaintainability.
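To make the contrast concrete, here is a minimal sketch (a hypothetical identity over ℂ, assuming Mathlib's standard lemmas; not from the original message):

```lean
import Mathlib

-- One-shot: the tactic closes the goal, but the steps are invisible.
example (z : ℂ) : z * 1 + 0 = z := by simp

-- Step-by-step: each equality is stated and justified explicitly.
example (z : ℂ) : z * 1 + 0 = z :=
  calc z * 1 + 0 = z * 1 := by rw [add_zero]
    _ = z := by rw [mul_one]
```

The second proof records exactly which lemmas carry the argument, which is the understanding the one-liner hides.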
Terence Tao (Feb 13 2026 at 05:54):
I just wanted to point out that the Equational Theories Project is, in some sense, an existing example of a "sloplib". We proved 22 million implications or anti-implications between over 4000 equational laws in Lean (well, technically, we only formalized about 600,000 of these in Lean, the rest can be deduced by transitive closure). Almost all of these implications are "AI slop" in that they are proven by (good old-fashioned) AI such as automated theorem provers, or by using computer-generated counterexamples found mainly by brute force. Only a very small number of the proved implications yield any human insight in their proofs. It would have been totally infeasible to complete this project without such heavy reliance on some sort of AI.
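The "deduced by transitive closure" step works because implication chains compose: if A implies B and B implies C are formalized, A implies C needs no separate proof. A toy sketch (hypothetical law names, not the actual ETP data):

```python
def transitive_closure(implications):
    """Given a set of (a, b) pairs meaning 'law a implies law b',
    return all pairs derivable by chaining implications."""
    closure = set(implications)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

# Three explicitly formalized implications yield six after closure.
proved = {("E1", "E2"), ("E2", "E3"), ("E3", "E4")}
print(len(transitive_closure(proved)))  # → 6
```

At ETP scale this is how roughly 600,000 formalized implications can cover 22 million.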
My view is that this type of "slop-powered" project can enable a complementary style of mathematics to traditional styles, analogously to how experiment can complement theory in the physical sciences. Much as we would not want empirical experiment to substitute for theory, it would be ill-advised to use mass generation of uninterpretable AI proofs for any traditional math research project, but I think there is a role for such tools in exploratory, experimental mathematics of a type which was almost non-existent before the AI era (where I also include good old-fashioned AI as a type of AI, in addition to modern LLM-powered tools).
Shreyas Srinivas (Feb 13 2026 at 05:58):
I think the discussion has meandered in two directions
- Should we use AI for formalisations?
- Should we collect them in a Mathlib-style library? Would it be feasible, useful, and not detrimental to the purposes as well as the uses of formalisation?
J. J. Issai (project started by GRP) (Feb 13 2026 at 06:48):
Terence Tao said:
I just wanted to point out that the Equational Theories Project is, in some sense, an existing example of a "sloplib". We proved 22 million implications or anti-implications between over 4000 equational laws in Lean (well, technically, we only formalized about 600,000 of these in Lean, the rest can be deduced by transitive closure). Almost all of these implications are "AI slop" in that they are proven by (good old-fashioned) AI such as automated theorem provers, or by using computer-generated counterexamples found mainly by brute force. Only a very small number of the proved implications yield any human insight in their proofs. It would have been totally infeasible to complete this project without such heavy reliance on some sort of AI.
I think more careful language could be used here. While there is a certain degree of expertise present in automated theorem proving or in clever uses of (I am guessing) pruned brute force, I would refrain from calling these things Artificial Intelligence. I think the more general description of 'machine-assisted' is better, and that one should keep track of the nuances involved in the kind of assistance rendered by the machine, be it faster number-crunching, text-processing, random searching, or guided sentence generation (read as suggestion for proof completion).
Terry is entitled to categorize his (and his collaborators') own work as he wishes. However, I would call the ETP output a database of curated equational proofs, much of which was programmatically generated once enough theory was developed and expertise gathered to decide which programs to use to generate parts of the database. In particular, some thought and care were put into the shape and form and size of the database entries. Kevin Buzzard's thought experiment is a programmatically generated collection of (recipes for) proof terms and expressions of types that might be interpreted as theorems in some form of logic, which is not curated and not guided by much (if any) protocol or design beyond the essential framework dictated by the Lean system.
More importantly, one can look at the ETP and the accompanying documentation and come to some understanding and mastery of the material, and replicate the results and explain them to someone else.
This is quite unlike a sloplib, in which one would have a collection of statements that are asserted to be true, and one is given a not very straightforward way of running a verification machine. Further, a sloplib entry may well be incomprehensible to every human, including the designers of the sloplib system, and the entry could only be digitally copied (not replicated or its proof confirmed by a different process), and definitely not understood nor be able to be explained to someone else.
I see the ETP database and documentation as being useful for future research. I see sloplib as useful for suggesting theorems that might be true and possibly provable by humans, but (to exaggerate Kevin's thought experiment some) in an inscrutable and humanly unverifiable form. Perhaps Kevin can again explain why this idea of sloplib would be useful; I am not seeing it.
Siddhartha Gadgil (Feb 13 2026 at 07:28):
My view on this:
- The name merely true perfectly captures the nature of the library.
- It is up to each person whether or not knowing that something is merely true is of value. I feel it violates the spirit of Lean (which I expand on below) if people try to say others should find no value in things being merely true, or to actively discourage the building of such libraries.
- Mathlib has always had high standards for its code. As of now, AI completely fails these. So obviously it cannot be accepted. Whether or not higher standards should be applied to AI can be debated, but applying lower standards is daft.
- What I mean by the spirit of Lean: when explaining Lean to programmers, I emphasize that the spirit of Lean is to enable but not compel. We can prove correctness or simply not prove it. We can prove termination or simply bypass this using partial. This is in contrast to some other functional programming communities which, while actually enabling less safety than Lean, make it clear that something is really wrong with you if you do terrible things like running a monadic value. Indeed, calling things run and not runUnsafe is part of the difference.
Terence Tao (Feb 13 2026 at 07:49):
J. J. Issai (project started by GRP) said:
I think more careful language could be used here. While there is a certain degree of expertise present in automated theorem proving or in clever uses of (I am guessing) pruned brute force, I would refrain from calling these things Artificial Intelligence. I think the more general description of 'machine-assisted' is better, and that one should keep track of the nuances involved in the kind of assistance rendered by the machine, be it faster number-crunching, text-processing, random searching, or guided sentence generation (read as suggestion for proof completion).
Fair point - for ETP, while each individual machine-generated proof was only barely human-readable and does not provide much human insight, there was also a record of a human directing that generation through writing automated scripts, discussing in the Zulip, etc.. The analogue for the sloplib would be if the AI-generated proofs were not generated by completely autonomous AI agents, but through AI prompted by humans, with the prompts themselves (as well as AI chain of thought, etc.) being made openly available as part of the sloplib documentation, and perhaps discussed by human participants in the project in a suitable project forum. This may be an intermediate point on the tradeoff between pure scale and interpretability that may offer more value than a project optimized only for scale.
Alex Meiburg (Feb 15 2026 at 21:53):
Since the original question was "what would go wrong with sloplib", here's my view, that I don't think has been articulated exactly here yet:
The outcome would mostly be very boring, largely because of bad definitions.
In the context of mathlib development, a "bad" definition typically means one that unnecessarily limits the scope of a theory, or is very awkward to work with. That's because we're implicitly ruling out the case of a definition that actually is just stupid or doesn't correspond to any informal math at all. But with the standard of "anything an AI submits that compiles", there will very soon be nonsense in there like
def greatestTwinPrime (n : ℕ) : Prop := ∃ k > n, k > n + 2 ∧ k.Prime
or whatever. And then the next AI will come along and "build" on this and make further nonsense. And in general I think this will quickly lead to 'context poisoning' where AIs that see a bunch of crappy low-effort definitions in context are more likely to do so themselves. In the end, it's possible that an interesting theorem gets proved, but I think significant fractions of math will get 'doomed' exponentially quickly as these bad definitions compound.
I think a sloplib experiment could be interesting, in the setting of "no code standards for API / useful lemmas / picking the best definitions", but I think that human auditing to check that the definitions are at least meaningful would be a necessary precondition to the experiment going anywhere fun.
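For contrast with the greatestTwinPrime example above, a definition whose content actually matches its informal name might look like this (a hypothetical sketch, assuming Mathlib's Nat.Prime):

```lean
import Mathlib

-- A meaningful definition: p and p + 2 are both prime.
def IsTwinPrime (p : ℕ) : Prop := p.Prime ∧ (p + 2).Prime
```

The slop version compiles just as well, which is exactly why compilation alone is too weak a bar for definitions.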
Last updated: Feb 28 2026 at 14:05 UTC