Zulip Chat Archive
Stream: new members
Topic: How to convince of mathlibs utility?
Yan Yablonovskiy 🇺🇦 (Nov 06 2025 at 06:58):
I presented to some professors and students in my math school today, who have no exposure to any topics about theorem proving or lean.
It was about Lean 4, ITP and AI, mentioning aristotle and the Gauss autoformaliser with PNT.
I encountered one obstacle I really didn't know how to push back against: I was asked who is in charge of mathlib / what the hierarchy is. Others joined the discussion saying it's an open source project, and I showed how PRs work and are reviewed (to the extent of my knowledge).
But that wasn't really enough. I encountered "I might not care about some open source software having a bug, but if it's meant to be for verifying validity of math..." and "How is consensus actually reached?".
There was also a lot of scepticism about even the feasibility of such a system ever containing enough math to be useful in research; that professors don't really have the time to learn such a complicated language, much less want to spend time formalising well-established maths; and whether it will ever be usable enough for day-to-day research.
Any ideas?
Alfredo Moreira-Rosa (Nov 06 2025 at 09:52):
You'll learn that trying to convince people is useless.
Use a different approach. Inform people, do presentations of your work, explain what you do with Lean and what you gained (like a lot), and let people who are interested join you on your journey.
If you are perseverant, and people see it's not just a fad, you'll gain interest.
The sceptics may become less sceptical over time, but you need to let them mature their decisions.
This is my experience with decades of technology adoption behind me.
Michael Rothgang (Nov 06 2025 at 11:37):
Governance of open source projects is an interesting topic, and answers are usually complex. Some projects adopt a "BDFL" (i.e., one person with final decision authority on all questions). Other projects try for a more distributed model, with e.g. a core team making final decisions --- and often creating sub-teams for particular areas. Such projects include the Rust compiler and mathlib.
In the case of mathlib, the maintainers kind of have the final say --- but there are subteams on e.g. moderation issues, CI details, and the reviewers group also helps to spread the work. The maintainers are a large group (around 30 people), and deciding in such groups can be interesting. We're aiming for consensus - i.e., discuss a question, try to understand each other's point of view and try to find a solution that takes all points and concerns into account.
Michael Rothgang (Nov 06 2025 at 11:38):
In short:
- there is no one person making all the decisions: while such a model is tempting, I personally believe more distributed models can generate better outcomes (and allow any individual to move on from the project)
- the maintainers team comes close to the "one group making final decisions"
- we aim for decisions by consensus, if possible
Michael Rothgang (Nov 06 2025 at 11:39):
Does that answer your question about governance? If not, can you clarify what you would really like to know?
Michael Rothgang (Nov 06 2025 at 11:42):
Regarding feasibility: I consider the long list of advanced formalisation projects proof enough that almost any piece of mathematics can be formalised, given enough time and interest. The challenge is not if it is possible, but how ergonomic it is.
Personally, I'm sometimes impressed by Lean, and in other places there are still ways to go. Good documentation, finding the right abstraction, more powerful tactics, etc. will all help. This takes work, but it's getting better: we can make it better together!
Yan Yablonovskiy 🇺🇦 (Nov 06 2025 at 11:54):
Michael Rothgang said:
Does that answer your question about governance? If not, can you clarify what you would really like to know?
Thanks that was helpful, I personally never even thought about that so I just wasn't sure how to answer it. I mostly just said that it should be the same as always: definitions taken and cited from a reputable source and the disagreements would be mostly about implementation.
But I guess their underlying question was if there is a disagreement, how is a final decision settled on, and if there was some explicit hierarchy, so now I understand it a lot more.
Floris van Doorn (Nov 06 2025 at 12:36):
It is a completely valid criticism that many professors do not have enough time to learn Lean (or it's reasonable to say that it's not a useful time investment). I try to encourage professors to work together with a Master/PhD student: tell them to learn Lean, and then to formalize a result in the area of the professor (and let the student explain Lean to the professor).
Can Mathlib contain enough math to be useful in research? It currently already does in various areas. Already a few research papers from this decade have been formalized. This is mostly in combinatorics (search for the cap-set problem, polynomial Freiman-Rusza conjecture, exponential improvement in upper bound for Ramsey number, and more), but the Carleson project was a formalization from harmonic analysis (and this announcement from today is relevant: #announce > ERC grant: Harmonic Analysis and Lean Formalization). In other areas (e.g. number theory, differential geometry), the answer is that currently we are far behind the cutting edge research. In that case, you can ask them to come back in 1-5 years (unless they want to help reach the frontier more quickly, of course!).
Eric Wieser (Nov 06 2025 at 12:37):
Yan Yablonovskiy 🇺🇦 said:
I might not care about some open source software having a bug, but if it's meant to be for verifying validity of math...
I think this sort of misunderstands what a "bug" in mathlib looks like.
In order of likelihood, I would say the bugs are:
- One of the "tactics" fails where it is supposed to succeed. This is annoying, but has no effect on correctness, and forces the user to do things more manually.
- One of the tactics produces a proof that is invalid. This is caught by the Lean kernel, so is not a soundness risk.
- One of the definitions in mathlib is subtly the wrong thing; for instance, perhaps `Foo R` is the right thing when `R` is commutative, but is somehow the wrong generalization when `R` is not. This would usually become obvious while trying to write the proof for the general case.
- Someone inserts a malicious instance to make `2 + 2 = 5`. This would be very hard to introduce accidentally, and should be caught in review.
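To make the last failure mode concrete, here is a deliberately contrived Lean 4 sketch (the type `Weird` and its instances are invented for illustration, not real mathlib code). The kernel is perfectly sound throughout; the problem is that `+` on `Weird` does not mean what a reader would assume:

```lean
-- A wrapper around Nat with a deliberately "malicious" Add instance.
structure Weird where
  val : Nat

-- Numeric literals embed as expected...
instance (n : Nat) : OfNat Weird n := ⟨⟨n⟩⟩

-- ...but addition sneaks in an extra `+ 1`.
instance : Add Weird := ⟨fun a b => ⟨a.val + b.val + 1⟩⟩

-- The kernel happily accepts this: the *proof* is fine,
-- the *definition* is what's wrong.
example : (2 + 2 : Weird) = 5 := rfl
```

This is exactly the kind of thing code review is meant to catch: the proof checks, but the definitions behind the statement do not mean what they appear to mean.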
Eric Wieser (Nov 06 2025 at 12:38):
Note that "mathlib thinks X is proved but the proof is invalid" is not a plausible bug; at worst it's "mathlib definitely proves X' but I, the human, misread it as X".
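A minimal Lean 4 sketch of that "misreading" failure mode (the statement is invented for illustration): the example below is accepted by Lean, but only because its hypothesis can never hold, so the claim is vacuously true. A reader skimming the conclusion might think something substantive was proved about 37:

```lean
-- Accepted by Lean, but vacuous: no natural number n satisfies
-- n + 1 = 0, so the implication holds for trivial reasons. Here "X'"
-- (this vacuous statement) is easily misread as "X" (a real claim).
example (n : Nat) (h : n + 1 = 0) : n = 37 := by
  omega
```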
Yan Yablonovskiy 🇺🇦 (Nov 06 2025 at 12:52):
Eric Wieser said:
Note that "mathlib thinks X is proved but the proof is invalid" is not a plausible bug; at worst it's "mathlib definitely proves X' but I, the human, misread it as X".
Yeah, I basically said you can't prove something that's wrong, but you might prove something that isn't as useful as you thought.
Floris van Doorn said:
Can Mathlib contain enough math to be useful in research? It currently already does in various areas. Already a few research papers from this decade have been formalized.
Fair enough, I think a large concern or point of focus was if its automation/interactivity has the utility to speed up research output in their day to day yet. I guess it would be more motivating to learn it if it does.
Floris van Doorn (Nov 06 2025 at 13:05):
Yan Yablonovskiy 🇺🇦 said:
Fair enough, I think a large concern or point of focus was if its automation/interactivity has the utility to speed up research output in their day to day yet. I guess it would be more motivating to learn it if it does.
I think the initial goal of Lean is not to speed up the research output of professors (but one day it might - especially if combined with AI tools).
Another promising application is to verify a difficult proof that is hard to understand, especially if the author has lingering doubts about the correctness, which was the case in the Liquid Tensor Experiment formalization of a key result in condensed mathematics.
Patrick Massot (Nov 06 2025 at 14:01):
Yan, are you aware of https://leanprover-community.github.io/teams.html? It tries to answer some of those governance questions.
Patrick Massot (Nov 06 2025 at 14:02):
But note it's not the full picture because there are strong interactions with https://lean-fro.org/ and now also with https://mathlib-initiative.org/.
Patrick Massot (Nov 06 2025 at 14:03):
Those two organizations have no decision power over Mathlib, but of course we need very close cooperation.
Patrick Massot (Nov 06 2025 at 14:03):
About the motivations, my personal answer is still pretty much https://www.imo.universite-paris-saclay.fr/~patrick.massot/files/exposition/why_formalize.pdf
Ryan Smith (Nov 06 2025 at 19:20):
Floris van Doorn said:
Yan Yablonovskiy 🇺🇦 said:
Fair enough, I think a large concern or point of focus was if its automation/interactivity has the utility to speed up research output in their day to day yet. I guess it would be more motivating to learn it if it does.
I think the initial goal of Lean is not to speed up the research output of professors (but one day it might - especially if combined with AI tools).
Another promising application is to verify a difficult proof that is hard to understand, especially if the author has lingering doubts about the correctness, which was the case in the Liquid Tensor Experiment formalization of a key result in condensed mathematics.
Someday I'd like to see the classification of finite simple groups in Lean; it seems like a quintessential example of a proof which is too large for any one person to check. And of course, there have been gaps found in the past. The scope of such an effort would put FLT to shame though, so maybe it's just wishful thinking.
Ruben Van de Velde (Nov 06 2025 at 19:40):
Perhaps a more achievable goal would be to add the statement to mathlib :)
Riccardo Brasca (Nov 06 2025 at 20:08):
Ruben Van de Velde said:
Perhaps a more achievable goal would be to add the statement to mathlib :)
This is not at all science fiction. Even the fact that the groups in the list are simple is something we can do in the future. The work of @Antoine Chambert-Loir is in this direction.
Michael Rothgang (Nov 06 2025 at 22:40):
(A statement of the CFSG has been written in Lean 3, for the formal abstracts project. Making it self-contained, i.e. proving the necessary theorems to define all the sporadic groups, would be additional effort - but that should be doable and would definitely be worthwhile.)
Antoine Chambert-Loir (Nov 06 2025 at 23:18):
(My goal is to define the groups that appear in the classification and to prove that they are indeed simple. I'll start with SL, but I wish to do it more geometrically than what mathlib allows for at the moment, hence the recent bunch of PRs to define the special linear group of a module, and transvections. I don't want to just prove what I need; I also want to have the definitions and intermediate results done properly. The second step will be the symplectic groups. The case of orthogonal groups is trickier.)
Yan Yablonovskiy 🇺🇦 (Nov 07 2025 at 01:48):
Floris van Doorn said:
Another promising application is to verify a difficult proof that is hard to understand, especially if the author has lingering doubts about the correctness, which was the case in the Liquid Tensor Experiment formalization of a key result in condensed mathematics.
I think being sure of a result can speed up research output in some sense, and the ecosystem already helps collaboration. It just seems like such a high barrier to entry for people before that potential net speed-up, though retrospectively I should have emphasised that more.
Patrick Massot said:
Yan, are you aware of https://leanprover-community.github.io/teams.html? It tries to answer some of those governance questions.
No, I wasn't aware of some of those links, and that exposition article is also very helpful, thanks a lot!
And thank you to everyone else for your responses :pray:
Matthew Ballard (Nov 07 2025 at 13:48):
Here is a story that indicates why formal math and mathlib is even more important these days.
In May I was at AIM and working with collaborators on describing the Serre functor for some particular algebro-geometric category (details not important yet). Through a relationship called mirror symmetry, we could transfer the question to symplectic geometry. One of the collaborators mentioned that we could adapt an argument in his paper slightly to describe the Serre functor/bimodule.
This happens all the time in math. I have an argument where I realize that pieces can be slightly tweaked to produce a different, new statement. I see this argument as a trivial modification of the hard work I already did and at best leave the details to the reader.
I thought it would be a good chance to test GPT 5 Pro on its assistance capabilities. Generating details for something like this would be pretty nice. Here is the result.
Is this correct or not? If not, what is the error? I'll come back in a couple hours with the answer.
LLMs are very good at approximating truth and less good at exact truth. At the same time, we are getting more and more Marty McFly moments. So there is honest insight.
When the frontier models can generate highly approximated research level mathematics always and deep insight sometimes, are you going to spend all your time wading through the outputs to figure out which is which? Is that what you want to become a serious part of your job as a researcher?
Much better to leverage a framework where you are assured of correctness and can focus on insight.
Arthur Paulino (Nov 07 2025 at 14:49):
I'm not a mathematician and my experience may be useless for those who choose to study mathematics as a focused field in itself. But a few years ago I was pretty shocked to realize that I couldn't even convince myself that addition is commutative. Thanks to formalization (in Lean) I was able to mitigate that source of ignorance in my own reasoning :sweat_smile:
I guess this rewarding experience varies in depth, depending on one's level of expertise. For me it was commutativity of addition. For others, it might be something way more relevant.
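For readers curious what that looks like, here is a self-contained Lean 4 sketch of the exercise (the type `N` and the lemma names are invented for illustration; mathlib's `Nat.add_comm` already covers the real natural numbers):

```lean
-- A from-scratch natural-number type, to show what "convincing
-- yourself that addition is commutative" looks like in Lean 4.
namespace Demo

inductive N where
  | zero
  | succ (n : N)

def add : N → N → N
  | m, .zero   => m
  | m, .succ n => .succ (add m n)

-- Two helper lemmas, each by structural induction:
theorem zero_add (m : N) : add .zero m = m := by
  induction m with
  | zero => rfl
  | succ m ih => exact congrArg N.succ ih

theorem succ_add (m n : N) : add (.succ m) n = .succ (add m n) := by
  induction n with
  | zero => rfl
  | succ n ih => exact congrArg N.succ ih

-- The main result: addition is commutative.
theorem add_comm (m n : N) : add m n = add n m := by
  induction n with
  | zero => exact (zero_add m).symm
  | succ n ih => rw [succ_add]; exact congrArg N.succ ih

end Demo
```

The point of the exercise is that nothing is taken on faith: every step is either a definitional unfolding or an appeal to a lemma proved just above.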
Matthew Ballard (Nov 07 2025 at 17:41):
Is this correct or not? If not, what is the error? I'll come back in a couple hours with the answer.
In one interpretation, everything it said was true. The references existed and said what the model claimed they do.
However, like with problematic human generated proofs, the issue was in what was not said. The Kuo-Li results are not sufficiently general (cotangent bundles only).
So it got reasonably close but didn't do the job.
It took 10-15 minutes to generate the response but that time will be down to 10-15 seconds pretty soon. We will be bombarded with convincing but wrong mathematical arguments. I can think of few stronger arguments for investing in trust.
Yury G. Kudryashov (Nov 08 2025 at 04:22):
Matthew Ballard said:
We will be bombarded with convincing but wrong mathematical arguments. I can think of few stronger arguments for investing in trust.
I hope Aristotle will bombard us with verified mathematical proofs. At least, I'm working on it.
metakuntyyy (Nov 08 2025 at 14:58):
Ruben Van de Velde said:
Perhaps a more achievable goal would be to add the statement to mathlib :)
Just defining the monster group, and proving that it's a group of the given size, should be fun :grinning:
Oliver Nash (Nov 08 2025 at 16:40):
I suppose for the right audience, one could try to draw an analogy between error-correcting codes and formalisation where one regards an AI as a noisy communication channel. Without error-correction, modern electronic communication could not work.
Timothy Chow (Nov 09 2025 at 22:47):
Ryan Smith said:
Someday I'd like to see the classification of finite simple groups in Lean
First, I'd like to see the classification of finite simple groups on paper. The GLS project is in the home stretch but there are still several volumes to go. I asked Ron Solomon earlier this year and he was hesitant to give a prediction, since previous predictions have been wrong, but he's hoping it will be done by the end of 2026.
If people are serious about formalizing CFSG, then probably the place to start is to formalize the results that the GLS project calls "background results." There are a lot more of these than most people realize, even if we treat Aschbacher-Smith as part of the CFSG rather than a "background result."
Timothy Chow (Nov 09 2025 at 23:13):
Eric Wieser said:
- One of the definitions in mathlib is subtly the wrong thing
This may not be a big deal now, but I feel sure that it will become a significant issue once mathlib scales up by two or three orders of magnitude. Making sure that a definition means what humans think it means is the one thing that cannot be fully automated.
Over in the Flyspecking Flyspeck discussion, I've been learning that upgrades to mathlib routinely break working code. Of course this doesn't mean that anything is "incorrect" in a strong sense, but I regard it as an early warning sign of how misunderstandings of what has actually been proved can creep into the corpus.
Also, missing from your list is that there could be a bug in one of the libraries (e.g., GMP) that Lean trusts. It's still a low-probability event that such a bug could cause a wrong theorem to be proved, but over time, the temptation will grow stronger to add more and more stuff to the trusted code base. For example, highly computational proofs will probably not be fully checkable in the same way that run-of-the-mill proofs can. See Vasilii Nesterov's comment. I seem to recall seeing a T-shirt that said that everything you had to trust about HOL Light was printed on that T-shirt. That may have been an exaggeration, but as I understand it, nothing remotely resembling such a claim can be made for Lean, and it's only going to get worse in the future.
Eric Wieser (Nov 10 2025 at 00:13):
Timothy Chow said:
Over in the Flyspecking Flyspeck discussion, I've been learning that upgrades to mathlib routinely break working code. Of course this doesn't mean that anything is "incorrect" in a strong sense, but I regard it as an early warning sign of how misunderstandings of what has actually been proved can creep into the corpus.
This does highlight a risk in overestimating the assurance provided by a formal proof. If your final result is formal, there is no possibility of incorrectly combining two previously formalized proofs with incompatible definitions (e.g. from different mathlib versions), but this can indeed happen if you attempt to combine those formal proofs in an informal setting.
Of course, this failure mode is not unique to formal systems.
Last updated: Dec 20 2025 at 21:32 UTC