Zulip Chat Archive
Stream: mathlib4
Topic: Mission of mathlib
Martin Dvořák (Feb 02 2024 at 11:04):
[Warning]
This post is very philosophical, unnecessarily long, sloppily written, and possibly highly controversial. If you are busy or you care about using your time effectively, you better keep scrolling instead of reading the wall of text below...
I am curious about the overall mission of Mathlib. Is the main goal of Mathlib to (1) provide tools for verifying that the MathematiCS people have done on paper is correct? Or is the main goal of Mathlib to (2) develop a new (and hopefully better) version of MathematiCS?
Why am I asking? If (1) is the main goal, we are maybe doing a disservice to the userbase by always aiming for the maximum generality. It is definitely amazing to have a very abstract mathematical library in Lean. I have no doubts about that. However, it becomes problematic when you realize that Mathlib is de facto the only mathematical library in Lean. The bar for "I have this paper written in LaTeX and now I want to verify the correctness in Lean with the help of Mathlib" is currently very high (to be fair, the design decision to aim for the maximum generality is usually not the main reason why it is difficult, but it contributes to the problem).
If (2) is the main goal, then I think the design decisions of Mathlib are (nearly) perfect! However, we should improve the communication. Namely, I don't like seeing LaTeX in the docs, and I am not a big fan of using LaTeX on Zulip either. If (2) is the main goal of Mathlib, I will totally support it! But, we should start communicating in a way that allows Mathlib contributors and Mathlib users to know only the Lean notation. I will be very happy if I cut off the LaTeX mathematical notation completely out of my life and use Lean only! I noticed my English became much better when I started to use English in my head, instead of thinking in Czech and translating my thoughts to English. I would like to do the same thing for MathematiCS (think directly in Lean, without translating from LaTeX or pen-and-paper/blackboard notation).
Ruben Van de Velde (Feb 02 2024 at 11:15):
Are we assuming mathlib has a single main goal? :)
Johan Commelin (Feb 02 2024 at 11:16):
/me is not
Mario Carneiro (Feb 02 2024 at 11:16):
I think (1) is closer to the truth than (2), although one has to reinvent things sometimes to ease formalization tasks
Mario Carneiro (Feb 02 2024 at 11:17):
The bar for "I have this paper written in LaTeX and now I want to verify the correctness in Lean with the help of Mathlib" is currently very high (to be fair, the design decision to aim for the maximum generality is usually not the main reason why it is difficult, but it contributes to the problem).
I don't really understand what you mean here. Mathlib being high generality should not make proofs (significantly) harder, and if it does that should be treated as an issue to be solved. Obviously this makes things harder for mathlib contributions but that's a separate matter
Mario Carneiro (Feb 02 2024 at 11:18):
It can make things harder to find though (which is also an issue to be solved, but I'm not sure we have good tools for tackling it at this point)
Josha Dekker (Feb 02 2024 at 11:21):
Regarding tools for making things easier to find: what if we picked a selection of standard, ideally openly accessible, textbooks on the various areas of math and made a file linking (almost) every result in these books to results in Mathlib? That allows for a nice way of finding the generalization of your result that was implemented in Mathlib, and should be relatively easy to maintain…
Riccardo Brasca (Feb 02 2024 at 11:51):
I think one of the main goals is to making possible to formalize serious mathematics.
There is a big difference between working on its own formalization project and working on mathlib. If in my project I need x^n = 1
for x
in a group of order n
, I want this result to be in mathlib, since it is undergraduate mathematics, and I don't want to reprove everything. But I don't care if the actual mathlib statement is for some crazy semimonoid with blah blah. For the same reason I don't want to know why groups have this crazy definition, I just want to use the library, hoping that it works as a good blackboard.
Kevin Buzzard (Feb 02 2024 at 12:02):
For me the main goal is to make a library which is powerful enough to be able to make it feasible to interact with concepts in current mathematical research, because empirically this is what makes mathematicians notice the software and hence grow the area. I believe it's important to make mathematicians notice the software because Lean represents a new and I believe better-than-the-status-quo way of doing mathematics. But this is just personal opinion coming from the fact that mathematicians in my area have in my opinion become quite sloppy when it comes to writing up their results (see for example the section on "The work of Arthur" on this Frank Calegari blog post , an issue I have been banging on about since 2020 but all of a sudden my community has woken up to), and to be honest I'm very happy with the idea that mathlib can mean different things to different people.
Martin Dvořák (Feb 02 2024 at 12:21):
Mario Carneiro said:
I don't really understand what you mean here. Mathlib being high generality should not make proofs (significantly) harder
It does, albeit in an indirect way (modulo some rare cases where the more general definition is really harder to work with).
For example, consider #6042 SVD, which is the top item on the #queue currently. It would have been already merged if we didn't insist on maximum generality. Hence, if I were formalizing anything that uses SVDs, I would have to rebuilt the theory myself, as it is currently missing in Mathlib (where the consensus seems to be that contributions are not accepted until they are provided in the "best generality"). The situation would be much different if Mathlib accepted PRs that aren't general enough and left the refactoring to generality for later contributors (the narrative would be "if a hypothetical future user wants a more general version, let them do it"). The latter approach would have significant drawbacks as well, hence I am not advocating for it, but I nonetheless want to point out that the current narrative ("don't contribute unless you are willing to contribute the most general version") has significant practical issues.
Mario Carneiro (Feb 02 2024 at 12:22):
For example, consider #6042 SVD, which is the top item on the #queue currently. It would have been already merged if we didn't insist on maximum generality.
That's a different issue, the one that I mention in the following sentence:
Obviously this makes things harder for mathlib contributions but that's a separate matter
Mario Carneiro (Feb 02 2024 at 12:23):
Clearly asking for more generality puts more work on mathlib contributors, but it doesn't (shouldn't) put more work on mathlib users
Mario Carneiro (Feb 02 2024 at 12:24):
Mathlib is in this sense aiming to be as useful as possible to a wide variety of use cases
Kevin Buzzard (Feb 02 2024 at 12:24):
There is a very strong argument both mathematically and practically that if you do it wrong the first time then you pay more later.
Kevin Buzzard (Feb 02 2024 at 12:26):
The literature is full of claims such as "this reference proves X but clearly the ideas in it prove the slightly more general Y which is what we need here" and this is terrible. We're not making that mistake here.
Martin Dvořák (Feb 02 2024 at 12:27):
Mario Carneiro said:
Clearly asking for more generality puts more work on mathlib contributors, but it doesn't (shouldn't) put more work on mathlib users
It is true it puts more work on mathlib contributors in the first place, but I have just explained how it puts more work on mathlib users as well.
Mario Carneiro (Feb 02 2024 at 12:27):
I'm not completely against redoing work with the benefit of hindsight, but when you can already foresee what iteration 2 is going to look like then it's time saved to not have to do iteration 1 first
Martin Dvořák (Feb 02 2024 at 12:30):
Mario Carneiro said:
when you can already foresee what iteration 2 is going to look like then it's time saved to not have to do iteration 1 first
... if there is a person who is competent to do the iteration 2 and wants to put the effort into it. Otherwise, having to choose between iteration 1 and nothing, I will opt for "done is better than perfect".
Martin Dvořák (Feb 02 2024 at 12:34):
It seems to me that both @Kevin Buzzard and @Riccardo Brasca want to do (1) but say it in different words.
Mario Carneiro (Feb 02 2024 at 12:34):
I think that merging things which aren't fully baked yet will make things harder for everyone down the line
Mario Carneiro (Feb 02 2024 at 12:35):
that is, we need an actual plan for tracking and fixing issues deliberately introduced in the library in that way
Martin Dvořák (Feb 02 2024 at 12:35):
Not for everyone, but for many people, yes, and that's why I respect the status quo.
Anatole Dedecker (Feb 02 2024 at 12:38):
This is getting even more important since we start to have projects depending on mathlib asking us to have a proper deprecation policy. If some temporary version ends up staying in mathlib for months because of this it would be kind of annoying.
Martin Dvořák (Feb 02 2024 at 12:39):
Yeah, the aim to avoid breaking changes is a solid argument for "better nothing than imperfect".
Sébastien Gouëzel (Feb 02 2024 at 12:56):
Talking about deprecation policy, I have questions about #10185. It renames op_norm
to opNorm
in lemma names, and op_nnnorm
to opNnnorm
(not sure about the capitalization, by the way). Done with a brutal search and replace (and fixing the #align
lines with another search and replace), and touching 568 lines. If I have to add manually a deprecation statement to all lemmas whose name has changed, I will probably drop the PR (it's not that important, after all). I'd be happy to have your opinions on this:
(1) Should we drop the PR because it doesn't follow the deprecation policy, and accept the current suboptimal state? Then deprecation policy is actively harming mathlib.
(2) Should we get the PR in, and accept the breakage for downstream users? Not very nice, since these lemmas are pretty common.
(3) Should we use a better tool for the job, that would automatically add the deprecation command for each renamed lemma? (my regex-foo is not good enough for this, unfortunately) -- or could leaff
do it for me?
Thoughts, anyone?
Anatole Dedecker (Feb 02 2024 at 13:06):
I'm obviously against option (1). Option (3) is of course what we should tend to, but I don't know what is realistic here. I think the solution is to go for option (2) and make an announcement somewhere. Maybe a reasonable policy would be "either do the deprecation stuff and then the tooling basically handles the transition for downstream users, or write down a detailed explanation of the changes (including the regex?) to some kind of changelog" ?
Oliver Nash (Feb 02 2024 at 13:07):
@Sébastien Gouëzel speaking just for myself I would say:
- No
- Yes
- Yes, but we do not block ourselves by waiting for such tooling.
Ruben Van de Velde (Feb 02 2024 at 13:07):
Also, would expect opNNNorm
Sébastien Gouëzel (Feb 02 2024 at 13:10):
Yes, I hesitated between opNnnorm
and opNNNorm
, and I can definitely change to the latter.
Damiano Testa (Feb 02 2024 at 13:37):
What would the deprecation policy ask in this case? To add @[deprecated]
to each theorem name that was changed? If that is the case, I can try to look into how to do it (somewhat) systematically.
Sébastien Gouëzel (Feb 02 2024 at 13:45):
Let me give an example. I rename the theoremtheorem linfty_op_nnnorm_col
to linfty_opNnnorm_col
. The proper fix would be
theorem linfty_opNnnorm_col ...
@[deprecated] alias linfty_op_nnnorm_col := linfty_opNnnorm_col -- deprecated on 2024/02/02
Sébastien Gouëzel (Feb 02 2024 at 13:48):
Or maybe even better
theorem linfty_opNnnorm_col ...
@[deprecated linfty_opNnnorm_col] alias linfty_op_nnnorm_col := linfty_opNnnorm_col -- deprecated on 2024/02/02
Mario Carneiro (Feb 02 2024 at 13:51):
@Sébastien Gouëzel Speaking as someone working on a migration tool, my answers align with Oliver Nash 's. Eventually we'll have something and can retrofit some migrations on old work by running leaff or something like it, until then don't worry too much about it
Mario Carneiro (Feb 02 2024 at 13:52):
in any case, having to manually add deprecations for all these is not the plan
Damiano Testa (Feb 02 2024 at 13:54):
Sébastien, I'll see if I can cook up something quick that would work in simple cases, while waiting for a more robust fix!
Sébastien Gouëzel (Feb 02 2024 at 13:59):
OK! There's no hurry anyway, it's been like that since the port.
Yury G. Kudryashov (Feb 04 2024 at 05:19):
I'm looking at the leaff
output.
Yury G. Kudryashov (Feb 04 2024 at 05:25):
leaff
reports either old or new lemma as "added" but it looks like @Damiano Testa added all the needed @[deprecated]
aliases (I used git diff
+ grep
+ wc
to count lines), so I merged it.
Yury G. Kudryashov (Feb 04 2024 at 07:17):
As for the original topic, for me one of the missions of Mathlib is to have a large library of correctly stated results, with a strong preference towards formulating each result in the strongest form available.
Yury G. Kudryashov (Feb 04 2024 at 07:27):
At least in dynamics, too often people formulate theorems in the "essential" case, e.g.,
-
Discuss each bifurcation scenario only in the least-parametric family that makes sense for this scenario.
If you want to apply the result to a larger bifurcation (with more parameters) that has this one as a part of the picture,
then you either pretend that it's OK to reuse the theorem or have to prove classical results in slightly higher generality.
Clearly, most people take the first route. -
Prove theorems with unspecified smoothness assumptions. If you want to deal with a finitely-smooth family, then you have to carefully read the proofs of all the results you reference to figure out what smoothness is enough.
Antoine Chambert-Loir (Feb 04 2024 at 17:50):
I believe it would be good if, say once a year, mathlib maintainers (not necessarily in the strict sense) spent some time on the library and indicated directions in which it should go further.
That would mean collecting “TODOs”, sketching some directions for things to formalize, possibly stating theorems to prove (so that “formalizing Doe's theorem” would mean eliminating sorry
from some file with theorem doe … := sorry
), etc.
Maybe that would be too much of a burden, I don't know, but it would help, and maybe it would secure the library as soon as it grows in an amazing number of directions.
Antoine Chambert-Loir (Feb 04 2024 at 17:55):
I can't answer @Sébastien Gouëzel 's question, but something similar happened recently, where FunLike
was replaced (so it seems) by DFunLike
. I did the replacement in my files after merging master, but I wasn't really sure of what I was doing, but didn't find any information about the change (maybe I didn't search hard enough). Is it possible that FunLike
would have still existed, but be tagged as deprecated.
Martin Dvořák (Feb 04 2024 at 17:56):
Antoine Chambert-Loir said:
(so that “formalizing Doe's theorem” would mean eliminating
sorry
from some file withtheorem doe … := sorry
)
We have proof_wanted
for things we can already state but haven't proved yet.
Antoine Chambert-Loir (Feb 04 2024 at 17:58):
Yury G. Kudryashov said:
As for the original topic, for me one of the missions of Mathlib is to have a large library of correctly stated results, with a strong preference towards formulating each result in the strongest form available.
There are people who worked with that goal in mind, and in some cases, “the strongest form available” is too much a burden for what it's worth. Again, having “strong enough” intermediate steps are sometimes useful, possibly formalizing them compatibly with planned future refinements, but not necessarily formalizing these utmost general results on the onset.
Kevin Buzzard (Feb 04 2024 at 18:51):
Antoine Chambert-Loir said:
I believe it would be good if, say once a year, mathlib maintainers (not necessarily in the strict sense) spent some time on the library and indicated directions in which it should go further.
For what it's worth, that's one of the visions I have for the AIM workshop -- establishing a roadmap for future directions in algebraic geometry.
Last updated: May 02 2025 at 03:31 UTC