Zulip Chat Archive
Stream: mathlib4
Topic: What is not yet in mathlib?
Bryan Wang (Aug 30 2025 at 03:48):
One of the challenges I discovered in talking to mathematicians about Lean/mathlib is that it is still difficult to tell what is and isn't in Lean/mathlib. This is not helped by the fact that, as we know, there is an extreme variation across subfields. I see that there is a page indicating what definitions/concepts are in mathlib (although I am not sure about the updatedness of this page, and what can be done to keep it up-to-date, but this is not my main point). Now there is also a version for undergraduates, but more importantly, the undergraduate version has a page indicating what is not yet in mathlib. Why don't we have a 'not yet' page for mathematics in general? I am aware that this is theoretically of infinite length, but I believe that in practice some limit will be reached, especially if we restrict ourselves to just key definitions/concepts which are relevant to modern mathematicians. This would be similar in spirit to the 1000+ theorems page that we have, but for definitions/concepts instead rather than theorems, i.e. a '1000+ definitions' page. My empirical observation is that the 1000+ theorems page is not very helpful to mathematicians in practice, not just because it is not organised by subfield, but also because most mathematicians don't really think about their work in terms of "what named theorem did I use", but more "what definitions/concepts do I need".
Michael Rothgang (Aug 30 2025 at 06:52):
My understanding is that the up-to-date-ness of the overview depends on the field also. I've been keeping the differential geometry section mostly up to date --- I would guess it's outdated in other areas.
Michael Rothgang (Aug 30 2025 at 06:54):
A "key concepts not (yet) in mathlib" page is a nice idea! I can imagine this being very useful. If we curate it well enough, the scope will also stay not too huge. If we add it, I would strongly advocate also having a comment field, where we can point to previous discussion or partial work --- and perhaps a "last updated" field which indicates the status of that information.
Michael Rothgang (Aug 30 2025 at 06:57):
(I'm a bit on the fence about a date field. I want to avoid very outdated comment information giving a false impression. At the same time, the date might increate the incentive for lots of busywork "just bumping the date" (when it reality, the work should be 'look if there's further meaningful work or discussion on this', i.e. not an easy contribution for outside people). With clear communication of this fact, perhaps this is fine?)
Michael Rothgang (Aug 30 2025 at 06:58):
Relatedly, Floris van Doorn and I were discussing if there should be a venue for summaries "here's the current status of X in Lean/mathlib" updates. (I've been writing these for differential geometry since people asked, but having them in a discoverable place would actually be nice.)
If you have such summaries, linking them from such a "concepts not in mathlib yet" page would also go a long way.
Ruben Van de Velde (Aug 30 2025 at 10:30):
Ideally we'd have some place where you could find "next steps in subject X", with references to earlier discussions/ maybe textbooks / people who could help / recent progress / relevant projects outside mathlib. But that's only useful if it's up to date, as you noted
Weiyi Wang (Aug 30 2025 at 13:08):
How feasible is it to create "1000+ definitions" from wikipedia like 1000+ theorems did?
Kevin Buzzard (Aug 30 2025 at 15:25):
Part of my starting-in-October project to formalise statements of recent theorems from the top journals will inevitably lead to a list like this but I'm not sure it will have length anywhere near 1000
Ruben Van de Velde (Aug 30 2025 at 16:53):
I thought you already had a project :innocent:
Michael Rothgang (Aug 30 2025 at 19:05):
Weiyi Wang said:
How feasible is it to create "1000+ definitions" from wikipedia like 1000+ theorems did?
Depending on what you mean, quite feasible. The infrastructure for labelling "this is in mathlib" is not difficult; I could help you with that. The hard part is collecting a list of key definitions (and perhaps asking for feedback on the overall idea).
Michael Rothgang (Aug 30 2025 at 19:13):
That said: the 1000+ theorems data was gathered from wikipedia's list of theorems. (@Freek Wiedijk has a script for that, which they could presumably share.)
If there is a similar list of definitions (that's a bit if; I'm not sure if that exists!!) you could use that as a starting point.
Julia Scheaffer (Aug 30 2025 at 19:17):
I tried to type up a little sparql query for that, but I dont know if there's a metaclass for "mathematical defintions". I did write a query up for any wikidata object instance of a subclass* of "mathematical term" which returned quite a few things like "Natural number", "operator", and "without loss of generality"
Michael Rothgang (Aug 30 2025 at 19:34):
This list https://en.wikipedia.org/wiki/Lists_of_mathematics_topics could also be a start to explore. It contains some terms which are much broader than a definition --- but between this and the previous, some useful list can probably be produced. In any case, my impression is that this requires significant manual curation.
Weiyi Wang (Aug 30 2025 at 19:41):
As a starting point we can do "100+ Euler" :wink:
Michael Rothgang (Aug 30 2025 at 19:44):
Kevin Buzzard said:
Part of my starting-in-October project to formalise statements of recent theorems from the top journals will inevitably lead to a list like this but I'm not sure it will have length anywhere near 1000
Such a list would certainly be welcome. I know 100 theorems is biased (e.g. towards elementary geometry, and many proofs are relatively small projects); 1000+ theorems is surely biased in other way. I'd be happy about lists biased towards contemporary research :-)
Kevin Buzzard (Aug 31 2025 at 12:22):
Right, that's exactly the point. If we just scrape the internet for 1000 mathematical definitions including recreational ones which were made in some random paper and nobody cares about then it will cause more harm than good!
Bryan Wang (Aug 31 2025 at 12:29):
Just a random thought, but how about using the MSC?
Michael Rothgang (Aug 31 2025 at 12:30):
That's just a list of subjects (in the sense of "areas"), not of definitions/concepts, right?
Michael Rothgang (Aug 31 2025 at 12:31):
I'm not sure about "more harm than good": if somebody formalises a piece of recreational mathematics, a priori I consider this good. If everybody wants to submit to mathlib and the review queue becomes even more overwhelmed... well, we need to find more money to pay reviewers. That's already a problem we want to solve :-)
Michael Rothgang (Aug 31 2025 at 12:32):
Ah, or do you mean "the list will do more harm than good" (by steering people to formalise that list)? Gotcha, that makes a lot more sense. I can definitely relate.
Kevin Buzzard (Aug 31 2025 at 12:32):
Yes my fear is that we encourage people to formalise useless stuff and then PR it to mathlib
Bryan Wang (Aug 31 2025 at 12:32):
Michael Rothgang said:
That's just a list of subjects (in the sense of "areas"), not of definitions/concepts, right?
Yep, but a good proportion of them are definitions/concepts. It would need some filtering, but maybe a good starting point
Violeta Hernández (Sep 03 2025 at 21:23):
Kevin Buzzard said:
Yes my fear is that we encourage people to formalise useless stuff and then PR it to mathlib
I think we should make contribution guidelines for what's in scope. I already feel like I'm threading the needle every time I PR something about ordinal arithmetic.
Kevin Buzzard (Sep 03 2025 at 21:27):
The maintainers seem to have several different opinions about this question, which I agree is important. Perhaps the Initiative will take the initiative on this?
Michael Rothgang (Sep 04 2025 at 00:15):
You mean "drive a discussion", not "make a decision", right?
Jireh Loreaux (Sep 04 2025 at 01:44):
Yes, the maintainer team still has authority on the content and scope of Mathlib. This has not been relinquished to the Mathlib Initiative.
Bryan Wang (Sep 04 2025 at 03:17):
I see now that perhaps the original question may be better phrased as "what is not yet in Lean?", rather than mathlib specifically. For example the 1000+ theorems list is not specific to mathlib, but rather to Lean, indeed some entries have links which point to external repos, and it probably won't be the case that all 1000+ theorems will eventually make it to mathlib. The massive advantage that Lean offers is that one can feel free to use anything from anywhere, regardless of whether it is in mathlib or not, without having to believe in anything more than the Lean kernel. If something in Lean has sufficiently many people wanting to use it sufficiently often, then that should be enough to justify its inclusion in mathlib. But first, we need to make a way for the "sufficiently many people who want to use these things" (i.e. mathematicians) to figure out the status of all these things that they want and which are not already in mathlib. Hence the motivation for my original post!
Bartosz Piotrowski (Nov 10 2025 at 16:59):
I have a question: does anyone know how up-to-date the 1000+ list is regarding the tags telling if a given theorem was formalized in Lean? I and collaborators were thinking that this list could be a good guide for selecting theorems to a new benchmark for AI evals, and this is why I wanted to make sure about it.
Ruben Van de Velde (Nov 10 2025 at 17:01):
Probably it's incomplete
Michael Rothgang (Nov 10 2025 at 21:10):
Note that there is also https://leanprover-community.github.io/1000.html, which lists all theorems formalised in Lean/mathlib (as opposed to the big project page, which is for any proof assistant - with the tagging for most other proof assistants being really incomplete or not merged yet). The list at https://leanprover-community.github.io/1000.html is updated much more often (the other webpage update happens every once in a while, when I decide that a fair number of results have been added so an update could be nice).
Michael Rothgang (Nov 10 2025 at 21:11):
The last update was merged today, so right now the pages should coincide (but they will draft apart again).
Michael Rothgang (Nov 10 2025 at 21:11):
Regarding the tagging in mathlib: I made a reasonable effort to tag proven theorems, but I'm not an expert on all areas --- I'm sure some results are in mathlib, but missing from that list. Pull requests adding them are very welcome!
Michael Rothgang (Nov 10 2025 at 21:13):
(Also, let me reiterate that the 1000+ list is not representative of interesting research mathematics --- so might not be a good benchmark in the first place.)
Bartosz Piotrowski (Nov 10 2025 at 21:35):
Thank you for useful explanations! Yes, I'm aware that 1000+ list may not be representative of research math. Still, I feel that there is a gap between benchmarks like miniF2F, PutnamBench (Olympiad-style math), ProofNet (undergraduate math), and advanced stuff like formal-conjectures project provides. And I think that the 1000+ list contains many theorems that are established mathematical results, and going beyond undergrad material. Is it correct? And are you perhaps aware of some areas of mathematics that may be underrepresented on this list?
Michael Rothgang (Nov 10 2025 at 21:43):
First off, note that wikipedia is inconsistent: the list does not include every named theorem (about 900 entries are missing; fixing that and getting the list on the 1000+ webpage updated would be useful).
Michael Rothgang (Nov 10 2025 at 21:45):
Indeed, 1000+ goes beyond undergraduate material... but not systematically so. I would say it's systematically behind current research: cutting-edge results usually don't have a fancy name. Fundamental theorems in any one area might (but don't always do). And of course, named results includes lots of mathematics that was cutting edge at some point in the past, but is a mere lemma nowadays (such as, Liouville's theorem in complex analysis).
Michael Rothgang (Nov 10 2025 at 21:45):
I haven't looked if particular areas are underrepresented. I can imagine e.g. elementary geometry or number theory being overrepresented.
Joseph Myers (Nov 10 2025 at 22:52):
The Wikipedia list does have entries pointing to parts of pages rather than to a named theorem with its own page. For example, an entry for "Sphere packing theorems in dimensions 8 and 24" (which is one of those that postdates the last update of 1000+ from Wikipedia, and is of course really two theorems not one). If a cutting-edge result is significant enough in its field to be worth discussing in a section of a broader article, but doesn't have its own "X's theorem" page, that's one approach for getting it on the list (with a suitable descriptive phrase in place of a name). (Since the pages that would be linked to don't generally appear in a theorem category at all, unlike those for many named theorems with their own pages, it's hard to find such missing theorems in any automated way, however.)
Joseph Myers (Nov 10 2025 at 22:55):
To fill out such material in the list more systematically, maybe you'd need to find sources reviewing significant developments in mathematics as a whole, or in particular areas of mathematics, over the past few decades.
Ruben Van de Velde (Nov 10 2025 at 23:05):
It feels like that would put a lot of pressure on the "+" part of "1000+"
Wang Jingting (Nov 11 2025 at 02:59):
Bartosz Piotrowski said:
Thank you for useful explanations! Yes, I'm aware that 1000+ list may not be representative of research math. Still, I feel that there is a gap between benchmarks like miniF2F, PutnamBench (Olympiad-style math), ProofNet (undergraduate math), and advanced stuff like formal-conjectures project provides. And I think that the 1000+ list contains many theorems that are established mathematical results, and going beyond undergrad material. Is it correct? And are you perhaps aware of some areas of mathematics that may be underrepresented on this list?
For part of the algebra section (abstract algebra & commutative algebra), I guess #general > Discussion: FATE might help in this direction?
Last updated: Dec 20 2025 at 21:32 UTC