Zulip Chat Archive

Stream: general

Topic: O(1000) definitions for Annals


Johan Commelin (Feb 06 2026 at 14:52):

@Kevin Buzzard I've seen you mention this O(1000) stat several times, and I'm curious how you got to this estimate. Personally, without having done any research on this, I would have estimated at least 10x this number, if not more.
Maybe I missed the backstory that led to this stat. Would you mind sharing it?

(And certainly, Annals-def \ne Mathlib-def. According to https://leanprover-community.github.io/mathlib_stats.html we currently have 124201 defs in Mathlib. So that would mean that we are 99% of the way to victory. Whereas intuitively it feels like we are ~10% or even less. Which would suggest that there is a 10^3 multiplier between Annals-defs and Mathlib-defs.)

Yury G. Kudryashov (Feb 06 2026 at 15:33):

What counts towards 124201? Actual defs? instances? classes? Auxiliary defs (e.g., projections) generated by structures?

Johan Commelin (Feb 06 2026 at 15:39):

Not sure, I grabbed that number from our stats page (edited the link in above)

Kevin Buzzard (Feb 06 2026 at 16:24):

I got the stat by counting :-) I even think it's less than 1000.

We have been looking at about 250 recent Annals papers, writing down definitions which are missing from mathlib. I was initially hoping that some definitions would come up again and again, making them natural candidates for my post-docs to concentrate on, but there doesn't seem to be any one thing which comes up 20 times and which should be a clear target. Instead what happens is that there are many definitions which come up a few times (of course there can be several missing definitions needed to formalize the main theorem in one paper).

Also I definitely don't want to stick to just Annals of Math. One thing the project has taught me is that Annals is very skewed in the kind of mathematics which it likes to publish, and my impression is that other journals like Inventiones are publishing a rather different distribution of AMS classification numbers.

Here is a list which I think would get us hundreds of recent Annals papers.

Abelian variety (in alg geom), Abelian variety (in diff geom), Jacobian (in alg geom), Jacobian (in diff geom). Tate-Shaferevich group. Calabi-Yau variety (in alg geom -- presumably there is also a diff geom version). Chern classes. K-theory (algebraic and topological). Knot. Knot bounding a space. Moduli space of smooth curves with marked points (in alg geom and diff geom). Moduli space of stable curves. Sheaf cohomology. Etale/ell-adic cohomology in alg geom. Compactly-supported cohomology (in topology and alg geom). de Rham cohomology (in alg and diff geom). Reductive groups (over a field, over a scheme). Algebraic spaces. Algebraic stacks. Automorphic representations. Langlands parameters. Arthur parameters. Vector bundles on curves (in alg and diff geom). Finite simple groups of Lie type. Maass forms. L-function of an elliptic curve. L-function of an automorphic representation. Profinite completion of a group. Fano varieties. Riemannian geometry. Infinity categories. Linear systems. Log canonical stuff in alg geom. Kazhdan's property T. Dehn functions. Semisimple real Lie groups. Semisimple algebraic groups. Intersection numbers. Rapoport-Zink moduli spaces. Geodesics. The Ising model. Ample divisors. Gromov-Witten theory. E-functions. Lubin-Tate formal groups. Contact geometry. Symplectic geometry. Mapping class groups. Legendrian torus links. Lagrangian fillings. Legendrian links. Stein surfaces. Nef line bundles in alg geom. Excellent schemes. Distribution theory. Smooth p-adic formal schemes. Derived tensor products. Hodge numbers. Betti numbers. Enriques surfaces. Vertex operator algebras. Artin L-functions. Symmetric spaces. Injectivity radius. Rank of a Lie group. Rank n irreducible Q_ell-bar-local systems on a curve. Kaehler manifolds. Hypergraphs. Shimura varieties. Toroidal compactifications. The Boltzmann equation. Markov chains and related definitions. Topological surfaces. Mapping class group. Smooth measures. Hermitian manifolds. DG categories. Brauer algebras. Brauer p-blocks. The defect group. Etale fundamental group. Ricci curvature. Periods of an abelian variety. Mumford-Tate group of an abelian variety. Selmer groups. Unstable motivic homotopy theory. K-theory for stalbe infinity-categories. Grothendieck-Witt groups. The Sen operator.

You give me them and I can give you most of the Annals papers from this decade, and that is only about 100 definitions. Maybe we get another list, equally long, if we want to do Inventiones as well. So maybe that takes us to 200. Maybe for each of these definitions we need two more definitions (this definitely isn't true for many of them). That takes us to 600. How is 1000 sounding now?

Kevin Buzzard (Feb 06 2026 at 16:25):

@Thomas Browning @Katy Hristova @Justus Springer @David Ledvinka do you have anything to add here? Do you think this figure of "under 1000 definitions unlocks essentially all of modern mathematics" is a naive statement? It's you guys who have been doing a lot of the work to make the list which I just posted above.

Thomas Browning (Feb 06 2026 at 16:29):

Well, it's probably more than 2 more definitions per each entry, I'd say

Kevin Buzzard (Feb 06 2026 at 16:33):

But many definitions are trivial, and as we established with Jordan curves recently (which we had to make and did make), some trivial definitions may not be even wanted in mathlib #mathlib4 > Jordan curve / Smooth Jordan curve @ 💬 .

Robin Carlier (Feb 06 2026 at 16:41):

I would imagine that for instance the entry K-theory for stable infinity-categories is at least a hundred (and I’m probably being nice here) of intermediate (non-easy) definitions away from mathlib, and I think for many other definitions listed that I don’t know anything about, the same could equally apply.

Matthew Ballard (Feb 06 2026 at 16:43):

Do we still not have Profinite completion of a group?

Michael Rothgang (Feb 06 2026 at 16:46):

As a symplectic geometer, I think symplectic geometry could easily expand into a few dozen definitions. (Perhaps hundreds... I have not tried to make that list. It would certainly depend on your count.) Something similar may be true for Riemannian geometry.

Matthew Ballard (Feb 06 2026 at 16:47):

Yes that is a whole field

Robin Carlier (Feb 06 2026 at 16:51):

Also, the latest annals paper about Unstable motivic homotopy theory I can think of is also stock-full of infinity-categories, we are probably close to a definition of unstable motivic homotopy theory thanks to @Joël Riou's work on model categories in mathlib, but this would not be the one they use in said paper, add at least a few dozens (again, very very conservative here) of defs for the comparison/reinterpretation/translation :D.
The point is that each entry might be a huge rabbit hole.

Aaron Liu (Feb 06 2026 at 16:51):

Matthew Ballard said:

Do we still not have Profinite completion of a group?

I see an open PR #34893

Bryan Wang (Feb 06 2026 at 16:58):

I think many of these aren't just definitions but also 'theories', some random examples: knot theory, cohomology theory, infinity-category theory, etc etc etc, and it applies across the board. One maths definition may equal one Lean definition, but the surrounding theory is basically its API (and more), which even in current mathlib that can grow very large.

The most down-to-earth example I can think of: just having the definition of a Group is not enough, we need all the surrounding group theory API, which is so massive in mathlib that it has its own top-level folder..

David Ledvinka (Feb 06 2026 at 17:08):

Yes, a "theory/field" on this list means that the corresponding paper is blocked by (at least) the basic definitions and API of said theory or field. In some (but probably not most) of these cases you could state the main theorems without this in a hacky way, for example the knot theory paper could probably be done in a very silly way without even defining a knot. But then of course one would be no closer to the proof.

Kevin Buzzard (Feb 06 2026 at 17:08):

So perhaps I should be more guarded in this O(1000) claim: perhaps the claim is that we are "under 1000 key definitions" away from nirvana, but, given the way mathlib works, a "key definition" might involve several other definitions along the way (e.g. definition of reductive group will need definition of unipotent subgroup), but that some of these intermediate definitions might well be very easy. In fact some of the key definitions might also be easy.

Yan Yablonovskiy 🇺🇦 (Feb 06 2026 at 17:19):

Kevin Buzzard said:

So perhaps I should be more guarded in this O(1000) claim: perhaps the claim is that we are "under 1000 key definitions" away from nirvana, but, given the way mathlib works, a "key definition" might involve several other definitions along the way (e.g. definition of reductive group will need definition of unipotent subgroup), but that some of these intermediate definitions might well be very easy. In fact some of the key definitions might also be easy.

And it rolls off the tongue so well too!

Yan Yablonovskiy 🇺🇦 (Feb 06 2026 at 17:27):

Could you say O(M) files / folders away as a much rougher bound ? A lot less precise , but a lot easier to say.

Dagur Asgeirsson (Feb 06 2026 at 17:29):

Robin Carlier said:

I would imagine that for instance the entry K-theory for stable infinity-categories is at least a hundred (and I’m probably being nice here) of intermediate (non-easy) definitions away from mathlib, and I think for many other definitions listed that I don’t know anything about, the same could equally apply.

Even for algebraic K-theory of rings, there would be several intermediate definitions. Whether we choose a modern or old-fashioned approach, we would need a bunch of stuff we don't have: higher algebra (group completion for spectra), or classical homotopy theory

Joël Riou (Feb 06 2026 at 17:39):

Dagur Asgeirsson said:

Even for algebraic K-theory of rings, there would be several intermediate definitions. Whether we choose a modern or old-fashioned approach, we would need a bunch of stuff we don't have: higher algebra (group completion for spectra), or classical homotopy theory

I have a definition of exact categories, and the Q-construction for these https://github.com/joelriou/mathlib4/blob/jriou_localization/Mathlib/CategoryTheory/ExactCategory/Q.lean, and I have formalised the homotopy theory of simplicial sets... If you give me a few hours, I can certainly provide a definition of the higher algebraic K-groups of rings. (To have this in mathlib may take much more time...)

Oliver Nash (Feb 06 2026 at 17:40):

I really like this point of view that there are O(1000) words we must add to Mathlib’s vocabularly. Do I understand correctly that the claim is that if we add these definitions then we basically have vocab for most of contemporary Annals? Of course adding these will require adding many more on the way but that’s not the point: the idea is we actually have a somewhat data-driven quantification of the job to be done.

Kevin Buzzard (Feb 06 2026 at 17:54):

Yes that's what I am claiming, and I am absolutely convinced by the above discussion that for some of these key words we will have to add many other words. For example cuspidal automorphic representation (a phrase used in the statement of a recent Annals theorem) <- automorphic representation <- automorphic form <- reductive group variety <- unipotent subgroup variety <- unipotent element and several other things (it's a finite directed graph of missing definitions)

Kevin Buzzard (Feb 06 2026 at 17:56):

Oh and there's also the small mater of "C^infty real manifold associated to a smooth real algebraic variety" along the way (another definition which will take some thought).

Michael Rothgang (Feb 06 2026 at 18:14):

Kevin Buzzard said:

Oh and there's also the small mater of "C^infty real manifold associated to a smooth real algebraic variety" along the way (another definition which will take some thought).

Is that related to analytification of schemes? (My algebraic geometry is a bit rusty.)
@Christian Merten and I have a prototype branch for the latter. Completing everything will depend on two master's theses I'm supervising. The second thesis is about gluing (of smooth manifolds); the local case of analytification mostly just needs the regular value theorem (the first thesis) and a few properties of smooth embeddings (which are not too hard).

Kevin Buzzard (Feb 06 2026 at 18:55):

Yes this is exactly analyticification of schemes. In mathlib's language I suspect that the general result should be that for a smooth variety of relative dimension d defined over a field k equipped with whatever extra structure you need for it to make sense to say that you can have a manifold which locally like k^n, there's an associated d-dimensional k-manifold, and for an arbitrary variety defined over the complexes there's an associated complex analytic space. Note that you have to have a fixed relative dimension d because in algebraic geometry you can have the disjoint union of a curve and a surface, but this is not allowed in lean's version of differential geometry. There will be a lot of work in getting this definition going, I guess you'll need these standard results (inverse/implicit function theorem?) at least one of which is maybe still missing or am I out of date? This is a good indicator that sometimes for definitions you need to prove theorems.

Johan Commelin (Feb 06 2026 at 19:06):

But I guess you are fine with sorry-ing the proof of such theorems? And you only care about the data?

Junyan Xu (Feb 06 2026 at 19:07):

There's analytic spaces too which allow singularities. I guess non-archimedean analytification comes up too, and people are already working on the Tate curve ...

Kevin Buzzard (Feb 06 2026 at 20:13):

Whether I'm happy with sorrying proofs depends on what hat I have on. If I need something for FLT, sure! If I need something for the Annals project -- part of the point of the project is to get definitions into mathlib and then of course sorries aren't allowed

Kim Morrison (Feb 06 2026 at 23:06):

It really is embarrassing that we're this far along and have zero usable material on knots, which are after all a quite elementary subject! Despite being the "shortest" definition on the list above...

Kim Morrison (Feb 06 2026 at 23:10):

But there's a lot to do. There are so many different "knots are Xs modulo the equivalence relation R" that you might start with, that you're barely off the ground until you have equivalences between these.

But getting between "smoothly embedded S^1 modulo ambient isotopy" and "grid diagrams modulo grid moves" is a big project! And to do it properly -- e.g. so the theorem about Reidemeister moves falls out of a stratification of the space of embeddings -- wow!

Kevin Buzzard (Feb 06 2026 at 23:29):

They're just 3d smooth Jordan curves :-)

Joseph Myers (Feb 07 2026 at 00:21):

If it's 1000 definitions for Annals, it's probably at least 10000 for modern mathematics as a whole. As just one small example, there's nothing on that list about graph theory - combinatorialists tend to make less use of generalist journals - and we seem to be at a pretty early stage of finding out what good graph theory definitions should look like in mathlib. (And, yes, I'd like all 10000 definitions in mathlib.)

Joseph Myers (Feb 07 2026 at 00:23):

And then you get into how informal mathematics often has many equivalent versions of a concept where you don't have to decide which one is "the" definition - and adding such a concept properly to mathlib means not just picking one as "the" formal definition, but also adding API lemmas proving the other versions are equivalent (sometimes straightforward, sometimes not at all straightforward).

Snir Broshi (Feb 07 2026 at 00:30):

FWIW https://en.wikipedia.org/wiki/Glossary_of_graph_theory has 359 definitions

Joseph Myers (Feb 07 2026 at 00:38):

I don't know how many of those 359 we have in mathlib, but I'm guessing not many, considering that people were only recently discussing how best to express very basic concepts such as edge contraction without getting into DTT hell with index types.

Snir Broshi (Feb 07 2026 at 00:43):

Yeah we're still missing many defs from undergraduate graph theory, e.g. cuts / flows / edge coloring / maximal/maximum clique/indep-set/matching / edge covers / dominating set / ...
(though a few of these have open PRs)

Shreyas Srinivas (Feb 07 2026 at 01:02):

Snir Broshi said:

Yeah we're still missing many defs from undergraduate graph theory, e.g. cuts / flows / edge coloring / maximal/maximum clique/indep-set/matching / edge covers / dominating set / ...
(though a few of these have open PRs)

We still don’t have flows? I wrote one on two separate occasions.

Kevin Buzzard (Feb 07 2026 at 01:08):

It might be worth distinguishing between two types of definitions here. There are also many predicates on natural numbers or prime numbers which we don't have in mathlib, probably hundreds, however almost all of these are predicates which could be (a) defined in one line and (b) we might not want in mathlib anyway (because they can just be defined in one line and there are no particularly interesting theorems or conjectures about them). In contrast making some of the definitions above will unlock our ability to state major theorems or conjectures, and the construction of the definitions will be a major project. Somehow it feels not quite right to compare these two types of definitions.

Snir Broshi (Feb 07 2026 at 01:10):

Shreyas Srinivas said:

We still don’t have flows? I wrote one on two separate occasions.

Nope, the current attempt is #34028

Snir Broshi (Feb 07 2026 at 01:13):

I actually thought about putting these random number predicates in Kim's merely-true repo and waiting to see what the LLMs will prove about them

Kevin Buzzard (Feb 07 2026 at 01:17):

Yes I think a more interesting thing to do with those predicates than adding them all to mathlib is telling them all to an AI, linking to the OEIS sequence and then asking AI to find some interesting conjectures. Note however that people have been trying this for literally decades (I second marked a computer science MSc project on this topic in about 2005) and it's not clear (to me) that LLMs are any better a tool for this than the bespoke tools that people have created in the past.

Jeremy Tan (Feb 07 2026 at 01:20):

Kevin Buzzard said:

I second marked a computer science MSc project on this topic in about 2005

What is second marking?

Kevin Buzzard (Feb 07 2026 at 01:33):

Someone else supervised the project and I gave my independent opinion (and grade) on it so the project recieved two marks, which were then averaged. In my university this is the standard way to mark Master's projects.

Shreyas Srinivas (Feb 07 2026 at 02:26):

Kevin Buzzard said:

It might be worth distinguishing between two types of definitions here. There are also many predicates on natural numbers or prime numbers which we don't have in mathlib, probably hundreds, however almost all of these are predicates which could be (a) defined in one line and (b) we might not want in mathlib anyway (because they can just be defined in one line and there are no particularly interesting theorems or conjectures about them). In contrast making some of the definitions above will unlock our ability to state major theorems or conjectures, and the construction of the definitions will be a major project. Somehow it feels not quite right to compare these two types of definitions.

I made a mathlib ready definition. Well maybe not exactly mathlib ready. But something mathlib contributors could easily refine.

Shreyas Srinivas (Feb 07 2026 at 02:31):

This one is slightly mathlib non-ready in the sense that it was meant to experiment with a more convenient definition of graph walks (fewer dependent indices). https://github.com/Shreyas4991/LeanFlows/tree/main/Flows

The bellman ford stuff should be ignored.

Shreyas Srinivas (Feb 07 2026 at 05:59):

For the record I have a lot more API for these walks than shown here. In my current variants, there is a nonemptiness condition for the support which makes life more convenient.

Scott Carnahan (Feb 07 2026 at 07:04):

I see that vertex operator algebras is on the list, and as far as I can tell, they showed up in the Annals 3 times in the last 25 years.

  1. In 2004, Miyamoto wrote "here is a new construction of the monster vertex operator algebra using codes".
  2. In 2015, Arakawa wrote "principal nilpotent W-algebras are rational". That is, a class of vertex operator algebras constructed in a special way (BRST cohomology) satisfy the property that their modules are completely reducible.
  3. In 2023, Möller and Scheithauer wrote "here is an explicit formula for the dimension of a distinguished finite dimensional subspace that holds uniformly for this class of vertex operator algebras, and we also give a new proof that the class has 70 isomorphism types".

It is quite possible to write a definition of vertex operator algebra with what we have in mathlib now, but expressing the main results would require many more definitions.

Robin Carlier (Feb 07 2026 at 09:35):

I just noticed that the list has DG categories

Yury G. Kudryashov (Feb 07 2026 at 13:41):

I see de Rham cohomologies on the list. I'm working on adding them to Mathlib, and it's going to be quite a few new definitions (and generalizations of the existing ones) just to define the vector bundle of k-forms on a manifold.

Joseph Myers (Feb 07 2026 at 14:11):

When I suggest 10000 definitions for modern mathematics as a whole, I'm not talking about the recreational natural number predicates, I'm talking about serious definitions of current research relevance with useful API to add alongside the definitions and significant theorems or conjectures to state using those definitions. Some will be hard to add for various reasons, some straightforward, but I expect the same applies to the 1000 Annals definitions.

Joseph Myers (Feb 07 2026 at 14:14):

Here's an argument that the number might be a lot bigger than 10000. How many graduate-level textbooks would you need to cover most of modern mathematics as a whole? I'd expect rather more than 100. I'd also expect rather more than 100 unique definitions per textbook in such a set of textbooks covering modern mathematics.

Kevin Buzzard (Feb 07 2026 at 15:54):

Yeah the question is more confusing than it seems at first sight. The two confounding factors seem to be : definitions which need other definitions (in any sensible formalisation of a theory, obviously you could inline everything), and definitions-you-need-to-state-all-the-big-theorems vs definitions-you-need-to-also-state-all-the-small-lemmas.

David Holmes (Feb 11 2026 at 13:30):

Starting from the top: "Abelian variety (in alg geom)". I wrote the def of abelian scheme in lean, v easy. Then I tried to prove they are commutative, not so simple. Probably doable if one assumes (Cohomology and Base-Change) and (proper pushforward of coherent is coherent), though I still have some sorries where gremlins may lurk. But actually writing down the def seems to me in this case not the hard part, unlike proving anything non-trivial.

Michael Stoll (Feb 11 2026 at 13:52):

Hi David! Welcome!

David Holmes (Feb 13 2026 at 07:59):

Thanks Michael! Sorry for slow reaction, still not great at zulip...


Last updated: Feb 28 2026 at 14:05 UTC