Zulip Chat Archive

Stream: general

Topic: Size of lean/mathlib

Fabian Huch (Jun 05 2025 at 12:56):

I am currently creating a comparison of large libraries of ITPs. I am only considering libraries with more than 1M (nonempty) LOC, of which I know (in alphabetical order): ACL2, HOL4, Isabelle, Lean, Metamath, Mizar, PVS, Rocq.
To that end, I want to approximate the number of definitions (incl. axioms / excl. abbreviations, instances, fixed values/constants, examples) and proven proper theorems (excl. automatically generated ones) by simple analysis of the sources (without running the system).

For Lean/mathlib (v4.19.0), my approach would be to look at tokens in .lean files and count:

For Definitions:

"structure"
"class"
"def"
"inductive"
"axiom"
"opaque"

For Theorems:

"theorem"
"lemma"

Does this sound reasonable? I am missing something? Any hints are appreciated.

Michael Rothgang (Jun 05 2025 at 12:58):

For mathlib specifically, are you aware of https://leanprover-community.github.io/mathlib_stats.html? This has answers for your two questions, automatically updated (daily? not sure). I am not 100% sure how these are counted, though.

Fabian Huch (Jun 05 2025 at 12:58):

That is my point -- I want to measure them with a common methodology.

Michael Rothgang (Jun 05 2025 at 12:58):

@Bryan Gin-ge Chen I'm sure you know how often the webpage updates --- do you also happen to know how these two numbers are computed?

Fabian Huch (Jun 05 2025 at 12:59):

But yes, the comparison will include the reported metrics as well, if any.

Michael Rothgang (Jun 05 2025 at 12:59):

Fabian Huch said:

That is my point -- I want to measure them with a common methodology.

If the stats page reported numbers matching your methodology (that's a big if), would you happy with just using those? Because I'm not convinced a simple lexical analysis will be able to answer your questions right.

Michael Rothgang (Jun 05 2025 at 13:00):

One example: what about automatically generated declarations? (simps and to_additive come to mind) Those are not easily visible from the source.

Fabian Huch (Jun 05 2025 at 13:00):

It does not because I'm getting different numbers ;)

Fabian Huch (Jun 05 2025 at 13:00):

No, I only want to count user-stated things

Michael Rothgang (Jun 05 2025 at 13:01):

One may argue that putting to_additive on a declaration is a way of user-stating results...

Fabian Huch (Jun 05 2025 at 13:02):

Sure. But it's not what would be ordinarily considered a definition in Mathematics, which is what I'm shooting for.

Michael Rothgang (Jun 05 2025 at 13:03):

Different question: if mathlib had a tactic to automatically deduce an order dual statement (i.e., prove a lemma about minima, and auto-generate the corresponding result about maxima), I presumably want to count this as a new result. (Such a tactic exists, but only in a pull request. It's not merged into mathlib yet.)

Michael Rothgang (Jun 05 2025 at 13:04):

My point being: counting the number of mathlib definitions/lemmas may be less meaningful than what you think. (I don't know what you think, of course.)

Fabian Huch (Jun 05 2025 at 13:05):

Well, I don't for this comparison. I want to have a methodology that can be applied to all ITPs reasonably well. Systems with derived specifications routinely prove hundreds of theorems for a single definition. I certainly don't want to count that.

Michael Rothgang (Jun 05 2025 at 13:05):

Mathlib has many "API" lemmas (random example: docs#fourier_coe_apply) --- which carry more than "no" content, but you wouldn't put those as independent theorems in a textbook.

Fabian Huch (Jun 05 2025 at 13:07):

Yeah, there are many theorems that you wouldn't state in a textbook. I don't think there is a way around that.

Michael Rothgang (Jun 05 2025 at 13:07):

I guess my question is what "reasonably well" means. How many derived/auto-generated lemmas a system has does not strike me as meaningful... is there a way to compare your systems by looking at selected "major theorems" and comparing how many of these are formalised in each system?

Michael Rothgang (Jun 05 2025 at 13:07):

A bit like Freek's 100 theorems list (and the proposed successor, the 1000+ theorems list).

Michael Rothgang (Jun 05 2025 at 13:08):

Perhaps that's too much work/not suitable here - but it might be a more meaningful metric.

Fabian Huch (Jun 05 2025 at 13:08):

Yes, it is one way at looking at the comparison.

Fabian Huch (Jun 05 2025 at 13:08):

The 100 list that is, sadly the 1000+ ~~never really took off (so far)~~ is not complete is what I meant to convey

Michael Rothgang (Jun 05 2025 at 13:10):

What are the odds the e.g. Isabelle/HOL community would rise to the challenge when you ask them if they have formalised any theorems in that list, and look at their archives? I agree that the list is rather incomplete; is there a way to gather more interest?

Fabian Huch (Jun 05 2025 at 13:13):

Perhaps yes, especially if there was tool support to do the annotation in the source text and derive the table from that. I would love to include this number in the comparison as well, but in any case it's orthogonal to the measurement I want to do here.

Michael Rothgang (Jun 05 2025 at 13:16):

Final question from me: are you aware of Mathlib 4.20? (If you need to make a cut-off and it's at mathlib 4.19, that also makes sense.)

Fabian Huch (Jun 05 2025 at 13:16):

Yes, and yes, but thank you for the hint(s).

Bryan Gin-ge Chen (Jun 05 2025 at 13:24):

Michael Rothgang said:

Bryan Gin-ge Chen I'm sure you know how often the webpage updates --- do you also happen to know how these two numbers are computed?

Yes, it's updated daily. They are computed from a list of declarations generated by doc-gen4. Here's a link to the data (warning, about ~43MB). For weird GitHub reasons, the extension is .bmp but the file is actually in JSON format.

Michael Rothgang (Jun 05 2025 at 13:27):

Ah, so this also counts auto-generated declarations?

Bryan Gin-ge Chen (Jun 05 2025 at 13:37):

I believe so, it doesn't look like we're filtering them out at any stage.

Floris van Doorn (Jun 05 2025 at 15:11):

I'm quite sure that on the website we count declarations where IsBlackListed (link) is false. So this will include automatically generated lemmas by simps or to_additive, but not internal lemmas such as equational lemmas for definitions or automatically abstracted subproofs of a definition.

Floris van Doorn (Jun 05 2025 at 15:14):

Note that you're not including instance in the words you're searching for. This is a bit tricky, since instance can be used to both construct data or to prove some proposition, and purely from the source code there is not an easy way to find out which one is which. I expect that most instances are "proofs".

I'm also not sure how we count instances for the website. Previously, all instances were definitions (i.e. stored as docs#Lean.DefinitionVal), but I'm not sure if that is still the case.

Floris van Doorn (Jun 05 2025 at 15:16):

irreducible_def is another way to write a def.
alias is a way to copy a declaration with to a different name. I think it's fair to exclude those (especially since many (most?) aliases are deprecations).

Fabian Huch (Jun 05 2025 at 15:17):

Thanks for the input; I specifically do not want to count abbreviations or instance proofs, but irreducable_def is certainly missing.

Floris van Doorn (Jun 05 2025 at 15:18):

Note: take care when counting! You do want to catch declarations written as e.g. @[irreducible] def, but not count theorem subset_def as a definition.

Floris van Doorn (Jun 05 2025 at 15:19):

Wait, why do you not want to count instance proofs? (In your first post you do explicitly mention that you do want to count them.)

Fabian Huch (Jun 05 2025 at 15:19):

I don't quite understand that: In the first example I see tokens irreducable and def, and in the second theorem and subset_def?

Fabian Huch (Jun 05 2025 at 15:20):

Floris van Doorn said:

Wait, why do you not want to count instance proofs? (In your first post you do explicitly mention that you do want to count them.)

Sorry if I wasn't clear, it is including axioms but excluding all of abbreviations, instances, ...

Floris van Doorn (Jun 05 2025 at 15:21):

Oh, I read your first post too quickly, you do mention it clearly.

Floris van Doorn (Jun 05 2025 at 15:26):

The decision to exclude instances is somewhat surprising to me, since some major mathematical theorems are instances in Lean, e.g. docs#instProperSpaceReal is (roughly) the Heine-Borel theorem. And this will skew the count with e.g. Mathcomp, which uses canonical structures instead of instances.

Floris van Doorn (Jun 05 2025 at 15:27):

Fabian Huch said:

I don't quite understand that: In the first example I see tokens irreducable and def, and in the second theorem and subset_def?

You will see e.g. src#HurwitzZeta.evenKernel or src#Set.subset_def in the source, and need to know that the second occurrence of def is not defining a new definition.

Fabian Huch (Jun 05 2025 at 15:29):

Floris van Doorn said:

The decision to exclude instances is somewhat surprising to me, since some major mathematical theorems are instances in Lean, e.g. docs#instProperSpaceReal is (roughly) the Heine-Borel theorem. And this will skew the count with e.g. Mathcomp, which uses canonical structures instead of instances.

I would expect that most are not, though? This is done mostly to be consistent with the analysis of other ITPs, but I'll consider adding instances.

Fabian Huch (Jun 05 2025 at 15:31):

Floris van Doorn said:

Fabian Huch said:

I don't quite understand that: In the first example I see tokens irreducable and def, and in the second theorem and subset_def?

You will see e.g. src#HurwitzZeta.evenKernel or src#Set.subset_def in the source, and need to know that the second occurrence of def is not defining a new definition.

I see what you mean. That shouldn't be a problem as I tokenize the text and see the entire ..._def token.

Floris van Doorn (Jun 05 2025 at 15:32):

Fabian Huch said:

I would expect that most are not, though? This is done mostly to be consistent with the analysis of other ITPs, but I'll consider adding instances.

Most of them are still "real" mathematical results, though most of them are trivial (which is also true for lemmas, btw).
Examples: the product of two groups is a group, a metric space defines a topological space, the (lexicographical) sum of well-orders is a well-order.

Jz Pan (Jun 05 2025 at 15:36):

Michael Rothgang said:

Mathlib has many "API" lemmas (random example: docs#fourier_coe_apply) --- which carry more than "no" content, but you wouldn't put those as independent theorems in a textbook.

Maybe we should implement a tagging system for results in mathlib: whether it's API, trivial, or has mathematical importance...

I don't know if LLM can do that well.

Fabian Huch (Jun 05 2025 at 15:36):

Floris van Doorn said:

Fabian Huch said:

I would expect that most are not, though? This is done mostly to be consistent with the analysis of other ITPs, but I'll consider adding instances.

Most of them are still "real" mathematical results, though most of them are trivial (which is also true for lemmas, btw).
Examples: the product of two groups is a group, a metric space defines a topological space, the (lexicographical) sum of well-orders is a well-order.

Sure. In that sense, many examples are "real" results as well.

Fabian Huch (Jun 05 2025 at 15:38):

Jz Pan said:

Michael Rothgang said:

Mathlib has many "API" lemmas (random example: docs#fourier_coe_apply) --- which carry more than "no" content, but you wouldn't put those as independent theorems in a textbook.

Maybe we should implement a tagging system for results in mathlib: whether it's API, trivial, or has mathematical importance...

I don't know if LLM can do that well.

That is certainly a good approach and done in other systems, but I would leave it to the author and not some LLM.

Floris van Doorn (Jun 05 2025 at 15:42):

Oh, another thing: you might want to remove all metaprogramming definitions, e.g. you probably don't want to count docs#ToAdditive.Config as a definition, especially not a mathematical one.
A decent (but definitely not perfect) approach is to exclude everything from certain subfolders (Tactic, Lean, Util, Testing (and maybe Control) from Mathlib)

Fabian Huch (Jun 05 2025 at 15:43):

I initially did that, but unfortunately it's not feasible to do for all ITP systems so I don't do it to be consistent.

Eric Wieser (Jun 05 2025 at 15:59):

fixed values/constants

What do you characterize this by?

Eric Wieser (Jun 05 2025 at 16:00):

Is docs#Real.pi a constant? Is docs#Polynomial.X ? docs#DualNumber.eps ?

Fabian Huch (Jun 05 2025 at 16:02):

Those are proper definitions - with 'fixed' I mean arbitrary constants without defining equations.

Fabian Huch (Jun 05 2025 at 16:05):

So in particular, everything declared via constant

Aaron Liu (Jun 05 2025 at 16:05):

Can you give an example of a "fixed value/constant"?

Fabian Huch (Jun 05 2025 at 16:05):

Yes, constant a : ℕ

Aaron Liu (Jun 05 2025 at 16:06):

Is that valid Lean code?

Fabian Huch (Jun 05 2025 at 16:07):

It was in Lean 3; not sure about Lean 4

Eric Wieser (Jun 05 2025 at 16:09):

It was a synonym for axiom

Kevin Buzzard (Jun 05 2025 at 22:55):

Are you also going to compute the derivatives of these counts? As well as "how big are these libraries" you could also rerun the experiment a month later and get an indication of how fast they are growing.

Fabian Huch (Jun 06 2025 at 07:32):

For this overview I am not planning to do that, no. This would be a different kind of experiment where smaller libraries should be considered as well as some of them are fast-growing, e.g. cubical Agda.

Last updated: Dec 20 2025 at 21:32 UTC