Zulip Chat Archive

Stream: general

Topic: Accurate description of `mathlib`'s explosive growth


Jason Rute (Oct 03 2021 at 00:31):

For a paper, we want to express that mathlib is growing quite rapidly, both as an open source project and/or as a theorem proving project. In a previous paper we had wrote

Lean’s mathlib is one of the most active open-source software projects in the world, achieving explosive growth in recent years.

That was rightly criticized as [citation needed], and even if we replace "open-source software" with "theorem proving project", I'm worried it would come across as an unsourced exaggeration to say other ITP communities. Any suggestions on an accurate replacement wording?

Jason Rute (Oct 03 2021 at 00:35):

If say mathlib is truly the largest ITP library by some standard (bytes, numbers of theorems, number of contributors) we could put that, but I don't want to just make this up.

Scott Morrison (Oct 03 2021 at 00:42):

I think it's hard here. I would argue that it is the largest mutually-compatible ITP library, but that's a hard thing to quantify.
Certainly Coq, Isabelle, Mizar, ...? have more code out there still than Lean. But I've had the following conversation several times: "Does Coq have X? Yes! Does Coq have Y? Yes! Can I use them at the same time? Err... It's complicated, but no."

Jason Rute (Oct 03 2021 at 00:49):

Is Isabelle's AFP mutually compatible? Or Mizar's for that matter? (HOL-Light's core library is but that library is much smaller if I remember correctly, but it gets much larger if you add the flyspeck project.)

Scott Morrison (Oct 03 2021 at 00:51):

I would love to know the answers for AFP and Mizar!

Jason Rute (Oct 03 2021 at 01:02):

And Metamath also as I think about it?

Jason Rute (Oct 03 2021 at 01:08):

Statistics:

Jason Rute (Oct 03 2021 at 01:12):

I think AFP is much larger than Lean and still growing slightly faster at least in lines of code per year.

Yakov Pechersky (Oct 03 2021 at 01:13):

If you're describing growth, perhaps the rate of addition of theorems to Freek's list over a recent period of time could be one measure.

Jason Rute (Oct 03 2021 at 01:13):

That is complicated to explain succinctly.

Jason Rute (Oct 03 2021 at 01:58):

I think I'm going to go with hard numbers and no comparisons to outside projects:

Lean's mathlib is an exponentially growing open source library of formal mathematics which has nearly doubled in size each year for the past four years.

Scott Morrison (Oct 03 2021 at 02:27):

Let's not use Freek's list. It has terrible correlation with "interesting". :-)

Jason Rute (Oct 03 2021 at 14:33):

Scott Morrison said:

I would love to know the answers for AFP and Mizar!

@Mario Carneiro Do you know if set.mm is self consistent like mathlib is?
@Jasmin Blanchette Do you know about AFP? Same question.
Does anyone know about Mizar's MML?

Patrick Massot (Oct 03 2021 at 14:37):

@Sebastien Gouezel probably knows about AFP.

Patrick Massot (Oct 03 2021 at 14:38):

It would be nice to get the analogue of our undergrad math list for Isabelle, but it would be a lot of work.

Sebastien Gouezel (Oct 03 2021 at 14:56):

In general, there is no reason why AFP entries should be compatible, but there is no reason why they should be incompatible either, if they don't step on each other toes. On the other hand, develoments in one single area tend to build on one another, and so they are most often compatible. But note that AFP is mostly made of computer science developments, for which the problematics are not the same (you're more often building things one next to each other, instead of one on top of the other as in mathematics).

Patrick Massot (Oct 03 2021 at 14:57):

Could you mix topology and algebra in Isabelle like we do in mathlib?

Sebastien Gouezel (Oct 03 2021 at 14:59):

The issue is not even in the AFP here, it's in the main Isabelle repo. And the answer is mostly no, because topology is built using type classes, while algebra is built using locales and carriers (otherwise, you couldn't work nicely with subgroups). As far as I understand, people are not really happy with the main algebra library, and think it should probably be rewritten, but it's such a huge refactor that it has never been done.

Jason Rute (Oct 03 2021 at 15:02):

Are they logically compatible (i.e. consistent), but not usefully compatible in practice?

Wenda Li (Oct 03 2021 at 15:44):

They are logically compatible, but have unpleasant duplications due to the ultilisation of both locale and typeclass. For example, our most used polynomial development is based on typeclasses but in Isabelle algebra there is another polynomial development based on locales for maximum flexibility -- I know there are logically equivalent but mixing both version (of polynomials) in a single development may require non-trivial effort to show their equivalence.

I am not very familiar with the topology in Lean, but in Isabelle there are, again, two versions of topology and one is based on typeclass...

Results in algebra in Isabelle are generally easy to be imported to typeclass-based analysis (not sure about what kind of topology you mean here, but I can say for analysis). An example is the devepment of Vector Spaces (based on HOL-Algebra, https://www.isa-afp.org/entries/VectorSpace.html) is being used in the development of the Expressiveness of Deep Learning (https://www.isa-afp.org/entries/Deep_Learning.html), which is heavily based on typeclass analysis.

Kevin Buzzard (Oct 03 2021 at 17:23):

@Wenda Li I guess Patrick is talking about things like the theory of topological rings, where you need to be able to do algebra and topology at the same time on one object. These appear everywhere in the perfectoid space work for example

Mario Carneiro (Oct 03 2021 at 20:33):

Jason Rute said:

Scott Morrison said:

I would love to know the answers for AFP and Mizar!

Mario Carneiro Do you know if set.mm is self consistent like mathlib is?

Yes. In fact I took a lot of lessons from the design of set.mm at the beginning of mathlib, and one of the more important ones was that centralized maintenance is crucial for large scale proof development, even though this goes against CS best practice.

set.mm is almost pathologically centralized. There is one file (the eponymous set.mm), which is a million lines long plaintext file. You need a good editor to handle it, but it has lots of sectioning in it, styled like a really long and detailed treatise on everything. It also has tons of comments, and one of the contribution requirements is that every theorem gets a doc string. If I had to guess I would say about 30% of the lines are comments.

There is a section at the end (about 20% of the file) which consists of user "mathboxes", which are basically a place where contributors have additional editorial leeway to design their own developments. However, mathboxes are not supposed to use each others' theorems, and if user A wants to use B's theorem then they should coordinate to get it moved into the main part of set.mm, which plays a role roughly similar to mathlib PRs. But at all times the whole file checks, including the mathboxes.

Jasmin Blanchette (Oct 04 2021 at 08:33):

@Jason Rute You already posted the AFP stats. I don't have any further information about it apart from our outdated paper "Mining the Archive of Formal Proofs".

Johan Commelin (Oct 05 2021 at 14:16):

@Jason Rute Btw, I think that https://leanprover-community.github.io/mathlib_stats.html no longer supports the claim that mathlib grows exponentially.

Johan Commelin (Oct 05 2021 at 14:17):

We saw massive growth during the 2nd covid wave. But what seemed exponential now seems to be quite linear.

Jason Rute (Feb 11 2022 at 01:49):

Scott Morrison said:

I would love to know the answers for AFP and Mizar!

Ok, I finally have stats on all of the major libraries (except Coq, which doesn't really have a central math library, and HOL light which I can't find numbers but I think it is much smaller than the rest): https://proofassistants.stackexchange.com/questions/141/where-can-i-find-lists-of-theorems-that-have-been-verified/315#315


Last updated: Dec 20 2023 at 11:08 UTC