Zulip Chat Archive

Stream: general

Topic: mathlib statistics


view this post on Zulip Rob Lewis (Jun 27 2020 at 12:16):

June is now a record setting month for mathlib! 70% more commits this month than our pre-corona high. Our 2020 total is already very close to our 2019 total. And there's a very good chance we'll have 300k lines of code in the repository before the end of the month.

view this post on Zulip Kevin Buzzard (Jun 28 2020 at 00:18):

My blog has also had has many hits this year as in the whole of last year

view this post on Zulip Kevin Buzzard (Jun 28 2020 at 00:23):

43000 views and 20000 visitors this year. My post on mathematics in type theory has already had as many views as my viraly fashionable mathematics post. These posts are both just basically adverts for the lean and mathlib community. I'm really hoping that as the API gets better we'll be able to really start attracting mathematicians

view this post on Zulip Rob Lewis (Jul 01 2020 at 08:34):

Yesterday was a big day! We closed out June by hitting a bunch of milestones. For one, we broke 300k loc. For another, the number of commits in June was more than double the pre-corona high. And hafway into 2020, we have more commits for the year than we did in all of 2019.

view this post on Zulip Patrick Massot (Jul 01 2020 at 08:41):

Let's hope we'll have another lockdown to migrate mathlib to Lean 4. We'll need that kind of productivity.

view this post on Zulip Johan Commelin (Jul 01 2020 at 09:00):

And we didn't even count the commits to the community fork.

view this post on Zulip Heather Macbeth (Oct 27 2020 at 18:05):

On the mathlib statistics page:
https://leanprover-community.github.io/mathlib_stats.html
there's a running total of the number of definitions and theorems and meta declarations in mathlib. Could there also be a running total of the number of people who have contributed to mathlib?

view this post on Zulip Yury G. Kudryashov (Oct 27 2020 at 18:06):

https://github.com/leanprover-community/mathlib/graphs/contributors

view this post on Zulip Eric Wieser (Oct 27 2020 at 18:07):

Pretty sure those never show more than 99 contributors

view this post on Zulip Yury G. Kudryashov (Oct 27 2020 at 18:07):

Sorry, this page doesn't show total

view this post on Zulip Heather Macbeth (Oct 27 2020 at 18:07):

Yes (there's also a list like this further down the website page
https://leanprover-community.github.io/mathlib_stats.html
). But I'm suggesting adding the absolute number to the top-line total. Just for marketing!

view this post on Zulip Yury G. Kudryashov (Oct 27 2020 at 18:07):

https://github.com/leanprover-community/mathlib/ says 119

view this post on Zulip Rob Lewis (Oct 27 2020 at 18:09):

It's slightly trickier than you'd expect. The numbers at the top come from doc-gen, at the time the website is generated. The stats below come from a JS file, which is generated once a day by gitstats, and you get the most recent numbers whenever you load the website.

view this post on Zulip Rob Lewis (Oct 27 2020 at 18:09):

Presumably gitstats can give us the contributor count.

view this post on Zulip Rob Lewis (Oct 27 2020 at 18:10):

But it won't be static at the time the website is generated, we'd have to update the count with JS or something.

view this post on Zulip Kevin Buzzard (Oct 27 2020 at 18:10):

There are certainly Imperial undergrads who contributed in 2018 and are mentioned in the first few lines of Lean files but who never made a PR (e.g. they were working in a small group together, and one PR'ed).

view this post on Zulip Rob Lewis (Oct 27 2020 at 18:11):

Kevin Buzzard said:

There are certainly Imperial undergrads who contributed in 2018 and are mentioned in the first few lines of Lean files but who never made a PR (e.g. they were working in a small group together, and one PR'ed).

You beat me to it, this is the second reason it's tricky. The "real" count is something like 15-20 more than what GitHub says.

view this post on Zulip Heather Macbeth (Oct 27 2020 at 18:11):

But this can be addressed, right? Just add 15 :)

view this post on Zulip Rob Lewis (Oct 27 2020 at 18:11):

And there are all kinds of ambiguities in how you count this.

view this post on Zulip Heather Macbeth (Oct 27 2020 at 18:12):

Frankly, I don't think it matters if the count is exactly right. What would be nice would be to have a count which is reasonable, approximately correct, and keeps going up.

view this post on Zulip Gabriel Ebner (Oct 27 2020 at 18:12):

I wouldn't use the git statistics, is sgouezel <urkud@urkud.name> a contributor? Counting the number of github usernames who made a PR seems more robust.

view this post on Zulip Yury G. Kudryashov (Oct 27 2020 at 18:13):

We have a mailmap file

view this post on Zulip Eric Wieser (Oct 27 2020 at 18:13):

You can use a .mailmap to merge together emails from the same github user

view this post on Zulip Rob Lewis (Oct 27 2020 at 18:13):

Gabriel Ebner said:

I wouldn't use the git statistics, is sgouezel <urkud@urkud.name> a contributor? Counting the number of github usernames who made a PR seems more robust.

I think this is what the GitHub stats do.

view this post on Zulip Bryan Gin-ge Chen (Oct 27 2020 at 18:13):

I've been maintaining a mailmap which takes care of sgouezel <urkud@urkud.name>: https://github.com/leanprover-community/mathlib-mailmap (maintainers only so that spam crawlers won't immediately get all our email addresses)

view this post on Zulip Rob Lewis (Oct 27 2020 at 18:13):

Oh, wait, maybe we can use the PyGithub API to get a contributor count as part of the website build?

view this post on Zulip Eric Wieser (Oct 27 2020 at 18:14):

There are certainly Imperial undergrads who contributed in 2018 and are mentioned in the first few lines of Lean files but who never made a PR

As a hack, you could add their names to the .mailmap, and stick co-authored-by in the commit message with their names

view this post on Zulip Heather Macbeth (Oct 27 2020 at 18:14):

Or just use the algorithm, "count the number of entries in the Authors table".

view this post on Zulip Heather Macbeth (Oct 27 2020 at 18:15):

Sine that is maintained by hand by @Bryan Gin-ge Chen to be approximately duplication-free.

view this post on Zulip Eric Wieser (Oct 27 2020 at 18:15):

(maintainers only so that spam crawlers won't immediately get all our email addresses)

arguably this is a lost cause, any crawler clever enough to crawl github for email addresses probably knows they can get the emails from downloading the repo. I can get the email for any given PR by appending .patch to the URl

view this post on Zulip Bryan Gin-ge Chen (Oct 27 2020 at 18:15):

Yeah, that's true.

view this post on Zulip Yury G. Kudryashov (Oct 27 2020 at 18:17):

Some crawlers can browse the Internet randomly and sometime come to a link to .mailmap

view this post on Zulip Rob Lewis (Oct 27 2020 at 18:26):

Is it better to put this number at the top of https://leanprover-community.github.io/mathlib_stats.html (and change the "Declaration Counts" header) or down by the authorship table/graphs?

view this post on Zulip Heather Macbeth (Oct 27 2020 at 18:27):

I suggest at the top (possibly even replacing "Meta Declarations")

view this post on Zulip Heather Macbeth (Oct 27 2020 at 18:28):

With no disrespect intended to meta declarations ... :)

view this post on Zulip Heather Macbeth (Oct 27 2020 at 18:28):

Or next to it, if there's space for 4 entries.

view this post on Zulip Eric Wieser (Oct 27 2020 at 18:29):

Or fold meta declarations under definitions as 12345 (+ 678 meta)

view this post on Zulip Heather Macbeth (Oct 27 2020 at 18:29):

New heading: "mathlib by the numbers" (I like this less on second thought), or "at a glance", or ...

view this post on Zulip Eric Wieser (Oct 27 2020 at 18:29):

Since lots of non-meta definitions are still useful in meta code anyway

view this post on Zulip Rob Lewis (Oct 27 2020 at 18:30):

https://github.com/leanprover-community/leanprover-community.github.io/pull/149

view this post on Zulip Heather Macbeth (Oct 27 2020 at 18:31):

Awesome!

view this post on Zulip Eric Wieser (Oct 27 2020 at 18:31):

Some other interesting subsets of definitions worth counting might be:

  • Types
  • type classes
  • noncomputable vs computable
  • instances

view this post on Zulip Rob Lewis (Oct 27 2020 at 18:31):

image.png

view this post on Zulip Heather Macbeth (Oct 27 2020 at 18:46):

The stats count makes for interesting reading. Mathlib has doubled in size (from 200k to 400k lines of code) in the last year?

view this post on Zulip Johan Commelin (Oct 27 2020 at 18:46):

That's what a lockdown does for you

view this post on Zulip Yury G. Kudryashov (Oct 27 2020 at 18:47):

Note that we moved some code from stdlib

view this post on Zulip Heather Macbeth (Oct 27 2020 at 18:48):

I was going to say, and 40% of those new lines of code are from @Yury G. Kudryashov -- but in fact since he refactors so much, "only" 20% :)

view this post on Zulip Yury G. Kudryashov (Oct 27 2020 at 18:49):

3.5K lines were moved from stdlib in #2697

view this post on Zulip Heather Macbeth (Oct 27 2020 at 18:50):

Compared to the other growth, that's a rounding error!

view this post on Zulip Kevin Lacker (Oct 27 2020 at 19:04):

hopefully the work required to port to Lean 4 is not a linear function of the lines of code in mathlib....

view this post on Zulip Mario Carneiro (Oct 27 2020 at 19:05):

it is, although the constant is probably small

view this post on Zulip Bryan Gin-ge Chen (Oct 27 2020 at 19:08):

A lot will depend on how quickly Lean 3 users can get trained in Lean 4. So those who get proficient quickly should get ready to write lots of documentation and tutorials.

view this post on Zulip Yury G. Kudryashov (Oct 27 2020 at 19:11):

I expect that the main issue is the tactic code (probably because I don't feel proficient in writing tactics).

view this post on Zulip Yury G. Kudryashov (Oct 27 2020 at 19:12):

I mean, the fact that I'm not proficient in writing meta code affects my impression about relative difficulty of migrating meta vs non-meta code.

view this post on Zulip Yury G. Kudryashov (Oct 27 2020 at 19:13):

I wrote a sentence, then realized that it was ambiguous.


Last updated: May 12 2021 at 23:13 UTC