Zulip Chat Archive
Stream: general
Topic: Plots of mathlib4 and lean4 lines of code over time
Joscha Mennicken (Aug 02 2024 at 23:58):
Hi, I recently wrote a small tool to plot lines of code based on git blame
and thought mathlib4 would be a nice benchmark. There are two (interactive) graphs (that don't work well on mobile, sorry):
- Lines of mathlib4 code over time, split up by the year they were written
- Lines of mathlib4 code over time, split up by author (I limited the graph to 50 authors but it's still pretty unwieldy.)
The commits are roughly ordered by their committer date, but inside each month they are ordered topologically and spread out to cover the entire month evenly. This is also the reason why the graph seems to flatten out around the beginning of the current month - there's just not many commits yet.
Controls: Click and drag or scroll to zoom. Double-click to fully zoom back out. Click a commit to pin its info to the right, for copy-pasting. Click on the legend to hide/show individual years or authors. Ctrl-click to show only a specific year or author.
Edward van de Meent (Aug 03 2024 at 09:12):
big thanks to our greatest contributor Misc. Authors for making the project possible :grinning_face_with_smiling_eyes:
Edward van de Meent (Aug 03 2024 at 09:18):
i'm curious, is there a significant change in policy/workflow that caused the slowing of growth around the start of july 2023?
Yaël Dillies (Aug 03 2024 at 09:18):
The port ended!
Edward van de Meent (Aug 03 2024 at 09:18):
ahhh
Edward van de Meent (Aug 03 2024 at 09:21):
it looks like the sudden drop is the removing of #align
-like lines, for those wondering.
Joscha Mennicken (Aug 03 2024 at 11:54):
I've produced the same set of graphs for the lean4 repository (excluding stage0 as best as I could), and they look a lot more exciting :P. I've also updated the mathlib4 author graph to use the author names instead of email addresses, and to count the amount of hidden misc authors.
- Lines of lean4 code over time, split up by the year they were written
- Lines of lean4 code over time, split up by author
If you're wondering about the vertical jumps from mid-2021 to mid-2023, those are due to lake and lean4 being developed in parallel but separate repos/branches, and they go away when lake is merged into lean.
Ruben Van de Velde (Aug 03 2024 at 11:56):
Perhaps for mathlib it would be more interesting to have plots ignoring everything before the end of the port
Joscha Mennicken (Aug 03 2024 at 12:21):
According to https://leanprover-community.github.io/mathlib-port-status/, the port was declared complete on 2023-07-16, so here are the mathlib4 graphs but every commit before 2023-07-16 is ignored:
- Lines of mathlib4 code since 2023-07-16, split up by the year they were written (somewhat boring)
- Lines of mathlib4 code since 2023-07-16, split up by author (now Yaël Dillies is at the bottom)
By the way, authors are sorted roughly by their area below the graph.
Yaël Dillies (Aug 03 2024 at 12:29):
Yet I don't beat Misc. :pensive:
Scott Carnahan (Aug 03 2024 at 15:16):
These were very interesting charts! I got credit (or blame) for more lines than I had expected. Looking a little more closely, it turns out that about half of them came from splitting up big files, i.e., other people's hard work.
Joscha Mennicken (Aug 03 2024 at 15:32):
I find it interesting how mathlib seems to be growing at a fairly consistent rate compared to other repos. The size and complexity doesn't seem to be slowing it down.
Patrick Massot (Aug 03 2024 at 19:51):
Scott, that kind of phenomenon is part of why we stopped having such statistics on the community website. And of course the port completely changed those statistics.
Joscha Mennicken (Jan 11 2025 at 22:10):
I've updated the graph of lines of mathlib4 code over time, split up by the year they were written and created a version without the refactor jump at 2024-07-19 (the lines before the refactor jump are scaled down linearly). See the original message for why the last month may look too flat.
Eric Wieser (Jan 11 2025 at 23:55):
I would guess the refactor jump can be addressed by including mathlib dependencies (following the shas in the lake-manifest.json)
Eric Wieser (Jan 11 2025 at 23:55):
That is, it represents code being upstreamed
Kim Morrison (Jan 12 2025 at 01:01):
Oh, I assumed it was removing #align
.
Joscha Mennicken (Jan 12 2025 at 01:03):
Yes, that's the cause of the jump (commit 465f26fde59aa31d051ef0b6db1fb34476802b64) as you can see if you zoom into the graph enough.
Last updated: May 02 2025 at 03:31 UTC