Zulip Chat Archive
Stream: Equational
Topic: Project language stats
Shreyas Srinivas (Apr 14 2025 at 09:00):
Also I think we crossed the 4000 file mark. Many of our files can be rather large too. Can someone get a line count of our project (both overall and lean only counts for comparison with Mathlib which has ~6500ish files).
Pietro Monticone (Apr 14 2025 at 11:50):
Language | Files | Blank | Comment | Code |
---|---|---|---|---|
Python | 2981 | 200,733 | 376,196 | 658,413 |
SVG | 5 | 1 | 31,677 | 213,111 |
Lean | 1307 | 14,723 | 8,182 | 140,193 |
Text | 87 | 3,561 | 0 | 93,144 |
CSV | 30 | 0 | 0 | 36,702 |
C | 10 | 250 | 263 | 30,516 |
Cython | 43 | 5,888 | 2,272 | 28,971 |
HTML | 38 | 7,145 | 4 | 26,999 |
C/C++ Header | 77 | 2,433 | 5,347 | 15,573 |
JavaScript | 17 | 2,279 | 668 | 13,856 |
JSON | 17 | 0 | 0 | 13,338 |
TeX | 54 | 1,342 | 92 | 6,705 |
Markdown | 92 | 424 | 0 | 3,962 |
Ruby | 18 | 407 | 153 | 1,964 |
YAML | 12 | 183 | 53 | 1,057 |
Fortran 90 | 55 | 126 | 86 | 924 |
Haskell | 11 | 137 | 39 | 919 |
CSS | 12 | 104 | 13 | 569 |
Fortran 77 | 22 | 26 | 50 | 383 |
Elm | 2 | 49 | 4 | 366 |
Bourne Shell | 10 | 58 | 90 | 267 |
reStructuredText | 2 | 74 | 1 | 216 |
XML | 2 | 1 | 0 | 186 |
Meson | 5 | 36 | 9 | 162 |
C++ | 1 | 13 | 14 | 143 |
TypeScript | 1 | 14 | 5 | 101 |
Fish Shell | 1 | 19 | 14 | 70 |
PowerShell | 3 | 19 | 12 | 63 |
C Shell | 1 | 13 | 7 | 35 |
INI | 3 | 5 | 0 | 34 |
TOML | 1 | 8 | 0 | 30 |
Nix | 1 | 0 | 0 | 18 |
Snakemake | 16 | 0 | 0 | 16 |
SCSS | 1 | 3 | 0 | 14 |
Scheme | 1 | 0 | 0 | 12 |
Fortran 95 | 1 | 0 | 1 | 4 |
DOS Batch | 1 | 2 | 4 | 3 |
Unity-Prefab | 1 | 0 | 0 | 2 |
Total | 4942 | 240,076 | 425,256 | 1,289,041 |
Shreyas Srinivas (Apr 14 2025 at 11:52):
lake build
shows 4004 files. Does this include all the dependency files as well?
Shreyas Srinivas (Apr 14 2025 at 11:52):
also where did we use haskell, fortran 77, fortran 90, and elm?
Pietro Monticone (Apr 14 2025 at 11:53):
No, it excludes everything in .lake
, .venv
, .vscode
, data
, .github
.
Pietro Monticone (Apr 14 2025 at 12:06):
If you look at the equational_theories
directory only you get the following:
Language | Files | Blank | Comment | Code |
---|---|---|---|---|
Lean | 1304 | 14,687 | 8,163 | 139,941 |
C | 4 | 131 | 96 | 28,678 |
Text | 17 | 2,992 | 0 | 11,036 |
Python | 30 | 855 | 531 | 3,785 |
Ruby | 10 | 284 | 85 | 1,359 |
Haskell | 11 | 137 | 39 | 919 |
Markdown | 14 | 161 | 0 | 411 |
Bourne Shell | 1 | 10 | 23 | 74 |
Scheme | 1 | 0 | 0 | 12 |
C/C++ Header | 2 | 3 | 0 | 10 |
XML | 1 | 1 | 0 | 5 |
JSON | 2 | 0 | 0 | 2 |
Total | 1397 | 19,261 | 8,937 | 186,232 |
Aaron Hill (Apr 14 2025 at 12:07):
Where are the other ~650k lines of python coming from?
Notification Bot (Apr 14 2025 at 12:07):
7 messages were moved here from #Equational > Crowdsourcing the proof of 1729 != 817 by Shreyas Srinivas.
Shreyas Srinivas (Apr 14 2025 at 12:08):
the scripts folder I suppose. I still don't see where we get any fortran, let alone three different versions of Fortran (77/90/95) from. Why does lake claim that there are 4004 files?
Mario Carneiro (Apr 14 2025 at 12:21):
what technique was used to get these numbers @Pietro Monticone ?
Pietro Monticone (Apr 14 2025 at 12:21):
I have just run cloc
.
Pietro Monticone (Apr 14 2025 at 12:24):
Running
open Lean Elab Command
run_cmd
let all := (← getEnv).allImportedModuleNames
dbg_trace all.size
in the build file I get the number of total files (project + imports not necessarily used): 5003.
Shreyas Srinivas (Apr 14 2025 at 12:25):
yeah, but I don't think lake build
counts the dependencies
Mario Carneiro (Apr 14 2025 at 12:25):
$ cloc . --exclude-dir=.lake,.git
1709 text files.
1695 unique files.
100 files ignored.
github.com/AlDanial/cloc v 1.90 T=2.04 s (788.9 files/s, 240296.7 lines/s)
-------------------------------------------------------------------------------
Language files blank comment code
-------------------------------------------------------------------------------
SVG 4 1 31677 213031
Lean 1307 14723 8182 140193
C 4 131 96 28678
JavaScript 9 2004 608 11648
JSON 14 0 0 10542
Python 52 1558 837 6769
TeX 52 1342 92 5607
Markdown 89 399 0 3861
Ruby 18 407 153 1964
HTML 9 231 4 1586
YAML 12 183 53 1057
Haskell 11 137 39 919
CSS 5 55 9 366
Elm 2 49 4 366
Bourne Shell 10 58 90 267
XML 2 1 0 186
TypeScript 1 14 5 101
TOML 1 8 0 30
Nix 1 0 0 18
PowerShell 2 7 9 17
Sass 1 3 0 14
Scheme 1 0 0 12
C/C++ Header 2 3 0 10
DOS Batch 1 2 4 3
-------------------------------------------------------------------------------
SUM: 1610 21316 41862 427245
-------------------------------------------------------------------------------
Mario Carneiro (Apr 14 2025 at 12:26):
I'm not getting any fortran
Mario Carneiro (Apr 14 2025 at 12:26):
the first line that seems likely to be an error is Elm
Pietro Monticone (Apr 14 2025 at 12:27):
It might be because of .gitignore
d stuff.
Shreyas Srinivas (Apr 14 2025 at 12:27):
The finite magma explorer uses Elm
Mario Carneiro (Apr 14 2025 at 12:27):
yeah actually all of these seem to be real hits, this is just an incredibly multi-lingual project
Pietro Monticone (Apr 14 2025 at 12:27):
And the app uses Haskell.
Mario Carneiro (Apr 14 2025 at 12:28):
ah, the scheme is a false hit (not sure what finsched.sch is but it's definitely not scheme; I guess vampire has some custom format for scheduling?)
Shreyas Srinivas (Apr 14 2025 at 12:30):
I can't find scheme files right away, but aren't there like a gazillion flavours of scheme
Shreyas Srinivas (Apr 14 2025 at 12:30):
More relevant to lean : where are the 4004 build targets of lake build coming from? can we list them explicitly without building everything?
Mario Carneiro (Apr 14 2025 at 12:31):
probably parts of mathlib are involved there
Mario Carneiro (Apr 14 2025 at 12:33):
I managed to confirm all of the languages listed except for the Scheme mis-classification
Pietro Monticone (Apr 14 2025 at 12:36):
Yes, the finsched.sch
file is considered Scheme by cloc
(see docs).
Mario Carneiro (Apr 14 2025 at 12:41):
do you have .f90 files in your local copy?
Pietro Monticone (Apr 14 2025 at 12:43):
Yes, but in the .venv
I stupidly forgot to exclude at the beginning (more precisely I forgot a ,
between .venv
and .vscode
in the --exclude-dir
list).
Numpy uses Fortran.
Shreyas Srinivas (Apr 14 2025 at 12:49):
I think even in terms of raw stats, we accomplished something spectacular here. While I am aware that a lot of the lean files are autogenerated, 1307 files is not a small number (although ungolfed and somewhat disorganised). Many of these files are massive. It has been 6 months more or less.
Pietro Monticone (Apr 14 2025 at 12:52):
I will golf the latest changes later today.
In the meantime, the next major objective is bumping to the latest stable toolchain (v4.18.0
). Any help would be greatly appreciated.
Shreyas Srinivas (Apr 14 2025 at 12:58):
All the errors I get right now when running lake update are docgen errors
Shreyas Srinivas (Apr 14 2025 at 12:58):
and lake is simply choosing to update to v4.18.0 even when the mathlib depedency rev is set to v4.15.0. I really wish there was a lake command that just updated the toolchain to a specified version.
Shreyas Srinivas (Apr 14 2025 at 13:07):
In fact it would be nicer still if we could say something lake pin v4.18.0
and lake did just that instead of forcing toolchain updates every time cache is called.
Mario Carneiro (Apr 14 2025 at 13:15):
We never finished #Equational > A final end-to-end theorem in Lean :(
Shreyas Srinivas (Apr 14 2025 at 13:37):
Oh. Wasn’t that because we couldn’t get native decide to be accepted by lean4checker
Shreyas Srinivas (Apr 14 2025 at 13:38):
FWIW, we could still finish it.
Last updated: May 02 2025 at 03:31 UTC