Zulip Chat Archive

Stream: Equational

Topic: Project language stats


Shreyas Srinivas (Apr 14 2025 at 09:00):

Also I think we crossed the 4000 file mark. Many of our files can be rather large too. Can someone get a line count of our project (both overall and lean only counts for comparison with Mathlib which has ~6500ish files).

Pietro Monticone (Apr 14 2025 at 11:50):

Language Files Blank Comment Code
Python 2981 200,733 376,196 658,413
SVG 5 1 31,677 213,111
Lean 1307 14,723 8,182 140,193
Text 87 3,561 0 93,144
CSV 30 0 0 36,702
C 10 250 263 30,516
Cython 43 5,888 2,272 28,971
HTML 38 7,145 4 26,999
C/C++ Header 77 2,433 5,347 15,573
JavaScript 17 2,279 668 13,856
JSON 17 0 0 13,338
TeX 54 1,342 92 6,705
Markdown 92 424 0 3,962
Ruby 18 407 153 1,964
YAML 12 183 53 1,057
Fortran 90 55 126 86 924
Haskell 11 137 39 919
CSS 12 104 13 569
Fortran 77 22 26 50 383
Elm 2 49 4 366
Bourne Shell 10 58 90 267
reStructuredText 2 74 1 216
XML 2 1 0 186
Meson 5 36 9 162
C++ 1 13 14 143
TypeScript 1 14 5 101
Fish Shell 1 19 14 70
PowerShell 3 19 12 63
C Shell 1 13 7 35
INI 3 5 0 34
TOML 1 8 0 30
Nix 1 0 0 18
Snakemake 16 0 0 16
SCSS 1 3 0 14
Scheme 1 0 0 12
Fortran 95 1 0 1 4
DOS Batch 1 2 4 3
Unity-Prefab 1 0 0 2
Total 4942 240,076 425,256 1,289,041

Shreyas Srinivas (Apr 14 2025 at 11:52):

lake build shows 4004 files. Does this include all the dependency files as well?

Shreyas Srinivas (Apr 14 2025 at 11:52):

also where did we use haskell, fortran 77, fortran 90, and elm?

Pietro Monticone (Apr 14 2025 at 11:53):

No, it excludes everything in .lake, .venv, .vscode, data, .github.

Pietro Monticone (Apr 14 2025 at 12:06):

If you look at the equational_theories directory only you get the following:

Language Files Blank Comment Code
Lean 1304 14,687 8,163 139,941
C 4 131 96 28,678
Text 17 2,992 0 11,036
Python 30 855 531 3,785
Ruby 10 284 85 1,359
Haskell 11 137 39 919
Markdown 14 161 0 411
Bourne Shell 1 10 23 74
Scheme 1 0 0 12
C/C++ Header 2 3 0 10
XML 1 1 0 5
JSON 2 0 0 2
Total 1397 19,261 8,937 186,232

Aaron Hill (Apr 14 2025 at 12:07):

Where are the other ~650k lines of python coming from?

Notification Bot (Apr 14 2025 at 12:07):

7 messages were moved here from #Equational > Crowdsourcing the proof of 1729 != 817 by Shreyas Srinivas.

Shreyas Srinivas (Apr 14 2025 at 12:08):

the scripts folder I suppose. I still don't see where we get any fortran, let alone three different versions of Fortran (77/90/95) from. Why does lake claim that there are 4004 files?

Mario Carneiro (Apr 14 2025 at 12:21):

what technique was used to get these numbers @Pietro Monticone ?

Pietro Monticone (Apr 14 2025 at 12:21):

I have just run cloc.

Pietro Monticone (Apr 14 2025 at 12:24):

Running

open Lean Elab Command
run_cmd
  let all := ( getEnv).allImportedModuleNames
  dbg_trace all.size

in the build file I get the number of total files (project + imports not necessarily used): 5003.

Shreyas Srinivas (Apr 14 2025 at 12:25):

yeah, but I don't think lake build counts the dependencies

Mario Carneiro (Apr 14 2025 at 12:25):

$ cloc . --exclude-dir=.lake,.git
    1709 text files.
    1695 unique files.
     100 files ignored.

github.com/AlDanial/cloc v 1.90  T=2.04 s (788.9 files/s, 240296.7 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
SVG                              4              1          31677         213031
Lean                          1307          14723           8182         140193
C                                4            131             96          28678
JavaScript                       9           2004            608          11648
JSON                            14              0              0          10542
Python                          52           1558            837           6769
TeX                             52           1342             92           5607
Markdown                        89            399              0           3861
Ruby                            18            407            153           1964
HTML                             9            231              4           1586
YAML                            12            183             53           1057
Haskell                         11            137             39            919
CSS                              5             55              9            366
Elm                              2             49              4            366
Bourne Shell                    10             58             90            267
XML                              2              1              0            186
TypeScript                       1             14              5            101
TOML                             1              8              0             30
Nix                              1              0              0             18
PowerShell                       2              7              9             17
Sass                             1              3              0             14
Scheme                           1              0              0             12
C/C++ Header                     2              3              0             10
DOS Batch                        1              2              4              3
-------------------------------------------------------------------------------
SUM:                          1610          21316          41862         427245
-------------------------------------------------------------------------------

Mario Carneiro (Apr 14 2025 at 12:26):

I'm not getting any fortran

Mario Carneiro (Apr 14 2025 at 12:26):

the first line that seems likely to be an error is Elm

Pietro Monticone (Apr 14 2025 at 12:27):

It might be because of .gitignored stuff.

Shreyas Srinivas (Apr 14 2025 at 12:27):

The finite magma explorer uses Elm

Mario Carneiro (Apr 14 2025 at 12:27):

yeah actually all of these seem to be real hits, this is just an incredibly multi-lingual project

Pietro Monticone (Apr 14 2025 at 12:27):

And the app uses Haskell.

Mario Carneiro (Apr 14 2025 at 12:28):

ah, the scheme is a false hit (not sure what finsched.sch is but it's definitely not scheme; I guess vampire has some custom format for scheduling?)

Shreyas Srinivas (Apr 14 2025 at 12:30):

I can't find scheme files right away, but aren't there like a gazillion flavours of scheme

Shreyas Srinivas (Apr 14 2025 at 12:30):

More relevant to lean : where are the 4004 build targets of lake build coming from? can we list them explicitly without building everything?

Mario Carneiro (Apr 14 2025 at 12:31):

probably parts of mathlib are involved there

Mario Carneiro (Apr 14 2025 at 12:33):

I managed to confirm all of the languages listed except for the Scheme mis-classification

Pietro Monticone (Apr 14 2025 at 12:36):

Yes, the finsched.sch file is considered Scheme by cloc (see docs).

Mario Carneiro (Apr 14 2025 at 12:41):

do you have .f90 files in your local copy?

Pietro Monticone (Apr 14 2025 at 12:43):

Yes, but in the .venv I stupidly forgot to exclude at the beginning (more precisely I forgot a , between .venv and .vscode in the --exclude-dir list).

Numpy uses Fortran.

Shreyas Srinivas (Apr 14 2025 at 12:49):

I think even in terms of raw stats, we accomplished something spectacular here. While I am aware that a lot of the lean files are autogenerated, 1307 files is not a small number (although ungolfed and somewhat disorganised). Many of these files are massive. It has been 6 months more or less.

Pietro Monticone (Apr 14 2025 at 12:52):

I will golf the latest changes later today.

In the meantime, the next major objective is bumping to the latest stable toolchain (v4.18.0). Any help would be greatly appreciated.

Shreyas Srinivas (Apr 14 2025 at 12:58):

All the errors I get right now when running lake update are docgen errors

Shreyas Srinivas (Apr 14 2025 at 12:58):

and lake is simply choosing to update to v4.18.0 even when the mathlib depedency rev is set to v4.15.0. I really wish there was a lake command that just updated the toolchain to a specified version.

Shreyas Srinivas (Apr 14 2025 at 13:07):

In fact it would be nicer still if we could say something lake pin v4.18.0 and lake did just that instead of forcing toolchain updates every time cache is called.

Mario Carneiro (Apr 14 2025 at 13:15):

We never finished #Equational > A final end-to-end theorem in Lean :(

Shreyas Srinivas (Apr 14 2025 at 13:37):

Oh. Wasn’t that because we couldn’t get native decide to be accepted by lean4checker

Shreyas Srinivas (Apr 14 2025 at 13:38):

FWIW, we could still finish it.


Last updated: May 02 2025 at 03:31 UTC