Zulip Chat Archive

Stream: condensed mathematics

Topic: Project statistics


David Michael Roberts (Mar 15 2021 at 06:51):

I thought it might be interesting to keep track of how the LTE is going as far as length of code produced. I appreciate it may not be easy with material going into mathlib, so maybe even just tracking the main self-contained parts is reasonable. Is there any automatic way to do this? Or are the parts all moving too fast at present?

Johan Commelin (Mar 15 2021 at 07:12):

@David Michael Roberts Does https://github.com/leanprover-community/lean-liquid/pulse give you some of the stats you want?

Johan Commelin (Mar 15 2021 at 07:13):

You can also run git ls-files | xargs wc to get very rough line-count statistics

David Michael Roberts (Mar 15 2021 at 08:46):

I'd checked that, but it only goes back so far. I found this online tool that seems to a reasonable job, giving a current snapshot only: https://codetabs.com/count-loc/count-loc-online.html (so no historical added/removed)

Johan Commelin (May 26 2021 at 07:27):

I'm giving a talk about LTE later today, so I decided to gather a bit more statistics.

  • Scholze's proof: 12p(+3pBD resolutions)\approx 12p \quad (+ 3p \quad\text{BD resolutions})
  • Blueprint: (2000l,12000w,87000c)21p\approx (2\,000l, 12\,000w, 87\,000c) \approx 21p
  • Lean code: (20000l,120000w,1060000c)\approx (20\,000l, 120\,000w, 1\,060\,000c)
  • strip-lean.tar.gz =512798b= 512\,798b \quad (nofor_mathlib/)
  • strip-blue.tar.gz =23846b= 23\,846b
  • De Bruijn factor =512898b23846b21.5= \frac{512\,898b}{23\,846b} \approx 21.5

Scott Morrison (May 26 2021 at 07:29):

Which you interpret to mean ...? :-)

Johan Commelin (May 26 2021 at 07:31):

I don't really interpret. That's the job of my audience...

Johan Commelin (May 26 2021 at 07:32):

It seems that the blueprint compresses better than the lean code. So I guess there's more redundancy in the blueprint? :stuck_out_tongue_wink:

Johan Commelin (May 26 2021 at 07:34):

(Edit: I forgot to count 8.17 and 8.19, so the proof in Analytic.pdf is more like 12p.)

Riccardo Brasca (May 26 2021 at 07:38):

Is the talk public?

Johan Commelin (May 26 2021 at 07:39):

http://individual.utoronto.ca/groechenig/K.html

Scott Morrison (May 26 2021 at 07:46):

Johan is on at . Too late for me. :-)

Johan Commelin (May 26 2021 at 07:47):

Sorry :shrug: I'm also not sure it's recorded. But ey, you guys won't here anything new anyways.

Sebastien Gouezel (May 26 2021 at 07:52):

How come the Lean text only compresses by a factor of two? It's just text, right, with a lot of words coming back, so it should be much better. Or has it something to do with unicode? I guess a more efficient compression engine would give different results.

Johan Commelin (May 26 2021 at 08:03):

I'm not a compression expert...

Johan Commelin (May 26 2021 at 08:09):

-rw-rw-r-- 1 jmc jmc  21764 May 26 08:08 strip-blue.tar.xz
-rw-rw-r-- 1 jmc jmc 468512 May 26 08:08 strip-lean.tar.xz

Johan Commelin (May 26 2021 at 08:13):

lz4 only makes things worse

-rw-rw-r-- 1 jmc jmc  37870 May 26 08:12 strip-blue.tar.lz4
-rw-rw-r-- 1 jmc jmc 619758 May 26 08:12 strip-lean.tar.lz4

Last updated: Dec 20 2023 at 11:08 UTC