Zulip Chat Archive

Stream: general

Topic: longest file


Sebastien Gouezel (Oct 15 2021 at 18:43):

Is there a magical grep command to find the longest files in mathlib? I just needed to add a lemma to topological_space.algebra.ordered.basic, but then I realized it was already 3870 lines long...

Yury G. Kudryashov (Oct 15 2021 at 18:48):

I was going to split it. I'll try to do it tonight.

Kyle Miller (Oct 15 2021 at 19:12):

Longest files by number of lines:

$ find . -iname '*.lean' -exec wc -l {} + | sort -rn | head -n 20
  645859 total
    5138 ./data/list/basic.lean
    3869 ./topology/algebra/ordered/basic.lean
    3079 ./data/finset/basic.lean
    3062 ./measure_theory/measure/measure_space.lean
    2969 ./analysis/calculus/fderiv.lean
    2961 ./analysis/calculus/times_cont_diff.lean
    2889 ./data/set/basic.lean
    2777 ./order/filter/basic.lean
    2741 ./data/finsupp/basic.lean
    2684 ./data/buffer/parser/basic.lean
    2614 ./linear_algebra/basic.lean
    2556 ./data/multiset/basic.lean
    2539 ./measure_theory/function/lp_space.lean
    2450 ./computability/turing_machine.lean
    2415 ./measure_theory/integral/interval_integral.lean
    2404 ./group_theory/subgroup/basic.lean
    2396 ./tactic/core.lean
    2340 ./measure_theory/integral/lebesgue.lean
    2247 ./topology/metric_space/basic.lean

Yaël Dillies (Oct 15 2021 at 19:23):

I don't like the way the list stuff is arranged. Every time I jump to a list definition hoping to find the relevant lemmas next to it, it brings me to this huge mess that data.list.basic is.

Yaël Dillies (Oct 15 2021 at 19:27):

Would you like if I split it up in files per definitions (or groups of definitions).

Julian Berman (Oct 15 2021 at 19:30):

cloc is also a nice thing -- https://github.com/AlDanial/cloc -- for mathlib it produces:

  cloc --by-file src | head -n 20                                                                                                                                                  julian@Airm
    1820 text files.
    1820 unique files.
       0 files ignored.

github.com/AlDanial/cloc v 1.90  T=1.80 s (1012.1 files/s, 358420.0 lines/s)
-----------------------------------------------------------------------------------------------------------------------------------
File                                                                                            blank        comment           code
-----------------------------------------------------------------------------------------------------------------------------------
src/data/list/basic.lean                                                                         1057            190           3891
src/topology/algebra/ordered/basic.lean                                                           632            429           2808
src/measure_theory/measure/measure_space.lean                                                     576            246           2240
src/analysis/calculus/fderiv.lean                                                                 617            218           2134
src/analysis/calculus/times_cont_diff.lean                                                        381            470           2110
src/data/finset/basic.lean                                                                        772            285           2022
src/order/filter/basic.lean                                                                       586            170           2021
src/measure_theory/function/lp_space.lean                                                         407            135           1997
src/data/finsupp/basic.lean                                                                       590            237           1914
src/data/buffer/parser/basic.lean                                                                 425            370           1889
src/data/set/basic.lean                                                                           846            153           1884
src/measure_theory/integral/lebesgue.lean                                                         335            135           1870

Kyle Miller (Oct 15 2021 at 19:31):

@Yaël Dillies Does splitting the file up solve the problem of finding relevant lemmas? Wouldn't reordering things within the file be equivalent for that purpose?

Yaël Dillies (Oct 15 2021 at 19:31):

Nope, because the current situation is that defs are in a file and lemmas in other ones.

Eric Wieser (Oct 15 2021 at 19:32):

Yaël Dillies said:

Would you like if I split it up in files per definitions.

The advantage of having the definitions separate from the non-trivial lemmas is that you don't have to worry about circular imports when proving the N^2 different interactions between those definitions

Eric Wieser (Oct 15 2021 at 19:34):

If you had a file per definition, you to some extent have to pick a linear order over your definitions, such that list/aaa.lean contains lemmas about only list.aaa, list/zzz.lean contains lemmas about how list.zzz interacts with list.yyy, list.xxx, ..., list.aaa

Yaël Dillies (Oct 15 2021 at 19:51):

Yes sure, but that still happens because you need the nontrivial lemmas for the next nontrivial lemmas.

Yaël Dillies (Oct 15 2021 at 19:52):

Also, this linearization of definition happens all the time, except maybe that list has many more definitions.

Rob Lewis (Oct 15 2021 at 20:02):

Another reason list lemmas and defs are split is that many of the defs are used to define tactics, that are used before the lemmas can be proved.

Rob Lewis (Oct 15 2021 at 20:03):

I don't want to claim it's impossible to disentangle this, but it would be a big effort, and IMO not necessarily for the better.

Yaël Dillies (Oct 15 2021 at 20:07):

Oh yeah most of the definitions are in core.

Yaël Dillies (Oct 15 2021 at 20:07):

This is orthogonal to splitting data.list.basic though.

Rob Lewis (Oct 15 2021 at 20:14):

Not entirely. Splitting data.list.basic won't help you with this, because the definitions are in core or in data.list.defs, for good reasons:

Yaël Dillies said:

I don't like the way the list stuff is arranged. Every time I jump to a list definition hoping to find the relevant lemmas next to it, it brings me to this huge mess that data.list.basic is.

Rob Lewis (Oct 15 2021 at 20:15):

Splitting data.list.basic sounds okay to me, regardless. But I'm not sure there are clear lines for where to split it because of the interactions that Eric mentions.

Rob Lewis (Oct 15 2021 at 20:16):

It should be clear to someone who proves a lemma about list.f and list.g where they should put it. Right now I guess the answer is "as high up in list.basic as you can," which isn't great

Scott Morrison (Oct 15 2021 at 20:18):

I've previously done lots of splitting of data.list.basic. I'm in favour of anyone doing more!

Yaël Dillies (Oct 15 2021 at 20:22):

I have a few ideas. For example, we could separate the defs that look up elements, like head, tail, nth take, drop, the defs that transform lists, like map, map2, bind, the defs that consider lists as combinatorial objects, like sublist, permutations, perm...

Yaël Dillies (Oct 15 2021 at 20:23):

Any other suggestions?

Eric Wieser (Oct 15 2021 at 20:31):

Note that its not enough to say "we should split out these groups", you also need to say "and we'll put them in this order in the imports heirarchy"

Eric Wieser (Oct 15 2021 at 20:32):

Or keep around a "lemmas" file for the interactions

Yaël Dillies (Oct 15 2021 at 20:33):

This is the dirty job I will figure out when doing.

Yaël Dillies (Oct 15 2021 at 20:36):

permutation can earn its own file just before perm.

Alex J. Best (Oct 15 2021 at 23:42):

Coincidentally I made #9728 earlier today, it doesn't help with the length much but hopefully simplifies the import hierarchy a little (and makes it easier to avoid loops).

Yury G. Kudryashov (Oct 16 2021 at 06:38):

Moved ~270 lines from topology.algebra.ordered.basic in #9745

Yaël Dillies (Oct 23 2021 at 14:42):

Am moving 194 lines from data.list.basic in #9906

Yaël Dillies (Nov 10 2021 at 08:40):

An update

$ find . -iname '*.lean' -exec wc -l {} + | sort -rn | head -n 20
 260185 total
 240111 total
 178400 total
  12135 total
   4114 ./src/data/list/basic.lean
   3278 ./src/measure_theory/measure/measure_space.lean
   3085 ./src/data/finset/basic.lean
   2969 ./src/analysis/calculus/fderiv.lean
   2965 ./src/analysis/calculus/times_cont_diff.lean
   2800 ./src/data/set/basic.lean
   2774 ./src/order/filter/basic.lean
   2753 ./src/data/finsupp/basic.lean
   2684 ./src/data/buffer/parser/basic.lean
   2677 ./src/linear_algebra/basic.lean
   2614 ./src/topology/algebra/ordered/basic.lean
   2577 ./src/measure_theory/function/lp_space.lean
   2557 ./src/data/multiset/basic.lean
   2486 ./src/group_theory/subgroup/basic.lean
   2450 ./src/computability/turing_machine.lean
   2407 ./src/tactic/core.lean

Last updated: Dec 20 2023 at 11:08 UTC