Zulip Chat Archive
Stream: general
Topic: longest file
Sebastien Gouezel (Oct 15 2021 at 18:43):
Is there a magical grep command to find the longest files in mathlib? I just needed to add a lemma to topological_space.algebra.ordered.basic
, but then I realized it was already 3870 lines long...
Yury G. Kudryashov (Oct 15 2021 at 18:48):
I was going to split it. I'll try to do it tonight.
Kyle Miller (Oct 15 2021 at 19:12):
Longest files by number of lines:
$ find . -iname '*.lean' -exec wc -l {} + | sort -rn | head -n 20
645859 total
5138 ./data/list/basic.lean
3869 ./topology/algebra/ordered/basic.lean
3079 ./data/finset/basic.lean
3062 ./measure_theory/measure/measure_space.lean
2969 ./analysis/calculus/fderiv.lean
2961 ./analysis/calculus/times_cont_diff.lean
2889 ./data/set/basic.lean
2777 ./order/filter/basic.lean
2741 ./data/finsupp/basic.lean
2684 ./data/buffer/parser/basic.lean
2614 ./linear_algebra/basic.lean
2556 ./data/multiset/basic.lean
2539 ./measure_theory/function/lp_space.lean
2450 ./computability/turing_machine.lean
2415 ./measure_theory/integral/interval_integral.lean
2404 ./group_theory/subgroup/basic.lean
2396 ./tactic/core.lean
2340 ./measure_theory/integral/lebesgue.lean
2247 ./topology/metric_space/basic.lean
Yaël Dillies (Oct 15 2021 at 19:23):
I don't like the way the list
stuff is arranged. Every time I jump to a list
definition hoping to find the relevant lemmas next to it, it brings me to this huge mess that data.list.basic
is.
Yaël Dillies (Oct 15 2021 at 19:27):
Would you like if I split it up in files per definitions (or groups of definitions).
Julian Berman (Oct 15 2021 at 19:30):
cloc
is also a nice thing -- https://github.com/AlDanial/cloc -- for mathlib it produces:
⊙ cloc --by-file src | head -n 20 julian@Airm
1820 text files.
1820 unique files.
0 files ignored.
github.com/AlDanial/cloc v 1.90 T=1.80 s (1012.1 files/s, 358420.0 lines/s)
-----------------------------------------------------------------------------------------------------------------------------------
File blank comment code
-----------------------------------------------------------------------------------------------------------------------------------
src/data/list/basic.lean 1057 190 3891
src/topology/algebra/ordered/basic.lean 632 429 2808
src/measure_theory/measure/measure_space.lean 576 246 2240
src/analysis/calculus/fderiv.lean 617 218 2134
src/analysis/calculus/times_cont_diff.lean 381 470 2110
src/data/finset/basic.lean 772 285 2022
src/order/filter/basic.lean 586 170 2021
src/measure_theory/function/lp_space.lean 407 135 1997
src/data/finsupp/basic.lean 590 237 1914
src/data/buffer/parser/basic.lean 425 370 1889
src/data/set/basic.lean 846 153 1884
src/measure_theory/integral/lebesgue.lean 335 135 1870
Kyle Miller (Oct 15 2021 at 19:31):
@Yaël Dillies Does splitting the file up solve the problem of finding relevant lemmas? Wouldn't reordering things within the file be equivalent for that purpose?
Yaël Dillies (Oct 15 2021 at 19:31):
Nope, because the current situation is that defs are in a file and lemmas in other ones.
Eric Wieser (Oct 15 2021 at 19:32):
Yaël Dillies said:
Would you like if I split it up in files per definitions.
The advantage of having the definitions separate from the non-trivial lemmas is that you don't have to worry about circular imports when proving the N^2 different interactions between those definitions
Eric Wieser (Oct 15 2021 at 19:34):
If you had a file per definition, you to some extent have to pick a linear order over your definitions, such that list/aaa.lean
contains lemmas about only list.aaa
, list/zzz.lean
contains lemmas about how list.zzz
interacts with list.yyy
, list.xxx
, ..., list.aaa
Yaël Dillies (Oct 15 2021 at 19:51):
Yes sure, but that still happens because you need the nontrivial lemmas for the next nontrivial lemmas.
Yaël Dillies (Oct 15 2021 at 19:52):
Also, this linearization of definition happens all the time, except maybe that list
has many more definitions.
Rob Lewis (Oct 15 2021 at 20:02):
Another reason list
lemmas and defs are split is that many of the defs are used to define tactics, that are used before the lemmas can be proved.
Rob Lewis (Oct 15 2021 at 20:03):
I don't want to claim it's impossible to disentangle this, but it would be a big effort, and IMO not necessarily for the better.
Yaël Dillies (Oct 15 2021 at 20:07):
Oh yeah most of the definitions are in core.
Yaël Dillies (Oct 15 2021 at 20:07):
This is orthogonal to splitting data.list.basic
though.
Rob Lewis (Oct 15 2021 at 20:14):
Not entirely. Splitting data.list.basic
won't help you with this, because the definitions are in core or in data.list.defs
, for good reasons:
Yaël Dillies said:
I don't like the way the
list
stuff is arranged. Every time I jump to alist
definition hoping to find the relevant lemmas next to it, it brings me to this huge mess thatdata.list.basic
is.
Rob Lewis (Oct 15 2021 at 20:15):
Splitting data.list.basic
sounds okay to me, regardless. But I'm not sure there are clear lines for where to split it because of the interactions that Eric mentions.
Rob Lewis (Oct 15 2021 at 20:16):
It should be clear to someone who proves a lemma about list.f
and list.g
where they should put it. Right now I guess the answer is "as high up in list.basic
as you can," which isn't great
Scott Morrison (Oct 15 2021 at 20:18):
I've previously done lots of splitting of data.list.basic
. I'm in favour of anyone doing more!
Yaël Dillies (Oct 15 2021 at 20:22):
I have a few ideas. For example, we could separate the defs that look up elements, like head
, tail
, nth
take
, drop
, the defs that transform lists, like map
, map2
, bind
, the defs that consider lists as combinatorial objects, like sublist
, permutations
, perm
...
Yaël Dillies (Oct 15 2021 at 20:23):
Any other suggestions?
Eric Wieser (Oct 15 2021 at 20:31):
Note that its not enough to say "we should split out these groups", you also need to say "and we'll put them in this order in the imports heirarchy"
Eric Wieser (Oct 15 2021 at 20:32):
Or keep around a "lemmas" file for the interactions
Yaël Dillies (Oct 15 2021 at 20:33):
This is the dirty job I will figure out when doing.
Yaël Dillies (Oct 15 2021 at 20:36):
permutation
can earn its own file just before perm
.
Alex J. Best (Oct 15 2021 at 23:42):
Coincidentally I made #9728 earlier today, it doesn't help with the length much but hopefully simplifies the import hierarchy a little (and makes it easier to avoid loops).
Yury G. Kudryashov (Oct 16 2021 at 06:38):
Moved ~270 lines from topology.algebra.ordered.basic
in #9745
Yaël Dillies (Oct 23 2021 at 14:42):
Am moving 194 lines from data.list.basic
in #9906
Yaël Dillies (Nov 10 2021 at 08:40):
An update
$ find . -iname '*.lean' -exec wc -l {} + | sort -rn | head -n 20
260185 total
240111 total
178400 total
12135 total
4114 ./src/data/list/basic.lean
3278 ./src/measure_theory/measure/measure_space.lean
3085 ./src/data/finset/basic.lean
2969 ./src/analysis/calculus/fderiv.lean
2965 ./src/analysis/calculus/times_cont_diff.lean
2800 ./src/data/set/basic.lean
2774 ./src/order/filter/basic.lean
2753 ./src/data/finsupp/basic.lean
2684 ./src/data/buffer/parser/basic.lean
2677 ./src/linear_algebra/basic.lean
2614 ./src/topology/algebra/ordered/basic.lean
2577 ./src/measure_theory/function/lp_space.lean
2557 ./src/data/multiset/basic.lean
2486 ./src/group_theory/subgroup/basic.lean
2450 ./src/computability/turing_machine.lean
2407 ./src/tactic/core.lean
Last updated: Dec 20 2023 at 11:08 UTC