Zulip Chat Archive

Stream: general

Topic: Splitting long files into smaller files


Eric Wieser (Aug 05 2020 at 09:47):

What's mathlib's stance on how many different things should be in a single file? For instance, would it be reasonable to separate semimodule and submodule(currently in algebra/module/basic.lean into separate files (basic.lean and submodule.lean)?

Anne Baanen (Aug 05 2020 at 09:53):

I'm generally in favour of splitting, and I would say that splitting module and submodule is also a good idea if there are no obvious breakages in the files themselves. You can even import algebra.module.submodule in src/algebra/module/default.lean, and (almost) no code should break.

Eric Wieser (Aug 05 2020 at 09:55):

Am I right in thinking that default is special, and import a.b will import a.b.default if found?

Anne Baanen (Aug 05 2020 at 09:55):

I believe so.

Anne Baanen (Aug 05 2020 at 09:55):

At least, it worked like that when splitting ring_theory.polynomial.

Eric Wieser (Aug 05 2020 at 09:55):

I'll take a crack at that split once my two trivial open PRs against basic.lean (#3631, #3690) are merged, thanks

Anne Baanen (Aug 05 2020 at 09:56):

Thanks! I will take a look at the new PR.

Scott Morrison (Aug 05 2020 at 10:53):

While it's great to provide .default files for users of mathlib, files in mathlib should use the minimised imports, rather than grabbing .defaults, for the sake of minimising the compilation dependency graph.

Scott Morrison (Aug 05 2020 at 10:53):

And yes, effort into splitting files is generally good.

Scott Morrison (Aug 05 2020 at 10:54):

The one thing to think about is that it potentially degrades "findability", because it breaks the strategy "guess the right file, then scroll through it".

Scott Morrison (Aug 05 2020 at 10:54):

And so appropriate documentation in the module doc-strings (XYZ can be found in files ABC) is important!

Eric Wieser (Aug 05 2020 at 10:57):

Hopefully that strategy naturally leads into "guess the right directory, scroll through the handful of files in it"

Eric Wieser (Aug 05 2020 at 10:58):

Does the html doc generator have a mechanism to link to types and declarations elsewhere like sphinx's :func:`name` ?

Mario Carneiro (Aug 05 2020 at 11:03):

Scott Morrison said:

The one thing to think about is that it potentially degrades "findability", because it breaks the strategy "guess the right file, then scroll through it".

I've definitely noticed this aspect. I'm the one responsible for most of the massive files in mathlib, because I have high file size tolerance due to my set.mm background (all of ZFC in one file of 600000 lines), and I employ this tactic all the time for finding things in mathlib, but I have noticed it being less effective of late. Especially if I forget that a splitting PR has landed, my mental model will say "what is in this file is all there is about topic X" and so if the file is suddenly shrunk then I get lots of false negatives when searching

Rob Lewis (Aug 05 2020 at 11:07):

Eric Wieser said:

Does the html doc generator have a mechanism to link to types and declarations elsewhere like sphinx's :func:`name` ?

Any full declaration name that you type in backticks will be linked. See e.g. with_top in the first para of https://leanprover-community.github.io/mathlib_docs/geometry/manifold/smooth_manifold_with_corners.html

Mario Carneiro (Aug 05 2020 at 11:07):

Eric Wieser said:

Hopefully that strategy naturally leads into "guess the right directory, scroll through the handful of files in it"

I don't know any editor that makes this convenient. This starts to remind me of big java programs where there are thousands of 10 line files and you are constantly searching for where the actual stuff gets done

Scott Morrison (Aug 05 2020 at 12:22):

The other downside to many files is that library_search (and simp, for that matter) becomes weaker, because it can only see what is currently imported.

Scott Morrison (Aug 05 2020 at 12:23):

On the gripping hand, library_search and simp become faster when you don't have unnecessary material imported...

Anne Baanen (Aug 05 2020 at 12:27):

Findability via library_search / simp should not get worse if we move definitions (and their associated lemmas), right? Because you need to import those definitions anyway to talk about the thing we want to find.

Eric Wieser (Aug 05 2020 at 14:17):

Anne Baanen said:

I would say that splitting module and submodule is also a good idea if there are no obvious breakages in the files themselves.

PR'd as #3696 with no complications. Will remove the draft status once #3631 is resolved.


Last updated: Dec 20 2023 at 11:08 UTC