Zulip Chat Archive

Stream: general

Topic: Splitting long files into smaller files


view this post on Zulip Eric Wieser (Aug 05 2020 at 09:47):

What's mathlib's stance on how many different things should be in a single file? For instance, would it be reasonable to separate semimodule and submodule(currently in algebra/module/basic.lean into separate files (basic.lean and submodule.lean)?

view this post on Zulip Anne Baanen (Aug 05 2020 at 09:53):

I'm generally in favour of splitting, and I would say that splitting module and submodule is also a good idea if there are no obvious breakages in the files themselves. You can even import algebra.module.submodule in src/algebra/module/default.lean, and (almost) no code should break.

view this post on Zulip Eric Wieser (Aug 05 2020 at 09:55):

Am I right in thinking that default is special, and import a.b will import a.b.default if found?

view this post on Zulip Anne Baanen (Aug 05 2020 at 09:55):

I believe so.

view this post on Zulip Anne Baanen (Aug 05 2020 at 09:55):

At least, it worked like that when splitting ring_theory.polynomial.

view this post on Zulip Eric Wieser (Aug 05 2020 at 09:55):

I'll take a crack at that split once my two trivial open PRs against basic.lean (#3631, #3690) are merged, thanks

view this post on Zulip Anne Baanen (Aug 05 2020 at 09:56):

Thanks! I will take a look at the new PR.

view this post on Zulip Scott Morrison (Aug 05 2020 at 10:53):

While it's great to provide .default files for users of mathlib, files in mathlib should use the minimised imports, rather than grabbing .defaults, for the sake of minimising the compilation dependency graph.

view this post on Zulip Scott Morrison (Aug 05 2020 at 10:53):

And yes, effort into splitting files is generally good.

view this post on Zulip Scott Morrison (Aug 05 2020 at 10:54):

The one thing to think about is that it potentially degrades "findability", because it breaks the strategy "guess the right file, then scroll through it".

view this post on Zulip Scott Morrison (Aug 05 2020 at 10:54):

And so appropriate documentation in the module doc-strings (XYZ can be found in files ABC) is important!

view this post on Zulip Eric Wieser (Aug 05 2020 at 10:57):

Hopefully that strategy naturally leads into "guess the right directory, scroll through the handful of files in it"

view this post on Zulip Eric Wieser (Aug 05 2020 at 10:58):

Does the html doc generator have a mechanism to link to types and declarations elsewhere like sphinx's :func:`name` ?

view this post on Zulip Mario Carneiro (Aug 05 2020 at 11:03):

Scott Morrison said:

The one thing to think about is that it potentially degrades "findability", because it breaks the strategy "guess the right file, then scroll through it".

I've definitely noticed this aspect. I'm the one responsible for most of the massive files in mathlib, because I have high file size tolerance due to my set.mm background (all of ZFC in one file of 600000 lines), and I employ this tactic all the time for finding things in mathlib, but I have noticed it being less effective of late. Especially if I forget that a splitting PR has landed, my mental model will say "what is in this file is all there is about topic X" and so if the file is suddenly shrunk then I get lots of false negatives when searching

view this post on Zulip Rob Lewis (Aug 05 2020 at 11:07):

Eric Wieser said:

Does the html doc generator have a mechanism to link to types and declarations elsewhere like sphinx's :func:`name` ?

Any full declaration name that you type in backticks will be linked. See e.g. with_top in the first para of https://leanprover-community.github.io/mathlib_docs/geometry/manifold/smooth_manifold_with_corners.html

view this post on Zulip Mario Carneiro (Aug 05 2020 at 11:07):

Eric Wieser said:

Hopefully that strategy naturally leads into "guess the right directory, scroll through the handful of files in it"

I don't know any editor that makes this convenient. This starts to remind me of big java programs where there are thousands of 10 line files and you are constantly searching for where the actual stuff gets done

view this post on Zulip Scott Morrison (Aug 05 2020 at 12:22):

The other downside to many files is that library_search (and simp, for that matter) becomes weaker, because it can only see what is currently imported.

view this post on Zulip Scott Morrison (Aug 05 2020 at 12:23):

On the gripping hand, library_search and simp become faster when you don't have unnecessary material imported...

view this post on Zulip Anne Baanen (Aug 05 2020 at 12:27):

Findability via library_search / simp should not get worse if we move definitions (and their associated lemmas), right? Because you need to import those definitions anyway to talk about the thing we want to find.

view this post on Zulip Eric Wieser (Aug 05 2020 at 14:17):

Anne Baanen said:

I would say that splitting module and submodule is also a good idea if there are no obvious breakages in the files themselves.

PR'd as #3696 with no complications. Will remove the draft status once #3631 is resolved.


Last updated: May 13 2021 at 23:16 UTC