Zulip Chat Archive

Stream: mathlib4

Topic: Splitting Topology/Connected.lean?


Michael Rothgang (Oct 12 2023 at 20:01):

I have a branch splitting Topology/Connected.lean which passes CI. Is this change desirable in principle? (My impression is yes... though making sure I've found everything about topic X on zulip is hard.)

And are there particular opinions on the split? The file currently (on mathlib master) has 1776 lines and contains material about

  • (pre)connected spaces and sets (1250 lines)
  • locally connected spaces (120 lines)
  • totally disconnected and totally separated spaces
  • a section on the setoid of connected components.

Locally, I've split the second (final file is about ~150 lines, with docstrings etc.) and third bullet (330 lines). I haven't split the last item yet, but could (another 150 lines). Thoughts before I submit a PR?

Michael Rothgang (Oct 12 2023 at 20:01):

Happy to submit a PR if that makes evaluating a split easier.

Patrick Massot (Oct 12 2023 at 20:08):

Splitting files that are more than 1000 lines is often a good idea.

Michael Rothgang (Oct 12 2023 at 20:16):

Filed as #7646.

Scott Morrison (Oct 12 2023 at 21:15):

Thanks! I think splitting files is super useful. It makes refactoring and minimisation much easier if you don't have to deal with these giants.

Michael Rothgang (Oct 18 2023 at 13:17):

@Scott Morrison The above PR is still awaiting review. Pinging you per your own words :-)
(For the record, I'm not aware of any work depending on this split.)

Floris van Doorn (Oct 18 2023 at 13:30):

I left some comments

Michael Rothgang (Oct 18 2023 at 19:51):

Thanks! I have updated the PR accordingly.

David Loeffler (Oct 19 2023 at 07:50):

Thanks for this! There are a few other giant files in topology which could benefit from being split similarly. Topology.MetricSpace.Basic is the largest at 3300 lines. Any takers?

Kevin Buzzard (Oct 19 2023 at 20:52):

I'm sure you could knock up some python code to do it automatically David! :-)

Damiano Testa (Oct 19 2023 at 20:53):

Or even ask GPT to do it for all the files at once!

David Loeffler (Oct 20 2023 at 09:13):

Kevin Buzzard said:

I'm sure you could knock up some python code to do it automatically David! :-)

I'm sure writing an auto-splitter wouldn't be hard. But I think it is better if we try to find "natural" splitting points, so the split makes the codebase easier, rather than harder, to navigate. This is clearly not something Python or ChatGPT can do.

David Loeffler (Oct 20 2023 at 09:14):

For example Topology.Separation can (and probably should) become a directory with separate files for T0, T1, T2 etc spaces. But I didn't see a natural splittage point in MetricSpace.Basic.

Michael Rothgang (Oct 20 2023 at 12:34):

David Loeffler said:

Thanks for this! There are a few other giant files in topology which could benefit from being split similarly. Topology.MetricSpace.Basic is the largest at 3300 lines. Any takers?

I've looked into that file locally. It's harder than it looks on first sight :-) I have split some parts, reducing it to 2200 lines. (Going further is harder.)

Ruben Van de Velde (Oct 20 2023 at 12:36):

I attacked LinearAlgebra.Basic over in #7801

Michael Rothgang (Oct 25 2023 at 11:45):

Michael Rothgang said:

David Loeffler said:

Thanks for this! There are a few other giant files in topology which could benefit from being split similarly. Topology.MetricSpace.Basic is the largest at 3300 lines. Any takers?

I've looked into that file locally. It's harder than it looks on first sight :-) I have split some parts, reducing it to 2200 lines. (Going further is harder.)

Now PRed as #7920


Last updated: Dec 20 2023 at 11:08 UTC