Zulip Chat Archive

Stream: mathlib4

Topic: split Nat/Prime.lean


Ralf Stephan (Jun 30 2024 at 08:25):

I'm planning to split Data/Nat/Prime.lean into its own directory. Motivation is that Factorization/Basic imports Data/Nat/Prime.lean and I have a new lemma for Prime that needs stuff from Factorization/Basic, but that lemma (and many others) certainly does not belong in the main Prime that gets imported for the Prime definition. Prime.lean is simply ripe for splitting IMO. I would like to hear opinions on the whole matter, and pointers to documents or threads are highly appreciated, as this would be my first maintenance attempt.

In Detail, I think a case can be made to have:

Nat/Prime/
+-- Basic.lean
+-- MinFac.lean
+-- Infinitude.lean
+-- Misc.lean

Is there consensus for names like Misc, Lemmas, Other, or such?

Kim Morrison (Jun 30 2024 at 08:52):

Generally we prefer Defs / Basic / Lemmas as the "generic" names.

Kim Morrison (Jun 30 2024 at 08:52):

(And I am strongly in favour of, where possible, splitting definitions into Defs files, which then depend as little as possible --- but no less --- on non-Defs files.)

Ralf Stephan (Jun 30 2024 at 08:56):

So I propose

Nat/Prime/
+-- Defs.lean
+-- Basic.lean
+-- MinFac.lean
+-- Infinitude.lean
+-- Lemmas.lean

Michael Rothgang (Jun 30 2024 at 08:58):

Ralf Stephan said:

So I propose

Nat/Prime/
+-- Defs.lean
+-- Basic.lean
+-- MinFac.lean
+-- Infinitude.lean
+-- Lemmas.lean

Looks reasonable, on first glance. (If Basic and Lemmas are meaningfully different or should rather be merged is not something I can judge without looking at the files. A reviewer will certainly opine on this :-))

Ralf Stephan (Jun 30 2024 at 09:05):

The criterion could be whether a lemma is used or not.

Eric Wieser (Jun 30 2024 at 09:23):

One option to speed up (at least partial) review might be to not do all the splits at once

Kevin Buzzard (Jun 30 2024 at 09:24):

Note that any PR splitting files should make no changes other than splitting the files, ie don't add your new lemmas at the same time. The tedious part of splitting is that you need to create new copyright headers and this is typically done by looking at git history (probably in this case in both the lean 3 and lean 4 repos).

Eric Wieser (Jun 30 2024 at 09:26):

I do wonder if we should switch to using Copyright mathlib contributors, but I guess we still want to maintain the Authors list which can be helpful for requesting reviews

Kim Morrison (Jun 30 2024 at 11:34):

Kevin Buzzard said:

Note that any PR splitting files should make no changes other than splitting the files, ie don't add your new lemmas at the same time. The tedious part of splitting is that you need to create new copyright headers and this is typically done by looking at git history (probably in this case in both the lean 3 and lean 4 repos).

I would like to promulgate the following rules for copyright headers:

  • It's okay to simply reproduce the copyright header of the file you are splitting. (After all, it sufficed before the split, for the same content.)
  • Forensics into git history should be entirely optional, but welcome.

Matthew Ballard (Jun 30 2024 at 12:09):

Can Copilot chat help here?

Kevin Buzzard (Jun 30 2024 at 12:34):

Looking forward to seeing it make up some random people in the lean community.

Matthew Ballard (Jun 30 2024 at 12:45):

This is assuming it has access to the git history for mathlib and mathlib4

Matthew Ballard (Jun 30 2024 at 12:54):

No. It doesn’t

Ruben Van de Velde (Jun 30 2024 at 12:59):

Yeah, that seems like a bad idea

Eric Wieser (Jun 30 2024 at 13:01):

Maybe a compromise is a "needs-archaology" label for PRs where the archaeology has been skipped? Then either someone else can come along before they're merged, or do a cleanup pass after (removing the label either way).

Ralf Stephan (Jun 30 2024 at 13:23):

Or, anyone that appears in git blame?

Kim Morrison (Jul 01 2024 at 02:15):

A tension here is that the authors line has always been ambiguous --- it is somewhere in between

  • capture everyone who has touched this file
  • identify the people who promise to be available long term to help maintain and update this file

Kim Morrison (Jul 01 2024 at 02:15):

The first would be ridiculous for many files. (I suspect I have touched every file in Mathlib. :-)

Kim Morrison (Jul 01 2024 at 02:16):

The second goes too far in ignoring significant contributions.

Kim Morrison (Jul 01 2024 at 02:16):

But the truth is that we are in some ill-defined compromise between the two of these.

Mario Carneiro (Jul 01 2024 at 02:21):

I think I would phrase it as "these are people who contributed significantly to the file" (with an emphasis on mathematical contributions, and specifically excluding updates from large scale refactors caused by changes to an earlier file or upstream)

Mario Carneiro (Jul 01 2024 at 02:22):

it should be a subset of the first bullet but it's probably incomparable with the second

Mario Carneiro (Jul 01 2024 at 02:25):

also it includes people who contributed significantly to an earlier version of the file, even if their contributions are no longer visible due to a subsequent major refactor

Yaël Dillies (Jul 01 2024 at 08:26):

I agree with Mario's interpretation. If we wanted to improve clarity, we could have both "Authors" and "Maintainers"/"Owners" (in the style of codeowner)/"Referent"/"Keeper"/however we want to call the people who take responsibility for a file

Ralf Stephan (Jul 01 2024 at 16:00):

Please review #14286.

Ralf Stephan (Jul 02 2024 at 11:16):

I'm now adding a second PR that only moves 4 theorems from Prime/Basic to Prime/Defs to enable changing the import in Data/Nat/Factors to Prime/Defs. It's remarkable by how many modules Factors is indirectly imported, so this should be worth the effort, besides resolving my circular dependency problem.


Last updated: May 02 2025 at 03:31 UTC