Zulip Chat Archive

Stream: mathlib4

Topic: putting mathlib3 in the git history


Julian Berman (May 02 2023 at 22:59):

Perhaps once the port is done it's worth slight consideration of even merging the repositories?

Alex J. Best (May 02 2023 at 23:01):

I definitely think it would be helpful to graft the git histories somehow, even a slightly imperfect result would still be helpful when working on mathlib in the future

Julian Berman (May 02 2023 at 23:05):

Not that it needs any further reputability but might also be nice for publicity purposes if someone in the future sees a 10 year commit history rather than a 2 year one or whatever. But yeah git is perfectly happy with that sort of thing.

Eric Wieser (May 02 2023 at 23:23):

What I'd probably suggest is doing something like

* "Combine the mathlib3 and mathlib4 repos"
|\
| * Merge of last mathport output into master, discarding all changes from mathport
| |\
| | * automated fixed (Mathbin -> Mathlib)
| | * mathport output for last commit in mathlib3, with lean3 versions deleted
| | * last commit in mathlib3
| | |
| |
* | misc mathlib4 features that happen
* | misc mathlib4 features that happen
|/
/
|
* last forward-porting commit in mathlib4
* last porting commit in mathlib4
* misc mathlib4 commits
|

Eric Wieser (May 02 2023 at 23:28):

A bit of thought can probably be put into which order we want the parents in the top two merge commits; I believe by default git follows the first parent. This lets us pick between emphasizing the porting history or the mathlib3 history

Julian Berman (May 02 2023 at 23:29):

(And then on the GitHub side you transfer all the mathlib4 issues and they get new mathlib numbers, and then you put your fingers in your ears and don't think about all the zulip mentions of mathlib4 numbers between now and then :)

Eric Wieser (May 02 2023 at 23:29):

That would actually be ok if the new home were leanprover-community/mathlib

Julian Berman (May 02 2023 at 23:30):

OK we figured it all out, is the port done now, where's Johan to say 100%

Eric Wieser (May 02 2023 at 23:30):

Because github would redirect from the mathlib4 issues automatically

Notification Bot (May 02 2023 at 23:32):

9 messages were moved here from #new members > Mention Lean 4 somewhere in the Lean Community pages by Eric Wieser.

Scott Morrison (May 03 2023 at 00:06):

I love the idea of the repo name eventually being just mathlib rather than mathlib4.

Alex J. Best (Jul 19 2023 at 15:50):

Is now a good point to think about implementing this?

Eric Wieser (Jul 19 2023 at 16:37):

Yes, once we've made the release tags. Some things we need to answer:

Eric Wieser (Jul 19 2023 at 16:38):

  • What do we want blame to point at; the porting history, or the mathlib3 history? Which order do the parents of the merge commits need to be to achieve this?

Eric Wieser (Jul 19 2023 at 16:38):

  • Do we want to shuffle repo names, making mathlib4 be mathlib again? This will either
    • break all our links (zulip, commit message, etc) to mathlib3 issues/pr (if we move mathlib to mathlib3)
    • break commit message links to issues, and require all open mathlib4 PRs to be re-opened (if we archive the mathlib4 repo and move all the work back into mathlib3)

Kevin Buzzard (Jul 19 2023 at 17:17):

I would definitely vote for blame pointing to the mathlib3 history. I am already missing this. I see a comment and think "oh who wrote that -- oh darn it's annoying to find out". As for repo renaming, I am less excited about this (for the reasons you mention).

Jireh Loreaux (Jul 19 2023 at 17:42):

I would prefer mathlib3 history; and archiving of mathlib4 repo (requiring reopening mathlib4 PRs) instead of renaming repos. I feel like that's an acceptable compromise to reclaim the repo name. It also means that existing mathlib3 PRs can be migrated "easier" (well, sort of).

Yaël Dillies (Jul 20 2023 at 10:34):

It also means we get back to a unique numbering system.

Eric Wieser (Jul 20 2023 at 10:37):

/poll Ignoring details of the git history, what should we do about repo names on github

  • Do nothing. All future development happens in mathlib3.
  • Archive mathlib4 and move Lean 4 development back to mathlib. Transfer all issues, ask contributors to re-open PRs.
  • Rename mathlib to mathlib3 and mathlib4 to mathlib, breaking all links to mathlib3 code.

Eric Wieser (Jul 20 2023 at 10:39):

Edit for visibility: https://github.com/leanprover-community/mathlib/wiki/Merging-github-repos-and-history compares the consequences of these options.

Sorry, that first option was obviously a typo!

Mario Carneiro (Jul 20 2023 at 10:40):

We can ask the same question about lean4

Eric Wieser (Jul 20 2023 at 10:41):

If we want to do the second option, we should do it ASAP to minimize the number of PRs that have to be re-opened. We could do that by:

  • Renaming the master branch of mathlib to lean3-master
  • Putting the master branch of mathlib4 in the mathlib repo
  • Deal with merging the history later

Mario Carneiro (Jul 20 2023 at 10:42):

I'm not even sure it's worth merging the git histories

Mario Carneiro (Jul 20 2023 at 10:43):

I really like having the new repo to give us a clean break between lean 3 and lean 4 things

Eric Wieser (Jul 20 2023 at 10:43):

I think merging the git histories probably is worthwhile, because:

  • It makes a blame possible
  • It means the github contributor stats reflect the mathlib3 effort too

Mario Carneiro (Jul 20 2023 at 10:44):

the contrib stats are a good point, they look really weird right now

Eric Wieser (Jul 20 2023 at 10:45):

Merging the repos is less of a win; the only obvious benefit is that the mathlib3 commits with titles like "feat: some PR (#XXXX)" will not end up being incorrectly linkified by github

Mario Carneiro (Jul 20 2023 at 10:45):

in exchange for incorrectly linkifying all the mathlib4 PR numbers

Eric Wieser (Jul 20 2023 at 10:46):

Sorry, I mispoke: either we have the problem in my comment (if we merge histories in mathlib4 but leave the repos alone), or the problem in your comment (if we merge the repos back into mathlib)

Mario Carneiro (Jul 20 2023 at 10:47):

We probably want to merge histories soon, since refactors will make the blame game harder

Eric Wieser (Jul 20 2023 at 10:47):

I think the pressure is off with the merging strategy I outlined above

Eric Wieser (Jul 20 2023 at 10:48):

We can always do a merge at the port-complete tags, and then merge in nn weeks of refactors that came after that

Eric Wieser (Jul 20 2023 at 10:49):

Eric Wieser said:

Sorry, I mispoke: either we have the problem in my comment (if we merge histories in mathlib4 but leave the repos alone), or the problem in your comment (if we merge the repos back into mathlib)

We can of course avoid both outcomes if we're willing to rewrite history, but that seems rather disruptive

Mario Carneiro (Jul 20 2023 at 10:51):

we either have the commits the same as mathlib3, or the issue numbers

Mario Carneiro (Jul 20 2023 at 10:52):

I'd go for keeping the commits and let people figure out how to copy the issue numbers

Johan Commelin (Jul 20 2023 at 10:53):

Yeah, there are too many pointers at specific commits of master. I would like those to stay commits in the history of master.

Julian Berman (Jul 20 2023 at 10:53):

I didn't follow your bit Eric. There's of course probably a third option though to throw out, which is more work, but I think fixes what I'm guessing I should have understood -- namely create some new branch in mathlib4 (main or whatever), run git-filter-repo on mathlib and stick those commits on the new main (of mathlib4) while rewriting all the #XXXX links to leanprover-community/mathlib3#XXXX, then rename the repos (mathlib -> mathlib3 + archive, mathlib4 -> mathlib, maybe with transfering issues)

Possibly I'm still not following all the potential problems though

Eric Wieser (Jul 20 2023 at 10:53):

... or we merge the repos into mathlib4, and have the same problem but for mathilb4 instead of mathlib3

Eric Wieser (Jul 20 2023 at 10:53):

For which there are fewer of each

Mario Carneiro (Jul 20 2023 at 10:55):

Possibly I'm still not following all the potential problems though

The issue is that this changes the commit SHAs for mathlib commits, which as Johan says are referred to in many places

Julian Berman (Jul 20 2023 at 10:55):

Ah the commit SHAs. got it.

Mauricio Collares (Jul 20 2023 at 11:02):

Issues can be moved between repos, I think. PRs can't, but it's easy to edit mathlib3 PR #X descriptions to include "perhaps you want PR mathlib4#X instead?"

Eric Wieser (Jul 20 2023 at 11:03):

Let me make a wiki page to summarize the options here

Mauricio Collares (Jul 20 2023 at 11:10):

Here's a cross-reference to an older thread because it includes discussion on .git-blame-ignore-revs: https://leanprover.zulipchat.com/#narrow/stream/113488-general/topic/mathport.20and.20git.20history

Mauricio Collares (Jul 20 2023 at 11:11):

It's a pity the automated mathport commits are squashed by bors, because ignoring just them (and not the manual fixes) would be great.

Mauricio Collares (Jul 20 2023 at 11:18):

Mario Carneiro said:

Possibly I'm still not following all the potential problems though

The issue is that this changes the commit SHAs for mathlib commits, which as Johan says are referred to in many places

You can mention old mathlib4 SHAs in the rewritten commit message (and keep old commits somewhere, either in mathlib4 or in a mathlib3 branch)

Eric Wieser (Jul 20 2023 at 11:28):

Having two copies of every commit sounds pretty confusing to me

Mauricio Collares (Jul 20 2023 at 11:31):

The "rewritten" commit could be a merge of the original mathlib4 commit to the main branch. Then the mathlib4 part of git log --graph would be a pretty ladder 🪜

Eric Wieser (Jul 20 2023 at 11:37):

Eric Wieser said:

Let me make a wiki page to summarize the options here

https://github.com/leanprover-community/mathlib/wiki/Merging-github-repos-and-history

Eric Rodriguez (Jul 20 2023 at 12:34):

I think it ends up convenient that we predate the master/main switch.

Eric Rodriguez (Jul 20 2023 at 12:34):

It should make some things easier here, right?

Eric Rodriguez (Jul 20 2023 at 12:34):

Old commits can be on master, the new history can be on main

Eric Wieser (Jul 20 2023 at 12:43):

I think that probably makes things less confusing for option2 if we have the two branches be main and lean3-main, with no master in sight (I've added a remark to that effect). I don't think it adds any value to the other options.

Eric Rodriguez (Jul 20 2023 at 12:51):

wouldn't lean3-main kill any old hyperlinks?

Eric Wieser (Jul 20 2023 at 12:52):

Hyperlinks to what?

Eric Rodriguez (Jul 20 2023 at 12:54):

to github, although I'm realising that most github links don't link to master but usually to a specific commit

Eric Wieser (Jul 20 2023 at 12:54):

Links to (files and lines) on master are broken all the time by refactors anyway

Eric Wieser (Jul 20 2023 at 12:55):

And the port is one heck of a refactor


Last updated: Dec 20 2023 at 11:08 UTC