Zulip Chat Archive

Stream: general

Topic: mathport and git history


Patrick Massot (Dec 22 2021 at 15:14):

I don't think this is a goal in this process. Things are difficult enough.

Anne Baanen (Dec 22 2021 at 15:17):

Would it be at all within the realm of possibility to run git filter-branch --tree-filter 'synport **/*.lean' on the entirety of mathlib history? I might look into that if I suddenly find myself with too much free time :)

Mauricio Collares (Dec 22 2021 at 15:19):

On the other hand, it requires zero Lean skills so the chance of executing this in parallel (by finding a different set of volunteers to do the repo surgery) is much higher. But it requires a few executive decisions first so the mathlib3 and mathlib4 GitHub Issues don't end up in different repositories if the history surgery is eventually done.

Mauricio Collares (Dec 22 2021 at 15:24):

The easiest way to do that would be to probably copy the git /mathlib repo to /mathlib3 when the day comes (instead of renaming through GitHub) and push a big commit to /mathlib with the migration. This preserves history and issues, and also allows someone to do work in the future to improve git blame support across versions.

Arthur Paulino (Dec 22 2021 at 15:39):

I like this approach. Another way to think of it is starting a fresh repository with precisely the current mathlib files and git history and then do a big commit replacing everything by the migrated version.

Mauricio Collares (Dec 22 2021 at 15:44):

This depends on whether people decide keeping old issues/PRs in the same repo as new ones is desirable, because those don't live in-repo. It has its pros and cons.

Arthur Paulino (Dec 22 2021 at 15:49):

(deleted)

Eric Wieser (Dec 22 2021 at 15:56):

Arthur Paulino said:

I like this approach. Another way to think of it is starting a fresh repository with precisely the current mathlib files and git history and then do a big commit replacing everything by the migrated version.

This is the approach I would lean towards. In particular, it means that git-savvy users can:

  • Run synport on the new files in some stale lean3 PR
  • merge mathlib4/master into the PR, and have things mostly just work

Mauricio Collares (Dec 22 2021 at 15:58):

Wouldn't that be easier if mathlib was the Lean 4 repo post-migration? You could even update stale lean3 PRs in-place.

Johan Commelin (Dec 22 2021 at 16:00):

(Note: this discussion originated as reply to a community blogpost. See https://leanprover.zulipchat.com/#narrow/stream/113488-general/topic/Community.20blog/near/265810799)

Eric Wieser (Dec 22 2021 at 16:01):

Keeping it separate has the advantages of:

  • A clean slate for Prs in mathlib4
  • Leaving behind an easily viewable snapshot of working lean3 PRs, that don't end up being wrecked by a half-baked attempt at a lean4 port before users abandon them again

Eric Wieser (Dec 22 2021 at 16:02):

In principle we can decide what to do about where to keep old mathlib3 PRs later, we can always swap around the git repos under the hood if we change our mind (as long as we have no mathlib4 PRs yet)
!

Eric Wieser (Dec 22 2021 at 16:02):

Maybe the best argument for keeping the same PRs (as you suggest, @Mauricio Collares) is that PR references in commit messages will continue to point to the right thing.

Eric Wieser (Dec 22 2021 at 16:03):

That is, preserving the closed PRs is more important than the open ones

Mauricio Collares (Dec 22 2021 at 16:13):

Eric Wieser said:

Keeping it separate has the advantages of:

  • A clean slate for Prs in mathlib4
  • Leaving behind an easily viewable snapshot of working lean3 PRs, that don't end up being wrecked by a half-baked attempt at a lean4 port before users abandon them again

For the second point, there would still be a snapshot of branches in a read-only, archived /mathlib3 repository, so people could go to the mathlib PR, copy the branch name and look it up in /mathlib3 in case of wreckage. It's a bit more work but it sounds like an exceptional case.

I don't know enough about mathlib to opine on the first point (and maybe the right choice will depend on the usability of synport on flag day), but in general "clean slate" doesn't sound like a good thing when it comes to version control. If that is desired, though, it can be "emulated" by closing all PRs on flag day.

Mario Carneiro (Dec 22 2021 at 20:45):

Note that there are separate repos for https://github.com/leanprover/lean2/, https://github.com/leanprover/lean/ and https://github.com/leanprover/lean4/, so I think I know the strategy that Leo prefers: a clean break in a new repository. Similarly mathlib is in a separate repository from the lean 2 library (well, they were not separated into core + mathlib at the time so this is no surprise). So I would suggest keeping the new mathlib4 repository separate forever, although we might change the repo names around at some point.

Mario Carneiro (Dec 22 2021 at 20:47):

If you want to do git archaeology, it's really not that bad to switch over to the other repo if you hit the bottom and want to keep digging

Mario Carneiro (Dec 22 2021 at 20:51):

If we did concatenate the histories in some way, I don't think the git diff would be useful at all. Early mathlib4 history will show entire files being ported from somewhere, and then the initial commit will show deletion of a bunch of stuff that is not restored until way later. Much simpler just to acknowledge that they have completely separate edits. If every mathlib change had a corresponding mathlib4 change it might be a different story, but the port isn't going to work like that except possibly in the very late stage

Adam Topaz (Dec 22 2021 at 20:51):

What are the benefits of keeping the git history? Is the main one git blame?

Eric Wieser (Dec 22 2021 at 20:53):

And the ability to merge old mathlib3 branches with git trying to help

Mario Carneiro (Dec 22 2021 at 20:53):

git can't help...

Eric Wieser (Dec 22 2021 at 20:53):

I'd counter with "what would be the benefit of not keeping the history?"

Mario Carneiro (Dec 22 2021 at 20:55):

It amounts to providing the fiction that mathlib4 was somehow produced by editing mathlib, which is not the case and would be very hard to present well

Eric Wieser (Dec 22 2021 at 20:55):

But that's not fiction, is it? The "editing" is performed by synport

Mario Carneiro (Dec 22 2021 at 20:56):

The process that produces files that appear in mathlib4 is complex, involving automatic translation of a mathlib file plus a bunch of human touchup. There is no way git will have a good time trying to understand this relationship

Arthur Paulino (Dec 22 2021 at 20:58):

Knowing "who did this" might be helpful, but one can just ask here on Zulip.
Another advantage of git history is keeping growth data. But it's a bit strange to mix mathlib3 and 4 contribution data

Mario Carneiro (Dec 22 2021 at 20:58):

Plus, there will be a long period of co-evolution (which we are currently in), where mathlib3 still receives commits while mathlib4 is doing its own thing. Appending histories implies that there is a "last" commit of mathlib3, and I don't think we ever need to make a hard break on this

Mario Carneiro (Dec 22 2021 at 20:59):

It's not like we're going to delete the mathlib3 repo, so calling it "not keeping the history" is disingenuous

Mario Carneiro (Dec 22 2021 at 21:00):

I'm not even sure we need to archive it

Mario Carneiro (Dec 22 2021 at 21:00):

You can still contribute to lean 2 if you want

Arthur Paulino (Dec 22 2021 at 21:02):

It's a very personal thing but maybe others resonate: I like enclosing cycles and doing things in a way that help me build the feeling of starting fresh and wiser :smiley:

Mario Carneiro (Dec 22 2021 at 21:02):

If I were plotting growth data, I would put mathlib3 and mathlib4 on different lines / colors and let readers connect the dots themselves

Mauricio Collares (Dec 22 2021 at 21:10):

Mario Carneiro said:

Note that there are separate repos for https://github.com/leanprover/lean2/, https://github.com/leanprover/lean/ and https://github.com/leanprover/lean4/, so I think I know the strategy that Leo prefers: a clean break in a new repository. Similarly mathlib is in a separate repository from the lean 2 library (well, they were not separated into core + mathlib at the time so this is no surprise). So I would suggest keeping the new mathlib4 repository separate forever, although we might change the repo names around at some point.

The lean4 repo started as a copy of the lean repo (history starts in 2013 and the least common ancestor is d36b859c6579ce1b86f257a494bb99417c7cdac1, from 2018). But, as you said, this may be an apples to oranges comparison.

Mario Carneiro (Dec 22 2021 at 21:49):

Also, mathlib derives from an earlier repository via manual porting, which will probably look similar to the mathlib4 port. The repo can now be found at https://github.com/avigad/library_dev

Mario Carneiro (Dec 22 2021 at 21:50):

It does not have any common commits with mathlib, but many files are copied from it in the first few commits

Mac (Dec 23 2021 at 11:39):

@Mario Carneiro is the intention to add synported files to the current mathlib4 repository or to start a new fresh respository with the whole synported merged with state of mathlib4 at that time? If it is the former, keeping them separate is almost a necessity as mathlib4 already has its own PRs and issues that one would probably not want to smash.

Mario Carneiro (Dec 23 2021 at 12:04):

Right now, it's the former. We might revisit this question when we are ready to start the large scale port, but there are files that need to be ported before that because they are used in tactics, and those are all going into mathlib4 as is. I expect somewhere around 5-10% of mathlib to be "pre-ported" in this way before we can do the rest of it.

Mauricio Collares (Mar 15 2022 at 17:24):

Mario Carneiro said:

The process that produces files that appear in mathlib4 is complex, involving automatic translation of a mathlib file plus a bunch of human touchup. There is no way git will have a good time trying to understand this relationship

Now that GitHub supports ignoring commits in the blame view, I did a simplistic experiment to see how well Git would handle blame data if we simply replaced every file by its mathported version. It's at my mathlib fork (for example, look at https://github.com/collares/mathlib/blame/master/src/LinearAlgebra/Basic.lean, https://github.com/collares/mathlib/blame/master/src/GroupTheory/Sylow.lean or https://github.com/collares/mathlib/blame/master/src/Algebra/Group/Freiman.lean).

Yaël Dillies (Mar 15 2022 at 17:37):

Wow, that third file is really well written!

Eric Wieser (Mar 15 2022 at 18:19):

I suspect we want to do that, but also merge in the Mathlib4 history at the same time, so that we don't lose the non-mathportable tactic history there. Either way, thanks for the experiment!

Mauricio Collares (Mar 15 2022 at 18:21):

Eric Wieser said:

I suspect we want to do that, but also merge in the Mathlib4 history at the same time, so that we don't lose the non-mathportable tactic history there. Either way, thanks for the experiment!

Right! That seems doable with git subtree split.

Eric Wieser (Mar 15 2022 at 18:36):

git merge --allow-unrelated-histories ought to be enough


Last updated: Dec 20 2023 at 11:08 UTC