Zulip Chat Archive

Stream: mathlib4

Topic: Backward compatibility

Yury G. Kudryashov (Dec 23 2023 at 19:23):

UPD: this message and the next 2 were moved from the previous discussion.

@Tudor achim I have a few planned refactors that can't go through the deprecation scheme you suggest:

redefine bundled sets/homomorphisms as suggested in #2202
generalize the definition of docs#Absorbs, changing defeq and preserving most lemmas;
generalize the definition of docs#HasFDerivAtFilter so that it works for topological vector spaces;
redefine docs#Set.image2 to use ∃ x ∈ s, ∃ y ∈ t, _ instead of ∃ x y, x ∈ s ∧ y ∈ t ∧ _ (not sure that this one will be accepted)

What would you suggest in these cases?

Yaël Dillies (Dec 23 2023 at 19:26):

(I definitely want the last one to happen)

Yury G. Kudryashov (Dec 23 2023 at 19:27):

See branch#YK-image2-def; I'm pushing PRs for golfs I made while working on it now.

Yury G. Kudryashov (Dec 23 2023 at 19:29):

Opening a new topic for discussion about backward compatibility started in https://leanprover.zulipchat.com/#narrow/stream/287929-mathlib4/topic/Renaming.20monotonicity.20of.20powers.20lemmas (and several times before that)

Ruben Van de Velde (Dec 23 2023 at 19:50):

I think mathlib's ability and willingness to change even pretty central parts of the library is one of its great advantages, though it clearly has negative effects on downstream projects. A few thoughts:

In the case of renaming definitions where the old name isn't reused, I think we could generally keep a deprecated alias around at low cost. It might be necessary to improve the support for deprecation if we do this.
We could set up infrastructure to test a PR against a body of known dependents; possibly this is planned for reservoir?
We could create a place where potentially significant breaking changes are announced, possibly in advance (and then make sure we use it)
We had a bot to automatically bump mathlib3 in dependents, which helped catch issues sooner. I suspect this could be resurrected.
We could move completed projects into some sort of "staging" part of the mathlib repository, rather than trying to land them piecemeal over a number of months (or giving up before landing them entirely)
When changing core parts of the library, we could consider keeping the old and new version side by side, though that's probably too much work for most changes.

Yury G. Kudryashov (Dec 23 2023 at 20:13):

What about the effect on AI projects? Someone trains AI on the old library, then it generates code that no longer works. Disclaimer: I'm not an expert in AI, so I don't know how big is this issue for AI projects.

Tudor achim (Dec 23 2023 at 20:14):

I'm not an expert on mathlib yet, so @Yury G. Kudryashov and @Yaël Dillies I don't have enough knowledge of the dynamics in the areas you're referring to.

I think every major software effort like mathlib has some parts that are more under development than others (linux kernel is a great example). For things that are "more central" as @Yaël Dillies described I think a deprecation path is actually necessary. For things that are understood to be under active development (and it should be clear to users which is which), a more adaptable strategy is required.

Notification Bot (Dec 23 2023 at 20:15):

A message was moved here from #mathlib4 > Renaming monotonicity of powers lemmas by Tudor achim.

Yury G. Kudryashov (Dec 23 2023 at 20:21):

docs#Submonoid is a pretty low-level definition that should stay for a long time unless we implement #2202

Tudor achim (Dec 23 2023 at 20:23):

@Ruben Van de Velde I think those are great suggestions. It seems like the sum total of those steps would be a pretty big workflow departure from the current way that mathlib is maintained, so going step by step is a good way to get there.

I think the best first step that would help us, and probably lots of other users that are not as attuned to the mathlib dev cycle, is to define a deprecation workflow that preserves functionality and then apply that workflow to the "core" aspects of Mathlib, however the maintainers/community define those.

Yury G. Kudryashov (Dec 23 2023 at 20:33):

I think that we need:

a bot that generates and posts API diff for each PR; @Alex J. Best can leaff help here?
topic or a stream for announces of large refactors that can't be made in a backward-compatible way (e.g., changing definitions);
workflow for other refactors that improves backward compatibility; e.g.,
- renames should preserve old lemmas, probably deprecated;
- generalizations of typeclass assumptions are fine (?);
- how do we handle changing explicit assumptions to something weaker?

Ruben Van de Velde (Dec 23 2023 at 20:39):

Oh yeah, API diff would be great, also when moving things around

Eric Rodriguez (Dec 23 2023 at 20:43):

what's leaff?

Tudor achim (Dec 23 2023 at 20:51):

Eric Rodriguez said:

what's leaff?

https://github.com/alexjbest/leaff

Eric Wieser (Dec 23 2023 at 20:59):

Presumably one shortcoming in leaff is that it can't tell when a deletion/addition should actually be considered a rename/tweak. I guess the question is whether we want to use the git model of using a heuristic diff, or the mercurial model of recording it up front (and providing tools to initialize the recorsing heuristically)

Eric Wieser (Dec 23 2023 at 21:03):

If we choose the latter, there's also the question of whether we want this data to be immutably in the main git history (giving us no easy way to recover from incorrect or missing data), or stored elsewhere (git notes, an external database, PR descriptions, ...)

Yury G. Kudryashov (Dec 23 2023 at 21:52):

First, I would love to see that data as a comment to each successfully compiled PR.

Eric Wieser (Dec 23 2023 at 21:57):

Yes, even if it's not ready to be used as a migration tool, the current iteration looks useful for review

Mario Carneiro (Dec 23 2023 at 22:08):

maybe a bot posts this in a comment?

Mario Carneiro (Dec 23 2023 at 22:08):

I guess that's what yury means

Yury G. Kudryashov (Dec 23 2023 at 22:09):

Yes, that's what I had in mind.

Yury G. Kudryashov (Dec 23 2023 at 22:09):

Who volunteers to write the bot?

Alex J. Best (Dec 23 2023 at 23:45):

I already wrote the bot #8479 it uses a sticky comment to update to the latest leaff output

Alex J. Best (Dec 23 2023 at 23:47):

Eric Wieser said:

Presumably one shortcoming in leaff is that it can't tell when a deletion/addition should actually be considered a rename/tweak. I guess the question is whether we want to use the git model of using a heuristic diff, or the mercurial model of recording it up front (and providing tools to initialize the recorsing heuristically)

Can you elaborate more what you mean here. Currently leaff emits a few different types of difference, eg. changed proof, changed type, renamed declaration, deleted declaration etc. I plan to add support for some combinations, like rename + proof changed.

Yury G. Kudryashov (Dec 24 2023 at 03:31):

Related question: #9078 by @Ruben Van de Velde removes some lemmas that were deprecated for almost a year. It turns out that these lemmas were used a few times in Mathlib and this code didn't generate warnings/errors in the CI. Is it a bug in (our handling of) @[deprecated]?

Yury G. Kudryashov (Dec 25 2023 at 21:40):

While we don't have formal backward compatibility guidelines, I suggest 1 simple rule: if a downstream project wants to restore some lemmas removed in Mathlib, then they should submit a PR that restores the old lemmas (probably, as deprecated aliases to the new names).

Yury G. Kudryashov (Dec 25 2023 at 21:56):

I propose a backward-incompatible change to the simp set in https://leanprover.zulipchat.com/#narrow/stream/287929-mathlib4/topic/mem_image
I haven't started working on the implementation yet (and won't till I get some feedback).

Tudor achim (Dec 26 2023 at 01:05):

Yury G. Kudryashov said:

While we don't have formal backward compatibility guidelines, I suggest 1 simple rule: if a downstream project wants to restore some lemmas removed in Mathlib, then they should submit a PR that restores the old lemmas (probably, as deprecated aliases to the new names).

that is a pretty bad user experience for downstream projects

Tudor achim (Dec 26 2023 at 01:06):

what's the process by which the mathlib maintainers decide on a policy for backwards compatibility?

Yury G. Kudryashov (Dec 26 2023 at 01:07):

Up until now, we started the discussion a few times, then never came to any conclusion because nobody pushed for backwards compatibility hard enough.

Yury G. Kudryashov (Dec 26 2023 at 01:09):

Also, up until very recently (1 year? 2 years?) the project was so small that nobody cared about formal procedures.

Yury G. Kudryashov (Dec 26 2023 at 01:11):

Currently, the backwards compatibility policy is "there is no compatibility".

Yury G. Kudryashov (Dec 26 2023 at 01:12):

To change it, we need (a) decide what are we ready to promise; (b) write tools to enforce these promises.

Yury G. Kudryashov (Dec 26 2023 at 01:15):

@Alex J. Best says that leaff will be ready for use on all PRs in about a week.

Yury G. Kudryashov (Dec 26 2023 at 01:20):

Once we have it, we can, e.g., (this is my idea, not a decision of the team)

add a stream where the bot will post API changes in submitted PRs, if some theorems are deleted or some definitions are changed; no need to spam if a PR only adds theorems;
enforce a delay (a few days?) for these PRs;
mark some files as "we promise we won't delete/change theorems in this file without a deprecation period".

Yury G. Kudryashov (Dec 26 2023 at 01:21):

BTW, the downstream code can fail if we add @[simp] to a lemma upstream, especially if you use non-terminal simps.

Yury G. Kudryashov (Dec 26 2023 at 01:38):

Also, up until now most of the downstream projects had a bounded life span (i.e., "formalize that theorem, and we're done") and at least half of them used a frozen mathlib version as a dependency.

Johan Commelin (Dec 26 2023 at 06:19):

I think that last point is important. (The other points are good as well :smiley:) It would be good to get more awareness for projects that depend on mathlib long-term. Because we just haven't had many examples of such projects yet. @Tudor achim is your project that depends on mathlib published somewhere? Or is it closed source?

Yury G. Kudryashov (Dec 26 2023 at 06:43):

I know the answer: closed source (at least, for now)

Eric Wieser (Dec 26 2023 at 08:50):

I don't love the idea of rushing to establish a backwards compatibility policy (e.g. start requiring @[deprecated] alias everywhere) that creates a burden for many mathlib contributors, when the only motivation is a closed source project, and even our high-profile open source projects (LTE, sphere eversion, PFR, ...) did not seem to motivate such a change.

Eric Wieser (Dec 26 2023 at 09:39):

But I do think this is a great reason to invest more in things like leaff

Geoffrey Irving (Dec 26 2023 at 09:52):

https://github.com/girving/ray is an example of a project which has existed as Lean 4 for a while (mainly because I haven’t had time to work on upstreaming much). It would certainly be easier if there was a standard command to run that would do the various renames since I last synced; currently I have my own script for this which I gradually update.

Johan Commelin (Dec 26 2023 at 10:23):

@Geoffrey Irving Do you think it is worthwhile to share your mathlib-bump-sync-update script, until we have better tooling? Or would you rather keep it private for now?

Geoffrey Irving (Dec 26 2023 at 10:25):

No reason to keep it private, but it's extremely hacky. It's called port-fix because I initially used it when I was doing the Lean 4 port. We'd want something way cleaner if it was standard. It probably best serves as an example of what people have to do in the absence of standard tooling.

port-fix

Eric Wieser (Dec 26 2023 at 10:27):

Ideally if we build such tooling "properly", it would be able to run against versions of mathlib 4 that predate it

Mario Carneiro (Dec 26 2023 at 13:25):

port-fix looks like a hacky version of mathport. Is it possible that we can use #align declarations over time as a migration assistant?

Geoffrey Irving (Dec 26 2023 at 13:26):

^ Yes, I think that could work very well, and it would be a lot simpler than Mathport due to working entirely inside Lean 4.

Mario Carneiro (Dec 26 2023 at 13:42):

I think it will always be difficult to run migration and diffing entirely in lean, because lean doesn't like to talk to other versions of itself

Geoffrey Irving (Dec 26 2023 at 13:45):

Imperfection seems fine here, and for small incremental upgrades the two versions will be similar enough that the #aligns could be parsed and applied in the new version. They could even be interpreted as macros that you can click on in VSCode to do the update.

Yury G. Kudryashov (Dec 26 2023 at 14:41):

Probably, I should disclose a COI: I've recently accepted a job offer from @Tudor achim, applying for a visa now.

Tudor achim (Dec 26 2023 at 15:41):

Eric Wieser said:

I don't love the idea of rushing to establish a backwards compatibility policy (e.g. start requiring @[deprecated] alias everywhere) that creates a burden for many mathlib contributors, when the only motivation is a closed source project, and even our high-profile open source projects (LTE, sphere eversion, PFR, ...) did not seem to motivate such a change.

The reasons to establish a backwards compatibility policy are (and these are common to all high profile OSS projects, which mathlib4 already is / will soon become):

For a better human experience i.e. not having to re-learn things in the standard library except in major release bumps
To prevent downstream projects from needing active maintenance all the time to keep up. I.e. consider that you can download most python 3 code from the internet and, generally speaking, have it run without modification, which makes it really easy for people to get into python and build on what's there

If the vision for mathlib is that it will be one monorepo for all of internet Lean4 math, then (2) is not really a concern. But imo the probable outcome is that there will be much more Lean 4 math code outside of mathlib than inside (consider what happens when the majority of papers are formalized/automatically formalized), so it makes sense to keep core parts of the library more stable than the leaf nodes.

On my end, I'm cool with whatever the community decides and am happy to continue participating in the discussion.

Tudor achim (Dec 26 2023 at 15:46):

LTE, sphere eversion, PFR

I think the question is what happens with projects like that if they don't make it into mathlib -- in my view it would be unfortunate if they need frequent maintenance with minor version releases of mathlib to keep working. So if there's a straightforward way to lower the probability of that substantially (like with deprecation paths) it seems like it ought to be considered.

Tudor achim (Dec 26 2023 at 15:51):

Mario Carneiro said:

port-fix looks like a hacky version of mathport. Is it possible that we can use #align declarations over time as a migration assistant?

To make sure I understand -- is the idea here that #aligns could be added to mathlib PRs that deprecate old lemma names, and then downstream projects could include a build step that automatically applies all the #aligns to yield a version that works with the latest mathlib (in the case where the types themselves were not changed)?

Edit: if yes, I think this checks the boxes, since now the onus is on the user of the downstream projects to simply run a build step and not on the authors to keep doing manual work. I think if the tool is well-tested that should work well.

Yaël Dillies (Dec 26 2023 at 15:53):

Even if the types changed, that would still be useful, as you only have to add/remove/shuffle arguments, rather than finding what declaration this used to be.

Tudor achim (Dec 26 2023 at 15:54):

Yaël Dillies said:

Even if the types changed, that would still be useful, as you only have to add/remove/shuffle arguments, rather than finding what declaration this used to be.

Makes sense. Would adding such #align declarations be considered a serious burden for PR authors?

Yaël Dillies (Dec 26 2023 at 15:57):

Without automation, yes. With automation, it should be fine.

Ruben Van de Velde (Dec 26 2023 at 15:57):

Tudor achim said:

LTE, sphere eversion, PFR

I think the question is what happens with projects like that if they don't make it into mathlib -- in my view it would be unfortunate if they need frequent maintenance with minor version releases of mathlib to keep working. So if there's a straightforward way to lower the probability of that substantially (like with deprecation paths) it seems like it ought to be considered.

There's two kinds of "keep working" to consider here, though. If a downstream project is abandoned, it stays pinned to a specific mathlib version that will rot no faster than the rest of the computing platform it's built on. There's no guarantee that it will keep working with the most recent mathlib though, and I don't think it's feasible to guarantee that, because basically every line of mathlib is "API", in a sense. That doesn't mean we can't make upgrading easier

Ruben Van de Velde (Dec 26 2023 at 15:59):

There's also no fundamental reason that a legacy project needs to continue compiling with the most recent mathlib, unless you want to build on top of it

Yaël Dillies (Dec 26 2023 at 16:01):

Actually I don't think plain #align-like statements will work since a lemma could be renamed several times. We need to record exactly when a lemma was renamed, and this information should not be overwritten by future PRs. So probably what we want is a database external to mathlib (or at least to the Lean code it relates to, maybe by having a big JSON inside mathlib) which would record diffs on a lemma level. Tools like Leaff could provide a first approximation of the diff, and PR authors could subsequently fix the diff manually.

Yaël Dillies (Dec 26 2023 at 16:02):

It would be even better if there was no need to record anything at all. After all, git diffs are not recorded, but recomputed on the fly. Hopefully, Leaff could reach that level too.

Yaël Dillies (Dec 26 2023 at 16:04):

In that ideal world, you could upgrade your mathlib dependency from version A to version B by getting Leaff to compute all "lemma diffs" between consecutive versions of mathlib from A to B and using those diffs to automatically rename the lemmas in your project.

Yaël Dillies (Dec 26 2023 at 16:05):

And maybe Leaff could even compute argument reorders and the upgrade tool could rearrance the arguments in lemma calls automatically.

Mario Carneiro (Dec 26 2023 at 16:07):

Yaël Dillies said:

Actually I don't think plain #align-like statements will work since a lemma could be renamed several times. We need to record exactly when a lemma was renamed, and this information should not be overwritten by future PRs. So probably what we want is a database external to mathlib (or at least to the Lean code it relates to, maybe by having a big JSON inside mathlib) which would record diffs on a lemma level. Tools like Leaff could provide a first approximation of the diff, and PR authors could subsequently fix the diff manually.

My idea is that #align statements exist in the git repo which means they provide information regarding how a declaration has moved over time

Mario Carneiro (Dec 26 2023 at 16:08):

When being used directly for migration, you don't need to know when and where they changed, only what the initial and final names are

Yaël Dillies (Dec 26 2023 at 16:08):

Ah so the tool would not look at the #align in the current version of mathlib but in older revisions? Then an #align could technically be present for a revision only and that would be enough already. How do you decide when to remove them?

Mario Carneiro (Dec 26 2023 at 16:08):

if we use #align for this then we can't remove them

Yaël Dillies (Dec 26 2023 at 16:09):

Nevermind, you disambiguated before I pressed Enter.

Mario Carneiro (Dec 26 2023 at 16:09):

basically the LHS #align name plays the role of a permanent identifier which is never changed even if the real declaration changes

Mario Carneiro (Dec 26 2023 at 16:11):

the drawback of using the literal #align command is that this was designed for lean 3 alignment and so there are a lot of lean 4 declarations which lack #align because they are new in lean 4

Mario Carneiro (Dec 26 2023 at 16:12):

so if we want to rename them we get no migration assistance for them

Geoffrey Irving (Dec 26 2023 at 16:12):

Can’t we just add new #aligns?

Yaël Dillies (Dec 26 2023 at 16:13):

Okay no, your idea doesn't work. What if I have lemma A which is renamed to B? I would need

#align A B

But now if I add another lemma called A which I subsequently rename to C I would need to add

#align A C

and now you're in a pickle since you don't know whether a call to A in a downstream project should be translated to B or C. The correct answer will depend on what version of mathlib the call to A was written with, and your #align mechanism has forgotten this information.

Yury G. Kudryashov (Dec 26 2023 at 16:13):

Also, #aligns don't help with refactors that reorder/drop arguments of a lemma.

Mario Carneiro (Dec 26 2023 at 16:13):

No the A needs to be a permanent identifier which is not necessarily related to the initial name

Alex J. Best (Dec 26 2023 at 16:14):

Independently of auto updating I think some way of having perma-ids for relatively important results will be important for mathlib long term. But I agree that it would be burdonesome to have these for all lemmas. And the small lemmas are normally the ones that get refactored the most

Geoffrey Irving (Dec 26 2023 at 16:15):

For people who haven’t seen Unison, it might be useful to look at for ideas. They have a bunch of machinery for incremental refactoring, though it’s built pretty deeply into the language and thus not necessarily applicable.

https://www.unison-lang.org

Mario Carneiro (Dec 26 2023 at 16:16):

It is plausible that we can obtain this information by looking back through the history but automatically following renames seems incredibly difficult without something like #align

Yaël Dillies (Dec 26 2023 at 16:16):

I think littering mathlib code with permanent identifiers is not going to do us well. Anybody could (accidentally) edit them and they will be permanent junk in files. Automatically calculating diffs doesn't take any space in mathlib files and is guaranteed to not break with later commits.

Yaël Dillies (Dec 26 2023 at 16:17):

Mario Carneiro said:

automatically following renames seems incredibly difficult without something like #align

I think Leaff is doing pretty well at that?

Mario Carneiro (Dec 26 2023 at 16:17):

Not really, it doesn't do renames at all?

Yaël Dillies (Dec 26 2023 at 16:18):

Hmm... I must have misread

Mario Carneiro (Dec 26 2023 at 16:19):

IMO renames are the most important part to get right in automated migration. Other things can be handled by manual work but when a name change is mixed with type and other changes then it's really a nightmare to upgrade

Mario Carneiro (Dec 26 2023 at 16:21):

An alternative to #align in the files is some machine checkable format for describing diffs to go along with renaming PRs. But we would have to invent a whole ecosystem of tools to make sure that we don't mess this data up

Alex J. Best (Dec 26 2023 at 16:24):

What do you mean by "renames" leaff will identify if a lemma is renamed with the same type and proof. And soon if it has the same type but different proof also

Alex J. Best (Dec 26 2023 at 16:26):

I think many changes could be tracked over time as a composition of renames and proof / type changes. Of course if the name and type change within one commit there would likely need to be human intervention to assist in tracking that the lemmas are morally the same somehow

Mario Carneiro (Dec 26 2023 at 16:29):

you should probably also consider the case where the type is equal to a renaming of the original type

Mario Carneiro (Dec 26 2023 at 16:29):

aka dependent rename

Mario Carneiro (Dec 26 2023 at 16:30):

plus, "one commit" is probably more than just one commit because we squash merge

Geoffrey Irving (Dec 26 2023 at 16:30):

But generally “incrementally add heuristic features to Leaff” seems a better path long-term than manual #aligns.

Mario Carneiro (Dec 26 2023 at 16:31):

the advantage of #align is that it is easier to keep things in sync and curate the mapping

Mario Carneiro (Dec 26 2023 at 16:31):

Because leaff is just per commit we would still need to run it over the whole mathlib history to get some migration database and maintain it and now that becomes another moving part

Alex J. Best (Dec 26 2023 at 16:33):

I think my main point is that we should automate as much as possible, maintaining a list of renames by hand seems more work than running a tool over history and adding a few hints where needed. If we actually want to go back to the start of time at all, which I'm not sure we do?

Mario Carneiro (Dec 26 2023 at 16:36):

Another possibility which I should mention is what set.mm does: maintain a list of renames in the style of a changelog, with dated entries, and then you can just take some subsequence from it and apply it as a migration. The work of maintenance of this list is on those doing the renames, and we can always go back and fix historical entries if needed

Mario Carneiro (Dec 26 2023 at 16:41):

Automation is definitely required for this to scale, but I don't think leaff can be used as the source of truth, only as a check on something that can be hand-edited. There are just too many ways where it is too rigid and some interpretation is needed to determine what the actual rename target is supposed to be

Kyle Miller (Dec 26 2023 at 16:57):

@Tudor achim The OSS projects you've mentioned as points of comparison, the Linux kernel and Python 3, are two very established bodies of work that are designed to support applications and be very conservative with breaking "user space". They're pieces of infrastructure that need a guarantee that multiple projects can work together and run within the same platform. These projects also get industry support for their development.

Maybe there are better projects to compare against? Consider that mathlib4 just finished its port this summer. It's still got a number of internal tensions due to changes from Lean 3 to Lean 4, and it will still take some time for it to anneal. Plus, if you pick up a random project that depends on mathlib, you can still run it, since you can still obtain the version of Lean and the version of mathlib it was built for -- you don't need to upgrade a project's dependencies just to use it.

Migration tools would definitely be nice (and everyone's talking about this already), but I think it's important to mention that an important question in here is who is responsible for which aspects of handling reverse-incompatible changes to mathlib. Handling this in the nicest way for downstream projects takes resources away from time spent refactoring mathlib to be as elegant and versatile as it can be (and unlike a normal software library, refactors can be due to communicating an improved mathematical understanding). My personal opinion is that downstream projects at the present moment are more tolerated than supported -- tolerated since many times it's easier to develop new theory in a project first and then contribute it to mathlib -- and for this reason migration tools are very important to the mathlib project.

However, for downstream projects that don't intend to become part of mathlib, perhaps the resources for handling reverse-incompatible changes should come from groups that are interested in these sorts of guarantees, rather than asking mathlib to be developed differently? For example, these additional resources could be put into organizing stable mathlib releases along with detailed migration instructions from release to release. (However: the Zulip community is generally less responsive to questions about old versions of mathlib, which might be worth considering.) To be clear, we shouldn't have a fork of mathlib, but rather a version of it that someone is willing to support.

Julian Berman (Dec 26 2023 at 17:04):

I'm only casually following this thread, but

My personal opinion is that downstream projects at the present moment are more tolerated than supported

definitely matches how I think of Mathlib. I also always get reminded of https://0ver.org/ and basically consider Mathlib to be in that boat of "production ready software that still doesn't really want to commit to hard guarantees" (for anyone who misses it, that page is dripping with sarcasm).

Kyle Miller (Dec 26 2023 at 17:08):

I'll add that I think having backward compatibility will certainly be important, especially as projects using mathlib outpace the ability of mathlib to absorb it all. We will also reach a point where things get more stable internally.

It's worth talking about what this would look like, but I don't think it will happen soon, and I would hate for premature efforts to make mathlib be more conservative in handling backward compatibility to stifle the "philosophical experiment" part of mathlib: we don't really know whether all of established mathematics can co-exist in a monorepo, but, if it's possible, I think we do understand that to get there we need to be comfortable with deep refactorings.

Julian Berman (Dec 26 2023 at 17:11):

I think an interesting thing to consider before committing to anything is at least attempting to decide to measure it. E.g. a simplish idea is to start to ask people doing refactors to estimate (in number of hours or something) the amount of additional effort it would be to alsodo the backwards compatibility work, and then to track the estimates over time. They'll be wildly wrong of course, as every software estimation process is proven to be, but they still may have some information in them.

Tudor achim (Dec 26 2023 at 17:23):

However, for downstream projects that don't intend to become part of mathlib, perhaps the resources for handling reverse-incompatible changes should come from groups that are interested in these sorts of guarantees, rather than asking mathlib to be developed differently

I thought it's reasonable to flag an issue with backwards compatibility, but if bringing this up isn't appropriate for this zulip, happy to move the discussion elsewhere

Kyle Miller (Dec 26 2023 at 17:43):

@Tudor achim I'm sorry if what you're getting out of my messages is "go away" -- that's not what I meant at all. Everyone in the Lean community should be welcome here, and I think having the discussion here on this Zulip is beneficial to the community.

I can see how "asking for mathlib to be developed differently" could be interpreted differently from how I wanted. I was trying to say something about how, like in all of software engineering, discussions tend to circle around who is putting what resources into which parts of a project, and I think it's important to make sure we talk about this part. This is the "political" part, where choices have different impacts on the different groups that use mathlib. One way would be for mathlib to ask volunteers to spend time worrying about reverse compatibility rather than developing theory, another is improving migration tools, and yet another is looking into ways that those who would benefit from reverse compatibility to incur some of the "costs" of ensuring guarantees, for example by curating and supporting stable releases. Maybe some combination of all of these, but I'd hope we can be sure we keep costs down on carrying out significant refactors for contributors.

Tudor achim (Dec 26 2023 at 18:58):

Got it. Just to avoid misinterpretation, I'm not asking volunteers to spend time on things they don't want to do (at this point, I'm getting the sense this is something the community might not want to do at the moment). My objective with this discussion is to surface that there is a downstream cost to renaming core lemmas of Mathlib, and this cost will almost certainly increase substantially over the next several years as the wider mathematical community starts to move toward formal math in Lean [1]. It's of course up to the Mathlib community to decide what to do with that.

As @Yury G. Kudryashov alluded to above, I am involved with a company (https://harmonic.fun/) that uses Mathlib, among other things, and if there is interest I'm happy to discuss concrete proposals for how we can help here! I will need to ramp up on the state of the codebase, though, because I would've thought that the proposal above of using #align liberally was a pretty reasonable and low-overhead process for PR authors but that doesn't seem to be a universally held view :sweat_smile:

[1] Again, if the working model the community has in mind is that all of this actually does go into Mathlib then this point is moot because the current process probably supports that just fine.

Mario Carneiro (Dec 26 2023 at 19:01):

#align has the advantage that it is what we are currently doing, but I think most people are thinking about it as junk from the lean 3 era and not something to use going forward

Mario Carneiro (Dec 26 2023 at 19:04):

Kyle Miller said:

I'll add that I think having backward compatibility will certainly be important, especially as projects using mathlib outpace the ability of mathlib to absorb it all. We will also reach a point where things get more stable internally.

It's worth talking about what this would look like, but I don't think it will happen soon, and I would hate for premature efforts to make mathlib be more conservative in handling backward compatibility to stifle the "philosophical experiment" part of mathlib: we don't really know whether all of established mathematics can co-exist in a monorepo, but, if it's possible, I think we do understand that to get there we need to be comfortable with deep refactorings.

I think we should be careful to separate the concepts of "backward compatibility" from "migration assistance". I think we are not yet at a point where we can consider backward compatibility in mathlib, but already well past the point where we should have a coherent story for migration because this is causing problems for every user that depends on mathlib and wants to stay up to date

Mario Carneiro (Dec 26 2023 at 19:06):

the key differentiator being that it is okay to require that people upgrading mathlib run a tool over their code which fixes as much as it can. Ideally it would replace lake update and would be otherwise fully automatic

Mario Carneiro (Dec 26 2023 at 19:08):

At least for stable-to-stable bumps I think we could just go over the list of renames and curate a list of migrations

Yury G. Kudryashov (Dec 27 2023 at 02:08):

Mario Carneiro said:

At least for stable-to-stable bumps I think we could just go over the list of renames and curate a list of migrations

To have "stable-to-stable" bumps we need stable branches. I don't see any volunteers to maintain it so far.

Mario Carneiro (Dec 27 2023 at 02:09):

we need stable tags, not branches

Mario Carneiro (Dec 27 2023 at 02:09):

and we already have them

Mario Carneiro (Dec 27 2023 at 02:12):

the task would be to determine what changed from one stable tag to the next, maybe starting from a leaff based analysis and refining it with comments from respective PRs (labeling the changes with the PR they came from) to eliminate spurious removals (I think we should not have any removals in a migration, since mathlib basically never actually removes things, although it may not be easy to determine what the replacement is without reading the PR comment or other discussion)

Shreyas Srinivas (Dec 28 2023 at 11:21):

How does it help when tactics change, especially when they get weaker?

Eric Wieser (Dec 28 2023 at 12:30):

I think there's a lot of value to developing migration strategies for lemma renames even if we can't guarantee a perfect migration for things like tactic behavior changes

Shreyas Srinivas (Dec 28 2023 at 13:08):

Definitely. Agreed. In general uncontrolled rapid changes and quick fixes with no migration pathway are a very bad idea.

llllvvuu (Dec 30 2023 at 00:33):

Codemods are pretty rare in my experience (I see them only for major JavaScript frameworks). Usually developers are expected to manually migrate their code when @deprecated warnings show up.

I scrolled through this thread but I couldn't find it - what was the issue with @deprecated? It seems quite a bit simpler than Leaff and a more widespread practice. I think it's not quite the same as #align either, since it goes beyond exact renames (it should also warn/encourage usage of the preferred variant, which I'm not sure #align does)

Yury G. Kudryashov (Dec 30 2023 at 00:43):

There are few issues with @[deprecated]:

If we start using them, then we need automation to
- remind/help PR authors to add them; leaff could help here;
- remind someone to clean up lemmas that were deprecated for a long time;
- (optional): help downstream projects to migrate.
Without extra automation (which we don't have yet), this creates extra burden on PR authors and maintainers.
It only helps with the "rename" refactors. But many other things can happen:
- slightly redefine an existing definition, see #9275 for an example;
- generalize a definition; e.g., generalize docs#fderiv to topological vector spaces or docs#Absorbs as I suggested here;
- refactor a large chunk of the library, as I'm suggesting in #2202 (WIP in branch#YK-bundled-set)
- add a lemma to the default simp set, possibly creating confluence issues down the road.

llllvvuu (Dec 30 2023 at 03:03):

Is 2. that bad? (assuming it would be suggested in the contributing guide and the reminder is simply the GitHub suggest button)

For cleaning up lemmas, I've seen the practice of putting a version number after @deprecated; it doesn't seem like mathlib has version numbers, but there could be a substitute. (This is the easier part to automate and I'd be happy to contribute)

For 3., that's OK I think. One could continue to break things like simp while re-expressing other changes as renames wherever natural.

FWIW I've also seen the practice of combining all approaches: @deprecated, versioning (SemVer or otherwise), and backports. I think the latter two are actually more effort though.

llllvvuu (Dec 30 2023 at 03:07):

Another idea is the ! convention from Conventional Commits: I've never seen it enforced, and consequently commits inevitably don't 100% get labeled, but as a best-effort thing can get maybe >90% of breaking changes indicated

Mario Carneiro (Jan 14 2024 at 07:43):

Mathport just got hit by a backward compatibility break in #9553 (by @Yaël Dillies ), which removes the Mathlib.Algebra.Abs file. This is the easiest kind of breakage to avoid, just leave a re-import file in its place. I would really like us to make more headway on this issue...

Eric Wieser (Jan 14 2024 at 07:45):

Is this something that #align_import should be handling?

Eric Wieser (Jan 14 2024 at 07:45):

(that is: is the problem that the file was deleted, or that the #align_import was deleted?)

Mario Carneiro (Jan 14 2024 at 07:46):

the #align_import was deleted

Mario Carneiro (Jan 14 2024 at 07:47):

although I think we need some more infrastructure before #align_import would be able to solve this issue

Mario Carneiro (Jan 14 2024 at 07:47):

but it's a good point, the #align_import should not have been deleted (and should be restored)

Yury G. Kudryashov (Jan 14 2024 at 08:22):

Did we decide about a deadline after which we drop mathlib3->mathlib4 migration support?

Mario Carneiro (Jan 14 2024 at 08:23):

well this particular example was about mathport qua lean 4 project

Mario Carneiro (Jan 14 2024 at 08:23):

sorry I didn't make that clear

Mario Carneiro (Jan 14 2024 at 08:23):

mathport itself didn't build

Yury G. Kudryashov (Jan 14 2024 at 08:24):

Once we drop the migration support, we no longer need mathport as well, do we?

Yury G. Kudryashov (Jan 14 2024 at 08:25):

(I don't suggest doing it today or this month)

Mario Carneiro (Jan 14 2024 at 08:25):

#align_import is actually a pretty good way to track moved files, but like #align it has "lean 3 cruft" written all over it so people are antsy to delete it, after which point we will lose the ability to track these as well for lean 4 projects

Mario Carneiro (Jan 14 2024 at 08:25):

mathport here is just an example of a largish lean 4 project depending on mathlib which I happen to maintain

Mario Carneiro (Jan 14 2024 at 08:26):

it's actually a bit of an issue that we don't have very many of these projects, since it means we don't have enough canaries for backcompat breaks

Mario Carneiro (Jan 14 2024 at 08:27):

The deadline I originally set was 1 year after the port, but after seeing some lean 3 projects demoed at lean together which are being ported this week I think it's still premature

Yaël Dillies (Jan 14 2024 at 09:14):

Wait, so what should I have done? Should I have moved the #align_import to Algebra.Order.Group.Abs?

Yury G. Kudryashov (Jan 14 2024 at 09:16):

Yes, and I create multiple copies of #align_import when I split a file.

Patrick Massot (Jan 14 2024 at 15:23):

Mario Carneiro said:

The deadline I originally set was 1 year after the port, but after seeing some lean 3 projects demoed at lean together which are being ported this week I think it's still premature

I think we didn't see any project that did not start porting. So they are all past the point where mathport is useful.

Mario Carneiro (Jan 14 2024 at 15:37):

Yes, but those are just the projects we saw in presentations. It is at least evidence that 1 week ago would have been too early, which is why I think it is better to wait some more. I want it to be an old memory for everyone (not just the bleeding edge folks, everyone) before taking action on this.

Mario Carneiro (Jan 14 2024 at 15:42):

Despite your claims to the contrary I still expect LTE to be ported eventually. (I recall you saying the same thing about sphere eversion but it did get ported.)

Eric Wieser (Jan 14 2024 at 15:49):

Even if we stop maintaining #aligns today and stop running mathport CI, lean3 projects would be able to port to the version of mathlib4 that existed right now

Eric Wieser (Jan 14 2024 at 15:49):

That doesn't sound all that different from having a project that was ported to lean4 immediately, but then was never bumped to newer mathlib4 versions

Mario Carneiro (Jan 14 2024 at 15:50):

the irony is that you get better migration assistance by porting from lean 3 to new lean 4 than by porting from lean 3 to old lean 4 and then manually upgrading your old lean 4 project

Mario Carneiro (Jan 14 2024 at 15:51):

because mathport is by far the most sophisticated and comprehensive migration tool we have, and there is no lean 4 equivalent

Mario Carneiro (Jan 14 2024 at 15:52):

which is what this thread is about

Eric Wieser (Jan 14 2024 at 15:56):

Mario Carneiro said:

Despite your claims to the contrary I still expect LTE to be ported eventually. (I recall you saying the same thing about sphere eversion but it did get ported.)

Doesn't it need bumping to the latest mathlib3 before than can happen?

Mario Carneiro (Jan 14 2024 at 15:56):

most likely

Mario Carneiro (Jan 14 2024 at 15:57):

probably best to do that before the tools break down... leanproject is not very happy to be run these days

Mario Carneiro (Jan 14 2024 at 16:01):

(and yes, that is me advising someone on how to run mathport only 2 days ago)

Patrick Massot (Jan 14 2024 at 18:45):

Mario Carneiro said:

Despite your claims to the contrary I still expect LTE to be ported eventually. (I recall you saying the same thing about sphere eversion but it did get ported.)

Do you want to bet about LTE being mathported? I don't remember saying SE would not be ported, but the situation is very different anyway.

Kevin Buzzard (Jan 14 2024 at 20:15):

I think it's very unlikely that LTE will get ported. LTE needed a lot of homological algebra and the formalisation of it used in the project had some down sides. Joel Riou now has a different way of doing it which is in mathlib4. The basics of solid abelian groups are being written by Dagur in lean 4. So mathlib4 has now diverged quite a bit from LTE -- in a good way.

Patrick Massot (Jan 14 2024 at 20:27):

Yes, this is why I wrote it will never be mathported. The main theorem may end up in Mathlib but it won't use mathport.

Kevin Buzzard (Jan 14 2024 at 21:54):

To be clear, this is also what I was suggesting

Mario Carneiro (Jan 15 2024 at 01:56):

I think eventually you will want the results in it, and once all the lean 3 tools fail on it because they are dependent on some infrastructure that dies (e.g. cache or elan) it may become difficult to even view projects for porting or partial porting purposes.

Mario Carneiro (Jan 15 2024 at 01:57):

Honestly it's been a bit scary to me how quickly all the lean 3 infrastructure is falling apart. I really hope this isn't the eventual destination of lean 4 projects too.

Mario Carneiro (Jan 15 2024 at 01:58):

But I acknowledge that I have an archivist mindset that does not really jive with the lean approach

Kim Morrison (Jan 15 2024 at 02:23):

If we eventually get to the point that everyone is happy using the stable releases of Lean, I think this gets much better (or at least much less bad).

Patrick Massot (Jan 15 2024 at 02:54):

Mario Carneiro said:

Honestly it's been a bit scary to me how quickly all the lean 3 infrastructure is falling apart. I really hope this isn't the eventual destination of lean 4 projects too.

There is exactly one thing falling apart and it is the olean cache because nobody was reading the email that got warning, right?

Jireh Loreaux (Jan 15 2024 at 02:56):

Would this be fixed by pushing one empty commit to mathlib3 so that a new cache is generated?

Yury G. Kudryashov (Jan 15 2024 at 02:56):

Can we rerun CI on the last commit instead?

Jireh Loreaux (Jan 15 2024 at 02:58):

Tomorrow I'll do a test where I try to get Lean 3 working from scratch and see if / where I get stuck.

Patrick Massot (Jan 15 2024 at 03:00):

Note that pushing any new commit to mathlib3 will destroy the graphs on the mathlib stats page and I don't plan to fix that kind of issue except by dropping new mathlib3 data (but of course anyone else is free to propose another fix).

Yury G. Kudryashov (Jan 15 2024 at 03:00):

NixOS build still works for me.

Mario Carneiro (Jan 15 2024 at 03:17):

Patrick Massot said:

Note that pushing any new commit to mathlib3 will destroy the graphs on the mathlib stats page and I don't plan to fix that kind of issue except by dropping new mathlib3 data (but of course anyone else is free to propose another fix).

I don't really understand why you set things up like that in the first place? I don't understand what kind of assumption you could make while plotting two graphs next to each other that requires that one of them stops at a particular point

Mario Carneiro (Jan 15 2024 at 03:18):

Patrick Massot said:

Mario Carneiro said:

Honestly it's been a bit scary to me how quickly all the lean 3 infrastructure is falling apart. I really hope this isn't the eventual destination of lean 4 projects too.

There is exactly one thing falling apart and it is the olean cache because nobody was reading the email that got warning, right?

I linked earlier an issue with leanproject caused by the fact that it assumes that the latest version is lean 3

Eric Wieser (Jan 15 2024 at 05:18):

The crux of the issue is that while mathlib, lean3, and leanproject are all effectively frozen tools, there is no frozen version of elan; right?

Mario Carneiro (Jan 15 2024 at 05:22):

elan is backward compatible though

Mario Carneiro (Jan 15 2024 at 05:23):

the thing that's not frozen is the stable track of elan, used when you call lean commands in a folder without a lean-toolchain

Mario Carneiro (Jan 15 2024 at 05:24):

(speaking of which, this relates to a bad behavior of lake: when you call lake new foo it generates a lean-toolchain containing stable, which is an anti-pattern)

Eric Wieser (Jan 15 2024 at 05:27):

Does elan no longer look for the lean3 leanpkg.toml?

Mario Carneiro (Jan 15 2024 at 05:28):

leanproject new doesn't have a leanpkg.toml to look for

Mario Carneiro (Jan 15 2024 at 05:29):

I believe elan still works in lean3 projects with a leanpkg.toml

Eric Wieser (Jan 15 2024 at 05:31):

Mario Carneiro said:

leanproject new doesn't have a leanpkg.toml to look for

This at least isn't relevant for mathport on old projects though, right?

Mario Carneiro (Jan 15 2024 at 05:31):

yes, as long as they are actually projects

Mario Carneiro (Jan 15 2024 at 05:31):

some people just have a pile of lean files and never learned that projects are necessary

Mario Carneiro (Jan 15 2024 at 05:33):

I have an old lean 3 project which is just a leanpkg.path and a bunch of lean files

Eric Wieser (Jan 15 2024 at 05:54):

Perhaps elan should detect that file too

Yaël Dillies (Jan 15 2024 at 10:54):

Patrick Massot said:

There is exactly one thing falling apart and it is the olean cache because nobody was reading the email that got warning, right?

I have definitely seen more things break than just cache but it's hard to remember what exactly.

Patrick Massot (Jan 15 2024 at 15:42):

Mario Carneiro said:

Patrick Massot said:

Note that pushing any new commit to mathlib3 will destroy the graphs on the mathlib stats page and I don't plan to fix that kind of issue except by dropping new mathlib3 data (but of course anyone else is free to propose another fix).

I don't really understand why you set things up like that in the first place? I don't understand what kind of assumption you could make while plotting two graphs next to each other that requires that one of them stops at a particular point

I didn't. Rob wrote the first version of the website showing both curves. It had a bug which made it display non-sense when there was a month without mathlib3 commit after the beginning of mathlib4. I fixed this bug in the most obvious way and later realized it would break if mathlib3 commit resume. Again I have nothing against someone else implementing a cleverer fix, but I have a lot of more important things to do for the community so I won't do it myself.

Patrick Massot (Jan 15 2024 at 15:43):

About leanproject, I understand how leanproject new could be broken indeed, but this is clearly at $-\infty$ priority. However I would like to keep leanproject get working since this is useful when reading old formalization papers.

Last updated: May 02 2025 at 03:31 UTC