Zulip Chat Archive

Stream: lean4

Topic: package versioning and dependency pinning


Arthur Paulino (Dec 02 2021 at 13:13):

Reading lake's readme I've noticed that there isn't a way to pin the version of a package dependency (of course, because the notion of a version hasn't been defined). But it got me thinking of ways to workaround this and avoid building things that can break just because someone changed a dependency package. That said, my two solutions are:

  • State a commit (hash) along with name and src in the dependencies field or
  • State a tag instead

First option requires nothing from package developers and the second option requires a certain degree of organization when building packages because then you'd have to release and push tags to your repo.

Both options would work, but second is less chaotic and Git allows you to clone specific tags from repositories.

What do you think?

Gabriel Ebner (Dec 02 2021 at 13:24):

We typically just refer to the revision: https://github.com/leanprover-community/mathport/blob/073d0d511ad36ee196d7b9d72f147a796540b8ce/lakefile.lean#L12

Gabriel Ebner (Dec 02 2021 at 13:24):

As you've noticed, you should never point the dependency at a branch.

Henrik Böving (Dec 02 2021 at 13:27):

The notion of a version has been purposely defined by @Mac to be a commit hash, there was a rather lengthy discussion about this a few weeks ago.

Arthur Paulino (Dec 02 2021 at 13:29):

Do you guys think there could be a way of benefiting from semantic versioning? It explicitly says "this version fixes bugs", this version adds new things" and "this version breaks stuff"

(And I think tags can be used with this level of explainability)

Henrik Böving (Dec 02 2021 at 13:29):

This has also been discussed in the thread I just mentioned, I'll dig it out...

Henrik Böving (Dec 02 2021 at 13:30):

https://leanprover.zulipchat.com/#narrow/stream/270676-lean4/topic/.5BRFC.5D.20name.2Fversion.20package.20fields/near/254114011

Arthur Paulino (Dec 02 2021 at 13:56):

Gabriel Ebner said:

We typically just refer to the revision: https://github.com/leanprover-community/mathport/blob/073d0d511ad36ee196d7b9d72f147a796540b8ce/lakefile.lean#L12

Hm, if that's an option then adding it to the readme would be nice

Arthur Paulino (Dec 02 2021 at 13:57):

Can I PR it? Or should someone else do it?

Mario Carneiro (Dec 02 2021 at 16:03):

I don't think that semantic versions are (easily) possible or useful for lean. The issue is basically the same as with private: almost anything can be a breaking change (even whitespace changes!), so semantic versions lose their main draw. You can try to have a semver-like versioning scheme, based on "morally" breaking changes, but it is hard to make it mean anything rigorous.

Mac (Dec 02 2021 at 16:47):

To add on to what @Mario Carneiro said, a Lean semver requires the API designer to specify what is considered the "public interface" (i.e., what changes qualify as breaking). Some of this could be incorporated into the code itself with attributes (ala Rust), but some of it would have to just be written out. Without specification, changing virtually anything in a codebase can be considered breaking in Lean.

Mac (Dec 02 2021 at 16:49):

In the thread Henrik mentioned, Julian Berman describes the problem with semver quite nicely:

And also it's just impossible :) (and I'm a proponent of semver personally by the way) -- but no one ever does what you're supposed to in semver, which is define what your public API actually is -- and it's not just types and interfaces. For Python I put some notes here: https://regret.readthedocs.io/en/stable/before-you-deprecate/#document-your-public-api which are things no one I know of ever really specifies, and even those who try of course will come to some scenario where it's unclear

Arthur Paulino (Dec 02 2021 at 16:50):

More on src: Is it possible to define dependency sources with SSH URLs instead of HTTPS? I'm thinking in the context of companies who'd try to use Lean and need to develop internal (non-public) packages

Arthur Paulino (Dec 02 2021 at 17:00):

Ah, I think it should be fine: https://github.com/leanprover/lake/blob/6dd609db18ed681f21f0f5e4f492dad678a2d67b/Lake/Util/Git.lean#L35

Jannis Limperg (Dec 02 2021 at 19:15):

There are problems with traditional versioning, but I'm not sure that pinning dependencies to a specific commit is the answer. What do you do if you have two dependencies A and B and both depend on C, but on a different commit? If A and B exchange types from C, I think you're out of luck, even with source code available. Traditional versioning solves (well, addresses) this problem by letting A and B give some wiggle room so that they can hopefully agree on a version of C. Any alternative scheme must be able to solve this problem as well (if Lean is to become even moderately big).

I don't have a satisfying answer either; defeq and metaprogramming do create additional problems and versioning doesn't work amazingly well even in 'normal' languages. But I feel like a practical, scalable scheme is more likely to be found by refining existing approaches to versioning, rather than by giving up on them altogether.

Reid Barton (Dec 02 2021 at 19:28):

The root cause of issues like this is that C made a breaking change in the first place, otherwise you could build both A and B against whichever version of C is later. From what I saw in the Haskell world, the PVP (basically Haskell's version of SemVer) seemed to do more harm than good because it reduced the perceived costs of breaking changes ("it's fine, just have to increment the major version number").

Arthur Paulino (Dec 02 2021 at 20:16):

What I see happening in practice with Python is that packages that are extremely useful also have good versioning management. matplotlib, scipy, numpy, pandas, sklearn, keras etc and even the Python API itself. The version window does help a lot.

Mac (Dec 02 2021 at 20:56):

Jannis Limperg said:

There are problems with traditional versioning, but I'm not sure that pinning dependencies to a specific commit is the answer. What do you do if you have two dependencies A and B and both depend on C, but on a different commit? If A and B exchange types from C, I think you're out of luck, even with source code available. Traditional versioning solves (well, addresses) this problem by letting A and B give some wiggle room so that they can hopefully agree on a version of C.

The problem is that, in Lean, there is a lot of granularity as to what can constitute a "breaking change". In standard imperative programming languages, this tends to boil down to just signature changes of functions in the API (e.g., adding, removing, and modifying functions). Also most consumers of a library will care little about how functions are implemented so changes can be made there without effecting the vast majority of consumers.

In Lean, however, any library that reasons about another can (and likely will) break on any change to relevant functions (i.e., there are virtually no non-breaking changes for such libraries). Furthermore, Lean introduces a number of new concepts that can introduce breakage: (i) type-level changes like adding, deleting, modifying type class instances; changing definition reducibility; (ii) proof-level changes like adding or deleting @[simp] lemmas or changing tactic implementations, (iii) syntax changes (including adding/removing keywords), etc.

In my opinion, flexible versioning for Lean would probably want to move away from version numbers altogether and instead work at a metaprogramming level by specifying or inferring relevant changes in the code itself.

In the short term, my expectation is that Lean packages will be versioned much more like programming languages, OSes, or large projects like LLVM. That is, have few infrequently released stable versions (e.g., represented as tags in the repository). Stable versions of dependent packages will then target either a single such release or a small set of releases. potentially even with different code for each supported release.

I don't see small frequent releases being compatible with the way dependency chains currently work in Lean (except for unstable bleeding edge development). It is just too likely that even a minor version discrepancy with little change in a sufficiently large dependency chain will cause breakage.

Sebastian Ullrich (Dec 02 2021 at 22:26):

In hindsight, I believe it was a mistake to export the definitional unfoldings from a module by default if one wants to support a big, mature ecosystem of packages at some point, though Lean is certainly in good company among other provers in that regard. As has been mentioned, there are many creative ways to break a module's interface in Lean, but this one is probably the worst offender. There is a part in the module system issue about reigning in delta reduction by default afair, though it is relatively clear that there will also be a way to opt back into the current behavior.

I don't see why if all you're doing is programming in Lean, the experience of managing the dependency graph should be considerably worse than in other programming languages. Maybe there is in fact a different way from semantic versioning, one that may be a better fit for Lean, but until that is proven it seems to amount to wishful thinking to me. The only language I know of that has some interesting fresh ideas in that direction is Unison.

Mac (Dec 02 2021 at 23:00):

Sebastian Ullrich said:

I don't see why if all you're doing is programming in Lean, the experience of managing the dependency graph should be considerably worse than in other programming languages.

One big problem Lean faces is transitive dependencies, which complicates matters. One of the ways in which manner scripting language package managers (like NPM) get away from dependency hell is by peer packages to depend on different versions of the same package, something that is not possible in Lean.

You are right, though, If people are just doing ordinary programming in Lean with no extra bells and whistles, the usual CS approach to versions should work fine. I just suspect users will end up doing all kinds of metaprogramming shenanigans that will ended breaking the simple model. I mean most of the packages written thus far are doing some kind of black magic (be it proofs, FFI, and/or metaprogramming -- all of which complicate dependency management).

Jannis Limperg (Dec 02 2021 at 23:07):

Mac, I think this is a perfectly fine plan short-term. Long-term I don't think you can forgo minor versions or some equivalent mechanism (and I think we agree on this). People don't want to wait 6 months for minor bugfixes, particularly security-relevant ones, and conversely you can't have a new 'major version' with potentially arbitrary breakage every 3 months.

Perhaps it would be interesting to distinguish between compile-time and runtime breakage. Compile-time breakage is in a sense more benign because it can be detected automatically and the tooling can try a bunch of versions until it finds a working one. So versioning can maybe be imprecise in this regard. I think it's also likely that much potential breakage never actually manifests because people avoid the most likely causes (or because, as Sebastian suggests, the worst offenders get reined in on the language side). So maybe optimistic versioning could work. Major versions would then have to announce only changes that are either very likely to break consumers (to avoid the tools churning through obviously-incompatible versions) or that silently change runtime semantics.

Reid, my personal impression is that PVP gets a lot of flak despite working okayish with a large package database. But I've never been deep into that stuff, so maybe my impression is wrong. What you suggest, I think, amounts to 'immutable packaging' where you can't change published packages in any breaking way, so if you break something, you publish a new package (i.e. the major version becomes part of the package name: containers-1 etc). I think this would give people an incentive to break things less frequently, but it also means that consumers can't accept multiple major versions any more, even if the breakage doesn't concern them.

Mac (Dec 02 2021 at 23:24):

Jannis Limperg said:

People don't want to wait 6 months for minor bugfixes, particularly security-relevant ones, and conversely you can't have a new 'major version' with potentially arbitrary breakage every 3 months.

Slight tangent, but you might be surprised at how often security related bugfixes don't get patched in small releases. Some colleagues and I wrote a paper about research we did in finding and testing micropatches for CVEs/ We found that a significant problem was that many CVE bugfixes were not released as micropatches but rather with full releases with major breaking changes and that some fixes were not officially packaged in a new release even years after the bug had been reported and patched in the codebase.

Also, my thought was not that small patch releases would cease to exist, but more that minor releases would disappear. That is, that the only two common types of releases would be earth-shattering major version releases and tiny bugfixes. However, this is merely my concern, it may be that this does not occur in practice and my worries were groundless, and I be happy about that. :laughing:

Arthur Paulino (Dec 03 2021 at 00:54):

May I open a PR mentioning that it's possible to refer to a particular commit hash here?

Mac (Dec 03 2021 at 01:04):

@Arthur Paulino you most certainly can. Lake has much more lax PR rules than Lean proper. I am more than happy to look through anything one might wish to propose (at least at the moment -- there may be a time where this eventually becomes infeasible -- but this is certainly not the case yet).

Arthur Paulino (Dec 03 2021 at 02:02):

@Mac https://github.com/leanprover/lake/pull/37

Jannis Limperg (Dec 03 2021 at 09:33):

Mac said:

Slight tangent, but you might be surprised at how often security related bugfixes don't get patched in small releases. Some colleagues and I wrote a paper about research we did in finding and testing micropatches for CVEs/ We found that a significant problem was that many CVE bugfixes were not released as micropatches but rather with full releases with major breaking changes and that some fixes were not officially packaged in a new release even years after the bug had been reported and patched in the codebase.

Very interesting, thanks! Not too surprising that people are a little cavalier with security issues. :(


Last updated: Dec 20 2023 at 11:08 UTC