Zulip Chat Archive

Stream: lean4

Topic: Get mathlib cache after updating mathlib dependency


Bolton Bailey (Sep 20 2023 at 12:45):

I have just run lake update on my project which depends on mathlib. I then ran lake exe cache get! to get the mathlib cache. But I still can't compile anything. How do I get mathlib to work again?

boltonbailey@starlight formal-piops % lake update
updating ./lake-packages/mathlib to revision 4026e6f81193cc3417cf989e3cf26d72c99d116a
warning: improperly formatted manifest: unknown manifest version `5`
warning: improperly formatted manifest: unknown manifest version `5`
warning: improperly formatted manifest: unknown manifest version `5`
boltonbailey@starlight formal-piops %
boltonbailey@starlight formal-piops % lake exe cache get!
Attempting to download 3777 file(s)
Downloaded: 609 file(s) [attempted 3777/3777 = 100%] (16% success)
Warning: some files were not found in the cache.
This usually means that your local checkout of mathlib4 has diverged from upstream.
If you push your commits to a branch of the mathlib4 repository, CI will build the oleans and they will be available later.
Decompressing 609 file(s)
unpacked in 619 ms

Shreyas Srinivas (Sep 20 2023 at 12:47):

purge the cache folder. ~/.cache

Ruben Van de Velde (Sep 20 2023 at 12:47):

But first, did you update your own lean-toolchain file?

Bolton Bailey (Sep 20 2023 at 12:48):

Ruben Van de Velde said:

But first, did you update your own lean-toolchain file?

If this is necessary, lake update should do it for me

Bolton Bailey (Sep 20 2023 at 12:50):

Also, if there is some kind of inconsistency causing a problem, an error should be thrown at some point saying so, it shouldn't just try to build and make an error.

Ruben Van de Velde (Sep 20 2023 at 12:51):

Perhaps so, but while that's not the case, please follow the instructions: https://github.com/leanprover-community/mathlib4/wiki/Using-mathlib4-as-a-dependency#updating-mathlib4

Shreyas Srinivas (Sep 20 2023 at 12:51):

Bolton Bailey said:

Also, if there is some kind of inconsistency causing a problem, an error should be thrown at some point saying so, it shouldn't just try to build and make an error.

This was discussed a few months ago. The basic issue is that the current heuristic "match mathlib version" is just a heuristic which will make no sense for projects with complex dependency relationships

Patrick Massot (Sep 20 2023 at 12:55):

I think the argument: "the fix that works in 99% of current Lean projects wouldn't work for projects with complex dependency relationships so let's not use this fix" is really really bad.

Bolton Bailey (Sep 20 2023 at 12:56):

Ruben Van de Velde said:

Perhaps so, but while that's not the case, please follow the instructions: https://github.com/leanprover-community/mathlib4/wiki/Using-mathlib4-as-a-dependency#updating-mathlib4

Even after running these it's not working

`/Users/boltonbailey/.elan/toolchains/leanprover--lean4---v4.1.0-rc1/bin/lake print-paths Init Mathlib.GroupTheory.Sylow Mathlib.GroupTheory.Subgroup.Finite Mathlib.Data.Polynomial.Eval Mathlib.GroupTheory.OrderOfElement Mathlib.FieldTheory.Finite.Basic` failed:

stderr:
warning: improperly formatted manifest: incompatible manifest version `4`
error: missing manifest; use `lake update` to generate one
Invalid Lake configuration.  Please restart the server after fixing the Lake configuration file.

Shreyas Srinivas (Sep 20 2023 at 12:56):

purge the cache

Ruben Van de Velde (Sep 20 2023 at 12:58):

The manifest issue sounds like something different as well

Shreyas Srinivas (Sep 20 2023 at 12:58):

Patrick Massot said:

I think the argument: "the fix that works in 99% of current Lean projects wouldn't work for projects with complex dependency relationships so let's not use this fix" is really really bad.

It is not that simple. Lean is a new and growing language with a lot of newcomers pushing in different directions. Basing these decisions on current usage hamstrings the language for other use cases, making the assumption a self-fulfilling prophecy.

Shreyas Srinivas (Sep 20 2023 at 12:59):

And then, when you need proper dependency management later (perhaps even in mathlib using projects), you are saddled with the technical debt of ensuring backwards compatibility.

Patrick Massot (Sep 20 2023 at 13:00):

I don't buy this. Fixing this issue would be trivial without making anything else more difficult.

Ruben Van de Velde (Sep 20 2023 at 13:00):

I think there's a shared understanding that there's still rough edges here that everyone would like to be smoother, but less so on the exact way to implement that

Ruben Van de Velde (Sep 20 2023 at 13:01):

Bolton Bailey said:

Ruben Van de Velde said:

Perhaps so, but while that's not the case, please follow the instructions: https://github.com/leanprover-community/mathlib4/wiki/Using-mathlib4-as-a-dependency#updating-mathlib4

Even after running these it's not working

`/Users/boltonbailey/.elan/toolchains/leanprover--lean4---v4.1.0-rc1/bin/lake print-paths Init Mathlib.GroupTheory.Sylow Mathlib.GroupTheory.Subgroup.Finite Mathlib.Data.Polynomial.Eval Mathlib.GroupTheory.OrderOfElement Mathlib.FieldTheory.Finite.Basic` failed:

stderr:
warning: improperly formatted manifest: incompatible manifest version `4`
error: missing manifest; use `lake update` to generate one
Invalid Lake configuration.  Please restart the server after fixing the Lake configuration file.

Can you try if just changing "4" to "5" in lake-manifest.json helps?

Shreyas Srinivas (Sep 20 2023 at 13:01):

Relevant thread: https://leanprover.zulipchat.com/#narrow/stream/287929-mathlib4/topic/Lake.20update.20behaviour

Patrick Massot (Sep 20 2023 at 13:01):

We are talking about adding three lines of code to lake saying that when a user runs lake update in the project depending on mathlib then it should copy mathlib's lean-toolchain file.

Patrick Massot (Sep 20 2023 at 13:02):

But this is prevented by people who insist that mathlib is just a random Lean library with no special status at all.

Shreyas Srinivas (Sep 20 2023 at 13:03):

It would be a terribly missed opportunity if we have a nice formal math library that is not usable for verifying systems that could do with a math library, because its build tool makes life hard for non-trivial non-purely-math projects

Bolton Bailey (Sep 20 2023 at 13:04):

rm -rf ~/.cache, deleting the manifest, rerunning lake update worked, thanks.

Bolton Bailey (Sep 20 2023 at 13:06):

What I would like at the very least is for lake update to alert me if my dependencies toolchains are different from my own.

Shreyas Srinivas (Sep 20 2023 at 13:06):

They will almost always be different. Std will be on one toolchain, Qq on another, and maybe future baseline dependencies too.

Bolton Bailey (Sep 20 2023 at 13:07):

Wait, so things are supposed to work even if the toolchains are different?

Shreyas Srinivas (Sep 20 2023 at 13:07):

Check your current lean-toolchain files for dependencies

Bolton Bailey (Sep 20 2023 at 13:08):

I'm confused. If it's acceptable to lean for my project and my dependencies to be on different toolchains, then what was happening to me a few minutes ago?

Bolton Bailey (Sep 20 2023 at 13:11):

I am not a language-minded person, so I don't know what a toolchain is really. The Rust book tells me it's "a complete installation of a compiler and related tools". Is that what it is for Lean?

Eric Wieser (Sep 20 2023 at 13:11):

Patrick Massot said:

But this is prevented by people who insist that mathlib is just a random Lean library with no special status at all.

The compromise here would be:

  • Collect the lean-toolchain files of all the non-inherited packages
  • Let the user pick between them (with the auto-selection option being mathlib)

Eric Wieser (Sep 20 2023 at 13:12):

In most cases there's only one non-inherited package anyway (mathlib)

Patrick Massot (Sep 20 2023 at 13:12):

Bolton, the solution to the riddle is simple: mathlib is not a random library, it has takes precedence.

Shreyas Srinivas (Sep 20 2023 at 13:12):

Bolton Bailey said:

I'm confused. If it's acceptable to lean for my project and my dependencies to be on different toolchains, then what was happening to me a few minutes ago?

I'd love to see the design reason behind this.

Mauricio Collares (Sep 20 2023 at 13:13):

Shreyas Srinivas said:

purge the cache folder. ~/.cache

If you're on Linux, ~/.cache is used for all programs! It's best to purge ~/.cache/mathlib only.

Shreyas Srinivas (Sep 20 2023 at 13:13):

Patrick Massot said:

Bolton, the solution to the riddle is simple: mathlib is not a random library, it has takes precedence.

Wrong. This is most likely the result of an on-the-fly decision that was taken to make it feasible to adapt quickly as mathlib was being ported and changed.

Patrick Massot (Sep 20 2023 at 13:14):

Eric, there is no need to bother normal users. The default behavior should be to use mathlib's toolchain if mathlib is used. You can ask questions when mathlib isn't used or when lake update is called with a special flag for special situations.

Eric Wieser (Sep 20 2023 at 13:14):

Normal users only have one such package so the question would never be shown to them

Shreyas Srinivas (Sep 20 2023 at 13:14):

This was roughly what Sebastian suggested here: https://leanprover.zulipchat.com/#narrow/stream/287929-mathlib4/topic/Lake.20update.20behaviour

Shreyas Srinivas (Sep 20 2023 at 13:15):

But in the long run it is infeasible because there will likely be projects that need both mathlib and some other library, say from the PL side, natural sciences side or what not.

Bolton Bailey (Sep 20 2023 at 13:16):

Patrick Massot said:

Bolton, the solution to the riddle is simple: mathlib is not a random library, it has takes precedence.

When I look in my manifest file, I see lots of dependencies. mathlib4 std aesop for example. Do all of these have different toolchains?

Eric Wieser (Sep 20 2023 at 13:16):

I suggest showing the message because if the user is presented with:

WARNING: dependencies use incompatible lean versions:
1. Lean `4.1.0` (`mathlib`) [default]
2. Lean `4.0.0` (`your_other_project`)
Which version would you like [1]:

that notifies them that probably they need to update the version of lean in your_other_project first

Shreyas Srinivas (Sep 20 2023 at 13:16):

If you hamstring lake for the immediate requirement, then this will push away any future attempts to build libraries other than mathlib.

Patrick Massot (Sep 20 2023 at 13:18):

Bolton Bailey said:

Patrick Massot said:

Bolton, the solution to the riddle is simple: mathlib is not a random library, it has takes precedence.

When I look in my manifest file, I see lots of dependencies. mathlib4 std aesop for example. Do all of these have different toolchains?

Read again my explanation: those other dependencies are not mathlib, hence they have no role in this story.

Eric Wieser (Sep 20 2023 at 13:18):

Those packages are irrelevant because they have inherited: true

Eric Wieser (Sep 20 2023 at 13:19):

In a typical manifest, there is only one inherited: false and it's mathlib

Bolton Bailey (Sep 20 2023 at 13:21):

I'm afraid you're going to have to spell this out more for me. Even though these other packages are not mathlib, they are still dependencies of my project, according to my manifest. How is it possible that these can coexist if they have different toolchains?

Eric Wieser (Sep 20 2023 at 13:22):

Anything in the manifest that has inherited: true will be updated to match the versions that are present in the non-inherited dependencies (i.e. mathlib). You know they can co-exist, because they built together in the mathlib4 CI.

Bolton Bailey (Sep 20 2023 at 13:23):

When I hear "project A has dependencies B, C, D" my understanding is that code in project A is supposed to be able to call code from projects B, C, and D. But if B, C, and D have all been compiled by different toolchains, I assume their binaries are all in different formats, and A will not be able to use them all.

Eric Wieser (Sep 20 2023 at 13:23):

lake-manifest doesn't pull in compiled artifacts for typical B, C, D, only the source code

Eric Wieser (Sep 20 2023 at 13:24):

lake exe cache put is only run by mathlib, so the versions in the cache are the ones that were built with the mathlib4 toolchain version

Scott Morrison (Sep 20 2023 at 13:24):

If you are not using lake cache (provided by Mathlib), then no compiling is happening anywhere except on your machine, and that is all happening with the lean-toolchain from your project.

Scott Morrison (Sep 20 2023 at 13:25):

If an upstream dependency has a different lean-toolchain in its source repository, that is simply ignored, and things either break or don't break when you compile.

Eric Wieser (Sep 20 2023 at 13:25):

Eric Wieser said:

In a typical manifest, there is only one inherited: false and it's mathlib

I think there's some confusion here because this inherited field is new, and lake update was really broken before it was added.

Scott Morrison (Sep 20 2023 at 13:25):

lake cache adds something slightly more complicated to the mix: that pulls down oleans, compiled on CI machines using the lean-toolchain in whichever version of Mathlib you are using.

Bolton Bailey (Sep 20 2023 at 13:27):

Ok I think I'm beginning to understand now, thank you all for your patience. When I lake update without updating my toolchain, I am compiling mathilb with an old compiler which might not correctly compile the current version of mathlib. It may be that it coincidentally works anyway, or works for some of the other dependencies, but it also may not.

Bolton Bailey (Sep 20 2023 at 13:29):

I still feel like there should be some kind of warning that explicitly says "you are trying to compile code with a toolchain that is not present in the repository for that code".

Bolton Bailey (Sep 20 2023 at 13:31):

Shreyas seemed to think that this would tend to mean a lot of warnings, because compiling dependencies with different toolchains is common. Still seems worth a warning though, no? This feels like the sort of thing which could throw a wrench in whatever the user is doing, so the user should be informed.

Eric Wieser (Sep 20 2023 at 13:31):

I think it's almost never common

Shreyas Srinivas (Sep 20 2023 at 13:31):

Bolton Bailey said:

I still feel like there should be some kind of warning that explicitly says "you are trying to compile code with a toolchain that is not present in the repository for that code".

I can get behind this. But now that lean is working towards a more "stable"release schedule, it might also make sense to have proper dependency management.

Eric Wieser (Sep 20 2023 at 13:32):

I'd be interested in seeing a project that exists right now that has more than one inherited: false dependency

Shreyas Srinivas (Sep 20 2023 at 13:35):

@Eric Wieser : You are looking at existing projects. Is the final goal of the lean 4 project purely mathlib and its dependents? I often get the sense in these discussions that some more mathy people really believe this is the only focus of lean for good.

Eric Wieser (Sep 20 2023 at 13:36):

lake update is never going to be run on projects that don't exist, so we should optimize for the ones that do

Shreyas Srinivas (Sep 20 2023 at 13:39):

Eric Wieser said:

lake update is never going to be run on projects that don't exist, so we should optimize for the ones that do

Lean is a language, and languages live long. It is natural that the design process will take into account existing and future needs, which align with its stated goals ( being a programming language is included in this), unless what you really want is javascript for math.

Shreyas Srinivas (Sep 20 2023 at 13:47):

More on topic: The idea of adding options for choosing among the non-inherited toolchains sounds nice as a fix for now.

Scott Morrison (Sep 20 2023 at 13:52):

I think this discussion about whether it's "only mathematics" or "no mathematics" driving design is unnecessary noise at the moment.

The FRO's goals are to develop Lean 4 into a widely used language that supports users far beyond Mathlib and research mathematicians. At the same time, Mathlib is a huge part of the Lean ecosystem, now and (hopefully!) in 5 years time, and we want to make sure the tooling works well for mathematicians. We are well aware that there are lots of edges (some sharp, some rough!).

Making the tooling work well for "low-tech" users is very important: we need to support the complicated use cases, but we need to get users in the door as well.

Shreyas Srinivas (Sep 20 2023 at 14:10):

@Scott Morrison : Thanks, that really helps. We both agree that supporting non-techie users is very important. We also agree that we need to get users in the door. This is why, if we make instant decisions that work for the existing 99% projects, I fear that we will end up excluding new users who are keen on pushing in new directions.

To take a simple, purely hypothetical, illustrative example, suppose I want to write a utility in lean that relies on more than one dependency, say Mathlib (or some small part of it), some http server library, and some other utility libraries (say for GUI or handling forms). I choose an example like this, because in most languages this might be reasonably basic and something new programmers try as a first project. Since we are talking about lean it might also be nice to fiddle with mathlib and build something depending on it. Suppose, further, that I am merely a user of these libraries, and it is upto the respective maintainers to bump up the lean-toolchain. Suppose lake update simply matches the mathlib toolchain, which doesn't match the other libraries. How would I go about implementing my utility? For a new user this might simply reinforce the notion that unless they are doing something like the existing users (i.e. math), lean is going to throw up basic entry-level barriers.

As for solutions, Eric and Bolton's suggestions seem to be good options. In the earlier thread Sebastian also suggested that projects which are started with math have this default behaviour for lake update, whereas other projects continue to work as is. In the longer run, it seems some dependency management is necessary. If it all happens transparently, then all the goals you mention would be met.

Mac Malone (Sep 20 2023 at 19:08):

@Patrick Massot, @Eric Wieser: In the short term, I have created mathlib4#7282, which will add a check (and update the lean-toolchain!) if the workspace' s toolchain does not match Mathlib's.

Bolton Bailey (Sep 20 2023 at 20:11):

That's awesome, thanks for taking your time and using your expertise to help smooth things out for users like me!

Mac Malone (Sep 21 2023 at 06:33):

@Bolton Bailey Thanks! Unfortunately, @Mario Carneiro does not seem to happy with it. :(

Mario Carneiro (Sep 21 2023 at 06:35):

It should be fairly straightforward to move the logic into cache like I suggested, the effect for users should be approximately the same

Mario Carneiro (Sep 21 2023 at 06:39):

(also it should prompt the user before making the replacement, I don't like the idea of lake build modifying things outside the build directory without explicit approval from the user)

Mac Malone (Sep 21 2023 at 06:39):

@Mario Carneiro I was honestly not sure what you were suggesting. Do you mean performing this check within cache itself? That seems hard as it does not have access to Lake configuration and thus directory structure (...oh, looking back at cache apparently it just hard codes all of that).

Mario Carneiro (Sep 21 2023 at 06:40):

...I was resisting the urge to point out how a certain PR would have been able to fix that issue

Mario Carneiro (Sep 21 2023 at 06:41):

but yes, cache currently solves the problem by ignoring it

Mario Carneiro (Sep 21 2023 at 06:41):

lean-cache was written in a bit more principled way which is why it actually needs that data from lake

Bolton Bailey (Sep 21 2023 at 06:41):

I don't see why it can't be a warning in both places. Seems like good policy to just warn the user whenever they do something dangerous, and both "compiling code with a toolchain it doesn't know about" and "getting cached oleans built with a different toolchain than the one in the directory" sound dangerous to me.

Mario Carneiro (Sep 21 2023 at 06:44):

my proposal is something like:

$ lake exe cache get
warning: The checked out mathlib's lean-toolchain does not match the current project's lean-toolchain,
so the downloaded cache files will not work (`lake build` will ignore them).
Copy mathlib's lean-toolchain into the current project? [Y/n]

Marc Huisinga (Sep 21 2023 at 06:55):

Mario Carneiro said:

my proposal is something like:

$ lake exe cache get
warning: The checked out mathlib's lean-toolchain does not match the current project's lean-toolchain,
so the downloaded cache files will not work (`lake build` will ignore them).
Copy mathlib's lean-toolchain into the current project? [Y/n]

On a side note: It would be nice if command line prompts could always be circumvented with flags for downstream tooling.

Mario Carneiro (Sep 21 2023 at 07:14):

there is always yes | :P

Marc Huisinga (Sep 21 2023 at 07:17):

Mario Carneiro said:

there is always yes | :P

Unfortunately that then requires downstream tooling to provide special handling for Windows :(

Mario Carneiro (Sep 21 2023 at 07:17):

(I agree with you btw)

Mac Malone (Sep 21 2023 at 07:18):

@Marc Huisinga I agree as well. However, this seems very difficulty for an arbitrary lake exe ... as how and if the exe prompts is entirely up to it.

Mario Carneiro (Sep 21 2023 at 07:18):

right, this is a cache flag

Mario Carneiro (Sep 21 2023 at 07:18):

cache already has some flags IIRC

Mac Malone (Sep 21 2023 at 07:19):

I took the "always" in @Marc Huisinga's statement to imply that he wanted this to be automatic for anything in Lake (which I agree would be nice; I just do not know how to accomplish it).

Mario Carneiro (Sep 21 2023 at 07:20):

I think it was more like "this is a CLI design principle that you should apply generally"

Marc Huisinga (Sep 21 2023 at 07:20):

No, it was just a comment about command line applications within the Lean ecosystem in general.

Mac Malone (Sep 21 2023 at 07:20):

Ah, my bad.

Mac Malone (Sep 21 2023 at 07:21):

Yeah, having a -y is always good design.

Mario Carneiro (Sep 21 2023 at 07:21):

(and -n in this case)

Mac Malone (Sep 21 2023 at 07:22):

-n is harder because it often conflicts with the desired short name of other common options (i.e., anything countable).

Mario Carneiro (Sep 21 2023 at 07:22):

modulo syntax bikeshedding

Mario Carneiro (Sep 21 2023 at 07:23):

I don't think short names are too important here, anything seems fine

Mac Malone (Sep 21 2023 at 07:23):

---always-yes and --always-no are the long-form disambiguates.

Mario Carneiro (Sep 21 2023 at 07:23):

well in this case I wouldn't have any always command

Mario Carneiro (Sep 21 2023 at 07:23):

it's just about this one option

Mario Carneiro (Sep 21 2023 at 07:24):

--update-toolchain=yes

Mario Carneiro (Sep 21 2023 at 07:24):

-y seems fine for "default everything"

Mac Malone (Sep 21 2023 at 07:25):

That odd part is that usually for prompts with y/n and always-yes/always-no configurations one of the options generally cause the program to exit (hence the ability to just rely on -y). In cases were both do something, a specific option like you suggested is generally used.

Mario Carneiro (Sep 21 2023 at 07:26):

maybe it should be [Y/n/exit]?

Mario Carneiro (Sep 21 2023 at 07:26):

at a prompt there is always ctrl-C

Mac Malone (Sep 21 2023 at 07:26):

[(Y)es/(n)o/e(x)it]

Mac Malone (Sep 21 2023 at 07:27):

:laughing:

Tomas Skrivan (Oct 04 2023 at 16:09):

After some time I wanted to update mathlib in my project. It was a bit frustrating experience, after running lake update I got mathlib with commit 1b346a7bb0c56b8d1f6526161359a75384276be0 and lake exe cache get failed. Could it be because CI failed for that commit? After multiple retries and deletion of lake-packages I ran lake update again, got a5516f580a31bf7ec20be84cc4fe50824cd3e35a commit and everything works.


Last updated: Dec 20 2023 at 11:08 UTC