Zulip Chat Archive
Stream: general
Topic: Cache becomes out of date when new package is added
Peiran Wu (Oct 24 2023 at 16:44):
I tried out Paperproof (in a project that is not mathlib4 itself) and it looked great. But I had trouble getting Paperproof to work it inside mathlib4 itself.
Let's say I start with the master branch of mathlib4 with no local changes. I run lake exe cache get
which works as expected. As instructed by the README.md of Paperproof, I add the following line to lakefile.lean
, below all the other require
statements.
require Paperproof from git "https://github.com/Paper-Proof/paperproof.git"@"main"/"lean"
I run lake update Paperproof
to install the package.
Now I start/restart the Lean server in VS Code and open a Lean file (say "Mathlib/GroupTheory/GroupAction/Basic.lean"). I get a notification (nice feature btw) that says imports are out of date and must be rebuilt. (Note that I haven't added import Paperproof
.) If I rebuild, it'll involve 1000+ files and take a long time. If I try to run lake exe cache get
, then I get something like the following.
Attempting to download 3849 file(s)
Downloaded: 0 file(s) [attempted 3849/3849 = 100%] (0% success)
Warning: some files were not found in the cache.
This usually means that your local checkout of mathlib4 has diverged from upstream.
If you push your commits to a branch of the mathlib4 repository, CI will build the oleans and they will be available later.
No cache files to decompress
And I think this is happening because all the online cache is considered "out of date" as well.
So my questions are, did I do something wrong? Is it expected/necessary/ideal behaviour that cache becomes out of date when a new package is added to lakefile.lean
/lake-manifest.json
? Is there any way to avoid rebuilding mathlib locally?
Shreyas Srinivas (Oct 24 2023 at 17:00):
Ah so we have a situation with two non-inherited dependencies : sorry mathlib itself is the project so this is not true
Shreyas Srinivas (Oct 24 2023 at 17:01):
@Eric Wieser : per your proposal which tool chain would lake pick on calling lake update?
Shreyas Srinivas (Oct 24 2023 at 17:01):
Did it get implemented?
Shreyas Srinivas (Oct 24 2023 at 17:02):
@Peiran Wu : can you check if the lean-toolchain of your project and of mathlib4 inside lake-packages match?
Shreyas Srinivas (Oct 24 2023 at 17:03):
You could try lake exe cache clean!
Shreyas Srinivas (Oct 24 2023 at 17:04):
And then repeat lake exe cache get!
Mauricio Collares (Oct 24 2023 at 17:14):
You can instead create a standalone project and add paperproof and mathlib as dependencies. I think cache will work correctly in this case.
Mauricio Collares (Oct 24 2023 at 17:14):
(I think cache misses are expected if you're adding paperproof as a Mathlib dependency directly)
Mauricio Collares (Oct 24 2023 at 17:15):
lake new ProjectName math
creates a project with a mathlib dependency, then you can just add paperproof to it as you did above. lake exe cache get
in the root folder should work
Mauricio Collares (Oct 24 2023 at 17:17):
Sorry, I now realize you already know this works, and you are adding it directly to mathlib on purpose. I'll have to defer to the experts.
Ruben Van de Velde (Oct 24 2023 at 17:20):
Yeah, I don't know how this can work unless we add it to mathlib upstream
Shreyas Srinivas (Oct 24 2023 at 17:22):
Apologies! my suggestion won't work. The cache always downloads its dependencies caches too. Maybe that's the missing part, since paperproof doesn't have a cache.
Mauricio Collares (Oct 24 2023 at 17:29):
Unfortunately it's not just that, I think. lakefile.lean
and lake-manifest.json
are hashed by the cache script (and probably by Lake itself). You can't change them in mathlib without causing cache misses.
Peiran Wu (Oct 24 2023 at 17:43):
Mauricio Collares said:
Unfortunately it's not just that, I think.
lakefile.lean
andlake-manifest.json
are hashed by the cache script (and probably by Lake itself). You can't change them in mathlib without causing cache misses.
This would have been my guess as well. Would there be any benefit in modifying the behaviour of the cache script, e.g. taking in account some more specific information such as dependencies instead of just hashing lakefile.lean
and/or lake-manifest.json
? Or would it break a lot of other things / make things slower?
Shreyas Srinivas (Oct 24 2023 at 17:53):
they are not. At least not by lake. I learnt this just yesterday
Shreyas Srinivas (Oct 24 2023 at 17:53):
Mario Carneiro (Oct 24 2023 at 20:49):
Peiran Wu said:
Mauricio Collares said:
Unfortunately it's not just that, I think.
lakefile.lean
andlake-manifest.json
are hashed by the cache script (and probably by Lake itself). You can't change them in mathlib without causing cache misses.This would have been my guess as well. Would there be any benefit in modifying the behaviour of the cache script, e.g. taking in account some more specific information such as dependencies instead of just hashing
lakefile.lean
and/orlake-manifest.json
? Or would it break a lot of other things / make things slower?
This is by design. You should not be adding paperproof as a dependency of mathlib; or if you do you should either do it in a branch, run CI and get some fresh caches to use, or you should make a PR adding it to master.
Mario Carneiro (Oct 24 2023 at 20:51):
Changes to the mathlib lakefile invalidate the cache because it is difficult to determine whether those changes have invalidated build outputs. We would need to ask lake, and this would require a completely different implementation of cache (which may happen someday but not today)
Shreyas Srinivas (Oct 24 2023 at 21:00):
Now I am confused. To be clear you mean cache hashes lakefile.lean and not lake itself right?
Mario Carneiro (Oct 24 2023 at 22:07):
yes
Mario Carneiro (Oct 24 2023 at 22:08):
cache is not lake, it only has limited view into what builds depend on (and lake wants to keep it that way), so cache is conservative and assumes any lakefile changes could potentially invalidate build outputs
Mario Carneiro (Oct 24 2023 at 22:09):
(In fact, this is not even enough because lake has -K
config options you can pass to change build configuration without even touching lakefile.lean)
Shreyas Srinivas (Oct 24 2023 at 22:21):
okay. Btw, in the current state of affairs, my hypothetical scenario of a few weeks ago becomes real. Unless and until paperproof becomes a mathlib dependency, the only straightforward way to use paperproof with mathlib is to have a math project with paperproof as an extra dependency right? So there can easily be projects with two top level dependencies. This might create a real world test case for the lake update
proposals (unless paperproof tracks mathlib's toolchain).
Mario Carneiro (Oct 24 2023 at 22:24):
What's the problem there?
Mario Carneiro (Oct 24 2023 at 22:25):
You would just use the lean-toolchain of mathlib and a corresponding version of paperproof
Shreyas Srinivas (Oct 24 2023 at 22:25):
There isn't. There was a lot of discussion last month or so which was hypothetical at the time. Naturally it is interesting to observe a real world test case.
Mario Carneiro (Oct 24 2023 at 22:25):
this isn't a counterexample case though
Mario Carneiro (Oct 24 2023 at 22:26):
You need to make the example more complicated to actually hit problems
Shreyas Srinivas (Oct 24 2023 at 22:26):
Mario Carneiro said:
You would just use the lean-toolchain of mathlib and a corresponding version of paperproof
Do we have an easy way of picking this version other than going through the commit history?
Peiran Wu (Oct 25 2023 at 12:20):
Mario Carneiro said:
Changes to the mathlib lakefile invalidate the cache because it is difficult to determine whether those changes have invalidated build outputs. We would need to ask lake, and this would require a completely different implementation of cache (which may happen someday but not today)
I see. This fully answers the original questions I asked. Thanks.
Last updated: Dec 20 2023 at 11:08 UTC