Zulip Chat Archive

Stream: general

Topic: Cache becomes out of date when new package is added


Peiran Wu (Oct 24 2023 at 16:44):

I tried out Paperproof (in a project that is not mathlib4 itself) and it looked great. But I had trouble getting Paperproof to work it inside mathlib4 itself.

Let's say I start with the master branch of mathlib4 with no local changes. I run lake exe cache get which works as expected. As instructed by the README.md of Paperproof, I add the following line to lakefile.lean, below all the other require statements.

require Paperproof from git "https://github.com/Paper-Proof/paperproof.git"@"main"/"lean"

I run lake update Paperproof to install the package.

Now I start/restart the Lean server in VS Code and open a Lean file (say "Mathlib/GroupTheory/GroupAction/Basic.lean"). I get a notification (nice feature btw) that says imports are out of date and must be rebuilt. (Note that I haven't added import Paperproof.) If I rebuild, it'll involve 1000+ files and take a long time. If I try to run lake exe cache get, then I get something like the following.

Attempting to download 3849 file(s)
Downloaded: 0 file(s) [attempted 3849/3849 = 100%] (0% success)
Warning: some files were not found in the cache.
This usually means that your local checkout of mathlib4 has diverged from upstream.
If you push your commits to a branch of the mathlib4 repository, CI will build the oleans and they will be available later.
No cache files to decompress

And I think this is happening because all the online cache is considered "out of date" as well.

So my questions are, did I do something wrong? Is it expected/necessary/ideal behaviour that cache becomes out of date when a new package is added to lakefile.lean/lake-manifest.json? Is there any way to avoid rebuilding mathlib locally?

Shreyas Srinivas (Oct 24 2023 at 17:00):

Ah so we have a situation with two non-inherited dependencies : sorry mathlib itself is the project so this is not true

Shreyas Srinivas (Oct 24 2023 at 17:01):

@Eric Wieser : per your proposal which tool chain would lake pick on calling lake update?

Shreyas Srinivas (Oct 24 2023 at 17:01):

Did it get implemented?

Shreyas Srinivas (Oct 24 2023 at 17:02):

@Peiran Wu : can you check if the lean-toolchain of your project and of mathlib4 inside lake-packages match?

Shreyas Srinivas (Oct 24 2023 at 17:03):

You could try lake exe cache clean!

Shreyas Srinivas (Oct 24 2023 at 17:04):

And then repeat lake exe cache get!

Mauricio Collares (Oct 24 2023 at 17:14):

You can instead create a standalone project and add paperproof and mathlib as dependencies. I think cache will work correctly in this case.

Mauricio Collares (Oct 24 2023 at 17:14):

(I think cache misses are expected if you're adding paperproof as a Mathlib dependency directly)

Mauricio Collares (Oct 24 2023 at 17:15):

lake new ProjectName math creates a project with a mathlib dependency, then you can just add paperproof to it as you did above. lake exe cache get in the root folder should work

Mauricio Collares (Oct 24 2023 at 17:17):

Sorry, I now realize you already know this works, and you are adding it directly to mathlib on purpose. I'll have to defer to the experts.

Ruben Van de Velde (Oct 24 2023 at 17:20):

Yeah, I don't know how this can work unless we add it to mathlib upstream

Shreyas Srinivas (Oct 24 2023 at 17:22):

Apologies! my suggestion won't work. The cache always downloads its dependencies caches too. Maybe that's the missing part, since paperproof doesn't have a cache.

Mauricio Collares (Oct 24 2023 at 17:29):

Unfortunately it's not just that, I think. lakefile.lean and lake-manifest.json are hashed by the cache script (and probably by Lake itself). You can't change them in mathlib without causing cache misses.

Peiran Wu (Oct 24 2023 at 17:43):

Mauricio Collares said:

Unfortunately it's not just that, I think. lakefile.lean and lake-manifest.json are hashed by the cache script (and probably by Lake itself). You can't change them in mathlib without causing cache misses.

This would have been my guess as well. Would there be any benefit in modifying the behaviour of the cache script, e.g. taking in account some more specific information such as dependencies instead of just hashing lakefile.lean and/or lake-manifest.json? Or would it break a lot of other things / make things slower?

Shreyas Srinivas (Oct 24 2023 at 17:53):

they are not. At least not by lake. I learnt this just yesterday

Shreyas Srinivas (Oct 24 2023 at 17:53):

See : https://leanprover.zulipchat.com/#narrow/stream/113488-general/topic/Potential.20data.20loss.20from.20.60lake.20clean.60.20with.204.2E2.2E0-rc2.2F3/near/398158713

Mario Carneiro (Oct 24 2023 at 20:49):

Peiran Wu said:

Mauricio Collares said:

Unfortunately it's not just that, I think. lakefile.lean and lake-manifest.json are hashed by the cache script (and probably by Lake itself). You can't change them in mathlib without causing cache misses.

This would have been my guess as well. Would there be any benefit in modifying the behaviour of the cache script, e.g. taking in account some more specific information such as dependencies instead of just hashing lakefile.lean and/or lake-manifest.json? Or would it break a lot of other things / make things slower?

This is by design. You should not be adding paperproof as a dependency of mathlib; or if you do you should either do it in a branch, run CI and get some fresh caches to use, or you should make a PR adding it to master.

Mario Carneiro (Oct 24 2023 at 20:51):

Changes to the mathlib lakefile invalidate the cache because it is difficult to determine whether those changes have invalidated build outputs. We would need to ask lake, and this would require a completely different implementation of cache (which may happen someday but not today)

Shreyas Srinivas (Oct 24 2023 at 21:00):

Now I am confused. To be clear you mean cache hashes lakefile.lean and not lake itself right?

Mario Carneiro (Oct 24 2023 at 22:07):

yes

Mario Carneiro (Oct 24 2023 at 22:08):

cache is not lake, it only has limited view into what builds depend on (and lake wants to keep it that way), so cache is conservative and assumes any lakefile changes could potentially invalidate build outputs

Mario Carneiro (Oct 24 2023 at 22:09):

(In fact, this is not even enough because lake has -K config options you can pass to change build configuration without even touching lakefile.lean)

Shreyas Srinivas (Oct 24 2023 at 22:21):

okay. Btw, in the current state of affairs, my hypothetical scenario of a few weeks ago becomes real. Unless and until paperproof becomes a mathlib dependency, the only straightforward way to use paperproof with mathlib is to have a math project with paperproof as an extra dependency right? So there can easily be projects with two top level dependencies. This might create a real world test case for the lake update proposals (unless paperproof tracks mathlib's toolchain).

Mario Carneiro (Oct 24 2023 at 22:24):

What's the problem there?

Mario Carneiro (Oct 24 2023 at 22:25):

You would just use the lean-toolchain of mathlib and a corresponding version of paperproof

Shreyas Srinivas (Oct 24 2023 at 22:25):

There isn't. There was a lot of discussion last month or so which was hypothetical at the time. Naturally it is interesting to observe a real world test case.

Mario Carneiro (Oct 24 2023 at 22:25):

this isn't a counterexample case though

Mario Carneiro (Oct 24 2023 at 22:26):

You need to make the example more complicated to actually hit problems

Shreyas Srinivas (Oct 24 2023 at 22:26):

Mario Carneiro said:

You would just use the lean-toolchain of mathlib and a corresponding version of paperproof

Do we have an easy way of picking this version other than going through the commit history?

Peiran Wu (Oct 25 2023 at 12:20):

Mario Carneiro said:

Changes to the mathlib lakefile invalidate the cache because it is difficult to determine whether those changes have invalidated build outputs. We would need to ask lake, and this would require a completely different implementation of cache (which may happen someday but not today)

I see. This fully answers the original questions I asked. Thanks.


Last updated: Dec 20 2023 at 11:08 UTC