Zulip Chat Archive

Stream: mathlib4

Topic: cache not working after bump to v4.8.0-rc1v?


Moritz Firsching (May 03 2024 at 07:17):

after #12548, db651742f2c631e5b8525e9aabcf3d61ed094a4a, the cache is not working for me anymore.
I'm getting

> lake exe cache get
info: std: updating repository '././.lake/packages/std' to revision '3025cb124492b423070f20cf0a70636f757d117f'
info: aesop: updating repository '././.lake/packages/aesop' to revision '0a21a48c286c4a4703c0be6ad2045f601f31b1d0'
info: proofwidgets: updating repository '././.lake/packages/proofwidgets' to revision 'fe1eff53bd0838c657aa6126fe4dd75ad9939d9a'
info: Cli: updating repository '././.lake/packages/Cli' to revision 'a11566029bd9ec4f68a65394e8c3ff1af74c1a29'
info: importGraph: updating repository '././.lake/packages/importGraph' to revision '188eb34fcf1125e89d651ad462d02598219718ca'
Attempting to download 4450 file(s)
Downloaded: 0 file(s) [attempted 4450/4450 = 100%] (0% success)
Warning: some files were not found in the cache.
This usually means that your local checkout of mathlib4 has diverged from upstream.
If you push your commits to a branch of the mathlib4 repository, CI will build the oleans and they will be available later.
No cache files to decompress
> lake --version
Lake version 5.0.0-dcccfb7 (Lean version 4.8.0-rc1)

This is with a fresh clone of the mathlib4 repo.
On the commit before that getting the cache works fine. Is there something wrong on my end?

Ruben Van de Velde (May 03 2024 at 07:18):

Works for me on e6f552f77a91b06b52d1c4b08f61c77f54f09f37

Moritz Firsching (May 03 2024 at 07:20):

Thanks for checking. Weird, for me it doesn't work on that commit either.

Ruben Van de Velde (May 03 2024 at 07:22):

Try lake exe cache clean!

Moritz Firsching (May 03 2024 at 07:23):

That didn't fix it

Moritz Firsching (May 03 2024 at 07:25):

I tried it also on a different machine (albeit with a similar debian system) and also can't get the cache.

Kim Morrison (May 03 2024 at 07:29):

Working for me on macos at both of the commits mentioned.

Anyone with a debian system they could try this on?

Ruben Van de Velde (May 03 2024 at 07:31):

(I'm on Ubuntu)

Eric Wieser (May 03 2024 at 08:05):

The cache doesn't work for me on Gitpod (ubuntu)

Eric Wieser (May 03 2024 at 08:06):

(in an entirely fresh image started from #12617)

Sebastian Ullrich (May 03 2024 at 08:11):

It does not work for me either on NixOS, the curl.cfg starts with

url = https://lakecache.blob.core.windows.net/mathlib4/f/b503c218379d3bcc.ltar

Mario Carneiro (May 03 2024 at 08:16):

Isn't nix broken due to lean4#4015?

Sebastian Ullrich (May 03 2024 at 08:17):

I'm using elan

Sebastian Ullrich (May 03 2024 at 08:19):

~/lean/mathlib4   master  lake exe cache get Mathlib/Data/Array/Defs.lean
Attempting to download 1 file(s)
Downloaded: 0 file(s) [attempted 1/1 = 100%] (0% success)
Warning: some files were not found in the cache.
This usually means that your local checkout of mathlib4 has diverged from upstream.
If you push your commits to a branch of the mathlib4 repository, CI will build the oleans and they will be available later.
No cache files to decompress
 ~/lean/mathlib4   master  lean --version
Lean (version 4.8.0-rc1, x86_64-unknown-linux-gnu, commit dcccfb73cb24, Release)
 ~/lean/mathlib4   master  lean --githash
dcccfb73cb247e9478220375ab7de03f7c67e505
 ~/lean/mathlib4   master  md5sum lakefile.lean lean-toolchain lake-manifest.json Mathlib/Data/Array/Defs.lean
f8a55904cebc403cd531d9dca6409cc3  lakefile.lean
16a3113417af1e9269d1e457c33b08f1  lean-toolchain
d4ed6a00733f9b3463d561b0dcf068dc  lake-manifest.json
42f17e3585d3f6171b6562fd65f2b501  Mathlib/Data/Array/Defs.lean

Sebastian Ullrich (May 03 2024 at 08:27):

This may be an independent issue but it looks like the Lake output changes broke the "check the cache" step https://github.com/leanprover-community/mathlib4/actions/runs/8934086194/job/24540460290 /cc @Kim Morrison @Mac Malone

Mario Carneiro (May 03 2024 at 08:28):

yes, I was expecting that step to break, although it shouldn't have actually broken the cache

Mac Malone (May 03 2024 at 08:29):

Why does that step not use --no-build?

Mario Carneiro (May 03 2024 at 08:30):

pretty sure it didn't exist at the time

Sebastian Ullrich (May 03 2024 at 08:35):

Here is a small debug change to cache that should help pin-pointing the divergence between machines: https://github.com/leanprover-community/mathlib4/commit/2c0ce17b86ab26008ebb1db3f8b783fcc06844f5

$ lake exe cache get Mathlib/Data/Array/Defs.lean
[5586305563779175217, 1544972802846357171, 13046909329292126002, 3441304350335271181]
[13125509958613804700, 12918580028159898586, 8250920309656508580]
Attempting to download 1 file(s)
Downloaded: 0 file(s) [attempted 1/1 = 100%] (0% success)
Warning: some files were not found in the cache.
This usually means that your local checkout of mathlib4 has diverged from upstream.
If you push your commits to a branch of the mathlib4 repository, CI will build the oleans and they will be available later.
No cache files to decompress

Sebastian Ullrich (May 03 2024 at 08:40):

Could someone with a working cache try running that on branch cache-debug? @Ruben Van de Velde @Kim Morrison

Mario Carneiro (May 03 2024 at 08:44):

$ lake exe cache get Mathlib/Data/Array/Defs.lean
[12616063415761296130, 1544972802846357171, 13046909329292126002, 3441304350335271181]
[2128698828512116273, 12918580028159898586, 8250920309656508580]
Attempting to download 1 file(s)
Downloaded: 1 file(s) [attempted 1/1 = 100%] (100% success)
Decompressing 1 file(s)
unpacked in 7 ms

Mario Carneiro (May 03 2024 at 08:46):

here it is again with dbgTraceVal Lean.githash:

$ lake exe cache get Mathlib/Data/Array/Defs.lean
b470eb522bfd68ca96938c23f6a1bce79da8a99f
[12616063415761296130, 1544972802846357171, 13046909329292126002, 3441304350335271181]
[2128698828512116273, 12918580028159898586, 8250920309656508580]
No files to download
Decompressing 1 file(s)
unpacked in 2 ms

Sebastian Ullrich (May 03 2024 at 08:48):

Well that is not the githash I have :) . lean --version? OS?

Mario Carneiro (May 03 2024 at 08:50):

$ lean --version
Lean (version 4.8.0-rc1, x86_64-unknown-linux-gnu, commit b470eb522bfd, Release)

Mario Carneiro (May 03 2024 at 08:50):

this is the commit immediately prior to the one tagged as v4.8.0-rc1 in the repo

Mario Carneiro (May 03 2024 at 08:51):

@Kim Morrison ?

Mario Carneiro (May 03 2024 at 08:52):

I don't see anything suspicious in the release action which produced the 4.8.0-rc1 release, I don't understand why it would package this version

Mario Carneiro (May 03 2024 at 08:53):

Was 4.8.0-rc1 released more than once?

Kim Morrison (May 03 2024 at 08:53):

Oh, dear.

Kim Morrison (May 03 2024 at 08:53):

Yes, I did a force push to the tag earlier today when I realised I had not set the release flag.

Kim Morrison (May 03 2024 at 08:54):

:-(

Mario Carneiro (May 03 2024 at 08:54):

I'm not sure there is any way to kick elan to re-download things it has already downloaded

Kim Morrison (May 03 2024 at 08:54):

I hope we don't need an rc2 just for this.

Mario Carneiro (May 03 2024 at 08:55):

...so we might need an rc2 just for this

Kim Morrison (May 03 2024 at 08:55):

I mean you can delete the toolchain locally.

Mario Carneiro (May 03 2024 at 08:55):

because the fact that my cache is succeeding and Sebastian's isn't means that the cache files are also using the bad version

Mario Carneiro (May 03 2024 at 08:56):

so anyone who downloads rc1 right now will get a failing cache

Kim Morrison (May 03 2024 at 08:56):

Well, for caches built with the earlier rc1.

Kim Morrison (May 03 2024 at 08:56):

This was really dumb.

Kim Morrison (May 03 2024 at 08:57):

I am cooking right now. I better not burn dinner as well as the release...

Mario Carneiro (May 03 2024 at 08:57):

You will have to go into each of the hoskinson machines and manually clear the elan toolchain cache or else they will keep using the old version

Mario Carneiro (May 03 2024 at 08:57):

and then PSA everyone to also clear their toolchain cache

Mario Carneiro (May 03 2024 at 08:57):

rc2 seems easier

Kim Morrison (May 03 2024 at 08:58):

That would be doable, I have a script to run Hoskinson in parallel.

Kim Morrison (May 03 2024 at 08:58):

But yeah.

Kim Morrison (May 03 2024 at 08:58):

:-(

Mario Carneiro (May 03 2024 at 09:02):

Kim Morrison said:

Yes, I did a force push to the tag earlier today when I realised I had not set the release flag.

Checking this should probably be a CI step in the release action. Force pushing would have been fine as long as the release wasn't actually uploaded and used

Sebastian Ullrich (May 03 2024 at 09:13):

A new RC means touching half a dozen repos. If the Hoskinson machines are not an issue, then for humans we can simply commit a check for the bad release in cache. Even the Hoskinson fix could be commited to the CI script.

Mario Carneiro (May 03 2024 at 09:24):

Ok, sounds reasonable. Something like, if Lean.versionString == "4.8.0-rc1" && Lean.githash == "b470eb522bfd68ca96938c23f6a1bce79da8a99f" then call IO.process a few times to get elan to delete the bad release and download a new one, then restart yourself

Mario Carneiro (May 03 2024 at 09:25):

I'm not going to write that fix though rn

Kim Morrison (May 03 2024 at 10:07):

Okay, back from dinner, and I've cleared v4.8.0-rc1 off Hoskinson.

Kim Morrison (May 03 2024 at 10:13):

Okay, and #12623 is a patch to cache that detects the bad version, suggests elan toolchain uninstall leanprover/lean4:v4.8.0-rc1, and then exits.

Kim Morrison (May 03 2024 at 10:13):

I was on the old copy of rc1, so I've tested the PR, and it behaved as expected before and after.

Kim Morrison (May 03 2024 at 10:13):

If someone can merge, I am going to be glad that it is now the weekend here. :-)

Yaël Dillies (May 03 2024 at 10:16):

Kim Morrison said:

Okay, back from dinner, and I've cleared v4.8.0-rc1 off Hoskinson.

So this is expected?

Yaël Dillies (May 03 2024 at 10:17):

#11104 will need to put back on the queue once the toolchains are sorted

Kim Morrison (May 03 2024 at 10:19):

Yaël Dillies said:

Kim Morrison said:

Okay, back from dinner, and I've cleared v4.8.0-rc1 off Hoskinson.

So this is expected?

Hopefully only expected if I managed to delete the toolchain mid build. I've restarted that job now.

Kim Morrison (May 03 2024 at 10:20):

Ugh, but it seems to have instantly failed again the same way. :-(

Kim Morrison (May 03 2024 at 10:21):

I am more agressively cleaning up elan.

Kim Morrison (May 03 2024 at 10:22):

Okay, the job is going through now.

Kim Morrison (May 03 2024 at 10:22):

Thanks.

Kevin Buzzard (May 03 2024 at 10:28):

This is all part of the learning process. The system is so gigantic now and it's not surprising that the community is still learning about what we can and can't get away with. Thanks so much Kim for all your work getting mathlib onto 4.8, this has been a massive effort. I think the community is generally hoping and expecting that these jobs are going to get easier over time.

Kim Morrison (May 03 2024 at 10:28):

This one was pure user error. If I'd actually followed the nice checklist that I prepared during the last release, this wouldn't have happened.

Kevin Buzzard (May 03 2024 at 10:30):

Right -- we're learning what's important! rc2 is hardly the end of the world!

Yaël Dillies (May 03 2024 at 10:49):

So how do I get cache to work again?

Yaël Dillies (May 03 2024 at 10:49):

I'm not managing to get cache for #9000

Sebastian Ullrich (May 03 2024 at 10:57):

I restarted CI for your PR, it's using the correct Lean version now. Hopefully the cache should be available when it's finished

Yaël Dillies (May 03 2024 at 10:58):

Ah! The runners had the wrong version? I assumed I too had the wrong version locally

Mauricio Collares (May 03 2024 at 11:01):

Kim Morrison said:

This one was pure user error. If I'd actually followed the nice checklist that I prepared during the last release, this wouldn't have happened.

Something I've read once (I forgot where) is that it helps to turn the text checklist into a script that just prints the next step and waits for confirmation that it's done. The reason that's nice is that anyone can then replace print statements in the script by automation piecemeal whenever they have free time.

Mario Carneiro (May 03 2024 at 17:36):

Some caches have gone missing after Kim's intervention. #12634 should help with this, it resets the cache so that there won't be a ton of missing files that were generated for the wrong version. (Expedited merging would be appreciated.) Once it lands, everyone with cache issues should rebase their PR or update their project to a version including this commit.

Riccardo Brasca (May 03 2024 at 17:38):

Thanks!

Mario Carneiro (May 03 2024 at 17:50):

Actually I take it back, I misdiagnosed the issue and #12634 is not needed. The effect of Kim's change is actually that of a reset cache, so the instructions to fix it are to just push another commit or manually rerun CI on any bad branch. (Here a branch is "bad" if it is on 4.8.0-rc1 and has had no commits since Kim's manual intervention a few hours ago.)

Mario Carneiro (May 03 2024 at 17:58):

However users will want to update to a branch including #12623, which will tell you if you locally have the bad version installed and prompt you to reinstall it. If you have the bad version then you will only be able to use bad branches, meaning that going forward you will have a lot of cache misses. If you don't want to wait for the prompt, the fix is:

$ lean --version
Lean (version 4.8.0-rc1, x86_64-unknown-linux-gnu, commit b470eb522bfd, Release)
$ # if the above says b470eb522bfd you are on the bad version
$ elan toolchain uninstall leanprover/lean4:v4.8.0-rc1
$ lake exe cache get
info: downloading component 'lean'
179.6 MiB / 179.6 MiB (100 %)  12.8 MiB/s ETA:   0 s
info: installing component 'lean'
Attempting to download 4452 file(s)
Downloaded: 0 file(s) [attempted 249/4452 = 5%]^C
$ # if it says 0 file(s) here then the branch you are on is bad and you should follow the previous message
$ lean --version
Lean (version 4.8.0-rc1, x86_64-unknown-linux-gnu, commit dcccfb73cb24, Release)
$ # dcccfb73cb24 is the fixed version of 4.8.0-rc1

Kim Morrison (May 06 2024 at 07:04):

Mac Malone said:

Why does that step not use --no-build?

@Mac Malone I've adjusted Mathlib CI to use --no-build in #12688, but the reviewer did ask:

Jireh Loreaux said:

Kim Morrison I had to go to the source to figure out what --no-build does, or even that it exists. Did you find this documented somewhere? Or are the options for lake subcommands intentioned not documented in their respective lake subcmd --help?

Mac Malone (May 06 2024 at 14:59):

Kim Morrison said:

Jireh Loreaux said:

Kim Morrison I had to go to the source to figure out what --no-build does, or even that it exists. Did you find this documented somewhere? Or are the options for lake subcommands intentioned not documented in their respective lake subcmd --help?

Oh! --no-build was originally implemented as internal option for the Lake server, so like setup-file it was not in the help. However, it does appear to be generally useful, so it should be documented. I will need to fix that.


Last updated: May 02 2025 at 03:31 UTC