Zulip Chat Archive

Stream: ecosystem infrastructure

Topic: lake exe cache get always decompresses


Michael Stoll (Oct 26 2025 at 18:22):

When running lake exe cache get, I see something like

Downloaded: 2 file(s) [attempted 2/2 = 100%, 193 KB/s]
Decompressing 7394 file(s)

Why does it always decompress all files, even though it only downloaded a few of them? The others should already be present in their uncompressed form, I would assume, and so it should only be necessary to decompress the files that were actually downloaded.

Floris van Doorn (Oct 26 2025 at 23:15):

It is definitely always true that if you don't have to download a file, then the decompressed version is already in your lakefile. You might have downloaded that compressed file for a different project.

If it is significantly cheaper to check whether the correct decompressed version is already there than it is to decompress a file, then we could consider doing that.

Michael Stoll (Jan 31 2026 at 11:37):

Floris van Doorn said:

If it is significantly cheaper to check whether the correct decompressed version is already there than it is to decompress a file, then we could consider doing that.

Is this being looked at currently? Just now:

> lake exe cache get
Current branch: MS_Heights_3
Using cache (Azure) from origin: MichaelStollBayreuth/mathlib4
Attempting to download 3 file(s) from leanprover-community/mathlib4 cache
Downloaded: 0 file(s) [attempted 3/3 = 100%, 0 KB/s]
Attempting to download 3 file(s) from MichaelStollBayreuth/mathlib4 cache
Downloaded: 3 file(s) [attempted 3/3 = 100%, 300 KB/s]
Decompressing 7910 file(s)
Unpacked in 32499 ms
Completed successfully!

It is a bit annoying that after downloading three files, it still spends half a minute decompressing lots of files that are already present in decompressed form...

Kevin Buzzard (Jan 31 2026 at 11:49):

One of my machines has a slow filesystem and it can take over 2 minutes to unpack 7000 files.

Kim Morrison (Feb 01 2026 at 04:06):

You might be interested in

feat: pipeline downloads and decompression in cache get #32987

which has been waiting for review for a few weeks.

Kim Morrison (Feb 01 2026 at 05:22):

Michael Stoll said:

It is a bit annoying that after downloading three files, it still spends half a minute decompressing lots of files that are already present in decompressed form...

#34667 tries to address this, and seems to be much faster on my machine. I'm a bit surprised this is worth doing, as leantar is apparently filtering as well, and I would expect it to be fast.

Could someone take a look, @Michael Stoll or @Kevin Buzzard, with this PR, and see if it is faster for you too?

Michael Stoll (Feb 01 2026 at 09:20):

It does seem to improve matters:

$ gh pr checkout 34667
(...)
$ lake exe cache get
Current branch: skip-already-unpacked-cache
Using cache (Azure) from git@github.com:kim-em/mathlib4.git: kim-em/mathlib4
Attempting to download 5174 file(s) from leanprover-community/mathlib4 cache
Downloaded: 5174 file(s) [attempted 5174/5174 = 100%, 254 KB/s]
No files to download
Decompressing 5217 file(s) (2698 already unpacked)
Unpacked in 13255 ms
Completed successfully!

(I guess the No files to download refers to Kim's fork the Mathlib repo(?), which didn't have to provide files in this case.)

Michael Stoll (Feb 01 2026 at 09:25):

I seem to remember that somebody (@Kim Morrison ? Yes: #mathlib4 > `lake exe cache get` decompressing as it downloads @ 💬) floated the idea of interleaving decompressing files with downloading. This should also speed up things quite a bit (with the plan I have at home, downloading the cache takes quite a bit longer than decompressing the files). See #32987. It would be really nice if both PRs could make it into Mathlib soon!

Patrick Massot (Feb 01 2026 at 13:44):

I don’t know if it’s the same question, but it’s very frustrating to me that every time I update the Verbose Lean dependency in my course it decompresses mathlib that hasn’t changed at all. My lakefile only requires verbose, and the mathlib version used by verbose very very rarely changes.


Last updated: Feb 28 2026 at 14:05 UTC