Zulip Chat Archive

Stream: mathlib4

Topic: lake exe cache get


Johan Commelin (Feb 01 2023 at 18:08):

$ lake exe cache get
Attempting to download 12 file(s)
Decompressing 1115 file(s)
uncaught exception:
gzip: stdin: unexpected end of file
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now

Johan Commelin (Feb 01 2023 at 18:09):

Aha, I guess I should use get!?

Johan Commelin (Feb 01 2023 at 18:09):

But is there also an option that Lake tells me the hash of the corrupt file, so that I can remove it manually?

Mario Carneiro (Feb 01 2023 at 18:17):

This sort of thing happens a lot, since it seems that lake is not protected against multiple concurrent executions, which happens commonly when I run lake build but the lean 4 server is doing a "refresh file dependencies" action concurrently (which afaik you can't cancel)

Mario Carneiro (Feb 01 2023 at 18:17):

lake stamps over its own files, occasionally fails and sometimes produces corrupted files

Mario Carneiro (Feb 01 2023 at 18:19):

If we could detect the corrupted files in advance, lake exe cache get could just silently treat them as missing

Mario Carneiro (Feb 01 2023 at 18:20):

Do the corrupted files have any easily detectable pattern? (Are they 0 byte files?)

Johan Commelin (Feb 01 2023 at 18:26):

In my case not. I think it was a partial download.

Gabriel Ebner (Feb 01 2023 at 19:03):

Mario Carneiro said:

This sort of thing happens a lot, since it seems that lake is not protected against multiple concurrent executions, which happens commonly when I run lake build but the lean 4 server is doing a "refresh file dependencies" action concurrently (which afaik you can't cancel)

All of this is true, but it has nothing to do with lake exe cache which is a completely different beast. lake exe cache only runs when you manually start it (by design, we want mathlib4 to work offline). The corrupted archive error is IMO most likely due to 1) pressing ctrl-c while lake exe cache get is running, 2) running lake exe cache get in two different checkouts at the same time, 3) a bad internet connection and interrupted downloads.

Sebastian Ullrich (Feb 01 2023 at 20:19):

Hmm, do I read the code right in that the exit code and stderr of curl are silently discarded in downloadFiles? That doesn't seem prudent.

Gabriel Ebner (Feb 01 2023 at 20:20):

The exit code will typically be nonzero, I believe, since we expect 404s.

Sebastian Ullrich (Feb 01 2023 at 20:23):

https://daniel.haxx.se/blog/2019/07/22/curl-goez-parallel/

When doing many simultaneous transfers, how do you figure out how they all did individually, like from your script? That’s still to be figured out and implemented.

Heh.

Sebastian Ullrich (Feb 01 2023 at 20:31):

Apparently --write-out can be used for passing on the results https://stackoverflow.com/a/71967814. Parsing that from Lean doesn't sound too bad, at least compared to waiting for libcurl integration.

Gabriel Ebner (Feb 01 2023 at 20:32):

NOTE: The %-symbol is a special symbol in the win32-environment, where all occurrences of % must be doubled when using this option.

Sebastian Ullrich (Feb 01 2023 at 20:34):

Hopefully not relevant in the case of IO.Process though...?

Gabriel Ebner (Feb 01 2023 at 20:37):

It says "win32", not "cmd.exe". But hopefully that's just a simplification.

Wojciech Nawrocki (Feb 02 2023 at 03:15):

Gabriel Ebner said:

Note: of course std4/aesop/etc. versions also need to match exactly. Deleting any require std etc. in your lakefile is a good start, followed by lake update.

My experience so far has been to avoid lake update like the plague. For instance, mathlib4's lakefile requires std4@"main", and running lake update will automatically bump that to a version that may be incompatible. When depending on mathlib4 I have resorted to copying hashes from its lake-manifest.json into my lake-manifest.json. Pinning a Git tag is one potential fix, but I think long-term the lack of proper package versioning with semver will be a serious hurdle to using many Lean packages. Am I doing something wrong?

Gabriel Ebner (Feb 02 2023 at 04:04):

and running lake update will automatically bump that to a version that may be incompatible.

This won't happen. Unless you have a require std in your project.

Gabriel Ebner (Feb 02 2023 at 04:05):

Note that it used to be like that but I fixed that bug sometime last year.

Wojciech Nawrocki (Feb 02 2023 at 04:37):

Aha, I did have require std - thanks! While semver would be able to detect conflicting versions, I suppose removing that works as a stopgap.

Matthew Ballard (Feb 02 2023 at 11:17):

I have a package where I am not sure if I am seeing the expected behavior/making a mistake.

From clean copies of the repository, I run lake exe cache get and then lake build vs running only lake build. They both take the same amount of time to complete. Both spend 99% of time building (no compiling) files from the dependencies (std/mathlib/qq).

I can see that lake exe cache get does get the cache for the dependencies.

Am I doing this right? If so, is this expected behavior? Thanks.

Arthur Paulino (Feb 02 2023 at 11:34):

I am not on my PC but mathlib4 uses leanprover/lean4:nightly-2023-01-29

Arthur Paulino (Feb 02 2023 at 11:35):

Your lean-toolchain is set for an older version

Matthew Ballard (Feb 02 2023 at 11:39):

This is the problem. I thought 1-16 would be recent enough. Thank @Arthur Paulino

Gabriel Ebner (Feb 02 2023 at 17:34):

recent enough

To be clear, "recent enough" is not sufficient. It needs to be exactly the same version as mathlib.

Yaël Dillies (Feb 03 2023 at 00:42):

I am still not managing to make lake exe cache get work consistently :sad:

Yaël Dillies (Feb 03 2023 at 00:43):

I need to run the command thrice before I stop getting uncaught exception: lake-packages/std/build/lib/some_file: Can't unlink already-existing object

Yaël Dillies (Feb 03 2023 at 00:44):

And the syntax for lake exe cache get some_file still escapes me (probably because Windows paths).

Johan Commelin (Feb 03 2023 at 02:38):

@Arthur Paulino If I run lake exe cache get, then it spends a bit of time building and downloading files, and after that it spends ~8 seconds decompressing ~1200 files. All of this is fine. But if I run lake exe cache get again, then it skips the first bit (of course) but again spends ~8 seconds decompressing ~1200 files. Is it possible to speed that up by calculating hashes of oleans that are already present?

Arthur Paulino (Feb 03 2023 at 09:12):

@Yaël Dillies it's best to just close VS Code altogether before calling cache

Arthur Paulino (Feb 03 2023 at 09:16):

@Johan Commelin the cache is compressed so we can't compute the hashes of the binary oleans that easily. We would have to decompress them in order to do that. But that's already the current cost

Johan Commelin (Feb 03 2023 at 09:17):

hmm, would it make sense to have some index? Or to make the filenames twice as long?

Arthur Paulino (Feb 03 2023 at 09:18):

The tarbals contain more than just the olean. It contains ilean, trace, c and c.trace files, all related to the same Lean file

Arthur Paulino (Feb 03 2023 at 09:19):

It's safer to overwrite them all

Johan Commelin (Feb 03 2023 at 09:19):

I like living on the edge.

Arthur Paulino (Feb 03 2023 at 09:23):

Using the hashes of the olean files puts us in a chicken-egg problem, because we don't know their hashes when downloading. We only know the hashes of the sources

Arthur Paulino (Feb 03 2023 at 09:23):

And we use exact matches to build the correct URLs

Arthur Paulino (Feb 03 2023 at 09:25):

And then computing the hashes of 1k+ binary files wouldn't be that fast either. That's a considerable amount of IO overhead

Arthur Paulino (Feb 03 2023 at 09:27):

And the index files would have to be hosted and downloaded separately to avoid weird race conditions from parallel puts. That would double up the number of files curl has to deal with

Arthur Paulino (Feb 03 2023 at 09:28):

Idk... It sounds like more trouble with a small benefit

Arthur Paulino (Feb 03 2023 at 09:29):

Decompressing is the fastest part. Downloading is the slow part and we already avoid the download of repeated files (unless you get!)

Sebastian Ullrich (Feb 03 2023 at 09:37):

Why not eagerly decompress them into a directory named by the source hash? That would allow for the hardlinking optimization discussed above.

Arthur Paulino (Feb 03 2023 at 09:43):

That would double the space consumed by olean files in the FS. I'm not totally convinced that trade-off is sane for the sake of 8~10 seconds

Arthur Paulino (Feb 03 2023 at 09:45):

I, particularly, use SSD and space is something that has become more expensive to me

Arthur Paulino (Feb 03 2023 at 09:48):

And that wouldn't eliminate the need to compute hashes of 2 olean files per Lean source file just to know whether we should overwrite it or not

Sebastian Ullrich (Feb 03 2023 at 09:51):

The whole point of hardlinking would be to reduce space overhead, especially since there's no need to keep the tarball around after that

Sebastian Ullrich (Feb 03 2023 at 09:52):

Arthur Paulino said:

And that wouldn't eliminate the need to compute hashes of 2 olean files per Lean source file just to know whether we should overwrite it or not

No need for hashing the contents if the source hash is encoded in the directory name

Arthur Paulino (Feb 03 2023 at 09:57):

Then it wouldn't double the total space but there would still be a considerable increase in space consumed because the tarballs are compressed with maximum compression rate.

Arthur Paulino (Feb 03 2023 at 09:59):

Maybe it would double the space of the caches

Arthur Paulino (Feb 03 2023 at 10:10):

And remember, the tarballs contain 5 files: olean, iliean, trace, c and c.trace. The mechanics to keep all of this consistent aren't simple and I doubt it would pay off the effort needed to make it work (plus the space cost). Yes, you can avoid computing the hashes of the cached files by persisting it somewhere, in a directory name or in a text file, but there's still the hashing of the target files (which we are trying to figure out whether we should overwrite or not).

tar already does the efficient work of keeping track of the folder structure in order to extract things to the correct places

Eric Wieser (Feb 03 2023 at 10:11):

Also worth noting that hard-linking is likely going to confuse windows users, assuming it's possible at all

Arthur Paulino (Feb 03 2023 at 10:32):

I would encourage experimentation with a new get- command if someone is feeling inclined. But I wouldn't be surprised if after all the work it would still take 2~3 seconds instead of 8~10 (assuming you accept the space cost)

Arthur Paulino (Feb 03 2023 at 10:52):

The algorithm would go like this:

  1. Download content like get does
  2. Extract files to a raw folder inside the cache folder
  3. For each decompressed file in each tarball, compute its hash and store it in an index file (if that file wasn't already computed during put)
  4. Delete downloaded tarballs
  5. For each Lean source file, compute the hashes of the respective olean, ilean, trace, c and c.trace files
  6. Overwrite each separately iff their hashes don't match with the precomputed ones

Arthur Paulino (Feb 03 2023 at 10:57):

Ah, alternatively, there's no need to compute hashes. You can just compare their respective byte arrays. But keep in mind that core doesn't have a fast way to compare byte arrays yet

Sebastian Ullrich (Feb 03 2023 at 11:32):

I really don't understand what we are defending here against, tar itself corrupting the output?

  1. download $hash.tar.gz (as before) if $hash/ does not exist or get! was used
  2. unpack into $hash/
  3. delete archive
  4. copy $hash/ contents into local build directory using whatever is the cheapest method on the file system

Arthur Paulino (Feb 03 2023 at 11:37):

That avoids decompression but doesn't avoid unnecessary overwrites. It's a simpler solution but not the one I had in mind

Sebastian Ullrich (Feb 03 2023 at 11:38):

Yes, I just realized that you were probably thinking about how to elide step 4. I would want to measure first though if there is any need for that.

Sebastian Ullrich (Feb 03 2023 at 11:41):

AFAIK hardlinks work fine on NTFS, by the way (symlinks are the problematic one). So step 4 might be completely negligible on all relevant platforms as long as your cache and project directory are on the same filesystem.

Arthur Paulino (Feb 03 2023 at 11:42):

Is there an IO.copyBinFile? (probably not)

Arthur Paulino (Feb 03 2023 at 11:45):

I'm trying to foresee issues related to the coexistence of get and get-

Arthur Paulino (Feb 03 2023 at 11:51):

Because of the unpack command. How should it behave?

Sebastian Ullrich (Feb 03 2023 at 11:53):

Having two models at the same time does sound like a headache, I would avoid that

Arthur Paulino (Feb 03 2023 at 11:55):

Right but we can't assume everyone wants the extra space consumption

Eric Wieser (Feb 03 2023 at 18:15):

hardlinks work fine on NTFS

Hardlinks require administrator permissions, which is probably an issue for classroom use. They also don't work between drives, which I could also see being a problem

Gabriel Ebner (Feb 03 2023 at 18:42):

An easy way to speed up unpacking would be to do it in parallel. I believe we're mostly CPU-bound with gzip.

Johan Commelin (Feb 03 2023 at 18:43):

why can't we download an index?

Arthur Paulino (Feb 03 2023 at 18:47):

An index would be troublesome to be maintained consistent, with parallel put operations being performed. That would be highly subject to race conditions

Johan Commelin (Feb 03 2023 at 18:48):

I see. Because we don't have a sqlite database in the cloud.

Johan Commelin (Feb 03 2023 at 18:49):

But we could keep a local index. If you don't know the hashes of the contents of $hash.tar.gz then you just unzip it, because you haven't seen it before. But after that, you store the hashes of its contents in index.sqlite.

Johan Commelin (Feb 03 2023 at 18:50):

But I guess by now I'm seriously overcomplicating things.

Arthur Paulino (Feb 03 2023 at 18:52):

The gain in speed is marginal compared to the effort. tar with gzip is pretty fast already. And pretty space-efficient too

Johan Commelin (Feb 03 2023 at 18:55):

Yeah, we can worry about it when it becomes a real problem.

Arthur Paulino (Feb 03 2023 at 19:03):

Gabriel Ebner said:

An easy way to speed up unpacking would be to do it in parallel. I believe we're mostly CPU-bound with gzip.

That's a nice one btw

Arthur Paulino (Feb 27 2023 at 01:07):

Hi everyone! We've decided to migrate cache to std4. See the discussion in #std4 > cache adoption by std4.
From a practical standpoint, the user experience won't change.
The only thing I'd ask everyone is to nuke the $HOME/.cache/mathlib folder because that will no longer be the target directory of cache files. $HOME/.cache/lean4_cache will be the new target after this merge.

If you want to see the code changes, please check !4#2509

Arthur Paulino (Feb 27 2023 at 01:18):

Note: that's not the migration patch. It's a preparation step before I can factor out the code and leave only the configuration JSON file

Gareth Ma (Feb 27 2023 at 03:13):

Maybe add a warning when the directory is detected?

Kevin Buzzard (Feb 27 2023 at 08:35):

At what point do I nuke this folder?

Arthur Paulino (Feb 27 2023 at 11:13):

Kevin Buzzard said:

At what point do I nuke this folder?

After the merge, if you're not planning on visiting older branches

Arthur Paulino (Feb 27 2023 at 11:13):

Gareth Ma said:

Maybe add a warning when the directory is detected?

I don't think it's worth adding such dependency in the code

Eric Wieser (Feb 27 2023 at 11:21):

Presumably even if you are visiting older branches, merging mathlib4 master will make the old folder unnecessary again?

Arthur Paulino (Feb 27 2023 at 11:36):

An alternative solution is to rename the folder so you don't lose the cache files that you have already downloaded

Eric Wieser (Feb 27 2023 at 12:01):

What goes wrong if you don't delete the folder?

Arthur Paulino (Feb 27 2023 at 12:28):

Nothing, just some memory waste

Eric Wieser (Feb 27 2023 at 12:30):

Then why would you rename the folder instead of just leaving it alone?

Arthur Paulino (Feb 27 2023 at 12:32):

Hmm, let me explain. The target folder of cache files will change. If nothing is done, there will be no error, but the FS will have a folder full of unused files. If you rename the old cache folder to the new one, it will reuse the downloaded olean files

Arthur Paulino (Feb 27 2023 at 12:33):

But this is all after the merge is done

Kevin Buzzard (Feb 27 2023 at 13:59):

Arthur Paulino said:

Kevin Buzzard said:

At what point do I nuke this folder?

After the merge, if you're not planning on visiting older branches

How do I know when "the merge" has happened? (I'm clueless as to which merge we're talking about -- I have no idea about how the infrastructure is set up)

Arthur Paulino (Feb 27 2023 at 20:05):

@Kevin Buzzard sorry I missed your question for some reason. I mean this PR: !4#2509
I will signal here again once it's merged so everyone will have a higher chance of seeing it

Shreyas Srinivas (Mar 06 2023 at 23:20):

Hello, today I tried to start a lean4 project with mathlib using lake new <project name> math. After running lake update, I ran lake exe cache get.

I got the following error: uncaught exception: ./build/lib/Mathlib/Data/Sym/Sym2.olean: truncated gzip input

Shreyas Srinivas (Mar 06 2023 at 23:22):

this was an intel mac machine

Arthur Paulino (Mar 06 2023 at 23:25):

Try lake exe cache get!. I think you got a corrupted file

Shreyas Srinivas (Mar 06 2023 at 23:36):

That worked. Thanks!

Sebastian Ullrich (Mar 10 2023 at 09:05):

cache should now actually tell you what went wrong with a download instead of leading to a gzip error. Please do post the error messages here so we get a feeling for what goes wrong in the wild.

Jon Eugster (Mar 15 2023 at 11:02):

For the sake of just jotting down errors, I just got two of them running lake exe cache get! in two different repos simultaneously (one dependent on current mathlib4, one 1-2 days old version of mathlib4 itself):

Attempting to download 1737 file(s)
uncaught exception: no such file or directory (error code: 2)
  file: /home/me/.cache/mathlib/curl.cfg

and

Attempting to download 1344 file(s)
Decompressing 1688 file(s)
uncaught exception:
gzip: stdin: unexpected end of file
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now

However, just running lake exe cache get! in both repos again fixed everything :tada: and I can't reproduce it now. Probably a corrupted cache file, which wouldn't be surprising if you run two processes simultaneously.

Yury G. Kudryashov (Mar 16 2023 at 00:40):

Cache fails for me: after lake exe cache get, lake build fails with the following errors:

Building Mathlib.Tactic.WLOG
Building Aesop.Tree
Building Mathlib.Tactic.SplitIfs
Building Mathlib.Tactic.SolveByElim
error: > LEAN_PATH=./build/lib:./lake-packages/Qq/build/lib:./lake-packages/aesop/build/lib:./lake-packages/std/build/lib LD_LIBRARY_PATH=/home/urkud/.elan/toolchains/leanprover--lean4---nightly-2023-03-09/lib:/nix/store/zdx4ykiq8ijfmzxzlrrcw3zswl383wfc-pipewire-0.3.60-jack/lib:./build/lib /home/urkud/.elan/toolchains/leanprover--lean4---nightly-2023-03-09/bin/lean -DwarningAsError=true -Dpp.unicode.fun=true ./././Mathlib/Tactic/WLOG.lean -R ././. -o ./build/lib/Mathlib/Tactic/WLOG.olean -i ./build/lib/Mathlib/Tactic/WLOG.ilean -c ./build/ir/Mathlib/Tactic/WLOG.c
error: external command `/home/urkud/.elan/toolchains/leanprover--lean4---nightly-2023-03-09/bin/lean` exited with code 139
error: > LEAN_PATH=./build/lib:./lake-packages/Qq/build/lib:./lake-packages/aesop/build/lib:./lake-packages/std/build/lib LD_LIBRARY_PATH=/home/urkud/.elan/toolchains/leanprover--lean4---nightly-2023-03-09/lib:/nix/store/zdx4ykiq8ijfmzxzlrrcw3zswl383wfc-pipewire-0.3.60-jack/lib:./lake-packages/aesop/build/lib /home/urkud/.elan/toolchains/leanprover--lean4---nightly-2023-03-09/bin/lean ./lake-packages/aesop/././Aesop/Tree.lean -R ./lake-packages/aesop/./. -o ./lake-packages/aesop/build/lib/Aesop/Tree.olean -i ./lake-packages/aesop/build/lib/Aesop/Tree.ilean -c ./lake-packages/aesop/build/ir/Aesop/Tree.c
error: external command `/home/urkud/.elan/toolchains/leanprover--lean4---nightly-2023-03-09/bin/lean` exited with code 139
error: > LEAN_PATH=./build/lib:./lake-packages/Qq/build/lib:./lake-packages/aesop/build/lib:./lake-packages/std/build/lib LD_LIBRARY_PATH=/home/urkud/.elan/toolchains/leanprover--lean4---nightly-2023-03-09/lib:/nix/store/zdx4ykiq8ijfmzxzlrrcw3zswl383wfc-pipewire-0.3.60-jack/lib:./build/lib /home/urkud/.elan/toolchains/leanprover--lean4---nightly-2023-03-09/bin/lean -DwarningAsError=true -Dpp.unicode.fun=true ./././Mathlib/Tactic/SplitIfs.lean -R ././. -o ./build/lib/Mathlib/Tactic/SplitIfs.olean -i ./build/lib/Mathlib/Tactic/SplitIfs.ilean -c ./build/ir/Mathlib/Tactic/SplitIfs.c
error: external command `/home/urkud/.elan/toolchains/leanprover--lean4---nightly-2023-03-09/bin/lean` exited with code 139
error: > LEAN_PATH=./build/lib:./lake-packages/Qq/build/lib:./lake-packages/aesop/build/lib:./lake-packages/std/build/lib LD_LIBRARY_PATH=/home/urkud/.elan/toolchains/leanprover--lean4---nightly-2023-03-09/lib:/nix/store/zdx4ykiq8ijfmzxzlrrcw3zswl383wfc-pipewire-0.3.60-jack/lib:./build/lib /home/urkud/.elan/toolchains/leanprover--lean4---nightly-2023-03-09/bin/lean -DwarningAsError=true -Dpp.unicode.fun=true ./././Mathlib/Tactic/SolveByElim.lean -R ././. -o ./build/lib/Mathlib/Tactic/SolveByElim.olean -i ./build/lib/Mathlib/Tactic/SolveByElim.ilean -c ./build/ir/Mathlib/Tactic/SolveByElim.c
error: external command `/home/urkud/.elan/toolchains/leanprover--lean4---nightly-2023-03-09/bin/lean` exited with code 139

Gabriel Ebner (Mar 16 2023 at 01:07):

Do we have our first hash collision? :heart_eyes:

Gabriel Ebner (Mar 16 2023 at 01:31):

@Yury G. Kudryashov Which branch is this on?

Yury G. Kudryashov (Mar 16 2023 at 01:59):

port/Topology.Homotopy.Equiv

Yury G. Kudryashov (Mar 16 2023 at 02:00):

UPD: lake clean doesn't help.

Yury G. Kudryashov (Mar 16 2023 at 02:00):

More precisely, after lake clean I get

Building Aesop.Tree
error: > LEAN_PATH=./build/lib:./lake-packages/Qq/build/lib:./lake-packages/aesop/build/lib:./lake-packages/std/build/lib LD_LIBRARY_PATH=/home/urkud/.elan/toolchains/leanprover--lean4---nightly-2023-03-09/lib:/nix/store/zdx4ykiq8ijfmzxzlrrcw3zswl383wfc-pipewire-0.3.60-jack/lib:./lake-packages/aesop/build/lib /home/urkud/.elan/toolchains/leanprover--lean4---nightly-2023-03-09/bin/lean ./lake-packages/aesop/././Aesop/Tree.lean -R ./lake-packages/aesop/./. -o ./lake-packages/aesop/build/lib/Aesop/Tree.olean -i ./lake-packages/aesop/build/lib/Aesop/Tree.ilean -c ./lake-packages/aesop/build/ir/Aesop/Tree.c
error: external command `/home/urkud/.elan/toolchains/leanprover--lean4---nightly-2023-03-09/bin/lean` exited with code 139

Yury G. Kudryashov (Mar 16 2023 at 02:04):

rm -rf lake-packages/aesop/build helped with this one.

Yury G. Kudryashov (Mar 16 2023 at 18:44):

Any news?

Gabriel Ebner (Mar 16 2023 at 18:55):

I don't see that error in any of the CI runs in !4#2919.

Yury G. Kudryashov (Mar 16 2023 at 19:32):

I don't see them in CI either. What else can it be?

Yury G. Kudryashov (Mar 16 2023 at 19:33):

Should I upload SHA512 sums of all files in ~/.cache/mathlib?

Yury G. Kudryashov (Mar 16 2023 at 19:35):

cache.sha512

Gabriel Ebner (Mar 16 2023 at 19:38):

Can you also post the git revision where you get the issue. I'd like to reproduce it.

Scott Morrison (Mar 19 2023 at 21:54):

Anyone else experiencing problems with lake exe cache get today?

After I run lake exe cache get, lake build starts a full build again.

Eric Wieser (Mar 19 2023 at 21:59):

I've taken to running pkill lean before getting caches now

Scott Morrison (Mar 19 2023 at 22:07):

I'm not going to be able to run the port-progress-bot anymore if I need to do that.

Scott Morrison (Mar 19 2023 at 22:10):

Yeah, even killing everything, lake exe cache get results in the build starting from scratch (including all dependencies).

Scott Morrison (Mar 19 2023 at 22:11):

If anyone is able to confirm that lake exe cache get followed by lake build works for them on current master, that would be helpful to me.

Scott Morrison (Mar 19 2023 at 22:16):

Hmm... a fresh clone of mathlib4 seems to resolve the problem.

Scott Morrison (Mar 19 2023 at 22:32):

Gah... eventually realised the problem. I had an elan override set for that mathlib4 directory, and was using a slightly different version of Lean4! I wonder if this is something that cache should take into account.

Scott Morrison (Mar 19 2023 at 22:32):

@Arthur Paulino, do you think cache could notice this, and refuse to provide oleans if the repo is using a nonstandard version of Lean via an elan override?

Arthur Paulino (Mar 20 2023 at 00:05):

Maybe comparing Lean.versionString with the content of the tool chain file. @Scott Morrison can you confirm that such variable will carry a different value if the override is set?

Scott Morrison (Mar 20 2023 at 00:08):

Yes, lean --version says Lean (version 4.0.0, commit 983904995027, Release) with my override set, but Lean (version 4.0.0-nightly-2023-03-15, commit 3d21124445b3, Release) without.

Arthur Paulino (Mar 20 2023 at 01:40):

I mean Lean.versionString in the compiled Lean code:

#eval Lean.versionString -- "4.0.0-nightly-2023-01-10"

Scott Morrison (Mar 20 2023 at 01:43):

Sorry, yes: -- 4.0.0, commit 98390499502743d9f119f3f5808a7ccd1cfb4d61

Arthur Paulino (Mar 20 2023 at 02:11):

Arthur Paulino said:

Maybe comparing Lean.versionString with the content of the tool chain file. Scott Morrison can you confirm that such variable will carry a different value if the override is set?

@Gabriel Ebner do you validate this method?

Reid Barton (Mar 20 2023 at 03:23):

Just curious: why does port-progress-bot need to build any Lean 4 code?

Scott Morrison (Mar 20 2023 at 08:20):

Reid Barton said:

Just curious: why does port-progress-bot need to build any Lean 4 code?

Good question. It doesn't! I guess I was confused --- the port benchmark bot definitely does run Lean 3 and 4 code, however.

Alex J. Best (Mar 20 2023 at 18:05):

(deleted)

Gabriel Ebner (Mar 20 2023 at 20:37):

I don't think Lean.versionString is a good replacement for lean-toolchain. It doesn't contain the compilation flags, for once.

Sebastian Ullrich (Mar 20 2023 at 20:40):

Would anyone actually miss override if we removed it? With the alternative being to just change lean-toolchain so that it actually registers as a change in your git status window.

Jannis Limperg (Mar 20 2023 at 20:51):

If you want to build Lean without Nix, you need overrides for the stage0/stage1 toolchains.

Arthur Paulino (Mar 20 2023 at 23:39):

Gabriel Ebner said:

I don't think Lean.versionString is a good replacement for lean-toolchain. It doesn't contain the compilation flags, for once.

Sorry I wasn't clear enough. I meant to compare the output of Lean.versionString with the content of lean-toolchain. And if they mismatch we exit with a message instead of following up with the regular cache routine

Gabriel Ebner (Mar 21 2023 at 00:09):

Ah, then you want to compare Lean.toolchain. But it should only be a warning since tags don't have good version info:

$ echo 'import Lean #eval Lean.toolchain' | lean +gebner/lean4:bundlelibatomic --stdin
""

Arthur Paulino (Mar 21 2023 at 00:24):

But Lean.versionString does tell the toolchain

Gabriel Ebner (Mar 21 2023 at 00:43):

That also doesn't work for tags:

$ echo 'import Lean #eval Lean.versionString' | lean +gebner/lean4:bundlelibatomic --stdin
"4.0.0, commit 43fbd494cdbfd0b41fb3605318540bcae15f7577"

Gabriel Ebner (Mar 21 2023 at 00:43):

(Lean.toolchain does work just fine for nightlies though.)

Arthur Paulino (Mar 21 2023 at 02:27):

Ah, got it. But why does it need to be a warning? If those toolchains mismatch then whatever is downloaded is not going to be used anyways

Gabriel Ebner (Mar 21 2023 at 02:34):

Lean doesn't use the Lean.toolchain definition. It doesn't matter if it's the empty string, the cache would still be used. (Even though in this case the lean-toolchain file would have a different, nonempty, content.)

Sebastian Ullrich (Mar 21 2023 at 09:09):

Grepping through elan show might be an easier option

Notification Bot (Mar 22 2023 at 08:49):

15 messages were moved from this topic to #lean4 dev > Build setup under elan proposal by Sebastian Ullrich.

Gabriel Ebner (Mar 22 2023 at 20:21):

Anyone else having problems with lake exe cache get today? It doesn't seem to find any caches:

$ lake exe cache get
Attempting to download 1818 file(s)
Decompressing 320 file(s)

Gabriel Ebner (Mar 22 2023 at 20:21):

False alarm. Turns out I had local changes in lake-packages/std

Kendall Frey (Mar 23 2023 at 00:03):

I've been trying to create a new project with a dependency on mathlib4, but something is going wrong at the lake exe cache get step.

Attempting to download 1818 file(s)
No cache files to decompress

My .cache directory remains empty.
It worked fine when I created a new project a week or two ago, but when I created one a couple days ago, I experienced this problem. How do I get the cache files installed?

Gabriel Ebner (Mar 23 2023 at 00:06):

Most likely you've got the wrong lean version. Try cp lake-packages/mathlib/lean-toolchain .

Kendall Frey (Mar 23 2023 at 00:09):

I copied the file using Windows Explorer before getting the cache

Kendall Frey (Mar 23 2023 at 00:11):

Both ./lake-packages/mathlib/lean-toolchain and ./lean-toolchain contain leanprover/lean4:nightly-2023-03-15

Kendall Frey (Mar 23 2023 at 00:13):

lean -v in the project root reports Lean (version 4.0.0-nightly-2023-03-15, commit 3d21124445b3, Release)

Gabriel Ebner (Mar 23 2023 at 00:20):

That sounds good.

Gabriel Ebner (Mar 23 2023 at 00:21):

The only other thing I can think of is different line endings. Have you set git to convert them to CRLF?

Kendall Frey (Mar 23 2023 at 00:22):

Let me check

Kendall Frey (Mar 23 2023 at 00:23):

Both files end with a CR LF

Kendall Frey (Mar 23 2023 at 00:24):

As do the files in the successfully created project from before

Kendall Frey (Mar 23 2023 at 00:27):

I ran it again on that project, and as it should, it output:

No files to download
Decompressing 1731 file(s)

Arthur Paulino (Mar 23 2023 at 00:45):

Do you have the project on GitHub?

Kendall Frey (Mar 23 2023 at 00:46):

No, not at the moment

Arthur Paulino (Mar 23 2023 at 00:54):

What are your dependencies on the lakefile.lean?

Kendall Frey (Mar 23 2023 at 00:56):

require mathlib from git
  "https://github.com/leanprover-community/mathlib4.git"

in both

Arthur Paulino (Mar 23 2023 at 00:57):

Yeah, hard to know. I'd need to see it

Arthur Paulino (Mar 23 2023 at 00:57):

Let us know when you upload it somewhere

Kendall Frey (Mar 23 2023 at 00:58):

I deleted the .cache directory in the working project and reran the cache get. It successfully downloaded the files.

Kendall Frey (Mar 23 2023 at 00:59):

I can create a brand new empty project and push it to github, if that would be helpful.

Kendall Frey (Mar 23 2023 at 01:02):

This problem is not limited to a single project

Kendall Frey (Mar 23 2023 at 02:21):

Unfortunately whatever I did on the working project seems to have bricked it. The cache files are still there, but when I start the Lean server it seems to start rebuilding mathlib anyway, so I'm unable to check any of my work on that project either now.

Gabriel Ebner (Mar 23 2023 at 02:32):

Kendall Frey said:

Both files end with a CR LF

I'm pretty sure that's a problem. We don't convert the line endings for hashing. So a CRLF source file would never match an LF source file.

Gabriel Ebner (Mar 23 2023 at 02:34):

Maybe this is also a new issue. See https://github.com/leanprover/lean4/pull/2138/commits/d85a4f5495a60a528befca9a2e77d461c2fbf918

Kendall Frey (Mar 23 2023 at 02:37):

The curious part is why it worked in the first project, if CR LF is actually the problem

Kendall Frey (Mar 23 2023 at 02:39):

I checked the toolchain in the working project, and it's 03-09, the day before that PR was merged

Kendall Frey (Mar 23 2023 at 02:50):

I created another new project, and this time before lake update I modified lakefile.lean to point to mathlib @ revision af5c2dab7415d3e5e8703104b4e5ad5590f19495, same as the working project. Now lake exe cache get seems to be downloading the files properly. I need to go now, so I'll need to test tomorrow to confirm whether or not this actually avoids having to rebuild mathlib. But indeed it does appear there was a new issue introduced between then and now.

Sebastian Ullrich (Mar 23 2023 at 09:50):

Indeed I probably broke this, my bad. It's a great demonstration of the evils of autocrlf though.

Kendall Frey (Mar 24 2023 at 00:07):

I set autocrlf = input and told vscode to use LF (to prevent any more mixups). Getting the cache and importing mathlib seems to be working properly now, so I should be able to continue my work as normal again. Thanks for the help, @Gabriel Ebner

Yakov Pechersky (Mar 24 2023 at 15:59):

After running a lake exe cache get, the VSCode server tries to run this command, and segfaults:

$ LEAN_PATH=./build/lib:./lake-packages/Qq/build/lib:./lake-packages/aesop/build/lib:./lake-packages/std/build/lib LD_LIBRARY_PATH=/home/yakov/.elan/toolchains/leanprover--lean4---nightly-2023-03-17/lib:./build/lib:./lake-packages/Qq/build/lib:./lake-packages/aesop/build/lib:./lake-packages/std/build/lib:./lake-packages/std/build/lib /home/yakov/.elan/toolchains/leanprover--lean4---nightly-2023-03-17/bin/lean -DwarningAsError=true -Dlinter.missingDocs=true ./lake-packages/std/././Std/Data/List/Basic.lean -R ./lake-packages/std/./. -o ./lake-packages/std/build/lib/Std/Data/List/Basic.olean -i ./lake-packages/std/build/lib/Std/Data/List/Basic.ilean -c ./lake-packages/std/build/ir/Std/Data/List/Basic.c
[1]    26421 segmentation fault  LEAN_PATH= LD_LIBRARY_PATH=  -DwarningAsError=true -Dlinter.missingDocs=true

What could be going on?

Moritz Firsching (Apr 05 2023 at 14:45):

Yury G. Kudryashov said:

rm -rf lake-packages/aesop/build helped with this one.

I was in a similar situation, where only

rm -rf lake-packages/aesop/build/ lake-packages/std/build/ lake-packages/Qq/build

seem solve getting the same error. Thanks for the help!

Ruben Van de Velde (Apr 10 2023 at 12:23):

By the way, I'd like to mention again how amazing the cache is, even compared to the one we had for mathlib3. There's no way I could be remotely as productive without it

Arthur Paulino (Apr 10 2023 at 14:26):

Yakov Pechersky said:

After running a lake exe cache get, the VSCode server tries to run this command, and segfaults:

$ LEAN_PATH=./build/lib:./lake-packages/Qq/build/lib:./lake-packages/aesop/build/lib:./lake-packages/std/build/lib LD_LIBRARY_PATH=/home/yakov/.elan/toolchains/leanprover--lean4---nightly-2023-03-17/lib:./build/lib:./lake-packages/Qq/build/lib:./lake-packages/aesop/build/lib:./lake-packages/std/build/lib:./lake-packages/std/build/lib /home/yakov/.elan/toolchains/leanprover--lean4---nightly-2023-03-17/bin/lean -DwarningAsError=true -Dlinter.missingDocs=true ./lake-packages/std/././Std/Data/List/Basic.lean -R ./lake-packages/std/./. -o ./lake-packages/std/build/lib/Std/Data/List/Basic.olean -i ./lake-packages/std/build/lib/Std/Data/List/Basic.ilean -c ./lake-packages/std/build/ir/Std/Data/List/Basic.c
[1]    26421 segmentation fault  LEAN_PATH= LD_LIBRARY_PATH=  -DwarningAsError=true -Dlinter.missingDocs=true

What could be going on?

Sorry I missed that one. Are you still having this issue? Did you have VS Code open while cache was running?

Yakov Pechersky (Apr 10 2023 at 14:39):

Yes, I had vscode open. I nuked the whole folder and reclined. That "fixed" it

Arthur Paulino (Apr 10 2023 at 14:44):

Having VS Code open causes the Lean server to interfere with the mechanics of cache. Next time try to close VS Code before calling cache. And if it still doesn't work, try get!

Mauricio Collares (May 11 2023 at 10:31):

Just wanted to say that lake exe cache get is so much better than the non-content-addressed mathlib3 version! I can only imagine how good a Nix-ified version of mathlib would be

Yaël Dillies (May 11 2023 at 10:32):

Agreed. I just wish I could use it on Windows :sad:

Mauricio Collares (May 11 2023 at 10:39):

Are the Windows problems documented somewhere? Does it work under WSL?

Sebastien Gouezel (May 11 2023 at 14:14):

For the record, it works perfectly fine on windows for me.

Sebastien Gouezel (May 11 2023 at 14:14):

(without using WSL)

alex chan (May 12 2023 at 04:01):

It works in WSL with Ubuntu or any other distro. If you have git installed on PC you don't need WSL, git cli will do just fine.

Yaël Dillies (May 12 2023 at 08:26):

Mauricio Collares said:

Are the Windows problems documented somewhere? Does it work under WSL?

I've copiously complained about them in this thread already, eg https://leanprover.zulipchat.com/#narrow/stream/287929-mathlib4/topic/lake.20exe.20cache.20get/near/321126042

Mauricio Collares (May 12 2023 at 14:33):

Wow, that's a confusing error. Does it still happen with an up-to-date copy of mathlib4?

Mauricio Collares (May 12 2023 at 14:39):

If so, where are you running the commands? Still in bash inside VS Code?

Mauricio Collares (May 13 2023 at 18:55):

(deleted, I think I made a mistake while experimenting with the caching and I don't want to mislead people)

Mauricio Collares (May 13 2023 at 18:55):

(deleted)

Eric Rodriguez (Oct 30 2023 at 11:47):

Is the cache currently stored only as indivdual oleans which are gotten one at a time?

Scott Morrison (Oct 30 2023 at 11:49):

Yes.

Eric Wieser (Oct 30 2023 at 11:54):

Related to that; I've noticed that a cache miss is much more expensive than a cache hit; running lake exe cache get on a branch where no cache is available due to an std bump (so nothing is downloaded) can often take longer than downloading a full cache!

Eric Rodriguez (Oct 30 2023 at 12:00):

I wonder if it's faster/better to revert to the old gzipped version - won't there be a lot of overhead starting the download for each individual file etc? I understand it's better when there's overlap in the cache and you only have to download, e.g., 20 oleans as opposed to all 4000

Scott Morrison (Oct 30 2023 at 12:00):

Well, except for the complete cache miss scenario Eric just mentioned, I don't think this is the case.

Scott Morrison (Oct 30 2023 at 12:00):

We're only making one http connection.

Arthur Paulino (Oct 30 2023 at 12:05):

There are other cost dimensions at play other than download speed anyway. There's storage cost (on Azure and on users' computers) and also data transfer costs, which can't be neglected for everyone

Arthur Paulino (Oct 30 2023 at 12:09):

Idk how big the cache is currently, but suppose it's 500mb. If you're just changing one leaf file with your PR, then once it's merged everyone else would have to pay the cost of another chunk of 500mb on their machines if they aren't keeping the cache folder clean

Eric Rodriguez (Oct 30 2023 at 12:11):

The thought mainly came to me as I just had significantly faster internet installed, and I noticed that the cache get timings were very similar. On users' computers I assumed the data could be unpacked and any duplicates discarded, but on Azure the issue is probably fairly different.

Arthur Paulino (Oct 30 2023 at 12:14):

That's true, but still, being forced to download 500mb sounds very suboptimal if what I already have only differs on a few files

Eric Rodriguez (Oct 30 2023 at 12:15):

We have a full cache miss only when we upgrade lean, I guess?

Arthur Paulino (Oct 30 2023 at 12:16):

That cache miss happens on a cloud machine. But it builds the cache and makes it available for everyone so the full cache miss only happens once

Arthur Paulino (Oct 30 2023 at 12:19):

I think there are other situations similar to a full cache miss. Like if you change a deep file in Mathlib or in Std. The deeper the file is, the higher the chances of getting closer to a full cache miss

Mario Carneiro (Oct 30 2023 at 18:38):

Note that some of this could be fixed by additional smarts on the Azure side. For example if you could send a single query to find out which of a set of files exist on the server; I believe HTTP/2 also lets you do something here. But the current hosting is just a plain HTTP/1.1 file server, so in order to find out if none of a set of 4000 files exist on the server you have to do 4000 file requests.

One other trick I am hoping to implement at some point is to have a marker file that exists once per "root hash" (which is a combination of the lakefile.lean, lean-toolchain and lake-manifest.json), which exists iff any cache file with that root hash is on the server. This should address the case where you have a brand new lean version or a change to the lakefile and so it's a complete cache miss. This doesn't help almost complete cache misses though.

Joachim Breitner (Oct 31 2023 at 08:20):

Does the cache tool assume and use the fact that if A depends on B, and B isn't available in the cache, then A won't be either? So if it queries from root to leaf module, it can stop early?

Mauricio Collares (Oct 31 2023 at 08:26):

This is a good idea but it might hinder parallelism in the common case of a full cache hit.

Mario Carneiro (Oct 31 2023 at 08:52):

@Joachim Breitner yes

Mario Carneiro (Oct 31 2023 at 08:56):

Er, actually I take it back, cache does take advantage of this in the case where B is not found, but it will make the full list of things to download before downloading anything, to maximize parallelism. lean-cache is able to take advantage of this more fine-grained dependency notion because it is driving cURL itself and can interleave the network requests with hashing and stop things early

Joachim Breitner (Aug 16 2024 at 09:59):

Coming back to this. Looking at the code I see the difficulty: downloadFiles downloads the full list in one go. But this means 3:49 of rather pointless download attempts when the cache is cold (e.g. on lean-PR-testing branches, which incidentially are branches whose developers are likely to actually wait on).

Would it be worth it to check a single canary file from the bottom of the dependency tree (e.g. Batteries.Wf), and if that fails, conclude that the cache is very likely completely dead and skip the rest?

Actually, that logic could probably easily live in the CI script. Let me try on my branch…

Joachim Breitner (Aug 16 2024 at 12:57):

Something like https://github.com/leanprover-community/mathlib4/commit/9c41f64862a6f288c869a77b48f837c1db7b4bb1 seems to work, and speeds up PR testing CI builds.

Damiano Testa (Aug 16 2024 at 13:17):

I like this idea and I would be in favour of asking for an import that is higher up in Mathlib, such as Mathlib.Tactic.Basic.

Damiano Testa (Aug 16 2024 at 13:21):

Some data:

import Mathlib.Tactic.Basic

open Lean

/-
All deps: 822
Lean deps: 626
Batteries deps: 0  -- actually, there is no `Batteries.WF` here...
Mathlib deps: 7
-/
run_cmd
  let env  getEnv
  let mods := env.allImportedModuleNames
  dbg_trace "All deps: {mods.size}"
  for rt in [`Lean, `Batteries, `Mathlib] do
    dbg_trace "{rt} deps: {(mods.filter (·.getRoot == rt)).size}"

Mario Carneiro (Aug 16 2024 at 20:29):

why doesn't Mathlib.Tactic.Basic import Batteries.Tactic?

Mario Carneiro (Aug 16 2024 at 20:33):

or at least Mathlib.Tactic.Common? This is crazy, what is going on with the imports here

Floris van Doorn (Aug 16 2024 at 20:44):

Mathlib.Tactic.Common should definitely import Batteries.Tactic.

Joachim Breitner (Aug 21 2024 at 09:47):

Joachim Breitner said:

Something like https://github.com/leanprover-community/mathlib4/commit/9c41f64862a6f288c869a77b48f837c1db7b4bb1 seems to work, and speeds up PR testing CI builds.

After using this on my branch for a bit, I’ve PRed it at
https://github.com/leanprover-community/mathlib4/pull/16028

Kim Morrison (Aug 22 2024 at 03:22):

Sorry, I've over done import removals, but keep assuming thay when I say I don't object to restoring them, someone who wants it will do it. :-) I'll do it soon!

Kim Morrison (Aug 22 2024 at 04:16):

#16055

Mario Carneiro (Aug 22 2024 at 06:40):

@Kim Morrison Why only Batteries.Tactic.Basic instead of Batteries.Tactic?

Mario Carneiro (Aug 22 2024 at 06:40):

this missed most of batteries' tactics

Joachim Breitner (Aug 22 2024 at 11:24):

Joachim Breitner said:

After using this on my branch for a bit, I’ve PRed it at
https://github.com/leanprover-community/mathlib4/pull/16028

Ugh, I thought I tested it on my branch, but maybe did some changes later that broke it; on https://github.com/leanprover-community/mathlib4/actions/runs/10507055900/job/29107989722 it did run the full cache get even though the cache was cold. But I don’t see how

  ! test -e .lake/build/lib/Mathlib/Init.olean || lake exe cache get

could have done the wrong thing here. :confused:

Antoine Chambert-Loir (Jul 20 2025 at 20:20):

On some branches of my fork of mathlib4, I can't get lake exe cache get to work, I always get the error message

uncaught exception: Could not find a remote pointing to leanprover-community/mathlib4

Kim Morrison (Jul 24 2025 at 04:28):

@Antoine Chambert-Loir, could you include the output of git remote?

Antoine Chambert-Loir (Jul 24 2025 at 07:16):

(9:14) pro-acl > git remote
upstream

Kim Morrison (Jul 24 2025 at 07:19):

Are your branches up to date with master? Some more details (e.g. full error messages, instructions to reproduce, etc) would be helpful here!

Antoine Chambert-Loir (Aug 06 2025 at 06:28):

Sorry to come back after 10 days (I needed some vacation…), and the stuff doesn't seem to be systematic. As it just happened, I may be able to give meaningful information.
So I went to the branch J-Combination of my fork, also #27107.
I did the following

(8:22) pro-acl > git checkout J-Combination
Switched to branch 'J-Combination'
Your branch is up to date with 'upstream/J-Combination'.

and then

(8:22) pro-acl > lake exe cache get
info: plausible: checking out revision '61c44bec841faabd47d11c2eda15f57ec2ffe9d5'
info: importGraph: checking out revision '140dc642f4f29944abcdcd3096e8ea9b4469c873'
info: aesop: checking out revision 'a62ecd0343a2dcfbcac6d1e8243f5821879c0244'
info: Qq: checking out revision '867d9dc77534341321179c9aa40fceda675c50d4'
info: batteries: checking out revision '3cabaef23886b82ba46f07018f2786d9496477d6'
info: Cli: checking out revision 'e22ed0883c7d7f9a7e294782b6b137b783715386'
Current branch: J-Combination
uncaught exception: Could not find a remote pointing to leanprover-community/mathlib4

git remote -vsays :

upstream    git@github.com:AntoineChambert-Loir/mathlib4.git (fetch)
upstream    git@github.com:AntoineChambert-Loir/mathlib4.git (push)

I don't know whether that is relevant, but I had synced the main branch of my fork with the one of mathlib yesterday.

Sébastien Gouëzel (Aug 06 2025 at 06:38):

That's a nonstandard setup: AntoineChambert-Loir/mathlib4.git is typically called origin, and there should be another remote called upstream at leanprover-community/mathlib4.git, per https://leanprover-community.github.io/contribute/git.html

Sébastien Gouëzel (Aug 06 2025 at 06:39):

Would probably be fixed by

git remote rename upstream origin
git remote add upstream https://github.com/leanprover-community/mathlib4.git

Antoine Chambert-Loir (Aug 06 2025 at 07:19):

I applied your two lines and that worked, thanks!

Kim Morrison (Aug 06 2025 at 09:37):

Hopefully running scripts/gitHelper.py would have identified that this was the problem! It seems to work quite well!

Antoine Chambert-Loir (Aug 06 2025 at 11:04):

Well, that scripts returns errors here: If I run . scripts/gitHelper.py, it launches ImageMagick (??!!);, and if I run python3 scripts/gitHelper.py, the behaviour looks reasonable, but I'm supposed to fill in some {fix}.

Antoine Chambert-Loir (Aug 06 2025 at 11:06):

(I'm really sorry about all this…)

Joscha Mennicken (Aug 06 2025 at 17:23):

Antoine Chambert-Loir said:

Well, that scripts returns errors here: If I run . scripts/gitHelper.py, it launches ImageMagick (??!!);, and if I run python3 scripts/gitHelper.py, the behaviour looks reasonable, but I'm supposed to fill in some {fix}.

Running . scripts/githelper.py (with the space between the . and the s) means that your shell is sourcing the contents of githelper.py and interpreting it as its own code (likely bash) instead of python. import is an ImageMagick tool. Instead, you'd want to run scripts/githelper.py or ./scripts/githelper.py (or python3 scripts/githelper.py, of course).

Michael Stoll (Aug 23 2025 at 18:59):

Hijacking this thread:
When trying to download the cache today, I saw a bunch of errors like

HTTP/2 stream 573 was not closed cleanly before end of the underlying stream

and then (e.g.)

Downloaded: 584 file(s) [attempted 607/607 = 100%] (96% success), 23 failed
23 download(s) failed

where the number of "failed" downloads seems to correspond to the HTTP errors.
This seems to be random: repeating lake exe cache getten times or so resulted in all files landing uncorrupted on my machine.

Something similar seems to have happened during CI in #26766 in the "Post-Build Step (fork)" part (with one failure reported). Restarting this workflow finally worked.

This may be a transient problem, but maybe other people also saw this.

Kevin Buzzard (Aug 23 2025 at 19:50):

I just saw this now: after git pull on mathlib master this happened:

buzzard@brutus:~/Mathlib$ lake exe cache get
Current branch: master
Using cache from upstream: leanprover-community/mathlib4
Attempting to download 5477 file(s) from leanprover-community/mathlib4 cache
Downloaded: 575 file(s) [attempted 575/5477 = 10%]Send failure: Broken pipe
Send failure: Broken pipe
Send failure: Broken pipe
Send failure: Broken pipe
Send failure: Broken pipe
Send failure: Broken pipe
Send failure: Broken pipe
Send failure: Broken pipe
Send failure: Broken pipe
Send failure: Broken pipe
Send failure: Broken pipe
Send failure: Broken pipe
Send failure: Broken pipe
Send failure: Broken pipe
Send failure: Broken pipe
Send failure: Broken pipe
Send failure: Broken pipe
Send failure: Broken pipe
Send failure: Broken pipe
Send failure: Broken pipe
Send failure: Broken pipe
Send failure: Broken pipe
Send failure: Broken pipe
Send failure: Broken pipe
Send failure: Broken pipe
Send failure: Broken pipe
Send failure: Broken pipe
Send failure: Broken pipe
Send failure: Broken pipe
Send failure: Broken pipe
Send failure: Broken pipe
Send failure: Broken pipe
Send failure: Broken pipe
Downloaded: 2419 file(s) [attempted 2452/5477 = 44%], 33 failedSend failure: Broken pipe
Send failure: Broken pipe
Send failure: Broken pipe
Send failure: Broken pipe
Send failure: Broken pipe
Send failure: Broken pipe
Send failure: Broken pipe
Send failure: Broken pipe
Send failure: Broken pipe
Send failure: Broken pipe
Send failure: Broken pipe
Send failure: Broken pipe
Send failure: Broken pipe
Send failure: Broken pipe
Send failure: Broken pipe
Send failure: Broken pipe
Send failure: Broken pipe
Send failure: Broken pipe
Send failure: Broken pipe
Send failure: Broken pipe
Send failure: Broken pipe
Downloaded: 2447 file(s) [attempted 2501/5477 = 45%], 54 failedSend failure: Broken pipe
Downloaded: 5422 file(s) [attempted 5477/5477 = 100%] (98% success), 55 failed
55 download(s) failed
buzzard@brutus:~/Mathlib$ lake exe cache get
Current branch: master
Using cache from upstream: leanprover-community/mathlib4
Attempting to download 123 file(s) from leanprover-community/mathlib4 cache
Downloaded: 123 file(s) [attempted 123/123 = 100%] (100% success)
Decompressing 7099 file(s)
/home/buzzard/.cache/mathlib/cc7340e4556c852a.ltar: removing corrupted file
/home/buzzard/.cache/mathlib/55d008490c18f704.ltar: removing corrupted file
/home/buzzard/.cache/mathlib/61f8d95075e90411.ltar: removing corrupted file
/home/buzzard/.cache/mathlib/924266eb33d49155.ltar: removing corrupted file
/home/buzzard/.cache/mathlib/3c0a761b6dcb8281.ltar: removing corrupted file
/home/buzzard/.cache/mathlib/70d0f41a5ff60b02.ltar: removing corrupted file
/home/buzzard/.cache/mathlib/cb1760c8fafbd8b9.ltar: removing corrupted file
/home/buzzard/.cache/mathlib/f1d257334b14a3c6.ltar: removing corrupted file
/home/buzzard/.cache/mathlib/6777938dcec794c4.ltar: removing corrupted file
/home/buzzard/.cache/mathlib/fb7ff54ebf53b1bf.ltar: removing corrupted file
/home/buzzard/.cache/mathlib/75ff4603fff02ab7.ltar: removing corrupted file
/home/buzzard/.cache/mathlib/2bead50accc001be.ltar: removing corrupted file
/home/buzzard/.cache/mathlib/2191eca95cadda68.ltar: removing corrupted file
/home/buzzard/.cache/mathlib/9f4cf12b63a78606.ltar: removing corrupted file
/home/buzzard/.cache/mathlib/5f466c92984dd9f3.ltar: removing corrupted file
/home/buzzard/.cache/mathlib/5ba96af4ea785cfc.ltar: removing corrupted file
/home/buzzard/.cache/mathlib/9d7ccf1f6a96a7b3.ltar: removing corrupted file
/home/buzzard/.cache/mathlib/a0e2fe74a19a05a9.ltar: removing corrupted file
/home/buzzard/.cache/mathlib/e9b371d906783dfa.ltar: removing corrupted file
/home/buzzard/.cache/mathlib/bd98d2a9440a0ccb.ltar: removing corrupted file
/home/buzzard/.cache/mathlib/07fa7d4c9d3cb952.ltar: removing corrupted file
/home/buzzard/.cache/mathlib/4773c0b4d2fdfde1.ltar: removing corrupted file
/home/buzzard/.cache/mathlib/f5a1b313de7360d0.ltar: removing corrupted file
/home/buzzard/.cache/mathlib/ba1b755b95f4083a.ltar: removing corrupted file
/home/buzzard/.cache/mathlib/c430652a2022d168.ltar: removing corrupted file
/home/buzzard/.cache/mathlib/e64db1dc689260f4.ltar: removing corrupted file
/home/buzzard/.cache/mathlib/99c3af541d1c7042.ltar: removing corrupted file
/home/buzzard/.cache/mathlib/71af2145325e58df.ltar: removing corrupted file
/home/buzzard/.cache/mathlib/a72ee5f853612500.ltar: removing corrupted file
/home/buzzard/.cache/mathlib/dcd56e9df542c403.ltar: removing corrupted file
/home/buzzard/.cache/mathlib/b6e88c698438babe.ltar: removing corrupted file
/home/buzzard/.cache/mathlib/7ec72410f95ea2c6.ltar: removing corrupted file
/home/buzzard/.cache/mathlib/52db2da535a7726b.ltar: removing corrupted file
/home/buzzard/.cache/mathlib/9e83f2a95ad53b53.ltar: removing corrupted file
/home/buzzard/.cache/mathlib/5f084db8f021c06d.ltar: removing corrupted file
/home/buzzard/.cache/mathlib/e917e3db2e789318.ltar: removing corrupted file
/home/buzzard/.cache/mathlib/fcabdb36bd9bc3a3.ltar: removing corrupted file
/home/buzzard/.cache/mathlib/a0de402b1d4ad989.ltar: removing corrupted file
/home/buzzard/.cache/mathlib/fe8766ad835da820.ltar: removing corrupted file
/home/buzzard/.cache/mathlib/7d2dc8ae67accf97.ltar: removing corrupted file
/home/buzzard/.cache/mathlib/e2ac07aad2d7ce1b.ltar: removing corrupted file
/home/buzzard/.cache/mathlib/9822e9eca805f64d.ltar: removing corrupted file
/home/buzzard/.cache/mathlib/cae7a134dbf0a574.ltar: removing corrupted file
/home/buzzard/.cache/mathlib/e843b73eed7b7113.ltar: removing corrupted file
/home/buzzard/.cache/mathlib/16ec5c3e57537d27.ltar: removing corrupted file
/home/buzzard/.cache/mathlib/d5d6fcaa28d8aa13.ltar: removing corrupted file
/home/buzzard/.cache/mathlib/ba524f128c3b747e.ltar: removing corrupted file
/home/buzzard/.cache/mathlib/f17de80e561dbcc9.ltar: removing corrupted file
/home/buzzard/.cache/mathlib/2f67c1ae4d00db95.ltar: removing corrupted file
/home/buzzard/.cache/mathlib/c83b143542fb0dba.ltar: removing corrupted file
/home/buzzard/.cache/mathlib/6ef30625e004a45a.ltar: removing corrupted file
/home/buzzard/.cache/mathlib/659f68b4882b66cc.ltar: removing corrupted file
/home/buzzard/.cache/mathlib/67cf209f81eac92c.ltar: removing corrupted file
/home/buzzard/.cache/mathlib/25dd29b6135a6174.ltar: removing corrupted file
/home/buzzard/.cache/mathlib/4b6efc3904f0e91c.ltar: removing corrupted file
/home/buzzard/.cache/mathlib/c0e1228b5d2d4405.ltar: removing corrupted file
/home/buzzard/.cache/mathlib/c1aaacb6c8840d7b.ltar: removing corrupted file
/home/buzzard/.cache/mathlib/56968ab80ecb7250.ltar: removing corrupted file
Unpacked in 121143 ms
Completed successfully!

Bryan Gin-ge Chen (Aug 23 2025 at 20:19):

I've been keeping a list of these in #nightly-testing > Mathlib `lake update` failure as well. They seem to have started when we switched our cache host to the FRO-provided server.

Kevin Buzzard (Aug 23 2025 at 20:31):

In fact the repeated lake exe cache gets do not seem to solve the problem for me -- I currently seem to be unable to use mathlib without building most of it myself.

Michael Rothgang (Aug 23 2025 at 20:50):

I've observed something similar to Michael Stoll: so far, repeated lake exe cache get fixes things --- with one exception: if I got the cache for files far enough downstream, I got errors about proofwidgets.

Kim Morrison (Aug 24 2025 at 10:55):

We've just identified one problem (cloudflare blocking some requests coming from github machines, mis-identifying them as denial of service attacks). I think a fix for that is in the works.

Kim Morrison (Aug 24 2025 at 10:55):

@Mac Malone, just making sure you've seen this thread. What was the motivation for moving to Cloudflare in the first place? What would we lose if we reversed that?

Kim Morrison (Aug 24 2025 at 10:57):

(From a private thread: Azure was getting really expensive.)

Michael Rothgang (Aug 24 2025 at 17:10):

This made #28857 fail on staging.

Kim Morrison (Aug 25 2025 at 03:08):

I am finding lake exe cache get (on mathlib master) very slow today.

Kevin Buzzard (Aug 25 2025 at 07:15):

The last time I used it I was just grateful that it worked at all (I had to build 5000 mathlib files locally two days ago). I aborted a lake exe cache get on a train yesterday because it was intolerably slow; at the time I blamed it on train wifi

Bryan Gin-ge Chen (Aug 25 2025 at 23:36):

This still seems to be an issue: https://github.com/leanprover-community/mathlib4/actions/runs/17219961260/job/48852754879#step:16:133

Michael Rothgang (Aug 26 2025 at 16:51):

I just got an interesting an unexpected error:

/home/michael/.cache/mathlib/e579ace69f8d40f6.ltar: bad .ltar file
/home/michael/.cache/mathlib/b942f83894a4678d.ltar: bad .ltar file
/home/michael/.cache/mathlib/927aa954dbff6cf6.ltar: bad .ltar file
/home/michael/.cache/mathlib/d09ab9793921db32.ltar: bad .ltar file
/home/michael/.cache/mathlib/06d551f9df083cff.ltar: bad .ltar file
/home/michael/.cache/mathlib/92a82f6dba5fb8ac.ltar: bad .ltar file
/home/michael/.cache/mathlib/b8ed1c4d503bda3c.ltar: bad .ltar file
/home/michael/.cache/mathlib/fdfb8a44c9052a60.ltar: bad .ltar file
/home/michael/.cache/mathlib/0875d747924a9ce9.ltar: bad .ltar file
/home/michael/.cache/mathlib/8cb0b299f491b072.ltar: bad .ltar file
/home/michael/.cache/mathlib/545e2c2f9b832611.ltar: bad .ltar file
/home/michael/.cache/mathlib/062e3e3d7975234c.ltar: bad .ltar file
/home/michael/.cache/mathlib/c4b4614304d1f781.ltar: bad .ltar file
/home/michael/.cache/mathlib/b420f50e52d76b0b.ltar: bad .ltar file
/home/michael/.cache/mathlib/a60e6f330cbde821.ltar: bad .ltar file
/home/michael/.cache/mathlib/798eb8d7874f5a3d.ltar: bad .ltar file
/home/michael/.cache/mathlib/4587f9d477bc4eff.ltar: bad .ltar file
/home/michael/.cache/mathlib/834efec076b50fee.ltar: bad .ltar file
uncaught exception: leantar failed with error code 1

Michael Rothgang (Aug 26 2025 at 16:51):

@Mario Carneiro Any idea where the error could be/where leantar could benefit from further logging?

Mario Carneiro (Aug 26 2025 at 16:52):

that usually means the download was interrupted and you have empty or truncated files

Mario Carneiro (Aug 26 2025 at 16:53):

normally it will automatically clean up the files in this situation, so rerunning it should work

Kevin Buzzard (Aug 26 2025 at 17:19):

I saw plenty of these errors a day or two ago when cache was being flaky. It was one of the things that caused me to notice that my ~/.cache/mathlib directory was 145 gigabytes. I also had loads of blah.ltar.part files in there.

Michael Rothgang (Aug 27 2025 at 08:33):

I just hit this again --- a combination of the cache flakiness and this bug made my life really miserable. More specifically,

  • running lake exe cache get would work for most files, but yield a number of errors HTTP/2 stream 861 was not closed cleanly before end of the underlying stream (that's the cache part)

  • fine, just re-run this: again, a number of files work, and a few still fail

  • at some point, the output takes this form:
Current branch: fieldsimp-rewrite
Using cache from git@github.com:hrmacbeth/mathlib4.git: hrmacbeth/mathlib4
Attempting to download 37 file(s) from leanprover-community/mathlib4 cache
Downloaded: 0 file(s) [attempted 37/37 = 100%] (0% success)
Attempting to download 37 file(s) from hrmacbeth/mathlib4 cache
Downloaded: 37 file(s) [attempted 37/37 = 100%] (100% success)
Decompressing 7104 file(s)
/home/michael/.cache/mathlib/2709c9434e5e79cd.ltar: removing corrupted file
/home/michael/.cache/mathlib/040abbee809a02fa.ltar: bad .ltar file
/home/michael/.cache/mathlib/99e201f7b7001b88.ltar: bad .ltar file
/home/michael/.cache/mathlib/e6acd46b84803627.ltar: bad .ltar file
/home/michael/.cache/mathlib/08b4f4ec1bcfa145.ltar: removing corrupted file
/home/michael/.cache/mathlib/3038f76058d010ac.ltar: removing corrupted file
/home/michael/.cache/mathlib/4c12d88ff6b97bbc.ltar: bad .ltar file
/home/michael/.cache/mathlib/1027646065b5e9da.ltar: removing corrupted file
/home/michael/.cache/mathlib/e64913926dc6cb13.ltar: bad .ltar file
/home/michael/.cache/mathlib/2593cfd49fd99733.ltar: removing corrupted file
/home/michael/.cache/mathlib/2a4419f8e8b19949.ltar: removing corrupted file
/home/michael/.cache/mathlib/098100617ed99c29.ltar: removing corrupted file
uncaught exception: leantar failed with error code 1
  • after that point, no more files are Downloaded
Current branch: fieldsimp-rewrite
Using cache from git@github.com:hrmacbeth/mathlib4.git: hrmacbeth/mathlib4
Attempting to download 7 file(s) from leanprover-community/mathlib4 cache
Downloaded: 0 file(s) [attempted 7/7 = 100%] (0% success)
Attempting to download 7 file(s) from hrmacbeth/mathlib4 cache
Downloaded: 7 file(s) [attempted 7/7 = 100%] (100% success)
Decompressing 7104 file(s)
/home/michael/.cache/mathlib/4c12d88ff6b97bbc.ltar: bad .ltar file
/home/michael/.cache/mathlib/e64913926dc6cb13.ltar: bad .ltar file
/home/michael/.cache/mathlib/e6acd46b84803627.ltar: bad .ltar file
/home/michael/.cache/mathlib/99e201f7b7001b88.ltar: bad .ltar file
/home/michael/.cache/mathlib/040abbee809a02fa.ltar: bad .ltar file
uncaught exception: leantar failed with error code 1

and doing the same repeats.

  • lake exe cache get! does not help, as the cache is flaky.
  • eventually, what helped was manually deleting the bad .ltar files.

Michael Rothgang (Aug 27 2025 at 08:33):

Is there a command to tell the cache "delete every bad .ltar file" (so re-running it will re-download things again)?

Ruben Van de Velde (Aug 27 2025 at 08:34):

Huh, I didn't know "bad .ltar file" and "removing corrupted file" were different things

Floris van Doorn (Aug 27 2025 at 14:01):

I ran into the same today, receiving both the errors HTTP/2 stream 503 was not closed cleanly before end of the underlying stream and /local/vdoorn/.cache/mathlib/3a333d79e7d005e6.ltar: removing corrupted file and slowness. I had to run lake exe cache get 7 times before I got the whole cache. I killed one of these because it took ages and the counter didn't move.

Floris van Doorn (Aug 27 2025 at 14:13):

The previous message was Mathlib master.
I'm now checking out a branch, and I also see the bad .ltar file-errors now.

Kevin Buzzard (Aug 27 2025 at 14:13):

When I had to run lake exe cache get several times the other day, ultimately it claimed to have got everything but when I tried lake build it still took forever. But this might have been related to me having 145 gigs of cache files (I've now deleted them so can't test any more)

Mario Carneiro (Aug 27 2025 at 14:14):

Could this be related to the cloudflare switch?

Michael Rothgang (Aug 27 2025 at 14:55):

The "retrieve the cache is flaky" part seems related to that, yes. The "need to manually remove bad .ltar files" looks like a bug in the convex hull of cache and leantar.

Arthur Paulino (Aug 27 2025 at 15:06):

This is unlikely, depending on the implementation of the hash function, but hash collision is also a possibility

Mario Carneiro (Aug 27 2025 at 15:07):

Michael Rothgang said:

Is there a command to tell the cache "delete every bad .ltar file" (so re-running it will re-download things again)?

As I mentioned, it already does this by default

Mario Carneiro (Aug 27 2025 at 15:07):

it will delete any bad .ltar file it touches directly

Mario Carneiro (Aug 27 2025 at 15:08):

it does not attempt to check any files not involved in the current unpack though

Michael Rothgang (Aug 27 2025 at 15:11):

I heard you, that it's supposed to do this automatically. My question is: what do I need to do to make it delete all bad files? (So far, I did not find out how to do this.) Will lake exe cache unpack! work?

Mario Carneiro (Aug 27 2025 at 15:11):

an easy thing is to just use ls and look for files of 0 bytes

Michael Rothgang (Aug 27 2025 at 15:20):

That works. (I've extracted the bad filenames from the terminal output and delete those, that also works.)

Michael Rothgang (Aug 27 2025 at 15:20):

Let me rephrase the question I actually meant to ask: is there a user-facing command (lake exe cache foobar) which does this?

Mario Carneiro (Aug 27 2025 at 15:21):

no

Michael Rothgang (Aug 27 2025 at 15:32):

Mario Carneiro said:

an easy thing is to just use ls and look for files of 0 bytes

For the record: my last lake exe cache get, after several rounds of re-downloading, left 10 bad .ltar files --- so I looked for 0 byte files (and found once 1, which seemed unrelated).

Floris van Doorn (Aug 27 2025 at 15:36):

Mario Carneiro said:

Michael Rothgang said:

Is there a command to tell the cache "delete every bad .ltar file" (so re-running it will re-download things again)?

As I mentioned, it already does this by default

It does this when it raises the error message removing corrupted file, but not when it raises the error message bad .ltar file.
I also didn't find any 0-byte files in my .cache/mathlib folder.

Mario Carneiro (Aug 27 2025 at 15:38):

if you have a bad .ltar file which is not deleted, please send it to me and I can diagnose

Mario Carneiro (Aug 27 2025 at 15:40):

I double checked, and files will only be deleted if they get the UnexpectedEof error during parsing, not any other kind of malformity

Mario Carneiro (Aug 27 2025 at 15:41):

so if you attempt to parse Hamlet as an .ltar file it will give you that error

Floris van Doorn (Aug 27 2025 at 15:46):

079cc1a058706060.ltar is an example of a corrupted file.
Ooh, this .ltar file is very different from the others (in particular, it's html instead of binary)

Floris van Doorn (Aug 27 2025 at 15:49):

So something has gone very wrong, maybe during uploading?

Mario Carneiro (Aug 27 2025 at 15:50):

oh, that could be a 404 then

Floris van Doorn (Aug 27 2025 at 15:50):

Oh no, it's an html page displaying Object not found

Floris van Doorn (Aug 27 2025 at 15:57):

It seems that there are a few issues here:

  • In some cases an ltar file doesn't get uploaded correctly
  • In some cases the cache downloads a 404 html webpage instead of giving the warning "Warning: some files were not found in the cache."
  • Maybe: when the bad .ltar file error occurs, also delete the file.

After I manually deleted some of these bad .ltar files, I got the message "Warning: some files were not found in the cache." when running lake exe cache get again.

Floris van Doorn (Aug 27 2025 at 15:58):

(note: bullet point 2 might have happened because I manually killed lake exe cache get, but I think not)

Floris van Doorn (Aug 27 2025 at 16:01):

(btw, for me this happened on d71a7ff7dfac7b1a9a0944e131b11f8f4887a2e8, part of #29024, which passes CI)

Heather Macbeth (Aug 27 2025 at 16:24):

Something is going wrong with cache-fetching on CI for #28658:
https://github.com/leanprover-community/mathlib4/actions/runs/17272042284/job/49020015371?pr=28658
Is this the same issue? What is the fix?

Update: I rebased on a randomly selected new version of master and it got fixed.

Heather Macbeth (Aug 27 2025 at 16:25):

Downloaded: 1960 file(s) [attempted 5959/6040 = 98%]
Downloaded: 1973 file(s) [attempted 6019/6040 = 99%]
Downloaded: 1973 file(s) [attempted 6036/6040 = 99%]
Downloaded: 1973 file(s) [attempted 6038/6040 = 99%]
Downloaded: 1973 file(s) [attempted 6040/6040 = 100%] (32% success), 1 failed
Warning: some files were not found in the cache.
This usually means that your local checkout of mathlib4 has diverged from upstream.
If you push your commits to a branch of the mathlib4 repository, CI will build the oleans and they will be available later.
Alternatively, if you already have pushed your commits to a branch, this may mean the CI build has failed part-way through building.
During August 2025, we are changing the back-end for the olean cache. If you are unexpectedly not finding oleans for your PR, please try merging `master`.
1 download(s) failed
Error: Process completed with exit code 1.

Mario Carneiro (Aug 27 2025 at 16:43):

I think the cloudflare 404 page issue should be fixed in cache, leangz should not have to deal with html pages masquerading as ltar

Jad Abou Hawili (Aug 27 2025 at 17:03):

Floris van Doorn said:

I ran into the same today, receiving both the errors HTTP/2 stream 503 was not closed cleanly before end of the underlying stream and /local/vdoorn/.cache/mathlib/3a333d79e7d005e6.ltar: removing corrupted file and slowness. I had to run lake exe cache get 7 times before I got the whole cache. I killed one of these because it took ages and the counter didn't move.

Received a similar HTTP/2 was not closed cleanly type of error , though it has nothing to do with "bad ltar" files in .cache.(didn't know that was a thing to worry about)

Updating lean-toolchain (for a project that uses mathlib as a dependency) causes the error,

curl https://raw.githubusercontent.com/leanprover-community/mathlib4/master/lean-toolchain -o lean-toolchain
lake update

Fails to download the tar file.

Made a github issue on it a couple of hours ago.

Michael Rothgang (Aug 27 2025 at 17:45):

For others who have the same problem: running rg --type-add 'ltar:*.ltar' -tltar --fixed-strings "<html lang=" --count inside your mathlib cache directory shows you all fully downloaded files which are a .html page. I got 48 of them.

Michael Rothgang (Aug 27 2025 at 17:46):

Does leantar delete bad .ltar.part files itself, or should I also do this?

Mario Carneiro (Aug 27 2025 at 17:46):

I don't think .part files are deleted

Mario Carneiro (Aug 27 2025 at 17:46):

they are managed by cache

Michael Rothgang (Aug 27 2025 at 17:47):

In other words, deleting them manually (when you're not getting the cache right now) is probably safe, and perhaps even necessary? Sure.

Kevin Buzzard (Aug 27 2025 at 18:05):

Yes I had several years worth of .ltar.part files which were showing no signs of being deleted.

Bryan Gin-ge Chen (Aug 29 2025 at 14:59):

Cache flakiness still seems to be an issue; it looks like it caused this build to fail (cf. #29031).

Michael Rothgang (Aug 29 2025 at 15:24):

I've also had this several times in the past days on PRs of mine. (And it happens regularly when just getting the cache.)

Bryan Gin-ge Chen (Aug 30 2025 at 21:55):

Red x on master due to this: https://github.com/leanprover-community/mathlib4/actions/runs/17348514781/job/49251630720#step:16:73

Send failure: Broken pipe

Filippo A. E. Nuccio (Aug 31 2025 at 11:06):

I am also experiencing issues with the cache today, I get a bunch of "Send failure: broken pipe" errors, and although the number of failures seems small (at the end I get 33 download(s) failed, building mathlib takes forever.

Filippo A. E. Nuccio (Aug 31 2025 at 11:06):

I tried this both on master and on some non-master branches, with the same result.

Filippo A. E. Nuccio (Aug 31 2025 at 17:27):

I also tried following @Michael Rothgang 's suggestion, and run in the same loop. In more detail, I

  • deleted the whole cache (I noticed that lake clean had no effect whatsoever)
  • run a first call of lake exe cache get: it was pretty slow, but after a while it started downloading many things but also showed some Send failure: Broken pipe, so at the end it told me that 198 downloads failed;
  • run lake exe cache get again: same as before, we got down to 5 failed download and no bad *.ltar message
  • run lake exe cache get again: this time, simply got many bad *ltar followed by removing corrupted file ending with uncaught exception: leantar failed with error code 1
  • run lake exe cache get again: from now on, I was in hell, so the IO output would be the same at each call, namely some bad *.ltar file with no removing corrupted file after it; and ending message uncaught exception: leantar failed with error code 1. This loops indefinitely.

Filippo A. E. Nuccio (Aug 31 2025 at 17:27):

In this state, it seems impossible to do anything: if I open VSCode, it starts compiling everything.

Bryan Gin-ge Chen (Sep 02 2025 at 18:13):

From about 10 minutes ago, hitting a staging job: https://github.com/leanprover-community/mathlib4/actions/runs/17411769220/job/49431369841#step:3:350

Send failure: Broken pipe

Arthur Paulino (Sep 02 2025 at 18:52):

It's been ~2.5 years since mathlib Cache's first commit. It served as a kicker so people could port mathlib from Lean 3 to Lean 4 with less friction. Maybe it's time for a more robust solution

Patrick Massot (Sep 03 2025 at 06:27):

This is being worked on. Actually this work partly explains the current turbulences. Hopefully it will be a win in the long run.

Kim Morrison (Sep 04 2025 at 06:27):

(And: thank you, @Arthur Paulino, for that work ~2.5 years ago! We've been leaning heavily on it since. :-)

Michael Stoll (Sep 05 2025 at 18:50):

I'm getting a lot of

HTTP/2 stream 933 was not closed cleanly before end of the underlying stream

again, but I also observe a new phenomenon: repeating lake exe cache get, I see, e.g.,

Attempting to download 416 file(s) from leanprover-community/mathlib4 cache
Downloaded: 395 file(s) [attempted 403/416 = 96%]

(the numbers vary), at which point nothing further seems to be happening for a rather long time (no CPU or network activity), before it does continue after all (being impatient, I interrupted it a few times, then wrote this message while it seemed to hang, but when returning, it had finished). Probably some other kind of network connection problem...

Kevin Buzzard (Sep 06 2025 at 10:44):

I am having quite a confusing experience with cache at the minute. So I'm now slowly becoming immune to all these Send failure: Broken pipe and /home/buzzard/.cache/mathlib/8914c269f4486098.ltar: removing corrupted file messages, but after repeated lake exe cache gets I will eventually get some kind of claim of success. However it is then not always the case that lake build terminates quickly; sometimes it seems to still take an extremely long time even though it reports that things are "Running" rather than "Building". Currently I am just randomly trying things like rm -rf .lake and lake exe cache get! until eventually I find myself in some state where lake build finishes quickly. In short, I'm finding it quite hard to review PRs at this point, there is a frustrating element of randomness in the initial part of the process (at least for me).

Bryan Gin-ge Chen (Sep 06 2025 at 14:04):

It might be that there's still a corrupted file after the download that's breaking the build. One way to maybe figure out what's going on with a slow lake build is to try lake build -v --no-build first after lake exe cache get succeeds; the --no-build option tells lake to fail if it has to actually build a file, and it should tell you which file it is (though I bet this is also just going to be random, since the network errors seem to happen randomly).

Kim Morrison (Sep 08 2025 at 00:12):

I have just reverted the cache backend to Azure. It will be expensive again, but hopefully also everything will go back to working.

Weiyi Wang (Sep 08 2025 at 02:32):

Not sure if it is related but some recent PRs are experiencing cache problem in CI (example #29419)
Seems that whatever files the PR modifies cause cache error

Patrick Massot (Sep 08 2025 at 07:47):

Every change to this infrastructure is bound to create some turbulences, so we don’t expect to return to a fully functional situation immediately.

Arthur Paulino (Sep 08 2025 at 10:29):

Arthur Paulino said:

Maybe it's time for a more robust solution

This is one suggestion for what I mean by "more robust solution":

  1. Client packs all requested file names in a query file
  2. Server responds with a query response file, saying which files are available and their respective sizes
  3. Client meticulously manages the download of each file, knowing how many bytes to expect beforehand, retrying when it makes sense and giving up otherwise, keeping track of a query record saying which local files are up to date
  4. Client reports to user how many of those files were available on the server and how many of the available files were successfully downloaded
  5. User may want to retry, but the client knows where to start from because the local query record is trustworthy

This requires a real server script running the above protocol (and an authored file management would be nice too) instead hitting some third party cloud directly

Extra: the "query record" needs to know how to handle parallel updates. Using a proper DB (like sqlite3) would be ideal here

Yakov Pechersky (Sep 08 2025 at 13:17):

In the meantime, since we're calling curl command-line style via lean, we could have a preferred version that uses aria2 since that has built in retry logic, chunking, parallelism, and resuming.

Eric Rodriguez (Sep 16 2025 at 16:13):

I'm not sure if everything went smoothly here with this switch; I can only get a small amount of the cache files for mathlib @ v4.23.0. [I also get a compilation failure on it when building manually; I'm going to try this on a very clean environment to triple-check that]

Eric Rodriguez (Sep 16 2025 at 16:30):

(PEBKAC: I was requiring a package that wasn't updated to the latest version. On the other hand, I'm not sure why that affects either the cache mechanism or downstream compilation - my toolchain was set to v4.23.0)

Ruben Van de Velde (Sep 16 2025 at 18:47):

Did the not-yet-updated package also depend on mathlib, but a version of mathlib that didn't use 4.23?

Eric Rodriguez (Sep 17 2025 at 00:21):

Yes, indeed it did; but in both my lakefile and my manifest I made sure that the version of mathlib was the one I required

Kim Morrison (Sep 17 2025 at 00:42):

@Eric Rodriguez, do you have a reproducible problem we could look at?


Last updated: Dec 20 2025 at 21:32 UTC