Zulip Chat Archive

Stream: mathlib4

Topic: lake exe cache get


Johan Commelin (Feb 01 2023 at 18:08):

$ lake exe cache get
Attempting to download 12 file(s)
Decompressing 1115 file(s)
uncaught exception:
gzip: stdin: unexpected end of file
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now

Johan Commelin (Feb 01 2023 at 18:09):

Aha, I guess I should use get!?

Johan Commelin (Feb 01 2023 at 18:09):

But is there also an option that Lake tells me the hash of the corrupt file, so that I can remove it manually?

Mario Carneiro (Feb 01 2023 at 18:17):

This sort of thing happens a lot, since it seems that lake is not protected against multiple concurrent executions, which happens commonly when I run lake build but the lean 4 server is doing a "refresh file dependencies" action concurrently (which afaik you can't cancel)

Mario Carneiro (Feb 01 2023 at 18:17):

lake stamps over its own files, occasionally fails and sometimes produces corrupted files

Mario Carneiro (Feb 01 2023 at 18:19):

If we could detect the corrupted files in advance, lake exe cache get could just silently treat them as missing

Mario Carneiro (Feb 01 2023 at 18:20):

Do the corrupted files have any easily detectable pattern? (Are they 0 byte files?)

Johan Commelin (Feb 01 2023 at 18:26):

In my case not. I think it was a partial download.

Gabriel Ebner (Feb 01 2023 at 19:03):

Mario Carneiro said:

This sort of thing happens a lot, since it seems that lake is not protected against multiple concurrent executions, which happens commonly when I run lake build but the lean 4 server is doing a "refresh file dependencies" action concurrently (which afaik you can't cancel)

All of this is true, but it has nothing to do with lake exe cache which is a completely different beast. lake exe cache only runs when you manually start it (by design, we want mathlib4 to work offline). The corrupted archive error is IMO most likely due to 1) pressing ctrl-c while lake exe cache get is running, 2) running lake exe cache get in two different checkouts at the same time, 3) a bad internet connection and interrupted downloads.

Sebastian Ullrich (Feb 01 2023 at 20:19):

Hmm, do I read the code right in that the exit code and stderr of curl are silently discarded in downloadFiles? That doesn't seem prudent.

Gabriel Ebner (Feb 01 2023 at 20:20):

The exit code will typically be nonzero, I believe, since we expect 404s.

Sebastian Ullrich (Feb 01 2023 at 20:23):

https://daniel.haxx.se/blog/2019/07/22/curl-goez-parallel/

When doing many simultaneous transfers, how do you figure out how they all did individually, like from your script? That’s still to be figured out and implemented.

Heh.

Sebastian Ullrich (Feb 01 2023 at 20:31):

Apparently --write-out can be used for passing on the results https://stackoverflow.com/a/71967814. Parsing that from Lean doesn't sound too bad, at least compared to waiting for libcurl integration.

Gabriel Ebner (Feb 01 2023 at 20:32):

NOTE: The %-symbol is a special symbol in the win32-environment, where all occurrences of % must be doubled when using this option.

Sebastian Ullrich (Feb 01 2023 at 20:34):

Hopefully not relevant in the case of IO.Process though...?

Gabriel Ebner (Feb 01 2023 at 20:37):

It says "win32", not "cmd.exe". But hopefully that's just a simplification.

Wojciech Nawrocki (Feb 02 2023 at 03:15):

Gabriel Ebner said:

Note: of course std4/aesop/etc. versions also need to match exactly. Deleting any require std etc. in your lakefile is a good start, followed by lake update.

My experience so far has been to avoid lake update like the plague. For instance, mathlib4's lakefile requires std4@"main", and running lake update will automatically bump that to a version that may be incompatible. When depending on mathlib4 I have resorted to copying hashes from its lake-manifest.json into my lake-manifest.json. Pinning a Git tag is one potential fix, but I think long-term the lack of proper package versioning with semver will be a serious hurdle to using many Lean packages. Am I doing something wrong?

Gabriel Ebner (Feb 02 2023 at 04:04):

and running lake update will automatically bump that to a version that may be incompatible.

This won't happen. Unless you have a require std in your project.

Gabriel Ebner (Feb 02 2023 at 04:05):

Note that it used to be like that but I fixed that bug sometime last year.

Wojciech Nawrocki (Feb 02 2023 at 04:37):

Aha, I did have require std - thanks! While semver would be able to detect conflicting versions, I suppose removing that works as a stopgap.

Matthew Ballard (Feb 02 2023 at 11:17):

I have a package where I am not sure if I am seeing the expected behavior/making a mistake.

From clean copies of the repository, I run lake exe cache get and then lake build vs running only lake build. They both take the same amount of time to complete. Both spend 99% of time building (no compiling) files from the dependencies (std/mathlib/qq).

I can see that lake exe cache get does get the cache for the dependencies.

Am I doing this right? If so, is this expected behavior? Thanks.

Arthur Paulino (Feb 02 2023 at 11:34):

I am not on my PC but mathlib4 uses leanprover/lean4:nightly-2023-01-29

Arthur Paulino (Feb 02 2023 at 11:35):

Your lean-toolchain is set for an older version

Matthew Ballard (Feb 02 2023 at 11:39):

This is the problem. I thought 1-16 would be recent enough. Thank @Arthur Paulino

Gabriel Ebner (Feb 02 2023 at 17:34):

recent enough

To be clear, "recent enough" is not sufficient. It needs to be exactly the same version as mathlib.

Yaël Dillies (Feb 03 2023 at 00:42):

I am still not managing to make lake exe cache get work consistently :sad:

Yaël Dillies (Feb 03 2023 at 00:43):

I need to run the command thrice before I stop getting uncaught exception: lake-packages/std/build/lib/some_file: Can't unlink already-existing object

Yaël Dillies (Feb 03 2023 at 00:44):

And the syntax for lake exe cache get some_file still escapes me (probably because Windows paths).

Johan Commelin (Feb 03 2023 at 02:38):

@Arthur Paulino If I run lake exe cache get, then it spends a bit of time building and downloading files, and after that it spends ~8 seconds decompressing ~1200 files. All of this is fine. But if I run lake exe cache get again, then it skips the first bit (of course) but again spends ~8 seconds decompressing ~1200 files. Is it possible to speed that up by calculating hashes of oleans that are already present?

Arthur Paulino (Feb 03 2023 at 09:12):

@Yaël Dillies it's best to just close VS Code altogether before calling cache

Arthur Paulino (Feb 03 2023 at 09:16):

@Johan Commelin the cache is compressed so we can't compute the hashes of the binary oleans that easily. We would have to decompress them in order to do that. But that's already the current cost

Johan Commelin (Feb 03 2023 at 09:17):

hmm, would it make sense to have some index? Or to make the filenames twice as long?

Arthur Paulino (Feb 03 2023 at 09:18):

The tarbals contain more than just the olean. It contains ilean, trace, c and c.trace files, all related to the same Lean file

Arthur Paulino (Feb 03 2023 at 09:19):

It's safer to overwrite them all

Johan Commelin (Feb 03 2023 at 09:19):

I like living on the edge.

Arthur Paulino (Feb 03 2023 at 09:23):

Using the hashes of the olean files puts us in a chicken-egg problem, because we don't know their hashes when downloading. We only know the hashes of the sources

Arthur Paulino (Feb 03 2023 at 09:23):

And we use exact matches to build the correct URLs

Arthur Paulino (Feb 03 2023 at 09:25):

And then computing the hashes of 1k+ binary files wouldn't be that fast either. That's a considerable amount of IO overhead

Arthur Paulino (Feb 03 2023 at 09:27):

And the index files would have to be hosted and downloaded separately to avoid weird race conditions from parallel puts. That would double up the number of files curl has to deal with

Arthur Paulino (Feb 03 2023 at 09:28):

Idk... It sounds like more trouble with a small benefit

Arthur Paulino (Feb 03 2023 at 09:29):

Decompressing is the fastest part. Downloading is the slow part and we already avoid the download of repeated files (unless you get!)

Sebastian Ullrich (Feb 03 2023 at 09:37):

Why not eagerly decompress them into a directory named by the source hash? That would allow for the hardlinking optimization discussed above.

Arthur Paulino (Feb 03 2023 at 09:43):

That would double the space consumed by olean files in the FS. I'm not totally convinced that trade-off is sane for the sake of 8~10 seconds

Arthur Paulino (Feb 03 2023 at 09:45):

I, particularly, use SSD and space is something that has become more expensive to me

Arthur Paulino (Feb 03 2023 at 09:48):

And that wouldn't eliminate the need to compute hashes of 2 olean files per Lean source file just to know whether we should overwrite it or not

Sebastian Ullrich (Feb 03 2023 at 09:51):

The whole point of hardlinking would be to reduce space overhead, especially since there's no need to keep the tarball around after that

Sebastian Ullrich (Feb 03 2023 at 09:52):

Arthur Paulino said:

And that wouldn't eliminate the need to compute hashes of 2 olean files per Lean source file just to know whether we should overwrite it or not

No need for hashing the contents if the source hash is encoded in the directory name

Arthur Paulino (Feb 03 2023 at 09:57):

Then it wouldn't double the total space but there would still be a considerable increase in space consumed because the tarballs are compressed with maximum compression rate.

Arthur Paulino (Feb 03 2023 at 09:59):

Maybe it would double the space of the caches

Arthur Paulino (Feb 03 2023 at 10:10):

And remember, the tarballs contain 5 files: olean, iliean, trace, c and c.trace. The mechanics to keep all of this consistent aren't simple and I doubt it would pay off the effort needed to make it work (plus the space cost). Yes, you can avoid computing the hashes of the cached files by persisting it somewhere, in a directory name or in a text file, but there's still the hashing of the target files (which we are trying to figure out whether we should overwrite or not).

tar already does the efficient work of keeping track of the folder structure in order to extract things to the correct places

Eric Wieser (Feb 03 2023 at 10:11):

Also worth noting that hard-linking is likely going to confuse windows users, assuming it's possible at all

Arthur Paulino (Feb 03 2023 at 10:32):

I would encourage experimentation with a new get- command if someone is feeling inclined. But I wouldn't be surprised if after all the work it would still take 2~3 seconds instead of 8~10 (assuming you accept the space cost)

Arthur Paulino (Feb 03 2023 at 10:52):

The algorithm would go like this:

  1. Download content like get does
  2. Extract files to a raw folder inside the cache folder
  3. For each decompressed file in each tarball, compute its hash and store it in an index file (if that file wasn't already computed during put)
  4. Delete downloaded tarballs
  5. For each Lean source file, compute the hashes of the respective olean, ilean, trace, c and c.trace files
  6. Overwrite each separately iff their hashes don't match with the precomputed ones

Arthur Paulino (Feb 03 2023 at 10:57):

Ah, alternatively, there's no need to compute hashes. You can just compare their respective byte arrays. But keep in mind that core doesn't have a fast way to compare byte arrays yet

Sebastian Ullrich (Feb 03 2023 at 11:32):

I really don't understand what we are defending here against, tar itself corrupting the output?

  1. download $hash.tar.gz (as before) if $hash/ does not exist or get! was used
  2. unpack into $hash/
  3. delete archive
  4. copy $hash/ contents into local build directory using whatever is the cheapest method on the file system

Arthur Paulino (Feb 03 2023 at 11:37):

That avoids decompression but doesn't avoid unnecessary overwrites. It's a simpler solution but not the one I had in mind

Sebastian Ullrich (Feb 03 2023 at 11:38):

Yes, I just realized that you were probably thinking about how to elide step 4. I would want to measure first though if there is any need for that.

Sebastian Ullrich (Feb 03 2023 at 11:41):

AFAIK hardlinks work fine on NTFS, by the way (symlinks are the problematic one). So step 4 might be completely negligible on all relevant platforms as long as your cache and project directory are on the same filesystem.

Arthur Paulino (Feb 03 2023 at 11:42):

Is there an IO.copyBinFile? (probably not)

Arthur Paulino (Feb 03 2023 at 11:45):

I'm trying to foresee issues related to the coexistence of get and get-

Arthur Paulino (Feb 03 2023 at 11:51):

Because of the unpack command. How should it behave?

Sebastian Ullrich (Feb 03 2023 at 11:53):

Having two models at the same time does sound like a headache, I would avoid that

Arthur Paulino (Feb 03 2023 at 11:55):

Right but we can't assume everyone wants the extra space consumption

Eric Wieser (Feb 03 2023 at 18:15):

hardlinks work fine on NTFS

Hardlinks require administrator permissions, which is probably an issue for classroom use. They also don't work between drives, which I could also see being a problem

Gabriel Ebner (Feb 03 2023 at 18:42):

An easy way to speed up unpacking would be to do it in parallel. I believe we're mostly CPU-bound with gzip.

Johan Commelin (Feb 03 2023 at 18:43):

why can't we download an index?

Arthur Paulino (Feb 03 2023 at 18:47):

An index would be troublesome to be maintained consistent, with parallel put operations being performed. That would be highly subject to race conditions

Johan Commelin (Feb 03 2023 at 18:48):

I see. Because we don't have a sqlite database in the cloud.

Johan Commelin (Feb 03 2023 at 18:49):

But we could keep a local index. If you don't know the hashes of the contents of $hash.tar.gz then you just unzip it, because you haven't seen it before. But after that, you store the hashes of its contents in index.sqlite.

Johan Commelin (Feb 03 2023 at 18:50):

But I guess by now I'm seriously overcomplicating things.

Arthur Paulino (Feb 03 2023 at 18:52):

The gain in speed is marginal compared to the effort. tar with gzip is pretty fast already. And pretty space-efficient too

Johan Commelin (Feb 03 2023 at 18:55):

Yeah, we can worry about it when it becomes a real problem.

Arthur Paulino (Feb 03 2023 at 19:03):

Gabriel Ebner said:

An easy way to speed up unpacking would be to do it in parallel. I believe we're mostly CPU-bound with gzip.

That's a nice one btw

Arthur Paulino (Feb 27 2023 at 01:07):

Hi everyone! We've decided to migrate cache to std4. See the discussion in #std4 > cache adoption by std4.
From a practical standpoint, the user experience won't change.
The only thing I'd ask everyone is to nuke the $HOME/.cache/mathlib folder because that will no longer be the target directory of cache files. $HOME/.cache/lean4_cache will be the new target after this merge.

If you want to see the code changes, please check !4#2509

Arthur Paulino (Feb 27 2023 at 01:18):

Note: that's not the migration patch. It's a preparation step before I can factor out the code and leave only the configuration JSON file

Gareth Ma (Feb 27 2023 at 03:13):

Maybe add a warning when the directory is detected?

Kevin Buzzard (Feb 27 2023 at 08:35):

At what point do I nuke this folder?

Arthur Paulino (Feb 27 2023 at 11:13):

Kevin Buzzard said:

At what point do I nuke this folder?

After the merge, if you're not planning on visiting older branches

Arthur Paulino (Feb 27 2023 at 11:13):

Gareth Ma said:

Maybe add a warning when the directory is detected?

I don't think it's worth adding such dependency in the code

Eric Wieser (Feb 27 2023 at 11:21):

Presumably even if you are visiting older branches, merging mathlib4 master will make the old folder unnecessary again?

Arthur Paulino (Feb 27 2023 at 11:36):

An alternative solution is to rename the folder so you don't lose the cache files that you have already downloaded

Eric Wieser (Feb 27 2023 at 12:01):

What goes wrong if you don't delete the folder?

Arthur Paulino (Feb 27 2023 at 12:28):

Nothing, just some memory waste

Eric Wieser (Feb 27 2023 at 12:30):

Then why would you rename the folder instead of just leaving it alone?

Arthur Paulino (Feb 27 2023 at 12:32):

Hmm, let me explain. The target folder of cache files will change. If nothing is done, there will be no error, but the FS will have a folder full of unused files. If you rename the old cache folder to the new one, it will reuse the downloaded olean files

Arthur Paulino (Feb 27 2023 at 12:33):

But this is all after the merge is done

Kevin Buzzard (Feb 27 2023 at 13:59):

Arthur Paulino said:

Kevin Buzzard said:

At what point do I nuke this folder?

After the merge, if you're not planning on visiting older branches

How do I know when "the merge" has happened? (I'm clueless as to which merge we're talking about -- I have no idea about how the infrastructure is set up)

Arthur Paulino (Feb 27 2023 at 20:05):

@Kevin Buzzard sorry I missed your question for some reason. I mean this PR: !4#2509
I will signal here again once it's merged so everyone will have a higher chance of seeing it

Shreyas Srinivas (Mar 06 2023 at 23:20):

Hello, today I tried to start a lean4 project with mathlib using lake new <project name> math. After running lake update, I ran lake exe cache get.

I got the following error: uncaught exception: ./build/lib/Mathlib/Data/Sym/Sym2.olean: truncated gzip input

Shreyas Srinivas (Mar 06 2023 at 23:22):

this was an intel mac machine

Arthur Paulino (Mar 06 2023 at 23:25):

Try lake exe cache get!. I think you got a corrupted file

Shreyas Srinivas (Mar 06 2023 at 23:36):

That worked. Thanks!

Sebastian Ullrich (Mar 10 2023 at 09:05):

cache should now actually tell you what went wrong with a download instead of leading to a gzip error. Please do post the error messages here so we get a feeling for what goes wrong in the wild.

Jon Eugster (Mar 15 2023 at 11:02):

For the sake of just jotting down errors, I just got two of them running lake exe cache get! in two different repos simultaneously (one dependent on current mathlib4, one 1-2 days old version of mathlib4 itself):

Attempting to download 1737 file(s)
uncaught exception: no such file or directory (error code: 2)
  file: /home/me/.cache/mathlib/curl.cfg

and

Attempting to download 1344 file(s)
Decompressing 1688 file(s)
uncaught exception:
gzip: stdin: unexpected end of file
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now

However, just running lake exe cache get! in both repos again fixed everything :tada: and I can't reproduce it now. Probably a corrupted cache file, which wouldn't be surprising if you run two processes simultaneously.

Yury G. Kudryashov (Mar 16 2023 at 00:40):

Cache fails for me: after lake exe cache get, lake build fails with the following errors:

Building Mathlib.Tactic.WLOG
Building Aesop.Tree
Building Mathlib.Tactic.SplitIfs
Building Mathlib.Tactic.SolveByElim
error: > LEAN_PATH=./build/lib:./lake-packages/Qq/build/lib:./lake-packages/aesop/build/lib:./lake-packages/std/build/lib LD_LIBRARY_PATH=/home/urkud/.elan/toolchains/leanprover--lean4---nightly-2023-03-09/lib:/nix/store/zdx4ykiq8ijfmzxzlrrcw3zswl383wfc-pipewire-0.3.60-jack/lib:./build/lib /home/urkud/.elan/toolchains/leanprover--lean4---nightly-2023-03-09/bin/lean -DwarningAsError=true -Dpp.unicode.fun=true ./././Mathlib/Tactic/WLOG.lean -R ././. -o ./build/lib/Mathlib/Tactic/WLOG.olean -i ./build/lib/Mathlib/Tactic/WLOG.ilean -c ./build/ir/Mathlib/Tactic/WLOG.c
error: external command `/home/urkud/.elan/toolchains/leanprover--lean4---nightly-2023-03-09/bin/lean` exited with code 139
error: > LEAN_PATH=./build/lib:./lake-packages/Qq/build/lib:./lake-packages/aesop/build/lib:./lake-packages/std/build/lib LD_LIBRARY_PATH=/home/urkud/.elan/toolchains/leanprover--lean4---nightly-2023-03-09/lib:/nix/store/zdx4ykiq8ijfmzxzlrrcw3zswl383wfc-pipewire-0.3.60-jack/lib:./lake-packages/aesop/build/lib /home/urkud/.elan/toolchains/leanprover--lean4---nightly-2023-03-09/bin/lean ./lake-packages/aesop/././Aesop/Tree.lean -R ./lake-packages/aesop/./. -o ./lake-packages/aesop/build/lib/Aesop/Tree.olean -i ./lake-packages/aesop/build/lib/Aesop/Tree.ilean -c ./lake-packages/aesop/build/ir/Aesop/Tree.c
error: external command `/home/urkud/.elan/toolchains/leanprover--lean4---nightly-2023-03-09/bin/lean` exited with code 139
error: > LEAN_PATH=./build/lib:./lake-packages/Qq/build/lib:./lake-packages/aesop/build/lib:./lake-packages/std/build/lib LD_LIBRARY_PATH=/home/urkud/.elan/toolchains/leanprover--lean4---nightly-2023-03-09/lib:/nix/store/zdx4ykiq8ijfmzxzlrrcw3zswl383wfc-pipewire-0.3.60-jack/lib:./build/lib /home/urkud/.elan/toolchains/leanprover--lean4---nightly-2023-03-09/bin/lean -DwarningAsError=true -Dpp.unicode.fun=true ./././Mathlib/Tactic/SplitIfs.lean -R ././. -o ./build/lib/Mathlib/Tactic/SplitIfs.olean -i ./build/lib/Mathlib/Tactic/SplitIfs.ilean -c ./build/ir/Mathlib/Tactic/SplitIfs.c
error: external command `/home/urkud/.elan/toolchains/leanprover--lean4---nightly-2023-03-09/bin/lean` exited with code 139
error: > LEAN_PATH=./build/lib:./lake-packages/Qq/build/lib:./lake-packages/aesop/build/lib:./lake-packages/std/build/lib LD_LIBRARY_PATH=/home/urkud/.elan/toolchains/leanprover--lean4---nightly-2023-03-09/lib:/nix/store/zdx4ykiq8ijfmzxzlrrcw3zswl383wfc-pipewire-0.3.60-jack/lib:./build/lib /home/urkud/.elan/toolchains/leanprover--lean4---nightly-2023-03-09/bin/lean -DwarningAsError=true -Dpp.unicode.fun=true ./././Mathlib/Tactic/SolveByElim.lean -R ././. -o ./build/lib/Mathlib/Tactic/SolveByElim.olean -i ./build/lib/Mathlib/Tactic/SolveByElim.ilean -c ./build/ir/Mathlib/Tactic/SolveByElim.c
error: external command `/home/urkud/.elan/toolchains/leanprover--lean4---nightly-2023-03-09/bin/lean` exited with code 139

Gabriel Ebner (Mar 16 2023 at 01:07):

Do we have our first hash collision? :heart_eyes:

Gabriel Ebner (Mar 16 2023 at 01:31):

@Yury G. Kudryashov Which branch is this on?

Yury G. Kudryashov (Mar 16 2023 at 01:59):

port/Topology.Homotopy.Equiv

Yury G. Kudryashov (Mar 16 2023 at 02:00):

UPD: lake clean doesn't help.

Yury G. Kudryashov (Mar 16 2023 at 02:00):

More precisely, after lake clean I get

Building Aesop.Tree
error: > LEAN_PATH=./build/lib:./lake-packages/Qq/build/lib:./lake-packages/aesop/build/lib:./lake-packages/std/build/lib LD_LIBRARY_PATH=/home/urkud/.elan/toolchains/leanprover--lean4---nightly-2023-03-09/lib:/nix/store/zdx4ykiq8ijfmzxzlrrcw3zswl383wfc-pipewire-0.3.60-jack/lib:./lake-packages/aesop/build/lib /home/urkud/.elan/toolchains/leanprover--lean4---nightly-2023-03-09/bin/lean ./lake-packages/aesop/././Aesop/Tree.lean -R ./lake-packages/aesop/./. -o ./lake-packages/aesop/build/lib/Aesop/Tree.olean -i ./lake-packages/aesop/build/lib/Aesop/Tree.ilean -c ./lake-packages/aesop/build/ir/Aesop/Tree.c
error: external command `/home/urkud/.elan/toolchains/leanprover--lean4---nightly-2023-03-09/bin/lean` exited with code 139

Yury G. Kudryashov (Mar 16 2023 at 02:04):

rm -rf lake-packages/aesop/build helped with this one.

Yury G. Kudryashov (Mar 16 2023 at 18:44):

Any news?

Gabriel Ebner (Mar 16 2023 at 18:55):

I don't see that error in any of the CI runs in !4#2919.

Yury G. Kudryashov (Mar 16 2023 at 19:32):

I don't see them in CI either. What else can it be?

Yury G. Kudryashov (Mar 16 2023 at 19:33):

Should I upload SHA512 sums of all files in ~/.cache/mathlib?

Yury G. Kudryashov (Mar 16 2023 at 19:35):

cache.sha512

Gabriel Ebner (Mar 16 2023 at 19:38):

Can you also post the git revision where you get the issue. I'd like to reproduce it.

Scott Morrison (Mar 19 2023 at 21:54):

Anyone else experiencing problems with lake exe cache get today?

After I run lake exe cache get, lake build starts a full build again.

Eric Wieser (Mar 19 2023 at 21:59):

I've taken to running pkill lean before getting caches now

Scott Morrison (Mar 19 2023 at 22:07):

I'm not going to be able to run the port-progress-bot anymore if I need to do that.

Scott Morrison (Mar 19 2023 at 22:10):

Yeah, even killing everything, lake exe cache get results in the build starting from scratch (including all dependencies).

Scott Morrison (Mar 19 2023 at 22:11):

If anyone is able to confirm that lake exe cache get followed by lake build works for them on current master, that would be helpful to me.

Scott Morrison (Mar 19 2023 at 22:16):

Hmm... a fresh clone of mathlib4 seems to resolve the problem.

Scott Morrison (Mar 19 2023 at 22:32):

Gah... eventually realised the problem. I had an elan override set for that mathlib4 directory, and was using a slightly different version of Lean4! I wonder if this is something that cache should take into account.

Scott Morrison (Mar 19 2023 at 22:32):

@Arthur Paulino, do you think cache could notice this, and refuse to provide oleans if the repo is using a nonstandard version of Lean via an elan override?

Arthur Paulino (Mar 20 2023 at 00:05):

Maybe comparing Lean.versionString with the content of the tool chain file. @Scott Morrison can you confirm that such variable will carry a different value if the override is set?

Scott Morrison (Mar 20 2023 at 00:08):

Yes, lean --version says Lean (version 4.0.0, commit 983904995027, Release) with my override set, but Lean (version 4.0.0-nightly-2023-03-15, commit 3d21124445b3, Release) without.

Arthur Paulino (Mar 20 2023 at 01:40):

I mean Lean.versionString in the compiled Lean code:

#eval Lean.versionString -- "4.0.0-nightly-2023-01-10"

Scott Morrison (Mar 20 2023 at 01:43):

Sorry, yes: -- 4.0.0, commit 98390499502743d9f119f3f5808a7ccd1cfb4d61

Arthur Paulino (Mar 20 2023 at 02:11):

Arthur Paulino said:

Maybe comparing Lean.versionString with the content of the tool chain file. Scott Morrison can you confirm that such variable will carry a different value if the override is set?

@Gabriel Ebner do you validate this method?

Reid Barton (Mar 20 2023 at 03:23):

Just curious: why does port-progress-bot need to build any Lean 4 code?

Scott Morrison (Mar 20 2023 at 08:20):

Reid Barton said:

Just curious: why does port-progress-bot need to build any Lean 4 code?

Good question. It doesn't! I guess I was confused --- the port benchmark bot definitely does run Lean 3 and 4 code, however.

Alex J. Best (Mar 20 2023 at 18:05):

(deleted)

Gabriel Ebner (Mar 20 2023 at 20:37):

I don't think Lean.versionString is a good replacement for lean-toolchain. It doesn't contain the compilation flags, for once.

Sebastian Ullrich (Mar 20 2023 at 20:40):

Would anyone actually miss override if we removed it? With the alternative being to just change lean-toolchain so that it actually registers as a change in your git status window.

Jannis Limperg (Mar 20 2023 at 20:51):

If you want to build Lean without Nix, you need overrides for the stage0/stage1 toolchains.

Arthur Paulino (Mar 20 2023 at 23:39):

Gabriel Ebner said:

I don't think Lean.versionString is a good replacement for lean-toolchain. It doesn't contain the compilation flags, for once.

Sorry I wasn't clear enough. I meant to compare the output of Lean.versionString with the content of lean-toolchain. And if they mismatch we exit with a message instead of following up with the regular cache routine

Gabriel Ebner (Mar 21 2023 at 00:09):

Ah, then you want to compare Lean.toolchain. But it should only be a warning since tags don't have good version info:

$ echo 'import Lean #eval Lean.toolchain' | lean +gebner/lean4:bundlelibatomic --stdin
""

Arthur Paulino (Mar 21 2023 at 00:24):

But Lean.versionString does tell the toolchain

Gabriel Ebner (Mar 21 2023 at 00:43):

That also doesn't work for tags:

$ echo 'import Lean #eval Lean.versionString' | lean +gebner/lean4:bundlelibatomic --stdin
"4.0.0, commit 43fbd494cdbfd0b41fb3605318540bcae15f7577"

Gabriel Ebner (Mar 21 2023 at 00:43):

(Lean.toolchain does work just fine for nightlies though.)

Arthur Paulino (Mar 21 2023 at 02:27):

Ah, got it. But why does it need to be a warning? If those toolchains mismatch then whatever is downloaded is not going to be used anyways

Gabriel Ebner (Mar 21 2023 at 02:34):

Lean doesn't use the Lean.toolchain definition. It doesn't matter if it's the empty string, the cache would still be used. (Even though in this case the lean-toolchain file would have a different, nonempty, content.)

Sebastian Ullrich (Mar 21 2023 at 09:09):

Grepping through elan show might be an easier option

Notification Bot (Mar 22 2023 at 08:49):

15 messages were moved from this topic to #lean4 dev > Build setup under elan proposal by Sebastian Ullrich.

Gabriel Ebner (Mar 22 2023 at 20:21):

Anyone else having problems with lake exe cache get today? It doesn't seem to find any caches:

$ lake exe cache get
Attempting to download 1818 file(s)
Decompressing 320 file(s)

Gabriel Ebner (Mar 22 2023 at 20:21):

False alarm. Turns out I had local changes in lake-packages/std

Kendall Frey (Mar 23 2023 at 00:03):

I've been trying to create a new project with a dependency on mathlib4, but something is going wrong at the lake exe cache get step.

Attempting to download 1818 file(s)
No cache files to decompress

My .cache directory remains empty.
It worked fine when I created a new project a week or two ago, but when I created one a couple days ago, I experienced this problem. How do I get the cache files installed?

Gabriel Ebner (Mar 23 2023 at 00:06):

Most likely you've got the wrong lean version. Try cp lake-packages/mathlib/lean-toolchain .

Kendall Frey (Mar 23 2023 at 00:09):

I copied the file using Windows Explorer before getting the cache

Kendall Frey (Mar 23 2023 at 00:11):

Both ./lake-packages/mathlib/lean-toolchain and ./lean-toolchain contain leanprover/lean4:nightly-2023-03-15

Kendall Frey (Mar 23 2023 at 00:13):

lean -v in the project root reports Lean (version 4.0.0-nightly-2023-03-15, commit 3d21124445b3, Release)

Gabriel Ebner (Mar 23 2023 at 00:20):

That sounds good.

Gabriel Ebner (Mar 23 2023 at 00:21):

The only other thing I can think of is different line endings. Have you set git to convert them to CRLF?

Kendall Frey (Mar 23 2023 at 00:22):

Let me check

Kendall Frey (Mar 23 2023 at 00:23):

Both files end with a CR LF

Kendall Frey (Mar 23 2023 at 00:24):

As do the files in the successfully created project from before

Kendall Frey (Mar 23 2023 at 00:27):

I ran it again on that project, and as it should, it output:

No files to download
Decompressing 1731 file(s)

Arthur Paulino (Mar 23 2023 at 00:45):

Do you have the project on GitHub?

Kendall Frey (Mar 23 2023 at 00:46):

No, not at the moment

Arthur Paulino (Mar 23 2023 at 00:54):

What are your dependencies on the lakefile.lean?

Kendall Frey (Mar 23 2023 at 00:56):

require mathlib from git
  "https://github.com/leanprover-community/mathlib4.git"

in both

Arthur Paulino (Mar 23 2023 at 00:57):

Yeah, hard to know. I'd need to see it

Arthur Paulino (Mar 23 2023 at 00:57):

Let us know when you upload it somewhere

Kendall Frey (Mar 23 2023 at 00:58):

I deleted the .cache directory in the working project and reran the cache get. It successfully downloaded the files.

Kendall Frey (Mar 23 2023 at 00:59):

I can create a brand new empty project and push it to github, if that would be helpful.

Kendall Frey (Mar 23 2023 at 01:02):

This problem is not limited to a single project

Kendall Frey (Mar 23 2023 at 02:21):

Unfortunately whatever I did on the working project seems to have bricked it. The cache files are still there, but when I start the Lean server it seems to start rebuilding mathlib anyway, so I'm unable to check any of my work on that project either now.

Gabriel Ebner (Mar 23 2023 at 02:32):

Kendall Frey said:

Both files end with a CR LF

I'm pretty sure that's a problem. We don't convert the line endings for hashing. So a CRLF source file would never match an LF source file.

Gabriel Ebner (Mar 23 2023 at 02:34):

Maybe this is also a new issue. See https://github.com/leanprover/lean4/pull/2138/commits/d85a4f5495a60a528befca9a2e77d461c2fbf918

Kendall Frey (Mar 23 2023 at 02:37):

The curious part is why it worked in the first project, if CR LF is actually the problem

Kendall Frey (Mar 23 2023 at 02:39):

I checked the toolchain in the working project, and it's 03-09, the day before that PR was merged

Kendall Frey (Mar 23 2023 at 02:50):

I created another new project, and this time before lake update I modified lakefile.lean to point to mathlib @ revision af5c2dab7415d3e5e8703104b4e5ad5590f19495, same as the working project. Now lake exe cache get seems to be downloading the files properly. I need to go now, so I'll need to test tomorrow to confirm whether or not this actually avoids having to rebuild mathlib. But indeed it does appear there was a new issue introduced between then and now.

Sebastian Ullrich (Mar 23 2023 at 09:50):

Indeed I probably broke this, my bad. It's a great demonstration of the evils of autocrlf though.

Kendall Frey (Mar 24 2023 at 00:07):

I set autocrlf = input and told vscode to use LF (to prevent any more mixups). Getting the cache and importing mathlib seems to be working properly now, so I should be able to continue my work as normal again. Thanks for the help, @Gabriel Ebner

Yakov Pechersky (Mar 24 2023 at 15:59):

After running a lake exe cache get, the VSCode server tries to run this command, and segfaults:

$ LEAN_PATH=./build/lib:./lake-packages/Qq/build/lib:./lake-packages/aesop/build/lib:./lake-packages/std/build/lib LD_LIBRARY_PATH=/home/yakov/.elan/toolchains/leanprover--lean4---nightly-2023-03-17/lib:./build/lib:./lake-packages/Qq/build/lib:./lake-packages/aesop/build/lib:./lake-packages/std/build/lib:./lake-packages/std/build/lib /home/yakov/.elan/toolchains/leanprover--lean4---nightly-2023-03-17/bin/lean -DwarningAsError=true -Dlinter.missingDocs=true ./lake-packages/std/././Std/Data/List/Basic.lean -R ./lake-packages/std/./. -o ./lake-packages/std/build/lib/Std/Data/List/Basic.olean -i ./lake-packages/std/build/lib/Std/Data/List/Basic.ilean -c ./lake-packages/std/build/ir/Std/Data/List/Basic.c
[1]    26421 segmentation fault  LEAN_PATH= LD_LIBRARY_PATH=  -DwarningAsError=true -Dlinter.missingDocs=true

What could be going on?

Moritz Firsching (Apr 05 2023 at 14:45):

Yury G. Kudryashov said:

rm -rf lake-packages/aesop/build helped with this one.

I was in a similar situation, where only

rm -rf lake-packages/aesop/build/ lake-packages/std/build/ lake-packages/Qq/build

seem solve getting the same error. Thanks for the help!

Ruben Van de Velde (Apr 10 2023 at 12:23):

By the way, I'd like to mention again how amazing the cache is, even compared to the one we had for mathlib3. There's no way I could be remotely as productive without it

Arthur Paulino (Apr 10 2023 at 14:26):

Yakov Pechersky said:

After running a lake exe cache get, the VSCode server tries to run this command, and segfaults:

$ LEAN_PATH=./build/lib:./lake-packages/Qq/build/lib:./lake-packages/aesop/build/lib:./lake-packages/std/build/lib LD_LIBRARY_PATH=/home/yakov/.elan/toolchains/leanprover--lean4---nightly-2023-03-17/lib:./build/lib:./lake-packages/Qq/build/lib:./lake-packages/aesop/build/lib:./lake-packages/std/build/lib:./lake-packages/std/build/lib /home/yakov/.elan/toolchains/leanprover--lean4---nightly-2023-03-17/bin/lean -DwarningAsError=true -Dlinter.missingDocs=true ./lake-packages/std/././Std/Data/List/Basic.lean -R ./lake-packages/std/./. -o ./lake-packages/std/build/lib/Std/Data/List/Basic.olean -i ./lake-packages/std/build/lib/Std/Data/List/Basic.ilean -c ./lake-packages/std/build/ir/Std/Data/List/Basic.c
[1]    26421 segmentation fault  LEAN_PATH= LD_LIBRARY_PATH=  -DwarningAsError=true -Dlinter.missingDocs=true

What could be going on?

Sorry I missed that one. Are you still having this issue? Did you have VS Code open while cache was running?

Yakov Pechersky (Apr 10 2023 at 14:39):

Yes, I had vscode open. I nuked the whole folder and reclined. That "fixed" it

Arthur Paulino (Apr 10 2023 at 14:44):

Having VS Code open causes the Lean server to interfere with the mechanics of cache. Next time try to close VS Code before calling cache. And if it still doesn't work, try get!

Mauricio Collares (May 11 2023 at 10:31):

Just wanted to say that lake exe cache get is so much better than the non-content-addressed mathlib3 version! I can only imagine how good a Nix-ified version of mathlib would be

Yaël Dillies (May 11 2023 at 10:32):

Agreed. I just wish I could use it on Windows :sad:

Mauricio Collares (May 11 2023 at 10:39):

Are the Windows problems documented somewhere? Does it work under WSL?

Sebastien Gouezel (May 11 2023 at 14:14):

For the record, it works perfectly fine on windows for me.

Sebastien Gouezel (May 11 2023 at 14:14):

(without using WSL)

alex chan (May 12 2023 at 04:01):

It works in WSL with Ubuntu or any other distro. If you have git installed on PC you don't need WSL, git cli will do just fine.

Yaël Dillies (May 12 2023 at 08:26):

Mauricio Collares said:

Are the Windows problems documented somewhere? Does it work under WSL?

I've copiously complained about them in this thread already, eg https://leanprover.zulipchat.com/#narrow/stream/287929-mathlib4/topic/lake.20exe.20cache.20get/near/321126042

Mauricio Collares (May 12 2023 at 14:33):

Wow, that's a confusing error. Does it still happen with an up-to-date copy of mathlib4?

Mauricio Collares (May 12 2023 at 14:39):

If so, where are you running the commands? Still in bash inside VS Code?

Mauricio Collares (May 13 2023 at 18:55):

(deleted, I think I made a mistake while experimenting with the caching and I don't want to mislead people)

Mauricio Collares (May 13 2023 at 18:55):

(deleted)

Eric Rodriguez (Oct 30 2023 at 11:47):

Is the cache currently stored only as indivdual oleans which are gotten one at a time?

Scott Morrison (Oct 30 2023 at 11:49):

Yes.

Eric Wieser (Oct 30 2023 at 11:54):

Related to that; I've noticed that a cache miss is much more expensive than a cache hit; running lake exe cache get on a branch where no cache is available due to an std bump (so nothing is downloaded) can often take longer than downloading a full cache!

Eric Rodriguez (Oct 30 2023 at 12:00):

I wonder if it's faster/better to revert to the old gzipped version - won't there be a lot of overhead starting the download for each individual file etc? I understand it's better when there's overlap in the cache and you only have to download, e.g., 20 oleans as opposed to all 4000

Scott Morrison (Oct 30 2023 at 12:00):

Well, except for the complete cache miss scenario Eric just mentioned, I don't think this is the case.

Scott Morrison (Oct 30 2023 at 12:00):

We're only making one http connection.

Arthur Paulino (Oct 30 2023 at 12:05):

There are other cost dimensions at play other than download speed anyway. There's storage cost (on Azure and on users' computers) and also data transfer costs, which can't be neglected for everyone

Arthur Paulino (Oct 30 2023 at 12:09):

Idk how big the cache is currently, but suppose it's 500mb. If you're just changing one leaf file with your PR, then once it's merged everyone else would have to pay the cost of another chunk of 500mb on their machines if they aren't keeping the cache folder clean

Eric Rodriguez (Oct 30 2023 at 12:11):

The thought mainly came to me as I just had significantly faster internet installed, and I noticed that the cache get timings were very similar. On users' computers I assumed the data could be unpacked and any duplicates discarded, but on Azure the issue is probably fairly different.

Arthur Paulino (Oct 30 2023 at 12:14):

That's true, but still, being forced to download 500mb sounds very suboptimal if what I already have only differs on a few files

Eric Rodriguez (Oct 30 2023 at 12:15):

We have a full cache miss only when we upgrade lean, I guess?

Arthur Paulino (Oct 30 2023 at 12:16):

That cache miss happens on a cloud machine. But it builds the cache and makes it available for everyone so the full cache miss only happens once

Arthur Paulino (Oct 30 2023 at 12:19):

I think there are other situations similar to a full cache miss. Like if you change a deep file in Mathlib or in Std. The deeper the file is, the higher the chances of getting closer to a full cache miss

Mario Carneiro (Oct 30 2023 at 18:38):

Note that some of this could be fixed by additional smarts on the Azure side. For example if you could send a single query to find out which of a set of files exist on the server; I believe HTTP/2 also lets you do something here. But the current hosting is just a plain HTTP/1.1 file server, so in order to find out if none of a set of 4000 files exist on the server you have to do 4000 file requests.

One other trick I am hoping to implement at some point is to have a marker file that exists once per "root hash" (which is a combination of the lakefile.lean, lean-toolchain and lake-manifest.json), which exists iff any cache file with that root hash is on the server. This should address the case where you have a brand new lean version or a change to the lakefile and so it's a complete cache miss. This doesn't help almost complete cache misses though.

Joachim Breitner (Oct 31 2023 at 08:20):

Does the cache tool assume and use the fact that if A depends on B, and B isn't available in the cache, then A won't be either? So if it queries from root to leaf module, it can stop early?

Mauricio Collares (Oct 31 2023 at 08:26):

This is a good idea but it might hinder parallelism in the common case of a full cache hit.

Mario Carneiro (Oct 31 2023 at 08:52):

@Joachim Breitner yes

Mario Carneiro (Oct 31 2023 at 08:56):

Er, actually I take it back, cache does take advantage of this in the case where B is not found, but it will make the full list of things to download before downloading anything, to maximize parallelism. lean-cache is able to take advantage of this more fine-grained dependency notion because it is driving cURL itself and can interleave the network requests with hashing and stop things early


Last updated: Dec 20 2023 at 11:08 UTC