Zulip Chat Archive

Stream: lean4

Topic: Caching doesn't seem to work


Nikolas Kuhn (May 12 2023 at 22:30):

I have trouble getting mathlib4 to build with the cached files on Windows. I have all instances of VSCode closed, and am trying to run

lake clean
lake exe cache get
lake build

but it seems like it tries to build mathlib from scratch each time. I'm trying to build after porting a mathlib3 file using the script - I think it worked once when I tried to build "pure" mathlib4.

Is there anything that I'm overlooking or that could help to diagnose the issue?

Alex J. Best (May 12 2023 at 23:11):

Are you sure it is checking the files and not just compiling the executable code (what is the output of the build command?)

Nikolas Kuhn (May 12 2023 at 23:54):

[1/2369] Building Std.Tactic.OpenPrivate
[1/2369] Building Std.Data.Option.Init.Lemmas
[1/2369] Building Std.Tactic.HaveI
[1/2369] Building Std.Tactic.Unreachable
...

Mario Carneiro (May 12 2023 at 23:55):

what does lake exe cache get do? (What you posted is indeed mathlib building, and we want to avoid that)

Nikolas Kuhn (May 12 2023 at 23:57):

Attempting to download 4 file(s)
Decompressing 2350 file(s)

It seems like it's exiting fine. If I do lake clean beforehand it runs through a bunch of steps before downloading.

Kevin Buzzard (May 13 2023 at 00:01):

Try lake exe cache get! maybe?

Nikolas Kuhn (May 13 2023 at 08:05):

Same issue. I might try to reinstall in WSL and see whether that works better.

Max Nowak 🐉 (May 13 2023 at 08:49):

Have you tried lake update?

Nikolas Kuhn (May 13 2023 at 09:20):

Just tried it. Now lake clean followed by lake exe cache get or lake exe cache get! give

Attempting to download 2354 file(s)
No cache files to decompress

And build still starts at the beginning.

Nikolas Kuhn (May 13 2023 at 09:21):

Also, for reference:

PS C:\Users\kuhnn\lean4\mathlib4> lean --version
Lean (version 4.0.0-nightly-2023-05-06, commit 445fd417be4d, Release)
PS C:\Users\kuhnn\lean4\mathlib4> lake --version
Lake version 4.1.0-pre (Lean version 4.0.0-nightly-2023-05-06)

Eric Wieser (May 13 2023 at 09:58):

Do you have vscode open at the same time? lake cache get writes the same files as vscode does, so having both open makes a mess

Nikolas Kuhn (May 13 2023 at 10:44):

No, and as far as I can tell no lean processes running (although I don't know how to properly check this in windows).

Nikolas Kuhn (May 13 2023 at 10:51):

Relatedly, what is the preferred way to run start_port.sh in windows? It didn't work in Powershell (i.e. bash _ reported errors), so I ran it in the VSCode terminal. Maybe that has broken something?

Mario Carneiro (May 13 2023 at 10:52):

WSL might work

Mario Carneiro (May 13 2023 at 10:52):

I think it needs at least sed, so I doubt a plain cmd.exe will work

Mario Carneiro (May 13 2023 at 10:54):

hm, actually it is quite a bit sophisticated, you will probably need an actual conforming bash implementation

Mario Carneiro (May 13 2023 at 10:54):

what errors do you get?

Shreyas Srinivas (May 13 2023 at 11:36):

Did you run lake update before running lake exe cache get?

Nikolas Kuhn (May 13 2023 at 13:11):

@Shreyas Srinivas yes, I did
@Mario Carneiro This is what I get.

PS C:\Users\kuhnn\lean4\mathlib4> bash ./scripts/start_port.sh Mathlib/CategoryTheory/Sites/Pushforward.lean
./scripts/start_port.sh: line 2: $'\r': command not found
: invalid optionport.sh: line 3: set: -
set: usage: set [-abefhkmnptuvxBCHP] [-o option-name] [--] [arg ...]
: invalid option namesh: line 4: set: pipefail
./scripts/start_port.sh: line 5: $'\r': command not found
./scripts/start_port.sh: line 20: syntax error near unexpected token `$'in\r''
'/scripts/start_port.sh: line 20: `case $mathlib4_path in

Mario Carneiro (May 13 2023 at 13:14):

it looks like a lot of the problem is that the shell script has windows style newlines in it which are confusing bash

Mario Carneiro (May 13 2023 at 13:14):

try opening it and saving it with just \n newlines

Nikolas Kuhn (May 14 2023 at 12:34):

Thanks! I've got the script to work (changing the newlines, also had to add setting github credentials to within the script). I set up a new copy of mathlib4 and re-did everything -- it's still trying to build mathlib from scratch after calling lake exe cache get (or get!).

Nikolas Kuhn (May 14 2023 at 12:36):

I'm giving up on trying to make it work in windows directly now, and will try again with WSL.

Nikolas Kuhn (May 15 2023 at 19:04):

Got it to work in WSL. Under Windows, I could narrow down the issue:
1) If I run lake update, it modifies lake-manifest.json (I think it changes the line endings). This leads to lake exe cache get not working (with or without ! ):

Attempting to download 2404 file(s)
No cache files to decompress

2) If I then revert lake-manifest.json to its original form, it seems like lake exe cache get goes through (it actually says it's decompressing), but then mathlib starts building anew anyway.

Sebastian Ullrich (May 15 2023 at 19:12):

Could you please confirm whether the line endings were the only change? This should have been fixed by !4#3096

Sebastian Ullrich (May 15 2023 at 19:13):

In any case, do not run lake update in mathlib4 if you want to use the cache! I don't know where this suggestion is coming from, it's the opposite of what you want to do usually.

Nikolas Kuhn (May 15 2023 at 19:18):

Thanks for the clarification! People have been asking whether I ran it at least twice in this thread, which I assumed means they suggest to do so..

Nikolas Kuhn (May 15 2023 at 19:20):

This is what git diff shows (it changes not only the line endings):

warning: LF will be replaced by CRLF in lake-manifest.json.
The file will have its original line endings in your working directory
diff --git a/lake-manifest.json b/lake-manifest.json
index 9a29eda5..7e4395ee 100644
--- a/lake-manifest.json
+++ b/lake-manifest.json
@@ -10,12 +10,12 @@
   {"git":
    {"url": "https://github.com/JLimperg/aesop",
     "subDir?": null,
-    "rev": "cdc00b640d0179910ebaa9c931e3b733a04b881c",
+    "rev": "5cf2eb0722e8d0cb58149c744f9888de1b9f0855",
     "name": "aesop",
     "inputRev?": "master"}},
   {"git":
    {"url": "https://github.com/leanprover/std4",
     "subDir?": null,
-    "rev": "6006307d2ceb8743fea7e00ba0036af8654d0347",
+    "rev": "fe45c329180f4d051f9f19d1a360d9e195532afb",
     "name": "std",
     "inputRev?": "main"}}]}

Ruben Van de Velde (May 15 2023 at 19:20):

You'll want to revert those changes in any case

Nikolas Kuhn (May 15 2023 at 19:27):

Yup! I re-reverted in order to check.

Nikolas Kuhn (May 15 2023 at 19:30):

Starting with a fresh copy of mathlib and running lake exe cache get and lake build still starts a fresh build. (but cache get now has files to decompress again)

Nikolas Kuhn (May 15 2023 at 19:31):

lake exe cache get output:

info: cloning https://github.com/leanprover/std4 to .\lake-packages\std
info: cloning https://github.com/gebner/quote4 to .\lake-packages\Qq
info: cloning https://github.com/JLimperg/aesop to .\lake-packages\aesop
info: [0/8] Building Cache.IO
info: [1/8] Compiling Cache.IO
info: [1/8] Building Cache.Hashing
info: [2/8] Compiling Cache.Hashing
info: [2/8] Building Cache.Requests
info: [4/8] Compiling Cache.Requests
info: [4/8] Building Cache.Main
info: [6/8] Compiling Cache.Main
info: [8/8] Linking cache.exe
Attempting to download 2402 file(s)
Decompressing 2402 file(s)

Arthur Paulino (May 15 2023 at 19:34):

It's just building the cache binary, right?

Arthur Paulino (May 15 2023 at 19:35):

How long does it all take? And what happens if you open mathlib4 on some random lean file that contains theorems and stuff?

Nikolas Kuhn (May 15 2023 at 19:35):

You mean lake build? It looks like its building everything.

Ruben Van de Velde (May 15 2023 at 19:36):

If that's where it finished, try lake build - this should have barely if any output, which means it worked

Ruben Van de Velde (May 15 2023 at 19:36):

:(

Arthur Paulino (May 15 2023 at 19:37):

Nikolas Kuhn said:

You mean lake build? It looks like its building everything.

But how long does it take? More than a minute?

Nikolas Kuhn (May 15 2023 at 19:37):

To be clear, if the lines start Building ...., that's rebuilding from scratch, right? I had it correctly build from the cached version in WSL before and that was quick and didn't have this.

Nikolas Kuhn (May 15 2023 at 19:38):

Yes, more than a minute, more like a day.

Arthur Paulino (May 15 2023 at 19:40):

Okay, if lake build is taking forever even after cache successfully downloaded and decompressed all the olean files you needed, then it's likely something wrong with Lake

Arthur Paulino (May 15 2023 at 19:40):

Maybe some combination of Lake + your dev environment

Arthur Paulino (May 15 2023 at 19:52):

Here's an idea: pick an olean file that's built from scratch with a raw clone of mathlib4 without lake exe cache get... just go straight to lake build. You don't need to wait for it to finish. One olean file is enough. Once you have it, do a md5 on it and save the hash on a text file.

Then compare with the md5 hash of the same file that's downloaded by lake exe cache get. Maybe it's different? They should be the same

Sebastian Ullrich (May 15 2023 at 20:06):

Works on my machine, this is in a fresh clone of mathlib4
image.png

Mauricio Collares (May 15 2023 at 20:08):

It doesn't work here, and I have some potentially useful information: when I run lake build, the .olean and .ilean files don't change but the .trace files do

Mauricio Collares (May 15 2023 at 20:09):

Is there a way to see what goes into the trace hash computation?

Sebastian Ullrich (May 15 2023 at 20:10):

I don't think so, though I've wished for that before

Mauricio Collares (May 15 2023 at 20:12):

Wait, I see a few "mixTrace (<- getMTime olean) ..." calls in lake

Mauricio Collares (May 15 2023 at 20:13):

Does it really require accurate modification timestamps?

Mauricio Collares (May 15 2023 at 20:14):

Ah, that's "old mode"

Nikolas Kuhn (May 15 2023 at 20:18):

Here are the associated to Mathlib/Tactic/Set.leanwith a fresh build and cache getrespectively:
7C39C1177B1D85A22CB0DF933FC5E51E
7C39C1177B1D85A22CB0DF933FC5E51E
Should be indeed the same. (I got these from ./build/Lib/Mathlib/Tactic/Set.olean.)

Nikolas Kuhn (May 15 2023 at 20:20):

Weirdly, lake build does seem to pick up from where it stopped when I cancel it. So caching seems to work for me in principle?

Mauricio Collares (May 15 2023 at 20:21):

Nikolas, does it work if you run git config --global core.autocrlf false and reclone mathlib4? (also @Yaël Dillies)

Mauricio Collares (May 15 2023 at 20:21):

I think Git for Windows might have some weird defaults

Nikolas Kuhn (May 15 2023 at 20:25):

Thanks! That sounds like a useful thing to set either way. I'm giving it a try.

Sebastian Ullrich (May 15 2023 at 20:26):

Yes, it turns out I must have selected autocrlf = false when installing Git for Windows

Nikolas Kuhn (May 15 2023 at 20:27):

Yes, now it works! :D

Nikolas Kuhn (May 15 2023 at 20:32):

This also fixes the issue with the porting script.

Mauricio Collares (May 15 2023 at 21:04):

Should lake traces normalize newlines (at least on .lean files)?

Sebastian Ullrich (May 15 2023 at 21:04):

Yes, we need the same thing as !4#3096 in Lake

Sebastian Ullrich (May 21 2023 at 21:17):

https://github.com/leanprover/lake/pull/170


Last updated: Dec 20 2023 at 11:08 UTC