Zulip Chat Archive
Stream: mathlib4
Topic: lake exe cache get errors
Floris van Doorn (Mar 16 2023 at 13:32):
I'm noticing many errors when executing lake exe cache get!
in the last few days.
For example here the CI fails with
Attempting to download 1749 file(s)
OpenSSL SSL_read: Connection reset by peer, errno 104
OpenSSL SSL_read: Connection reset by peer, errno 104
2 download(s) failed
which might just be a connection hiccup.
Floris van Doorn (Mar 16 2023 at 13:32):
I also got this one locally (on Windows):
Floris@MSI MINGW64 /d/projects/mathlib4b ((094cafa95...))
$ lake exe cache get
error: > git fetch origin # in directory .\lake-packages\std
error: stderr:
error: cannot lock ref 'refs/remotes/origin/main': is at d4107bc2bd28bc70e8327578c16acb904d2bcfd0 but expected 7d37ab32aac2004fa32e1df5b38d878c2301cd71
From https://github.com/leanprover/std4
! 7d37ab3..d4107bc main -> origin/main (unable to update local ref)
error: external command `git` exited with code 1
and then all future attempts give
Floris@MSI MINGW64 /d/projects/mathlib4b (kmill_congr_fallback)
$ lake exe cache get
info: updating .\lake-packages\Qq to revision cc915afc9526e904a7b61f660d330170f9d60dd7
error: > git checkout --detach cc915afc9526e904a7b61f660d330170f9d60dd7 # in directory .\lake-packages\Qq
error: stderr:
fatal: Unable to create 'D:/projects/mathlib4b/lake-packages/Qq/.git/index.lock': File exists.
Another git process seems to be running in this repository, e.g.
an editor opened by 'git commit'. Please make sure all processes
are terminated then try again. If it still fails, a git process
may have crashed in this repository earlier:
remove the file manually to continue.
error: external command `git` exited with code 128
Floris van Doorn (Mar 16 2023 at 13:33):
This was easily fixed, but it looks like a bug?
Arthur Paulino (Mar 16 2023 at 14:08):
Are you running parallel processes? Is your VS Code open while you call cache
?
Floris van Doorn (Mar 16 2023 at 15:10):
Oh, upon reflection is was probably VS Code. I did have VS Code open, but only in a different project, so I thought it wouldn't interfere. However, both these projects were different worktrees of the same git repository, so the VS Code instance probably was still the thing interfering.
Oliver Nash (Sep 04 2023 at 10:11):
At LFTCM this morning we have encountered at least two people for whom we could not get the cache to work. Both users are Windows 11 users and with third-party antivirus software. We observed two problems:
- It was necessary to disable the antivirus software to get
lake exe cache get
to run. - Even after disabling antivirus and apparently getting
lake exe cache get
to run successfully, Lean does not seem to find the cache and wants to compile everything.
Oliver Nash (Sep 04 2023 at 10:11):
Has this been observed before? The users are surviving on Gitpod for now but obviously this is not a good situation.
Scott Morrison (Sep 04 2023 at 10:48):
It might be helpful to know the error messages when running lake exe cache get
with the antivirus enabled.
Scott Morrison (Sep 04 2023 at 10:49):
Similarly, showing us the actual output from lake exe cache get
and lake build
might be helpful?
Eric Wieser (Sep 04 2023 at 12:22):
Reagrding 1; I think the error code was 35; though I don't have the message to hand. Essentially, curl is complaining that someone MiTM'd its connection, which is correct; the antivirus did
Sebastian Ullrich (Sep 04 2023 at 12:24):
There might not exist a reasonable fix to such issues. I assume we're not talking about Microsoft's AV?
Eric Wieser (Sep 04 2023 at 12:26):
No, the two that caused trouble were AVG and Kaspersky
Eric Wieser (Sep 04 2023 at 12:26):
There was another user who had the Microsoft AV using up half their CPU
Malthe Sporring (Sep 04 2023 at 12:27):
The error code was "curl: (35) schannel: next InitializeSecurityContext failed: Unknown error (0x80092012) - The revocation function was unable to check revocation for the certificate"
Eric Wieser (Sep 04 2023 at 12:27):
Eric Wieser said:
There was another user who had the Microsoft AV using up half their CPU (I think I personally had this too)
We told it to ignore the operation of the lean.exe
and that seemed to quieten things down, though ultimately their disk speed seemed to be throttling Lean so much to be unusable
Malthe Sporring (Sep 04 2023 at 12:28):
Although I don't seem able to recreate it. I get "No files to download", but still Visual Studio is insisting on rebuilding everything.
Eric Wieser (Sep 04 2023 at 12:28):
You can do lean exe cache get!
to force it to restart from scratch
Eric Wieser (Sep 04 2023 at 12:28):
(and throw away your previous cache)
Malthe Sporring (Sep 04 2023 at 12:29):
That returns "Expected exactly one file name"
Malthe Sporring (Sep 04 2023 at 12:34):
lake
not lean
:)
Malthe Sporring (Sep 04 2023 at 12:40):
That replicates it. I now get a constant stream of schannel: next InitializeSecurityContext failed (...)
Mauricio Collares (Sep 04 2023 at 15:50):
For Kaspersky, you can disable the MiTM attempts (a.k.a. "encrypted connection scanning") following these instructions: https://support.kaspersky.com/KESWin/11.6.0/en-US/175124.htm. Similarly, for AVG, you can disable HTTPS scanning following the instructions at https://support.avg.com/SupportArticleView?l=en&urlName=Use-AVG-Antivirus-HTTPS-scan&supportType=home.
Yaël Dillies (Sep 11 2023 at 10:10):
I'm trying to get cache on my Windows machine for my project LeanCamCombi but lake exe cache get
keeps on failing:
$ lake exe cache get
info: cloning https://github.com/leanprover-community/mathlib4.git to .\lake-packages\mathlib
info: cloning https://github.com/leanprover/std4 to .\lake-packages\std
info: cloning https://github.com/gebner/quote4 to .\lake-packages\Qq
info: cloning https://github.com/JLimperg/aesop to .\lake-packages\aesop
info: cloning https://github.com/mhuisi/lean4-cli.git to .\lake-packages\Cli
info: cloning https://github.com/EdAyers/ProofWidgets4 to .\lake-packages\proofwidgets
info: Downloading proofwidgets/v0.0.13/windows-64.tar.gz
info: Unpacking proofwidgets/v0.0.13/windows-64.tar.gz
info: [1/9] Building Cache.IO
error: > LEAN_PATH=.\build\lib;.\lake-packages\proofwidgets\build\lib;.\lake-packages\Cli\build\lib;.\lake-packages\mathlib\build\lib;.\lake-packages\Qq\build\lib;.\lake-packages\aesop\build\lib;.\lake-packages\std\build\lib PATH c:\Users\YaÙl\.elan\toolchains\leanprover--lean4---v4.0.0\bin\lean.exe -Dpp.unicode.fun=true -DautoImplicit=false -DrelaxedAutoImplicit=false .\lake-packages\mathlib\.\.\Cache\IO.lean -R .\lake-packages\mathlib\.\. -o .\lake-packages\mathlib\build\lib\Cache\IO.olean -i .\lake-packages\mathlib\build\lib\Cache\IO.ilean -c .\lake-packages\mathlib\build\ir\Cache\IO.c
error: stdout:
.\lake-packages\mathlib\.\.\Cache\IO.lean:7:0: error: unknown package 'Init'
.\lake-packages\mathlib\.\.\Cache\IO.lean:17:11: error: expected ')', ',' or ':'
.\lake-packages\mathlib\.\.\Cache\IO.lean:24:7: error: expected '=>'
.\lake-packages\mathlib\.\.\Cache\IO.lean:29:23: error: unknown identifier 'UInt64'
.\lake-packages\mathlib\.\.\Cache\IO.lean:29:23: error: unknown constant 'sorryAx'
Yaël Dillies (Sep 11 2023 at 10:10):
It's as if lake exe cache get
ran on the wrong version of Lean somehow. Does anyone have a clue?
Sky Wilshaw (Sep 11 2023 at 10:18):
It seems to be working for me on my Windows 10 machine with a fresh clone, so I can't reproduce the error :(
Lake output
Sky Wilshaw (Sep 11 2023 at 10:19):
(not sure how to put code blocks in spoiler blocks)
Yaël Dillies (Sep 11 2023 at 10:23):
Like this
Kevin Buzzard (Sep 11 2023 at 10:50):
When everything is broken for me, I delete the lake-packages
directory and then do lake exe cache get!
. Does that make things any better?
Kevin Buzzard (Sep 11 2023 at 10:51):
Also I guess you can do lake update
in the middle (after deleting, before cache) if this is a project which depends on mathlib rather than mathlib itself.
Yaël Dillies (Sep 11 2023 at 10:52):
Nope, deleting lake-packages
, lake clean
, lake exe cache get!
, lake update
do not help.
Kevin Buzzard (Sep 11 2023 at 10:52):
Try copying lean-toolchain
from mathlib to your project, and make sure you are elan v3 with something like elan self update
Kevin Buzzard (Sep 11 2023 at 10:53):
and then do everything again in a random order, at least this is what I usually do when things don't work for me.
Sky Wilshaw (Sep 11 2023 at 10:54):
It's worth noting that the project compiles on my windows machine and on CI. I copied mathlib's lean-toolchain
into this project yesterday.
Kevin Buzzard (Sep 11 2023 at 10:54):
If none of these help then maybe post the current state of the error.
Yaël Dillies (Sep 11 2023 at 10:54):
Did all that. I'll try Sophie's votive candle :(
Sky Wilshaw (Sep 11 2023 at 10:57):
Does lake build
work (as in, start compiling things) without lake exe cache get
?
Scott Morrison (Sep 11 2023 at 13:17):
Hmm, lake exe cache get
works fine for me (mac os) in this project.
Scott Morrison (Sep 11 2023 at 13:17):
Have you reinstalled elan
? That's the only emergency measure I haven't seen tried above. :-)
Mario Carneiro (Sep 11 2023 at 13:22):
Yaël Dillies said:
c:\Users\YaÙl\.elan\toolchains\leanprover--lean4---v4.0.0\bin\lean.exe
@Yaël Dillies Is your name YaÙl
?
Yaël Dillies (Sep 11 2023 at 14:11):
The terminal indeed mangles my umlaut. Do you think this could be related?
Mario Carneiro (Sep 11 2023 at 14:12):
it's not clear if this is a terminal display issue or something else, but the error message suggests that your builtin lean path was not set correctly so it could be relevant
Mario Carneiro (Sep 11 2023 at 14:13):
can you build it from inside lake-packages/mathlib
?
Yaël Dillies (Sep 11 2023 at 14:22):
Am currently retrying everything in C:/Users/Public
. Will tell you how it goes if the airplane doesn't take off before it finishes loading.
Yaël Dillies (Sep 11 2023 at 14:25):
Mario Carneiro said:
can you build it from inside
lake-packages/mathlib
?
What do you mean exactly? Should I run lake build
inside that folder?
Yaël Dillies (Sep 11 2023 at 14:26):
The C:/Users/Public
trick didn't work because it's looking for something under the mangled folder.
Yaël Dillies (Sep 11 2023 at 14:27):
Is there any way I can just wipe everything lean-related and restart? (not dangerous, this computer is brand new)
Mario Carneiro (Sep 11 2023 at 14:29):
yes, just go into that folder and run lake exe cache
Patrick Massot (Sep 11 2023 at 14:34):
Yaël Dillies said:
The terminal indeed mangles my umlaut. Do you think this could be related?
Why do you call this a umlaut, are you a German Breton?
Yaël Dillies (Sep 11 2023 at 14:35):
Nobody knows what I mean when I say tréma :disappointed:
Yaël Dillies (Sep 11 2023 at 14:37):
Plane taking off, I guess I'll read some notes instead of working on my project.
Patrick Massot (Sep 11 2023 at 14:40):
Do American people know more about umlauts?
Patrick Massot (Sep 11 2023 at 14:41):
It's funny that on computers they ended up looking the same. On paper they aren't meant to be the same, right?
Mario Carneiro (Sep 11 2023 at 14:43):
haha the wikipedia page is actually just called https://en.wikipedia.org/wiki/Two_dots_(diacritic)
Patrick Massot (Sep 11 2023 at 14:46):
No, they have https://en.wikipedia.org/wiki/Diaeresis_(diacritic) and https://en.wikipedia.org/wiki/Umlaut_(diacritic)
Patrick Massot (Sep 11 2023 at 14:47):
Can you see I should be grading?
Mario Carneiro (Sep 11 2023 at 14:50):
those two pages are about the distinct phonemic functions, not just the typographical mark
Mario Carneiro (Sep 11 2023 at 14:51):
apparently the trema redirects to "diaeresis"
Mario Carneiro (Sep 11 2023 at 14:52):
and that also accords with my experience, in english the usage in french would be called a diaeresis
Julian Berman (Sep 11 2023 at 16:04):
I don't think I've ever heard anyone in conversational US English say "diaresis" unless it was specifically to be pedantic about the difference, IME "umlaut" is indeed a bit more common to say/hear, but "two dots" is probably even more common.
Yaël Dillies (Sep 11 2023 at 17:05):
I guess I don't say "two dots" because it means ":" in French.
Mario Carneiro (Sep 11 2023 at 18:06):
Two dots or double-dot seems reasonable as a way to describe the diacritic if you don't know the word for it, but "umlaut" is just wrong unless you are talking about the thing that changes vowel sounds in german and related languages. English itself has some diaereses, although most uses are considered a bit pedantic these days, such as in "coöperation". See also https://www.newyorker.com/culture/culture-desk/the-curse-of-the-diaeresis
Ruben Van de Velde (Sep 11 2023 at 18:48):
(This seems to have less and less to do with the name of the topic)
Damiano Testa (Sep 11 2023 at 19:28):
To steer this back on course, we could rename the topic name to läkë ëxë cächë gët
and start a poll on how many diaeresis, umlauts, two-dots people see in the title.
Bolton Bailey (Sep 19 2023 at 12:37):
I am getting the following error:
boltonbailey@starlight mathlib4 % lake exe cache get
error: > git fetch origin # in directory ./lake-packages/std
error: stderr:
fatal: bad object refs/heads/main 2
error: https://github.com/leanprover/std4 did not send all necessary objects
error: external command `git` exited with code 1
What could be causing this?
Bolton Bailey (Sep 19 2023 at 12:38):
It's only one branch, #6500
Bolton Bailey (Sep 19 2023 at 12:41):
Actually check that, it is not working on master. I ran it once on master fine, but then I synced and ran it again and it failed.
Scott Morrison (Sep 19 2023 at 12:49):
That sounds more like git having trouble talking to github than anything specific to lake
or cache
.
Scott Morrison (Sep 19 2023 at 12:50):
If it's github's fault, waiting should suffice. In the meantime you might try rm -rf lake-packages
and see if that helps.
Bolton Bailey (Sep 19 2023 at 12:53):
rm -rf lake-packages
worked. Thanks!
Bolton Bailey (Sep 19 2023 at 13:39):
Incidentally, GitHub seems to be having issues as well, so maybe that was the cause.
Dagur Asgeirsson (Oct 16 2023 at 08:15):
I get a segmentation fault when running lake exe cache get
on current mathlib master. On a branch where I haven't merged master in a few days, no issue. Is someone else running into this?
Scott Morrison (Oct 16 2023 at 08:29):
It's working for me.
Scott Morrison (Oct 16 2023 at 08:31):
Can you try (in order, so we find out which one helps):
lake clean && lake exe cache get
rm lakefile.olean && lake exe cache get
Dagur Asgeirsson (Oct 16 2023 at 08:40):
The first one worked. Thanks!
Eric Rodriguez (Oct 16 2023 at 10:48):
I got this after pulling the latest master.
Scott Morrison (Oct 16 2023 at 11:04):
Thanks @Eric Rodriguez, that's very helpful. Let's see if @Mac Malone has an idea.
Eric Rodriguez (Oct 16 2023 at 11:09):
I will say that wiping and repulling fixed everything, so maybe it's just a transient issue after upgrading.
Bolton Bailey (Oct 19 2023 at 06:56):
Trying to toy around with the rw_search
branch, but I'm getting the error:
boltonbailey@starlight mathlib4 % lake exe cache get
error: no such file or directory (error code: 2)
file: ./x�]/./Cache/Main.lean
Bolton Bailey (Oct 19 2023 at 06:57):
What is this garbage in the path? Why does lake expect to find something there?
Sebastian Ullrich (Oct 19 2023 at 06:57):
Does deleting the lakefile.olean
help?
Bolton Bailey (Oct 19 2023 at 07:01):
The error changes!
boltonbailey@starlight mathlib4 % lake exe cache get
info: [0/8] Building Cache.IO
info: [1/8] Compiling Cache.IO
info: [1/8] Building Cache.Hashing
info: [2/8] Compiling Cache.Hashing
info: [2/8] Building Cache.Requests
info: [4/8] Compiling Cache.Requests
info: [4/8] Building Cache.Main
info: [5/8] Compiling Cache.Main
info: [8/8] Linking cache
info: stderr:
ld64.lld: warning: /usr/lib/system/libsystem_kernel.dylib has version 13.3.0, which is newer than target minimum of 13.0.0
ld64.lld: warning: /usr/lib/system/libsystem_platform.dylib has version 13.3.0, which is newer than target minimum of 13.0.0
ld64.lld: warning: /usr/lib/system/libsystem_pthread.dylib has version 13.3.0, which is newer than target minimum of 13.0.0
zsh: segmentation fault lake exe cache get
Scott Morrison (Oct 19 2023 at 07:02):
lake clean
?
Bolton Bailey (Oct 19 2023 at 07:02):
boltonbailey@starlight mathlib4 % lake clean
error: invalid argument (error code: 22, invalid argument)
file: ./lake-packages/std/.
Scott Morrison (Oct 19 2023 at 07:02):
Oh dear!
Scott Morrison (Oct 19 2023 at 07:03):
rm -rf lake-packages && lake clean
?
Bolton Bailey (Oct 19 2023 at 07:03):
Success!
Bolton Bailey (Oct 19 2023 at 07:04):
(lake exe cache get
now works too) Thanks!
Jireh Loreaux (Oct 20 2023 at 00:54):
Hmmmm.... I followed the steps in this thread and something deleted the contents of my mathlib4 repo! I'm okay with it, but I'm pretty religious about committing, and I even push local changes to my own private Gitea server when I'm not ready to put something out for public consumption. So I didn't lose anything important, but I can imagine other people being really upset about this.
Note: my first guess was that I was just an idiot and did something stupid like rm -rf .
instead of what was written, but as you can see from my bash history below, that was not the case.
[jireh@boston mathlib4]$ lake exe cache get
error: no such file or directory (error code: 2)
file: ./x�]/./Cache/Main.lean
[jireh@boston mathlib4]$ lake exe cache get!
error: no such file or directory (error code: 2)
file: ./x�]/./Cache/Main.lean
[jireh@boston mathlib4]$ lake clean
error: invalid argument (error code: 22, invalid argument)
file: ./lake-packages/std/.
[jireh@boston mathlib4]$ lake clean
error: invalid argument (error code: 22, invalid argument)
file: ./lake-packages/Qq/.
[jireh@boston mathlib4]$ rm -rf lake-packages && lake clean
info: cloning https://github.com/leanprover/std4 to ./lake-packages/std
info: cloning https://github.com/gebner/quote4 to ./lake-packages/Qq
info: cloning https://github.com/JLimperg/aesop to ./lake-packages/aesop
info: cloning https://github.com/mhuisi/lean4-cli.git to ./lake-packages/Cli
info: cloning https://github.com/EdAyers/ProofWidgets4 to ./lake-packages/proofwidgets
error: invalid argument (error code: 22, invalid argument)
file: ./.
[jireh@boston mathlib4]$ lake clean
error: no such file or directory (error code: 2)
file: ./lakefile.lean
[jireh@boston mathlib4]$ ls
[jireh@boston mathlib4]$ ls -a
. ..
If it matters, I think I may have just switch to a branch prior to the lakefile.olean
(which I didn't attempt to delete before this happened), so it was probably present.
Jireh Loreaux (Oct 20 2023 at 00:55):
@Scott Morrison :up:
Scott Morrison (Oct 20 2023 at 00:58):
That is pretty scary.
Jireh Loreaux (Oct 20 2023 at 00:58):
I'll see if I can reproduce.
Scott Morrison (Oct 20 2023 at 00:58):
I think it can only have been the lake clean
step?
Jireh Loreaux (Oct 20 2023 at 00:59):
I think so.
Jireh Loreaux (Oct 20 2023 at 01:01):
can't seem to repro, sorry.
Scott Morrison (Oct 20 2023 at 01:07):
@Mac Malone, can you think how lake clean
could possibly have done this?
Mario Carneiro (Oct 20 2023 at 05:14):
I think there was something in the recent update that is leading to memory corruption errors in lake during the migration, likely due to out of date lakefile.olean
built for the wrong version of lean
Mario Carneiro (Oct 20 2023 at 05:20):
I think you need the -R
argument to run lake clean
without accidentally using the bad lakefile.olean
Mario Carneiro (Oct 20 2023 at 05:20):
(IMO lake clean
should default to using -R
)
Mac Malone (Oct 20 2023 at 05:22):
Lake does currently perform a check that the toolchain and olean match (it does this by checking that the modification time for the olean is newer than the toolchain). However, if on goes backwards in time to an older Lean with a new olean, this could corrupt the configuration (and would still be a problem with more bullet-proof changes to Lean/Lake's loading of oleans because an older Lean version would not have said changes).
Mario Carneiro (Oct 20 2023 at 05:39):
This just happened to me (moved to a branch on the new version, then moved to a branch on the old version, and vscode called something that produced things in bin
and lib
folders in the current directory). Deleted lakefile.olean
to hopefully fix the issue, but I'm guessing the same issue will recur if I go back and forth again. This is a pretty serious issue, can we back out the affected lean version?
Scott Morrison (Oct 20 2023 at 06:20):
I've seen the mysterious appearance of bin
and lib
twice as well. :-(
Mario Carneiro (Oct 20 2023 at 06:26):
unfortunately I can't think of any way to make a change to the lakefile.olean format which will be handled correctly by the old version of lake, other than renaming lakefile.olean
to something else for long enough for people to move on
Mario Carneiro (Oct 20 2023 at 06:27):
in hindsight we really need lake to include the lean toolchain version in the lakefile.olean
or alongside it
Mario Carneiro (Oct 20 2023 at 06:29):
(Incidentally, this isn't a problem that would have happened if the lakefile.olean was in a more forward compatible format like, say, JSON :grinning_face_with_smiling_eyes: )
Mario Carneiro (Oct 20 2023 at 06:31):
non-self-describing binary formats are super hazardous wrt an unclear versioning system
Mac Malone (Oct 20 2023 at 07:16):
Actually, investigating this some more, I am now confused. Changing branches should update the modification time of lean-toolchain
when the Lean version changes and there was no stable or release candidate of Lean missing the toolchain file modification time check. So Lake should always reconfigure if the toolchain changes. The check in question is here in case anyone sees something I overlooked.
Mario Carneiro (Oct 20 2023 at 07:23):
have you replicated the issue @Mac Malone ?
Mac Malone (Oct 20 2023 at 07:24):
@Mario Carneiro Not yet, no. I plan to try my hand at writing some tests in an attempt to do so tomorrow, but we will see if I succeed.
Mauricio Collares (Oct 20 2023 at 07:26):
Does this bug "respect" the repository boundary? Or can it rm -rf .. if you're unlucky with the memory corruption?
Mario Carneiro (Oct 20 2023 at 07:27):
I don't think we will know the answer to that until we know exactly what the change is and how it manifests
Mario Carneiro (Oct 20 2023 at 07:28):
it does seem very dangerous though, I would rather we just yank the new version until we figure this out
Mario Carneiro (Oct 20 2023 at 07:32):
I just tried a couple variations on going back and forth between old versions and new and I wasn't able to reproduce the bug
Mac Malone (Oct 20 2023 at 07:46):
Mario Carneiro said:
it does seem very dangerous though, I would rather we just yank the new version until we figure this out
Which new version? As far as I can tell, it is not clear what version of Lean (or what combination of versions) is even the source of this bug.
Mario Carneiro (Oct 20 2023 at 07:49):
My impression is that the issues are correlated with peoples' first experiences with rc3
Mario Carneiro (Oct 20 2023 at 07:50):
Have there been any changes to the olean version in rc3?
Mac Malone (Oct 20 2023 at 07:52):
There are two different problems here:
- The olean mismatch not being caught
- The specific Lake ABI change that leads to this
lake clean
behavior.
As long as (1) is not resolved, any Lake ABI change could convincibly cause a problem equal to or worse than this depending on how things get shuffled around from the version switch.
Mario Carneiro (Oct 20 2023 at 07:53):
just checked, the lakefile.olean
s generated for std4 before and after rc3 are identical
Mario Carneiro (Oct 20 2023 at 07:54):
are there any other ABI hazards you can think of here?
Mauricio Collares (Oct 20 2023 at 07:54):
rc2 was very short-lived, probably should check between rc1 and rc3
Mario Carneiro (Oct 20 2023 at 07:54):
in particular, things that would impact the calculation of lean paths
Mac Malone (Oct 20 2023 at 07:55):
Yes, this specific ABI difference I am rather certain is due to lean4#2603 which occurred between rc1 and rc2/rc3.
Mario Carneiro (Oct 20 2023 at 07:55):
indeed there was a binary difference between rc1 and rc2
Mac Malone (Oct 20 2023 at 07:55):
Since that added a PackageConfig
field and shifted everything below it down one.
Mac Malone (Oct 20 2023 at 07:57):
Since srcDir
and buildDir
are adjacent this shift puts the srcDir
in the buildDir
spot and thus causes the lake clean
behavior when it is (for some unknown reason) read by some earlier Lean version without reconfiguring.
Mac Malone (Oct 20 2023 at 07:59):
One hot fix would be to just move postUpdate?
to the bottom of the config, which should keep the PackageConfig
ABI above the same. However, I am not sure whether this would instead cause a segfault or something worse.
Mario Carneiro (Oct 20 2023 at 08:00):
in this case, a segfault is better than deleting your directory
Mario Carneiro (Oct 20 2023 at 08:01):
I wonder whether you can indirectly embed version numbers in the olean by making package
add a declaration to the environment with the version number
Mac Malone (Oct 20 2023 at 08:02):
I am not sure if a segfault would be the result. It could just as easily bump other things around to produce a worse result. (I think?)
Mauricio Collares (Oct 20 2023 at 08:03):
I don't understand oleans, but does the new field affect just its containing structure? Or does it "shift everything down" in the olean potentially corrupting reading later-occurring structures in older lean versions?
Mario Carneiro (Oct 20 2023 at 08:03):
usually if you access something off the end of a VM object you get a segfault, although for packed olean objects you are probably more likely to run into the next object
Mario Carneiro (Oct 20 2023 at 08:04):
the first field of the next object is the rc count which is always 0, so that's going to cause a crash if you are expecting a composite lean object (it's the null pointer)
Mario Carneiro (Oct 20 2023 at 08:04):
if the new field is a UInt8 or something then you might just read 0
Mac Malone (Oct 20 2023 at 08:06):
The version number definition sounds like a good solution.
Mario Carneiro (Oct 20 2023 at 08:07):
remember, keep it simple, you want to depend on the layout of as few lean objects as possible to minimize the possibility of corruption of the version decl
Mauricio Collares (Oct 20 2023 at 08:08):
So, iiuc, the chicken-and-egg problem here is that Lake typically checks if the oleans match the toolchain on a lake build
(by doing traces or whatever), but older versions of Lean pick up and use lakefile.olean
and there are no such checks. In Lean 3, oleans embedded Lean.versionString, is this no longer the case in Lean 4?
Mario Carneiro (Oct 20 2023 at 08:08):
the old versions of lake have a check, but it's not as precise as it should be
Mac Malone (Oct 20 2023 at 08:08):
Mario Carneiro said:
remember, keep it simple, you want to depend on the layout of as few lean objects as possible to minimize the possibility of corruption of the version decl
What do you mean by this?
Mario Carneiro (Oct 20 2023 at 08:08):
it seems to work in basic tests though
Mauricio Collares (Oct 20 2023 at 08:09):
Might be good to add some sanity checks to Lean itself (to the place which actually mmaps the olean) if it still makes sense in the current architecture
Mac Malone (Oct 20 2023 at 08:09):
Mario Carneiro said:
the old versions of lake have a check, but it's not as precise as it should be
More specifically, it is not clear what the imprecision even is.
Mario Carneiro (Oct 20 2023 at 08:09):
@Mac Malone You have to traverse several lean objects in order to get to the version decl's value. Reading an olean with the wrong ABI is pretty fraught and every read could cause something to blow up
Mario Carneiro (Oct 20 2023 at 08:10):
But thankfully the ABI of Expr and friends is fairly stable (although, I have a PR in progress...)
Mac Malone (Oct 20 2023 at 08:12):
Yeah, this solution is not bulletproof. Ideally, the Lean version would at least be in the olean to check (even if only optionally verified).
Mario Carneiro (Oct 20 2023 at 08:13):
I was thinking about that, but I think we don't necessarily want it always, since it changes the hashes
Mario Carneiro (Oct 20 2023 at 08:13):
in this case it would be sufficient if you could just make some room in lakefile.olean to store the version inline
Mauricio Collares (Oct 20 2023 at 08:14):
Mario Carneiro said:
I was thinking about that, but I think we don't necessarily want it always, since it changes the hashes
What's the concern? Lean.versionString is already hashed in lake exe cache get, and I'd certainly expect a full rebuild when changing toolchains.
Mac Malone (Oct 20 2023 at 08:14):
Right now, I think it is relatively unlikely for oleans to generally be compatible across versions. Though, I guess it could be a toggle in both directions (on save and read).
Mario Carneiro (Oct 20 2023 at 08:15):
It's also not bulletproof because the ABI of an olean depends on all of its dependencies, not just the lean toolchain version
Mario Carneiro (Oct 20 2023 at 08:15):
i.e. you can add a field to a struct without changing the lean version
Mac Malone (Oct 20 2023 at 08:15):
In essence, when the time comes that oleans are modular, I can see the value in not checking, but I think that would already be a big enough breaking change that changing the version embedding would not be a major concern.
Mario Carneiro (Oct 20 2023 at 08:15):
this is less of a problem for lakefile.olean because all of its dependencies are in the toolchain
Mac Malone (Oct 20 2023 at 08:16):
Mario Carneiro said:
It's also not bulletproof because the ABI of an olean depends on all of its dependencies, not just the lean toolchain version
Oh, I did not realize it was that frail.
Mario Carneiro (Oct 20 2023 at 08:17):
I mean, if you say a struct has 4 fields then that's how the ABI goes
Mac Malone (Oct 20 2023 at 08:18):
Yeah, It just never fully processed that that would be an implication of storing arbitrary data in the extensions.
Mario Carneiro (Oct 20 2023 at 08:18):
do you mean lake data or user data?
Mac Malone (Oct 20 2023 at 08:19):
Any arbitrary Lean data. Upon reflection, though, it is quite obvious.
Mario Carneiro (Oct 20 2023 at 08:19):
If you store stuff in Exprs then it's a bit more stable
Mario Carneiro (Oct 20 2023 at 08:19):
90% of (normal) .olean files is composed of Exprs
Mac Malone (Oct 20 2023 at 08:20):
Apparently, not much more because evalConst
is presumably the culprit here. (Apparently it does not check that the constructor is sized as expected).
Mario Carneiro (Oct 20 2023 at 08:21):
it can't, really
Mario Carneiro (Oct 20 2023 at 08:22):
I actually have a tool for generating "olean schemas" for lean types, but I'm not exactly sure how to mix it into the processes here
Mac Malone (Oct 20 2023 at 08:22):
Oh, yeah, because evalConst
just has the erased type not the actual type definition to check against.
Mario Carneiro (Oct 20 2023 at 08:27):
Mario Carneiro said:
I actually have a tool for generating "olean schemas" for lean types, but I'm not exactly sure how to mix it into the processes here
Here's a sketch: we run this on every commit of lake to produce a schema for what can appear in an olean file, and commit it to the repo. A CI job recalculates the schema and updates this file / errors if it is not updated, and any update of the file comes with a requirement to bump the "olean version", which is stored in the file somewhere easily (and stably) accessible and used for validation
Mario Carneiro (Oct 20 2023 at 08:28):
The reason for the indirection here is because recalculating the schema at runtime would be too expensive, it's basically a compile time known value but it's also a complex object and a bit big to put in the olean file itself
Mac Malone (Oct 20 2023 at 08:30):
Mario Carneiro said:
bump the "olean version", which is stored in the file somewhere easily (and stably) accessible and used for validation
Would this "olean version" be a parameter of saveModuleData
and return readModuleData
because those are currently C black boxes making the entire olean format fixed and opaque to Lean.
Mario Carneiro (Oct 20 2023 at 08:30):
yes
Mario Carneiro (Oct 20 2023 at 08:30):
this would be part of the olean file header
Mario Carneiro (Oct 20 2023 at 08:31):
(PS: if the olean file format changes I need to be made aware because leantar
, used by lake exe cache
will have to be updated)
Mac Malone (Oct 20 2023 at 08:32):
The most general thing I could think of for this header info would just be to make it an arbitrary Expr
one can add to the header. Does that seem reasonable?
Mario Carneiro (Oct 20 2023 at 08:32):
I would rather it not be a lean object, this runs into the same ABI issues
Mario Carneiro (Oct 20 2023 at 08:32):
something with a fixed layout would be better, like a string or list of strings
Mario Carneiro (Oct 20 2023 at 08:33):
or a number in this case
Mac Malone (Oct 20 2023 at 08:33):
In that case, maybe just a direct ByteArray
?
Mario Carneiro (Oct 20 2023 at 08:34):
sure, but now you need a secondary parser and a spec for what goes in that array, and when that spec changes you have ABI issues all over again
Mario Carneiro (Oct 20 2023 at 08:35):
This byte array would also have to be copied out of the olean, most likely, because it's not a proper lean object
Mac Malone (Oct 20 2023 at 08:35):
True, but that same issue is true for a string or a number.
Mario Carneiro (Oct 20 2023 at 08:35):
so don't put too much in it since it doesn't get the mmap goodness
Mario Carneiro (Oct 20 2023 at 08:37):
(this copying process is also exactly what makes it more segfault-resistant than regular lean objects)
Mac Malone (Oct 20 2023 at 08:38):
A ByteArray
just provides more flexibility and potential space saving for the header format whereas a number would give at most 64 bits and a string may be space inefficient and require undesirable parsing overhead.
Mario Carneiro (Oct 20 2023 at 08:38):
There is also the matter of the very large (and excitable) magic number at the beginning of every olean file, oleanfile!!!!!!!
. I can't resist the idea of adding versioning as oleanfile!!!1!!!
, oleanfile!!!2!!!
etc
Mario Carneiro (Oct 20 2023 at 08:40):
I think 32 bits is more than enough for a version number
Mario Carneiro (Oct 20 2023 at 08:40):
if that's the only requirement
Mario Carneiro (Oct 20 2023 at 08:41):
if you want to store random extra data I would just make use of some spare padding in the file
Mac Malone (Oct 20 2023 at 08:42):
Ideally, Lake would eventually store a cryptographic hash of the olean inputs.
Mac Malone (Oct 20 2023 at 08:42):
Though that is likely very long-term.
Mario Carneiro (Oct 20 2023 at 08:43):
that could be part of the lean data structure proper though, no?
Mario Carneiro (Oct 20 2023 at 08:43):
like in the EnvHeader
Mac Malone (Oct 20 2023 at 08:44):
No? Wouldn't that still have ABI problems if the core data structures change? Inversely, it would also be helpful when the inputs hopefully do not change every new Lean version and we can reuse oleans more often.
Mario Carneiro (Oct 20 2023 at 08:44):
Currently the olean header consists only of the magic number, the offset of the beginning of the file (used for relocating all lean objects if this location needs to be modified), and the pointer to the root object
Mario Carneiro (Oct 20 2023 at 08:45):
Yes, it would have ABI problems, the version is intended to defend against ABI changes
Mario Carneiro (Oct 20 2023 at 08:46):
Adding hashes to the olean file indeed make me concerned about lots of files getting lots of spurious updates
Mario Carneiro (Oct 20 2023 at 08:48):
The point of having an olean version rather than just using the toolchain version is that sometimes the toolchain changes without ABI changes and you can capitalize on this
Mario Carneiro (Oct 20 2023 at 08:48):
in particular, rc -> stable
Patrick Massot (Oct 23 2023 at 00:57):
Now that lean-rc4 is out, how are we meant to solve the issue described [https://leanprover.zulipchat.com/#narrow/stream/287929-mathlib4/topic/lake.20exe.20cache.20get.20errors/near/397441263]? I had the file: ./x�]/./Cache/Main.lean
error message. I tried removing lakefile.olean
and now every lake exe cache get
simply gets a segmentation fault. This is in Mathlib master.
Patrick Massot (Oct 23 2023 at 00:57):
@Mac Malone
Mac Malone (Oct 23 2023 at 01:00):
@Patrick Massot From that previous thread, the problem was that you also need to delete the lakefile.olean
of all dependences as well (i.e., those in lake-packages
). You could also just delete the whole lake-packages
directory and start over.
Patrick Massot (Oct 23 2023 at 01:01):
Thanks, it seems to work.
Oliver Nash (Nov 30 2023 at 12:36):
lake exe cache get
has stopped working for me after getting the latest master (currently at this commit, which happens to be one of my own PRs though this is not relevant).
I am running MacOS and I see just the following:
➜ mathlib4 git:(master) lake exe cache get
Attempting to download 3960 file(s)
Downloaded: 0 file(s) [attempted 3960/3960 = 100%] (0% success)
Warning: some files were not found in the cache.
This usually means that your local checkout of mathlib4 has diverged from upstream.
If you push your commits to a branch of the mathlib4 repository, CI will build the oleans and they will be available later.
No cache files to decompress
Oliver Nash (Nov 30 2023 at 12:37):
Anyone else having issues?
Damiano Testa (Nov 30 2023 at 12:38):
I think that I had the same this morning, but I erased .lake
and then it worked.
Oliver Nash (Nov 30 2023 at 12:38):
Thanks, that worked.
Damiano Testa (Nov 30 2023 at 12:39):
(Maybe I also merged master... there has been a new release, I think.)
Oliver Nash (Nov 30 2023 at 12:39):
Needless to say, this should not be necessary.
Andrew Yang (Nov 30 2023 at 12:56):
I think lake exe cache get!
would be helpful too?
Mario Carneiro (Nov 30 2023 at 21:23):
If someone can reproduce this that would be helpful
Eric Wieser (Nov 30 2023 at 21:49):
I've had this multiple times when bumping old branches to the latest lean today
Eric Wieser (Nov 30 2023 at 21:50):
I think lake clean
always resolved things
Mario Carneiro (Dec 01 2023 at 00:09):
@Mac Malone @Scott Morrison and I debugged this today. Repro:
git checkout 1937b42b
rm -rf .lake
lake exe cache get
git checkout 85239de6
lake exe cache get # fails to find anything
lake clean
lake exe cache get # works
The problem is a combination of several factors:
- The
4.3.0
release was accidentally a re-tag of the same commit as4.3.0-rc2
(It's supposed to be the same content, but usually there are some release notes or something so that it's not exactly the same commit) - The
cache
executable usesLean.versionString
for the root hash, which is different between4.3.0-rc2
and4.3.0
because it contains the literal toolchain name in it - This version string is baked into the
cache
executable itself based on the version used to compilecache
, it is not determined at runtime - Lake only recompiles executables when the lean git hash changes, so in the second
lake exe cache get
invocation above it does not recompilecache
and so it uses the4.3.0-rc2
version string in the root hash, which throws all the hashes off - Doing
lake clean
deletes the cache executable, and the second run rebuilds it with the correctLean.versionString
so it finds the files as expected
Mario Carneiro (Dec 01 2023 at 00:16):
The easy short term fix is to only use Lean.githash
in cache
instead of Lean.versionString
, so that lake invalidates it properly. #8755
Last updated: Dec 20 2023 at 11:08 UTC