Zulip Chat Archive

Stream: mathlib4

Topic: lake exe cache get errors


Floris van Doorn (Mar 16 2023 at 13:32):

I'm noticing many errors when executing lake exe cache get! in the last few days.
For example here the CI fails with

Attempting to download 1749 file(s)
OpenSSL SSL_read: Connection reset by peer, errno 104
OpenSSL SSL_read: Connection reset by peer, errno 104
2 download(s) failed

which might just be a connection hiccup.

Floris van Doorn (Mar 16 2023 at 13:32):

I also got this one locally (on Windows):

Floris@MSI MINGW64 /d/projects/mathlib4b ((094cafa95...))
$ lake exe cache get
error: > git fetch origin    # in directory .\lake-packages\std
error: stderr:
error: cannot lock ref 'refs/remotes/origin/main': is at d4107bc2bd28bc70e8327578c16acb904d2bcfd0 but expected 7d37ab32aac2004fa32e1df5b38d878c2301cd71
From https://github.com/leanprover/std4
 ! 7d37ab3..d4107bc  main       -> origin/main  (unable to update local ref)
error: external command `git` exited with code 1

and then all future attempts give

Floris@MSI MINGW64 /d/projects/mathlib4b (kmill_congr_fallback)
$ lake exe cache get
info: updating .\lake-packages\Qq to revision cc915afc9526e904a7b61f660d330170f9d60dd7
error: > git checkout --detach cc915afc9526e904a7b61f660d330170f9d60dd7    # in directory .\lake-packages\Qq
error: stderr:
fatal: Unable to create 'D:/projects/mathlib4b/lake-packages/Qq/.git/index.lock': File exists.

Another git process seems to be running in this repository, e.g.
an editor opened by 'git commit'. Please make sure all processes
are terminated then try again. If it still fails, a git process
may have crashed in this repository earlier:
remove the file manually to continue.
error: external command `git` exited with code 128

Floris van Doorn (Mar 16 2023 at 13:33):

This was easily fixed, but it looks like a bug?

Arthur Paulino (Mar 16 2023 at 14:08):

Are you running parallel processes? Is your VS Code open while you call cache?

Floris van Doorn (Mar 16 2023 at 15:10):

Oh, upon reflection is was probably VS Code. I did have VS Code open, but only in a different project, so I thought it wouldn't interfere. However, both these projects were different worktrees of the same git repository, so the VS Code instance probably was still the thing interfering.

Oliver Nash (Sep 04 2023 at 10:11):

At LFTCM this morning we have encountered at least two people for whom we could not get the cache to work. Both users are Windows 11 users and with third-party antivirus software. We observed two problems:

  1. It was necessary to disable the antivirus software to get lake exe cache get to run.
  2. Even after disabling antivirus and apparently getting lake exe cache get to run successfully, Lean does not seem to find the cache and wants to compile everything.

Oliver Nash (Sep 04 2023 at 10:11):

Has this been observed before? The users are surviving on Gitpod for now but obviously this is not a good situation.

Scott Morrison (Sep 04 2023 at 10:48):

It might be helpful to know the error messages when running lake exe cache get with the antivirus enabled.

Scott Morrison (Sep 04 2023 at 10:49):

Similarly, showing us the actual output from lake exe cache get and lake build might be helpful?

Eric Wieser (Sep 04 2023 at 12:22):

Reagrding 1; I think the error code was 35; though I don't have the message to hand. Essentially, curl is complaining that someone MiTM'd its connection, which is correct; the antivirus did

Sebastian Ullrich (Sep 04 2023 at 12:24):

There might not exist a reasonable fix to such issues. I assume we're not talking about Microsoft's AV?

Eric Wieser (Sep 04 2023 at 12:26):

No, the two that caused trouble were AVG and Kaspersky

Eric Wieser (Sep 04 2023 at 12:26):

There was another user who had the Microsoft AV using up half their CPU

Malthe Sporring (Sep 04 2023 at 12:27):

The error code was "curl: (35) schannel: next InitializeSecurityContext failed: Unknown error (0x80092012) - The revocation function was unable to check revocation for the certificate"

Eric Wieser (Sep 04 2023 at 12:27):

Eric Wieser said:

There was another user who had the Microsoft AV using up half their CPU (I think I personally had this too)

We told it to ignore the operation of the lean.exe and that seemed to quieten things down, though ultimately their disk speed seemed to be throttling Lean so much to be unusable

Malthe Sporring (Sep 04 2023 at 12:28):

Although I don't seem able to recreate it. I get "No files to download", but still Visual Studio is insisting on rebuilding everything.

Eric Wieser (Sep 04 2023 at 12:28):

You can do lean exe cache get! to force it to restart from scratch

Eric Wieser (Sep 04 2023 at 12:28):

(and throw away your previous cache)

Malthe Sporring (Sep 04 2023 at 12:29):

That returns "Expected exactly one file name"

Malthe Sporring (Sep 04 2023 at 12:34):

lake not lean:)

Malthe Sporring (Sep 04 2023 at 12:40):

That replicates it. I now get a constant stream of schannel: next InitializeSecurityContext failed (...)

Mauricio Collares (Sep 04 2023 at 15:50):

For Kaspersky, you can disable the MiTM attempts (a.k.a. "encrypted connection scanning") following these instructions: https://support.kaspersky.com/KESWin/11.6.0/en-US/175124.htm. Similarly, for AVG, you can disable HTTPS scanning following the instructions at https://support.avg.com/SupportArticleView?l=en&urlName=Use-AVG-Antivirus-HTTPS-scan&supportType=home.

Yaël Dillies (Sep 11 2023 at 10:10):

I'm trying to get cache on my Windows machine for my project LeanCamCombi but lake exe cache get keeps on failing:

$ lake exe cache get
info: cloning https://github.com/leanprover-community/mathlib4.git to .\lake-packages\mathlib
info: cloning https://github.com/leanprover/std4 to .\lake-packages\std
info: cloning https://github.com/gebner/quote4 to .\lake-packages\Qq
info: cloning https://github.com/JLimperg/aesop to .\lake-packages\aesop
info: cloning https://github.com/mhuisi/lean4-cli.git to .\lake-packages\Cli
info: cloning https://github.com/EdAyers/ProofWidgets4 to .\lake-packages\proofwidgets
info: Downloading proofwidgets/v0.0.13/windows-64.tar.gz
info: Unpacking proofwidgets/v0.0.13/windows-64.tar.gz
info: [1/9] Building Cache.IO
error: > LEAN_PATH=.\build\lib;.\lake-packages\proofwidgets\build\lib;.\lake-packages\Cli\build\lib;.\lake-packages\mathlib\build\lib;.\lake-packages\Qq\build\lib;.\lake-packages\aesop\build\lib;.\lake-packages\std\build\lib PATH c:\Users\YaÙl\.elan\toolchains\leanprover--lean4---v4.0.0\bin\lean.exe -Dpp.unicode.fun=true -DautoImplicit=false -DrelaxedAutoImplicit=false .\lake-packages\mathlib\.\.\Cache\IO.lean -R .\lake-packages\mathlib\.\. -o .\lake-packages\mathlib\build\lib\Cache\IO.olean -i .\lake-packages\mathlib\build\lib\Cache\IO.ilean -c .\lake-packages\mathlib\build\ir\Cache\IO.c
error: stdout:
.\lake-packages\mathlib\.\.\Cache\IO.lean:7:0: error: unknown package 'Init'
.\lake-packages\mathlib\.\.\Cache\IO.lean:17:11: error: expected ')', ',' or ':'
.\lake-packages\mathlib\.\.\Cache\IO.lean:24:7: error: expected '=>'
.\lake-packages\mathlib\.\.\Cache\IO.lean:29:23: error: unknown identifier 'UInt64'
.\lake-packages\mathlib\.\.\Cache\IO.lean:29:23: error: unknown constant 'sorryAx'

Yaël Dillies (Sep 11 2023 at 10:10):

It's as if lake exe cache get ran on the wrong version of Lean somehow. Does anyone have a clue?

Sky Wilshaw (Sep 11 2023 at 10:18):

It seems to be working for me on my Windows 10 machine with a fresh clone, so I can't reproduce the error :(

Lake output

Sky Wilshaw (Sep 11 2023 at 10:19):

(not sure how to put code blocks in spoiler blocks)

Yaël Dillies (Sep 11 2023 at 10:23):

Like this

Kevin Buzzard (Sep 11 2023 at 10:50):

When everything is broken for me, I delete the lake-packages directory and then do lake exe cache get!. Does that make things any better?

Kevin Buzzard (Sep 11 2023 at 10:51):

Also I guess you can do lake update in the middle (after deleting, before cache) if this is a project which depends on mathlib rather than mathlib itself.

Yaël Dillies (Sep 11 2023 at 10:52):

Nope, deleting lake-packages, lake clean, lake exe cache get!, lake update do not help.

Kevin Buzzard (Sep 11 2023 at 10:52):

Try copying lean-toolchain from mathlib to your project, and make sure you are elan v3 with something like elan self update

Kevin Buzzard (Sep 11 2023 at 10:53):

and then do everything again in a random order, at least this is what I usually do when things don't work for me.

Sky Wilshaw (Sep 11 2023 at 10:54):

It's worth noting that the project compiles on my windows machine and on CI. I copied mathlib's lean-toolchain into this project yesterday.

Kevin Buzzard (Sep 11 2023 at 10:54):

If none of these help then maybe post the current state of the error.

Yaël Dillies (Sep 11 2023 at 10:54):

Did all that. I'll try Sophie's votive candle :(

Sky Wilshaw (Sep 11 2023 at 10:57):

Does lake build work (as in, start compiling things) without lake exe cache get?

Scott Morrison (Sep 11 2023 at 13:17):

Hmm, lake exe cache get works fine for me (mac os) in this project.

Scott Morrison (Sep 11 2023 at 13:17):

Have you reinstalled elan? That's the only emergency measure I haven't seen tried above. :-)

Mario Carneiro (Sep 11 2023 at 13:22):

Yaël Dillies said:

c:\Users\YaÙl\.elan\toolchains\leanprover--lean4---v4.0.0\bin\lean.exe

@Yaël Dillies Is your name YaÙl?

Yaël Dillies (Sep 11 2023 at 14:11):

The terminal indeed mangles my umlaut. Do you think this could be related?

Mario Carneiro (Sep 11 2023 at 14:12):

it's not clear if this is a terminal display issue or something else, but the error message suggests that your builtin lean path was not set correctly so it could be relevant

Mario Carneiro (Sep 11 2023 at 14:13):

can you build it from inside lake-packages/mathlib?

Yaël Dillies (Sep 11 2023 at 14:22):

Am currently retrying everything in C:/Users/Public. Will tell you how it goes if the airplane doesn't take off before it finishes loading.

Yaël Dillies (Sep 11 2023 at 14:25):

Mario Carneiro said:

can you build it from inside lake-packages/mathlib?

What do you mean exactly? Should I run lake build inside that folder?

Yaël Dillies (Sep 11 2023 at 14:26):

The C:/Users/Public trick didn't work because it's looking for something under the mangled folder.

Yaël Dillies (Sep 11 2023 at 14:27):

Is there any way I can just wipe everything lean-related and restart? (not dangerous, this computer is brand new)

Mario Carneiro (Sep 11 2023 at 14:29):

yes, just go into that folder and run lake exe cache

Patrick Massot (Sep 11 2023 at 14:34):

Yaël Dillies said:

The terminal indeed mangles my umlaut. Do you think this could be related?

Why do you call this a umlaut, are you a German Breton?

Yaël Dillies (Sep 11 2023 at 14:35):

Nobody knows what I mean when I say tréma :disappointed:

Yaël Dillies (Sep 11 2023 at 14:37):

Plane taking off, I guess I'll read some notes instead of working on my project.

Patrick Massot (Sep 11 2023 at 14:40):

Do American people know more about umlauts?

Patrick Massot (Sep 11 2023 at 14:41):

It's funny that on computers they ended up looking the same. On paper they aren't meant to be the same, right?

Mario Carneiro (Sep 11 2023 at 14:43):

haha the wikipedia page is actually just called https://en.wikipedia.org/wiki/Two_dots_(diacritic)

Patrick Massot (Sep 11 2023 at 14:46):

No, they have https://en.wikipedia.org/wiki/Diaeresis_(diacritic) and https://en.wikipedia.org/wiki/Umlaut_(diacritic)

Patrick Massot (Sep 11 2023 at 14:47):

Can you see I should be grading?

Mario Carneiro (Sep 11 2023 at 14:50):

those two pages are about the distinct phonemic functions, not just the typographical mark

Mario Carneiro (Sep 11 2023 at 14:51):

apparently the trema redirects to "diaeresis"

Mario Carneiro (Sep 11 2023 at 14:52):

and that also accords with my experience, in english the usage in french would be called a diaeresis

Julian Berman (Sep 11 2023 at 16:04):

I don't think I've ever heard anyone in conversational US English say "diaresis" unless it was specifically to be pedantic about the difference, IME "umlaut" is indeed a bit more common to say/hear, but "two dots" is probably even more common.

Yaël Dillies (Sep 11 2023 at 17:05):

I guess I don't say "two dots" because it means ":" in French.

Mario Carneiro (Sep 11 2023 at 18:06):

Two dots or double-dot seems reasonable as a way to describe the diacritic if you don't know the word for it, but "umlaut" is just wrong unless you are talking about the thing that changes vowel sounds in german and related languages. English itself has some diaereses, although most uses are considered a bit pedantic these days, such as in "coöperation". See also https://www.newyorker.com/culture/culture-desk/the-curse-of-the-diaeresis

Ruben Van de Velde (Sep 11 2023 at 18:48):

(This seems to have less and less to do with the name of the topic)

Damiano Testa (Sep 11 2023 at 19:28):

To steer this back on course, we could rename the topic name to läkë ëxë cächë gët and start a poll on how many diaeresis, umlauts, two-dots people see in the title.

Bolton Bailey (Sep 19 2023 at 12:37):

I am getting the following error:

boltonbailey@starlight mathlib4 % lake exe cache get
error: > git fetch origin    # in directory ./lake-packages/std
error: stderr:
fatal: bad object refs/heads/main 2
error: https://github.com/leanprover/std4 did not send all necessary objects
error: external command `git` exited with code 1

What could be causing this?

Bolton Bailey (Sep 19 2023 at 12:38):

It's only one branch, #6500

Bolton Bailey (Sep 19 2023 at 12:41):

Actually check that, it is not working on master. I ran it once on master fine, but then I synced and ran it again and it failed.

Scott Morrison (Sep 19 2023 at 12:49):

That sounds more like git having trouble talking to github than anything specific to lake or cache.

Scott Morrison (Sep 19 2023 at 12:50):

If it's github's fault, waiting should suffice. In the meantime you might try rm -rf lake-packages and see if that helps.

Bolton Bailey (Sep 19 2023 at 12:53):

rm -rf lake-packages worked. Thanks!

Bolton Bailey (Sep 19 2023 at 13:39):

Incidentally, GitHub seems to be having issues as well, so maybe that was the cause.

Dagur Asgeirsson (Oct 16 2023 at 08:15):

I get a segmentation fault when running lake exe cache get on current mathlib master. On a branch where I haven't merged master in a few days, no issue. Is someone else running into this?

Scott Morrison (Oct 16 2023 at 08:29):

It's working for me.

Scott Morrison (Oct 16 2023 at 08:31):

Can you try (in order, so we find out which one helps):

  1. lake clean && lake exe cache get
  2. rm lakefile.olean && lake exe cache get

Dagur Asgeirsson (Oct 16 2023 at 08:40):

The first one worked. Thanks!

Eric Rodriguez (Oct 16 2023 at 10:48):

image.png

I got this after pulling the latest master.

Scott Morrison (Oct 16 2023 at 11:04):

Thanks @Eric Rodriguez, that's very helpful. Let's see if @Mac Malone has an idea.

Eric Rodriguez (Oct 16 2023 at 11:09):

I will say that wiping and repulling fixed everything, so maybe it's just a transient issue after upgrading.

Bolton Bailey (Oct 19 2023 at 06:56):

Trying to toy around with the rw_search branch, but I'm getting the error:

boltonbailey@starlight mathlib4 % lake exe cache get
error: no such file or directory (error code: 2)
  file: ./x]/./Cache/Main.lean

Bolton Bailey (Oct 19 2023 at 06:57):

What is this garbage in the path? Why does lake expect to find something there?

Sebastian Ullrich (Oct 19 2023 at 06:57):

Does deleting the lakefile.olean help?

Bolton Bailey (Oct 19 2023 at 07:01):

The error changes!

boltonbailey@starlight mathlib4 % lake exe cache get
info: [0/8] Building Cache.IO
info: [1/8] Compiling Cache.IO
info: [1/8] Building Cache.Hashing
info: [2/8] Compiling Cache.Hashing
info: [2/8] Building Cache.Requests
info: [4/8] Compiling Cache.Requests
info: [4/8] Building Cache.Main
info: [5/8] Compiling Cache.Main
info: [8/8] Linking cache
info: stderr:
ld64.lld: warning: /usr/lib/system/libsystem_kernel.dylib has version 13.3.0, which is newer than target minimum of 13.0.0
ld64.lld: warning: /usr/lib/system/libsystem_platform.dylib has version 13.3.0, which is newer than target minimum of 13.0.0
ld64.lld: warning: /usr/lib/system/libsystem_pthread.dylib has version 13.3.0, which is newer than target minimum of 13.0.0
zsh: segmentation fault  lake exe cache get

Scott Morrison (Oct 19 2023 at 07:02):

lake clean?

Bolton Bailey (Oct 19 2023 at 07:02):

boltonbailey@starlight mathlib4 % lake clean
error: invalid argument (error code: 22, invalid argument)
  file: ./lake-packages/std/.

Scott Morrison (Oct 19 2023 at 07:02):

Oh dear!

Scott Morrison (Oct 19 2023 at 07:03):

rm -rf lake-packages && lake clean?

Bolton Bailey (Oct 19 2023 at 07:03):

Success!

Bolton Bailey (Oct 19 2023 at 07:04):

(lake exe cache get now works too) Thanks!

Jireh Loreaux (Oct 20 2023 at 00:54):

Hmmmm.... I followed the steps in this thread and something deleted the contents of my mathlib4 repo! I'm okay with it, but I'm pretty religious about committing, and I even push local changes to my own private Gitea server when I'm not ready to put something out for public consumption. So I didn't lose anything important, but I can imagine other people being really upset about this.

Note: my first guess was that I was just an idiot and did something stupid like rm -rf . instead of what was written, but as you can see from my bash history below, that was not the case.

[jireh@boston mathlib4]$ lake exe cache get
error: no such file or directory (error code: 2)
  file: ./x�]/./Cache/Main.lean
[jireh@boston mathlib4]$ lake exe cache get!
error: no such file or directory (error code: 2)
  file: ./x�]/./Cache/Main.lean
[jireh@boston mathlib4]$ lake clean
error: invalid argument (error code: 22, invalid argument)
  file: ./lake-packages/std/.
[jireh@boston mathlib4]$ lake clean
error: invalid argument (error code: 22, invalid argument)
  file: ./lake-packages/Qq/.
[jireh@boston mathlib4]$ rm -rf lake-packages && lake clean
info: cloning https://github.com/leanprover/std4 to ./lake-packages/std
info: cloning https://github.com/gebner/quote4 to ./lake-packages/Qq
info: cloning https://github.com/JLimperg/aesop to ./lake-packages/aesop
info: cloning https://github.com/mhuisi/lean4-cli.git to ./lake-packages/Cli
info: cloning https://github.com/EdAyers/ProofWidgets4 to ./lake-packages/proofwidgets
error: invalid argument (error code: 22, invalid argument)
  file: ./.
[jireh@boston mathlib4]$ lake clean
error: no such file or directory (error code: 2)
  file: ./lakefile.lean
[jireh@boston mathlib4]$ ls
[jireh@boston mathlib4]$ ls -a
.  ..

If it matters, I think I may have just switch to a branch prior to the lakefile.olean (which I didn't attempt to delete before this happened), so it was probably present.

Jireh Loreaux (Oct 20 2023 at 00:55):

@Scott Morrison :up:

Scott Morrison (Oct 20 2023 at 00:58):

That is pretty scary.

Jireh Loreaux (Oct 20 2023 at 00:58):

I'll see if I can reproduce.

Scott Morrison (Oct 20 2023 at 00:58):

I think it can only have been the lake clean step?

Jireh Loreaux (Oct 20 2023 at 00:59):

I think so.

Jireh Loreaux (Oct 20 2023 at 01:01):

can't seem to repro, sorry.

Scott Morrison (Oct 20 2023 at 01:07):

@Mac Malone, can you think how lake clean could possibly have done this?

Mario Carneiro (Oct 20 2023 at 05:14):

I think there was something in the recent update that is leading to memory corruption errors in lake during the migration, likely due to out of date lakefile.olean built for the wrong version of lean

Mario Carneiro (Oct 20 2023 at 05:20):

I think you need the -R argument to run lake clean without accidentally using the bad lakefile.olean

Mario Carneiro (Oct 20 2023 at 05:20):

(IMO lake clean should default to using -R)

Mac Malone (Oct 20 2023 at 05:22):

Lake does currently perform a check that the toolchain and olean match (it does this by checking that the modification time for the olean is newer than the toolchain). However, if on goes backwards in time to an older Lean with a new olean, this could corrupt the configuration (and would still be a problem with more bullet-proof changes to Lean/Lake's loading of oleans because an older Lean version would not have said changes).

Mario Carneiro (Oct 20 2023 at 05:39):

This just happened to me (moved to a branch on the new version, then moved to a branch on the old version, and vscode called something that produced things in bin and lib folders in the current directory). Deleted lakefile.olean to hopefully fix the issue, but I'm guessing the same issue will recur if I go back and forth again. This is a pretty serious issue, can we back out the affected lean version?

Scott Morrison (Oct 20 2023 at 06:20):

I've seen the mysterious appearance of bin and lib twice as well. :-(

Mario Carneiro (Oct 20 2023 at 06:26):

unfortunately I can't think of any way to make a change to the lakefile.olean format which will be handled correctly by the old version of lake, other than renaming lakefile.olean to something else for long enough for people to move on

Mario Carneiro (Oct 20 2023 at 06:27):

in hindsight we really need lake to include the lean toolchain version in the lakefile.olean or alongside it

Mario Carneiro (Oct 20 2023 at 06:29):

(Incidentally, this isn't a problem that would have happened if the lakefile.olean was in a more forward compatible format like, say, JSON :grinning_face_with_smiling_eyes: )

Mario Carneiro (Oct 20 2023 at 06:31):

non-self-describing binary formats are super hazardous wrt an unclear versioning system

Mac Malone (Oct 20 2023 at 07:16):

Actually, investigating this some more, I am now confused. Changing branches should update the modification time of lean-toolchain when the Lean version changes and there was no stable or release candidate of Lean missing the toolchain file modification time check. So Lake should always reconfigure if the toolchain changes. The check in question is here in case anyone sees something I overlooked.

Mario Carneiro (Oct 20 2023 at 07:23):

have you replicated the issue @Mac Malone ?

Mac Malone (Oct 20 2023 at 07:24):

@Mario Carneiro Not yet, no. I plan to try my hand at writing some tests in an attempt to do so tomorrow, but we will see if I succeed.

Mauricio Collares (Oct 20 2023 at 07:26):

Does this bug "respect" the repository boundary? Or can it rm -rf .. if you're unlucky with the memory corruption?

Mario Carneiro (Oct 20 2023 at 07:27):

I don't think we will know the answer to that until we know exactly what the change is and how it manifests

Mario Carneiro (Oct 20 2023 at 07:28):

it does seem very dangerous though, I would rather we just yank the new version until we figure this out

Mario Carneiro (Oct 20 2023 at 07:32):

I just tried a couple variations on going back and forth between old versions and new and I wasn't able to reproduce the bug

Mac Malone (Oct 20 2023 at 07:46):

Mario Carneiro said:

it does seem very dangerous though, I would rather we just yank the new version until we figure this out

Which new version? As far as I can tell, it is not clear what version of Lean (or what combination of versions) is even the source of this bug.

Mario Carneiro (Oct 20 2023 at 07:49):

My impression is that the issues are correlated with peoples' first experiences with rc3

Mario Carneiro (Oct 20 2023 at 07:50):

Have there been any changes to the olean version in rc3?

Mac Malone (Oct 20 2023 at 07:52):

There are two different problems here:

  1. The olean mismatch not being caught
  2. The specific Lake ABI change that leads to this lake clean behavior.

As long as (1) is not resolved, any Lake ABI change could convincibly cause a problem equal to or worse than this depending on how things get shuffled around from the version switch.

Mario Carneiro (Oct 20 2023 at 07:53):

just checked, the lakefile.oleans generated for std4 before and after rc3 are identical

Mario Carneiro (Oct 20 2023 at 07:54):

are there any other ABI hazards you can think of here?

Mauricio Collares (Oct 20 2023 at 07:54):

rc2 was very short-lived, probably should check between rc1 and rc3

Mario Carneiro (Oct 20 2023 at 07:54):

in particular, things that would impact the calculation of lean paths

Mac Malone (Oct 20 2023 at 07:55):

Yes, this specific ABI difference I am rather certain is due to lean4#2603 which occurred between rc1 and rc2/rc3.

Mario Carneiro (Oct 20 2023 at 07:55):

indeed there was a binary difference between rc1 and rc2

Mac Malone (Oct 20 2023 at 07:55):

Since that added a PackageConfig field and shifted everything below it down one.

Mac Malone (Oct 20 2023 at 07:57):

Since srcDir and buildDir are adjacent this shift puts the srcDir in the buildDir spot and thus causes the lake clean behavior when it is (for some unknown reason) read by some earlier Lean version without reconfiguring.

Mac Malone (Oct 20 2023 at 07:59):

One hot fix would be to just move postUpdate? to the bottom of the config, which should keep the PackageConfig ABI above the same. However, I am not sure whether this would instead cause a segfault or something worse.

Mario Carneiro (Oct 20 2023 at 08:00):

in this case, a segfault is better than deleting your directory

Mario Carneiro (Oct 20 2023 at 08:01):

I wonder whether you can indirectly embed version numbers in the olean by making package add a declaration to the environment with the version number

Mac Malone (Oct 20 2023 at 08:02):

I am not sure if a segfault would be the result. It could just as easily bump other things around to produce a worse result. (I think?)

Mauricio Collares (Oct 20 2023 at 08:03):

I don't understand oleans, but does the new field affect just its containing structure? Or does it "shift everything down" in the olean potentially corrupting reading later-occurring structures in older lean versions?

Mario Carneiro (Oct 20 2023 at 08:03):

usually if you access something off the end of a VM object you get a segfault, although for packed olean objects you are probably more likely to run into the next object

Mario Carneiro (Oct 20 2023 at 08:04):

the first field of the next object is the rc count which is always 0, so that's going to cause a crash if you are expecting a composite lean object (it's the null pointer)

Mario Carneiro (Oct 20 2023 at 08:04):

if the new field is a UInt8 or something then you might just read 0

Mac Malone (Oct 20 2023 at 08:06):

The version number definition sounds like a good solution.

Mario Carneiro (Oct 20 2023 at 08:07):

remember, keep it simple, you want to depend on the layout of as few lean objects as possible to minimize the possibility of corruption of the version decl

Mauricio Collares (Oct 20 2023 at 08:08):

So, iiuc, the chicken-and-egg problem here is that Lake typically checks if the oleans match the toolchain on a lake build (by doing traces or whatever), but older versions of Lean pick up and use lakefile.olean and there are no such checks. In Lean 3, oleans embedded Lean.versionString, is this no longer the case in Lean 4?

Mario Carneiro (Oct 20 2023 at 08:08):

the old versions of lake have a check, but it's not as precise as it should be

Mac Malone (Oct 20 2023 at 08:08):

Mario Carneiro said:

remember, keep it simple, you want to depend on the layout of as few lean objects as possible to minimize the possibility of corruption of the version decl

What do you mean by this?

Mario Carneiro (Oct 20 2023 at 08:08):

it seems to work in basic tests though

Mauricio Collares (Oct 20 2023 at 08:09):

Might be good to add some sanity checks to Lean itself (to the place which actually mmaps the olean) if it still makes sense in the current architecture

Mac Malone (Oct 20 2023 at 08:09):

Mario Carneiro said:

the old versions of lake have a check, but it's not as precise as it should be

More specifically, it is not clear what the imprecision even is.

Mario Carneiro (Oct 20 2023 at 08:09):

@Mac Malone You have to traverse several lean objects in order to get to the version decl's value. Reading an olean with the wrong ABI is pretty fraught and every read could cause something to blow up

Mario Carneiro (Oct 20 2023 at 08:10):

But thankfully the ABI of Expr and friends is fairly stable (although, I have a PR in progress...)

Mac Malone (Oct 20 2023 at 08:12):

Yeah, this solution is not bulletproof. Ideally, the Lean version would at least be in the olean to check (even if only optionally verified).

Mario Carneiro (Oct 20 2023 at 08:13):

I was thinking about that, but I think we don't necessarily want it always, since it changes the hashes

Mario Carneiro (Oct 20 2023 at 08:13):

in this case it would be sufficient if you could just make some room in lakefile.olean to store the version inline

Mauricio Collares (Oct 20 2023 at 08:14):

Mario Carneiro said:

I was thinking about that, but I think we don't necessarily want it always, since it changes the hashes

What's the concern? Lean.versionString is already hashed in lake exe cache get, and I'd certainly expect a full rebuild when changing toolchains.

Mac Malone (Oct 20 2023 at 08:14):

Right now, I think it is relatively unlikely for oleans to generally be compatible across versions. Though, I guess it could be a toggle in both directions (on save and read).

Mario Carneiro (Oct 20 2023 at 08:15):

It's also not bulletproof because the ABI of an olean depends on all of its dependencies, not just the lean toolchain version

Mario Carneiro (Oct 20 2023 at 08:15):

i.e. you can add a field to a struct without changing the lean version

Mac Malone (Oct 20 2023 at 08:15):

In essence, when the time comes that oleans are modular, I can see the value in not checking, but I think that would already be a big enough breaking change that changing the version embedding would not be a major concern.

Mario Carneiro (Oct 20 2023 at 08:15):

this is less of a problem for lakefile.olean because all of its dependencies are in the toolchain

Mac Malone (Oct 20 2023 at 08:16):

Mario Carneiro said:

It's also not bulletproof because the ABI of an olean depends on all of its dependencies, not just the lean toolchain version

Oh, I did not realize it was that frail.

Mario Carneiro (Oct 20 2023 at 08:17):

I mean, if you say a struct has 4 fields then that's how the ABI goes

Mac Malone (Oct 20 2023 at 08:18):

Yeah, It just never fully processed that that would be an implication of storing arbitrary data in the extensions.

Mario Carneiro (Oct 20 2023 at 08:18):

do you mean lake data or user data?

Mac Malone (Oct 20 2023 at 08:19):

Any arbitrary Lean data. Upon reflection, though, it is quite obvious.

Mario Carneiro (Oct 20 2023 at 08:19):

If you store stuff in Exprs then it's a bit more stable

Mario Carneiro (Oct 20 2023 at 08:19):

90% of (normal) .olean files is composed of Exprs

Mac Malone (Oct 20 2023 at 08:20):

Apparently, not much more because evalConst is presumably the culprit here. (Apparently it does not check that the constructor is sized as expected).

Mario Carneiro (Oct 20 2023 at 08:21):

it can't, really

Mario Carneiro (Oct 20 2023 at 08:22):

I actually have a tool for generating "olean schemas" for lean types, but I'm not exactly sure how to mix it into the processes here

Mac Malone (Oct 20 2023 at 08:22):

Oh, yeah, because evalConst just has the erased type not the actual type definition to check against.

Mario Carneiro (Oct 20 2023 at 08:27):

Mario Carneiro said:

I actually have a tool for generating "olean schemas" for lean types, but I'm not exactly sure how to mix it into the processes here

Here's a sketch: we run this on every commit of lake to produce a schema for what can appear in an olean file, and commit it to the repo. A CI job recalculates the schema and updates this file / errors if it is not updated, and any update of the file comes with a requirement to bump the "olean version", which is stored in the file somewhere easily (and stably) accessible and used for validation

Mario Carneiro (Oct 20 2023 at 08:28):

The reason for the indirection here is because recalculating the schema at runtime would be too expensive, it's basically a compile time known value but it's also a complex object and a bit big to put in the olean file itself

Mac Malone (Oct 20 2023 at 08:30):

Mario Carneiro said:

bump the "olean version", which is stored in the file somewhere easily (and stably) accessible and used for validation

Would this "olean version" be a parameter of saveModuleData and return readModuleData because those are currently C black boxes making the entire olean format fixed and opaque to Lean.

Mario Carneiro (Oct 20 2023 at 08:30):

yes

Mario Carneiro (Oct 20 2023 at 08:30):

this would be part of the olean file header

Mario Carneiro (Oct 20 2023 at 08:31):

(PS: if the olean file format changes I need to be made aware because leantar, used by lake exe cache will have to be updated)

Mac Malone (Oct 20 2023 at 08:32):

The most general thing I could think of for this header info would just be to make it an arbitrary Expr one can add to the header. Does that seem reasonable?

Mario Carneiro (Oct 20 2023 at 08:32):

I would rather it not be a lean object, this runs into the same ABI issues

Mario Carneiro (Oct 20 2023 at 08:32):

something with a fixed layout would be better, like a string or list of strings

Mario Carneiro (Oct 20 2023 at 08:33):

or a number in this case

Mac Malone (Oct 20 2023 at 08:33):

In that case, maybe just a direct ByteArray?

Mario Carneiro (Oct 20 2023 at 08:34):

sure, but now you need a secondary parser and a spec for what goes in that array, and when that spec changes you have ABI issues all over again

Mario Carneiro (Oct 20 2023 at 08:35):

This byte array would also have to be copied out of the olean, most likely, because it's not a proper lean object

Mac Malone (Oct 20 2023 at 08:35):

True, but that same issue is true for a string or a number.

Mario Carneiro (Oct 20 2023 at 08:35):

so don't put too much in it since it doesn't get the mmap goodness

Mario Carneiro (Oct 20 2023 at 08:37):

(this copying process is also exactly what makes it more segfault-resistant than regular lean objects)

Mac Malone (Oct 20 2023 at 08:38):

A ByteArray just provides more flexibility and potential space saving for the header format whereas a number would give at most 64 bits and a string may be space inefficient and require undesirable parsing overhead.

Mario Carneiro (Oct 20 2023 at 08:38):

There is also the matter of the very large (and excitable) magic number at the beginning of every olean file, oleanfile!!!!!!!. I can't resist the idea of adding versioning as oleanfile!!!1!!!, oleanfile!!!2!!! etc

Mario Carneiro (Oct 20 2023 at 08:40):

I think 32 bits is more than enough for a version number

Mario Carneiro (Oct 20 2023 at 08:40):

if that's the only requirement

Mario Carneiro (Oct 20 2023 at 08:41):

if you want to store random extra data I would just make use of some spare padding in the file

Mac Malone (Oct 20 2023 at 08:42):

Ideally, Lake would eventually store a cryptographic hash of the olean inputs.

Mac Malone (Oct 20 2023 at 08:42):

Though that is likely very long-term.

Mario Carneiro (Oct 20 2023 at 08:43):

that could be part of the lean data structure proper though, no?

Mario Carneiro (Oct 20 2023 at 08:43):

like in the EnvHeader

Mac Malone (Oct 20 2023 at 08:44):

No? Wouldn't that still have ABI problems if the core data structures change? Inversely, it would also be helpful when the inputs hopefully do not change every new Lean version and we can reuse oleans more often.

Mario Carneiro (Oct 20 2023 at 08:44):

Currently the olean header consists only of the magic number, the offset of the beginning of the file (used for relocating all lean objects if this location needs to be modified), and the pointer to the root object

Mario Carneiro (Oct 20 2023 at 08:45):

Yes, it would have ABI problems, the version is intended to defend against ABI changes

Mario Carneiro (Oct 20 2023 at 08:46):

Adding hashes to the olean file indeed make me concerned about lots of files getting lots of spurious updates

Mario Carneiro (Oct 20 2023 at 08:48):

The point of having an olean version rather than just using the toolchain version is that sometimes the toolchain changes without ABI changes and you can capitalize on this

Mario Carneiro (Oct 20 2023 at 08:48):

in particular, rc -> stable

Patrick Massot (Oct 23 2023 at 00:57):

Now that lean-rc4 is out, how are we meant to solve the issue described [https://leanprover.zulipchat.com/#narrow/stream/287929-mathlib4/topic/lake.20exe.20cache.20get.20errors/near/397441263]? I had the file: ./x�]/./Cache/Main.lean error message. I tried removing lakefile.olean and now every lake exe cache get simply gets a segmentation fault. This is in Mathlib master.

Patrick Massot (Oct 23 2023 at 00:57):

@Mac Malone

Mac Malone (Oct 23 2023 at 01:00):

@Patrick Massot From that previous thread, the problem was that you also need to delete the lakefile.olean of all dependences as well (i.e., those in lake-packages). You could also just delete the whole lake-packages directory and start over.

Patrick Massot (Oct 23 2023 at 01:01):

Thanks, it seems to work.

Oliver Nash (Nov 30 2023 at 12:36):

lake exe cache get has stopped working for me after getting the latest master (currently at this commit, which happens to be one of my own PRs though this is not relevant).

I am running MacOS and I see just the following:

  mathlib4 git:(master) lake exe cache get
Attempting to download 3960 file(s)
Downloaded: 0 file(s) [attempted 3960/3960 = 100%] (0% success)
Warning: some files were not found in the cache.
This usually means that your local checkout of mathlib4 has diverged from upstream.
If you push your commits to a branch of the mathlib4 repository, CI will build the oleans and they will be available later.
No cache files to decompress

Oliver Nash (Nov 30 2023 at 12:37):

Anyone else having issues?

Damiano Testa (Nov 30 2023 at 12:38):

I think that I had the same this morning, but I erased .lake and then it worked.

Oliver Nash (Nov 30 2023 at 12:38):

Thanks, that worked.

Damiano Testa (Nov 30 2023 at 12:39):

(Maybe I also merged master... there has been a new release, I think.)

Oliver Nash (Nov 30 2023 at 12:39):

Needless to say, this should not be necessary.

Andrew Yang (Nov 30 2023 at 12:56):

I think lake exe cache get! would be helpful too?

Mario Carneiro (Nov 30 2023 at 21:23):

If someone can reproduce this that would be helpful

Eric Wieser (Nov 30 2023 at 21:49):

I've had this multiple times when bumping old branches to the latest lean today

Eric Wieser (Nov 30 2023 at 21:50):

I think lake clean always resolved things

Mario Carneiro (Dec 01 2023 at 00:09):

@Mac Malone @Scott Morrison and I debugged this today. Repro:

git checkout 1937b42b
rm -rf .lake
lake exe cache get
git checkout 85239de6
lake exe cache get  # fails to find anything
lake clean
lake exe cache get  # works

The problem is a combination of several factors:

  • The 4.3.0 release was accidentally a re-tag of the same commit as 4.3.0-rc2 (It's supposed to be the same content, but usually there are some release notes or something so that it's not exactly the same commit)
  • The cache executable uses Lean.versionString for the root hash, which is different between 4.3.0-rc2 and 4.3.0 because it contains the literal toolchain name in it
  • This version string is baked into the cache executable itself based on the version used to compile cache, it is not determined at runtime
  • Lake only recompiles executables when the lean git hash changes, so in the second lake exe cache get invocation above it does not recompile cache and so it uses the 4.3.0-rc2 version string in the root hash, which throws all the hashes off
  • Doing lake clean deletes the cache executable, and the second run rebuilds it with the correct Lean.versionString so it finds the files as expected

Mario Carneiro (Dec 01 2023 at 00:16):

The easy short term fix is to only use Lean.githash in cache instead of Lean.versionString, so that lake invalidates it properly. #8755


Last updated: Dec 20 2023 at 11:08 UTC