Zulip Chat Archive

Stream: lean4

Topic: git revision of dependency 'proofwidgets' changed


Kevin Buzzard (Sep 09 2023 at 14:33):

After git pull and lake exe cache get on master on mathlib today (which I probably last ran a few days ago) I get

warning: manifest out of date: git revision of dependency 'proofwidgets' changed, use `lake update` to update

and my understanding is that using lake update on mathlib is exactly what I should never do. Should I just ignore this warning? Will everyone get it or did I do something weird?

Eric Wieser (Sep 09 2023 at 14:58):

I think now the warning might actually be valid? But with the caveat that you should do it and make a standalone PR with it, and not do it as part of any other work

Julian Berman (Sep 09 2023 at 15:30):

Oh, I've definitely seen and ignored that message before for the same reason. Can you explain slightly further what that message is trying to say?

Julian Berman (Sep 09 2023 at 15:31):

It's like pulling a repo that mathlib4 depends on, sees that HEAD of that dependent repo is not the version that mathlib4 is saying it depends on, and basically you're saying when that happens that one should basically help update mathlib4 to that newer version?

Eric Wieser (Sep 09 2023 at 15:32):

I think now the warning might actually be valid?

By this I mean that there's now an "inherited" flag that is supposed to stop lake update updating too many things at once

Julian Berman (Sep 09 2023 at 15:35):

I think I'm still only partially following. First I think I misunderstood the message because I forgot the manifest is pulled -- that message means nothing of what I said right, it means "you pulled mathlib4, lake ran some command, saw that the installed version of some dependency does not match the version in the newly pulled manifest, and is trying to tell you to run a command to update the installed version", yeah?

Julian Berman (Sep 09 2023 at 15:36):

And ok now you're saying basically that someone should indeed run lake update in that case, to update the installed deps, but run with that minimal update flag so they don't screw everything else up

Julian Berman (Sep 09 2023 at 15:36):

Is that right? (If so maybe I do follow now)

Patrick Massot (Sep 09 2023 at 15:39):

In that case the cause is clear: #7044 was not good enough, and this wasn't caught by CI for some reason (maybe because it's only a warning). And once again the root cause is duplicated information in the lake world: when bumping the ProofWidgets dependency you need to modify both the lakefile and the manifest. @Mac Malone

Patrick Massot (Sep 09 2023 at 15:41):

And, as usual, lake gives you the wrong advice to fix the issue. Here we need to update only ProofWidgets and following the lake update suggestion will mess up everything. We all need to configure our terminal/shell so that it automatically remove the string "use lake update to update" from any output.

Alex J. Best (Sep 09 2023 at 15:42):

Why don't we instead track ProofWidgets@master in the mathlib lakefile then, and make bump PRs in the way we do for other packages like std?

Alex J. Best (Sep 09 2023 at 15:42):

The downside is that its less obvious which version we are on ofc, but for other packages we dont seem to care

Patrick Massot (Sep 09 2023 at 15:43):

Alex, this isn't possible because we want to use compiled versions of ProofWidgets so that mathlib doesn't depend on a npm tooolchain.

Patrick Massot (Sep 09 2023 at 15:44):

#7056

Kevin Buzzard (Sep 09 2023 at 16:44):

The build has failed, with error

...
[3629/3631] Building Mathlib.AlgebraicGeometry.Morphisms.FiniteType
Error: The process '/usr/bin/bash' failed with exit code 1
Remove matcher: gcc

I re-ran it in case it was a transient glitch.

Eric Wieser (Sep 09 2023 at 16:56):

The real error is a few thousand lines higher:

[4/3566] Building Std.Lean.Parser
error: no such file or directory (error code: 2)
  file: ./lake-packages/proofwidgets/././ProofWidgets/Component/SelectionPanel.lean

Kevin Buzzard (Sep 09 2023 at 18:05):

Indeed it failed again.

Wojciech Nawrocki (Sep 09 2023 at 18:25):

  • The actual versions of the packages in use are stored in lake-manifest.json. Changing only the versions in the lakefile but not in the manifest effectively does nothing: it only influences what lake update will do.
  • Indeed ProofWidgets is packaged in such a way that you should only ever depend on a release tag. I would go as far as to say it is better practice not to depend on the development branch of any package, since if you do this then running lake update can randomly break things. That said the Lake versioning story appears to be evolving with Mac's work at the FRO, so there may be new best practices in the future.
  • There have been breaking changes between ProofWidgets4 v0.0.13 and v0.0.15 so that bump needs to do a bit of actual fixing. Just did that now.

Mac Malone (Sep 09 2023 at 19:03):

Wojciech Nawrocki said:

  • Indeed ProofWidgets is packaged in such a way that you should only ever depend on a release tag. I would go as far as to say it is better practice not to depend on the development branch of any package, since if you do this then running lake update can randomly break things. That said the Lake versioning story appears to be evolving with Mac's work at the FRO, so there may be new best practices in the future.

I would suggest having a release or v0.0.x branch that is only bumped to new tags. Lake is smart enough to intelligently find a tag for the HEAD commit of a branch. Thus, so as long as very bump comes with a corresponding tag, downstream packages can still use a branch in the lakefile (and can thus update via lake update proofwidgets) while still getting the cloud release.

Kevin Buzzard (Sep 09 2023 at 19:04):

Wojciech's commit makes the branch build for me locally. Does it need to be Bors'ed again?

Patrick Massot (Sep 09 2023 at 19:55):

It still fails CI @Wojciech Nawrocki

Wojciech Nawrocki (Sep 09 2023 at 20:19):

Ah, the cloud release was broken. Sorry about that. Restarted the build now.

Kevin Buzzard (Sep 09 2023 at 23:01):

Looks good now and is hopefully about to be merged.

Scott Morrison (Sep 09 2023 at 23:18):

Okay, I have made three new issues to track things that we could hopefully do better here:

  • lean4#2527 (configuration option so Mathlib can suppress all `manifest out of date warning except for opt-in users)
  • lean4#2525 (have lake suggest lake update X when X is out of date, rather than suggesting lake update)
  • lean4#2526 (lake command that generates this warning, but without running anything expensive (build or exe), so we can add a CI step to guard for today's problem)

Scott Morrison (Sep 10 2023 at 02:35):

@David Renshaw has just reported that the bump commits for ProofWidgets have broken the cache. @Wojciech Nawrocki, just pinging in case you have an idea.

I think I am going to revert the bump to ProofWidgets, and merge it directly, and then we can try again later?

Scott Morrison (Sep 10 2023 at 02:36):

(Also :ping_pong: @Patrick Massot in case they have ideas?)

Patrick Massot (Sep 10 2023 at 02:36):

We can't keep a broken cache, so any urgent workaround is welcome.

Patrick Massot (Sep 10 2023 at 02:37):

I have no idea what could be causing this. I just want my shiny updated ProofWidgets.

Scott Morrison (Sep 10 2023 at 02:40):

Okay, reverted on master.

Scott Morrison (Sep 10 2023 at 02:43):

#7070 is now the new PR for attempting to bump ProofWidgets. We should carefully check that the cache works on that branch before merging it!

Patrick Massot (Sep 10 2023 at 02:44):

We should ping @Mario Carneiro as well.

Patrick Massot (Sep 10 2023 at 02:44):

Are you sure we will get some cache for your revert commit?

Patrick Massot (Sep 10 2023 at 02:45):

Nevermind, I just saw your message in the other stream.

Wojciech Nawrocki (Sep 10 2023 at 02:47):

Oh no! I have no idea why this bump would break the cache. It doesn't do anything fundamentally differently than previous bumps.

Arthur Paulino (Sep 10 2023 at 02:51):

Can you link the ProofWidgets PR that did the bump please?

Scott Morrison (Sep 10 2023 at 03:13):

I just made a PR that checks the manifest is up to date (thereby preventing future cases of #7044 needing the followup #7056).

#7071

Patrick Massot (Sep 10 2023 at 03:14):

Arthur, I'm sorry I don't understand your question.

Scott Morrison (Sep 10 2023 at 03:15):

@Arthur Paulino, there's not a single ProofWidgets PR that is relevant here. Here are the changes between 0.0.13 and 0.0.15 on ProofWidgets: https://github.com/EdAyers/ProofWidgets4/compare/v0.0.13...v0.0.15

Scott Morrison (Sep 10 2023 at 03:16):

It seems that #7070, which bumps Mathlib's dependency on ProofWidgets from 0.0.13 to 0.0.15 breaks the cache.

Scott Morrison (Sep 10 2023 at 03:17):

You can test this out now:

gh pr checkout 7070
lake exe cache get   # reports downloading everything successfully
lake build # starts from scratch :-(

Arthur Paulino (Sep 10 2023 at 03:28):

Thanks. This diff is pretty big. I skimmed through it pretty quickly and the thing that caught my attention was the fact that it introduces a dependency on Std, which is also required by Mathlib

Arthur Paulino (Sep 10 2023 at 03:41):

I'm not on my PC right now. Does it build everything or just Std?

My hypothesis is that the built files from Std, when required by ProofWidgets, end up with different traces from the ones required by Mathlib. We'd want those traces to overlap completely so the files downloaded for Mathlib can be used to compile ProofWidgets.

There's also this: ProofWidgets might depend on files that Mathlib doesn't, which would cause their respective cache to be missing.

Mac Malone (Sep 10 2023 at 08:19):

Arthur Paulino said:

My hypothesis is that the built files from Std, when required by ProofWidgets, end up with different traces from the ones required by Mathlib. We'd want those traces to overlap completely so the files downloaded for Mathlib can be used to compile ProofWidgets.

This is certainly a key concern. They must require exactly the same version of Std or one of them will be rebuilt. Rebuilding either is infeasible as both need to use a cloud build (ProofWidgets to avoid NPM and Mathlib to avoid long build times). However, it does seem like that both currently require the same version of Std (e8c27f7), so that is good at least.

Wojciech Nawrocki (Sep 10 2023 at 16:06):

Hm, the cloud build archive for ProofWidgets4 also does not contain any dependency oleans (for std4), so I'm not sure how downloading it could break the cache.

The cloud releases for PW4 also does the following hack: I build the library on a single platform (currently Mac), and then upload that same release archive as the archive for all platforms ({linux-64/windows-64/macOS-64}.tar.gz). This is because the part that matters, building JS, is platform-independent. It's no problem if the platform-dependent part - the oleans - have to be rebuilt by consumers. I am conjecturing that this is not relevant to this issue because I am observing a full rebuild on 7070 even though I am on the same platform the release archive was actually built on.

Wojciech Nawrocki (Sep 10 2023 at 16:27):

Okay, here is building on the pre-bump commit:

> git checkout bump_proofwidgets_15
Previous HEAD position was f93f41c18 chore: revert ProofWidgets bump in #7044 and #7056 (#7069)
Switched to branch 'bump_proofwidgets_15'
Your branch is up to date with 'origin/bump_proofwidgets_15'.
> git checkout HEAD~1
Note: switching to 'HEAD~1'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:

  git switch -c <new-branch-name>

Or undo this operation with:

  git switch -

Turn off this advice by setting config variable advice.detachedHead to false

HEAD is now at f93f41c18 chore: revert ProofWidgets bump in #7044 and #7056 (#7069)
> lake exe cache get
info: cloning https://github.com/leanprover/std4 to ./lake-packages/std
info: cloning https://github.com/gebner/quote4 to ./lake-packages/Qq
info: cloning https://github.com/JLimperg/aesop to ./lake-packages/aesop
info: cloning https://github.com/mhuisi/lean4-cli.git to ./lake-packages/Cli
info: cloning https://github.com/EdAyers/ProofWidgets4 to ./lake-packages/proofwidgets
info: Downloading proofwidgets/v0.0.13/macOS-64.tar.gz
info: Unpacking proofwidgets/v0.0.13/macOS-64.tar.gz
info: [1/9] Building Cache.IO
info: [2/9] Compiling Cache.IO
info: [2/9] Building Cache.Hashing
info: [3/9] Compiling Cache.Hashing
info: [3/9] Building Cache.Requests
info: [6/9] Compiling Cache.Requests
info: [6/9] Building Cache.Main
info: [7/9] Compiling Cache.Main
info: [9/9] Linking cache
info: stderr:
ld64.lld: warning: /usr/lib/system/libsystem_kernel.dylib has version 13.5.0, which is newer than target minimum of 13.0.0
ld64.lld: warning: /usr/lib/system/libsystem_platform.dylib has version 13.5.0, which is newer than target minimum of 13.0.0
ld64.lld: warning: /usr/lib/system/libsystem_pthread.dylib has version 13.5.0, which is newer than target minimum of 13.0.0
No files to download
Decompressing 3729 file(s)
unpacked in 1981 ms
> cat build/lib/Mathlib/Tactic/PPWithUniv.trace
3493117283209974879%
> lake build Mathlib.Tactic.PPWithUniv
> cat build/lib/Mathlib/Tactic/PPWithUniv.trace
3493117283209974879%

Wojciech Nawrocki (Sep 10 2023 at 16:27):

And here is building after:

> git checkout bump_proofwidgets_15
Already on 'bump_proofwidgets_15'
Your branch is up to date with 'origin/bump_proofwidgets_15'.
> lake exe cache get
info: cloning https://github.com/leanprover/std4 to ./lake-packages/std
info: cloning https://github.com/gebner/quote4 to ./lake-packages/Qq
info: cloning https://github.com/JLimperg/aesop to ./lake-packages/aesop
info: cloning https://github.com/mhuisi/lean4-cli.git to ./lake-packages/Cli
info: cloning https://github.com/EdAyers/ProofWidgets4 to ./lake-packages/proofwidgets
info: Downloading proofwidgets/v0.0.15/macOS-64.tar.gz
info: Unpacking proofwidgets/v0.0.15/macOS-64.tar.gz
info: [1/9] Building Cache.IO
info: [2/9] Compiling Cache.IO
info: [2/9] Building Cache.Hashing
info: [3/9] Compiling Cache.Hashing
info: [3/9] Building Cache.Requests
info: [6/9] Compiling Cache.Requests
info: [6/9] Building Cache.Main
info: [7/9] Compiling Cache.Main
info: [9/9] Linking cache
info: stderr:
ld64.lld: warning: /usr/lib/system/libsystem_kernel.dylib has version 13.5.0, which is newer than target minimum of 11.0.0
ld64.lld: warning: /usr/lib/system/libsystem_platform.dylib has version 13.5.0, which is newer than target minimum of 11.0.0
ld64.lld: warning: /usr/lib/system/libsystem_pthread.dylib has version 13.5.0, which is newer than target minimum of 11.0.0
No files to download
Decompressing 3729 file(s)
unpacked in 2003 ms
> cat build/lib/Mathlib/Tactic/PPWithUniv.trace
11896431265506069789%
> lake build Mathlib.Tactic.PPWithUniv
[2/3] Building Mathlib.Tactic.PPWithUniv
> cat build/lib/Mathlib/Tactic/PPWithUniv.trace
3879143491258300896%

Wojciech Nawrocki (Sep 10 2023 at 16:28):

Observe that after the bump, Tactic.PPWithUniv has an incorrect trace fetched by the mathlib cache.

Scott Morrison (Sep 11 2023 at 02:17):

@Mac Malone, might you be able to help us diagnose this?

Mac Malone (Sep 11 2023 at 02:21):

@Scott Morrison, from @Wojciech Nawrocki's report, it seems like a cache error, not a lake one. lake build produces a consistent trace before and after the bump. The one stored in the cache is simply not correct.

Wojciech Nawrocki (Sep 11 2023 at 02:23):

Perhaps @Mario Carneiro?

Scott Morrison (Sep 11 2023 at 02:56):

@Mac Malone, how do we know that? Isn't it possible that what we're seeing here is that lake on CI is producing one trace, while lake locally is producing another trace? If then cache were faithfully transporting that .trace file, wouldn't we see what Wojciech observed above?

How can we distinguish that situation from the alternative, in which cache is obtaining that .trace file from some bad source?

Mac Malone (Sep 11 2023 at 03:09):

Scott Morrison said:

Isn't it possible that what we're seeing here is that lake on CI is producing one trace, while lake locally is producing another trace?

Why would the CI and local builds differ? The only thing I can I think of that would cause that would be some platform-dependent hashing. While the ProofWidgets cloud releases are platform-dependent, they have always been that way, so I am not sure why that would have changed anything (especially as the #7070 did not import ProofWidgets in any new files). @Wojciech Nawrocki also seemed to suggest that he had not changed anything in that regard either.

However, to verify whether the problem is platform-dependency is the problem, we could try building #7070 on both Linux 64bit what the CI runs and see if they get the same results as @Wojciech Nawrocki (who is on a Mac) or the CI.

Scott Morrison (Sep 11 2023 at 03:37):

I mean, I think it's a mystery either way:

  • why would the CI and local builds differ? (if the wrong .trace file is being produced, and cache is transporting is faithfully)
  • where would a bad .trace file have come from (if the right .trace file is being produced in CI, but cache is delivering an incorrect one).

The point is that we need to help Patrick and Wojciech distinguish between these two cases!

Scott Morrison (Sep 11 2023 at 03:38):

Perhaps we should just modify CI on #7070 to display the relevant traces. That seems more reliable than trying to simulate a CI run on different hardware.

Mac Malone (Sep 11 2023 at 05:01):

@Scott Morrison I just had :light_bulb: and think I know what the problem is. Std is being built as part of ProofWidgets and its build files are shipped in the ProofWidgets cloud release. Since they are part of ProofWidgets, they do not depend on ProofWidgets cloud release trace. Mathlib is then built with a Std which does depend on ProofWidgets cloud release and that is saved to the cache. A user then runs cache get, which unpacks ProofWidgets Std build files but does not clobber those files with the cache'sStd files (because it is not run with as get!). These Std files are thus the wrong ones leads to them being rebuilt along with their downstream dependents (i.e., most all of mathlib).

Scott Morrison (Sep 11 2023 at 05:03):

Interesting!

Scott Morrison (Sep 11 2023 at 05:06):

It seems that when Std is built on those two occasions, it would be better if the .trace files were the same either way. It is still Std, regardless of who is building it!

Mac Malone (Sep 11 2023 at 05:07):

At least, I think that is what is happening, there are some parts of Wojciech's example that I am still not sure how they fit into this picture. One way to test this theory is to see if lake exe cache get! helps any.

Mac Malone (Sep 11 2023 at 05:11):

However, since the mathlib build also uses the cache, that may already have left it in an inconsistent state?

Mac Malone (Sep 11 2023 at 05:17):

Upon further consideration, I think my assessment that Std being part of the cloud release and not being clobbered by cache is the problem is likely the right root cause. However, I am not sure if my explanation as to why that is a problem is correct.

Arthur Paulino (Sep 11 2023 at 11:46):

That was my hypothesis in more detail, but you said that they're using the same git revision of Std. My understanding is if that were the case, lake build would build the missing files from Std, only. It shouldn't end up with incompatible traces for the files that were built

Scott Morrison (Sep 11 2023 at 13:53):

Okay, this is not a fix, but at least some assistance from CI detecting this failure mode in future.

  • #7099 adds an extra CI step that runs lake exe cache get again after the build, and then runs lake build and errors if it builds anything. (The bash incantation to achieve this is probably fragile.)

  • I've then made #7070 (the 2nd attempt at bumping ProofWidgets) depend on #7099, and it correctly causes CI to fail at the "check cache" step.

Wojciech Nawrocki (Sep 11 2023 at 15:03):

Std is being built as part of ProofWidgets and its build files are shipped in the ProofWidgets cloud release.

There are no build files for Std in the .tar.gz that gets downloaded for the PW cloud release. Is there another .tar.gz being downloaded?

Wojciech Nawrocki (Sep 11 2023 at 15:09):

In particular, if I start from a clean folder (no build/ or lake-packages/) on #7070, and run lake build ProofWidgets, no oleans for std are present or have been downloaded anywhere.

Wojciech Nawrocki (Sep 11 2023 at 15:11):

Mathlib is then built with a Std which does depend on ProofWidgets cloud release

I also don't understand this statement. How can Std, a dependency of both Mathlib and PW4, depend on the PW4 cloud release? That sounds circuilar.

Wojciech Nawrocki (Sep 11 2023 at 15:13):

One way to test this theory is to see if lake exe cache get! helps any.

It still rebuilds from scratch on my machine; combined with the fact that no Std files seem to be downloaded before cache get! runs, I am guessing something else must be happening.

Mac Malone (Sep 11 2023 at 16:45):

@Wojciech Nawrocki You are correct. I am completely wrong. Lake does not hoist build files, so the Std files would not be in the root package's build directory and thus not part of the cloud release.

Mac Malone (Sep 11 2023 at 17:21):

An obvious potential problem is simply that ProofWidget cloud releases are currently system dependent, but I am not sure why that would be a new issue now since that has long been the case and this bump is not accompanied by a bump in Lean 4 / Lake to change anything about trace computation.

Mac Malone (Sep 11 2023 at 17:21):

Furthermore, since rebuild is also happening on the CI, platform-dependency does not appear to be a potential issue.

Mac Malone (Sep 11 2023 at 17:22):

It just seems like cache is somehow incorrectly storing the wrong traces?

Mario Carneiro (Sep 11 2023 at 18:18):

Note that cache does not compute trace files ever, lake is the one responsible for generating those files

Mario Carneiro (Sep 11 2023 at 18:18):

the new lean-cache (not deployed) does calculate trace files, but only for verification purposes, it doesn't write trace files

Mario Carneiro (Sep 11 2023 at 18:19):

So if there are bad numbers in the cache files on the server then that means lake did something different on CI than on a local machine

Mac Malone (Sep 11 2023 at 20:32):

@Mario Carneiro The problem here is that two subsequent runs of lake build on the same CI machine (before and after cache get) are performing rebuilds of the same file. That seems to indicate cache is corrupting the file and/or trace.

Mac Malone (Sep 11 2023 at 20:35):

Not only that, I believe from @Wojciech Nawrocki's report two subsequent lake build's do not rebuild the file itself without the intervening cache, indicating that lake is not responsible. To verify, @Scott Morrison could potentially add a double lake build sanity check (no caching intervening) to make sure that lake is not doing so.

Mario Carneiro (Sep 11 2023 at 20:38):

I have seen this variety of issue several times (I have asked for help on them here sometimes too) and it's always either some outdated state being used or system dependent hash calculation

Mario Carneiro (Sep 11 2023 at 20:39):

usually you will have the situation that lake build puts A in the file and lake exe cache get puts B in the file, you can run them in any order and they will both reliably do this

Mario Carneiro (Sep 11 2023 at 20:40):

and the reason is because when lake build was run in CI it put B in the file

Mac Malone (Sep 11 2023 at 20:54):

@Mario Carneiro True, but what seems to make this case confusing is that this problem is appearing on the CI machine itself. It runs lake build which presumably generates A, uploads it to cache (via cache commit) and then calls cache get which seems to be putting B in the file, causing lake build to run again to generate A.

Mario Carneiro (Sep 11 2023 at 20:56):

if it uploaded A to cache then A is what would be downloaded. The only kind of download failure I have seen here is truncated files, and the chances of this happening for a trace file is very small and quite obvious

Mac Malone (Sep 11 2023 at 20:56):

Looking at the CI, the cache commit line does appear to be produce an interesting "Please commit your changes first" line which is weird. The job is apparently running lake exe cache commit || true ignoring errors. On first glance, that seems like bad style instead of simply using the continue-on-error option on the GitHub action step.

Mario Carneiro (Sep 11 2023 at 21:00):

the "Please commit your changes first" message seems to be normal, regular CI runs have it too

Mario Carneiro (Sep 11 2023 at 21:01):

actually it seems like that might be a bug and all runs are borked

Mario Carneiro (Sep 11 2023 at 21:02):

because if it outputs that then lake exe cache commit doesn't do anything

Mac Malone (Sep 11 2023 at 21:02):

Yeah, that might be the problem....

Mario Carneiro (Sep 11 2023 at 21:02):

this is the job that adds the CACHE/c/COMMIT file to the cache, which is currently unused

Mario Carneiro (Sep 11 2023 at 21:03):

but apparently it is both unused and broken

Mac Malone (Sep 11 2023 at 21:03):

Oh :rofl:

Mario Carneiro (Sep 11 2023 at 21:03):

the real work is the lake exe cache put line

Mac Malone (Sep 11 2023 at 21:04):

Still, the step should probably be using continue-on-error rather than || true awaying errors.

Mac Malone (Sep 11 2023 at 21:06):

It makes it hard to tell whether either of these commands are working.

Mario Carneiro (Sep 11 2023 at 21:06):

I don't think they make much use of error codes to begin with

Mac Malone (Sep 11 2023 at 21:07):

I would hope they would at least produce a 1 if an error is encountered.

Mario Carneiro (Sep 11 2023 at 21:07):

It looks like the newly added "check the cache" step doesn't even hit the network, so it's possible that one can reproduce the run locally?

Mario Carneiro (Sep 11 2023 at 21:08):

at least, the "Please commit your changes" message is not accompanied by a nonzero error code

Mario Carneiro (Sep 12 2023 at 07:35):

Update: I have successfully replicated the issue using lean-cache (which reports more precise information regarding hash failures): on master all hashes are verified, but on #7070 you get:

$ lake-ext
$ lean-cache get
Nothing to download
Decompressing 186 file(s)
unpacked in 94 ms
warning: 29 file(s) were unpacked but the hash does not match;
run `lake build` to rebuild these files (and maybe report this)

Mario Carneiro (Sep 12 2023 at 11:35):

I think I cracked the puzzle, after lots of remote debugging (or at least most of it). Currently, the CI does lake exe cache get, compiles anything that is out of date, then uses lake exe cache put to put the new files on the server. For unknown reasons, the server acquired a file with the correct cache hash (i.e. 123.ltar file name) but the incorrect lake hash (stored in the first few bytes of the file). Because of this, lake sees the affected files as being out of date and recompiles them in CI. Then lake exe cache put runs, and because it is not set to overwrite the cache, even though we have new oleans, because the old cache file is still there we do not pack some new files to replace it. Then we upload all the cache files back to the server, preserving the out of date hash.

Kevin Buzzard (Sep 12 2023 at 11:38):

Nice! Do you have a solution?

Mario Carneiro (Sep 12 2023 at 11:39):

yes, a one-time lake exe cache pack! run in CI should fix it

Mario Carneiro (Sep 12 2023 at 11:41):

although I am considering adding some additional logic into lake exe cache pack to sniff the trace file out of the ltar to determine whether it should be overwritten even if it exists, so that it will be more self-correcting in the future

Kevin Buzzard (Sep 12 2023 at 11:41):

This is great! I'm really looking forward to those new widgets :-) I watched Patrick's demo https://youtu.be/8MFGhOWeCNE recently and my understanding is that we need shiny new proofwidgets for the calc trickery :-) (start at 30m30s to see what we have to look forward to)

Mario Carneiro (Sep 12 2023 at 11:49):

it looks like the "palate cleanser" worked

Ruben Van de Velde (Sep 12 2023 at 12:00):

Looks like this bors batch (of 17 PRs) failed with an error related to this https://github.com/leanprover-community/mathlib4/actions/runs/6158436929/job/16711208573

Ruben Van de Velde (Sep 12 2023 at 12:01):

@Johan Commelin you might want to take #7099 off the queue

Mario Carneiro (Sep 12 2023 at 12:03):

uh oh

Mario Carneiro (Sep 12 2023 at 12:03):

it might be a hash collision

Mario Carneiro (Sep 12 2023 at 12:04):

seems pretty unlikely, but I can imagine that the reason the mystery cache file got on the server was because it was created by an unrelated run, and my palate cleanser just overwrote that file, causing the other build to fail for the same reason

Johan Commelin (Sep 12 2023 at 12:04):

Ruben Van de Velde said:

Johan Commelin you might want to take #7099 off the queue

Done

Johan Commelin (Sep 12 2023 at 12:05):

We can retry in the near future

Johan Commelin (Sep 12 2023 at 12:05):

Also: huge shoutout to Mario for debugging this mystery!

Mario Carneiro (Sep 12 2023 at 12:06):

The failed file in the other run is ProofWidgets.Data.Json which is also suspicious

Mario Carneiro (Sep 12 2023 at 12:07):

I think the issue is not a hash collision exactly, but rather the cache key isn't taking into account some input which lake is

Eric Rodriguez (Sep 12 2023 at 12:08):

Kevin Buzzard said:

This is great! I'm really looking forward to those new widgets :-) I watched Patrick's demo https://youtu.be/8MFGhOWeCNE recently and my understanding is that we need shiny new proofwidgets for the calc trickery :-)

there was also some other nice ones in @Edward Ayers's ITP presentation:) point to what you want to rewrite and it will generate conv to get you to that subgoal!

Patrick Massot (Sep 12 2023 at 12:15):

I also showed it in my demo.

Mario Carneiro (Sep 12 2023 at 13:28):

Now the issue is manifesting on master: if you pull the cache then the proofwidgets files (and only those) are rebuilt by lake build. It seems to be a difference in the trace stored in proofwidgets/v0.0.13/linux-64.tar.gz and the one computed by lake build (and lean-cache). It would make sense that these bad hashes could get propagated/sustained through the cache, and there is at least one plausible mechanism for the bad hash (using oleans from the wrong arch and hoping they will end up the same in the end)

Mario Carneiro (Sep 12 2023 at 13:39):

AHA ProofWidgets-v0.0.13 uses the wrong lean-toolchain, mathlib builds these files using v4.0.0 but the distributed files were built by nightly-2023-07-15, and the toolchain is an input to the lake hash which is ignored by the cache hash (which only hashes the toolchain of mathlib itself, not dependent projects)

Mario Carneiro (Sep 12 2023 at 13:45):

And there is some competition here because both lake and cache want to unpack the files for ProofWidgets: lake has a zip file (with bad hashes) and cache has ltar files (with good hashes, usually)

Mario Carneiro (Sep 12 2023 at 13:49):

Unfortunately v0.0.15 is also being built by a nightly so it suffers from a similar issue, upgrading alone won't fix the mismatch. In fact I'm not sure how this whole setup ever really worked: whenever the proofwidgets toolchain doesn't exactly match mathlib's, lake will unpack the zip file and then immediately proceed to build the lean files and replace the originals

Ruben Van de Velde (Sep 12 2023 at 13:50):

oh no

Mario Carneiro (Sep 12 2023 at 14:02):

@Mac Malone I think this release thing doesn't really work as implemented. Before unpacking the release, you should check that the lean-toolchain is the same as the one being used for the compile, otherwise all the hashes will be useless and assuming they are correct leads to an inconsistent state. If it's not the same, then just proceed as if there is no release; mathlib's cache infrastructure can take over at that point but right now lake doesn't have any downstream opt-out for using releases

Mac Malone (Sep 13 2023 at 00:28):

Mario Carneiro said:

Before unpacking the release, you should check that the lean-toolchain is the same as the one being used for the compile, otherwise all the hashes will be useless and assuming they are correct leads to an inconsistent state.

Lake does not assume they are correct. It rebuilds the files if they are not. The Mathlib cache should be clobbering the proofwidget oleans anyway with its own, so nothing should break in that regard.

One key issue here though is that the proof widget cloud release hash is platform-dependent which trickles down to any dependent olean and causes a rebuild, but that is not a new issue. It will also be fixed once lean4#2521 is merged (cloud releases no longer effect traces).

Mario Carneiro (Sep 13 2023 at 00:31):

It should check this before downloading the zip file, because otherwise it's a waste of time to download it when you know nothing in it will be usable because the root hash is wrong

Mario Carneiro (Sep 13 2023 at 00:32):

Moreover, mathlib has no use for the cloud build at all, I'd like to just turn it off. Why is "prefer release build" a choice of the dependency rather than the main package? At least there should be an override

Mario Carneiro (Sep 13 2023 at 00:34):

the proof widget cloud release hash is platform-dependent

I don't think this is an issue given that Wojciech Nawrocki is faking platform independence by distributing the same file under three names

Mac Malone (Sep 13 2023 at 00:35):

Mario Carneiro said:

It should check this before downloading the zip file, because otherwise it's a waste of time to download it when you know nothing in it will be usable because the root hash is wrong

This is not necessarily true. Cloud releases can also contain FFI artifacts that are not Lean toolchain dependent (this is true, for instance, for LeanInfer).

Mac Malone (Sep 13 2023 at 00:36):

This is also true for ProofWidgets were the primary point of the cloud release is to ship the compiled JS which is not toolchain dependent.

Mario Carneiro (Sep 13 2023 at 00:37):

I would rather it just have a zip file with those JS files that cache can unpack

Mario Carneiro (Sep 13 2023 at 00:37):

the release stuff is just complicating matters

Wojciech Nawrocki (Sep 13 2023 at 00:37):

Now the issue is manifesting on master: if you pull the cache then the proofwidgets files (and only those) are rebuilt by lake build

@Mario Carneiro is this definitely the same issue? As you note, the proofwidget oleans are supposed to be rebuilt because they may have been built for the wrong platform. The original problem was that mathlib itself was being rebuilt for some reason.

Mario Carneiro (Sep 13 2023 at 00:38):

I'm not sure, I got cross-eyed tracking the issues

Mario Carneiro (Sep 13 2023 at 00:38):

I think that issue persists on #7070

Mario Carneiro (Sep 13 2023 at 00:40):

confirmed, mathlib is being rebuilt on #7070

Mac Malone (Sep 13 2023 at 00:43):

Wojciech Nawrocki said:

Mario Carneiro is this definitely the same issue? As you note, the proofwidget oleans are supposed to be rebuilt because they may have been built for the wrong platform.

Hmm, in that case my change in lean4#2521 may cause them to not be rebuilt. @Wojciech Nawrocki, if the are platform dependent it might be worth adding a hash of the platform descriptor as an extraDepTarget?

Mario Carneiro (Sep 13 2023 at 00:45):

no please stop breaking the build

Mario Carneiro (Sep 13 2023 at 00:46):

we really need those hashes to not be OS dependent

Wojciech Nawrocki (Sep 13 2023 at 00:47):

Hmm, in that case my change in lean4#2521 may cause them to not be rebuilt.

Sorry, I was assuming that standard .oleans are platform dependent. If that's not the case, then neither are PW4 .oleans. They are at least toolchain-version-dependent, right?

Mario Carneiro (Sep 13 2023 at 00:48):

they are actually not toolchain-version-dependent most of the time

Mario Carneiro (Sep 13 2023 at 00:48):

but it is hard to predict in advance whether a toolchain change will cause a change to the olean

Wojciech Nawrocki (Sep 13 2023 at 00:50):

Oh, I see that the binary format does not store the string literal for the toolchain version. Does it store an .olean-format-version? If not, how does Lake know when to rebuild on toolchain bumps?

Wojciech Nawrocki (Sep 13 2023 at 00:51):

Oh, is it just that the trace includes the toolchain version, but the binary contents of the .olean itself might not technically change?

Mac Malone (Sep 13 2023 at 05:51):

Mario Carneiro said:

Why is "prefer release build" a choice of the dependency rather than the main package?

Sorry, I did not notice this comment before. The reason for this is because Lake's build are bottom-up not top-down. A build only knows its immediate dependencies, not nested ones and not its dependents. Fetching releases (as one of the first actions) is at the very bottom of this tree and thus only has access to its immediate package. The whole dependency fetching part of Lake's design was inspired by the Build systems à la carte paper.

However, Lake could provide an CLI override to forcibly turn this off for packages when it loads them by setting the loaded package configuration's preferReleaseBuild to false.

Moreover, mathlib has no use for the cloud build at all, I'd like to just turn it off.

This can still be done currently if the package itself provides a configuration option to turn cloud releases off (e.g., via -KnoCloudRelease in ProofWidgets that sets preferReleaseBuild to false).

Wojciech Nawrocki (Sep 13 2023 at 19:36):

Moreover, mathlib has no use for the cloud build at all, I'd like to just turn it off.

Note that for PW4 the cloud build is crucial to avoid everyone having to install and run NodeJS. It could in principle be rolled into the mathlib cache, but I would prefer to stick to general Lean package ecosystem tooling whenever possible.

Scott Morrison (Sep 15 2023 at 00:16):

@Mario Carneiro, when you have a moment, would you mind summarising for me and @Patrick Massot where you got to on #7070? I haven't looked at this since you started looking at it, and I don't have much sense of what needs to be done to give Patrick his shiny toys. :-)

Mario Carneiro (Sep 15 2023 at 00:17):

There is an open PR #7143 which will improve debugging for issues such as this one

Mario Carneiro (Sep 15 2023 at 00:18):

I think we also might want PW4 to use the same version as mathlib, although things should still work without this

Mario Carneiro (Sep 15 2023 at 00:19):

some of the issues I was having with e.g. bad traces on nightly disappeared after some waiting, so I'm not sure if this is a temporary issue

Scott Morrison (Sep 15 2023 at 00:22):

Great, thanks. I just restarted the build for #7143 (broken by the runner carnage), and :peace_sign:'d.

Mario Carneiro (Sep 15 2023 at 00:35):

I bumped the version of leantar in #7143:

The new version of leantar has a new function: leantar -k 1234.ltar will print out the "comments" in the file. If it is a file generated by an old version of leantar it will be empty, but files generated subsequent to this commit will show something along the lines git=mathlib4@12de34 containing the commit sha for the run that generated this ltar file.

Mario Carneiro (Sep 15 2023 at 00:42):

I just tried lean exe cache get && lake build on #7070 and it seems to work, can anyone else replicate?

Patrick Massot (Sep 15 2023 at 00:42):

I'd very happy to try, but I have no idea what you want me to do.

Mario Carneiro (Sep 15 2023 at 00:45):

just check out the branch and see whether lake build does anything after cache get

Patrick Massot (Sep 15 2023 at 00:48):

Attempting to download 3344 file(s)
Downloaded: 3344 file(s) [attempted 3344/3344 = 100%] (100% success)
Decompressing 3733 file(s)
ProofWidgets.Data.Json hash before: (some 10852882052451301938), downloading 4c5ca804ed29bc79.ltar
unpacked in 6790 ms
ProofWidgets.Data.Json hash after: (some 16603086849742441083)
pmassot@fixe-massy:/tmp/mathlib4$ lake build
[69/175] Building Mathlib.Tactic.LabelAttr
[69/177] Building Mathlib.Data.Array.Defs
[69/175] Building Mathlib.Data.KVMap
[69/178] Building Mathlib.Lean.Meta.Simp

:sad:

Patrick Massot (Sep 15 2023 at 00:48):

This is on a fresh clone of mathlib after switching to the PR branch.

Patrick Massot (Sep 15 2023 at 00:49):

Or did you mean it works for you in the sense that you get the message about the hash mismatch?

Mario Carneiro (Sep 15 2023 at 00:49):

well I've been doing all kinds of things including lake exe cache pack so it's possible my local cache is correct and fooling me

Mario Carneiro (Sep 15 2023 at 00:51):

I'm going to assume the issue persists, because I haven't actually changed anything :)

Mario Carneiro (Sep 15 2023 at 00:53):

@Patrick Massot can you try clearing your cache and try again?

Mario Carneiro (Sep 15 2023 at 00:54):

I just cleared my cache and I still get a clean lake build

Patrick Massot (Sep 15 2023 at 00:54):

How do you clear your cache?

Mario Carneiro (Sep 15 2023 at 00:54):

lake exe cache clean! I think

Mario Carneiro (Sep 15 2023 at 00:54):

you can also just rm ~/.cache/mathlib/*.ltar

Patrick Massot (Sep 15 2023 at 01:07):

pmassot@fixe-massy:/tmp/mathlib4$ rm -rf ~/.cache/mathlib/
pmassot@fixe-massy:/tmp/mathlib4$ lake exe cache get
leantar is too old; downloading more recent version
Attempting to download 3733 file(s)
Downloaded: 3733 file(s) [attempted 3733/3733 = 100%] (100% success)
Decompressing 3733 file(s)
ProofWidgets.Data.Json hash before: (some 16603086849742441083), downloading 4c5ca804ed29bc79.ltar
unpacked in 198 ms
ProofWidgets.Data.Json hash after: (some 2539444116390327465)
pmassot@fixe-massy:/tmp/mathlib4$ lake build
pmassot@fixe-massy:/tmp/mathlib4$

is looking much better!

Patrick Massot (Sep 15 2023 at 02:57):

Should we merge this?

Wojciech Nawrocki (Sep 15 2023 at 03:09):

Is it true that the lake exe cache pack! invocation seems to fix the issue, but it is not clear why it does so or what the issue was? That would be mildly unsettling, though a fix is a fix!

Mario Carneiro (Sep 15 2023 at 03:16):

lake exe cache pack! (followed by lake exe cache put, when run on CI) has the effect of overwriting any cache files with new versions. The purpose of #7143 is to help determine where the bad cache files actually came from - they could be from any run for any PR. So if things are fixed now I think we should just go forward with it and keep an eye out for similar issues later, and we will be able to find the run that conflicts. My guess is that what is happening is that there are two runs from different versions of something, which have the same cache key because some relevant difference is not being hashed. It's hard to debug that without knowing what are the runs in question.

Timo Carlin-Burns (Sep 15 2023 at 04:16):

@Patrick Massot you typoed bord merge on #7070

Kevin Buzzard (Sep 15 2023 at 06:38):

Super-powered calc gets one step closer :-)


Last updated: Dec 20 2023 at 11:08 UTC