Zulip Chat Archive
Stream: general
Topic: cache is incomplete
Eric Wieser (May 06 2022 at 14:30):
I'm sure there was a thread about this before, but I wasn't able to find it.
I often find that the cache that gets uploaded to azure doesn't actually work. See for instance any of:
- https://github.com/leanprover-community/mathlib/commits/2c79857a016b1ce4fd5b07c12b9f9c6fb3c5845a
- https://github.com/leanprover-community/mathlib/commits/1d27f3ef40b5cbf59303eb16e5853046746e0c6e
- https://github.com/leanprover-community/mathlib/commits/29f2cec27b2888316abaa2bd72274fb460974cd6
All of these are empty commits with a single parent commit, and CI produced a cache for that parent commit; yet all of them required a partial rebuild anyway.
Eric Wieser (May 06 2022 at 14:34):
I vaguelly remember at one point we just had CI run lean make; lean make; lean make
to workaround this? (edit: we still do, we only do it twice)
Floris van Doorn (May 06 2022 at 14:37):
To what extend is the order in which Lean compiles files deterministic and independent of the cache? If a build fails, the compilation stops, so you don't get a cache consisting of all files that compile (at, least, I think so). So maybe on the rebuild Lean starts compiling other files first?
Not sure if this is the reason though.
Eric Wieser (May 06 2022 at 14:41):
In all these cases the cache doesn't even contain the files needed to reproduce the failure
Eric Wieser (May 06 2022 at 14:43):
Ah, I guess our lean make; lean make
hack doesn't work if the build fails the first time
Eric Wieser (May 06 2022 at 14:43):
Since it will bail before the second command begins
Floris van Doorn (May 06 2022 at 15:52):
Presumably the cache contains all dependencies of the broken file? If that isn't the case, then there is clearly something wrong...
Eric Wieser (May 06 2022 at 16:05):
No, it doesn't contain the dependencies either.
Eric Wieser (May 06 2022 at 16:05):
Or if it does, they're considered invalid and rebuilt
Floris van Doorn (May 06 2022 at 16:47):
oh, ok. Then that is a problem.
Floris van Doorn (May 06 2022 at 16:59):
I can reproduce that the cache of 502be36986
behaves the way Eric says: the error is reported in analysis/normed_space/basic
, but if I recompile locally it starts recompiling from tactic/cancel_denoms
Mauricio Collares (May 06 2022 at 17:10):
Weird, the oleans from the GitHub artifact (https://github.com/leanprover-community/mathlib/suites/6402242134/artifacts/233212189) seem to work OK, but not the Azure version
Eric Wieser (May 06 2022 at 17:13):
So they're different as binaries?
Mauricio Collares (May 06 2022 at 17:14):
Yes
Mauricio Collares (May 06 2022 at 17:15):
The Azure version of src/tactic/cancel_denoms.olean
has MD5 208fbe78b66c5bc6d924a25b7fc3e03d
, the GH artifact version has MD5 2043d4862a2e327db6cc547f9fbd9b90
.
Eric Wieser (May 06 2022 at 17:24):
Is either of those hashes the same a the parent commit?
Mauricio Collares (May 06 2022 at 17:29):
The Azure one has a timestamp of 05 May 2022, 01:13
in the archive, the GH one has a timestamp of 06 May 2022, 14:42
(there are files with older timestamps in the GH archive, just not this one).
Mauricio Collares (May 06 2022 at 17:30):
But there are only two olean files in the tactic folder with today's date in the Azure archive.
Mauricio Collares (May 06 2022 at 17:31):
I don't know how reliable the timestamps are, let me search for parent commit hashes.
Mauricio Collares (May 06 2022 at 17:44):
A possible theory is that Azure caches aren't overwritten by default: The first run for 502be36986
uploaded some stale oleans onto 502be36986a2a14757cb43a61f4cfec4edb7361c.tar.xz
because it was cancelled (or suffered from an infra failure). The second run produced valid oleans but didn't upload them properly due to the pre-existing archive.
Mauricio Collares (May 06 2022 at 17:44):
Do you see the same problem with 2c79857a01
? Here lean --make
fails quite quickly (analysis/normed_space/basic.lean:475:24: error: failed to synthesize type class instance for
...) with the Azure oleans for this commit.
Rob Lewis (May 06 2022 at 17:53):
This is plausible, Azure caches are not overwritten if they already exist. https://github.com/leanprover-community/mathlib/blob/master/.github/workflows/build.yml.in#L77
Rob Lewis (May 06 2022 at 17:53):
(I assume that's what --overwrite false
means)
Mauricio Collares (May 06 2022 at 17:54):
And, according to the same file, something is pushed to Azure on the first run as long as the build started. So restarting a run doesn't help with fixing oleans if the worker OOMed the first time, for example. But a separate commit should have worked (and it does here, as far as I can tell).
Rob Lewis (May 06 2022 at 18:08):
#2039 is the relevant PR. I'm not sure what I meant at the time, and it seems like GitHub has deleted the logs from the demo runs. Why would the second build fail at the push stage? azcopy
defaults to --overwrite true
Mauricio Collares (May 06 2022 at 18:11):
Conditioning on the fact that the build failed there, I'm guessing azcopy
defaults to interactively asking the user if the file should be overwritten.
Mauricio Collares (May 06 2022 at 18:14):
Maybe it was prompting due to https://github.com/Azure/azure-storage-azcopy/pull/1538 before?
Rob Lewis (May 06 2022 at 18:14):
The docs could be wrong or the behavior could have changed in the past two years, but they claim it defaults to --overwrite true
Rob Lewis (May 06 2022 at 18:14):
Mauricio Collares said:
Maybe it was prompting due to https://github.com/Azure/azure-storage-azcopy/pull/1538 before?
Huh, I'm not sure what that means, but maybe!
Rob Lewis (May 06 2022 at 18:16):
In any case, we can test that --overwrite true
does what we want
Mauricio Collares (Jul 23 2022 at 18:18):
The other thread (https://leanprover.zulipchat.com/#narrow/stream/113488-general/topic/.22saving.20olean.22.3F) reminded me of this. Is adding --overwrite true
still worth a try?
Last updated: Dec 20 2023 at 11:08 UTC