Zulip Chat Archive
Stream: mathlib4
Topic: CI: uploading cache
Michael Stoll (Sep 14 2025 at 20:12):
In the "Upload Archive and Counterexamples cache to Azure" step (on #29649) I see lots of lines
Transfer failed (error code: 201)
interspersed with stuff like
Non-JSON output from curl:
<?xml version="1.0" encoding="utf-8"?><Error><Code>BlobAlreadyExists</Code><Message>The specified blob already exists.
offset 0: unexpected input
Non-JSON output from curl:
RequestId:0c6c11af-701e-005c-65b1-25ecc5000000
offset 0: unexpected input
Non-JSON output from curl:
Time:2025-09-14T19:57:50.8630653Z</Message></Error>{"certs":"","content_type":"application/xml","conn_id":40,"errormsg":null,"exitcode":0,"filename_effective":null,"ftp_entry_path":null,"http_code":409,"http_connect":0,"http_version":"1.1","local_ip":"172.17.0.2","local_port":51958,"method":"PUT","num_certs":0,"num_connects":0,"num_headers":7,"num_redirects":0,"proxy_ssl_verify_result":0,"redirect_url":null,"referer":null,"remote_ip":"20.209.19.193","remote_port":443,"response_code":409,"scheme":"HTTPS","size_download":220,"size_header":311,"size_request":372,"size_upload":62481,"speed_download":6925,"speed_upload":1966976,"ssl_verify_result":0,"time_appconnect":0.000000,"time_connect":0.000000,"time_namelookup":0.000016,"time_pretransfer":0.000048,"time_redirect":0.000000,"time_starttransfer":0.031757,"time_total":0.031765,"url":"https://lakecache.blob.core.windows.net/mathlib4/f/MichaelStollBayreuth/mathlib4/77cf339325021eac.ltar?***","url.scheme":"https","url.user":null,"url.password":null,"url.options":null,"url.host":"lakecache.blob.core.windows.net","url.port":"443","url.path":"/mathlib4/f/MichaelStollBayreuth/mathlib4/77cf339325021eac.ltar","url.query":"***","url.fragment":null,"url.zoneid":null,"urle.scheme":"https","urle.user":null,"urle.password":null,"urle.options":null,"urle.host":"lakecache.blob.core.windows.net","urle.port":"443","urle.path":"/mathlib4/f/MichaelStollBayreuth/mathlib4/77cf339325021eac.ltar","urle.query":"***","urle.fragment":null,"urle.zoneid":null,"url_effective":"https://lakecache.blob.core.windows.net/mathlib4/f/MichaelStollBayreuth/mathlib4/77cf339325021eac.ltar?***","urlnum":1833,"xfer_id":1833,"curl_version":"libcurl/8.5.0 OpenSSL/3.0.13 zlib/1.3 brotli/1.1.0 zstd/1.5.5 libidn2/2.3.7 libpsl/0.21.2 (+libidn2/2.3.7) libssh/0.10.6/openssl/zlib nghttp2/1.59.0 librtmp/2.3 OpenLDAP/2.6.7"}
offset 0: unexpected input
There is also a number of lines like
Uploaded: 0 file(s) [attempted 7280/7282 = 99%, 621 KB/s], 7280 failed
In line 8123 it says
Compressing cache
Attempting to upload 7224 file(s) to MichaelStollBayreuth/mathlib4 cache
followed by a further sea of messages like
offset 0: unexpected input
Transfer failed (error code: 201)
Non-JSON output from curl:
<?xml version="1.0" encoding="utf-8"?><Error><Code>BlobAlreadyExists</Code><Message>The specified blob already exists.
(there are also quite long renderings of "Non-JSON output from curl"). The log has > 70000 lines.
The following step (uploading to Cloudflare) looks OK, though, and CI passes.
Is this something one should be worried about?
Bryan Gin-ge Chen (Sep 14 2025 at 20:47):
If CI passes then it's probably OK, but maybe @Kim Morrison will be interested (?).
Kim Morrison (Sep 15 2025 at 00:23):
Hmm... in practice it seems the Azure cache actually works, so I'm not sure what to make of this.
Wrenna Robson (Sep 22 2025 at 14:03):
I am also having a similar issue on mathlib4-nightly-testing#70, and in this case the cache does NOT appear to be working...
Wrenna Robson (Sep 22 2025 at 15:11):
Wrenna Robson (Sep 22 2025 at 15:11):
See here:
Attempting to upload 7225 file(s) to linesthatinterlace/mathlib4 cache
Non-JSON output from curl:
<?xml version="1.0" encoding="utf-8"?><Error><Code>BlobAlreadyExists</Code><Message>The specified blob already exists.
offset 0: unexpected input
Non-JSON output from curl:
RequestId:ee7f1f9b-701e-005c-05d0-2becc5000000
offset 0: unexpected input
Wrenna Robson (Sep 22 2025 at 15:11):
Don't know if I have something misconfigured
Wrenna Robson (Sep 22 2025 at 19:16):
This is still happening and it's causing me some major issues.
Wrenna Robson (Sep 22 2025 at 23:15):
I have made https://github.com/leanprover-community/batteries/issues/1425
Michael Stoll (Nov 04 2025 at 18:27):
Michael Stoll said:
In the "Upload Archive and Counterexamples cache to Azure" step I see lots of lines...
I again or still see lines
Non-JSON output from curl:
followed by seemingly random text (see here). It does not look to me like this is supposed to be so. Maybe somebody can look at it again? @Kim Morrison ?
EDIT: Maybe merging master helps. I'm trying this right now...
Bryan Gin-ge Chen (Nov 04 2025 at 18:31):
I've been tracking issues caused by this here: #mathlib4 > "verify that everything was available in the cache" fails CI
It seems that merging master usually fixes it, but not always (sometimes master has to be merged again after more commits have landed).
Michael Stoll (Nov 04 2025 at 18:52):
After merging master, "Upload cache to Azure" prints lots of lines
Transfer failed (error code: 201)
and the same is true for the "Upload Archive and Counterexamples cache to Azure" step in CI. (Link)
Bryan Gin-ge Chen (Nov 04 2025 at 18:53):
Right, you might have to merge master again, unfortunately.
Michael Stoll (Nov 04 2025 at 18:55):
Bryan Gin-ge Chen said:
Right, you might have to merge
masteragain, unfortunately.
How could that improve things when there are no new commits?
Kevin Buzzard (Nov 04 2025 at 18:56):
CI is non-deterministic!
Bryan Gin-ge Chen (Nov 04 2025 at 18:57):
Sorry, I meant after more commits have landed in master.
Michael Stoll (Nov 04 2025 at 18:59):
How would "normal" commits change this behavior?
Bryan Gin-ge Chen (Nov 04 2025 at 19:04):
I have no idea, since I don't understand the root cause, but so far the failures in the other thread have started working after merging master once or twice.
Sebastian Ullrich (Nov 04 2025 at 19:17):
And retrying once or twice has not? Just trying to understand the nondeterminism nature
Michael Stoll (Nov 04 2025 at 19:25):
Retrying what precisely?
Sebastian Ullrich (Nov 04 2025 at 19:35):
Restarting the workflow without pushing new commits
Bryan Gin-ge Chen (Nov 04 2025 at 19:36):
Michael Stoll said:
After merging master, "Upload cache to Azure" prints lots of lines
Transfer failed (error code: 201)and the same is true for the "Upload Archive and Counterexamples cache to Azure" step in CI. (Link)
I went to try re-running the job in this link but it looks like CI had already passed when you posted the link. The confusing thing is that even successful uploads have lots of "Transfer failed" messages, which we also ought to fix.
Bryan Gin-ge Chen (Nov 04 2025 at 19:37):
Ah, I see you've triggered a re-run just now.
Michael Stoll (Nov 04 2025 at 19:37):
Sebastian Ullrich said:
Restarting the workflow without pushing new commits
Bryan Gin-ge Chen (Nov 04 2025 at 19:38):
I think the cache for that PR should already be working though, I saw a green check on the commit at #29649 (at least until you re-ran the workflow).
Michael Stoll (Nov 04 2025 at 19:43):
I'm re-running the workflow. I notice that it does not build any Mathlib files (as expected), but it does build a lot of stuff in Archive and Counterexamples. Then the "Upload Archive and Counterexamples cache to Azure" step shows again a lot of junk:
Attempting to upload 27 file(s) to MichaelStollBayreuth/mathlib4 cache
Non-JSON output from curl:
<?xml version="1.0" encoding="utf-8"?><Error><Code>BlobAlreadyExists</Code><Message>The specified blob already exists.
offset 0: unexpected input
Non-JSON output from curl:
RequestId:f1010c24-001e-0034-74c3-4d8a55000000
offset 0: unexpected input
Non-JSON output from curl:
Time:2025-11-04T19:41:36.2735025Z</Message></Error>{"certs":"Subject:C = US, ST = WA, L = Redmond, O = Microsoft Corporation, CN = *.blob.core.windows.net\nIssuer:C = US, O = Microsoft Corporation, CN = Microsoft Azure RSA TLS Issuing CA 08\nVersion:2\nSerial Number:330263b4c2a7aac4fd83de827c00000263b4c2\nSignature Algorithm:sha384WithRSAEncryption\nPublic Key Algorithm:rsaEncryption\nCT Precertificate SCTs:Signed Certificate Timestamp:\n Version : v1 (0x0)\n Log ID : 96:97:64:BF:55:58:97:AD:F7:43:87:68:37:08:42:77:\n E9:F0:3A:D5:F6:A4:F3:36:6E:46:A4:3F:0F:CA:A9:C6\n Timestamp : Oct 2 02:42:20.869 2025 GMT\n Extensions: none\n Signature : ecdsa-with-SHA256\n 30:46:02:21:00:8E:88:A0:A6:88:F0:C0:13:06:24:D1:\n 20:26:F8:97:58:3F:B0:40:35:B2:76:90:86:E0:95:CC:\n E1:76:9F:A7:E5:02:21:00:C8:7D:EF:D2:DD:58:58:B8:\n E6:3D:D3:A5:A5:40:EF:5B:7A:20:90:B9:F7:6C:CA:B7:\n DF:90:4E:E8:D0:D0:B8:0D\nSigned Certific
offset 0: unexpected input
Non-JSON output from curl:
<?xml version="1.0" encoding="utf-8"?><Error><Code>BlobAlreadyExists</Code><Message>The specified blob already exists.
offset 0: unexpected input
Non-JSON output from curl:
RequestId:b253ca9b-001e-0056-6bc3-4d4872000000
etc. etc.
Michael Stoll (Nov 04 2025 at 19:44):
It looks like it expects to receive JSON, but gets fed XML instead...
Michael Stoll (Nov 04 2025 at 19:48):
"get cache for Mathlib":
Using cache (Azure) from origin: leanprover-community/mathlib4
Attempting to download 7458 file(s) from leanprover-community/mathlib4 cache
...
Downloaded: 5877 file(s) [attempted 7458/7458 = 100%, 4740 KB/s]
Warning: some files were not found in the cache.
This usually means that your local checkout of mathlib4 has diverged from upstream.
If you push your commits to a branch of the mathlib4 repository, CI will build the oleans and they will be available later.
Alternatively, if you already have pushed your commits to a branch, this may mean the CI build has failed part-way through building.
Decompressing 5877 file(s)
Unpacked in 14896 ms
Completed successfully!
Attempting to download 1581 file(s) from MichaelStollBayreuth/mathlib4 cache
...
Downloaded: 1581 file(s) [attempted 1581/1581 = 100%, 274 KB/s]
Decompressing 7458 file(s)
Unpacked in 17839 ms
Completed successfully!
Why does it decompress twice?
Sebastian Ullrich (Nov 04 2025 at 19:48):
I think we've seen that JSON output many times before, are you sure it has anything to do with failures?
Michael Stoll (Nov 04 2025 at 19:49):
"get cache for Archive and Counterexamples" look similar.
Michael Stoll (Nov 04 2025 at 19:49):
Sebastian Ullrich said:
I think we've seen that JSON output many times before, are you sure it has anything to do with failures?
I have no idea, but it does not look right, somehow.
Bryan Gin-ge Chen (Nov 04 2025 at 19:50):
So it looks like things are now working properly though the log output might be confusing. Note that cache files are downloaded from two separate caches: one for leanprover-community/mathlib (files from the master branch) and MichaelStollBayreuth/mathlib4 (files specific to your PR); that's why you see two decompressing steps.
Michael Stoll (Nov 04 2025 at 19:51):
But when I do lake exe cache get locally, it also downloads from two sources, but only decompresses once at the end.
Bryan Gin-ge Chen (Nov 04 2025 at 19:51):
It may be that the master files were already present on your computer from a previous download.
Bryan Gin-ge Chen (Nov 04 2025 at 19:52):
If opening the files in your editor / lake build is fast, then I wouldn't worry more about this. We do need to improve the console output at some point though.
Michael Stoll (Nov 04 2025 at 19:53):
BTW, when "verify[ing] that everything was available in the cache" there is some noise:
ℹ [97/221] Replayed Batteries.Data.Array.Match
trace: .> LEAN_PATH=/home/runner/work/mathlib4/mathlib4/.lake/packages/Cli/.lake/build/lib/lean:/home/runner/work/mathlib4/mathlib4/.lake/packages/batteries/.lake/build/lib/lean:/home/runner/work/mathlib4/mathlib4/.lake/packages/Qq/.lake/build/lib/lean:/home/runner/work/mathlib4/mathlib4/.lake/packages/aesop/.lake/build/lib/lean:/home/runner/work/mathlib4/mathlib4/.lake/packages/proofwidgets/.lake/build/lib/lean:/home/runner/work/mathlib4/mathlib4/.lake/packages/importGraph/.lake/build/lib/lean:/home/runner/work/mathlib4/mathlib4/.lake/packages/LeanSearchClient/.lake/build/lib/lean:/home/runner/work/mathlib4/mathlib4/.lake/packages/plausible/.lake/build/lib/lean:/home/runner/work/mathlib4/mathlib4/.lake/build/lib/lean /home/runner/.elan/toolchains/leanprover--lean4---v4.25.0-rc2/bin/lean /home/runner/work/mathlib4/mathlib4/.lake/packages/batteries/Batteries/Data/Array/Match.lean -o /home/runner/work/mathlib4/mathlib4/.lake/packages/batteries/.lake/build/lib/lean/Batteries/Data/Array/Match.olean -i /home/runne
and
ℹ [501/590] Replayed proofwidgets/widgetPackageLock
trace: ././widget> npm install
trace: stdout:
added 1 package, changed 2 packages, and audited 367 packages in 2s
67 packages are looking for funding
run `npm fund` for details
1 moderate severity vulnerability
To address all issues, run:
npm audit fix
Run `npm audit` for details.
ℹ [502/590] Replayed proofwidgets/widgetJsAll
trace: /home/runner/work/ProofWidgets4/ProofWidgets4/widget> npm clean-install
trace: stdout:
added 365 packages, and audited 366 packages in 8s
68 packages are looking for funding
run `npm fund` for details
1 low severity vulnerability
To address all issues, run:
npm audit fix
Run `npm audit` for details.
trace: stderr:
npm warn deprecated three-mesh-bvh@0.7.8: Deprecated due to three.js version incompatibility. Please use v0.8.0, instead.
trace: /home/runner/work/ProofWidgets4/ProofWidgets4/widget> npm run build
trace: stdout:
> @leanprover-community/proofwidgets4@0.1.0 build
> npx tsc && rollup --environment NODE_ENV:production --config
trace: stderr:
dist/rubiks.js → ../.lake/build/js...
created ../.lake/build/js in 9.9s
dist/recharts.js → ../.lake/build/js...
(!) Circular dependencies
node_modules/d3-interpolate/src/value.js -> node_modules/d3-interpolate/src/array.js -> node_modules/d3-interpolate/src/value.js
node_modules/d3-interpolate/src/value.js -> node_modules/d3-interpolate/src/object.js -> node_modules/d3-interpolate/src/value.js
node_modules/recharts/es6/util/ChartUtils.js -> node_modules/recharts/es6/util/getLegendProps.js -> node_modules/recharts/es6/util/ChartUtils.js
created ../.lake/build/js in 6.5s
dist/rbTree.js → ../.lake/build/js...
(!) Circular dependencies
node_modules/d3-selection/src/selection/index.js -> node_modules/d3-selection/src/selection/select.js -> node_modules/d3-selection/src/selection/index.js
node_modules/d3-selection/src/selection/index.js -> node_modules/d3-selection/src/selection/selectAll.js -> node_modules/d3-selection/src/selection/index.js
node_modules/d3-selection/src/selection/index.js -> node_modules/d3-selection/src/selection/filter.js -> node_modules/d3-selection/src/selection/index.js
...and 12 more
created ../.lake/build/js in 1.6s
dist/presentSelection.js → ../.lake/build/js...
created ../.lake/build/js in 165ms
dist/penroseDisplay.js → ../.lake/build/js...
(!) Circular dependencies
node_modules/true-myth/dist/es/maybe.js -> node_modules/true-myth/dist/es/result.js -> node_modules/true-myth/dist/es/maybe.js
node_modules/@penrose/core/dist/engine/EngineUtils.js -> node_modules/@penrose/core/dist/utils/Util.js -> node_modules/@penrose/core/dist/engine/EngineUtils.js
node_modules/@penrose/core/dist/engine/BBox.js -> node_modules/@penrose/core/dist/contrib/Queries.js -> node_modules/@penrose/core/dist/engine/BBox.js
...and 14 more
created ../.lake/build/js in 14s
dist/penroseCanvas.js → ../.lake/build/js...
(!) Circular dependencies
node_modules/true-myth/dist/es/maybe.js -> node_modules/true-myth/dist/es/result.js -> node_modules/true-myth/dist/es/maybe.js
node_modules/@penrose/core/dist/engine/EngineUtils.js -> node_modules/@penrose/core/dist/utils/Util.js -> node_modules/@penrose/core/dist/engine/EngineUtils.js
node_modules/@penrose/core/dist/engine/BBox.js -> node_modules/@penrose/core/dist/contrib/Queries.js -> node_modules/@penrose/core/dist/engine/BBox.js
...and 14 more
created ../.lake/build/js in 13.6s
dist/ofRpcMethod.js → ../.lake/build/js...
created ../.lake/build/js in 206ms
dist/makeEditLink.js → ../.lake/build/js...
created ../.lake/build/js in 109ms
dist/interactiveSvg.js → ../.lake/build/js...
created ../.lake/build/js in 158ms
dist/interactiveExpr.js → ../.lake/build/js...
created ../.lake/build/js in 109ms
dist/index.js → ../.lake/build/js...
created ../.lake/build/js in 128ms
dist/htmlDisplayPanel.js → ../.lake/build/js...
created ../.lake/build/js in 129ms
dist/htmlDisplay.js → ../.lake/build/js...
created ../.lake/build/js in 124ms
dist/goalTypePanel.js → ../.lake/build/js...
created ../.lake/build/js in 166ms
dist/filterDetails.js → ../.lake/build/js...
created ../.lake/build/js in 135ms
dist/exprPresentation.js → ../.lake/build/js...
created ../.lake/build/js in 150ms
dist/d3Graph.js → ../.lake/build/js...
(!) Circular dependencies
node_modules/d3-selection/src/selection/index.js -> node_modules/d3-selection/src/selection/select.js -> node_modules/d3-selection/src/selection/index.js
node_modules/d3-selection/src/selection/index.js -> node_modules/d3-selection/src/selection/selectAll.js -> node_modules/d3-selection/src/selection/index.js
node_modules/d3-selection/src/selection/index.js -> node_modules/d3-selection/src/selection/filter.js -> node_modules/d3-selection/src/selection/index.js
...and 12 more
created ../.lake/build/js in 2.9s
dist/cancellable.js → ../.lake/build/js...
created ../.lake/build/js in 116ms
dist/animatedHtml.js → ../.lake/build/js...
created ../.lake/build/js in 136ms
I find the "circular dependencies(!)" a bit scary...
Bryan Gin-ge Chen (Nov 04 2025 at 19:54):
Yeah, that's expected. There's always a bit of output from proofwidgets for some reason.
Bryan Gin-ge Chen (Nov 04 2025 at 19:54):
If there's no red check at the end of the job, it's usually safe to assume things are working as expected.
Ruben Van de Velde (Nov 04 2025 at 20:00):
<Error><Code>BlobAlreadyExists</Code><Message>The specified blob already exists.
seems to suggest that we're trying to upload something that was uploaded already
Kenny Lau (Nov 19 2025 at 22:12):
https://github.com/leanprover-community/mathlib4/actions/runs/19516744620/job/55870339751
This commit builds successfully, but on the "Upload cache to Azure" step I get the error "Transfer failed (error code: 201)" 122 times (once for each new file), and nothing got uploaded to the "kckennylau/mathlib4 cache".
Kenny Lau (Nov 20 2025 at 10:29):
Anyone? I realise it wasn't clear, but I'm asking for help in the message above.
Kenny Lau (Nov 20 2025 at 12:52):
And this is certainly not an isolated incident, a random new PR I looked at by @Jovan Gerbscheid also suffers from the same error: https://github.com/leanprover-community/mathlib4/actions/runs/19536610578/job/55932181913?pr=31847
and I would hypothesize that every PR is having the same issue?
Kenny Lau (Nov 20 2025 at 12:54):
one, two, three, four out of 4 randomly sampled PR's from the first page
Bryan Gin-ge Chen (Nov 20 2025 at 12:54):
Is lake exe cache get broken for these PRs? I think cache logs are a bit noisy, I've noticed that step outputting lots of stuff which looks like it might be an error even though things end up OK. Generally if the post-build-job succeeds, then that means that CI checked that lake exe cache get worked.
Bryan Gin-ge Chen (Nov 20 2025 at 12:56):
There is an issue where sometimes uploading does fail causing the post build job to break which I've been tracking here: #mathlib4 > "verify that everything was available in the cache" fails CI The fact that the log in the upload step is noisy certainly doesn't help in debugging what's going on there.
Kenny Lau (Nov 20 2025 at 12:57):
Bryan Gin-ge Chen said:
I've noticed that step outputting lots of stuff which looks like it might be an error even though things end up OK.
Why I discovered the error is that I noticed that when I push new changes to my PR (and I only changed proofs compared to the previous commit), it actually starts building from the main cache (even though under the new module system if i only change proofs i shouldn't have to recompile everything)
Kenny Lau (Nov 20 2025 at 12:59):
Links:
- The PR
- The commit that only changed proofs
- The previous successful build log
- The current build log, which does not use the last cache
Bryan Gin-ge Chen (Nov 20 2025 at 13:02):
It's very possible that the way cache works could be improved to take advantage of the new module system and that no one has gotten around to it yet.
Kenny Lau (Nov 20 2025 at 13:02):
sure, but that log still says it failed to download the new cache
Bryan Gin-ge Chen (Nov 20 2025 at 13:10):
This is speculation on my part: the way cache files are hashed probably doesn't take into account the fact that a change to Lean files that only touches proofs shouldn't require re-compiling downstream files, so it's looking for files that don't exist. Unfortunately, I don't have the capacity to look into this further...
Kenny Lau (Nov 20 2025 at 13:43):
Bryan Gin-ge Chen said:
so it's looking for files that don't exist
but the logs show that it tried to download the previous cache and failed
Bryan Gin-ge Chen (Nov 20 2025 at 13:55):
Right, I think that's consistent with what I suggested. cache is running on a checkout of your branch including your latest changes. To find the cached version of oleans for a Lean file, it looks for a file on the cloud server with a filename coming from a hashing algorithm that takes into account the Lean file + its dependencies. So if your changes caused the hashes of a bunch of files to change (unnecessarily, possibly) then in that step, cache will end up trying to download a bunch of files that don't exist (yet). (It's been a while since I looked at the code so this could be wrong; if you're interested I encourage you to take a look here; you might get further than I did in understanding what's happening!)
Kenny Lau (Nov 20 2025 at 13:58):
Ok, I understand; would it be a better algorithm to just download the cache of the previous commit? Which, I suppose, is your previous message saying that we could make use of the new module system in a future algorithm
Last updated: Dec 20 2025 at 21:32 UTC