Zulip Chat Archive

Stream: mathlib4

Topic: CI: uploading cache


Michael Stoll (Sep 14 2025 at 20:12):

In the "Upload Archive and Counterexamples cache to Azure" step (on #29649) I see lots of lines

Transfer failed (error code: 201)

interspersed with stuff like

Non-JSON output from curl:
  <?xml version="1.0" encoding="utf-8"?><Error><Code>BlobAlreadyExists</Code><Message>The specified blob already exists.
offset 0: unexpected input
Non-JSON output from curl:
  RequestId:0c6c11af-701e-005c-65b1-25ecc5000000
offset 0: unexpected input
Non-JSON output from curl:
  Time:2025-09-14T19:57:50.8630653Z</Message></Error>{"certs":"","content_type":"application/xml","conn_id":40,"errormsg":null,"exitcode":0,"filename_effective":null,"ftp_entry_path":null,"http_code":409,"http_connect":0,"http_version":"1.1","local_ip":"172.17.0.2","local_port":51958,"method":"PUT","num_certs":0,"num_connects":0,"num_headers":7,"num_redirects":0,"proxy_ssl_verify_result":0,"redirect_url":null,"referer":null,"remote_ip":"20.209.19.193","remote_port":443,"response_code":409,"scheme":"HTTPS","size_download":220,"size_header":311,"size_request":372,"size_upload":62481,"speed_download":6925,"speed_upload":1966976,"ssl_verify_result":0,"time_appconnect":0.000000,"time_connect":0.000000,"time_namelookup":0.000016,"time_pretransfer":0.000048,"time_redirect":0.000000,"time_starttransfer":0.031757,"time_total":0.031765,"url":"https://lakecache.blob.core.windows.net/mathlib4/f/MichaelStollBayreuth/mathlib4/77cf339325021eac.ltar?***","url.scheme":"https","url.user":null,"url.password":null,"url.options":null,"url.host":"lakecache.blob.core.windows.net","url.port":"443","url.path":"/mathlib4/f/MichaelStollBayreuth/mathlib4/77cf339325021eac.ltar","url.query":"***","url.fragment":null,"url.zoneid":null,"urle.scheme":"https","urle.user":null,"urle.password":null,"urle.options":null,"urle.host":"lakecache.blob.core.windows.net","urle.port":"443","urle.path":"/mathlib4/f/MichaelStollBayreuth/mathlib4/77cf339325021eac.ltar","urle.query":"***","urle.fragment":null,"urle.zoneid":null,"url_effective":"https://lakecache.blob.core.windows.net/mathlib4/f/MichaelStollBayreuth/mathlib4/77cf339325021eac.ltar?***","urlnum":1833,"xfer_id":1833,"curl_version":"libcurl/8.5.0 OpenSSL/3.0.13 zlib/1.3 brotli/1.1.0 zstd/1.5.5 libidn2/2.3.7 libpsl/0.21.2 (+libidn2/2.3.7) libssh/0.10.6/openssl/zlib nghttp2/1.59.0 librtmp/2.3 OpenLDAP/2.6.7"}
offset 0: unexpected input

There is also a number of lines like

Uploaded: 0 file(s) [attempted 7280/7282 = 99%, 621 KB/s], 7280 failed

In line 8123 it says

Compressing cache
Attempting to upload 7224 file(s) to MichaelStollBayreuth/mathlib4 cache

followed by a further sea of messages like

offset 0: unexpected input
Transfer failed (error code: 201)
Non-JSON output from curl:
  <?xml version="1.0" encoding="utf-8"?><Error><Code>BlobAlreadyExists</Code><Message>The specified blob already exists.

(there are also quite long renderings of "Non-JSON output from curl"). The log has > 70000 lines.
The following step (uploading to Cloudflare) looks OK, though, and CI passes.

Is this something one should be worried about?

Bryan Gin-ge Chen (Sep 14 2025 at 20:47):

If CI passes then it's probably OK, but maybe @Kim Morrison will be interested (?).

Kim Morrison (Sep 15 2025 at 00:23):

Hmm... in practice it seems the Azure cache actually works, so I'm not sure what to make of this.

Wrenna Robson (Sep 22 2025 at 14:03):

I am also having a similar issue on mathlib4-nightly-testing#70, and in this case the cache does NOT appear to be working...

Wrenna Robson (Sep 22 2025 at 15:11):

https://github.com/leanprover-community/mathlib4-nightly-testing/actions/runs/17917789817/job/50944481384#step:24:63350

Wrenna Robson (Sep 22 2025 at 15:11):

See here:

 Attempting to upload 7225 file(s) to linesthatinterlace/mathlib4 cache
Non-JSON output from curl:
  <?xml version="1.0" encoding="utf-8"?><Error><Code>BlobAlreadyExists</Code><Message>The specified blob already exists.
offset 0: unexpected input
Non-JSON output from curl:
  RequestId:ee7f1f9b-701e-005c-05d0-2becc5000000
offset 0: unexpected input

Wrenna Robson (Sep 22 2025 at 15:11):

Don't know if I have something misconfigured

Wrenna Robson (Sep 22 2025 at 19:16):

This is still happening and it's causing me some major issues.

Wrenna Robson (Sep 22 2025 at 23:15):

I have made https://github.com/leanprover-community/batteries/issues/1425

Michael Stoll (Nov 04 2025 at 18:27):

Michael Stoll said:

In the "Upload Archive and Counterexamples cache to Azure" step I see lots of lines...

I again or still see lines

Non-JSON output from curl:

followed by seemingly random text (see here). It does not look to me like this is supposed to be so. Maybe somebody can look at it again? @Kim Morrison ?

EDIT: Maybe merging master helps. I'm trying this right now...

Bryan Gin-ge Chen (Nov 04 2025 at 18:31):

I've been tracking issues caused by this here: #mathlib4 > "verify that everything was available in the cache" fails CI

It seems that merging master usually fixes it, but not always (sometimes master has to be merged again after more commits have landed).

Michael Stoll (Nov 04 2025 at 18:52):

After merging master, "Upload cache to Azure" prints lots of lines

Transfer failed (error code: 201)

and the same is true for the "Upload Archive and Counterexamples cache to Azure" step in CI. (Link)

Bryan Gin-ge Chen (Nov 04 2025 at 18:53):

Right, you might have to merge master again, unfortunately.

Michael Stoll (Nov 04 2025 at 18:55):

Bryan Gin-ge Chen said:

Right, you might have to merge master again, unfortunately.

How could that improve things when there are no new commits?

Kevin Buzzard (Nov 04 2025 at 18:56):

CI is non-deterministic!

Bryan Gin-ge Chen (Nov 04 2025 at 18:57):

Sorry, I meant after more commits have landed in master.

Michael Stoll (Nov 04 2025 at 18:59):

How would "normal" commits change this behavior?

Bryan Gin-ge Chen (Nov 04 2025 at 19:04):

I have no idea, since I don't understand the root cause, but so far the failures in the other thread have started working after merging master once or twice.

Sebastian Ullrich (Nov 04 2025 at 19:17):

And retrying once or twice has not? Just trying to understand the nondeterminism nature

Michael Stoll (Nov 04 2025 at 19:25):

Retrying what precisely?

Sebastian Ullrich (Nov 04 2025 at 19:35):

Restarting the workflow without pushing new commits

Bryan Gin-ge Chen (Nov 04 2025 at 19:36):

Michael Stoll said:

After merging master, "Upload cache to Azure" prints lots of lines

Transfer failed (error code: 201)

and the same is true for the "Upload Archive and Counterexamples cache to Azure" step in CI. (Link)

I went to try re-running the job in this link but it looks like CI had already passed when you posted the link. The confusing thing is that even successful uploads have lots of "Transfer failed" messages, which we also ought to fix.

Bryan Gin-ge Chen (Nov 04 2025 at 19:37):

Ah, I see you've triggered a re-run just now.

Michael Stoll (Nov 04 2025 at 19:37):

Sebastian Ullrich said:

Restarting the workflow without pushing new commits

Trying this right now.

Bryan Gin-ge Chen (Nov 04 2025 at 19:38):

I think the cache for that PR should already be working though, I saw a green check on the commit at #29649 (at least until you re-ran the workflow).

Michael Stoll (Nov 04 2025 at 19:43):

I'm re-running the workflow. I notice that it does not build any Mathlib files (as expected), but it does build a lot of stuff in Archive and Counterexamples. Then the "Upload Archive and Counterexamples cache to Azure" step shows again a lot of junk:

Attempting to upload 27 file(s) to MichaelStollBayreuth/mathlib4 cache
Non-JSON output from curl:
  <?xml version="1.0" encoding="utf-8"?><Error><Code>BlobAlreadyExists</Code><Message>The specified blob already exists.
offset 0: unexpected input
Non-JSON output from curl:
  RequestId:f1010c24-001e-0034-74c3-4d8a55000000
offset 0: unexpected input
Non-JSON output from curl:
  Time:2025-11-04T19:41:36.2735025Z</Message></Error>{"certs":"Subject:C = US, ST = WA, L = Redmond, O = Microsoft Corporation, CN = *.blob.core.windows.net\nIssuer:C = US, O = Microsoft Corporation, CN = Microsoft Azure RSA TLS Issuing CA 08\nVersion:2\nSerial Number:330263b4c2a7aac4fd83de827c00000263b4c2\nSignature Algorithm:sha384WithRSAEncryption\nPublic Key Algorithm:rsaEncryption\nCT Precertificate SCTs:Signed Certificate Timestamp:\n    Version   : v1 (0x0)\n    Log ID    : 96:97:64:BF:55:58:97:AD:F7:43:87:68:37:08:42:77:\n                E9:F0:3A:D5:F6:A4:F3:36:6E:46:A4:3F:0F:CA:A9:C6\n    Timestamp : Oct  2 02:42:20.869 2025 GMT\n    Extensions: none\n    Signature : ecdsa-with-SHA256\n                30:46:02:21:00:8E:88:A0:A6:88:F0:C0:13:06:24:D1:\n                20:26:F8:97:58:3F:B0:40:35:B2:76:90:86:E0:95:CC:\n                E1:76:9F:A7:E5:02:21:00:C8:7D:EF:D2:DD:58:58:B8:\n                E6:3D:D3:A5:A5:40:EF:5B:7A:20:90:B9:F7:6C:CA:B7:\n                DF:90:4E:E8:D0:D0:B8:0D\nSigned Certific
offset 0: unexpected input
Non-JSON output from curl:
  <?xml version="1.0" encoding="utf-8"?><Error><Code>BlobAlreadyExists</Code><Message>The specified blob already exists.
offset 0: unexpected input
Non-JSON output from curl:
  RequestId:b253ca9b-001e-0056-6bc3-4d4872000000

etc. etc.

Michael Stoll (Nov 04 2025 at 19:44):

It looks like it expects to receive JSON, but gets fed XML instead...

Michael Stoll (Nov 04 2025 at 19:48):

"get cache for Mathlib":

 Using cache (Azure) from origin: leanprover-community/mathlib4
Attempting to download 7458 file(s) from leanprover-community/mathlib4 cache
...
Downloaded: 5877 file(s) [attempted 7458/7458 = 100%, 4740 KB/s]
Warning: some files were not found in the cache.
This usually means that your local checkout of mathlib4 has diverged from upstream.
If you push your commits to a branch of the mathlib4 repository, CI will build the oleans and they will be available later.
Alternatively, if you already have pushed your commits to a branch, this may mean the CI build has failed part-way through building.
Decompressing 5877 file(s)
Unpacked in 14896 ms
Completed successfully!
Attempting to download 1581 file(s) from MichaelStollBayreuth/mathlib4 cache
...
Downloaded: 1581 file(s) [attempted 1581/1581 = 100%, 274 KB/s]
Decompressing 7458 file(s)
Unpacked in 17839 ms
Completed successfully!

Why does it decompress twice?

Sebastian Ullrich (Nov 04 2025 at 19:48):

I think we've seen that JSON output many times before, are you sure it has anything to do with failures?

Michael Stoll (Nov 04 2025 at 19:49):

"get cache for Archive and Counterexamples" look similar.

Michael Stoll (Nov 04 2025 at 19:49):

Sebastian Ullrich said:

I think we've seen that JSON output many times before, are you sure it has anything to do with failures?

I have no idea, but it does not look right, somehow.

Bryan Gin-ge Chen (Nov 04 2025 at 19:50):

So it looks like things are now working properly though the log output might be confusing. Note that cache files are downloaded from two separate caches: one for leanprover-community/mathlib (files from the master branch) and MichaelStollBayreuth/mathlib4 (files specific to your PR); that's why you see two decompressing steps.

Michael Stoll (Nov 04 2025 at 19:51):

But when I do lake exe cache get locally, it also downloads from two sources, but only decompresses once at the end.

Bryan Gin-ge Chen (Nov 04 2025 at 19:51):

It may be that the master files were already present on your computer from a previous download.

Bryan Gin-ge Chen (Nov 04 2025 at 19:52):

If opening the files in your editor / lake build is fast, then I wouldn't worry more about this. We do need to improve the console output at some point though.

Michael Stoll (Nov 04 2025 at 19:53):

BTW, when "verify[ing] that everything was available in the cache" there is some noise:

  ℹ [97/221] Replayed Batteries.Data.Array.Match
  trace: .> LEAN_PATH=/home/runner/work/mathlib4/mathlib4/.lake/packages/Cli/.lake/build/lib/lean:/home/runner/work/mathlib4/mathlib4/.lake/packages/batteries/.lake/build/lib/lean:/home/runner/work/mathlib4/mathlib4/.lake/packages/Qq/.lake/build/lib/lean:/home/runner/work/mathlib4/mathlib4/.lake/packages/aesop/.lake/build/lib/lean:/home/runner/work/mathlib4/mathlib4/.lake/packages/proofwidgets/.lake/build/lib/lean:/home/runner/work/mathlib4/mathlib4/.lake/packages/importGraph/.lake/build/lib/lean:/home/runner/work/mathlib4/mathlib4/.lake/packages/LeanSearchClient/.lake/build/lib/lean:/home/runner/work/mathlib4/mathlib4/.lake/packages/plausible/.lake/build/lib/lean:/home/runner/work/mathlib4/mathlib4/.lake/build/lib/lean /home/runner/.elan/toolchains/leanprover--lean4---v4.25.0-rc2/bin/lean /home/runner/work/mathlib4/mathlib4/.lake/packages/batteries/Batteries/Data/Array/Match.lean -o /home/runner/work/mathlib4/mathlib4/.lake/packages/batteries/.lake/build/lib/lean/Batteries/Data/Array/Match.olean -i /home/runne

and

ℹ [501/590] Replayed proofwidgets/widgetPackageLock
  trace: ././widget> npm install
  trace: stdout:
  added 1 package, changed 2 packages, and audited 367 packages in 2s
  67 packages are looking for funding
    run `npm fund` for details
  1 moderate severity vulnerability
  To address all issues, run:
    npm audit fix
  Run `npm audit` for details.
  ℹ [502/590] Replayed proofwidgets/widgetJsAll
  trace: /home/runner/work/ProofWidgets4/ProofWidgets4/widget> npm clean-install
  trace: stdout:
  added 365 packages, and audited 366 packages in 8s
  68 packages are looking for funding
    run `npm fund` for details
  1 low severity vulnerability
  To address all issues, run:
    npm audit fix
  Run `npm audit` for details.
  trace: stderr:
  npm warn deprecated three-mesh-bvh@0.7.8: Deprecated due to three.js version incompatibility. Please use v0.8.0, instead.
  trace: /home/runner/work/ProofWidgets4/ProofWidgets4/widget> npm run build
  trace: stdout:
  > @leanprover-community/proofwidgets4@0.1.0 build
  > npx tsc && rollup --environment NODE_ENV:production --config
  trace: stderr:

  dist/rubiks.js → ../.lake/build/js...
  created ../.lake/build/js in 9.9s

  dist/recharts.js → ../.lake/build/js...
  (!) Circular dependencies
  node_modules/d3-interpolate/src/value.js -> node_modules/d3-interpolate/src/array.js -> node_modules/d3-interpolate/src/value.js
  node_modules/d3-interpolate/src/value.js -> node_modules/d3-interpolate/src/object.js -> node_modules/d3-interpolate/src/value.js
  node_modules/recharts/es6/util/ChartUtils.js -> node_modules/recharts/es6/util/getLegendProps.js -> node_modules/recharts/es6/util/ChartUtils.js
  created ../.lake/build/js in 6.5s

  dist/rbTree.js → ../.lake/build/js...
  (!) Circular dependencies
  node_modules/d3-selection/src/selection/index.js -> node_modules/d3-selection/src/selection/select.js -> node_modules/d3-selection/src/selection/index.js
  node_modules/d3-selection/src/selection/index.js -> node_modules/d3-selection/src/selection/selectAll.js -> node_modules/d3-selection/src/selection/index.js
  node_modules/d3-selection/src/selection/index.js -> node_modules/d3-selection/src/selection/filter.js -> node_modules/d3-selection/src/selection/index.js
  ...and 12 more
  created ../.lake/build/js in 1.6s

  dist/presentSelection.js → ../.lake/build/js...
  created ../.lake/build/js in 165ms

  dist/penroseDisplay.js → ../.lake/build/js...
  (!) Circular dependencies
  node_modules/true-myth/dist/es/maybe.js -> node_modules/true-myth/dist/es/result.js -> node_modules/true-myth/dist/es/maybe.js
  node_modules/@penrose/core/dist/engine/EngineUtils.js -> node_modules/@penrose/core/dist/utils/Util.js -> node_modules/@penrose/core/dist/engine/EngineUtils.js
  node_modules/@penrose/core/dist/engine/BBox.js -> node_modules/@penrose/core/dist/contrib/Queries.js -> node_modules/@penrose/core/dist/engine/BBox.js
  ...and 14 more
  created ../.lake/build/js in 14s

  dist/penroseCanvas.js → ../.lake/build/js...
  (!) Circular dependencies
  node_modules/true-myth/dist/es/maybe.js -> node_modules/true-myth/dist/es/result.js -> node_modules/true-myth/dist/es/maybe.js
  node_modules/@penrose/core/dist/engine/EngineUtils.js -> node_modules/@penrose/core/dist/utils/Util.js -> node_modules/@penrose/core/dist/engine/EngineUtils.js
  node_modules/@penrose/core/dist/engine/BBox.js -> node_modules/@penrose/core/dist/contrib/Queries.js -> node_modules/@penrose/core/dist/engine/BBox.js
  ...and 14 more
  created ../.lake/build/js in 13.6s

  dist/ofRpcMethod.js → ../.lake/build/js...
  created ../.lake/build/js in 206ms

  dist/makeEditLink.js → ../.lake/build/js...
  created ../.lake/build/js in 109ms

  dist/interactiveSvg.js → ../.lake/build/js...
  created ../.lake/build/js in 158ms

  dist/interactiveExpr.js → ../.lake/build/js...
  created ../.lake/build/js in 109ms

  dist/index.js → ../.lake/build/js...
  created ../.lake/build/js in 128ms

  dist/htmlDisplayPanel.js → ../.lake/build/js...
  created ../.lake/build/js in 129ms

  dist/htmlDisplay.js → ../.lake/build/js...
  created ../.lake/build/js in 124ms

  dist/goalTypePanel.js → ../.lake/build/js...
  created ../.lake/build/js in 166ms

  dist/filterDetails.js → ../.lake/build/js...
  created ../.lake/build/js in 135ms

  dist/exprPresentation.js → ../.lake/build/js...
  created ../.lake/build/js in 150ms

  dist/d3Graph.js → ../.lake/build/js...
  (!) Circular dependencies
  node_modules/d3-selection/src/selection/index.js -> node_modules/d3-selection/src/selection/select.js -> node_modules/d3-selection/src/selection/index.js
  node_modules/d3-selection/src/selection/index.js -> node_modules/d3-selection/src/selection/selectAll.js -> node_modules/d3-selection/src/selection/index.js
  node_modules/d3-selection/src/selection/index.js -> node_modules/d3-selection/src/selection/filter.js -> node_modules/d3-selection/src/selection/index.js
  ...and 12 more
  created ../.lake/build/js in 2.9s

  dist/cancellable.js → ../.lake/build/js...
  created ../.lake/build/js in 116ms

  dist/animatedHtml.js → ../.lake/build/js...
  created ../.lake/build/js in 136ms

I find the "circular dependencies(!)" a bit scary...

Bryan Gin-ge Chen (Nov 04 2025 at 19:54):

Yeah, that's expected. There's always a bit of output from proofwidgets for some reason.

Bryan Gin-ge Chen (Nov 04 2025 at 19:54):

If there's no red check at the end of the job, it's usually safe to assume things are working as expected.

Ruben Van de Velde (Nov 04 2025 at 20:00):

<Error><Code>BlobAlreadyExists</Code><Message>The specified blob already exists.

seems to suggest that we're trying to upload something that was uploaded already

Kenny Lau (Nov 19 2025 at 22:12):

https://github.com/leanprover-community/mathlib4/actions/runs/19516744620/job/55870339751

This commit builds successfully, but on the "Upload cache to Azure" step I get the error "Transfer failed (error code: 201)" 122 times (once for each new file), and nothing got uploaded to the "kckennylau/mathlib4 cache".

Kenny Lau (Nov 20 2025 at 10:29):

Anyone? I realise it wasn't clear, but I'm asking for help in the message above.

Kenny Lau (Nov 20 2025 at 12:52):

And this is certainly not an isolated incident, a random new PR I looked at by @Jovan Gerbscheid also suffers from the same error: https://github.com/leanprover-community/mathlib4/actions/runs/19536610578/job/55932181913?pr=31847

and I would hypothesize that every PR is having the same issue?

Kenny Lau (Nov 20 2025 at 12:54):

one, two, three, four out of 4 randomly sampled PR's from the first page

Bryan Gin-ge Chen (Nov 20 2025 at 12:54):

Is lake exe cache get broken for these PRs? I think cache logs are a bit noisy, I've noticed that step outputting lots of stuff which looks like it might be an error even though things end up OK. Generally if the post-build-job succeeds, then that means that CI checked that lake exe cache get worked.

Bryan Gin-ge Chen (Nov 20 2025 at 12:56):

There is an issue where sometimes uploading does fail causing the post build job to break which I've been tracking here: #mathlib4 > "verify that everything was available in the cache" fails CI The fact that the log in the upload step is noisy certainly doesn't help in debugging what's going on there.

Kenny Lau (Nov 20 2025 at 12:57):

Bryan Gin-ge Chen said:

I've noticed that step outputting lots of stuff which looks like it might be an error even though things end up OK.

Why I discovered the error is that I noticed that when I push new changes to my PR (and I only changed proofs compared to the previous commit), it actually starts building from the main cache (even though under the new module system if i only change proofs i shouldn't have to recompile everything)

Kenny Lau (Nov 20 2025 at 12:59):

Links:

Bryan Gin-ge Chen (Nov 20 2025 at 13:02):

It's very possible that the way cache works could be improved to take advantage of the new module system and that no one has gotten around to it yet.

Kenny Lau (Nov 20 2025 at 13:02):

sure, but that log still says it failed to download the new cache

Bryan Gin-ge Chen (Nov 20 2025 at 13:10):

This is speculation on my part: the way cache files are hashed probably doesn't take into account the fact that a change to Lean files that only touches proofs shouldn't require re-compiling downstream files, so it's looking for files that don't exist. Unfortunately, I don't have the capacity to look into this further...

Kenny Lau (Nov 20 2025 at 13:43):

Bryan Gin-ge Chen said:

so it's looking for files that don't exist

but the logs show that it tried to download the previous cache and failed

Bryan Gin-ge Chen (Nov 20 2025 at 13:55):

Right, I think that's consistent with what I suggested. cache is running on a checkout of your branch including your latest changes. To find the cached version of oleans for a Lean file, it looks for a file on the cloud server with a filename coming from a hashing algorithm that takes into account the Lean file + its dependencies. So if your changes caused the hashes of a bunch of files to change (unnecessarily, possibly) then in that step, cache will end up trying to download a bunch of files that don't exist (yet). (It's been a while since I looked at the code so this could be wrong; if you're interested I encourage you to take a look here; you might get further than I did in understanding what's happening!)

Kenny Lau (Nov 20 2025 at 13:58):

Ok, I understand; would it be a better algorithm to just download the cache of the previous commit? Which, I suppose, is your previous message saying that we could make use of the new module system in a future algorithm


Last updated: Dec 20 2025 at 21:32 UTC