Zulip Chat Archive
Stream: PrimeNumberTheorem+
Topic: CI issues
Alex Kontorovich (Jan 13 2026 at 14:34):
Quick question (to @Pietro Monticone): is there anything we could do to speed up compilation (and CI, but I'm more interested in the former)? Does it really need to take >40 mins (and counting) to update two apostrophes in the blueprint?
Pietro Monticone (Jan 13 2026 at 14:58):
What do you mean by “compilation” in this case? The build job step takes approximately 12 mins.
David Renshaw (Jan 13 2026 at 15:00):
CI for https://github.com/AlexKontorovich/PrimeNumberTheoremAnd/commit/614e98c194d437d45c195a4fe9d0dc1487d40b82 ran for 61 minutes before being cancelled.
Pietro Monticone (Jan 13 2026 at 15:02):
Sure, that’s the docgen-action step.
David Renshaw (Jan 13 2026 at 15:03):
shouldn't it have been a cache hit and therefore relatively fast?
Pietro Monticone (Jan 13 2026 at 15:03):
Nothing to do with the blueprint. It might be that the docgen-action isn’t saving cache appropriately. I’m not maintaining that one and unfortunately I don’t have time for debugging it right now. @Anne Baanen would definitely know better.
Alex Kontorovich (Jan 13 2026 at 15:07):
I'm sure I have it set up incorrectly. I just don't know what the right settings are...
Pietro Monticone (Jan 13 2026 at 15:11):
I don’t think you have it set it up incorrectly. I have already updated the entire .github of the project to apply those that are currently considered best practices.
I simply think someone should probably take a closer look at how the docgen-action is saving to cache since I couldn’t and won’t be able to find the time to do it soon.
Maybe we could temporarily disable the API doc generation to transition from 60+ mins to approximately 15 mins.
Alex Kontorovich (Jan 13 2026 at 15:23):
I will do as I'm told; this stuff is above my paygrade...
David Renshaw (Jan 13 2026 at 15:29):
It looks to me like pushes to main have had slow (~ 1 hour) CI starting at https://github.com/AlexKontorovich/PrimeNumberTheoremAnd/commit/6294a4c719a032af3f631bc3e7f53e9dea2dd6bf. However, I don't see anything immediately suspicious in that commit.
David Renshaw (Jan 13 2026 at 15:50):
it's possible that deleting the cached docs build would help. @Alex Kontorovich could you try deleting the Docs-XXX caches at https://github.com/AlexKontorovich/PrimeNumberTheoremAnd/actions/caches ? Then we would expect the next push to main to still take an hour, but possibly subsequent ones would be faster.
David Renshaw (Jan 13 2026 at 15:53):
Deleting the cache should not make a difference to CI runs on PRs.
Pietro Monticone (Jan 13 2026 at 15:53):
Done
David Renshaw (Jan 13 2026 at 18:56):
https://github.com/AlexKontorovich/PrimeNumberTheoremAnd/actions/runs/20964258436/job/60258564817
Error: Fetching artifact metadata failed. Is githubstatus.com reporting issues with API requests, Pages, or Actions? Please re-run the deployment at a later time.
Error: Error: Multiple artifacts named "github-pages" were unexpectedly found for this workflow run. Artifact count is 2.
at getArtifactMetadata (/home/runner/work/_actions/actions/deploy-pages/d6db90164ac5ed86f2b6aed7e0febac5b3c0c03e/src/internal/api-client.js:89:1)
at processTicksAndRejections (node:internal/process/task_queues:95:5)
at Deployment.create (/home/runner/work/_actions/actions/deploy-pages/d6db90164ac5ed86f2b6aed7e0febac5b3c0c03e/src/internal/deployment.js:66:1)
at main (/home/runner/work/_actions/actions/deploy-pages/d6db90164ac5ed86f2b6aed7e0febac5b3c0c03e/src/index.js:30:1)
Error: Error: Multiple artifacts named "github-pages" were unexpectedly found for this workflow run. Artifact count is 2.
Notification Bot (Jan 13 2026 at 18:57):
13 messages were moved here from #PrimeNumberTheorem+ > Explicit estimates, outstanding tasks v2. by David Renshaw.
Aayush Rajasekaran (Jan 13 2026 at 19:05):
CI issues? That's actually something I can help with! :smile:
(I'm a fan of the project, but the math's a bit too much for me to contribute)
Pietro Monticone (Jan 13 2026 at 19:06):
David Renshaw said:
https://github.com/AlexKontorovich/PrimeNumberTheoremAnd/actions/runs/20964258436/job/60258564817
Error: Fetching artifact metadata failed. Is githubstatus.com reporting issues with API requests, Pages, or Actions? Please re-run the deployment at a later time. Error: Error: Multiple artifacts named "github-pages" were unexpectedly found for this workflow run. Artifact count is 2. at getArtifactMetadata (/home/runner/work/_actions/actions/deploy-pages/d6db90164ac5ed86f2b6aed7e0febac5b3c0c03e/src/internal/api-client.js:89:1) at processTicksAndRejections (node:internal/process/task_queues:95:5) at Deployment.create (/home/runner/work/_actions/actions/deploy-pages/d6db90164ac5ed86f2b6aed7e0febac5b3c0c03e/src/internal/deployment.js:66:1) at main (/home/runner/work/_actions/actions/deploy-pages/d6db90164ac5ed86f2b6aed7e0febac5b3c0c03e/src/index.js:30:1) Error: Error: Multiple artifacts named "github-pages" were unexpectedly found for this workflow run. Artifact count is 2.
Re-run right now.
Aayush Rajasekaran (Jan 13 2026 at 19:06):
David Renshaw said:
It looks to me like pushes to
mainhave had slow (~ 1 hour) CI starting at https://github.com/AlexKontorovich/PrimeNumberTheoremAnd/commit/6294a4c719a032af3f631bc3e7f53e9dea2dd6bf. However, I don't see anything immediately suspicious in that commit.
It looks like most of the time is spent on "Compile blueprint and documentation", and that commit touches the blueprint, so I think that commit might actually be suspicious.
Aayush Rajasekaran (Jan 13 2026 at 19:07):
Lol, it looks like it's attempting to build all of Mathlib's docs each time?
Aayush Rajasekaran (Jan 13 2026 at 19:07):
(which it wasn't before commit 6294a4c719a032af3f631bc3e7f53e9dea2dd6bf)
Aayush Rajasekaran (Jan 13 2026 at 19:09):
aha
info: stderr:
PANIC at String.Slice.Pos.get! Init.Data.String.Basic:1149:29: Cannot retrieve character at end position
Aayush Rajasekaran (Jan 13 2026 at 19:10):
i think the panic is causing docgen to always have a cache mismatch, and build things from scratch every time.
Terence Tao (Jan 13 2026 at 20:03):
By the way, the most recent merges to the repository are not building properly, even though they pass CI. Not sure if this is related to recent changes to the cache etc.
Aayush Rajasekaran (Jan 13 2026 at 22:26):
@Terence Tao the recent CI failures on main are simply because the build gets cancelled when a new commit lands in main — a new build is kicked off for the new commit. This behaviour can be changed if we’d like to not cancel the old build. I would suggest making this change.
Aayush Rajasekaran (Jan 13 2026 at 22:28):
As for why the builds are suddenly taking much longer, I think the issue lies in docgen-action not using good cache keys. Specifically, the key doesn’t seem to get updated when the references bib changes, so CI fetches an old, incorrect cache that it then can’t use. Deleting the cache (which we did earlier) should help with this, though it might require a bit more manual intervention. And we’ll wind up with the same problem every time references are updated.
I’m not 100% sure that this is the problem, though — I can investigate more tomorrow and then open some issues / fixes.
David Renshaw (Jan 13 2026 at 23:14):
Most recent run succeeded in 70 minutes: https://github.com/AlexKontorovich/PrimeNumberTheoremAnd/actions/runs/20974156634/job/60284851545
David Renshaw (Jan 13 2026 at 23:15):
we'll see whether subsequent runs are faster due to new cache
Chris Henson (Jan 13 2026 at 23:36):
Aayush Rajasekaran said:
Specifically, the key doesn’t seem to get updated when the references bib changes, so CI fetches an old, incorrect cache that it then can’t use.
I think this also explains #PrimeNumberTheorem+ > Blueprint not compiling in repo despite passing CI, with my PR to docgen-action being fine in isolation but interacting with the cache bug that Kim later fixed in this PR. (So hopefully fixed going forward)
Terence Tao (Jan 14 2026 at 00:23):
@Pietro Monticone and I had discussed enabling concurrent builds but I was concerned that thus would be unnecessarily CPU intensive. But perhaps if the caching is improved to the point where builds can take, say, 30 mins or less, then perhaps the CPU overhead of concurrent building would be at acceptable levels?
Pietro Monticone (Jan 14 2026 at 04:03):
Temporarily downgraded docgen-action to avoid issues with references and similar. I'll wait for the fix upstream.
Aayush Rajasekaran (Jan 14 2026 at 14:25):
Pietro Monticone said:
Temporarily downgraded
docgen-actionto avoid issues with references and similar. I'll wait for the fix upstream.
Thanks, I'll ping you once a fix is in. Issue opened here.
EDIT: and fix opened here.
Kevin Buzzard (Jan 14 2026 at 16:55):
FWIW in the FLT project we don't run docgen on commits to non-master branches. In particular when you're making a PR you don't have to go through the pain of compiling docgen at all (so CI finishes quickly when you're developing on a branch and want to merge or get reviews). It's only when things are merged that docs are generated (and then I agree that it's time-consuming, but then again I am not usually waiting impatiently for docs; if a PR is merged then I can already start on something new and/or merge master into my branch)
Pietro Monticone (Jan 14 2026 at 18:32):
Yes, we’re using the same setup here.
Terence Tao (Jan 14 2026 at 21:44):
Build times are back to well under half an hour now, which I think is an acceptable level for this project.
Aayush Rajasekaran (Jan 14 2026 at 21:57):
@Terence Tao Yes, but references are currently disabled due to the docgen-action downgrade.
In theory when this PR lands, and we bump docgen back up, it'll stay at under half an hour, except for when the entire docs need to be rebuilt due to a change :fingers_crossed:
Pietro Monticone (Jan 19 2026 at 19:50):
The original issue I was referring to (i.e. not requiring a PATH/TO/.bib by default) isn't fixed yet, right? @Chris Henson
Chris Henson (Jan 19 2026 at 19:53):
I believe the original issue was also because of the cache, and should be fixed by @Aayush Rajasekaran's now merged PR to docgen-action.
Pietro Monticone (Jan 19 2026 at 19:56):
I don't think so. It's still requiring a PATH/TO/.bib by default. Just tested on #Carleson too (see here).
I needed to downgrade the action version there too.
Chris Henson (Jan 19 2026 at 20:21):
I really am not sure then, these cache issues are outside the scope of the PR I made. The action has a check that the file exists before trying to copy it specifically to avoid this problem, but this is still not being picked up by the cache is my theory. I'd guess that @Anne Baanen is a better person to ask than me.
Jireh Loreaux (Jan 20 2026 at 14:28):
@Pietro Monticone I'm having the same issues in another project. I guess the temporary fix is to downgrade docgen-action temporarily. To what version should I downgrade to avoid both the cache misses and the need for a .bib file?
Pietro Monticone (Jan 20 2026 at 14:29):
Yes, the temporary fix is to downgrade.
Pietro Monticone (Jan 20 2026 at 14:29):
Yes, the temporary fix is to downgrade.
Pietro Monticone (Jan 20 2026 at 14:30):
You should use:
leanprover-community/docgen-action@deed0cdc44dd8e5de07a300773eb751d33e32fc8 # 2025-10-26
Kim Morrison (Jan 30 2026 at 04:06):
FYI the references.bib issue that required the docgen-action downgrade has been fixed. We updated the doc-gen4 v4.28.0-rc1 tag to include https://github.com/leanprover/doc-gen4/pull/354.
@Pietro Monticone you should now be able to upgrade back to docgen-action@main for projects using Lean toolchain v4.28.0-rc1 or later.
Pietro Monticone (Jan 30 2026 at 04:10):
Great, thanks! I’ll bump them to v4.28.0-rc1 and then bump the docgen-action accordingly soon.
Pietro Monticone (Jan 30 2026 at 15:33):
Opened PNT#824 to bump to v4.28.0-rc1.
Pietro Monticone (Jan 30 2026 at 16:03):
Bumped docgen-action
Pietro Monticone (Jan 30 2026 at 18:45):
It worked as expected.
Pietro Monticone (Jan 31 2026 at 16:04):
Opened the same PR for #Carleson (carleson#534) and #FLT (FLT#852).
Alex Kontorovich (Feb 03 2026 at 17:29):
It looks like we still have a bad compilation on the current PNT+ repo? I think we should clear that before merging more new PRs?
Terence Tao (Feb 04 2026 at 16:51):
Previously we were able to fix the build by adding a dummy references.bib file. Can this trick work twice?
Alex Kontorovich (Feb 04 2026 at 20:19):
@Aayush Rajasekaran graciously agreed to try to work on it, so let's see where we get. (It's easier to make progress when we can see the blueprint...)
Aayush Rajasekaran (Feb 04 2026 at 20:20):
So I'm pretty sure nothing's wrong with our setup, and the bug's probably in docgen-action.
I'm going to downgrade docgen-action again -- I think that's better than dodging the issue with a dummy file.
Aayush Rajasekaran (Feb 04 2026 at 20:20):
I'll then take a look at fixing docgen-action itself.
Alex Kontorovich (Feb 04 2026 at 20:21):
As I think @Pietro Monticone mentioned - do we really need docgen? I do use the mathlib docs quite regularly. But not so much for the PNT docs - for that, I go to the blueprint. Maybe we should consider permanently disabling docgen?
Aayush Rajasekaran (Feb 04 2026 at 20:43):
Alex Kontorovich said:
As I think Pietro Monticone mentioned - do we really need docgen?
I don't have an opinion on that, I defer to you fine folks.
Aayush Rajasekaran (Feb 04 2026 at 20:44):
Downgrade PR opened here, should get CI green on the PNT repo again.
Also opened a docgen-action fix here, will circle back if / when that lands.
Terence Tao (Feb 04 2026 at 20:55):
For the files that are blueprinted, doc-gen is largely redundant, I agree. We have a separate discussion with @Harald Helfgott with what to do with the Riemann-Stieltjes files: from the point of view of the project, it would be most convenient if they were blueprinted (in particular, the blueprint could then refer to theorems in those files in a standard LaTeX style (Theorem 2.4.5, etc.) rather than by referencing a Lean theorem), but ther was also the option of having them exist outside the blueprint and only documented within doc-gen. Personally I think the former option makes more sense, especially if wea re going to downgrade doc-gen, but we can continue discussing this.
Alex Kontorovich (Feb 04 2026 at 22:11):
Maybe we can just add an issue that someone should go through (perhaps with AI help) and blueprint what Harald has written?
Terence Tao (Feb 04 2026 at 22:18):
Fair enough. I'd be happy to add blueprint data starting from some other form of documentation (e.g., docstrings).
Harald Helfgott (Feb 04 2026 at 22:19):
I'll be glad to do some of it myself, but of course that's compatible with adding an issue. Go ahead.
Terence Tao (Feb 04 2026 at 22:21):
Maybe you could splat some text (as comments) next to all the major theorems describing the statement (in LaTeX/natural language), a reference to source material (e.g., MV Theorem 1, etc.), (optionally) a sketch of the proof, and any other relevant commentary? I can do all the relevant formatting, but I need some source text to work with (i don't have direct access to MV or WZ).
Harald Helfgott (Feb 04 2026 at 22:24):
Sure. ("Splat"?) I can also upload either pdfs of the original source material or the Markdown versions I actually gave to the AI, but where?
Terence Tao (Feb 04 2026 at 22:26):
One could perhaps cut and paste (which is what I meant by "splat") relevant snippets of that markdown next to the Lean theorems they correspond to, which is the main point of the blueprint (to link up the informal text with the Lean code).
Harald Helfgott (Feb 04 2026 at 22:27):
Ah, sure, I can do that.
Harald Helfgott (Feb 04 2026 at 22:27):
Let me start with the shortest file (which is also the one that took the least effort on my part) and then people can tell me what else they need or what they do not need.
Harald Helfgott (Feb 04 2026 at 22:28):
Should I just push it with git once it's ready?
Terence Tao (Feb 04 2026 at 22:30):
Sure. The end result would look like https://github.com/AlexKontorovich/PrimeNumberTheoremAnd/blob/6fb06700cd0ec53c45ff4787d2124970d72351a1/PrimeNumberTheoremAnd/MobiusLemma.lean in which lines containing blueprint_comment and @[blueprint ...] are interspersed with the Lean code. But you can just have some markdown of similar content in Lean comments (one uses -- (single line comment) or /- (multi-line comment) -/) and I will reformat.
Kim Morrison (Feb 05 2026 at 01:20):
Aayush Rajasekaran said:
Also opened a docgen-action fix here, will circle back if / when that lands.
Merged.
Aayush Rajasekaran (Feb 05 2026 at 02:54):
Thanks for the quick review @Kim Morrison !
I've opened PNT#885, which bumps docgen-action to the latest main commit. Everything should just work now.
Aayush Rajasekaran (Feb 05 2026 at 12:44):
Okay, CI has passed again with the fix in docgen-action. The real test will be CI for the next commit that lands in main, which should:
- pass CI :check:
- not take too long (~20 mins) :hourglass:
Aayush Rajasekaran (Feb 06 2026 at 00:24):
Apologies, I forgot to add the references back in -- fixed in PNT#887
Aayush Rajasekaran (Feb 06 2026 at 23:42):
@Kim Morrison Two more PRs that should help with this workflow (not specific to the PNT project):
- https://github.com/leanprover-community/docgen-action/pull/23
- https://github.com/leanprover/doc-gen4/pull/356
Terence Tao (Feb 08 2026 at 04:55):
For some reason the previous docgen-action fix no longer works. Is there another temporary fix? Without the ability to rebuild the blueprint it is a challenge to issue new tasks.
Aayush Rajasekaran (Feb 08 2026 at 14:56):
@Terence Tao I think we need https://github.com/leanprover-community/docgen-action/pull/23 to land, and then update docgen.
We can temporarily downgrade again for a quick fix: https://github.com/AlexKontorovich/PrimeNumberTheoremAnd/pull/898.
Kim Morrison (Feb 09 2026 at 01:21):
I merged https://github.com/leanprover-community/docgen-action/pull/23, but per Henrik's comment don't intend to merge https://github.com/leanprover/doc-gen4/pull/356.
Aayush Rajasekaran (Feb 09 2026 at 12:49):
Okay, and this PR bumps docgen-action again in PNT: https://github.com/AlexKontorovich/PrimeNumberTheoremAnd/pull/923
Aayush Rajasekaran (Feb 09 2026 at 19:26):
The latest CI failure is because GitHub is having outages today. We can re-run it after that's resolved, and it should pass.
Last updated: Feb 28 2026 at 14:05 UTC