Zulip Chat Archive
Stream: lean4
Topic: Cache vs Lake flags
Henrik Böving (Jan 04 2024 at 12:07):
I noticed a little issue with the meta flags. Consider this CI pipeline: https://github.com/leanprover-community/mathlib4_docs/blob/30f98df234e610b33b4139eed8247f020325c2b4/.github/workflows/docs.yaml we always run with -kdoc=on but the cache is pretty much ignored. It gets downloaded and unpacked but once we reach the compile mathlib4 step it will recompile mathlib plus deps from scratch.
I assume that this is a cache invalidation issue in Lake? CC @Mac Malone
Mac Malone (Jan 04 2024 at 18:03):
@Henrik BövingLake does not use the configuration options as part of the mdoule trace and I do not believe doc-gen4 does either. However, cache does hash the lake-manifest.json
, which could cause the problem?
Henrik Böving (Jan 04 2024 at 18:09):
Mac Malone said:
Henrik BövingLake does not use the configuration options as part of the mdoule trace and I do not believe doc-gen4 does either. However, cache does hash the
lake-manifest.json
, which could cause the problem?
Okay i updated it to this: https://github.com/leanprover-community/mathlib4_docs/blob/main/.github/workflows/docs.yaml#L37-L52
in the CI run: https://github.com/leanprover-community/mathlib4_docs/actions/runs/7413174666/job/20171569945 it downloads 4k cache files and then begins to build mathlib from scratch :(
Mario Carneiro (Jan 04 2024 at 18:11):
IMO this is why the -Kdoc=on
design with optional dependencies is bad. It's not a dependency of mathlib, and the cache files are built without it
Henrik Böving (Jan 04 2024 at 18:11):
But that is a separate concern isn't it?
Mario Carneiro (Jan 04 2024 at 18:11):
@Henrik Böving what is your desired solution? The cache works, or the cache is not run?
Henrik Böving (Jan 04 2024 at 18:11):
The cache is already downloaded, why does lake not respect it?
Henrik Böving (Jan 04 2024 at 18:12):
Mario Carneiro said:
Henrik Böving what is your desired solution? The cache works, or the cache is not run?
the cache works I would say, doc-gen itself only runs like 10 min.
Mario Carneiro (Jan 04 2024 at 18:12):
Why do you need to insert docgen as an additional dependency of mathlib?
Henrik Böving (Jan 04 2024 at 18:13):
Because doc-gen operates as a lake facet right now.
Mario Carneiro (Jan 04 2024 at 18:13):
Do you need elaboration outputs or are oleans sufficient?
Mario Carneiro (Jan 04 2024 at 18:14):
No I get that much, I'm asking about the ideal future design here
Henrik Böving (Jan 04 2024 at 18:14):
I need information from the oleans and from the lakefile (for source linking)
Mario Carneiro (Jan 04 2024 at 18:14):
Doesn't lake env
already give you what you need with LEAN_SRC_PATH
then?
Mario Carneiro (Jan 04 2024 at 18:15):
If so, doc-gen could just be a standalone executable you run under lake env
, similar to e.g. lean4checker
Mario Carneiro (Jan 04 2024 at 18:16):
These kind of programs can be very fast, lake exe shake
analyzes the import graph of mathlib by looking at oleans and it only takes 8 seconds to do everything
Henrik Böving (Jan 04 2024 at 18:18):
No, we want to link to e.g. the github locations so I actually need access to the lake interna (getSrcUri). doc-gen used to be a standalone executable, there were 3 reasons that I moved away from that approach once facets came up:
- It wasn't well integrated with the (dynamic) rebuilding from lake, i.e. it would always compile all docs. Now if you have a file that you didn't change it also wont regenerate docs.
- It wasn't parallelized, which would of course have been possible but lake gave us this for free.
- When core added new env extensions you'd have be careful to compile the binary with the exact same compiler version as your project such that you dont get weird crashes. (This was the most annoying one because it meant that people either had to manually edit lean-toolchain or make sure they stay in sync with doc-gen or the other way around in the case of mathlib)
Re performance: doc-gen is not "slow" (if analyzing thousands of files in 10 min counts as slow) because of lake, I did ran perf on the top level lake call and one can see that by now the majority of time is spent in the delaboration facilities of the lean compiler itself.
Mario Carneiro (Jan 04 2024 at 18:20):
(I didn't mean to suggest that 10 min is slow, but rather that the hour previous to those 10 min could be skipped with this approach)
Mario Carneiro (Jan 04 2024 at 18:22):
Still, I think the fundamental issue is that doc-gen should work on lake projects that do not contain doc-gen
in them anywhere
Mario Carneiro (Jan 04 2024 at 18:22):
and "facets" as currently designed don't seem to capture that properly
Mario Carneiro (Jan 04 2024 at 18:23):
assuming that mathlib has to be the one to add a doc-gen facet to Mathlib
Mario Carneiro (Jan 04 2024 at 18:28):
Henrik Böving said:
When core added new env extensions you'd have be careful to compile the binary with the exact same compiler version as your project such that you dont get weird crashes. (This was the most annoying one because it meant that people either had to manually edit lean-toolchain or make sure they stay in sync with doc-gen or the other way around in the case of mathlib)
This is also an issue for other standalone tools, I would really like to have a better solution for getting lake to do this properly
Mario Carneiro (Jan 04 2024 at 18:29):
I want to point to two lake projects A and B and tell lake "please run A on B" without them being dependencies of each other in either direction
Mario Carneiro (Jan 04 2024 at 18:30):
A common project C that imports both might be an option, but it's really just a shim
Joe Hendrix (Jan 04 2024 at 18:31):
It seems like doc-gen
could be a standalone executable if we exported the ability to load a workspace into a standalone library and perhaps some of the concurrent build logic.
Joe Hendrix (Jan 04 2024 at 18:32):
I like the idea of a common project that imports both as an immediate workaround though.
Mario Carneiro (Jan 04 2024 at 18:33):
yeah, a code interface to Lake would be great
Henrik Böving (Jan 04 2024 at 18:33):
Joe Hendrix said:
It seems like
doc-gen
could be a standalone executable if we exported the ability to load a workspace into a standalone library and perhaps some of the concurrent build logic.
doc-gen used to do this by simply import Lake
(which, just like import Lean
, just works :tm: ). But Mac tends to break his APIs semi frequently so that was also a bit annoying to maintain in the long run which is another reason that I'm with the current approach.
Mario Carneiro (Jan 04 2024 at 18:34):
A stable and supported code interface to lake
Mario Carneiro (Jan 04 2024 at 18:34):
something that you can count on about as well as the lakefile format itself
Henrik Böving (Jan 04 2024 at 18:36):
The flake approach is not actually a solution to this instability btw. the doc-gen flake (which is arguably the most complex one out there) does also break on lake changes every now and then (although that frequency has also decreased over the time)
Mario Carneiro (Jan 04 2024 at 18:37):
I feel like support for nix in the lean ecosystem has waned
Henrik Böving (Jan 04 2024 at 18:38):
Oh sorry not flake, facet of course, facet.
Mario Carneiro (Jan 04 2024 at 18:38):
I know approximately nothing about flakes
Mario Carneiro (Jan 04 2024 at 18:39):
I think the solution to the instability is to get @Mac Malone to document the actual public API instead of equivocating about it
Henrik Böving (Jan 04 2024 at 18:39):
Joe Hendrix said:
I like the idea of a common project that imports both as an immediate workaround though.
Fun fact: Thats exactly how the mathlib4_docs build used to work until I posted this issue, but for an entirely different reason! The reason was that lake wasn't able to update dependencies selectively so if i ran lake update in a mathlib4 tree i could not specify to only update doc-gen. However that's now possible and I had a bit of free time so I thought why not just try it out.
Guess I'll revert my just try it out changes for now^^
Mario Carneiro (Jan 04 2024 at 18:41):
If you have a project that depends on mathlib (without -Kdoc=on), then you should be able to reuse the cache without issue
Henrik Böving (Jan 04 2024 at 18:41):
Yes, it can do that
Henrik Böving (Jan 04 2024 at 18:42):
If you view older doc-gen builds you'll see they are in the 15 min ballpark
Henrik Böving (Jan 04 2024 at 18:42):
Mario Carneiro (Jan 04 2024 at 18:42):
is that a doc-gen job or a mathlib job?
Mario Carneiro (Jan 04 2024 at 18:43):
or should it be a separate project?
Henrik Böving (Jan 04 2024 at 18:43):
That's a doc-gen job that pulls the cache, builds doc-gen, runs doc-gen and uploads the result. It runs over here: https://github.com/leanprover-community/mathlib4_docs/actions?page=2
Mario Carneiro (Jan 04 2024 at 18:43):
it feels like building mathlib docs is something mathlib should be doing
Mario Carneiro (Jan 04 2024 at 18:44):
although I can imagine you want to watch it for doc-gen regressions and issues
Henrik Böving (Jan 04 2024 at 18:44):
Mario Carneiro said:
it feels like building mathlib docs is something mathlib should be doing
yes yes, we've had this talk before^^ originally we had the job on my repo, then it moved to this one.
Mario Carneiro (Jan 04 2024 at 18:45):
So it sounds like "project C" already exists, maybe it could be a lake project?
Henrik Böving (Jan 04 2024 at 18:46):
Henrik Böving said:
Joe Hendrix said:
I like the idea of a common project that imports both as an immediate workaround though.
Fun fact: Thats exactly how the mathlib4_docs build used to work until I posted this issue, but for an entirely different reason! The reason was that lake wasn't able to update dependencies selectively so if i ran lake update in a mathlib4 tree i could not specify to only update doc-gen. However that's now possible and I had a bit of free time so I thought why not just try it out.
Guess I'll revert my just try it out changes for now^^
-> https://github.com/leanprover-community/mathlib4_docs/blob/main/.github/workflows/docs.yaml#L46
It makes a lake project!
Joe Hendrix (Jan 04 2024 at 18:54):
That's awesome.
Henrik Böving (Jan 04 2024 at 19:02):
And everything back to normal: https://github.com/leanprover-community/mathlib4_docs/actions/runs/7413577649
Mario Carneiro (Jan 04 2024 at 19:05):
wow, there are a lot of failures on that workflow, is this just the recent cache hiccup or does it really break that often
Henrik Böving (Jan 04 2024 at 19:10):
THe recent cache hiccup is responsible for the recent failures. In decreasing order of frequency the reasons for failure are:
- The fact that this job is run on different machines (and for reasons unknownst to me never in a docker container) and sometimes these machines have a git configuration set up (which I didn't dare override) sometimes they don't. Since I do create a commit on the workaround project, I need an author and email set up. Some of these machines have this configured, others dont. (This was my main motivation for doing what I did today) If I dont have an author and try to commit, git fails. But I learned just now that you can have an author per repo so I added that and now these issues should be issues of the past
- The hard drive is full ^^
- lake breaks its API
- (very rare nowadays) some Lean compiler API that I use inside of doc-gen changed its types and doc-gen doesn't compile on the release that mathlib uses
so TLDR: This ratio should drastically improve from now on.
Last updated: May 02 2025 at 03:31 UTC