Zulip Chat Archive
Stream: lean4
Topic: doc-gen4 Equations
Henrik Böving (Dec 24 2024 at 13:27):
I have a questions for people that consume the auto generated documentation. How important is the Equations feature for you? Would it be acceptable if we just put it aside, given the fact that you can click the source links anyway?
I just did some benchmarks simply compiling the documentation for the core modules and with equations enabled we have:
- 1.6G peak memory consumption
- 1min10s build time
with equations disabled we get:
- 900M peak memory consumption
- 14s build time
These improvements are of course not going to translate to mathlib build time 1:1 because in mathlib a lot of stuff is theorem and not def but I expect that as people write more software style projects in Lean the demand for def heavy projects is going to increase.
Henrik Böving (Dec 24 2024 at 13:28):
/poll The future of equations in doc-gen
Remove
Keep
Henrik Böving (Dec 24 2024 at 13:44):
(Keep but only for very simple definitions is a non trivial option because it is difficult to identify what "simple" means so we'll still have to do computation for every term)
Junyan Xu (Dec 24 2024 at 15:29):
Note that equations of slightly complicated definitions are already not shown in the docs, e.g. docs#Ring.DirectLimit.
One or more equations did not get rendered due to their size.
Henrik Böving (Dec 24 2024 at 15:32):
Yes we already run a heuristic for that behavior, the choice there is an artefact from someone voting that didn't know this
Kevin Buzzard (Dec 24 2024 at 15:42):
I never use the functionality because the last n times I tried it the answers were useless ("did not get rendered") so I basically gave up on it.
Junyan Xu (Dec 24 2024 at 15:43):
Maybe things have improved now? The equation for docs#nonZeroDivisors was not rendered when I opened doc-gen4#160 but it is now shown.
Joachim Breitner (Dec 24 2024 at 15:44):
How does it generate the equations? using the same machinery that simp etc uses, including proving the equational theorems if they are not proven in the first place, and thus running expensive tactics?
Junyan Xu (Dec 24 2024 at 15:45):
If faster doc-gen compile time means we get more docs updates per day, that may be a worthy tradeoff. I'm not sure how often it's updated currently.
Henrik Böving (Dec 24 2024 at 15:46):
Joachim Breitner said:
How does it generate the equations? using the same machinery that simp etc uses, including proving the equational theorems if they are not proven in the first place, and thus running expensive tactics?
The cost is spread two fold, for one we are running the same tactics that simp does, though notably with the kernel disabled so that saves us a bit. We also spend around half of the run time in the pretty printer though.
Henrik Böving (Dec 24 2024 at 15:47):
Junyan Xu said:
If faster doc-gen compile time means we get more docs updates per day, that may be a worthy tradeoff. I'm not sure how often it's updated currently.
Every 8 hours
Junyan Xu (Dec 24 2024 at 15:49):
Does it take ~8 hours to build docs for mathlib once?
Henrik Böving (Dec 24 2024 at 15:50):
No it takes about 15 minutes on the build machines
Henrik Böving (Dec 24 2024 at 15:50):
I just didn't see a point in burning more power than doing an 8 hour cycle when i set it up
Junyan Xu (Dec 24 2024 at 15:55):
That's a lot faster than I expected. Building the whole mathlib (on the bench machines) already takes more than 17 minutes. It seems feasible to run doc-gen for each bors batch, as the cost seems negligible compared to CI runs on all mathlib branches. Is there cache for the docs as well?
Joachim Breitner (Dec 24 2024 at 15:56):
I think the first half is avoidable; you could use the function that calculates the types of the equational theorems without actually proving them.
Or you could skip even that an just print the value of the definition (or the predefinion for recursive functions). This would then show a single equation with sometimes a match on the RHS. In a way that would even be more relevant. As a user I think I wouldn't mind that.
So I think this feature, or a similarly useful variant, should be possible with neglectible cost to obtain the equation. Doesn't solve the pretty printing part, though.
Henrik Böving (Dec 24 2024 at 15:58):
The first thing sounds reasonable if we have that function nicely exposed somewhere. The second thing is already the default when we do not manage to obtain an equation in the first place. Agreed on the pretty printing part. I also talked to Kyle about it and he estimates that the pretty printer will get slower before it gets faster due to some features that people are interested in so we can't expect some miracles here most likely.
Henrik Böving (Dec 24 2024 at 16:12):
Junyan Xu said:
That's a lot faster than I expected. Building the whole mathlib (on the
benchmachines) already takes more than 17 minutes. It seems feasible to run doc-gen for each bors batch, as the cost seems negligible compared to CI runs on all mathlib branches. Is there cache for the docs as well?
There does not currently exist a cache no, I think it might be possible to build one in general though, not quite sure. Either way just because we can do these types of runs I don't really see a need to. We can tune the frequency a little but I've not yet seen people complain about the 8 hour cycle.
Daniel Weber (Dec 24 2024 at 17:51):
Henrik Böving said:
(Keep but only for very simple definitions is a non trivial option because it is difficult to identify what "simple" means so we'll still have to do computation for every term)
Would it maybe be possible to use some inaccurate heuristic, like "length of the source code"?
Henrik Böving (Dec 24 2024 at 17:51):
The source code is not available from the .olean files
Henrik Böving (Dec 24 2024 at 17:54):
Of course it generally lies next to the .olean files somewhere (gets more interesting with e.g. bultin files) but we'd need to write the logic to find that + you need to account for automatically generated declaratuons that don't have real source associated with them.
Joachim Breitner (Dec 24 2024 at 18:53):
The first thing sounds reasonable if we have that function nicely exposed somewhere.
Not yet nicely exposed (you'd currently have to manually check the env extensions for structural and wf recursion) but certainly doable as an extension of the current equations API.
Artie Khovanov (Dec 26 2024 at 21:22):
I use the feature, but I guess it's not too much effort to click through
Bhavik Mehta (May 21 2025 at 12:05):
Coming back to this, is there an option to turn off the Equations feature in doc-gen running in a project? I'd like to run the same benchmarking experiment
Henrik Böving (May 21 2025 at 12:14):
No though you can get this pretty easily by changing this function https://github.com/leanprover/doc-gen4/blob/main/DocGen4/Process/DefinitionInfo.lean#L30C5-L30C21 to return #[]
Kevin Buzzard (May 21 2025 at 12:22):
I don't understand the relationship between docgen4 and FLT. Am I able to do this locally in FLT and save me some time in CI? I am not at all convinced that anyone is using the FLT doc-gen output, but if I switch it off I'll break blueprint. I am trying to get my FLT CI faster and I think these equations are costing me 3 minutes:
Screenshot from 2025-05-21 13-22-02.png
Henrik Böving (May 21 2025 at 12:28):
Equations are being generated on the fly for all encountered definitions so just because you can't see these prints doesnt mean it is busy for so long. The issue with making this configurable is that it is not quite clear right now how to pipe external config nicely into lake projects. We have abused env variables in the past though, so maybe I can add an env var flag for this as wwll
Bhavik Mehta (May 21 2025 at 15:53):
Kevin Buzzard said:
if I switch it off I'll break blueprint
Actually I don't think this is true, the blueprint steps in FLT CI happen before the build docs step happens
Kevin Buzzard (May 21 2025 at 16:08):
I mean "if I switch doc-gen off completely then blueprint will create broken links" but I would be happy to switch off equations if it saves me 3 minutes.
Jz Pan (May 21 2025 at 16:27):
Kevin Buzzard said:
I am trying to get my FLT CI faster and I think these equations are costing me 3 minutes:
Unfortunately those equations are Lean system's function (probably for metaprogramming), nothing to do with mathematics.
Bhavik Mehta (May 21 2025 at 18:59):
In FLT, without caching the docbuild, disabling equations in the way you describe makes it twice as fast. I think it would be nice to have an env var flag for this option (that way it can also work with caching)
Last updated: Dec 20 2025 at 21:32 UTC