Zulip Chat Archive
Stream: lean4
Topic: doc-gen4 Equations
Henrik Böving (Dec 24 2024 at 13:27):
I have a questions for people that consume the auto generated documentation. How important is the Equations
feature for you? Would it be acceptable if we just put it aside, given the fact that you can click the source links anyway?
I just did some benchmarks simply compiling the documentation for the core modules and with equations enabled we have:
- 1.6G peak memory consumption
- 1min10s build time
with equations disabled we get:
- 900M peak memory consumption
- 14s build time
These improvements are of course not going to translate to mathlib build time 1:1 because in mathlib a lot of stuff is theorem
and not def
but I expect that as people write more software style projects in Lean the demand for def
heavy projects is going to increase.
Henrik Böving (Dec 24 2024 at 13:28):
/poll The future of equations in doc-gen
Remove
Keep
Henrik Böving (Dec 24 2024 at 13:44):
(Keep but only for very simple definitions is a non trivial option because it is difficult to identify what "simple" means so we'll still have to do computation for every term)
Junyan Xu (Dec 24 2024 at 15:29):
Note that equations of slightly complicated definitions are already not shown in the docs, e.g. docs#Ring.DirectLimit.
One or more equations did not get rendered due to their size.
Henrik Böving (Dec 24 2024 at 15:32):
Yes we already run a heuristic for that behavior, the choice there is an artefact from someone voting that didn't know this
Kevin Buzzard (Dec 24 2024 at 15:42):
I never use the functionality because the last n times I tried it the answers were useless ("did not get rendered") so I basically gave up on it.
Junyan Xu (Dec 24 2024 at 15:43):
Maybe things have improved now? The equation for docs#nonZeroDivisors was not rendered when I opened doc-gen4#160 but it is now shown.
Joachim Breitner (Dec 24 2024 at 15:44):
How does it generate the equations? using the same machinery that simp etc uses, including proving the equational theorems if they are not proven in the first place, and thus running expensive tactics?
Junyan Xu (Dec 24 2024 at 15:45):
If faster doc-gen compile time means we get more docs updates per day, that may be a worthy tradeoff. I'm not sure how often it's updated currently.
Henrik Böving (Dec 24 2024 at 15:46):
Joachim Breitner said:
How does it generate the equations? using the same machinery that simp etc uses, including proving the equational theorems if they are not proven in the first place, and thus running expensive tactics?
The cost is spread two fold, for one we are running the same tactics that simp does, though notably with the kernel disabled so that saves us a bit. We also spend around half of the run time in the pretty printer though.
Henrik Böving (Dec 24 2024 at 15:47):
Junyan Xu said:
If faster doc-gen compile time means we get more docs updates per day, that may be a worthy tradeoff. I'm not sure how often it's updated currently.
Every 8 hours
Junyan Xu (Dec 24 2024 at 15:49):
Does it take ~8 hours to build docs for mathlib once?
Henrik Böving (Dec 24 2024 at 15:50):
No it takes about 15 minutes on the build machines
Henrik Böving (Dec 24 2024 at 15:50):
I just didn't see a point in burning more power than doing an 8 hour cycle when i set it up
Junyan Xu (Dec 24 2024 at 15:55):
That's a lot faster than I expected. Building the whole mathlib (on the bench
machines) already takes more than 17 minutes. It seems feasible to run doc-gen for each bors batch, as the cost seems negligible compared to CI runs on all mathlib branches. Is there cache for the docs as well?
Joachim Breitner (Dec 24 2024 at 15:56):
I think the first half is avoidable; you could use the function that calculates the types of the equational theorems without actually proving them.
Or you could skip even that an just print the value of the definition (or the predefinion for recursive functions). This would then show a single equation with sometimes a match
on the RHS. In a way that would even be more relevant. As a user I think I wouldn't mind that.
So I think this feature, or a similarly useful variant, should be possible with neglectible cost to obtain the equation. Doesn't solve the pretty printing part, though.
Henrik Böving (Dec 24 2024 at 15:58):
The first thing sounds reasonable if we have that function nicely exposed somewhere. The second thing is already the default when we do not manage to obtain an equation in the first place. Agreed on the pretty printing part. I also talked to Kyle about it and he estimates that the pretty printer will get slower before it gets faster due to some features that people are interested in so we can't expect some miracles here most likely.
Henrik Böving (Dec 24 2024 at 16:12):
Junyan Xu said:
That's a lot faster than I expected. Building the whole mathlib (on the
bench
machines) already takes more than 17 minutes. It seems feasible to run doc-gen for each bors batch, as the cost seems negligible compared to CI runs on all mathlib branches. Is there cache for the docs as well?
There does not currently exist a cache no, I think it might be possible to build one in general though, not quite sure. Either way just because we can do these types of runs I don't really see a need to. We can tune the frequency a little but I've not yet seen people complain about the 8 hour cycle.
Daniel Weber (Dec 24 2024 at 17:51):
Henrik Böving said:
(Keep but only for very simple definitions is a non trivial option because it is difficult to identify what "simple" means so we'll still have to do computation for every term)
Would it maybe be possible to use some inaccurate heuristic, like "length of the source code"?
Henrik Böving (Dec 24 2024 at 17:51):
The source code is not available from the .olean files
Henrik Böving (Dec 24 2024 at 17:54):
Of course it generally lies next to the .olean files somewhere (gets more interesting with e.g. bultin files) but we'd need to write the logic to find that + you need to account for automatically generated declaratuons that don't have real source associated with them.
Joachim Breitner (Dec 24 2024 at 18:53):
The first thing sounds reasonable if we have that function nicely exposed somewhere.
Not yet nicely exposed (you'd currently have to manually check the env extensions for structural and wf recursion) but certainly doable as an extension of the current equations API.
Artie Khovanov (Dec 26 2024 at 21:22):
I use the feature, but I guess it's not too much effort to click through
Last updated: May 02 2025 at 03:31 UTC