Zulip Chat Archive

Stream: mathlib4

Topic: doc-gen duration


Henrik Böving (Apr 10 2023 at 23:42):

Sooooo...doc-gen is now taking longer than 6 hours to generate documentation: https://github.com/leanprover/doc-gen4/actions/runs/4659085245 (quite a surprising increase of duration from ~4h20min to 6 hours, maybe it also just ran on a weaker CPU or something? Regardless this is an issue we will face sooner or later) as I see it we have pretty much 3 options:

  1. simplest one: somehow increase max duration of the runner
  2. almost as simple: I've been wondering this for a while, how many people actually use the built-in LeanInk feature? Not running all of the files through LeanInk again would be a tremendous time saver.
  3. I go figure out some performance increasing stuff within the code of doc-gen4 itself

What are people's opinions on 1 and 2?

Adam Topaz (Apr 10 2023 at 23:45):

I can’t speak for others, but I personally have never used the leanink feature of docs4.

Henrik Böving (Apr 10 2023 at 23:45):

On a more positive note! Thanks to the shenanigans we figured out for the generation itself doc-gen has now run without a single failure for 3 weeks straight \o/

Adam Topaz (Apr 10 2023 at 23:47):

On a separate note, is it possible to somehow hide instance names (possibly under some option)? With the autogenerated names being so long, the auto populated list under the search bar quickly becomes unwieldy when the instance names are around.

Alex J. Best (Apr 10 2023 at 23:48):

I've also not used leanink, but I do think it is quite nice, and that not using it is more out of habit than anything else. It definitely is more convenient for quickly checking out a proof than loading lean. So it would be a shame to give up on it completely. Temporarily disabling it is fine if performance improvements would take too long right now.

Gabriel Ebner (Apr 10 2023 at 23:51):

@Henrik Böving We could also run doc-gen4 on the Hoskinson machines, which are significantly faster.

Henrik Böving (Apr 10 2023 at 23:51):

Adam Topaz said:

On a separate note, is it possible to somehow hide instance names (possibly under some option)? With the autogenerated names being so long, the auto populated list under the search bar quickly becomes unwieldy when the instance names are around.

If lean tags them auto generated on its own that would be trivially possible yes. If not I guess I could compare the real name to another run of the name generator and check if the names match up? So yes should be possible. Can you open an issue about it?

Henrik Böving (Apr 10 2023 at 23:52):

Gabriel Ebner said:

Henrik Böving We could also run doc-gen4 on the Hoskinson machines, which are significantly faster.

Oh sure if that can be arranged it would be nice. doc-gen should also close to linearly profit from more CPUs due to the inherent parallelism of the task.

Gabriel Ebner (Apr 10 2023 at 23:53):

It's as simple as changing the runs-on field in the github actions, but we'd need to move the repo that runs the action to the leanprover-community org.

Henrik Böving (Apr 10 2023 at 23:55):

I see...I guess in theory mathlib4 could just run the task on its own? I'll try to patch up a PR for that in the coming days.

Heather Macbeth (Apr 12 2023 at 18:24):

Henrik Böving said:

  1. almost as simple: I've been wondering this for a while, how many people actually use the built-in LeanInk feature? Not running all of the files through LeanInk again would be a tremendous time saver.

I think in principle it is nice for the docs to include the full LeanInk output, but the current display is hard to read. Some work on Alectryon would improve this, notably
https://github.com/leanprover/alectryon/issues/4
to fix a line length issue, and
https://github.com/leanprover/alectryon/issues/5
the development of a better default syntax highlighting theme for Pygments (currently everything is in shades of grey and orange).

Heather Macbeth (Apr 12 2023 at 18:44):

It would also be more useful if there were an Alectryon display designed specifically for large projects, with some kind of navigation bar; it's less useful when the only way to navigate between files is to click the "back" button and then navigate within the docs.

Heather Macbeth (Apr 12 2023 at 18:48):

The docs are one of our "outward-facing" products, a first point of interaction of newcomers with the standard libraries, and for this reason it's good for them to look polished. I would actually argue for removing the Alectryon links until issues such as the above are fixed.

Jireh Loreaux (Apr 12 2023 at 19:24):

Some other pain points with LeanInk (I have no idea what it would take to fix these):

  1. when you click the ink link for a declaration, it doesn't take you directly to that declaration (i.e., the anchors don't work, and looking at the source it doesn't seem like there are any anchors).
  2. Hover works for many things (and that's fantastic!), but not notation, so you can't hover over →ₐ to see that it's an AlgHom.

Jireh Loreaux (Apr 12 2023 at 19:29):

That being said, I have used LeanInk several times to examine proofs without firing up Lean. It's quite handy, so even if we nix it for now, I would certainly like to get it back in the docs at some point. What would be incredibly nice to have eventually, but probably too much to hope for, is to have this embedded in the docs per declaration, so that clicking ink just shows the LeanInk without transfering you to another page.

Eric Wieser (Apr 12 2023 at 19:57):

and the development of a better default syntax highlighting theme for Pygments (currently everything is in shades of grey and orange).

Does pygments even know about Lean 4 yet?

Heather Macbeth (Apr 12 2023 at 21:42):

Jireh Loreaux said:

  1. when you click the ink link for a declaration, it doesn't take you directly to that declaration (i.e., the anchors don't work, and looking at the source it doesn't seem like there are any anchors).

Agreed, I opened
https://github.com/leanprover/alectryon/issues/6
for this a few months ago. Maybe the other point you mention is also worth opening an issue for?

Heather Macbeth (Apr 12 2023 at 21:44):

Eric Wieser said:

Does pygments even know about Lean 4 yet?

I believe Alectryon is bundling its own Lean 4 lexer, it's not in the official Pygments distribution.
https://github.com/leanprover/alectryon/blob/master/alectryon/pygments_lexer.py

Eric Wieser (Apr 12 2023 at 22:26):

Huh, that's pretty weird. We should probably just upstream that lexer at some point

Eric Wieser (Apr 12 2023 at 22:28):

Here's the Lean3 one for comparison

Eric Rodriguez (Apr 12 2023 at 23:08):

on a related note, it'd be nice if the mathlib4 docgen was indexed by google and such liek

Henrik Böving (Apr 13 2023 at 07:44):

A few things. First things first doc-gen4 does not use alectryon (it uses LeanInk to get the raw data though) because I didn't want to pull in an additional step of complexity with depending on python and 3rd party python libs and so on. Instead it renders the HTML on its own based on raw data from LeanInk. Furthermore it does also not use pygments to do the highlighting it uses the semantic highlighting from the LSP server, if that was improved the highlighting in the render too would be improved.

Te bug with the going right to the declarations I didn't notice until now. I'll see if I can push an easy fix

Also note that rolling its own renderer is not really a big thing for doc-gen, the part of alectryon that renders is in fact fairely trivial: https://github.com/leanprover/doc-gen4/blob/main/DocGen4/LeanInk/Output.lean

Henrik Böving (Apr 13 2023 at 07:49):

AH yeah looking at the implementation again I remember why I didn't implement the thing where it jumps straight. Basically LeanInk does not actually do any distinction between almost all of hte tokens in a Lean file, so based on the raw data I am getting from LeanInk it is not really possible to figure out where a declaration actually is. I guess there are ways around this? Like for example I could figure out the line it is on from the environment and try to match that up with a line in the resulting HTML file? But that feels rather hacky. So I guess some work needs to be done on the LeanInk side of things.

Heather Macbeth (Apr 14 2023 at 15:43):

Henrik Böving said:

doc-gen4 does not use alectryon ... it does also not use pygments to do the highlighting it uses the semantic highlighting from the LSP server, if that was improved the highlighting in the render too would be improved.

The issue I was raising was not about the tagging of tokens for the syntax highlighting, just about the theme, which is very grey- and orange-heavy. It would be great if someone who knows what they're doing with colour choices developed a nice theme with a wide range of different easy-on-the-eye colours for different token types, or even just manually translated the default VSCode one, which is ok, or used the default Pygments one, which is also ok: that was the issue I posted on the Alectryon repo.
https://github.com/leanprover/alectryon/issues/5
Even though doc-gen is not using Alectryon, it seems to be using a CSS style sheet which is copied from Alectryon's one.

Heather Macbeth (Apr 14 2023 at 15:55):

The line length issue I mentioned, i.e.
https://github.com/leanprover/alectryon/issues/4
is also seemingly an issue in an Alectryon CSS file which has been copied over without change into doc-gen.

Henrik Böving (Apr 19 2023 at 08:06):

https://github.com/leanprover-community/mathlib4/pull/3520

Ruben Van de Velde (Apr 19 2023 at 08:33):

Henrik Böving said:

https://github.com/leanprover-community/mathlib4/pull/3520

I marked as awaiting-review, hope that's okay

Henrik Böving (Apr 19 2023 at 08:37):

Yes

Henrik Böving (Apr 21 2023 at 23:17):

The PR is in mathlib4 now with a speedup to ~1h from ~5h ! :stock_market:

Ruben Van de Velde (Apr 22 2023 at 05:26):

Is this just the result of using more powerful machines or were there other changes?

Henrik Böving (Apr 22 2023 at 10:49):

The former for now


Last updated: Dec 20 2023 at 11:08 UTC