Zulip Chat Archive

Stream: lean4

Topic: html in doc-gen4


Henrik Böving (Dec 06 2021 at 18:24):

I'll take a look into that as soon as I get there then!

A final question, regarding HTML templating, would it be more reasonable to implement some HTML like DSL in Lean itself and write with that or to implement a custom parser for something like Jinja2 etc. and use external templates? Does it even matter?

Gabriel Ebner (Dec 06 2021 at 18:25):

I would make a DSL in Lean.

Gabriel Ebner (Dec 06 2021 at 18:27):

One of the most annoying things (imho) in jinja2 is that it doesn't check well-formedness of the HTML. That is you can write <div><p>{% include "child.j2" %}</p> and you'll only notice that something is wrong once you look at the page in the browser. This kind of error would immediately become impossible in Lean.

Sebastian Ullrich (Dec 06 2021 at 18:38):

I assume JSX/TSX would be a better source of inspiration, though I've never used them

Gabriel Ebner (Dec 06 2021 at 18:41):

You can also look at the webserver demo: https://github.com/leanprover/lean4/blob/7b6732a137a436a64e00f9d319e2a345b517be2d/tests/playground/webserver/Webserver.lean#L172-L174

Mac (Dec 06 2021 at 18:42):

Sebastian Ullrich said:

I assume JSX/TSX would be a better source of inspiration, though I've never used them

@Henrik Böving, note that @Wojciech Nawrocki already wrote a JSX like DSL for Lean 4 in PR #723 so you could probably just adapt that.

Patrick Massot (Dec 06 2021 at 19:00):

Jinja2 is really great. JSX looks a lot less attractive, but comparing the two is probably meaningless anyway, they have different purposes.

Gabriel Ebner (Dec 06 2021 at 19:12):

Interesting how opinions can differ. What do you like about jinja2 compared to jsx?

Patrick Massot (Dec 06 2021 at 19:12):

It's really straightforward to understand and use, and nice to write.

Patrick Massot (Dec 06 2021 at 19:13):

Note that we use jinja2 a lot on our community website.

Gabriel Ebner (Dec 06 2021 at 19:15):

I think that's only a question of experience. If you have used more jsx then jsx will feel more straightforward. (As a bonus there's no if-then-else or for or include or filter syntax to learn in jsx, it's literally the same syntax as javascript. In the Lean DSL, it's literally Lean expressions.)

Gabriel Ebner (Dec 06 2021 at 19:16):

(Did I mention that you have type-checking in your templates? You'll never forget to declare a filter again.)

Wojciech Nawrocki (Dec 06 2021 at 21:00):

Sebastian Ullrich said:

The delaborator attaches synthetic positions to the generated Syntax objects that can be used to retrieve the original Expr, which you could e.g. check for being an Expr.const, or an application thereof (Expr.getAppFn) if you also want to e.g. link + to Add.add. The function for looking up the subterm given the position should be https://github.com/leanprover/lean4/blob/45917f2f900f177259eebc7a6740193f86292927/src/Lean/Widget/InteractiveCode.lean#L39, though apparently it is currently unused (because a TypeScript version is used instead in the client?).

The association between Syntax elements and their source Exprs is currently unused because it is subsumed by the InfoTree output. Basically the delaborator will give you back a Syntax together with an RBMap from positions within this syntax to Elab.Info objects, as in here. You can then query this RBMap and in your example at the ℕ syntax tree it will contain something like (iirc) Elab.Info.ofTermInfo with an expr whose value is Nat : Type.

Gabriel Ebner (Dec 08 2021 at 09:40):

The delaborator attaches synthetic positions to the generated Syntax objects that can be used to retrieve the original Expr, [...] The function for looking up the subterm given the position should be [Lean.Widget.traverse] apparently it is currently unused [...]

I forgot to comment on this remark, but I have to warn you that it is only half-true. Such a function could work for direct subterms (for example a + 0 as a subterm of 42 * (a+0)), but the pretty-printer also produces syntax which does not directly correspond to subterms. These have a dedicated position called nextExtraPos.

And this is not just a theoretical possibility: core Lean already makes use of this feature. Namely when pretty-printing structure instances: the expression A.mk 42 is printed as { a := 42 }. The field name a is not a subterm of the original expression, but it has information attached. Users can also add their own delaborators which may even have whole expressions and not just identifiers on these non-existing positions.

As Wojciech said, please just use the InfoTree output instead. The current Lean.Widget.traverse implementation also silently returns the wrong(?) result for these non-existing positions, so avoid it.

Henrik Böving (Dec 13 2021 at 20:38):

Mac said:

Sebastian Ullrich said:

I assume JSX/TSX would be a better source of inspiration, though I've never used them

Henrik Böving, note that Wojciech Nawrocki already wrote a JSX like DSL for Lean 4 in PR #723 so you could probably just adapt that.

Its making progress with Lean/JSX:
image.png

Also working with that DSL was an absolute joy, basically everything I wanted just worked :tm: \o/

Henrik Böving (Dec 13 2021 at 20:39):

I also hope its fine I'm basically copy pasting the css and js files from doc-gen? They are both under apache2 after all...

Arthur Paulino (Dec 13 2021 at 21:28):

@Henrik Böving You recently said that we wouldn't need to rely on something like Read the Docs. Do you have other ideas to make the task of hosting documentation easy and accessible?

Henrik Böving (Dec 13 2021 at 21:29):

Right now its just a static webpage so you could even put it on github pages if you wanted to.

Arthur Paulino (Dec 13 2021 at 21:30):

But then you'd need a ...github.io repository for every project?
Or push doc folders to a centralized .github.io repository and separate by path (like arthurpaulino.github.io/proj1, arthurpaulino.github.io/proj2, ...)

Henrik Böving (Dec 13 2021 at 21:37):

That's at least how mathlib is doing it right now. https://github.com/leanprover-community/mathlib_docs. I would imagine that if lean actually takes off to the point where a lot of people are interested in the rendered and searchable documentation of libraries (right now I can only think of mathlib4 and the lean compiler itself + maybe this stdlib++ thing we have been talking about) a service like https://docs.rs would eventually emerge.

Henrik Böving (Dec 18 2021 at 20:52):

Wojciech Nawrocki said:

Sebastian Ullrich said:

The delaborator attaches synthetic positions to the generated Syntax objects that can be used to retrieve the original Expr, which you could e.g. check for being an Expr.const, or an application thereof (Expr.getAppFn) if you also want to e.g. link + to Add.add. The function for looking up the subterm given the position should be https://github.com/leanprover/lean4/blob/45917f2f900f177259eebc7a6740193f86292927/src/Lean/Widget/InteractiveCode.lean#L39, though apparently it is currently unused (because a TypeScript version is used instead in the client?).

The association between Syntax elements and their source Exprs is currently unused because it is subsumed by the InfoTree output. Basically the delaborator will give you back a Syntax together with an RBMap from positions within this syntax to Elab.Info objects, as in here. You can then query this RBMap and in your example at the ℕ syntax tree it will contain something like (iirc) Elab.Info.ofTermInfo with an expr whose value is Nat : Type.

I saw that in the Widget module you use this TaggedText type with CodeTokens so I decided to take a look at that as well but I'm facing a slight isuee, for a type such as: IO ℕ it produces (the info's themselves are replaced with 1s so i can actually pretty print it):

Lean.Widget.TaggedText.tag
  1
  (Lean.Widget.TaggedText.append
    #[Lean.Widget.TaggedText.text "IO ", Lean.Widget.TaggedText.tag 1 (Lean.Widget.TaggedText.text "ℕ")])

so with these tagged things it seems like the information about what IO itself actually is is attached to the entire application of IO (and indeed the Expr associated with this group is an application so that's perfectly fine) but for my purposes it would of course be better if the information about the type IO was attached there as well and not just represented as a TaggedText.text so I tried to take another step back and look at the output of format but:

Std.Format.group
  (Std.Format.group
    (Std.Format.nest
      2
      (Std.Format.tag
        1
        (Std.Format.append
          (Std.Format.group (Std.Format.nest 2 (Std.Format.text "IO")) (Std.Format.FlattenBehavior.fill))
          (Std.Format.append
            (Std.Format.line)
            (Std.Format.group
              (Std.Format.nest 2 (Std.Format.tag 5 (Std.Format.text "ℕ")))
              (Std.Format.FlattenBehavior.fill))))))
    (Std.Format.FlattenBehavior.fill))
  (Std.Format.FlattenBehavior.fill)

Format already associates this tag info with the entire application instead of just the single type as well...so now I'm kind of lost which datastructure I want to operate on to actually get information I want? Shall I operate on the raw Syntax output for this? I'm just asking so I don't go down a rabbit hole just to learn that it was pointless all the time.

Henrik Böving (Dec 20 2021 at 08:48):

Or maybe @Gabriel Ebner has an idea about this?

Gabriel Ebner (Dec 20 2021 at 08:57):

From what I can tell the delaborator doesn't add any information to syntax trees produced for constants yet.

Gabriel Ebner (Dec 20 2021 at 08:58):

That's because this feature has so far only been used for the interactive widgets, where we want information for the whole application and not the head symbol.

Gabriel Ebner (Dec 20 2021 at 09:00):

What I think should happen:

  1. Call annotateCurPos&addTermInfo from delabConst.
  2. Ignore those extra infos in tagExprInfos.

Henrik Böving (Dec 20 2021 at 09:04):

Ah so getting this to work properly does actually require some changes in the compiler?

Gabriel Ebner (Dec 20 2021 at 09:06):

AFAICT yes, this requires some changes in lean.

Henrik Böving (Dec 20 2021 at 09:07):

Do you think this is a change someone (such as me :p) should contribute or should I just open an issue about it?

Gabriel Ebner (Dec 20 2021 at 09:12):

I've outlined the change above, I'd guess it's about ten lines of code or so and should be straightforward. ^^

Gabriel Ebner (Dec 20 2021 at 09:12):

cc @Sebastian Ullrich @Wojciech Nawrocki Do you also think this is the way to go here?

Sebastian Ullrich (Dec 20 2021 at 09:13):

Anything that makes the delaborator match the elaborator more closely sounds reasonable to me

Patrick Massot (Dec 20 2021 at 10:12):

I'm very happy to see work on doc-gen4 but I always wonder. How does this interact with the work on LeanInk? It seems to me there should be at least some overlap. Is there any coordination here?

Henrik Böving (Dec 20 2021 at 10:21):

There isn't any coordination right now no. If I understood the purpose of Leanink/Alectryon correctly it's a tool, mostly meant for interactively exploring the state of a tactic proof at different steps throughout the proof right? The program itself mainly focuses on obtaining these states from the Lean compiler and handing them over to Alectryon via an export format. So it's basically a tool for taking a look at the internals of a proof.

Meanwhile the focus of doc-gen4 is to provide more of an external view on a Lean module, ala "what do I get if I import module XYZ" and also navigating between modules. I would imagine that both could indeed be combined by e.g. linking from doc-gen4 theorem's to the ink view and maybe from the types etc. used in Alectryon back to doc-gen4? But that sounds like something that would be done when both tools are at least sort of mature themselves.

What does @Niklas Bülow think about our overlap? And how/if the tools should integrate into each other?

Patrick Massot (Dec 20 2021 at 10:44):

Exactly, I think both should be combined. It's annoying in the current Lean 3 doc-gen that you can't easily see proofs, and it's even worse you can't see the body of a definition.

Eric Taucher (Dec 20 2021 at 10:48):

Patrick Massot said:

can't easily see proofs

For see I am thinking that the proof would be static text inline with the document.

Are you and/or meaning link to in that one can click on a link if the document is open in a viewer such as an internet browser or PDF reader?

Patrick Massot (Dec 20 2021 at 10:54):

I'm sorry, I don't understand the question. By seeing proofs, I mean getting at least as much information on a proof as you can get from within VSCode.

Eric Taucher (Dec 20 2021 at 11:20):

Patrick Massot said:

By seeing proofs, I mean getting at least as much information on a proof as you can get from within VSCode.

That clarifies it enough that I understand what you mean. Thanks.

Niklas Bülow (Dec 20 2021 at 21:48):

Patrick Massot said:

Exactly, I think both should be combined. It's annoying in the current Lean 3 doc-gen that you can't easily see proofs, and it's even worse you can't see the body of a definition.

I think making use of LeanInk and Alectryon for doc-gen4 would benefit the generated docs a lot. LeanInk itself just extracts the necessary data from the lean compiler so Alectryon can then generate a website that allows exploring the lean code interactively. doc-gen could use this to embed explorable code snippets of a given proof into the documentation.

LeanInk currently only extracts the tactic info and displays them at the corresponding location in the code. This implementation is complete. We are now trying to extend the functionality to also display type information and probably the doc-string on hover, similar to the hover popups in VSCode.

We could probably implement an interface in LeanInk, so it can be used as a dependency more easily. This would allow doc-gen4 to call it directly and maybe give it some additional info (links to other doc pages etc.). Alectryon does offer a way to generate HTML directly from the export format that LeanInk generates, so calling that should be sufficient for the use case.

Patrick Massot (Dec 20 2021 at 22:24):

It would be really nice to make sure the work you do on LeanInk is not too much tied to Alectryon. Getting all that information out of a Lean file without pretending to be an editor is a really important task. One thing that makes current Alectryon a lot less useful than it could be is that it really work at a single file level, and doesn't include "jump to definition" links.

Niklas Bülow (Dec 20 2021 at 23:55):

Sorry, I think I’m not fully understanding what you’re trying to say.

Arthur Paulino (Dec 21 2021 at 01:43):

One (minor) concern that I have is that Alectryon is a Python lib whereas doc-gen4 is a pure Lean 4 package, AFAIK. So adding LeanInk as a dependency of doc-gen4 would carry out the Python dependency.

Lean setup, however, already depends on having Python available :big_smile:

Patrick Massot (Dec 21 2021 at 09:42):

Niklas, could you tell me more precisely what you don't understand? Let's take an example. If you look at https://alectryon-paper.github.io/snippets/ide.html, you can see tactic state everywhere, but there is no way you can see what is the statement of Mult.mult_plus_distr_l or jump to the definition of sum. This is why I say Alectryon is a great tool, but not enough. So I hope than LeanInk will be able to generate more data, either for consumption by Alectryon or, preferably, for consumption by any other tool.

Patrick Massot (Dec 21 2021 at 09:46):

Note that I also think we should have more powerful ways to read Lean 4 files. For instance it would be very nice if we could allows users to build alternative implementations of tactics that would probably be a bit slower, hence unsuitable for interactive use, but would emit more information for LeanInk/doc-gen4. For a semi-realistic example, each tactic line could also output the constructed partial proof term, as in mathlib's tactic#show_term.

Gabriel Ebner (Dec 21 2021 at 09:48):

I don't think that requires changing the tactic though, you can get that information from the tactic state already (so you don't even need to run the proof again).

Gabriel Ebner (Dec 21 2021 at 09:49):

I completely agree that it would be awesome to have LeanInk in doc-gen4 and with more features.

Patrick Massot (Dec 21 2021 at 09:54):

Gabriel, I don't understand how the tactic static could give you the partial proof term.

Patrick Massot (Dec 21 2021 at 09:55):

And what I actually have in mind is constructing a natural language proof from a tactic script.

Gabriel Ebner (Dec 21 2021 at 10:00):

Gabriel, I don't understand how the tactic static could give you the partial proof term.

Goals are represented as metavariables in Lean. Solving a goal with a proof term is technically the same as assigning the proof term to the metavariable. And the tactic state contains the meta-variable assignment. So if you have the tactic state from before and after you can just look if any goal metavariables have been assigned, and that's the partial proof term.

Sebastian Ullrich (Dec 21 2021 at 10:13):

Patrick Massot said:

So I hope than LeanInk will be able to generate more data, either for consumption by Alectryon or, preferably, for consumption by any other tool.

Yes, adding more metadata to the output is the plan as Niklas mentioned. See e.g. https://github.com/insightmind/LeanInk/blob/main/test/theorem_proving/009.lean.leanInk.expected for the current output.

Sebastian Ullrich (Dec 21 2021 at 10:24):

Note that the most fundamental difference between LeanInk and doc-gen is that LeanInk works from the Lean source input (including literate comments in between code) while doc-gen works from the delaborated .olean input. The latter is not inferior, it means that e.g. variables are inlined into all signatures, which is the right thing to do for API docs. But because both views of the same Lean module are useful, the simplest way to combine their strengths would be to generate both and add links between them, like Haddock does. It would still be nice if they shared the behavior and perhaps implementation of common features such as hover of course.

Patrick Massot (Dec 21 2021 at 10:44):

I don't understand why LeanInk couldn't reconstruct the signature with inlined variables.


Last updated: Dec 20 2023 at 11:08 UTC