Zulip Chat Archive

Stream: lean4

Topic: constructor and field doc strings


Henrik Böving (Apr 09 2022 at 15:39):

After seeing the compiler team add so many doc strings especially to fields of structure types etc. in the past few days I decided to take a look into rendering these in doc-gen4 as well, however I'm not 100% sure yet how to do this visually (since I suck at UX and webdev after all :p) So for example with the doc string of docs4#Lean.Meta.Context would you prefer to have the giant blob of a doc string inline into the structure declaration as it is within the code: https://github.com/leanprover/lean4/blob/6aee3fb30465da8e0e3df12651832f548cb0159b/src/Lean/Meta/Basic.lean#L176-L192 or something different? If yes, what's your alternatives?

Same question for doc strings on inductive type ctors.

Mario Carneiro (Apr 09 2022 at 15:45):

you could have doc text between the lines of monospace text

Henrik Böving (Apr 09 2022 at 18:27):

Minus the indendation (that's missing some dark CSS magic) (I'm also not sure why the markdown renderer made a code block out of the remark) that would look like this: image.png

Which is fine ish I'd say?

Ruben Van de Velde (Apr 09 2022 at 18:39):

Is the remark indented?

Henrik Böving (Apr 09 2022 at 18:47):

You can see the markdown in the github link above, it shouldn't be rendering as a code block IMO.

Arthur Paulino (Apr 09 2022 at 18:50):

Is there another occurrence of a comment block with two+ paragraphs? It might be related. What's the corresponding markdown that was generated?

James Gallicchio (Apr 09 2022 at 18:58):

Henrik Böving said:

Which is fine ish I'd say?

IMO it is a little hard to tell whether the comment is about the field above or below it

James Gallicchio (Apr 09 2022 at 18:59):

(Is it possible to make the vertical offset different above vs below?)

Henrik Böving (Apr 09 2022 at 19:00):

First things first this is how it looks without the CSS messing up the generated indentation: image.png

Henrik Böving (Apr 09 2022 at 19:01):

James Gallicchio said:

(Is it possible to make the vertical offset different above vs below?)

Oh for sure it is I just dont know how..yet

Henrik Böving (Apr 09 2022 at 19:04):

Arthur Paulino said:

Is there another occurrence of a comment block with two+ paragraphs? It might be related. What's the corresponding markdown that was generated?

I did look through a bunch of mathlib and compiler files but I couldnt find one yet, seems they are rather rare which would also explain why I didnt notice this happening before.

Eric Wieser (Apr 09 2022 at 19:07):

Another option would be to emit separate definition documentation blocks for the projections, and only show the docstrings there

Eric Wieser (Apr 09 2022 at 19:09):

That has the advantage of making the behavior more uniform between projections and definitions; consider how docs#subtype.prop and docs#subtype.property currently link to differently presented information.

James Gallicchio (Apr 09 2022 at 19:10):

I'm also not sure that the docstrings should come before the relevant field, rather than after. For functions it makes sense to give the docstring beforehand since the arguments are in a negative position, but with a structure the fields are in a positive position.

/-- Pair -/
structure Pair where
  fst : Nat /-- the first number -/
  snd : Nat /-- the second number -/

But allowing both seems incredibly annoying to handle in doc generation. Doxygen allows this by requiring a < at the start docstring token (like /--< -/

Henrik Böving (Apr 09 2022 at 19:12):

Its actually not annoying for us at all, we don't ever parse the code ourselves, doc-gen4 is a lean meta program that asks the compiler for all the information including the doc string, if the compiler has the information we can get it without issue

Arthur Paulino (Apr 09 2022 at 19:14):

TBH I thought those comment blocks that weren't placed right before a declaration (a function, structure, theorem etc) were meant for in-code documentation only (for those who are reading the source code) rather than to be extracted to a documentation page

James Gallicchio (Apr 09 2022 at 19:16):

Henrik Böving said:

Its actually not annoying for us at all

But I think it gets funky if for example you have a docstring after the last field of a structure and then a function immediately after. Not sure how we would differentiate where the docstring should go in something like

/-- Pair -/
structure Pair where
  fst : Nat /-- the first number -/
  snd : Nat /-- the second number -/
def myFn () := ()

Henrik Böving (Apr 09 2022 at 19:17):

Oh with "us" I was referring to the doc-gen4 people, it sure is annoying to parse :p

Arthur Paulino (Apr 09 2022 at 19:17):

By extracting those to a documentation page we might end up with a bunch of paragraphs with statements out of context, containing explanations about implementation details

Henrik Böving (Apr 09 2022 at 19:18):

Arthur Paulino said:

By extracting those to a documentation page we might end up with a bunch of paragraphs with statements out of context

Lean does have the /- -/ multi line comment blocks and the -- single line comments for one line stuff, if its marked with a /-- -/ its explicitly different from /- -/ and meant to be interpreted as a doc string so whether it shows up here or not is the authors choice

James Gallicchio (Apr 09 2022 at 19:18):

(deleted)

Henrik Böving (Apr 09 2022 at 19:18):

correct

James Gallicchio (Apr 09 2022 at 19:19):

whoops henrik sniped me :p

James Gallicchio (Apr 09 2022 at 19:21):

Henrik Böving said:

Oh with "us" I was referring to the doc-gen4 people, it sure is annoying to parse :p

So the compiler decides where in the AST a docstring is bound? (If so, where do the docstrings get bound in the example above?)

Arthur Paulino (Apr 09 2022 at 19:22):

James Gallicchio said:

Henrik Böving said:

Which is fine ish I'd say?

IMO it is a little hard to tell whether the comment is about the field above or below it

I'm used to think that Lean docstrings come before the object being documented. Maybe it's worth it to keep this pattern for the sake of simplicity

Henrik Böving (Apr 09 2022 at 19:23):

Eric Wieser said:

That has the advantage of making the behavior more uniform between projections and definitions; consider how docs#subtype.prop and docs#subtype.property currently link to differently presented information.

I did think of this as well but once I end up implementing this feature for inductive constructors as well we would either:

  • have to do it like I implemented it for structures rn
  • render inductive ctors as a seperate definition again as well with a doc string, which I don't think is too sensible because you usually want to see them in context with the others (its an inductive after all)

and since I wanted to keep behaviour consistent between the two (its of course open to debate whether that should be the case) I decided against it

Henrik Böving (Apr 09 2022 at 19:26):

James Gallicchio said:

Henrik Böving said:

Oh with "us" I was referring to the doc-gen4 people, it sure is annoying to parse :p

So the compiler decides where in the AST a docstring is bound? (If so, where do the docstrings get bound in the example above?)

Yes and to answer this question we can just ask the compiler:

import Lean

open Lean Meta

/-- Pair -/
structure Pair where
  fst : Nat /-- the first number -/
  snd : Nat /-- the second number -/

def foo := ()

#eval show MetaM _ from do
  let env  getEnv
  IO.println $ findDocString? env `Pair.fst
  IO.println $ findDocString? env `Pair.snd
  IO.println $ findDocString? env `foo
none
(some (the first number ))
(some (the second number ))

James Gallicchio (Apr 09 2022 at 19:28):

Ah, yeah, so we'd have to change the parser's behavior for that

Henrik Böving (Apr 09 2022 at 19:32):

James Gallicchio said:

(Is it possible to make the vertical offset different above vs below?)

image.png

good? more? less?

James Gallicchio (Apr 09 2022 at 19:40):

That definitely looks better. The space below could be smaller IMO, but space above looks good

Mario Carneiro (Apr 09 2022 at 21:49):

I think a subtle filled box background around the docstring and element would work to separate the elements

Kyle Miller (Apr 09 2022 at 23:52):

James Gallicchio said:

I'm also not sure that the docstrings should come before the relevant field, rather than after.

In Haddock (a Haskell documentation generator), it has separate syntax for each: https://haskell-haddock.readthedocs.io/en/latest/markup.html#constructors-and-record-fields

Kyle Miller (Apr 09 2022 at 23:57):

Here's an example of how Haddock handles constructor with documented fields: https://hackage.haskell.org/package/parsec-3.1.15.0/docs/Text-ParserCombinators-Parsec-Token.html (it shows the type name, followed by the type's documentation right below it, followed by a table for each constructor showing all the fields with their documentation to the right. For classes Haddock seems to put the field documentation below each field.)

David Thrane Christiansen (Apr 10 2022 at 08:25):

A downside of Haddock's flexibility is that teams then end up needing to rely on tooling to enforce the particular choice that they have made for their codebase, and most non-tiny projects end up adopting just one of the options in each possible context anyway.

James Gallicchio (Apr 10 2022 at 17:10):

^Yeah, we should definitely pick something and stick with it. I like that Lean is quite opinionated about docstrings, it makes the docs experience very smooth

Henrik Böving (Apr 12 2022 at 20:38):

Aha! https://leanprover-community.github.io/mathlib4_docs/Lean/Syntax.html#Lean.Syntax.updateLeading found another one @Arthur Paulino the double paragraph is indeed at fault

Arthur Paulino (Apr 12 2022 at 20:40):

I suspected of the double paragraph because the second one started at 4 white spaces from the left border and that qualifies for a code block according to markdown syntax

Xubai Wang (Apr 13 2022 at 02:41):

The first one is parsed with no space as I test it.

def printDoc (n : Name) : MetaM Unit := do
  match  findDocString? ( getEnv) n with
  | some docs => IO.println docs
  | _ => throwError "not found"

#eval printDoc `Lean.Syntax.updateLeading
Set `SourceInfo.leading` according to the trailing stop of the preceding token.
    The result is a round-tripping syntax tree IF, in the input syntax tree,
    * all leading stops, atom contents, and trailing starts are correct
    * trailing stops are between the trailing start and the next leading stop.

    Remark: after parsing, all `SourceInfo.leading` fields are empty.
    The `Syntax` argument is the output produced by the parser for `source`.
    This function "fixes" the `source.leading` field.
......

And the rendered sample with commonmark spec

Xubai Wang (Apr 13 2022 at 02:50):

The docComment parser is here:

def docComment := leading_parser ppDedent $ "/--" >> ppSpace >> commentBody >> ppLine

We may remove the ppSpace and then clean up the indentations. But I'm not sure if we can get position of the /-- parser.


Last updated: Dec 20 2023 at 11:08 UTC