Zulip Chat Archive

Stream: lean4

Topic: docs4#


Henrik Böving (Feb 13 2022 at 15:03):

I just ported the doc-gen search over to doc-gen4 (https://leanprover-community.github.io/mathlib4_docs/), including the /find/decl endpoints so in theory someone with the correct permissions should be able to set up something similar to this docs# stuff the Lean 3 people are using (I propose docs4#)

Notification Bot (Feb 15 2022 at 17:47):

Henrik Böving has marked this topic as unresolved.

Henrik Böving (Feb 15 2022 at 17:47):

I think that should be fine for a first PoC yes, but we should definitely try to swap it out for a pure Lean 4 parser at some point.

Arthur Paulino (Feb 15 2022 at 17:52):

References that might help:

Xubai Wang (Feb 15 2022 at 18:40):

Henrik Böving said:

I think that should be fine for a first PoC yes, but we should definitely try to swap it out for a pure Lean 4 parser at some point.

I agree with it, but as I have started a new semester, it will take more time than I have expected.

@Arthur Paulino Thanks, I will read about that.

Arthur Paulino (Feb 15 2022 at 18:44):

@Xubai Wang if/when you start the pure Lean 4 parser, feel free to ping me and I will help

Xubai Wang (Feb 15 2022 at 18:48):

Kevin Buzzard said:

That definition doesn't have a docstring!

Both doc string and mod doc rendering has been done here, based on cmark and positioned mod doc. Once lean supports the latter feature, we can immediately merge these commits.

Xubai Wang (Feb 15 2022 at 18:48):

@Arthur Paulino Sure!

Arthur Paulino (Feb 17 2022 at 11:34):

I went ahead and defined a markdown DSL inspired by those Haskell references that I posted.
However, if I am to parse the string, I might do it in a "try hard" way.

What's a smarter way of doing so? I've searched on Lean 4 repo and I couldn't find an API for regular expressions

Gabriel Ebner (Feb 17 2022 at 11:36):

I would try using these parser combinators: https://github.com/leanprover/lean4/blob/6b1297fe85575c91cf613640bdd96cd1241d1a9e/src/Lean/Data/Parsec.lean (example XML parser: https://github.com/leanprover/lean4/blob/95aec2cf93762530a9c69dfc98c165a8b54238e2/src/Lean/Data/Xml/Parser.lean)

Gabriel Ebner (Feb 17 2022 at 11:37):

We don't have regular expressions (yet).

Arthur Paulino (Feb 17 2022 at 11:42):

Thanks! Will check those out

Xubai Wang (Feb 17 2022 at 12:22):

I would recommend looking at harkdown and commonmark-hs (especially the latter one for its extensibility, we may need more features like equations and diagrams). They both implement CommonMark, using Parsec and following the two stage (block -> inline) approach.

Gabriel Ebner said:

I would try using these parser combinators: https://github.com/leanprover/lean4/blob/6b1297fe85575c91cf613640bdd96cd1241d1a9e/src/Lean/Data/Parsec.lean (example XML parser: https://github.com/leanprover/lean4/blob/95aec2cf93762530a9c69dfc98c165a8b54238e2/src/Lean/Data/Xml/Parser.lean)

The Lean Parsec itself cannot fit in this job, as the link reference definitions element in CommonMark requires recording user state. One approach I can think of (and tried) is to extend Parsec with StateT:

def LinkRefMap := HashMap String (String × Option String)

def StateParsec σ := StateT σ Parsec

instance : MonadLift Parsec (StateParsec σ) where   -- to reuse current Parsec code
  monadLift p := match p s with
  | Parsec.ParseResult.success it res => Parsec.ParseResult.success it (res, st)
  | Parsec.ParseResult.error it err => Parsec.ParseResult.error it err

def Parser := StateParsec LinkRefMap

def partial : Parser Partial := sorry  -- block only

def markdown : Parser Markdown := sorry  -- full with both block and inline elements

Mario Carneiro (Feb 17 2022 at 12:26):

You also don't have to limit yourself to haskell libraries (all the examples given thus far seem to have this in common so I guess it's not a coincidence). Better to use a good / well tested library than an FP library as basis

Gabriel Ebner (Feb 17 2022 at 12:30):

@Xubai Wang It should be enough to abbrev Parser := StateT LinkRefMap Parsec. If you make it an abbrev, then you get the MonadLift/etc. instances automatically.

Arthur Paulino (Feb 17 2022 at 12:31):

Just for the record, this is the DSL I wrote yesterday:

inductive Inline
  | text   : String  Inline
  | bold   : String  Inline
  | italic : String  Inline
  | link   : String  String  Inline
  | image  : String  String  Inline
  | code   : String  Inline

inductive HeaderLevel
  | h1 | h2 | h3 | h4 | h5 | h6

abbrev Content := List Inline

inductive Block
  | header    : HeaderLevel  Content  Block
  | paragraph : Content      Block
  | quote     : Content      Block
  | itemize   : List Block   Block
  | enumerate : List Block   Block
  | codeBlock : String       String   Block
  | divider   : Block

Xubai Wang (Feb 17 2022 at 12:33):

Yes, markdown-it is also an good option to look at, and it provides the highest flexibility among all the packages I have looked at. commonmark-hs is limited due to its SyntaxSpec definitions, while markdown-it uses a chain of rules. But in order to imitate it we need more flexible data structure, and I don't know how to get this done in a strongly typed functional language.

Matthew Ballard (Feb 17 2022 at 14:19):

Can latex be included in documentation for either v3 or v4?

Xubai Wang (Feb 17 2022 at 14:24):

In doc-gen3 the mathjax is used (see this page). And in doc-gen4 we will continue to use mathjax since mathml is not supported by Chrome.

Matthew Ballard (Feb 17 2022 at 14:26):

Great! I assume it is rendered client side?

Arthur Paulino (Feb 17 2022 at 14:26):

Yeah, more info here

Matthew Ballard (Feb 17 2022 at 14:28):

I use katex for course notes and render it on my machine before pushing (now using mdbook, before I did it client-side). What I would give for an extended markdown language that provides support for theorem environments and easy cross-linking...

Arthur Paulino (Feb 17 2022 at 14:42):

@Matthew Ballard offtopic, but this is how I set MathJax in my blog. If your repo has templates, you can just add that line inside the <head> of your pages and they will render LaTeX automagically.
Or you can follow these instructions

Then this is how you can type stuff that will be rendered in LaTeX. It's written in Brazilian portuguese, but you'll get what I mean

Jireh Loreaux (Feb 17 2022 at 16:31):

@Matthew Ballard I wrote a plugin for Gitit that uses Pandoc's native divs to support these features. The Gitit server keeps a database (just a text file) of information used for cross-linking and it has support for automatic numbering within each page. The code is kind of garbage and you have to run your own server, but I used it for a few semesters with students and I keep it for my private use too. If you want, I'll give you access to the code.

Xubai Wang (Feb 17 2022 at 17:43):

Hey, guys. I would like to collect some feedback on intra-documentation links.

I have rendered my Socket.lean package to GitHub Pages, and such a definition

/--
  Opaque reference to underlying platform specific socket.
  To create a `Socket`, refer to [`Socket.mk`](#Socket.Socket.mk). Then you
  can manipulate the `Socket` using common socket functions like
  [`Socket.bind`](#Socket.Socket.bind),[`Socket.connect`](#Socket.Socket.connect),
  etc. For all functions available, refer to the [`Socket` module](Socket/Socket.html).
  *NOTE*: `NonemptyType` is used to implement `Inhabited` for `Socket`. [detailed explanation](#explanation-usage-of-nonemptytype)
  ```lean
  import Socket
  open Socket
  def main : IO Unit := do
    let s ← Socket.mk
    -- some socket operations
    s.close
  ```
-/
def Socket : Type := Socket.Nonempty.type

renders to https://xubaiw.github.io/Socket.lean/Socket/Basic.html#Socket.Socket, with the following screen capture

Socket.Socket

So the plain code spans ` Socket ` and ` Inhabited ` automatically turns into links, [`Socket.mk`](#Socket.Socket.mk) automatically get linked to Socket.mk in another page (note that there is a # prefix),
[Socket module](Socket/Socket.html) is prefixed with baseUrl, and[detailed explanation](#explanation-usage-of-nonemptytype) get linked to that id in this page.

I'm not very familiar with old doc-gen3 (and fail to find some intra documentation links in mathlib), and not sure whether this design is "correct", good enough or not, but it feels more consistent than the Rust way. These are the pseudo processing steps:

-- parse as name or id
if link starts with "#"
    if name found globally (precise) or in current module
        | yes => link it to the name
        | no => do not modify (id on the same page)
-- parse as absolute or relative url
else
    | absolute  => do not modify
    | relative => add module base

Xubai Wang (Feb 17 2022 at 17:47):

I also wonder if there are some other missing doc string features that someone can remind me of (except for code highlighting and equations, I will work on them tomorrow).

Gabriel Ebner (Feb 17 2022 at 18:08):

I don't think we have a special way to link to definitions in doc-gen 3 (except for writing the full definition name). The proposed syntax looks reasonable enough, although it would be nice if it didn't fail silently if the declaration can't be found.

Gabriel Ebner (Feb 17 2022 at 18:09):

BTW:

NOTE: NonemptyType is used to implement Inhabited for Socket. detailed explanation

This is Nonempty now, not Inhabited.

Xubai Wang (Feb 17 2022 at 18:13):

Gabriel Ebner said:

I don't think we have a special way to link to definitions in doc-gen 3 (except for writing the full definition name). The proposed syntax looks reasonable enough, although it would be nice if it didn't fail silently if the declaration can't be found.

But we also need to link to headings (#id), in this example the #explanation-usage-of-nonemptytype.

Since id and name share the same syntax, so if explicit failure is required we may pick another syntax for name first.

Gabriel Ebner (Feb 17 2022 at 18:18):

That's what I was hinting at.

Xubai Wang (Feb 17 2022 at 18:22):

I cannot come up with a good alternative, maybe @, %, ~ or ->?

Gabriel Ebner (Feb 17 2022 at 18:45):

@decl has the advantage that it is valid Lean syntax. My other suggestion would have been ##decl.

Xubai Wang (Feb 17 2022 at 18:53):

##decl looks great!

Xubai Wang (Feb 19 2022 at 14:21):

My Unicode.lean package may do some help to CommonMark parsing, as the specification involves some unicode stuff.

Henrik Böving (Feb 20 2022 at 11:04):

Kevin Buzzard said:

That definition doesn't have a docstring!

https://leanprover-community.github.io/mathlib4_docs/Mathlib/Data/List/Basic.html doc strings are alive thanks to @Xubai Wang :octopus:

Henrik Böving (Feb 20 2022 at 17:49):

Before we merge https://github.com/leanprover/doc-gen4/pull/38#issuecomment-1046287290 is it possible to change the src# and docs# links to support this format? @Sebastian Ullrich I don't know how powerful Zulip is in this regard so just to make sure we don't have to revert it right away.

Sebastian Ullrich (Feb 20 2022 at 17:56):

What's changed, just the #? The current rules look like this:

docs4#(?P<decl>[\w_.'-]*[\w_'-])    https://leanprover-community.github.io/mathlib4_docs/find/%(decl)s

Henrik Böving (Feb 20 2022 at 17:57):

Yeah and for the src# you'd need to append the /src to it

Mario Carneiro (Feb 21 2022 at 04:21):

That's how src# links already work, so if you are doing the same thing then it should just work

Xubai Wang (Feb 21 2022 at 05:04):

Mario Carneiro said:

That's how src# links already work, so if you are doing the same thing then it should just work

There is a bit difference. We now need an extra # number hash before declaration name, like <base>/find/#mul_le_mul and <base>/find/#mul_le_mul/src. The original way is not Windows friendly, and # is inevitable as we are limited to static serving.

Mario Carneiro (Feb 21 2022 at 05:21):

Doesn't ? also have a special meaning in URL strings?

Mario Carneiro (Feb 21 2022 at 05:22):

although this is after the fragment specifier so maybe it's okay

Mario Carneiro (Feb 21 2022 at 05:23):

Does that mean that <base>/find/ is now a huge page with every definition in it?

Xubai Wang (Feb 21 2022 at 06:05):

No, it just works the same way of search. It fetches the "every definition" and then do the search.

Xubai Wang (Feb 21 2022 at 06:08):

It doesn't lead to obvious time delay when I test it on Mathlib4 locally. But if you are worried about the performance, we can then store the "every definition" in browser cache (indexedDB) and add a hash file to check if it needs to be updated.

Xubai Wang (Feb 21 2022 at 06:24):

Mario Carneiro said:

Doesn't ? also have a special meaning in URL strings?

Oh yes, I forgot about that. And ? is a definitely more extensible solution. We may use /find/?query=<decl>&type=src, and may add more options to that later (e.g. type=interactive for LeanInk integration). @Henrik Böving How do you think about that?

Mario Carneiro (Feb 21 2022 at 06:31):

I was going to suggest that myself, but ? itself causes a problem with that

Mario Carneiro (Feb 21 2022 at 06:31):

You would need the linkifier to do percent encoding

Mario Carneiro (Feb 21 2022 at 06:32):

(I'm wondering whether any of this is making Leo and Sebastian reconsider ? in identifiers...)

Xubai Wang (Feb 21 2022 at 06:37):

Mario Carneiro said:

You would need the linkifier to do percent encoding

I think it's not a problem for modern browsers?

screen capture

Mario Carneiro (Feb 21 2022 at 06:54):

well, depending on how precise the parser is, that string might be rejected as not a URL

Gabriel Ebner (Feb 21 2022 at 09:08):

Read RFC 3986: https://datatracker.ietf.org/doc/html/rfc3986/#section-3.4 The question mark is not an issue; the query string begins with the first question mark and only ends with a # character. As long as we don't get Haskell-style UInt8# identifiers we should be fine.

Gabriel Ebner (Feb 21 2022 at 09:09):

We could also put the query in the fragment identifier, which has no such restriction. That is, /find/src/#<decl>.

Xubai Wang (Feb 21 2022 at 10:11):

Comparing these two, personally I would stand behind the ? query syntax:

  1. ? Query has a clearer semantic, we are requesting for a redirect service rather than some element on one page.
  2. # Fragment is path based and can not be combined with each other, which is less extensible.
  3. For developing, # requires writing one more Lean module (and corresponding JS module) for that, while for ? we only need to modify the endpoint JS file.

I also suggest renaming the endpoint from /find to /endpoint in case we need to change the Zulip linkifier in future.

Gabriel Ebner (Feb 21 2022 at 10:13):

For developing, # requires writing one more Lean module (and corresponding JS module) for that, while for ? we only need to modify the endpoint JS file.

I'm not sure I understand this point, you can access both # and ? from javascript. You could even do /find/#src/<decl> if you don't want to do two pages.

Xubai Wang (Feb 21 2022 at 10:24):

That's true.

Mario Carneiro (Feb 21 2022 at 10:29):

not a fan of /endpoint, that's a bit too jargon for my taste

Xubai Wang (Feb 21 2022 at 11:04):

Reading about the fragment part of RFC 3986, fragment is used to represent a portion or a view of the primary resource.
So semantically best syntax might be /find?q=Nat.add#src and /find?q=Nat.add#doc :upside_down:

Gabriel Ebner (Feb 21 2022 at 11:06):

That intention is long gone, # is used for many other things today.

Mario Carneiro (Feb 21 2022 at 11:08):

heh, just look up at the url of this zulip page

Xubai Wang (Feb 21 2022 at 11:45):

That also follows the portion meaning, and as zulip is not statically served, it won't conflict with the semantic for html.

Well, I think we are now some bit off the topic. The actual syntax of find doesn't matter a lot, as users will not interact with it directly. Any windows-compatible way is good enough. :D

Henrik Böving (Feb 23 2022 at 18:50):

@Sebastian Ullrich the search syntax was now changed as explained here https://github.com/leanprover/doc-gen4/pull/38#issuecomment-1048238235, src stuff should work as well now, furthermore generating documentation under Windows should now work flawlessly as well since we no longer create the problematic files @windows users feel free to try.

Mario Carneiro (Feb 23 2022 at 18:50):

are the links live now?

Henrik Böving (Feb 23 2022 at 18:52):

Ehh, sebastian or someone with the correct permissions would need to change the way Zulip links the docs4# and src4# stuff but the github pages end should work yes

Mario Carneiro (Feb 23 2022 at 18:52):

that's me, I'm looking for the syntax to add

Mario Carneiro (Feb 23 2022 at 18:53):

I lost track of what syntax got implemented in the end

Henrik Böving (Feb 23 2022 at 18:53):

That's the one linked in the issue comment above ^ (well both work now :p, pick your poison)

Mario Carneiro (Feb 23 2022 at 18:55):

docs4#Nat.add src4#Nat.add

Henrik Böving (Feb 23 2022 at 18:57):

So now where we got all the "easy" features implemented we can start looking into the fancy stuff :eyes:, how is leanink doing? @Niklas Bülow

Niklas Bülow (Feb 27 2022 at 23:55):

Sorry, for the slight delay in responding. I'm currently in the process of writing down my bachelor thesis, but I'll probably complete it in the next few weeks. Do you already have something in mind on how we can integrate both tools with each other? or a preferred interface point in doc-gen4 from where you would like to interact with LeanInk?

Xubai Wang (Feb 28 2022 at 05:56):

Can we put it in a <details> fold like Equations and Instances? Or a separate page to be linked to like source?

Sebastian Ullrich (Feb 28 2022 at 08:39):

Yes, a (separate) source link (or moving the current one to a GH icon or something) like in Haddock sounds good & simple to me.
https://hackage.haskell.org/package/transformers-0.6.0.4/docs/Control-Monad-Trans-State-Lazy.html#t:StateT
https://hackage.haskell.org/package/transformers-0.6.0.4/docs/src/Control.Monad.Trans.State.Lazy.html#StateT

Xubai Wang (Feb 28 2022 at 09:13):

How about naming it interactive or simply ink, rather than replacing the source? There seems to be lots of interactive widgets in Alectryon.

Patrick Massot (Feb 28 2022 at 09:37):

The interactive source would be superior to the GitHub link in every aspect, I don't really see why we would want to keep both. The interactive source could still include a link to GitHub just in case, but there is no need to clutter the docs4 page.

Gabriel Ebner (Feb 28 2022 at 10:06):

The github page allows you to see the list of commits touching the file, the git-blame output, etc., which is not in the interactive source. But I agree, the default link should go to the interactive source.


Last updated: Dec 20 2023 at 11:08 UTC