Zulip Chat Archive

Stream: general

Topic: markdown module docstrings

Chris Hughes (Nov 04 2019 at 20:41):

Noob question.

Since we're writing all these docstrings at the top of .lean files in mathlib in markdown, presumably there's an easy way to see the markdown? What does /-! -/ do?

Rob Lewis (Nov 04 2019 at 20:45):

Nothing useful with Lean 3.4.2 -- I think there's something missing from the API that's either fixed or fixable in the community fork. They're docstrings associated with the file instead of a particular declaration, and metaprograms can in principle access them.

Rob Lewis (Nov 04 2019 at 20:46):

IIRC in 3.4.2 you can only get the docstring for the current file.

Rob Lewis (Nov 04 2019 at 20:46):

It's more of a future-proofing thing. At some point we'll want to generate HTML docs for the library, and having the comments already in markdown will be convenient then.

Rob Lewis (Nov 04 2019 at 20:48):

It might actually be a useful VSCode plugin to format and display the module doc for the current file when you press a button.

matt rice (Nov 05 2019 at 08:10):

You can see lean 3.4.2's tests doc_string3.lean for the extent of the non-community edition support e.g. there are some tactics for printing out the doc_strings.

I haven't tested if lumpy-leandoc does the right thing for /-! -/ yet (edit: it should work)... but it'd be very easy to add something which extracts the .md file to it in addition to the limited html/pdf stuff it currently does.

matt rice (Nov 05 2019 at 20:18):

@Chris Hughes I fixed up lumpy so it could build mathlib again, mathlib-docs.zip

Rob Lewis seems correct in that it's not useful with 3.4.2 in lumpy at least, the /-! -/ comments never make it to the output, so /-! -/ only shows up when the lean files were built with the community edition.

The way lumpy treats /-! -/ is to include it verbatim, i.e. It doesn't prepend it with the name of the definition to which the doc_string is associated, because it's generally just associated with the section that corresponds to the file.
Hopefully that helps.

Rob Lewis (Nov 05 2019 at 20:25):

That's actually closer to real doc generation than I realized we were.

Rob Lewis (Nov 05 2019 at 20:26):

It seems to be treating the sections of module docs as sections of the document, which is very confusing.

Rob Lewis (Nov 05 2019 at 20:26):

But the core structure is there.

Rob Lewis (Nov 05 2019 at 20:28):

It seems to be treating the sections of module docs as sections of the document, which is very confusing.

In the PDF, that is. The HTML looks better, except it doesn't print the def names? Which do appear in the PDF?

matt rice (Nov 05 2019 at 20:28):

Yeah, it's because I was following the html -> tex convention that e.g. <h1> corresponds to \chapter <h2> \section etc, and the docs aren't really built with that in mind

Rob Lewis (Nov 05 2019 at 20:29):

Yeah, this was the forseeable problem with creating the doc standards before we could generate docs.

matt rice (Nov 05 2019 at 20:31):

Yup, I haven't done a lot of work on the html generation (or in fact looked at it today), it should in theory be easy to add unless i have forgotten why it is missing, I will try and look at it again tomorrow and fix that if I can.

Rob Lewis (Nov 05 2019 at 20:37):

Cool. What are the prospects for linking? Something like this: "if an identifier appears between backticks, and we've seen a declaration with that name, turn it into a link." I'm guessing that's way more work?

matt rice (Nov 05 2019 at 20:48):

Something like that, It'll need some thought, as it parses all lean files in parallel currently, so i'll probably have to do a pass over declarations build a table of declarations first before generating links (I'm guessing we want to link across files, which means we also need to mirror the import behavior I suppose)... :octopus:

Rob Lewis (Nov 05 2019 at 20:54):

With two passes I think you could ignore the import tree. Generate a table the first pass, and note which html file each declaration will live in. Then fill in the info in the second pass. Namespacing means this will be impossible to get exactly right though. There's no way doc strings will be consistent about referring to declarations with full namespaces, and there will be ambiguous cases.

matt rice (Nov 05 2019 at 20:58):

yeah, shall see if we can get away with that, anyhow i must get going.

Rob Lewis (Nov 05 2019 at 21:02):

Thanks for looking into this! It would be great to generate HTML docs for the website and this isn't so far off.

matt rice (Nov 06 2019 at 09:10):

Well, i updated it (and the zip file) to add the names of declarations, and a horizontal rule separating the header.

I think i will look at making TeX rendering optional, because the output isn't ready for consumption and it makes integrating into CI difficult since it needs lots of font stuff compiled in a specific way... With that it'll hopefully be easier to integrate into the website...

Rob Lewis (Nov 06 2019 at 09:41):

Nice! I just checked out a couple of the better documented files. It looks great. https://leanprover-community.github.io/doc_demo/manifold.html

Rob Lewis (Nov 06 2019 at 09:42):

What about printing the type of the declaration along with the name?

Rob Lewis (Nov 06 2019 at 09:45):

Right now it only prints declarations that have docstrings, right? Our doc guidelines don't require them for theorems, since in principle all you need to know about a theorem is its statement. Complicated/big/important theorems should have docstrings anyway, but we probably want more than just those in the doc output.

Rob Lewis (Nov 06 2019 at 09:46):

Is it overkill to print everything in the html docs? It feels to me like it should describe a "public" API, and there are plenty of things in the library that don't need to show up there. But I'm not sure.

Johan Commelin (Nov 06 2019 at 09:47):

@matt rice Wow, that looks really good

Rob Lewis (Nov 06 2019 at 09:47):

The <div>s containing the declarations should have a class so we can style it better, but that's easy.

Rob Lewis (Nov 06 2019 at 09:48):

Also we need index pages. Also easy, I hope :)

Rob Lewis (Nov 06 2019 at 09:53):

Once this is up and running, it gives another argument for a mathlib release schedule. We could update the docs daily. But it would probably be clearer to have fixed versions.

matt rice (Nov 06 2019 at 09:57):

I will look at printing the type declaration, it's been on my list of things to do, but I have been scared that it's just going to give me the thing that the type is definitionally equal to, since i'm pulling it out of the bytecode :D

I don't know anything about html really, i'll try and figure out the div's class, I assume i need to declare some empty style e.g. decl and then give it the class decl?

I also wish I could style the syntax highlighting rather than hard coded colors, but that is going to take some work.

Rob Lewis (Nov 06 2019 at 10:00):

For the div classes, the easy part is printing <div class="decl"> instead of just <div>. Then someone needs to add a css file that styles them properly, which we probably want to integrate with the rest of the website.

Rob Lewis (Nov 06 2019 at 10:01):

I suspect you'll get a presentable type, but I don't know.

matt rice (Nov 06 2019 at 10:02):

index should be easy it's already in a big table just need to enumerate the keys of.

Johan Commelin (Nov 06 2019 at 10:03):

Can you use pygments to generate highlighted code?

matt rice (Nov 06 2019 at 10:06):

it's currently using syntect with a conversion of the vscode syntax highlighting, I haven't really compared to how vscode gets highlighted yet to see if that conversion is accurate though.

Using pygments would be a big change. Part of the reason i chose syntect is so i can compile the whole thing to web assembly eventually and not shell out to stuff.

Johan Commelin (Nov 06 2019 at 10:09):

Aha... pandoc uses pygments, and so does Zulip. I don't know syntect.

Rob Lewis (Nov 06 2019 at 10:13):

For the syntax highlighting, we could at least move the style definition to a separate css file, so the colors are specified once instead of every time code is displayed.

Rob Lewis (Nov 06 2019 at 10:14):

The Zulip archive lets Zulip do the structuring (with pygments, I think?) and then copies the Zulip CSS to display it.

Rob Lewis (Nov 06 2019 at 10:17):

I don't know how different the syntect structuring is, presumably it's not compatible. But we should be able to manually match the colors for analogous items.

matt rice (Nov 06 2019 at 10:18):

Yeah, the problem is the syntax highlighter hard coding the colors, the current solution to this is to hard code the colors for each thing to a specific unique value and then sniff the colors and turn them into the appropriate CSS. Ideally I would like to just get it to emit CSS directly.

Rob Lewis (Nov 06 2019 at 10:19):

Oh, I see. That's kind of annoying.

matt rice (Nov 06 2019 at 10:20):

Yeah, there should be a way to do it, but this was easy to add as a placeholder for now.

Rob Lewis (Nov 06 2019 at 10:22):

It's maybe not ideal, but it looks fine as-is. I wouldn't worry about it too much.

Rob Lewis (Nov 06 2019 at 10:23):

Nice! I just checked out a couple of the better documented files. It looks great. https://leanprover-community.github.io/doc_demo/manifold.html

Updated to apply the default stylesheet. Not bad, hmm?

matt rice (Nov 06 2019 at 10:24):

The main thing that had me tearing my hair out with pygments was that each backend has different supported rendering options, like line numbers, and things So like html may support line-numbers and latex ignore them and what not. Eventually i would like to be able to attach footnotes to lines-numbers and for that I need some fairly custom syntax highlighting anyways.

Rob Lewis (Nov 06 2019 at 10:27):

Speaking of line numbers, does the Lean byte code let you access the line number of a declaration? Linking declarations to their statement on GitHub would be really nice.

Johan Commelin (Nov 06 2019 at 10:31):

Right now it only prints declarations that have docstrings, right? Our doc guidelines don't require them for theorems, since in principle all you need to know about a theorem is its statement. Complicated/big/important theorems should have docstrings anyway, but we probably want more than just those in the doc output.

I think that maybe we should only print things that do have a docstring. Otherwise it becomes close to replicating the source. These pages could be a good intermediate step that extracts the "juicy" bits.

Rob Lewis (Nov 06 2019 at 10:34):

Then I think we need to raise the percentage of theorems that get documented. The Python docs wouldn't be very helpful if they only told you the juicy parts about manipulating strings.

Rob Lewis (Nov 06 2019 at 10:34):

In principle the module docs at the top should point to the juicy bits.

Johan Commelin (Nov 06 2019 at 10:35):

So what is the purpose of these documents over the github source (besides typesetting the markdown)?

Rob Lewis (Nov 06 2019 at 10:35):

Omitting declaration bodies gets you pretty far from the source.

Rob Lewis (Nov 06 2019 at 10:36):

Imagine if the Python docs showed you the implementation of every function.

matt rice (Nov 06 2019 at 10:36):

Yeah, you can extract line numbers, olean-rs/types.rs PosInfo structure. We should be able to make links revision-stable as well if we can pull the sha out of git.

Rob Lewis (Nov 06 2019 at 10:40):

But again, I'm not sure that every theorem deserves to show up in the HTML. Maybe we could use a @[nodoc] attribute to make it opt-out instead of opt-in. We should have access to attributes from the byte code, right?

matt rice (Nov 06 2019 at 10:49):

I believe so looks like either Modification::UserAttribute or Modification::Attr. let me know what you guys decide, currently I just ignore undocumented things.

matt rice (Nov 06 2019 at 10:50):

Anyhow, going go try and implement some of the things discussed.

Rob Lewis (Nov 06 2019 at 10:51):

Hmm, a tougher one, can you identify when declarations are builtin or user-defined? We have a tactic to do this. But that might not be contained in the byte code.

Rob Lewis (Nov 06 2019 at 10:51):

We don't want to include random recursors and aux definitions in the docs.

matt rice (Nov 06 2019 at 10:52):

I'm not sure, there has been plenty to do without getting too in depth with the bytecode so far :)

Rob Lewis (Nov 06 2019 at 10:53):

Hah, fair enough!

Wojciech Nawrocki (Nov 06 2019 at 11:03):

It seems that most (all?) Lean-generated definitions have a name starting with an underscore at some point in the namespace - _main, _sunfold, _eqn_n. In addition, Lean rejects identifiers starting with the underscore, since these are reserved for the system, so it might suffice to just look for those in the name.

Rob Lewis (Nov 06 2019 at 11:06):

The things that come along with an inductive type don't. rec_on, no_confusion, etc.

Rob Lewis (Nov 06 2019 at 11:07):

Of course, it's easy enough to blacklist those.

Mario Carneiro (Nov 06 2019 at 11:08):

There is a pretty comprehensive autogenerated definition check in the linter

Rob Lewis (Nov 06 2019 at 11:09):

But if we're extracting things with olean-rs we can't run that check.

Rob Lewis (Nov 06 2019 at 11:09):

Maybe the check could be reimplemented.

Wojciech Nawrocki (Nov 07 2019 at 11:48):

An issue with printing types is that olean files seem to contain them in raw form. Unless I'm missing something, pretty-printing would then require processing all notation and pretty much reimplementing the pretty-printer. As an alternative, we could query the Lean server for types of things from lumpy. This might be quite slow unfortunately.

matt rice (Nov 07 2019 at 21:12):

updated the docs again, to include index and the css stuff. mathlib-docs.zip, I find the alphabetical order on directory/file mix bothersome, but it functions.

Rob Lewis (Nov 08 2019 at 15:30):

Maybe there's a better approach to this. Instead of extracting info directly from the oleans, should we do a first pass with a Lean metaprogram? That way we can use the existing autogenerated check, pretty-print types, etc.

Rob Lewis (Nov 08 2019 at 15:31):

We'd just fold over the environment and for each declaration export the name, type, file location, docstring, and any attributes we want to display. The export can be in whatever format is convenient for the formatter to parse.

Rob Lewis (Nov 08 2019 at 15:34):

One thing you can't get from this approach is info about file imports. I guess we'd want to print that at some point.

Wojciech Nawrocki (Nov 08 2019 at 15:34):

Yes, that is the approach I think we should take. Instead of a metaprogram though, I've hacked on Lean a bit to make it produce the list of declarations in each .lean file, as well as the top-level docstrings for it. Re. file imports, I'm not sure about tactic mode, but they should be obtainable from module_info in the c++.

Rob Lewis (Nov 08 2019 at 15:38):

Ah, cool. Does it produce the list at the time the file is compiled, or after the whole build is done? If we want to print attributes, it's important to generate this at the end. If a declaration gets tagged as a simp lemma in a later file than where it's declared, we need to catch that.

Wojciech Nawrocki (Nov 09 2019 at 21:45):

It produces the list using the environment that exists at the last line of the file being compiled. Note that we cannot know a priori all places where a lemma is tagged as simp, just as we cannot know a priori all instances of a type class. For this reason, I would say that such tags should be documented in the module in which they exist rather than at the point where the to-be-tagged-simp lemma is declared. Same goes for typeclass instances. We could aggregate them after processing all of mathlib and then retroactively add forward-links, but that wouldn't extend to a general notion of package and would be more of a hack.

Wojciech Nawrocki (Nov 09 2019 at 21:47):

By the way, while for typeclasses this makes perfect sense, the design choice of allowing lemmas to be tagged as simp after they are declared seems to only make sense if a lemma can be locally-simp, i.e. a useful simplification in the context of some theory, but maybe not generally. I'm not sure if this happens in mathlib but it seems plausible.

Wojciech Nawrocki (Nov 09 2019 at 22:18):

Oh, and because the signature of module_doc_strings : tactic (list (option name × string)) is weird, I am considering deleting it and replacing with module_doc_string : string -> string, which returns the module doc string for the module identified by hopefully same scheme as in import, e.g. module_doc_string ".mymod" should return the doc for ./import.lean. The declaration docs are still retrievable by folding over the environment, so no expressivity is lost.

Rob Lewis (Nov 10 2019 at 13:25):

I think, in general, we add attributes as soon as they're available. But this isn't always in the file where the declaration is defined: see e.g. the bottom of norm_cast. It intuitively seems useful to document that int.coe_nat_succ is a move_cast lemma. That is indeed some kind of forward reference, but I'm not sure that's a problem for this kind of documentation. It's documenting the library as a whole, forward references aren't necessarily wrong. But it does seem wrong when an attribute is added deep in the file hierarchy. Declaring nat.prime as a class in the p-adics files is an example of this, but there are also other reasons this shouldn't happen and WIP to change it.

Rob Lewis (Nov 10 2019 at 13:27):

Besides tactic attributes like move_cast, ext, etc, there aren't many cases where global attributes are added long after the declarations are defined. There are some reducibility level changes I think.

Rob Lewis (Nov 11 2019 at 16:02):

Oh, and because the signature of module_doc_strings : tactic (list (option name × string)) is weird, I am considering deleting it and replacing with module_doc_string : string -> string, which returns the module doc string for the module identified by hopefully same scheme as in import, e.g. module_doc_string ".mymod" should return the doc for ./import.lean. The declaration docs are still retrievable by folding over the environment, so no expressivity is lost.

Coming back to this: what would be best (for me) is a function that returns a list of pairs (file, docs), where file is a string that identifies a Lean file and docs is the list of module doc strings for that file. (By module doc strings, I mean the file-level ones, not the ones associated with a declaration.)

Rob Lewis (Nov 11 2019 at 16:03):

As far as I can tell, there's still no way within Lean to get a module doc string from outside the current file. You're storing it in the oleans now, but it's not accessible by a metaprogram yet, right?

Rob Lewis (Nov 11 2019 at 18:35):

I spent a little too long doing this this afternoon, and it's clearly still rough. Here are mathlib docs generated by a combination of metaprogramming and Python: http://robertylewis.com/mathlib_docs/

Rob Lewis (Nov 11 2019 at 18:37):

Essentially, it folds through the environment and writes a json file with the info needed to display. A Python script formats it and tries to link identifiers that it finds in code blocks. Right now, the Python script scrapes module docs from the Lean source files. This will be cleaner if/when they're accessible with a metaprogram.

Rob Lewis (Nov 11 2019 at 18:39):

The formatting is modular. If it's better to produce the data from C++ or olean-rs or whatever, that works too.

Sebastien Gouezel (Nov 11 2019 at 18:42):

This is completely awesome!

Patrick Massot (Nov 11 2019 at 20:39):

Rob, did you put the code somewhere? How can we help? We definitely need some CSS work.

Rob Lewis (Nov 11 2019 at 20:43):

I left the office without pushing it anywhere, but I'll make it available tomorrow. If you want to work on the CSS, you could work on a single file as an example.

Rob Lewis (Nov 11 2019 at 20:44):

The code is almost certainly not Windows friendly.

Patrick Massot (Nov 11 2019 at 20:44):

Who cares?

Rob Lewis (Nov 11 2019 at 20:44):

And scraping the module docs from source is really ugly. It will be more portable if we can avoid that.

Rob Lewis (Nov 11 2019 at 20:44):

Well, me when I'm at home working on my laptop...

Rob Lewis (Nov 11 2019 at 20:47):

You could also try to think of a way to linkify notation. There's not a general solution, since notation can reference more than one declaration, so there's no unique place to link it to. But it looks wrong to have ℕ not be a link.

Patrick Massot (Nov 11 2019 at 20:48):

The first easy thing to do is to filter out _inst_. : .

Rob Lewis (Nov 11 2019 at 20:48):

That's hopeless if we're using a metaprogram to generate the data, but @Wojciech Nawrocki or @matt rice 's approaches might be able to do more.

Johan Commelin (Nov 11 2019 at 20:48):

We really need Lean to cooperate here

Johan Commelin (Nov 11 2019 at 20:49):

Could all of this be easier as an extension of Lean 3.5.c?

Rob Lewis (Nov 11 2019 at 20:50):

With small changes to 3.5 I think we could avoid scraping the module docs from the source, at least.

Johan Commelin (Nov 11 2019 at 20:50):

Even if we don't switch mathlib over to lean 3.5.c, we could still use 3.5.c to generate the docs

Patrick Massot (Nov 11 2019 at 20:51):

Lean 3.5 can already build mathlib, right? So adding stuff that would make documentation easier seems like a very good idea.

Rob Lewis (Nov 11 2019 at 20:52):

Yeah, of course. Wojciech has already been working on the module doc API there.

Wojciech Nawrocki (Nov 11 2019 at 20:54):

As far as I can tell, there's still no way within Lean to get a module doc string from outside the current file. You're storing it in the oleans now, but it's not accessible by a metaprogram yet, right?

This is correct. I have a local branch in which they're accessible from C++ - I could add a tactic that exposes this storage.

Patrick Massot (Nov 11 2019 at 21:00):

The generated html is not well-formed and lacks classes, so working on CSS is not yet possible (but there is no hurry of course).

matt rice (Nov 11 2019 at 21:00):

@Rob Lewis Just wanted to note that there is a branch on lumpy json-input for trying out Wojciech's approach in lumpy, I haven't got it using any of the additional information accessible yet, but i'll probably migrate to that if it works out.

Rob Lewis (Nov 11 2019 at 21:01):

Another formatting thing that might be nice: instead of printing the name above and the full type below, we could bind variables with the name. So instead of

iff_of_eq
∀ {a b : Prop}, a = b → (a ↔ b)

we'd have

iff_of_eq {a b : Prop} (h : a = b)
(a ↔ b)

This would take some experimenting to get the formatting right. The identifier needs to stand out.

Wojciech Nawrocki (Nov 11 2019 at 21:02):

In fact, I also have something that sounds like your JSON generation. hensel.json is an example. It kind of works but performance is really bad as of now, so I'd like to fix that before PRing to lean 3.5. There is also all the internally-generated noise (_private.1428265847.newton_seq_dist_tendsto') which I didn't try to do anything about just yet.
@Rob Lewis does this JSON look similar to what you were exporting?
@matt rice also has a branch of lumpy-leandoc which can parse the JSON

Rob Lewis (Nov 11 2019 at 21:04):

Yeah, same idea, slightly different structure, but it would be easy to convert.

Rob Lewis (Nov 11 2019 at 21:05):

For the internal stuff, there are some meta defs that filter it that we use in the linter.

Rob Lewis (Nov 11 2019 at 21:06):

They could probably be replicated in C++.

Kevin Buzzard (Nov 11 2019 at 21:40):

Because I don't know anything about this black magic going on in this thread I've only half been paying attention. But several people have been in touch with me (because of natnumgame) asking me what maths Lean now has, what maths it needs etc. I am currently pouring all my Lean time into constructing a challenging inequality world for nng but once that's done I was thinking of a blog post of the form "hey, here's what we have now [list of stuff] and here's what we would like [list of other stuff]. I thought I'd mention this here because I was wondering if I could somehow integrate this list into the story somehow, e.g. "for more about groups, see this link here".

Puny initial attempt of "what we need" is here: https://github.com/kbuzzard/xena/blob/795563bfab087fc8cd8e23107fae43dc78b7b46e/100mathchallenges.txt (completely unfinished, the result of a number theoretic brain-dump).

Bryan Gin-ge Chen (Nov 11 2019 at 21:44):

Maybe the "theories" docs could be updated / reworked as well?

Patrick Massot (Nov 11 2019 at 21:53):

Of course it should be updated. And maybe the we should bring it closer to the Lean file, and put it on the same website, with links to the "per Lean file webpages".

Rob Lewis (Nov 11 2019 at 21:54):

I'm not sure we should rush to make the html docs public. This kind of thing works better with fixed releases of mathlib. Having it constantly update makes it harder to link to and harder for Google to index.

Rob Lewis (Nov 11 2019 at 21:54):

That said, it's not a huge deal if we put up a draft version soon-ish.

Rob Lewis (Nov 11 2019 at 21:55):

A lot of the md files in mathlib would be better converted to html and put on the community webpage.

Rob Lewis (Nov 11 2019 at 21:55):

Kevin's list would fit there too, if he wants it to be less Kevin and more mathlib.

matt rice (Nov 11 2019 at 21:56):

I could easily allow lumpy to read .md as well as lean and just do a straight html conversion

Rob Lewis (Nov 11 2019 at 21:58):

I could easily allow lumpy to read .md as well as lean and just do a straight html conversion

For static pages (that aren't generated from the library), it's not necessary. We already use Jekyll to generate the site. Stick the right header on the md file and it will fit right in, styled and everything.

Rob Lewis (Nov 11 2019 at 21:59):

For things like the tactic docs that change a lot, maybe incorporating them into the doc generation would be good.

Wojciech Nawrocki (Nov 12 2019 at 00:19):

81 should do what you need.

Rob Lewis (Nov 12 2019 at 10:58):

Rob, did you put the code somewhere? How can we help? We definitely need some CSS work.

https://github.com/robertylewis/doc_gen/

Rob Lewis (Nov 12 2019 at 10:59):

It generates valid html now, and colors declarations based on whether they're theorems, defs, etc.

Rob Lewis (Nov 12 2019 at 13:20):

One more update for now. I should stop wasting time on this.

Rob Lewis (Nov 12 2019 at 13:20):

http://robertylewis.com/mathlib_docs/linear_algebra/basis.html

Bryan Gin-ge Chen (Nov 12 2019 at 14:33):

It looks like we'll have to be careful now about making sure that module docs and docstrings render well. The second wikipedia URL at the top of ring_theory/maps has underscores which are interpreted as markdown.

Rob Lewis (Nov 12 2019 at 14:48):

Yeah, there are tons of problems. It's hard to get it correct without seeing the end results. A lot more needs to be enclosed in backticks.

Rob Lewis (Nov 12 2019 at 21:13):

It looks like we'll have to be careful now about making sure that module docs and docstrings render well. The second wikipedia URL at the top of ring_theory/maps has underscores which are interpreted as markdown.

I managed to fix this and some mangled lists by passing the right options to the markdown formatter.

Rob Lewis (Nov 12 2019 at 21:14):

Unfortunately there doesn't seem to be an option to turn the urls into links.

Patrick Massot (Nov 12 2019 at 21:15):

Would it be unreasonnable to write markdown urls in Lean file?

Rob Lewis (Nov 12 2019 at 21:15):

That seems totally reasonable.

Patrick Massot (Nov 12 2019 at 21:15):

I'm pretty sure there are currently very few of them, and they are very easy to find.

Patrick Massot (Nov 12 2019 at 21:16):

So we could easily switch.

Rob Lewis (Nov 12 2019 at 21:21):

Yeah. I think we're going to find tons of things that need to be fixed. The mistakes become obvious once you actually parse the markdown.

Rob Lewis (Nov 13 2019 at 12:50):

So, I'm reaching a point where I'm kind of happy with the layout/CSS. Comments are welcome. https://robertylewis.com/mathlib_docs//geometry/manifold/smooth_manifold_with_corners.html

Rob Lewis (Nov 13 2019 at 12:51):

Declaration names link to the corresponding line on GitHub.

Patrick Massot (Nov 13 2019 at 12:51):

Are you color-blind? I'm asking out of curiosity.

Rob Lewis (Nov 13 2019 at 12:51):

What? No, it's a random color palette some website gave me.

Patrick Massot (Nov 13 2019 at 12:52):

We need a web designer.

Rob Lewis (Nov 13 2019 at 12:52):

The pink does jump out a little too much.

Patrick Massot (Nov 13 2019 at 12:52):

Yes, the pink in particular.

Patrick Massot (Nov 13 2019 at 12:52):

And it does feel old too.

Patrick Massot (Nov 13 2019 at 12:53):

And I think keeping the keywords def, lemma etc. would help.

Patrick Massot (Nov 13 2019 at 12:54):

I should have first written that the current result is already amazing!

Rob Lewis (Nov 13 2019 at 12:54):

You're welcome to suggest alternate colors!

Rob Lewis (Nov 13 2019 at 12:54):

And I think keeping the keywords def, lemma etc. would help.

Oh, yeah, I meant to do this. I don't think there's any way to distinguish lemma/theorem, so we'd need to pick one.

Rob Lewis (Nov 13 2019 at 12:55):

It gets a little complicated with structures and inductives, because those are "constants" in the environment. But it should be possible to identify that.

Patrick Massot (Nov 13 2019 at 12:55):

It would be nice to use LaTeX in module docstrings when not quoting Lean code.

Rob Lewis (Nov 13 2019 at 12:56):

It's easy enough to support MathJax.

Patrick Massot (Nov 13 2019 at 12:56):

Is there a link to the GitHub page of the file somewhere? (I know there are links to individual declarations, although they come as a surprise).

matt rice (Nov 13 2019 at 12:56):

It's kind of a minefield though

Patrick Massot (Nov 13 2019 at 12:57):

We should also think about having a responsive layout, for small screen browsing.

Rob Lewis (Nov 13 2019 at 12:57):

Is there a link to the GitHub page of the file somewhere? (I know there are links to individual declarations, although they come as a surprise).

No, where would you suggest adding it?

Patrick Massot (Nov 13 2019 at 12:57):

Somewhere at the top probably.

Patrick Massot (Nov 13 2019 at 12:57):

Maybe even h1 could be a link.

matt rice (Nov 13 2019 at 12:58):

I've been leaning towards something like dvisvgm in lumpy which can convert pretty much any old tex to svg

Sebastian Ullrich (Nov 13 2019 at 12:58):

What I did for my masters thesis was to basically rip off Rust's CSS, which gave me a nice basic layout without any effort on my side...

Rob Lewis (Nov 13 2019 at 12:58):

It's kind of a minefield though

I don't know much about it. We can add hooks in the markdown processing that capture $$ ... $$ blocks. That's as far as I got.

Patrick Massot (Nov 13 2019 at 12:58):

But I don't want to distract you from polya, I shouldn't be writing those comments, and try to find time to try something on my side.

Rob Lewis (Nov 13 2019 at 12:59):

Maybe even h1 could be a link.

This is slightly tricky actually. The entire top block is the module doc from the Lean file. It's much easier to add things around it than inside it.

Patrick Massot (Nov 13 2019 at 12:59):

Sebastian is right, I'm sure we can steal ideas without hiring a web designer.

Rob Lewis (Nov 13 2019 at 13:01):

Yeah, I'm not very interested in making major design changes myself. If someone else wants to reformat I'm very happy to let them.

matt rice (Nov 13 2019 at 13:04):

It's kind of a minefield though

I don't know much about it. We can add hooks in the markdown processing that capture $$ ... $$ blocks. That's as far as I got.

Yeah I think the hard part is that each markdown tool supports variations on a similar syntax, but somehow they all manage to be different
and there is no standard to be found anywhere. here is a breakdown someone did mathdown

Rob Lewis (Nov 13 2019 at 13:11):

Maybe even h1 could be a link.

Oh, most files don't even have an h1 because there's no module doc. I'll add a "View file source" link in the navigation panel instead.

Rob Lewis (Nov 13 2019 at 13:13):

Yeah I think the hard part is that each markdown tool supports variations on a similar syntax, but somehow they all manage to be different

Right now I'm using the markdown2 library for Python which I don't think supports anything. But I can extract the raw text and feed it somewhere else for processing.

Rob Lewis (Nov 13 2019 at 13:13):

Again, in principle, I haven't actually looked into it yet.

matt rice (Nov 13 2019 at 13:13):

@Rob Lewis about your suggestion in 81, If we added h2 to those groupings and had links for those too,

Rob Lewis (Nov 13 2019 at 13:14):

And since there's almost no LaTeX in md blocks right now, I don't think it's a priority!

Patrick Massot (Nov 13 2019 at 13:14):

With mathjax you only need to make sure LaTeX parts are untouched.

Rob Lewis (Nov 13 2019 at 13:17):

Rob Lewis about your suggestion in 81, If we added h2 to those groupings and had links for those too,

Yeah, we could build a TOC and put it after the header block if there are any subsections to link to.

matt rice (Nov 13 2019 at 13:18):

nod my markdown parser doesn't handle them either (well, it refuses until there is a standard).

i've been using e.g.

```latex
```
and
```math
```

matt rice (Nov 13 2019 at 13:18):

a bit verbose...

matt rice (Nov 13 2019 at 16:21):

Anyhow, I know Rob's is a lot nicer, but set up a page with the current index generation here: doc_preview

Wojciech Nawrocki (Nov 13 2019 at 17:27):

The problem with mathjax/katex is that they both have absolutely zero support for diagrams, so if category theorists wanted to draw something with TikZ, it wouldn't work. Dvisvgm does support that, but requires offline generation of svgs.

Rob Lewis (Nov 13 2019 at 17:39):

Alright. Another update, with better support for structures/inductives, printing def/lemma/etc, and a few more tweaks. https://robertylewis.com/mathlib_docs/geometry/manifold/manifold.html

Rob Lewis (Nov 13 2019 at 17:39):

The code is now more or less portable. https://github.com/robertylewis/doc_gen

Rob Lewis (Nov 13 2019 at 17:41):

update-mathlib isn't working when I try to do this locally. It says it succeeds, but then Lean wants to compile all of mathlib, even if I touch all the oleans first.

Rob Lewis (Nov 13 2019 at 17:42):

So if you try to install it you might need to do a full compile or get the oleans from elsewhere. Or maybe it's an issue on my end.

Alex J. Best (Nov 13 2019 at 18:16):

The problem with mathjax/katex is that they both have absolutely zero support for diagrams, so if category theorists wanted to draw something with TikZ, it wouldn't work. Dvisvgm does support that, but requires offline generation of svgs.

Absolutely zero support for diagrams is a little strong, https://github.com/sonoisa/XyJax does exist for mathjax and I've used it for many years as has the stacks project (https://stacks.math.columbia.edu/tag/0BW7), it is basically unsupported officially though and might even be broken with mathjax 3.0.

Patrick Massot (Nov 13 2019 at 20:24):

I don't think we should encourage people to use xypic in 2019. Accepting tikz-cd with server-side rendering seems much wiser.

Bryan Gin-ge Chen (Nov 13 2019 at 20:30):

I just found a project to render Tikz in the browser with WebAssembly: https://github.com/kisonecat/tikzjax

Alex J. Best (Nov 13 2019 at 20:36):

I'm not encouraging, simply stating the facts that it is possible with client-side js

Alex J. Best (Nov 13 2019 at 20:36):

This tikz project seems very cool though!

Wojciech Nawrocki (Nov 13 2019 at 21:31):

Ah sorry, I meant "out-of-the-box support". Tikzjax certainly looks interesting, but I would question its stability - reloading a coredump of the entire TeX compiler in the browser sounds a bit crazy.

Patrick Massot (Nov 13 2019 at 21:33):

Server (ie author)-side rendering looks much simpler and faster. You don't want you browser to reflow a commutative diagram anyway.

Bryan Gin-ge Chen (Nov 13 2019 at 22:03):

How many commutative diagrams do we have in the docstrings right now?

Rob Lewis (Nov 20 2019 at 12:24):

Okay, I found a bit more time to work on this. Here's a refresh. Maybe the colors are more to Patrick's taste. https://robertylewis.com/mathlib_docs/

Rob Lewis (Nov 20 2019 at 12:26):

This builds with @Wojciech Nawrocki 's additions to 3.5c, so it can display module docs used to section a file. I don't think we have any of these in mathlib yet, but here's a demo: https://robertylewis.com/mathlib_docs/topology/whoa.html

Mario Carneiro (Nov 20 2019 at 12:29):

The font for theorem statements is a bit large (being used also as the header). Most statements don't fit on one line, and a decent fraction of them seem to have the line break exactly where things get interesting

Mario Carneiro (Nov 20 2019 at 12:30):

One possibility is to remove all implicit and instance implicit arguments from the main display, and put them in smaller type afterward

Johan Commelin (Nov 20 2019 at 12:31):

Nice work! Starts to look really good. I agree with Mario's points though.

Mario Carneiro (Nov 20 2019 at 12:32):

Also, you might try to detect unnamed hypotheses by type so that you can turn more arguments into arrows (especially arguments named a_1, a_2 that were arrows to begin with)

Rob Lewis (Nov 20 2019 at 12:33):

Sure, it's easy enough to change the font sizes. The line breaks will obviously depend on how wide your window is. This can be fiddled with.

Rob Lewis (Nov 20 2019 at 12:34):

One possibility is to remove all implicit and instance implicit arguments from the main display, and put them in smaller type afterward

Not sure exactly what you have in mind with "put them in smaller type afterward." Do you mean state the type twice, once without implicits and once with?

Rob Lewis (Nov 20 2019 at 12:35):

Also, you might try to detect unnamed hypotheses by type so that you can turn more arguments into arrows (especially arguments named a_1, a_2 that were arrows to begin with)

What's the robust way to do this? In effect, "intros until the last named argument."

Mario Carneiro (Nov 20 2019 at 12:37):

not sure if it's better to quote just the omitted implicits or the full type. I was thinking something like:

Line 1 (header): theorem add_monoid.zero_smul (a : β) :
Line 2 (main text): add_monoid.smul 0 a = 0
Line 3 (footnote size): @[simp]; {β : Type u} [add_monoid β]

Mario Carneiro (Nov 20 2019 at 12:38):

If you have expr info, you can check whether a variable appears in the body to determine if it can be "arrowed"

Mario Carneiro (Nov 20 2019 at 12:38):

there is an expr function for this

Mario Carneiro (Nov 20 2019 at 12:39):

expr.is_arrow

Mario Carneiro (Nov 20 2019 at 12:39):

which uses expr.has_var

Mario Carneiro (Nov 20 2019 at 12:42):

I don't feel strongly about how much lean syntax to use in these displays. I and many others will be used to it, but I would also be happy with something that reimagines the syntax a bit or puts more things in a prose style

Rob Lewis (Nov 20 2019 at 12:43):

not sure if it's better to quote just the omitted implicits or the full type. I was thinking something like:

Line 1 (header): theorem add_monoid.zero_smul (a : β) :

Line 2 (main text): add_monoid.smul 0 a = 0

Line 3 (footnote size): @[simp]; {β : Type u} [add_monoid β]

Hmm. This hides the information about where the implicits appear in the signature. Usually you don't need to know this but sometimes you do, and it limits how useful the docs are if they don't give it. What about a javascript toggle to flip between "all variables" and "explicit only"? "explicit only" could be the default.

Rob Lewis (Nov 20 2019 at 12:44):

I hate to suggest more javascript work for me to do, but it would look nice.

Rob Lewis (Nov 20 2019 at 12:45):

expr.is_arrow

Ah, right. I think I can do this easily.

Mario Carneiro (Nov 20 2019 at 12:47):

I know that this makes exact implicit order not visible, but I think that's a tolerable tradeoff. We already have methods like #print or hovers for getting this kind of info from a definition, and we're already not sharing some information that might be relevant, like the definition or proof itself

Mario Carneiro (Nov 20 2019 at 12:48):

If you want to have a JS toggle for this, I wouldn't mind it though

Mario Carneiro (Nov 20 2019 at 12:50):

For a much harder problem, I would really like to see some proof displays. I have gotten a lot of value out of the metamath web pages that give super detailed proofs of anything you could hope for, and it would be great if lean had anything half as good

Rob Lewis (Nov 20 2019 at 13:00):

I think the best kind of proof display we'll have in the near future is the current "click the declaration name to go to the source on GitHub." It doesn't seem helpful to display full proof terms. And adding any sort of interactivity to a displayed proof script seems... challenging.

Mario Carneiro (Nov 20 2019 at 13:02):

Like I said, it's not an easy problem. I can dream though

Mario Carneiro (Nov 20 2019 at 13:03):

I actually do want to see full proof terms, just not in the horrible way pp.all does it

Mario Carneiro (Nov 20 2019 at 13:04):

interactivity opens up a huge number of possibilities here, and the space is certainly not well explored

Rob Lewis (Nov 20 2019 at 13:11):

Fair enough. For a really good display I'm not sure even the proof term is enough. My dream interactive browser would note when a proof term was generated by a certain tactic. I agree it's not well explored, but it would take a lot of exploring to get to something really useful.

Mario Carneiro (Nov 20 2019 at 13:12):

You certainly need the types of subterms. There is probably a way to turn such information into something like an evolving tactic state

Mario Carneiro (Nov 20 2019 at 13:14):

you are right that recovering provenance data is hard to impossible and also pretty valuable (although not the whole story). I always felt cheated when I went through an isabelle tutorial and everything was by auto. I wasn't any closer to understanding why fact X is true, and the computer didn't know how to communicate why

Kevin Buzzard (Nov 20 2019 at 13:16):

Maybe the colors are more to Patrick's taste.

If they're not then you can tell him to fix it up himself ;-) They look fine to me! (but then again I guess you could question my tastes in colours...)

Patrick Massot (Nov 20 2019 at 13:32):

What is doable is to diplay the proof taken from the Lean file, with tactic state as it appears in VScode. This is what is done by format_lean.

Mario Carneiro (Nov 20 2019 at 13:34):

that doesn't solve the by auto problem though (or by tidy in our case)

Patrick Massot (Nov 20 2019 at 13:34):

Sure, but it would already be a huge improvement.

Mario Carneiro (Nov 20 2019 at 13:35):

true

Rob Lewis (Nov 20 2019 at 13:37):

you are right that recovering provenance data is hard to impossible and also pretty valuable (although not the whole story). I always felt cheated when I went through an isabelle tutorial and everything was by auto. I wasn't any closer to understanding why fact X is true, and the computer didn't know how to communicate why

The problem is that sometimes the answer to "why" is "by auto," and sometimes the answer is what auto did, so you you need the option to see both.

Reid Barton (Nov 20 2019 at 13:45):

This reminds me of Lamport's structured proofs https://lamport.azurewebsites.net/pubs/proof.pdf

Rob Lewis (Nov 20 2019 at 15:03):

Okay. It's updated to print implicit arguments below, and to prefer arrows in the type to named arrows when that makes sense. I still have all the data so it won't be hard to create a JS toggle to show the full type. The hardest thing there will be where to put the button.

Rob Lewis (Nov 22 2019 at 13:27):

@Gabriel Ebner pointed out a big benefit of making these docs public, even though they're in an early state: they make it much easier to notice doc strings that are missing, wrong, malformed, etc. So I've put them up at an "official" location. https://leanprover-community.github.io/mathlib_docs/

Rob Lewis (Nov 22 2019 at 13:27):

I don't know how well the search will work. We need to wait for Google to index the docs before it will work at all.

Rob Lewis (Nov 22 2019 at 13:32):

At some point we'll want to regenerate them daily. I think it's probably best to do this as part of the Travis build that updates lean-3.4.2, but maybe it's better to use @Scott Morrison 's server?

Sebastian Ullrich (Nov 22 2019 at 13:32):

I love that the first doc string I looked at was this one: https://github.com/leanprover-community/mathlib/blob/3f9c4d886493afe2dbaada17d25e486b1872e047/src/algebra/associated.lean#L13

Rob Lewis (Nov 22 2019 at 13:34):

There's, uh, a big range of doc string quality in the library right now...

Johan Commelin (Nov 22 2019 at 13:34):

Unfortunately we can't (yet) have a linter for that. The problem is NLP-complete...

Rob Lewis (Nov 22 2019 at 13:37):

On the other side of things from is unit are @Sebastien Gouezel 's files. https://leanprover-community.github.io/mathlib_docs/geometry/manifold/manifold.html

Johan Commelin (Nov 22 2019 at 13:48):

@Rob Lewis Here's a little comment: consider https://leanprover-community.github.io/mathlib_docs/geometry/manifold/smooth_manifold_with_corners.html
I think the stuff under main definitions is typeset as a list in the source code. Is there some markdownish reason why that didn't happen here?

Johan Commelin (Nov 22 2019 at 13:49):

I guess https://github.com/leanprover-community/mathlib/blob/3f9c4d886493afe2dbaada17d25e486b1872e047/src/geometry/manifold/smooth_manifold_with_corners.lean#L24-L36 is not interpreted as a list by markdown

Rob Lewis (Nov 22 2019 at 13:49):

Because it's not actually typeset as a list.

Rob Lewis (Nov 22 2019 at 13:49):

Each item should start with * or - or something.

Sebastien Gouezel (Nov 22 2019 at 13:50):

Yes, my mistake, I did not write it using proper markdown (as I was not aware at the time that real markdown would be relevant).

Rob Lewis (Nov 22 2019 at 13:50):

This is why I wanted to publicize the HTML docs. These things are impossible to see in the Lean files, but obvious when you see them in action.

Johan Commelin (Nov 22 2019 at 13:51):

What is the reason that times_cont_diff_groupoid is hyperlinked in that blob of text, but other names are not?

Rob Lewis (Nov 22 2019 at 13:51):

Because that's the full name. The other things have namespaces in front of them.

Johan Commelin (Nov 22 2019 at 13:52):

Ach so

Rob Lewis (Nov 22 2019 at 13:52):

I don't think there's any way around this, the name resolution would be ambiguous.

Rob Lewis (Nov 22 2019 at 13:53):

Actually -- in this case that might not be the issue, smooth_manifold_with_corners isn't namespaced.

Rob Lewis (Nov 22 2019 at 13:54):

Not sure why that's failing to link. I'll look into it soon.

Bryan Gin-ge Chen (Nov 23 2019 at 05:59):

With the power of Observable, I've written a tool that renders all the markdown docstrings in a Lean file on GitHub. You can also paste in the URL to a PR (or even just the number of a PR to mathlib) and all the docstrings in the changed Lean files will be displayed. I hope this will be useful for mathlib PR reviewers or contributors who want to check the markdown in their docstrings.

Note that I'm not able to show the declaration (with the name / type) corresponding to the docstring at the moment. Also, the tool relies on marked so some things may appear slightly different from mathlib_docs, which uses markdown2. Let me know if there's anything I can change to make it easier to use!

Wojciech Nawrocki (Dec 04 2019 at 21:47):

Another example: This presentation of Isabelle/ZF theories is quite nice.

Rob Lewis (Dec 17 2019 at 12:13):

The docs are updated, thanks largely to @Gabriel Ebner ! https://leanprover-community.github.io/mathlib_docs/geometry/manifold/manifold.html

New features include prettier styling, collapsible display of implicit arguments, linked notation, and lists of class instances (see the collapsible list at https://leanprover-community.github.io/mathlib_docs/algebra/module.html#module).

Oliver Nash (Dec 17 2019 at 12:56):

IMHO, this is absolutely awesome. Clicking around a bit I wonder if it would be worth styling the links to GH source differently from the links to elsewhere in the docs.

Oliver Nash (Dec 17 2019 at 12:57):

Anyway, great work.

Rob Lewis (Dec 20 2019 at 16:11):

An early Christmas present for you all: I've added some of the mathlib doc files (#1815) to the doc display. https://leanprover-community.github.io/mathlib_docs/tactic_writing.html

Rob Lewis (Dec 20 2019 at 16:11):

This is work in progress, the menus on the left aren't quite right yet.

Rob Lewis (Dec 20 2019 at 16:11):

And, of course, the doc files themselves need updating.

Rob Lewis (Dec 20 2019 at 16:12):

But there won't be much progress here for the next few weeks, so it's up for now!

Kevin Buzzard (Dec 20 2019 at 16:44):

This is so cool. Our library is growing up :D

Patrick Massot (Dec 27 2019 at 22:30):

I have a small contribution to mathlib docs website effort. I tried to fix CSS so I can use it on my phone. I modified one sample page and put it at https://www.math.u-psud.fr/~pmassot/test/. Please test on various screen sizes (but beware that clicking any link will bring you to the current website). @Rob Lewis you can diff the html and CSS against the current website, changing the documentation generator should be extremely easy. This is minimal modification that, on screens narrower than 1000px, removes the left side bar and collapse the right one with a menu button on a sticky header.

Rob Lewis (Dec 27 2019 at 23:07):

Thanks! I don't have time now, but I'll play around with it. The fixed "Lean mathlib docs" header looks a little funny on a wide screen, especially how it pushes the scroll bars and search windows down. It'll take some fiddling.

Patrick Massot (Dec 28 2019 at 00:47):

This is very easy to fix. I just pushed a version which hides that header on wide screens. I guess we could instead give it a different background color, or put more stuff in it.

Rob Lewis (Jan 29 2020 at 20:21):

#1927

Rob Lewis (Jan 29 2020 at 20:22):

Patrick, I haven't forgotten about your contribution. But the mobile friendly version still looks off on my non-mobile screens and I haven't found the energy to dive back into the CSS yet.

Patrick Massot (Jan 29 2020 at 22:30):

I hope I'll be able to resume work on this next week (not before Wednesday though). Your recent work should make it easier for me to try things, but it would help to tell me more precisely what "looks off".

Last updated: May 02 2025 at 03:31 UTC