Zulip Chat Archive

Stream: general

Topic: leandoc


Johan Commelin (Apr 30 2019 at 12:34):

Now that the perfectoid project compiles, we want to start beautifying it. Parts will probably displayed with Patricks lean-formatter. However, we would also like to generate hyperlinked and syntax-highlighted docstringified webpages from the source code. This should be done by a tool called leandoc that doesn't yet seem to exist.
I invite everyone to point to their secret repos where they have started working on this tool.
Alternatively you may also write a braindump of all the bugs and features that you would like to have in future leandoc.
Let's see how fast we can throw something together!

Kevin Buzzard (Apr 30 2019 at 13:01):

There is no book which defines a perfectoid space. All we have is research articles and random pdfs written by graduate students and post-docs who were typically preparing for a talk they had to give on perfectoid spaces. It would be great to turn the Lean definition into a useful resource. I can write the LaTeX around the definition (as could Johan) but some sort of cool way of jumping around between definitions would be brilliant.

Koundinya Vajjha (Apr 30 2019 at 18:21):

Here was something I had in mind: with the new Lean 4 parser, it should be theoretically possible to extend the tactic monad with state to capture the tactics being invoked (the tactics are erased after they are parsed in Lean 3).
This would look like a tactic class begin [foo] ... end, which would parse tactics as they are written and docstrings as well to produce Patrick's webpage-like output.
I don't know to what extent this is possible but.

Patrick Massot (Apr 30 2019 at 18:24):

I hope all this will be much easier with Lean 4. But we can already ask Lean information about each and every position in the file.

Patrick Massot (Apr 30 2019 at 18:25):

It's only a matter of patience to write a version of format_lean that display information about everything

Patrick Massot (Apr 30 2019 at 18:28):

In https://github.com/leanprover-community/format_lean/blob/master/src/format_lean/objects.py#L369-L372 you can see python asking for the tactic state at start and end of each proof line. You can replace that with a loop on each character in the line, recording everything (with shared information when the state is constant) and add request for definitions

Simon Hudon (Apr 30 2019 at 18:29):

For such a leandoc tool, do you need to know about variable construct? If not, you might already be able to render it from olean files

Simon Hudon (Apr 30 2019 at 18:29):

Oh nice! I didn't know you were already this far ahead

Patrick Massot (Apr 30 2019 at 18:30):

Did you look at my lectures notes?

Simon Hudon (Apr 30 2019 at 18:30):

No

Patrick Massot (Apr 30 2019 at 18:30):

https://www.math.u-psud.fr/~pmassot/enseignement/math114/cours4.html

Patrick Massot (Apr 30 2019 at 18:30):

It does display tactic state

Simon Hudon (Apr 30 2019 at 18:32):

That's so cool!

Moses Schönfinkel (Apr 30 2019 at 22:21):

Do you just want to parse comment-enhanced leandoc specific regions and "attach" these to their closest Lean construct? Or do you actually want to parse Lean as well to attach properly scoped variables, constants, namespaces, etc? Or do you want to parse leandoc specific comment regions with combination of querying Lean server for additional information about the construct the region in question pertains to (I have no clue what information is available this way) - this could dodge quite a bit of scope/dependency resolution and parsing. Is it still the case that Lean 3 grammar is parser-specified and there's no EBNF - or was I dreaming this?

Johan Commelin (Apr 30 2019 at 22:24):

We want to emulate the experience you have in VScode.

Kevin Buzzard (Apr 30 2019 at 22:29):

But with added LaTeX

Moses Schönfinkel (Apr 30 2019 at 22:29):

Eschew leandoc-specific annotations altogether?

Simon Hudon (Apr 30 2019 at 22:30):

Kevin, LaTeX might have to wait. If we can get it to work in HTML, we can give LaTeX a go but it's truly no fun to work with so a first prototype will have to omit it

Kevin Buzzard (Apr 30 2019 at 23:02):

aah, the realities of real life.

matt rice (May 01 2019 at 16:58):

fwiw i have a start on this, which currently just dumps text out... I will try and get into some format, and uploaded

Johan Commelin (May 01 2019 at 16:59):

Be aware of Patrick's leancrawler...

Johan Commelin (May 01 2019 at 17:00):

But, yes, I'm excited to hear this, and I'll cheer you on. (And if I can find some time I'll also contribute... but my todo list is growing exponentially the last 3 weeks...)

matt rice (May 01 2019 at 17:02):

I was using Mario's olean-rs, because it makes it simple to do so

Simon Hudon (May 01 2019 at 17:18):

Be aware of my fork. I've added some features

Mario Carneiro (May 01 2019 at 17:34):

maybe make a PR?

Simon Hudon (May 01 2019 at 17:36):

Done

Patrick Massot (May 01 2019 at 17:37):

I'm really puzzled. How do you use olean-rs, appart from listing imports?

Patrick Massot (May 01 2019 at 17:37):

Do you write rust code, or can you use the dump output in any way?

Simon Hudon (May 01 2019 at 17:39):

I write Rust code

Patrick Massot (May 01 2019 at 17:42):

Ok. I can see this tool was designed from the bery beginning to make sure mathematicians stay where they belong, and don't interfere with software development.

matt rice (May 01 2019 at 17:42):

Same, I added a library thing to my fork here so i just replace the main.rs file with documentation specific rust code, (will PR when ready)

Simon Hudon (May 01 2019 at 17:43):

@Patrick Massot That is indeed the first goal of the project

Simon Hudon (May 01 2019 at 17:48):

@matt rice @Mario Carneiro Maybe we should consider splitting the project into one library and multiple executables

matt rice (May 01 2019 at 17:52):

@Simon Hudon yeah, thats done here afaict all i needed to do was add lib.rs which adds the pub mod's for various things along side main.rs.

Simon Hudon (May 01 2019 at 17:53):

Nice! :+1:

Mario Carneiro (May 02 2019 at 02:01):

Ok. I can see this tool was designed from the very beginning to make sure mathematicians stay where they belong, and don't interfere with software development.

I should hope not. It was written in a sensible language with a strong type system, in as clear a way as I could manage, to document the olean file format and provide a path for further analysis. I'm glad that Matt and Simon are extending it - that's exactly what it was intended for. @Patrick Massot Is there something I can do to make this easier? I had to pick some language to write it in, and modern C++ is a mess, Lean 3 is far too slow for this to be practical (but I also wrote it in lean anyway). I am guessing you would prefer Python? That can be done, although the type of the core Modifications type in olean-rs is much of the documentation and that wouldn't transfer so well to Python.

Simon Hudon (May 02 2019 at 02:08):

I wouldn't want to work with Python for this. Rust is not exactly easy to learn but I think the type system helps you do a good job. C++ would make things more difficult than necessary. I think OCaml and Haskell could have been good choices too.

Patrick Massot (May 02 2019 at 08:47):

I can believe the heavy lifting require a heavy language. But if you want regular people to use it then you need binders to a human-oriented language

Mario Carneiro (May 02 2019 at 09:05):

you are using a lot of weasel words. What's a "human oriented language"? Almost all programming languages are human oriented

Mario Carneiro (May 02 2019 at 09:08):

again, you haven't said Python explicitly but I'm assuming that's your preference based on your previous projects. I don't think python is very good for large projects, it's too scripty

Patrick Massot (May 02 2019 at 09:15):

Isn't scripty a weasel word?

Kevin Buzzard (May 02 2019 at 09:15):

I wonder if there are two different things here. Firstly it would be great to be able to look at Lean code on e.g. my phone's web browser, without having to constantly wonder what the definition of X is -- if I could just click on X instead and then see and then go back, that would be a great step forward for me. Secondly, it would be really interesting -- not just for this community here but broadly for mathematicians, even those who know nothing about Lean, to have some sort of coherent interactive way of exploring what a perfectoid space is. This would be completely new for mathematicians, and some are bound to be interested. These two projects seem related, but don't sound the same to me.

Patrick Massot (May 02 2019 at 09:16):

The fact that python isn't good for large projects is simply refuted by facts

Patrick Massot (May 02 2019 at 09:16):

And saying that olean-rs is a large project is...

Johan Commelin (May 02 2019 at 11:41):

Python is an excellent choice for interactively playing with a bunch of data. I can imagine losing a lot of time while playing with some Jupyter notebook inspecting all sorts of data that we extract from mathlib.

Patrick Massot (May 02 2019 at 12:02):

Johan, you're pointlessly damaging your keyboard. They are the aristocracy of programming. You could as well advertise numerical analysis to Langlands program people...

matt rice (May 02 2019 at 12:03):

In this regard one thing I suppose I was excited about yesterday, is a rust library for persistent data structures, https://en.wikipedia.org/wiki/Persistent_data_structure (I haven't looked at Mario's Trie implementation yet to see if it is persistent), but in theory you could go through the git history of something like the perfectoid project and inspect the data, and how it changes over time... That is actually probably painful though with the way i've built it since it would mean compiling each commit... But it would be immediately useful in that something like mathlib you could have the tree for file x, and file y, and all the data would be shared with that of the tree for all of mathlib.

Johan Commelin (May 02 2019 at 12:09):

@Patrick Massot Something like https://press.princeton.edu/titles/9491.html?

Patrick Massot (May 02 2019 at 12:12):

Johan, those people are traitors, like Kevin or William Stein

Patrick Massot (May 02 2019 at 12:13):

anyway...

matt rice (May 02 2019 at 12:13):

Anyhow, I hope that perhaps https://docs.rs/inline-python/0.2.0/inline_python/ may help, the primary reason I did it this way was because I knew it would be a) simple to implement b) simple to install, a) was important since it is highly likely lean 4 puts a clock on it's self life.

Simon Hudon (May 02 2019 at 13:27):

@Patrick Massot Feel free to have those tantrums in private. Nobody is obliged to choose Python to pretend like your only choice of language is a good one in every circumstances. If you don't want to learn a new language don't contribute. We'll live.

Johan Commelin (May 02 2019 at 13:36):

@matt rice do you already have a repo that we can watch/follow/stargaze?

matt rice (May 02 2019 at 13:41):

not yet, will work on that now, even though it has some issues... I was mostly waiting until I got command line arguments stablized so it wasn't changing on people, but i'll probably doing that last.

Patrick Massot (May 02 2019 at 14:05):

Simon, I'm not 100% sure what you mean because I had to look up the word "tantrum". But I'm not angry. We mathematicians are experts at snobbery, we don't have any problem with that. I also have no problem with the idea to learn a new computer language: Lean is certainly very different from everything I knew before (even including my OCAML lectures 20 years ago). In this particular case I don't see any reason to learn Rust which is marketed as a "highly concurrent and memory safe language" since, up to now, I was never faced with any concurrency or memory management issue. But I'd very happy to use anything you or Mario will build using this language. Simply I won't be able to help in any way.

Patrick Massot (May 02 2019 at 14:10):

By the way, I don't mean you need any help from me or anyone else.

matt rice (May 02 2019 at 14:29):

@Johan Commelin https://github.com/ratmice/lumpy-leandoc theres 2 branches the command line & a TeX one w/ issues

Johan Commelin (May 02 2019 at 14:34):

Ooh, don't worry about TeX output

Johan Commelin (May 02 2019 at 14:35):

I think what Kevin really meant, when he was talking about LaTeX, was that he wanted to use TeX in the comments, and have MathJaX render it in the generated html files. But maybe I misunderstood him.

Johan Commelin (May 02 2019 at 14:35):

@matt rice Thanks for the efforts so far!

Johan Commelin (May 02 2019 at 14:36):

I would guess that it is easier to generate HTML than to generate TeX, right?

matt rice (May 02 2019 at 14:39):

Yeah, I just really like/prefer tex output, and generally know nothing about html stuff , the TeX output is almost there except some long function names, and the blasted missing unicode symbols. But yes, HTML is probably easier for someone who knows anything about it.

Johan Commelin (May 02 2019 at 14:40):

I certainly agree that the TeX output will look better from a typographic point of view. But it doesn't allow for too much interaction. That's why I prefer html output in this case. But that shouldn't stop you from generating beautiful TeX

matt rice (May 05 2019 at 19:45):

I think what Kevin really meant, when he was talking about LaTeX, was that he wanted to use TeX in the comments, and have MathJaX render it in the generated html files. But maybe I misunderstood him.

So perhaps beneath mention, but in regards to this, I'm at the point where I need to institute some format for this, e.g. I need to know which parts of the comment (or in whole) need to be TeX escaped, and which should not be, my thoughts were to control this by adding a docstring_format key to leanpkg.toml and optionally some magic string in the comment itself, to allow for competing ideas and/or future flexibility. I haven't tried yet, but it looks like leanpkg itself will just ignore this.

matt rice (May 05 2019 at 19:58):

Anyhow, after that i'll try and run it through https://dlmf.nist.gov/LaTeXML/ which seems to use MathML rather than MathJax the results on https://dlmf.nist.gov/ look pretty good to me

Patrick Massot (May 05 2019 at 20:19):

Noooo!

Patrick Massot (May 05 2019 at 20:20):

Going from Lean to HTML through LaTeX seems completely masochistic, especially if you add perl on top of that. And MathML is dead, after a very long battle to get born.

Patrick Massot (May 05 2019 at 20:21):

Could you explain again why you want to go to LaTeX first?

matt rice (May 05 2019 at 20:54):

@Patrick Massot I was just looking at that and yeah, perl is pretty terrible, I figured that as far as the comment format stuff It should in theory need much the same for extracting MathJax. When working in groups it is often nice to upgrade in sync, and referencing documentation on the web generally reflects the most current version, it's nice as well when I have my tablet and no Wifi...

Patrick Massot (May 05 2019 at 20:56):

What's the problem you see in the approach I use in format_lean?

matt rice (May 05 2019 at 20:58):

Nothing in particular, I had hoped to show that it could be integrated with the lean supported doc-strings, for e.g. REPL/VSCode integration and the like eventually, So the lack of lean's awareness of it I would say is the issue.

Patrick Massot (May 05 2019 at 20:59):

What you describe sounds like a Lean 4 target

Patrick Massot (May 05 2019 at 20:59):

All this should be much easier with the Lean 4 parser

Patrick Massot (May 05 2019 at 21:00):

because we will have much more access to information

matt rice (May 05 2019 at 21:03):

Right, which was why i was trying to do it as simply as possible, I imagine we need a doc generator for lean 3 because I imagine mathlib is going to be stuck on it for a while... I also imagine lean 4 is going to use the doc_strings format, and so it'd be nice to get something formatting these before we go on a big effort towards documenting things so we can format them nicely as we write.

Patrick Massot (May 05 2019 at 21:04):

Note that many docstrings are already formatted using markdown

Patrick Massot (May 05 2019 at 21:05):

Markdown+TeX is a really sweet combination as long as you don't require a fancy layout

Patrick Massot (May 05 2019 at 21:06):

and it has the advantage that you get many Markdown to HTML formater that can be chained with Mathjax to handle math mode

Patrick Massot (May 05 2019 at 21:07):

Also note that, although it's not much used, you can do author-side mathjax and ship pure HTML(+SVG) if you don't like waiting for mathjax to render a webpage

matt rice (May 05 2019 at 21:07):

Right, It's these that i'm having trouble with, in particular the unicode symbols outside of math-mode support is terrible

Patrick Massot (May 05 2019 at 21:08):

support by what? LaTeX?

matt rice (May 05 2019 at 21:08):

yeah.

Patrick Massot (May 05 2019 at 21:09):

Yes, even modern TeX compilers like xelatex or lualatex have trouble with that

Patrick Massot (May 05 2019 at 21:09):

TeX is simply too old :sad:

matt rice (May 05 2019 at 21:28):

Yeah, one of the issues i'm having is I built a big character fallback list, to redirect to math fonts, but then TeX tries to be helpful when it noticed specific characters that look like an equation so it tries to enter math mode, but then can't figure out where to stop it.

matt rice (May 05 2019 at 21:29):

Anyhow, thanks for the pushback on LaTeXML, i don't really need to be jumping down unwanted rabbit holes...

matt rice (Jun 14 2019 at 03:54):

So, I've done an initial release, lumpy-leandoc
Example output can be seen here

I haven't tried it yet on windows yet expect to do that soon.
still things that need doing, but somewhat presentable.


Last updated: Dec 20 2023 at 11:08 UTC