Zulip Chat Archive

Stream: general

Topic: RFC: Explaining folder content


Yaël Dillies (Oct 18 2023 at 06:24):

I find myself wanting to explain what the goal of a folder is, and how to decide whether a new file belongs to some folder X rather than some closely related folder Y, or whether a new lemma belongs to some file A instead of some closely related file B in the same folder.

Johan Commelin (Oct 18 2023 at 06:26):

I think we could have README.mds for each top folder. Of course they do come with a maintenance obligation.

Yaël Dillies (Oct 18 2023 at 06:26):

Johan, don't spoil my suggestion: :grinning:

Yaël Dillies (Oct 18 2023 at 06:27):

This would be addressable by adding a README.md file to each folder (/each folder we care enough about). And that file could be used as a landing page for its folder, as @Alex Altair suggested here.

Yaël Dillies (Oct 18 2023 at 06:31):

I specifically want to explain non-top folders as well. The folders I'm thinking of are typically LinearAlgebra.AffineSpace vs Analysis.Convex. In that case, I'm actually planning on creating a third folder Geometry.Convex and develop the theory of convex spaces axiomatically there, until my refactor is far enough along that this new folder can act as a drop-in replacement for most of Analysis.Convex (where the complement of "most" is the theory of real and complex convex spaces).

Yaël Dillies (Oct 18 2023 at 06:34):

Another use case would be for me to expose the results in Combinatorics.SetFamily or Combinatorics.Additive in a semi-coherent fashion. I could list the important results that we have, how to use them, who to contact if people have question, and what future results people have been working and where to find that work.

Yaël Dillies (Oct 18 2023 at 06:36):

If people are happy with my suggestion, I can just start adding a README.md file to the aforementioned folders and a few more, and we can worry about how to get doc-gen to display them later.

Antoine Chambert-Loir (Oct 18 2023 at 07:27):

I would suggest going further.
Each file could be accompanied with a .tex file that explains in vernacular language what is done in the .lean file, details the definitions, the way they are supposed to be used (the “API”).
I would even dare dreaming of everything stemming out a single file that is automatically processed into the .lean and .tex files.

Vincent Beffara (Oct 18 2023 at 08:01):

That's more or less what doc-gen is doing right now isn't it, so you are actually dreaming of the .lean file itself? (With markdown instead of tex but that is probably enough.) Or do you envision something more like the .dtx format? It seems to me that this would add a lot of burden while providing little additional value. But I might be wrong about that.

One markdown file per folder makes sense for a more general overview!

Antoine Chambert-Loir (Oct 18 2023 at 08:30):

The .lean file does not contain enough information. There are cases where it explains the rationale behind the API, I'm thinking of docs#Set.Nonempty for example, but it is really tough to read. And it rarely says anything about the proofs (which are the object of interest here, I mean, we not only care that a theorem is proved, we also are interested in how we prove it).
The .dtx format is a bit hard to write, but VSCode/Emacs extensions are there to help.

Scott Morrison (Oct 18 2023 at 08:44):

@David Thrane Christiansen will be pleased to hear that you're all thinking about this, and is coming up with solutions for you!

Vincent Beffara (Oct 18 2023 at 08:44):

How about splitting this huge file Mathlib/Data/Set/Basic.lean which is 3k lines long into smaller .lean files, with a comment like you want inside a /-! ... -/ comment on top of that?

Scott Morrison (Oct 18 2023 at 08:45):

Yes, please do!

Any PR that splits a file over 1000 lines is highly welcome, and I'm happy to be pinged to merge quickly. :-)

Vincent Beffara (Oct 18 2023 at 08:49):

(I was not volunteering to do that :sweat_smile: just suggesting that literate programming may be less useful if the source files are small.)

Utensil Song (Oct 18 2023 at 09:07):

Does this mean every directory would get a doc-gen4 page? Now it's not possible to click on e.g. Mathlib.LinearAlgebra to learn what it is about, and its Basic.lean isn't about it as a whole.

David Thrane Christiansen (Oct 18 2023 at 09:11):

Thanks for the CC, @Scott Morrison . I'm currently at the design stage of a documentation tool for Lean that allows the structure of the documentation to be different from the structure of the code when that's what makes sense, extensible in Lean itself. It will cooperate with doc-gen to allow the inclusion of API documentation in a larger, structured document. I'll watch threads like this for inspiration!

Yaël Dillies (Oct 18 2023 at 09:12):

Does that cover my "one readme per folder" idea?

Utensil Song (Oct 18 2023 at 09:32):

In the demo of the new tool, it seems that one readme per folder would be written in Lean DSL, but I guess this topic is about the convention of having such a readme (in whatever format), a meta question.

If the new documentation tool also introduce a convention like this, then it's partially covered to the extent that such a readme lean file should exist (EDIT: and its name). But it's still up to the Mathlib community to discuss what to include in this readme as a convention for contents.

Yaël Dillies (Oct 18 2023 at 10:05):

Okay, I will open a PR to mathlib with a few example readmes.

Henrik Böving (Oct 18 2023 at 11:26):

Yaël Dillies said:

Does that cover my "one readme per folder" idea?

In the lean compiler we have one file per folder in the folder above that imports all of the files from below. A module file if you wish. If you just do that and put your documentation in there in a doc string it will just work with doc-gen as well.

Vincent Beffara (Oct 18 2023 at 11:48):

This is useful to describe API. IIUC, what @Antoine Chambert-Loir wants, and I believe that most mathematicians would agree, is the ability to add exegesis along proofs when needed, so as close to the code as possible, and a way to export a readable commented version. We don't after all believe that proofs are irrelevant :wink: but this is kind of orthogonal to API documentation.

Yaël Dillies (Oct 18 2023 at 12:02):

Henrik Böving said:

Yaël Dillies said:

Does that cover my "one readme per folder" idea?

In the lean compiler we have one file per folder in the folder above that imports all of the files from below. A module file if you wish. If you just do that and put your documentation in there in a doc string it will just work with doc-gen as well.

I really dislike dumping information about folder X outside X, though. I would much prefer X/README.md or X/README.lean over X.lean.

Eric Wieser (Oct 18 2023 at 12:04):

@Yaël Dillies, I think that ship sailed with the removal of default.lean (though there is a thread asking to bring it back)

Utensil Song (Oct 18 2023 at 12:04):

There is another sad thing about writing docstring in Lean, no comments are allowed before imports, so one can't add docstring to each import in that lean file for that directory.

/-!
-/
import Mathlib.Tactic

gives invalid 'import' command, it must be used in the beginning of the file.

Eric Wieser (Oct 18 2023 at 12:06):

In the lean compiler we have one file per folder in the folder above that imports all of the files from below.

This strategy doesn't work in mathlib; often two parallel directories import each other in sequence; Foo/B and Bar/B import both of Foo/A and Bar/A, Foo/C and Bar/C import both of Foo/B and Bar/B, etc

Yaël Dillies (Oct 18 2023 at 13:25):

The point of folders is also to make the working tree cleaner. Adding a file outside the corresponding folder works against that, by having two objects corresponding to one semantic entity.

Eric Wieser (Oct 18 2023 at 13:38):

I don't disagree, but I think I think that's more suited to an RFC adding back default.lean or something analogous

Mac Malone (Oct 18 2023 at 17:35):

Utensil Song said:

There is another sad thing about writing docstring in Lean, no comments are allowed before imports

Comments are allowed (e.g., /- ... -/ and -- ... work). However, docstrings and module documentation (e.g., /-- ... -/ and /-! ... -/) do not. This is because the later are first-class syntax in Lean whereas the former are ignored by the lexer.

Vincent Beffara (Oct 18 2023 at 18:49):

Is there a way to attach docstrings to parameters and assumptions of functions (kind of like it is possible for fields in a structure)? That would be useful to explain why an assumption is necessary, or what each parameter means.

Alex J. Best (Oct 18 2023 at 19:26):

I guess we would just use the function docstring for this, looking at https://github.com/leanprover/lean4/blob/5e0f6049c002e04e3593f1a08c96488bc6f8c49a/src/Lean/Parser/Term.lean#L257 there is no place for a docstring there, so I don't think there is an existing mechanism to add an actual docstring (compared to https://github.com/leanprover/lean4/blob/5e0f6049c002e04e3593f1a08c96488bc6f8c49a/src/Lean/Parser/Command.lean#L178 where we have an explicit declModifiers)

Mario Carneiro (Oct 18 2023 at 19:36):

Even if we could put doc comments on parameters:

  1. That would be really awkward to format
  2. I'm not sure how we would surface that documentation other than function hovers as we currently do

Eric Wieser (Oct 18 2023 at 19:57):

The documentation could be surfaced as hover text for param on (param := _)

Antoine Chambert-Loir (Oct 18 2023 at 20:39):

Would it be possible to have something of the kind that exists for Coq / Agda, where the html pages show the code in a more human-friendly way than what is given here? I'm thinking of Martin Escardo's Type Topology archive, for example.
https://github.com/martinescardo/TypeTopology

Patrick Massot (Oct 18 2023 at 21:42):

Yes, this is what LeanInk is about. Let's say this is work in progress.

Utensil Song (Oct 19 2023 at 01:05):

Mac Malone said:

Utensil Song said:

There is another sad thing about writing docstring in Lean, no comments are allowed before imports

Comments are allowed (e.g., /- ... -/ and -- ... work). However, docstrings and module documentation (e.g., /-- ... -/ and /-! ... -/) do not. This is because the later are first-class syntax in Lean whereas the former are ignored by the lexer.

Yes, but only docstrings and module documentation are rendered by Alectryon and doc-gen4.

Mario Carneiro (Oct 19 2023 at 01:45):

Why do module docs have to go before imports?

Utensil Song (Oct 19 2023 at 01:50):

Initially it was for literate programming, there could be opening paragraphs before the first import. In the context of this topic, maybe in the doc lean file for the directory, each import is explained before the import, or there is also an opening before all imports.

This was a surprise to me, compared to other languages, where import can happen anywhere in the file even inside a function (e.g. for Rust and Python).

Utensil Song (Oct 19 2023 at 01:59):

For example, lean4-metaprogramming-book has opening paragraphs in /- -/ before imports but it was rendered with lean2md instead of LeanInk/Doc-gen4 and I need type and goal annotations generated by LeanInk, so I can't really use lean2md.

This may seem possible to relax from the doc tool side, but relaxing it will pollute the doc with comments unintended for doc render(e.g. for Mathlib and doc-gen4), so there's the dilemma.

Utensil Song (Oct 19 2023 at 02:08):

image.png

Interesting, in the doc tool DSL demo, there's even no import but @import.

Mario Carneiro (Oct 19 2023 at 02:10):

I'm guessing this is in order to make keywords more distinct since the substrate is plain text / markdown

Mario Carneiro (Oct 19 2023 at 02:13):

This was a surprise to me, compared to other languages, where import can happen anywhere in the file even inside a function (e.g. for Rust and Python).

This is a complete mis-feature. (In C you can even #include in the middle of an array literal, and I have seen people do this.) Rust does not allow imports to happen anywhere in the file, imports are like extern crate foo; and they only happen at the top level. You are probably thinking of use declarations and these are more like lean's open command.

Mario Carneiro (Oct 19 2023 at 02:16):

The main reason you shouldn't be able to put imports anywhere in the (lean) file is because imports are not only parsed by lean but also directly by lake (which in particular means it needs to reimplement some of the comment and name lexing stuff just to read that part of the file) so that it knows what other files are required to make this one compile. If imports could appear anywhere you would get caught in a circularity since you need lean to parse the file but lean needs the imports to parse the file and lake needs to know what to compile first before this one...

Mario Carneiro (Oct 19 2023 at 02:17):

I think we would end up needing an external file dependency specification

Mario Carneiro (Oct 19 2023 at 02:26):

Mario Carneiro said:

Rust does not allow imports to happen anywhere in the file, imports are like extern crate foo; and they only happen at the top level. You are probably thinking of use declarations and these are more like lean's open command.

Actually that's not quite true, mod is also kind of like an import statement (it determines how files are located in the module graph) and those can actually go anywhere. This has been regarded as a mistake however, because it is hostile to IDEs that want to know the module graph up-front without parsing the whole project. This problem would be much worse for lean because "parsing the whole project" is already equivalent to a full build.

Mac Malone (Oct 19 2023 at 02:39):

Mario Carneiro said:

The main reason you shouldn't be able to put imports anywhere in the (lean) file is because imports are not only parsed by lean but also directly by lake

While a reasonable inference, this is not actually the main reason. Lake would very much like to just elaborate the file itself (to easily solve things like lake#86). The reason the imports are pre-parsed and fixed at the top is due to restrictions of the Lean Environment. For example, the environment does not support multi-threading and parts of the environment extension are optimized in such away that it is nontrivial to extend it further later.

Mac Malone (Oct 19 2023 at 02:41):

Mario Carneiro said:

This was a surprise to me, compared to other languages, where import can happen anywhere in the file even inside a function

This is a complete mis-feature.

I disagree with calling this a "mis-feature". Lazy and asynchronous imports are very important performance optimizations in Ruby, Python, and JavaScript, and I have no doubt they would be quite useful in Lean. However, it is not clear the trade-offs required to enable such a feature are worth it for Lean.

Utensil Song (Oct 19 2023 at 03:14):

Yes, lazy and asynchronous imports (plus conditional and optional imports, which, btw, is also available in Julia, to provide optional extensions if the certain libraries are present) are use cases made use of allowing imports everywhere in these languages, but most of them are too dynamic to follow, so Lean could have a less strict but not totally relaxed rule about this, like allowing limited grammar structures happen before import(e.g. meta if?), but maybe it would be more clear when a valid use case surfaces and becomes blocked by it. And of course this would be a nitpick if only for literate programming.

Kevin Buzzard (Oct 19 2023 at 07:26):

Wait, there is surely a difference between "I want to be able to put comments at the very top of a file so a documentation system can see them" and "I want to be able to import stuff in the middle of a proof". One seems infinitely easier to implement than the other, for example.

Mario Carneiro (Oct 19 2023 at 07:30):

Note that lean is currently already capable of taking additional dependencies on other files during elaboration (e.g. include_str). It just isn't tracked by lake

Utensil Song (Oct 19 2023 at 07:40):

Kevin Buzzard said:

Wait, there is surely a difference between "I want to be able to put comments at the very top of a file so a documentation system can see them" and "I want to be able to import stuff in the middle of a proof". One seems infinitely easier to implement than the other, for example.

Yes, I wonder if there's any reason that /-! -/ or /-- -/ can't be placed before imports, and if it's not the preferred way, then which way is better to implement "be able to put comments at the very top of a file so a documentation system can see them" in Lean.

Mario Carneiro (Oct 19 2023 at 07:41):

the documentation system can see /- comments if it chooses to interpret them

Mario Carneiro (Oct 19 2023 at 07:43):

We had to implement /- comment parsing for mathport because the copyright header is exactly this - it is a regular comment placed before the imports

Utensil Song (Oct 19 2023 at 08:07):

The dilemma mentioned above is exactly if doc-gen4 chooses to see and render /- -/or even -- then lots of comments unintended for doc would also be rendered and polluted the doc.

Mario Carneiro (Oct 19 2023 at 08:09):

is this about doc-gen though? I thought it was about some literate lean processor

Mario Carneiro (Oct 19 2023 at 08:09):

I would indeed expect doc-gen to ignore /- comments but that's a different application

Utensil Song (Oct 19 2023 at 08:12):

The thing is there should be a consistent standard to treat them, or there would be fragmentation of doc tools, like the difference between lean2md and doc-gen4, one relies on /- -/ heavily and the other relies on ignoring it completely.

Utensil Song (Oct 19 2023 at 12:28):

Interesting note from lean2md:

Important: since lean2md is so simple, please avoid using comment sections in Lean code blocks with /- ... -/. If you want to insert commentaries, do so with double dashes --.

This isn't practical in real code, people would need internal multi-line comments with /- -/, while /-! -/ and /-- -/ are for docs that are expected to be rendered.

Damiano Testa (Oct 19 2023 at 20:25):

I think that lean2md was just a simple-minded program that was intended for being used only for the metaprogramming book. And even then, my impression is that the metaprogramming book outgrew it: I do not think that lean2md should be seriously considered outside of its initial scope.

Eric Wieser (Oct 19 2023 at 20:50):

Especially in light of the new tools David is building


Last updated: Dec 20 2023 at 11:08 UTC