## Stream: general

### Topic: documentation requirements

#### Rob Lewis (Jul 15 2019 at 09:43):

We had a few discussions last week about documentation, resulting in this PR: https://github.com/leanprover-community/mathlib/pull/1229

It has a proposal for required documentation for all future additions to mathlib. This is a first draft, and we're very interested to get feedback here!

#### Johan Commelin (Jul 22 2019 at 15:18):

I tried doing this for all public-facing lemmas in my recent power series PR. I got bored very quickly. Many stupid silly lemmas didn't seem to have a nice description apart from repeating their name.

#### Johan Commelin (Jul 22 2019 at 15:18):

I don't think a docstring would offer more than what the hover-text shows by default: the type of the term.

#### Johan Commelin (Jul 22 2019 at 15:19):

So I am leaning towards "doc strings on major theorems" and there is no clear definition of what is a major theorem, but "we know it when we see it".

#### Rob Lewis (Jul 22 2019 at 15:20):

On the other hand, if and when there's some kind of doc generation tool, having natural language statements of these lemmas would be helpful.

#### Johan Commelin (Jul 22 2019 at 15:21):

For the record, these were things like coeff_zero and coeff_add etc...

#### Kevin Buzzard (Jul 22 2019 at 15:32):

There is still surely an argument which says that newcomers would definitely prefer a line of English to having to read the Lean. I think it's easy to forget how intimidating the style is for beginners.

#### Wojciech Nawrocki (Jul 22 2019 at 16:34):

This is bordering on CNLs, but perhaps in the doc generation some documentation could be generated directly from Lean theorem names provided they stick to the conventions like prop2_of_prop1, add_comm etc.

#### Scott Morrison (Jul 22 2019 at 21:15):

I'm not sure how to address this, but for me a big problem with the readability of Lean files is that everything has to be in logical order, unlike in informal mathematics. In any decently written paper or book, the introduction states the major theorem(s). Then in each section, some text addresses the purpose of the section. Proofs are often longer, and incorporate many "Lean lemmas" directly into the body of the proof, so that they don't have to be state ahead of time, without clear motivation. And finally very often proofs of highly technical facts are deferred to appendixes (or even "forthcoming" other papers).

I think it would be great if we could aspire to better section-level commenting. (And more use of section, especially using a name.) Or, for that matter, moving in the direction of splitting developments in more, smaller files.

Awesome (but requiring lots of language support), would be something like a "deferred" keyword:

theorem my_awesome_theorem : 196884 = 1 + 196883 := deferred

... lots of lemmas and definitions here ...

proof of my_awesome_theorem :=
begin
...
end


#### Mario Carneiro (Jul 22 2019 at 21:23):

The obvious solution is to not read lean files like you would a book

#### Mario Carneiro (Jul 22 2019 at 21:24):

You first read my_awesome_theorem because that's what you care about, then when you see an interesting lemma that seems to do most of the work you follow that, and so on until you believe the result

#### Mario Carneiro (Jul 22 2019 at 21:27):

Yes, this puts more power and responsibility on the reader to read lemmas in the right order, but I think that being able to follow links according to the reader's interest rather than having to determine this in advance as in a math textbook is a strength

#### Simon Hudon (Jul 22 2019 at 21:29):

I'm also in favor of documenting only important lemmas. A lot of the lemmas, especially simp lemmas, don't even need to be used explicitly. They might deserve an explanation but more as a set of lemma rather than individually. For instance, it might be worth explaining that they try to rewrite a formula into a canonical form and describe that form.

#### Simon Hudon (Jul 22 2019 at 21:30):

Also I don't think that more documentation is better than less. More documentation is more work to keep up to date and outdated documentation is worse than no documentation. Let's make sure that the comments we require have value

#### Scott Morrison (Jul 22 2019 at 21:31):

My usual problem when opening a Lean file is that I can't find my_awesome_theorem (or guess which are the awesome theorems I should be looking for...) Of course, good file-level comments solve this.

#### Simon Hudon (Jul 22 2019 at 21:34):

It might be worth adding a rule that the important theorems be listed at the beginning.

#### Kevin Buzzard (Jul 22 2019 at 21:49):

There still seems to be a big argument for some sort of overview of the file at the beginning. This could contain pointers to main sections and theorems.

#### Simon Hudon (Jul 22 2019 at 21:55):

A bit like the last paragraph of a paper's introduction?

#### Kevin Buzzard (Jul 22 2019 at 22:22):

Yes maybe. Just saying what's in the file. I think this is a fine idea

#### Rob Lewis (Jul 23 2019 at 08:54):

So it sounds like there's a general consensus not to require every lemma to have a docstring, which is what the current proposal says. So let's stick with that for now. It will be up to reviewers to enforce a strict enough notion of "important theorem" to require docstrings.

#### Rob Lewis (Jul 23 2019 at 08:55):

As for an overview at the beginning, that's already in the proposal, isn't it? We ask for a summary of the content of the file, followed by

• Main definitions (optional, can be covered in the summary)
• Main statements (optional, can be covered in the summary)

#### Rob Lewis (Jul 23 2019 at 08:56):

Explaining the intent behind a collection of simp lemmas seems to fall under "Implementation notes (description of important design decisions or interface features)." But I think it's a good idea to mention explicitly that simp normal form is an important interface feature.

#### Simon Hudon (Jul 23 2019 at 15:21):

Right now, the files probably mostly meet your rules. How do you suggest you bring those files up to code?

#### Rob Lewis (Jul 23 2019 at 18:44):

Right now, the files probably mostly meet your rules.

What files? I think almost nothing in mathlib meets these standards. The idea is that future PRs should be held to these requirements, and should update what they touch (within reason). It's a long term effort to document existing files, of course, but I hope the community will pitch in.

#### Simon Hudon (Jul 23 2019 at 21:19):

I think file level comments is a lot to tack onto every PR that casually touches it. You may want to create a specific issue for the file level comments

#### Rob Lewis (Jul 24 2019 at 10:49):

Yes, this is why I write "within reason." A PR that adds one lemma about list shouldn't have to document the whole file. But if someone understands a file enough to reorganize half of it, it seems reasonable to ask them to write documentation. Creating an issue for adding these seems fine.

#### Rob Lewis (Jul 24 2019 at 10:53):

Does anyone else have thoughts about the proposed requirements? If not, let's merge the PR and enact the rules.

#### Johan Commelin (Jul 24 2019 at 14:36):

Can we make the doc_blame (without !!! :double_exclamation: :stuck_out_tongue_wink: ) some part of the Travis checks? Or is this going to cause lots of griefs because there are always tiny exceptions to these iron rules?

#### Rob Lewis (Jul 24 2019 at 14:57):

It would be very hard to integrate into the Travis workflow and there will indeed be lots of exceptions.

#### Rob Lewis (Jul 25 2019 at 08:19):

I think there's been enough time for people to voice objections. Can someone approve the PR?

Done (-;

#### Rob Lewis (Jul 25 2019 at 09:45):

Maybe we should have a competition for who can update the most files. What's the prize?

#### Kevin Buzzard (Jul 25 2019 at 09:56):

Free copy of mathlib

Last updated: May 13 2021 at 06:15 UTC