Zulip Chat Archive
Stream: mathlib4
Topic: Style :bicycle: : indenting second lines in doc-strings
Michael Rothgang (Feb 19 2025 at 15:42):
I found a small beautiful bikeshed which I would like to show you. It relates to indentation of items in doc-strings.
Question 1. In a declaration doc-string, should the second lines be indented?
/-- My favourite declaration
has a really long doc-string. -/
(edited to add: Let's assume that we want to write this without line breaks before/after the comment markers, because this really just has two lines. That aspect was discussed elsewhere.)
I believe the consensus is not, and the preferred style is
/-- My favourite declaration with a multi-line doc-string
has unindented second line. -/
Michael Rothgang (Feb 19 2025 at 15:43):
Question 2. What about items in module doc-strings?
Imagine the following is part of a module doc-string: is the indentation on the second line required, forbidden or optional either way?
## Main results
- `exists_foo`: the main existence theorem of `foo`s.
- `bar_of_foo`: a construction of a `bar`, given a `foo` under very complicated
and intricate circumstances worth describing in excruciating detail.
- `bar_eq` : the main classification theorem of `bar`s.
I don't remember a consensus on this question. Did I miss one? I'd prefer a decision one way or the other, but don't care which one.
(I'm happy to create a poll if this wasn't discussed yet.)
Ruben Van de Velde (Feb 19 2025 at 15:44):
Option 1c:
/--
My favourite declaration with a multi-line doc-string
has unindented second line.
-/
Michael Rothgang (Feb 19 2025 at 15:45):
Ruben Van de Velde said:
Option 1c:
/-- My favourite declaration with a multi-line doc-string has unindented second line. -/
Of course. Let me clarify that question 1 is meant to be orthogonal of the "multiple paragraphs" question (which IIRC was discussed and mostly concluded a while ago).
Eric Wieser (Feb 19 2025 at 15:49):
I think the answer to question 2 is "yes, always indent", because while
* This is a long sentence
that is badly wrapped.
* Another
works,
* This is a long sentence
that is badly wrapped.
This is the next paragraph of that sentence
* Another
renders incorrectly as
- This is a long sentence
that is badly wrapped.This is the next paragraph of that sentence
- Another
Eric Wieser (Feb 19 2025 at 15:50):
(which is to say; this is just the way that markdown expects to be written)
Michael Rothgang (Feb 19 2025 at 16:04):
:bulb: I see, thanks for clarifying!
Michael Rothgang (Feb 19 2025 at 16:04):
I have created https://github.com/leanprover-community/leanprover-community.github.io/pull/596 to clarify both points.
Michael Rothgang (Feb 19 2025 at 16:11):
And of course, linting for both of these would be nice.
@Damiano Testa Are there any complications with writing a syntax linter for module doc-strings, or doc-strings in general? Seems like a nice quick win.
(I can try that myself, though perhaps only on two weeks.)
Michael Rothgang (Feb 19 2025 at 16:30):
I got a bit nerd-sniped: here's a function to check some of the requirements. The remaining part is hooking this up to a syntax linter for the doc-strings, and fixing any violations.
/-- Checks whether `input` is a declaration doc-string conforming syntactically
to mathlib's style guidelines. -/
def isGoodDocstring (input : String) : IO Bool := do
if !input.startsWith "/--" || !input.endsWith "-/" then
IO.eprintln s!"error: {input} is not a valid doc-string"
return false
if !(input.startsWith "/--\n" || input.startsWith "/-- ") then
IO.eprintln s!"error: doc-string \"{input}\" should start with a space or newline"
return false
if !(input.endsWith "\n-/" || input.startsWith " -/") then
IO.eprintln s!"error: doc-string \"{input}\" end start with a space or newline"
return false
-- Catch extraneous spaces.
if (input.startsWith "/-- " || input.endsWith " -/") then
IO.eprintln s!"error: doc-string \"{input}\" should start resp. end with at most a single space"
return false
-- Catch misleading indentation.
let lines := (input.split (· == '\n')).drop 0
if lines.any (fun l ↦ l.startsWith " ") then
IO.eprintln s!"error: subsequent lines in the doc-string \"{input}\" should not be indented"
return false
-- This list of checks is not exhaustive, but a good start.
-- Future ideas: doc-strings ending with a single quote or with a comma
return true
Michael Rothgang (Feb 19 2025 at 16:40):
Here's an offer: if somebody hooks this up to a syntax linter, I'll help with landing the PR - by doing (some of) fixing up violations, writing tests for the linter or reviewing.
Damiano Testa (Feb 19 2025 at 17:36):
@Michael Rothgang #22095 has the commands to extract the docstring from syntax. I think, I have not tested it much.
Yaël Dillies (Feb 19 2025 at 18:00):
Michael Rothgang said:
Question 1. In a declaration doc-string, should the second lines be indented?
I have been going around saying No in reviews :innocent:
Damiano Testa (Feb 19 2025 at 18:02):
I personally also do not like indentation for Q1. I think that indentation should be present only when it is for markdown.
Damiano Testa (Feb 19 2025 at 18:03):
However, I think that if there is one style convention that everyone agrees on, then the linter should focus on just that one and then grow gradually.
Damiano Testa (Feb 19 2025 at 18:04):
I suspect that even finding one format to enforce may be tricky.
Michael Rothgang (Feb 19 2025 at 18:38):
Damiano Testa said:
Michael Rothgang #22095 has the commands to extract the docstring from syntax. I think, I have not tested it much.
I just added a first implementation of the linter: let's see what it finds in mathlib!
Damiano Testa (Feb 19 2025 at 20:03):
Just under 3.5k exceptions!! :octopus:
Michael Rothgang (Feb 20 2025 at 09:43):
#22117 and #22118 fix some clear cases where doc-strings should be un-indented.
Damiano Testa (Feb 26 2025 at 22:52):
The linter at #22095 should now be passing CI. All the exceptions that it found have slowly made it into mathlib.
Damiano Testa (Feb 26 2025 at 22:52):
The new exceptions that have been accumulated since the linter started linting are in #22343.
Damiano Testa (Feb 26 2025 at 22:53):
The linter does minimal checks: it simply verifies that the first character after /--
and before -/
in a doc-string is either a space or a line break and that the next/previous character is neither a space nor a line break.
Damiano Testa (Feb 26 2025 at 22:53):
This seems pretty uncontroversial and no one objected to any of the linter exceptions that have been fixed.
Damiano Testa (Feb 26 2025 at 22:55):
Of the two PRs mentioned above, #22343 is a strict subset of #22095 and just contains the "straightening up" of the new linter exceptions.
Michael Rothgang (Mar 17 2025 at 16:19):
I have one last question for this bikeshed: what about multi-line docstrings for to_additive? Say, this one
@[to_additive
"An element `a : M` of an `AddMonoid` is an `AddUnit` if it has a two-sided additive inverse.
The actual definition says that `a` is equal to some `u : AddUnits M`,
where `AddUnits M` is a bundled version of `IsAddUnit`."]
Yaël Dillies (Mar 17 2025 at 16:21):
I vote for "don't indent". This way the additive and multiplicative versions of the docstring are most likely to be whitespaced the same way
Eric Wieser (Mar 17 2025 at 17:01):
An alternative would be "indent with \ continuations" - or are you talking about inconsistent line wrapping due to a reduced column width?
Yaël Dillies (Mar 17 2025 at 17:04):
Latter
Thomas Murrills (Mar 17 2025 at 17:12):
(Slight tangent, but we could also change the syntax of to_additive
so that the docstring argument is syntactically a docstring (/-- -/
) instead of a string literal. Then it’s a bit more natural to lint it, it’s syntax-highlighted the same way, etc.)
Thomas Murrills (Mar 29 2025 at 00:23):
Ok, eight thumbs up. I'll get on that. :)
Michael Rothgang (Apr 18 2025 at 15:57):
#24166 fixes the indentation in about 100 doc-strings to match markdown syntax. This is just the first of these PRs, but will correct a few mis-formatted doc-strings.
Michael Rothgang (Apr 19 2025 at 08:17):
#24171 fixes more such doc-strings --- and corrects a few actual mis-formattings
Michael Rothgang (Apr 19 2025 at 08:52):
One detail which came up in this PR: for a comment like
Under the following assumptions
* first property
* and second property
we have proven the following theorems.
we'd like the "we have proven the following theorems" to be "part of" the same paragraph, visually.
On the other hand, that syntax is hard to distinguish from
Under the following assumptions
* first property
* and second property which is very long,
and therefore spans a second line: this is bad
where conceptually, really the item should be indented.
Michael Rothgang (Apr 19 2025 at 08:52):
Imposing the the first case be always written as
Under the following assumptions
* first property
* and second property
we have proven the following theorems.
makes it clear which one is meant (and allows writing a linter for it).
Are there any downsides, is there opposition to requiring this?
Michael Rothgang (Apr 19 2025 at 08:53):
This does not change the visual rendering in doc-gen.
Ruben Van de Velde (Apr 19 2025 at 08:56):
Weakly in favour of requiring the empty line, based on how often the indentation really is missing
Bhavik Mehta (Apr 19 2025 at 09:02):
Provided this doesn't change the visual rendering in doc-gen, so it is visually part of the same paragraph there, I have no objection.
Michael Rothgang (Apr 19 2025 at 10:52):
Point of information: within the +-550 lines diff, all added lines should be contained in this commit, about 25 lines.
Eric Wieser (Apr 19 2025 at 16:47):
Michael Rothgang said:
On the other hand, that syntax is hard to distinguish from
Under the following assumptions * first property * and second property which is very long, and therefore spans a second line: this is bad
where conceptually, really the item should be indented.
Various markdown renderers do not support this, it would be worth checking what the commonmark spec says
Eric Wieser (Apr 19 2025 at 16:48):
(note that html does not allow lists mid-paragraph either)
Last updated: May 02 2025 at 03:31 UTC