Zulip Chat Archive
Stream: mathlib4
Topic: Style :bicycle: : indenting second lines in doc-strings
Michael Rothgang (Feb 19 2025 at 15:42):
I found a small beautiful bikeshed which I would like to show you. It relates to indentation of items in doc-strings.
Question 1. In a declaration doc-string, should the second lines be indented?
/-- My favourite declaration
has a really long doc-string. -/
(edited to add: Let's assume that we want to write this without line breaks before/after the comment markers, because this really just has two lines. That aspect was discussed elsewhere.)
I believe the consensus is not, and the preferred style is
/-- My favourite declaration with a multi-line doc-string
has unindented second line. -/
Michael Rothgang (Feb 19 2025 at 15:43):
Question 2. What about items in module doc-strings?
Imagine the following is part of a module doc-string: is the indentation on the second line required, forbidden or optional either way?
## Main results
- `exists_foo`: the main existence theorem of `foo`s.
- `bar_of_foo`: a construction of a `bar`, given a `foo` under very complicated
and intricate circumstances worth describing in excruciating detail.
- `bar_eq` : the main classification theorem of `bar`s.
I don't remember a consensus on this question. Did I miss one? I'd prefer a decision one way or the other, but don't care which one.
(I'm happy to create a poll if this wasn't discussed yet.)
Ruben Van de Velde (Feb 19 2025 at 15:44):
Option 1c:
/--
My favourite declaration with a multi-line doc-string
has unindented second line.
-/
Michael Rothgang (Feb 19 2025 at 15:45):
Ruben Van de Velde said:
Option 1c:
/-- My favourite declaration with a multi-line doc-string has unindented second line. -/
Of course. Let me clarify that question 1 is meant to be orthogonal of the "multiple paragraphs" question (which IIRC was discussed and mostly concluded a while ago).
Eric Wieser (Feb 19 2025 at 15:49):
I think the answer to question 2 is "yes, always indent", because while
* This is a long sentence
that is badly wrapped.
* Another
works,
* This is a long sentence
that is badly wrapped.
This is the next paragraph of that sentence
* Another
renders incorrectly as
- This is a long sentence
that is badly wrapped.This is the next paragraph of that sentence
- Another
Eric Wieser (Feb 19 2025 at 15:50):
(which is to say; this is just the way that markdown expects to be written)
Michael Rothgang (Feb 19 2025 at 16:04):
:bulb: I see, thanks for clarifying!
Michael Rothgang (Feb 19 2025 at 16:04):
I have created https://github.com/leanprover-community/leanprover-community.github.io/pull/596 to clarify both points.
Michael Rothgang (Feb 19 2025 at 16:11):
And of course, linting for both of these would be nice.
@Damiano Testa Are there any complications with writing a syntax linter for module doc-strings, or doc-strings in general? Seems like a nice quick win.
(I can try that myself, though perhaps only on two weeks.)
Michael Rothgang (Feb 19 2025 at 16:30):
I got a bit nerd-sniped: here's a function to check some of the requirements. The remaining part is hooking this up to a syntax linter for the doc-strings, and fixing any violations.
/-- Checks whether `input` is a declaration doc-string conforming syntactically
to mathlib's style guidelines. -/
def isGoodDocstring (input : String) : IO Bool := do
if !input.startsWith "/--" || !input.endsWith "-/" then
IO.eprintln s!"error: {input} is not a valid doc-string"
return false
if !(input.startsWith "/--\n" || input.startsWith "/-- ") then
IO.eprintln s!"error: doc-string \"{input}\" should start with a space or newline"
return false
if !(input.endsWith "\n-/" || input.startsWith " -/") then
IO.eprintln s!"error: doc-string \"{input}\" end start with a space or newline"
return false
-- Catch extraneous spaces.
if (input.startsWith "/-- " || input.endsWith " -/") then
IO.eprintln s!"error: doc-string \"{input}\" should start resp. end with at most a single space"
return false
-- Catch misleading indentation.
let lines := (input.split (· == '\n')).drop 0
if lines.any (fun l ↦ l.startsWith " ") then
IO.eprintln s!"error: subsequent lines in the doc-string \"{input}\" should not be indented"
return false
-- This list of checks is not exhaustive, but a good start.
-- Future ideas: doc-strings ending with a single quote or with a comma
return true
Michael Rothgang (Feb 19 2025 at 16:40):
Here's an offer: if somebody hooks this up to a syntax linter, I'll help with landing the PR - by doing (some of) fixing up violations, writing tests for the linter or reviewing.
Damiano Testa (Feb 19 2025 at 17:36):
@Michael Rothgang #22095 has the commands to extract the docstring from syntax. I think, I have not tested it much.
Yaël Dillies (Feb 19 2025 at 18:00):
Michael Rothgang said:
Question 1. In a declaration doc-string, should the second lines be indented?
I have been going around saying No in reviews :innocent:
Damiano Testa (Feb 19 2025 at 18:02):
I personally also do not like indentation for Q1. I think that indentation should be present only when it is for markdown.
Damiano Testa (Feb 19 2025 at 18:03):
However, I think that if there is one style convention that everyone agrees on, then the linter should focus on just that one and then grow gradually.
Damiano Testa (Feb 19 2025 at 18:04):
I suspect that even finding one format to enforce may be tricky.
Michael Rothgang (Feb 19 2025 at 18:38):
Damiano Testa said:
Michael Rothgang #22095 has the commands to extract the docstring from syntax. I think, I have not tested it much.
I just added a first implementation of the linter: let's see what it finds in mathlib!
Damiano Testa (Feb 19 2025 at 20:03):
Just under 3.5k exceptions!! :octopus:
Michael Rothgang (Feb 20 2025 at 09:43):
#22117 and #22118 fix some clear cases where doc-strings should be un-indented.
Damiano Testa (Feb 26 2025 at 22:52):
The linter at #22095 should now be passing CI. All the exceptions that it found have slowly made it into mathlib.
Damiano Testa (Feb 26 2025 at 22:52):
The new exceptions that have been accumulated since the linter started linting are in #22343.
Damiano Testa (Feb 26 2025 at 22:53):
The linter does minimal checks: it simply verifies that the first character after /-- and before -/ in a doc-string is either a space or a line break and that the next/previous character is neither a space nor a line break.
Damiano Testa (Feb 26 2025 at 22:53):
This seems pretty uncontroversial and no one objected to any of the linter exceptions that have been fixed.
Damiano Testa (Feb 26 2025 at 22:55):
Of the two PRs mentioned above, #22343 is a strict subset of #22095 and just contains the "straightening up" of the new linter exceptions.
Michael Rothgang (Mar 17 2025 at 16:19):
I have one last question for this bikeshed: what about multi-line docstrings for to_additive? Say, this one
@[to_additive
"An element `a : M` of an `AddMonoid` is an `AddUnit` if it has a two-sided additive inverse.
The actual definition says that `a` is equal to some `u : AddUnits M`,
where `AddUnits M` is a bundled version of `IsAddUnit`."]
Yaël Dillies (Mar 17 2025 at 16:21):
I vote for "don't indent". This way the additive and multiplicative versions of the docstring are most likely to be whitespaced the same way
Eric Wieser (Mar 17 2025 at 17:01):
An alternative would be "indent with \ continuations" - or are you talking about inconsistent line wrapping due to a reduced column width?
Yaël Dillies (Mar 17 2025 at 17:04):
Latter
Thomas Murrills (Mar 17 2025 at 17:12):
(Slight tangent, but we could also change the syntax of to_additive so that the docstring argument is syntactically a docstring (/-- -/) instead of a string literal. Then it’s a bit more natural to lint it, it’s syntax-highlighted the same way, etc.)
Thomas Murrills (Mar 29 2025 at 00:23):
Ok, eight thumbs up. I'll get on that. :)
Michael Rothgang (Apr 18 2025 at 15:57):
#24166 fixes the indentation in about 100 doc-strings to match markdown syntax. This is just the first of these PRs, but will correct a few mis-formatted doc-strings.
Michael Rothgang (Apr 19 2025 at 08:17):
#24171 fixes more such doc-strings --- and corrects a few actual mis-formattings
Michael Rothgang (Apr 19 2025 at 08:52):
One detail which came up in this PR: for a comment like
Under the following assumptions
* first property
* and second property
we have proven the following theorems.
we'd like the "we have proven the following theorems" to be "part of" the same paragraph, visually.
On the other hand, that syntax is hard to distinguish from
Under the following assumptions
* first property
* and second property which is very long,
and therefore spans a second line: this is bad
where conceptually, really the item should be indented.
Michael Rothgang (Apr 19 2025 at 08:52):
Imposing the the first case be always written as
Under the following assumptions
* first property
* and second property
we have proven the following theorems.
makes it clear which one is meant (and allows writing a linter for it).
Are there any downsides, is there opposition to requiring this?
Michael Rothgang (Apr 19 2025 at 08:53):
This does not change the visual rendering in doc-gen.
Ruben Van de Velde (Apr 19 2025 at 08:56):
Weakly in favour of requiring the empty line, based on how often the indentation really is missing
Bhavik Mehta (Apr 19 2025 at 09:02):
Provided this doesn't change the visual rendering in doc-gen, so it is visually part of the same paragraph there, I have no objection.
Michael Rothgang (Apr 19 2025 at 10:52):
Point of information: within the +-550 lines diff, all added lines should be contained in this commit, about 25 lines.
Eric Wieser (Apr 19 2025 at 16:47):
Michael Rothgang said:
On the other hand, that syntax is hard to distinguish from
Under the following assumptions * first property * and second property which is very long, and therefore spans a second line: this is badwhere conceptually, really the item should be indented.
Various markdown renderers do not support this, it would be worth checking what the commonmark spec says
Eric Wieser (Apr 19 2025 at 16:48):
(note that html does not allow lists mid-paragraph either)
Michael Rothgang (Aug 05 2025 at 20:54):
I wrote a linter checking the indentation as in Option 1: most violations are fixed in the following PRs (plus a few more changes I'll PR once these are merged).
Michael Rothgang (Aug 05 2025 at 20:54):
#28002 (small, +- 20 lines)
Michael Rothgang (Aug 05 2025 at 20:54):
#28003 (also small, +-50 lines)
Michael Rothgang (Aug 05 2025 at 20:55):
#28004 (+- 100 lines)
Michael Rothgang (Aug 05 2025 at 20:55):
#28005 (+- 250 lines)
Michael Rothgang (Aug 05 2025 at 20:55):
#28006 (+- 210 lines)
Michael Rothgang (Aug 05 2025 at 20:56):
#28007 (+- 50 lines)
Michael Rothgang (Aug 05 2025 at 20:56):
Quick reviews are very welcome, as these PRs are very bitrotty by design.
Michael Rothgang (Aug 06 2025 at 13:24):
Thanks for the very fast reviews!
Michael Rothgang (Aug 06 2025 at 13:27):
I have another small bikeshed to resolve: what indentation style do we prefer for doc-strings with enumerated items? Do we prefer
/-- Some doc-string
1. An item which spans multiple lines.
Start of the second line.
2. Second item.
-/
example : Nat := 0
(i.e., three spaces),
/-- Some doc-string
1. An item which spans multiple lines.
Start of the second line.
2. Second item.
-/
example : Nat := 0
(i.e., two spaces) or
/-- Some doc-string
1. An item which spans multiple lines.
Start of the second line.
2. Second item.
-/
example : Nat := 0
(four spaces)?
Michael Rothgang (Aug 06 2025 at 13:28):
At the moment, mathlib uses a mix of all three styles. I'd like to unify on one of them.
Michael Rothgang (Aug 06 2025 at 13:29):
Some arguments I can think of:
- two spaces is consistent with unnumbered items
- three and four spaces produce some alignment (which breaks once you have at least ten items --- that is rare, though)
- three and four spaces means linters get more complicated: enforcing two spaces is easiest to implement
Michael Rothgang (Aug 06 2025 at 13:33):
#28041 fixes more indentation (in line of the initial question of this thread)
Ruben Van de Velde (Aug 06 2025 at 13:34):
How about
1. X
Y
(or does doc-gen support
# X
Y
? I guess that's a header instead)
Michael Rothgang (Aug 06 2025 at 13:34):
So, you're proposing "five spaces"?
Michael Rothgang (Aug 06 2025 at 13:34):
I also think # X is a header.
Michael Rothgang (Aug 06 2025 at 13:39):
#28043 is another small batch (+- 50 lines)
Ruben Van de Velde (Aug 06 2025 at 13:39):
No, four spaces (and two after the 1.)
Eric Wieser (Aug 06 2025 at 21:36):
My take is that column alignment matters and the number of spaces besides that doesn't
Michael Rothgang (Aug 07 2025 at 07:46):
#28070 (+-110 lines) improves some doc-strings accordingly
Thomas Murrills (Aug 08 2025 at 12:27):
Thomas Murrills said:
(Slight tangent, but we could also change the syntax of
to_additiveso that the docstring argument is syntactically a docstring (/-- -/) instead of a string literal. Then it’s a bit more natural to lint it, it’s syntax-highlighted the same way, etc.)
I’ve finally done this in #28066. :) It’s a massive diff, but much of it automated. Hopefully it can somehow be reviewed before it falls too far out of sync with master! :grinning_face_with_smiling_eyes:
(Note: I use some automation inspired by #13973, but implemented slightly differently: I modified the to_additive elaboration to store edits in the oleans via persistent env extension, then looped through the full Array (Array Edit) made available by importing Mathlib. This is not a general-purpose approach, but worked well enough for the case at hand!)
Eric Wieser (Aug 08 2025 at 12:41):
Can you split the new syntax into a separate PR from adapting every caller
Eric Wieser (Aug 08 2025 at 12:43):
(I would guess you can write an implementation that supports both syntaxes, which should be small enough to not make github badly behaved; then we can do the global replacement, then finally remove / deprecate the support for the old syntax)
Thomas Murrills (Aug 08 2025 at 12:48):
If desired, sure! Though to be honest, I’m not sure we’re achieving all that much, and may be introducing extra friction: we then need to review two changes to to_additive internals instead of one, and the change is quite small. (Note that the PR does not actually remove str support per se, but throws a deprecation warning within Mathlib.) Is it really that helpful to do so?
Eric Wieser (Aug 08 2025 at 12:52):
I don't think I have any capacity to review #28066 in its current form; github really does not behave well on diffs that large. If the whole diff is autogenerated then that's not a problem, but this one is a mix of meta code that needs review and automated changes which do not
Michael Rothgang (Aug 08 2025 at 12:55):
I'm happy to review some of the automated changes, but if those could land piece-meal (i.e., in reviewable chunks), that's much nicer.
Michael Rothgang (Aug 08 2025 at 12:55):
How difficult is it to make to_additive support both forms, but without any warning yet?
Then the deprecation can land in a later PR, which verifies that no instances of the old syntax are left.
Thomas Murrills (Aug 08 2025 at 13:00):
Eric Wieser said:
I don't think I have any capacity to review #28066 in its current form; github really does not behave well on diffs that large.
Ah, I see, I wasn’t aware of the extent of the github difficulties.
Thomas Murrills (Aug 08 2025 at 13:01):
Michael Rothgang said:
How difficult is it to make
to_additivesupport both forms, but without any warning yet?
Then the deprecation can land in a later PR, which verifies that no instances of the old syntax are left.
Not difficult at all—just removing a couple lines should do it. :)
Thomas Murrills (Aug 08 2025 at 13:10):
(I’ll first have the chance to do this later today.)
Thomas Murrills (Aug 08 2025 at 23:44):
I've split this into three PRs as requested:
Thomas Murrills (Aug 08 2025 at 23:44):
#28133: accept docComment syntax (in addition to str syntax) (CI is green)
Thomas Murrills (Aug 08 2025 at 23:44):
#28066: reformat existing to_additive docstrings
Thomas Murrills (Aug 08 2025 at 23:45):
#28135: deprecate str syntax in Mathlib
Thomas Murrills (Aug 08 2025 at 23:47):
I also rebased/reorganized the changes in the large reformatting PR such that the commit history is hopefully more helpful when reviewing. :)
Last updated: Dec 20 2025 at 21:32 UTC