Zulip Chat Archive
Stream: mathlib4
Topic: A whitespace linter
Damiano Testa (May 01 2025 at 14:24):
#24465 contains a linter that enforces whitespace formatting on all declarations up to and excluding the type signature. Essentially, it checks that the "hypotheses" of each declaration are correctly formatted.
Damiano Testa (May 01 2025 at 14:24):
While the scope is limited, the linter found hundreds of exceptions in mathlib, including one in #24119, merged 8 hours ago.
Damiano Testa (May 01 2025 at 14:24):
The PR is long, but a lot of it is documentation, tests and making the tests compliant. All the mathlib adaptation PRs that resulted from it have been completely straightforward.
Damiano Testa (May 01 2025 at 14:24):
Another strong point in favour of the linter is that it has no "special-casing", although it has an exclusion mechanism to allow the linter to ignore parts of a declaration, based on the syntax type that gave rise to them.
Damiano Testa (May 01 2025 at 14:25):
Overall, I think that it works very well and the fact that there isn't a single exception in all of mathlib seems a strong endorsement!
Damiano Testa (May 01 2025 at 14:28):
Some previous discussions:
Michael Rothgang (May 01 2025 at 14:53):
As I already said on github (also perhaps on a related PR), I think this is great to have. I look forward to extending this linter in the future, towards checking all formatting (with an opt-out for syntax it doesn't understand). I can try to review the linter implementation soon.
Michael Rothgang (May 01 2025 at 14:53):
Let me dream big: what is needed to make this closer to an auto-formatter, i.e. enable the author of a pull request to automatically reformat all affected lines to comply with the linter? I presume there are two steps:
- making the linter generate "try this" suggestions (or similar) --- this is blocked on lean#4363 (the RFC is accepted, I think - and medium priority)
- some way of "apply all try this suggestions in this file" (which could be be useful in many other settings also, such as tryAtEachStep or for auto-fixing all non-terminal simps, and generally for self-rewriting code)
Is there a way to move any of these items forward?
Michael Rothgang (May 01 2025 at 14:56):
(Without opening the can of worms that is lake exe refactor.)
Damiano Testa (May 01 2025 at 15:04):
Right now, the linter piggy-backs on the pretty-printer for deciding what is correctly formatted or not (plus a little custom-handling of where to put line breaks).
Damiano Testa (May 01 2025 at 15:05):
However, the pretty-printer has too many short-comings to be reliably used as an "oracle formatter". This, I view as the main obstacle, and the linter takes no step in this direction.
Damiano Testa (May 01 2025 at 15:05):
The other question, of automated code-editing is certainly (and easily) within reach. I think that all that is lacking is the desire to accept one way as the standard and then build infrastructure around it.
Damiano Testa (May 01 2025 at 15:06):
I ran a few tests, before and after refactor, using it and not using it. They all worked, of course each with its pros and cons.
Damiano Testa (May 04 2025 at 05:05):
Here is another recent whitespace lint: #24585.
Mario Carneiro (May 06 2025 at 00:46):
One thing I would love to have in this vein is something like the vscode typescript formatter. It's very non-judgmental, it doesn't try to do any extra line breaking, and mainly just makes sure that indentation of successive lines is correct and whitespace around operators and parentheses is correct, which means that you get the easy stuff fixed and can leave the harder questions for manual intervention, which seems like a good compromise for a hugely complex syntax like lean's where we want to err on the side of doing something useful but not making a mess
Martin Dvořák (May 06 2025 at 06:41):
Out of curiosity, why is there Symbol in CommandStart.lean?
Damiano Testa (May 06 2025 at 06:51):
Martin Dvořák said:
Out of curiosity, why is there
Symbolin CommandStart.lean?
This is a test: at some point, the linter flagged it as an error and I made it into one of the test elements.
Damiano Testa (May 06 2025 at 06:57):
With respect to Mario's comment, the linter implements a very initial step towards that: it verifies whitespace around variables, but trusts line breaks provided by the source code. It also does not provide autocorrection.
Damiano Testa (May 07 2025 at 15:14):
Damiano Testa (May 13 2025 at 09:13):
Yet more whitespace: #24837.
Damiano Testa (May 13 2025 at 09:14):
Note also that these PRs are all completely automatic: the linter only suggests correct changes :trademark: . In case someone feels like merging it, here is the PR: #24465! :slight_smile:
Sébastien Gouëzel (May 13 2025 at 09:19):
I would love to see the linter in, and active on PRs, because it would save us a lot of boring work when reviewing. Unfortunately, I can't review it for lack of skills, sorry...
Damiano Testa (May 16 2025 at 08:35):
Damiano Testa (May 16 2025 at 08:38):
I also want to point out what I consider to be an added benefit of this linter: since it actually inspects the pretty-printed version of the various commands, it helps in catching situations where the newly introduced notation would not print accordingly to what you expect. This happened in several of the past PRs, including the one I just linked to.
Damiano Testa (May 16 2025 at 08:39):
I view this as an incremental way of producing a formatter: little by little, the scope of the linter can expand and, each time there is a discrepancy between what has been written and how it pretty-prints, it may mean that there is a style decision to be made about what is the "correct" way of writing a particular part of the syntax.
Damiano Testa (May 19 2025 at 10:48):
Damiano Testa (Jun 17 2025 at 18:57):
Dear All,
the whitespace linter has just made its way into mathlib! Essentially, it checks for correct formatting of all declarations up to and excluding the type specification:
theorem X {a : Unit} (b : Nat) :/- end of checking -/ ...
Damiano Testa (Jun 17 2025 at 18:57):
As usual, with all such new additions, there may be issues: let me know if you run into any unwanted behaviour!
Kenny Lau (Jun 17 2025 at 18:59):
Are there currently files that violate the linter and could we have a look at an example? Sometimes I'm not sure if things need an extra indentation, like I'll write something like
very_long_proof_term_1 (moderately_long_proof_term
the_complement_of_the_moderately_long_proof_term)
proof_term_2 proof_term_3 proof_term_4
Damiano Testa (Jun 17 2025 at 19:00):
Specifically indentation is not something that the linter checks, it trusts that you know what you are doing.
Damiano Testa (Jun 17 2025 at 19:00):
Also, note that it only inspects the hypotheses, so such long terms are rare.
Damiano Testa (Jun 17 2025 at 19:01):
The main reason for these limitations is that the linter basically uses the pretty-printer as an oracle for formatting, and the pretty-printer is only reliable in a limited setting.
Damiano Testa (Jun 17 2025 at 19:02):
In fact, while making mathlib compliant, I uncovered several missing spaces in syntax declarations, since no one had ever looked at how they pretty-printed.
Kenny Lau (Jun 17 2025 at 19:03):
Ah, ok, I had a look at the diff, very interesting, thanks
Kenny Lau (Jun 17 2025 at 19:04):
Kenny Lau (Jun 17 2025 at 19:04):
it looks like this is the only instance where whitespaces issues were identified and fixed?
Kenny Lau (Jun 17 2025 at 19:06):
ah, and also a few instances of changing notation_class * to notation_class*
Damiano Testa (Jun 17 2025 at 19:26):
For all (or almost all) the adaptation PRs that I made, I referenced the "linter PR", so they appear as comments. There were a lot of exceptions. Here are some examples.
and so on.
Jovan Gerbscheid (Jun 20 2025 at 19:09):
Note that the linter adds 2 minutes of linting time to the mathlib build, according to the speedcenter. Has this been considered and can it be improved?
Jovan Gerbscheid (Jun 20 2025 at 19:48):
@Damiano Testa, I don't know if the preformance bottleneck is in Lean.PrettyPrinter.ppCategory, Std.Format.pretty or in the implementation of the linter itself. But what I find suspicious is that the linter works with String instead of Substring. This is very memory inefficient, because after every little string token that is being processed, you are allocating memory to copy the whole rest of the String (using String.drop 1). So I would suggest using Substring to avoid this extra memory allocation.
Michael Rothgang (Jun 20 2025 at 21:15):
I tried doing so in #26240: some of these changes are definitely readability regressions. Let's see if this helps meaningfully...
Damiano Testa (Jun 20 2025 at 21:27):
Actually, I had initially planned this linter to be a test for a new kind of "relative" linter. However, since there were initially quite a few exceptions and the linter was not the easiest to write, I decided to leave the "relative" part for a later development.
Damiano Testa (Jun 20 2025 at 21:27):
Now, "relative" refers to the fact that I wanted this linter to only act on the git diff of a PR, and not on all of mathlib. So, the plan is
- write the linter (done);
- make mathlib compliant (done);
- write the code for detecting the git diff and store it in an environment extension (partly done);
- hook the commandStart linter into the git diff mechanism, so it only inspects code that changes, since the other is already compliant.
Damiano Testa (Jun 20 2025 at 21:28):
Since I imagine that this will noticeably reduce the running time of the linter, I was not too worried about its performance.
Ruben Van de Velde (Jun 20 2025 at 21:30):
In my experience such relative checks have a high likelihood of missing things over time and then failing when you touch code nearby. I'd carefully consider whether the tradeoffs are worth the increased complexity
Damiano Testa (Jun 20 2025 at 21:32):
That is a possibility, so I was also toying with the idea that maybe bors would run the linter on all of mathlib. At the same time, it is not like a misplaced whitespace is such a tragedy and there could be a monthly clean up run of all mathlib.
Damiano Testa (Jun 20 2025 at 21:36):
By the way, most style linters could hook into the "relative" mechanism as well. And, moreover, the "relative" mechanism need not be so fine-grained at the declaration (or even syntax node level): the linter could simply run on every modified file, not just on the modified parts of the files.
Jovan Gerbscheid (Jun 20 2025 at 23:18):
Michael Rothgang said:
I tried doing so in #26240
The result shows that this is indeed somewhat of a performance improvement, but it only saves about 5 seconds.
So, I think this does show that we might be better off with a "relative" linter, which runs on all changed files, and also runs every now and then on the whole of mathlib.
Kenny Lau (Jun 21 2025 at 00:50):
@Damiano Testa as a middle ground, is it possible to only do the linter for the files affected? (so bypassing the vulnerability of only focusing on the git diff, but also not running the linter on the whole mathlib)
Jovan Gerbscheid (Jun 21 2025 at 01:09):
I think that's what Damiano suggested in his last message
Damiano Testa (Jun 23 2025 at 12:35):
A first implementation of the "git-aware" version of the linter is available at #26299.
Damiano Testa (Jun 23 2025 at 12:36):
#26298 is a test PR that should show that the linter is active on modified files, new or otherwise.
EDIT: the test passed, since CI failed with error
Some required builds logged failures:
- Mathlib.Data.Array.Extract
- Mathlib.New
Kenny Lau (Jun 23 2025 at 20:28):
Bug report? Type (u+1) got corrected to Type (u + 1)
Damiano Testa (Jun 23 2025 at 20:29):
I've been using u + 1 when the linter suggested it. Though we can switch back in case that is not desired.
Aaron Liu (Jun 23 2025 at 20:43):
Kenny Lau said:
Bug report?
Type (u+1)got corrected toType (u + 1)
Is that not normal?
Mario Carneiro (Jun 23 2025 at 20:45):
that one might be worth taking a census for
Mario Carneiro (Jun 23 2025 at 20:46):
my guess is that humans usually write it without spaces, but the formatter always uses spaces there
Mario Carneiro (Jun 23 2025 at 20:47):
331 results for u + 1 and 94 for u+1 so maybe not
Kenny Lau (Jun 23 2025 at 20:51):
universe u
structure Foo : Type u :=
(x : Type u)
/-
invalid universe level for field 'x', has type
Type u
at universe level
u+2
which is not less than or equal to the structure's resulting universe level
u+1
-/
Kenny Lau (Jun 23 2025 at 20:51):
the printer sometimes uses u+1 and sometimes uses u + 1
Mario Carneiro (Jun 23 2025 at 20:53):
ah right, there is a second printer specifically for printing levels (src#Lean.Level.format)
Mario Carneiro (Jun 23 2025 at 20:53):
I think that one will also do things like (max u v)+1 instead of max u v + 1
Mario Carneiro (Jun 23 2025 at 20:56):
universe u v
/--
error: invalid universe level for field 'x', has type
Sort (imax u v)
at universe level
(imax u v)+1
which is not less than or equal to the structure's resulting universe level
u+1
-/
#guard_msgs in structure Foo : Type u where x : Sort imax u v
/-- info: Type (imax u v) : Type ((imax u v) + 1) -/
#guard_msgs in #check Type imax u v
Mario Carneiro (Jun 23 2025 at 20:56):
I guess the normal printer also puts unnecessary parentheses there
Edward van de Meent (Jun 23 2025 at 21:29):
i've always treated universe expressions as though they were any others, meaning spaces around +... TIL some people disagree
Jovan Gerbscheid (Jul 10 2025 at 13:51):
Here's an interesting issue I encountered when leaving an extra space at the end of the @[simp] line:
@[simp]
theorem foo : 2 = 2 := rfl
It gives the error
remove line break in the source
This part of the code
'@[simp]
theorem'
should be written as
'@[simp]
theorem'
Note: This linter can be disabled with `set_option linter.style.commandStart false`
Both the replacement suggestion and the error message are confusing here.
Instead of just remove line break in the source, maybe a better message would be whitespace linter: please remove the extra whitespace.
Damiano Testa (Jul 10 2025 at 15:35):
This looks similar to , right?
Damiano Testa (Jul 10 2025 at 15:36):
It is true that removing the line break would also silence the linter. I may make the message say that one way of silencing the linter is to follow its suggestion, but it is not the only way to silence it.
Damiano Testa (Jul 10 2025 at 15:37):
(Also, I have an option that automatically removes trailing whitespace on every line upon saving, so I am very rarely in a situation such as the one that you describe.)
Jovan Gerbscheid (Jul 10 2025 at 15:38):
I also have that option, but I'm not always saving files immediately :)
Damiano Testa (Jul 10 2025 at 15:39):
I do save pretty obsessively my files, to be honest... :slight_smile:
Jovan Gerbscheid (Jul 10 2025 at 15:39):
Oh, I only realize now that the linter meant to suggest
should be written as
'@[simp] theorem'
Damiano Testa (Jul 10 2025 at 15:40):
Indeed, line breaks in the pretty-printer are not ideal, though...
Damiano Testa (Jul 10 2025 at 15:40):
But I am very pleasantly surprised by the fact that spaces are surprisingly good!
Damiano Testa (Jul 10 2025 at 15:41):
(And, besides, controllable by the user, at least if you can modify the code where the syntax is defined.)
Damiano Testa (Jul 10 2025 at 15:43):
Maybe I should also make it clear that, right now, I am aiming for the following goals from the commandStart linter:
- that it is active on most of mathlib;
- that it virtually never suggests something undesirable;
- that it is pretty consistent.
Being completely thorough is not (yet) a goal.
Jovan Gerbscheid (Jul 10 2025 at 15:44):
Do you think the linter should (after that goal is reached) enforce either a newline or not a newline in @[simp]? For long tags, they should definitely be on a separate line, but maybe we can tell it to remove the newline when the tag @[simp] is at most 10 (?) characters long.
Damiano Testa (Jul 10 2025 at 15:45):
I have not really thought about line breaks, but I would like it if it inspected the length of the string and, up to 100 chars, would suggest to place the whole command on a single line.
Damiano Testa (Jul 10 2025 at 15:45):
Breaking down more and more bits as the 100 chars limit is exceeded.
Damiano Testa (Jul 10 2025 at 15:46):
However, with line breaks and whitespace sensitivity, there are a lot of constraints and I have not really given it much thought.
Damiano Testa (Jul 10 2025 at 15:47):
For the moment, I find that allowing some (currently complete!) leeway in where the line-breaks are placed is actually a good compromise.
Damiano Testa (Jul 10 2025 at 15:48):
Btw, the "breaking down depending on the length" is also what the add_deprecation script does: it tries to fit @[deprecated (since := "..."] alias xxx := yyy on as few lines as possible, with breaks allowed at the attribute, and after the :=.
Last updated: Dec 20 2025 at 21:32 UTC