Zulip Chat Archive
Stream: mathlib4
Topic: linting_rules (and foo_rules)
Thomas Murrills (Mar 20 2024 at 18:48):
For the past few weeks, I've been exploring the machinery that would let you write something like
linting_rules : deprecated
| `(tactic|test) => return m!"don't use test!"
where the return
ed message would be shown whenever test
was used in a tactic. (This was inspired by the desire to deprecate "old-style" have
and tfae_have
syntax; currently only declarations can be deprecated. See #mathlib4 > deprecate syntax?)
But more generally, I was curious as to what it would take to implement foo_rules
commands in general, and, for a hypothetical linting_rules
, provide extensible categories (like : deprecated
above).
Keep in mind this is primarily just an exploration for exploration's sake! (I'm happy to contribute if there's interest and demand, but more on that later—all I'll say here is "performance concerns". [EDIT: actually, those seem to have been fixed!]) And if you're interested, read on. :)
Generic syntax rules
This involves, first, making a generic framework for implementing foo_rules
commands.
If you haven't looked at the internals, the way macro_rules
and elab_rules
work under the hood is by attaching an attribute to a declaration. For example,
elab_rules : tactic
| `(fooStx|foo $id) => sorry
expands to (more or less)
@[tactic fooStx]
def auto.generated.name : Syntax → TacticM Unit :=
fun
| `(fooStx|foo $id) => sorry
| _ => no_error_if_unused% throwUnsupportedSyntax
Note that fooStx
is the syntax nodekind. The tactic
attribute is keyed by this nodekind, and later, when elaborating tactic syntax with nodekind fooStx
, we retrieve all declarations tagged with @[tactic fooStx]
(then evaluate each of them on the syntax until we don't get an unsupported syntax exception).
So if we want to implement our own foo_rules
, we'll need to provide
- an attribute (like
tactic
) which is keyed by syntax nodekinds (or a name derived from it) - a type (like
Syntax → TacticM Unit
) that the tagged decl should have (in the form of a name for technical reasons; for example,Syntax → TacticM Unit
is actually the abbrevLean.Elab.Tactic.Tactic
in reality) - a way of turning the
matchAlts
into a term (here, we takealts
to`(term|fun $alts:matchAlt*)
(after we've added the_
case toalts
)).
So the first part of this exploration bundles all that data together into SyntaxRuleData
(name pending); provides a syntax category syntaxRulesHeader
where you can register e.g. "foo_rules" ":" ident
as header syntax for matchAlts
; and provides a way to supply rules for turning that header syntax into SyntaxRuleData
. (These implementations of headers are recorded behind the scenes using an attribute.)
It also implements a couple nice macros, such that this part of implementing a foo_rules
command looks like
-- register `foo_rules : ident` as a header for match alts
syntax (name := fooRulesStx) "foo_rules" ":" ident : syntaxRulesHeader
/- provide the implementation data to use when encountering `foo_rules : $id:ident` as
a header to matchAlts (note that this doesn't include the `alts`) -/
syntax_rules_header
| `(fooRulesStx|foo_rules : $id:ident) => do
...
return (data : SyntaxRuleData)
Under the hood, syntax_rules_header
, likewise, attaches a @[syntax_rules_header_impl fooStx]
attribute. That's how we keep track of generic implementations.
You never need to declare command
syntax directly (after the above, foo_rules : id | alt1 => sorry | alt2 => sorry ...
will already parse as a command, and do what you expect).
To implement this, we actually just have $header:syntaxRulesHeader $alts*
macro-expand to syntax_rules (header := $header) $alts*
, and then implement that by searching for implementations tagged with @[syntax_rules_header_impl _]
that apply to $header
's specific nodekind. (After that, we mirror the structure of elab_rules
/macro_rules
.)
Now, what you do with the attribute you've included in SyntaxRuleData
is totally up to you. All this infrastructure does is infer the syntax nodekind your alts
apply to, insert your data in all the right places to make a decl using those rules, and tag the resulting decls with your specified attribute keyed by that nodekind.
This is the content of the draft PR #11519. (Functional, but many TODOs, and documentation not written yet, as I'm not sure if there's interest from the community.)
Moving on to linting_rules
Now it's easy to implement the core of linting_rules
! The next step is getting the extensible linting_rules
categories like deprecated
working. To do that, we demand a bundle of data come with each linting_rules
category declaration, namely:
- the name of the category (of course)
- the return type the user should be providing for each rule (for instance, in the example at the top, we wanted to return
MessageData
, but a different category thandeprecated
might ask for a different type) - a way to turn that return type into
CommandElabM LintingRuleStep
(whereLintingRuleStep
tells us how to move through syntax: continue or stop descending) - what option turns this linting category on or off (via
set_option
), so we know whether to lint with it - plus some extra
stop
andstart
functions to override behavior, which can be convenient (e.g. stop before entering quotations, or don't even start linting until you're within a certain tactic syntax)
We bundle that data in a structure LintingRulesCat
, and keep track of these categories with a very simple attribute @[linting_rules_cat]
.
Due to some restrictions, we can't actually retrieve a structure that stores Type
s like that and use it where we need to (namely, in CommandElabM
). (See #lean4 > Recover arbitrary type from attribute data .) So in reality, we create decls for the incompatible data, make a different structure LintingRulesCatImpl
behind the scenes which holds the names of those decls, and tag that structure with @[linting_rules_cat]
. [EDIT: If you look at the PR now, this will have changed, since I’m testing different options for performance.] Luckily someone making a linting_rules
category never has to worry about all this, because:
We define a command register_linting_rules_cat <catName> := <catData>
for registering categories, so that you actually only need to provide the data: the conversion to LintingRulesCatImpl
, the addition of @[linting_rules_cat]
, and also the registration of a new option (if you didn't decide to use an existing option) is taken care of for you. For example, the entirety of registering deprecated
is quite simple:
register_linting_rules_cat deprecated := {
Out := MessageData
resolve := fun msg => do
logWarning <| .tagged ``deprecatedAttr msg
return .done
opt := .existing linter.deprecated
}
And it all works out! This lets us lint (and, in this case, deprecate) syntax using match-style rules:
Well...it works out as long as you don't look at performance. linting_rules
is the content of the draft PR #11520 (likewise, functional, but many cleanup TODOs, and all the code is currently in one big file), and the bench results show an increase of around 12% in interpretation. (It might be possible to get this down, though—I'm not sure what the limiting factor is, but I benched a simple dummy syntax-traversing linter (#11523), and I don't think it's intrinsic to traversing syntax.) EDIT: it’s now down to “no significant changes!
Just to reiterate, this is all simply an exploration. If people like it, I'd be happy to get it PR-ready, including adding extensive documentation/step-by-step instructions so that anyone can use it and extend it easily and systematically. But the performance issue might admittedly be a dealbreaker. Though, as we develop (and change) more notation, maybe linting it might be worth it—I'm not sure. Plus, even if organized better for a PR, it is a nontrivial amount of code, entailing nontrivial review. The domain of applicability is rather specialized, too—but there's always more stuff to do with syntax, whether it's linting it, transforming it, formatting it in new ways, identifying features for manipulation, etc.
Either way, it was a great way to learn more about attributes and command elaboration, so feel free to ask questions about the implementation if you're interested and/or let me know if you think this would be useful to have somewhere. :)
Thomas Murrills (Mar 20 2024 at 21:14):
Actually, I believe I’ve fixed the performance issues! #11520 is now at “no significant changes” (~3% increase in interpretation time) according to the latest bench. So I’m considering cleaning this up and PRing it formally. Maybe it could even be optimized further. :)
Damiano Testa (Mar 20 2024 at 21:22):
I do not understand the implementation, but I do think that it is important to have mechanisms that make the code guide through the changes that continuously happen!
Thomas Murrills (Mar 20 2024 at 21:25):
It is a nontrivial amount of stuff going on in the implementation, but happy to answer questions! :)
Also a big part of “cleaning this up” before formally PRing will be including detailed explanations of what’s happening in the docstrings/module docs, as I think that’s essential for maintenance and usability. So hopefully that will help make it understandable!
Last updated: May 02 2025 at 03:31 UTC