Zulip Chat Archive

Stream: mathlib4

Topic: linting_rules (and foo_rules)

Thomas Murrills (Mar 20 2024 at 18:48):

For the past few weeks, I've been exploring the machinery that would let you write something like

linting_rules : deprecated
| `(tactic|test) => return m!"don't use test!"

where the returned message would be shown whenever test was used in a tactic. (This was inspired by the desire to deprecate "old-style" have and tfae_have syntax; currently only declarations can be deprecated. See #mathlib4 > deprecate syntax?)

But more generally, I was curious as to what it would take to implement foo_rules commands in general, and, for a hypothetical linting_rules, provide extensible categories (like : deprecated above).

Keep in mind this is primarily just an exploration for exploration's sake! (I'm happy to contribute if there's interest and demand, but more on that later—all I'll say here is "performance concerns". [EDIT: actually, those seem to have been fixed!]) And if you're interested, read on. :)

Generic syntax rules

This involves, first, making a generic framework for implementing foo_rules commands.

If you haven't looked at the internals, the way macro_rules and elab_rules work under the hood is by attaching an attribute to a declaration. For example,

elab_rules : tactic
| `(fooStx|foo $id) => sorry

expands to (more or less)

@[tactic fooStx]
def auto.generated.name : Syntax → TacticM Unit :=
  fun
  | `(fooStx|foo $id) => sorry
  | _ => no_error_if_unused% throwUnsupportedSyntax

Note that fooStx is the syntax nodekind. The tactic attribute is keyed by this nodekind, and later, when elaborating tactic syntax with nodekind fooStx, we retrieve all declarations tagged with @[tactic fooStx] (then evaluate each of them on the syntax until we don't get an unsupported syntax exception).

So if we want to implement our own foo_rules, we'll need to provide

an attribute (like tactic) which is keyed by syntax nodekinds (or a name derived from it)
a type (like Syntax → TacticM Unit) that the tagged decl should have (in the form of a name for technical reasons; for example, Syntax → TacticM Unit is actually the abbrev Lean.Elab.Tactic.Tactic in reality)
a way of turning the matchAlts into a term (here, we take alts to `(term|fun $alts:matchAlt*) (after we've added the _ case to alts)).

So the first part of this exploration bundles all that data together into SyntaxRuleData (name pending); provides a syntax category syntaxRulesHeader where you can register e.g. "foo_rules" ":" ident as header syntax for matchAlts; and provides a way to supply rules for turning that header syntax into SyntaxRuleData. (These implementations of headers are recorded behind the scenes using an attribute.)

It also implements a couple nice macros, such that this part of implementing a foo_rules command looks like

-- register `foo_rules : ident` as a header for match alts
syntax (name := fooRulesStx) "foo_rules" ":" ident : syntaxRulesHeader

/- provide the implementation data to use when encountering `foo_rules : $id:ident` as
a header to matchAlts (note that this doesn't include the `alts`) -/
syntax_rules_header
| `(fooRulesStx|foo_rules : $id:ident) => do
  ...
  return (data : SyntaxRuleData)

Under the hood, syntax_rules_header, likewise, attaches a @[syntax_rules_header_impl fooStx] attribute. That's how we keep track of generic implementations.

You never need to declare command syntax directly (after the above, foo_rules : id | alt1 => sorry | alt2 => sorry ... will already parse as a command, and do what you expect).

To implement this, we actually just have $header:syntaxRulesHeader $alts* macro-expand to syntax_rules (header := $header) $alts*, and then implement that by searching for implementations tagged with @[syntax_rules_header_impl _] that apply to $header's specific nodekind. (After that, we mirror the structure of elab_rules/macro_rules.)

Now, what you do with the attribute you've included in SyntaxRuleData is totally up to you. All this infrastructure does is infer the syntax nodekind your alts apply to, insert your data in all the right places to make a decl using those rules, and tag the resulting decls with your specified attribute keyed by that nodekind.

This is the content of the draft PR #11519. (Functional, but many TODOs, and documentation not written yet, as I'm not sure if there's interest from the community.)

Moving on to `linting_rules`

Now it's easy to implement the core of linting_rules! The next step is getting the extensible linting_rules categories like deprecated working. To do that, we demand a bundle of data come with each linting_rules category declaration, namely:

the name of the category (of course)
the return type the user should be providing for each rule (for instance, in the example at the top, we wanted to return MessageData, but a different category than deprecated might ask for a different type)
a way to turn that return type into CommandElabM LintingRuleStep (where LintingRuleStep tells us how to move through syntax: continue or stop descending)
what option turns this linting category on or off (via set_option), so we know whether to lint with it
plus some extra stop and start functions to override behavior, which can be convenient (e.g. stop before entering quotations, or don't even start linting until you're within a certain tactic syntax)

We bundle that data in a structure LintingRulesCat, and keep track of these categories with a very simple attribute @[linting_rules_cat].

Due to some restrictions, we can't actually retrieve a structure that stores Types like that and use it where we need to (namely, in CommandElabM). (See #lean4 > Recover arbitrary type from attribute data .) So in reality, we create decls for the incompatible data, make a different structure LintingRulesCatImpl behind the scenes which holds the names of those decls, and tag that structure with @[linting_rules_cat]. [EDIT: If you look at the PR now, this will have changed, since I’m testing different options for performance.] Luckily someone making a linting_rules category never has to worry about all this, because:

We define a command register_linting_rules_cat <catName> := <catData> for registering categories, so that you actually only need to provide the data: the conversion to LintingRulesCatImpl, the addition of @[linting_rules_cat], and also the registration of a new option (if you didn't decide to use an existing option) is taken care of for you. For example, the entirety of registering deprecated is quite simple:

register_linting_rules_cat deprecated := {
  Out := MessageData
  resolve := fun msg => do
    logWarning <| .tagged ``deprecatedAttr msg
    return .done
  opt := .existing linter.deprecated
}

And it all works out! This lets us lint (and, in this case, deprecate) syntax using match-style rules:

image.png

~~Well...it works out as long as you don't look at performance.~~ linting_rules is the content of the draft PR #11520 (likewise, functional, but many cleanup TODOs, and all the code is currently in one big file), and the bench results show an increase of around 12% in interpretation. (It might be possible to get this down, though—I'm not sure what the limiting factor is, but I benched a simple dummy syntax-traversing linter (#11523), and I don't think it's intrinsic to traversing syntax.) EDIT: it’s now down to “no significant changes!

Just to reiterate, this is all simply an exploration. If people like it, I'd be happy to get it PR-ready, including adding extensive documentation/step-by-step instructions so that anyone can use it and extend it easily and systematically. ~~But the performance issue might admittedly be a dealbreaker. Though, as we develop (and change) more notation, maybe linting it might be worth it—I'm not sure.~~ Plus, even if organized better for a PR, it is a nontrivial amount of code, entailing nontrivial review. The domain of applicability is rather specialized, too—but there's always more stuff to do with syntax, whether it's linting it, transforming it, formatting it in new ways, identifying features for manipulation, etc.

Either way, it was a great way to learn more about attributes and command elaboration, so feel free to ask questions about the implementation if you're interested and/or let me know if you think this would be useful to have somewhere. :)

Thomas Murrills (Mar 20 2024 at 21:14):

Actually, I believe I’ve fixed the performance issues! #11520 is now at “no significant changes” (~3% increase in interpretation time) according to the latest bench. So I’m considering cleaning this up and PRing it formally. Maybe it could even be optimized further. :)

Damiano Testa (Mar 20 2024 at 21:22):

I do not understand the implementation, but I do think that it is important to have mechanisms that make the code guide through the changes that continuously happen!

Thomas Murrills (Mar 20 2024 at 21:25):

It is a nontrivial amount of stuff going on in the implementation, but happy to answer questions! :)

Also a big part of “cleaning this up” before formally PRing will be including detailed explanations of what’s happening in the docstrings/module docs, as I think that’s essential for maintenance and usability. So hopefully that will help make it understandable!

Last updated: May 02 2025 at 03:31 UTC