Zulip Chat Archive
Stream: lean4
Topic: Matching Tokens in Macro Patterns
Mac (May 16 2021 at 21:24):
How do I match optional and alternating tokens in macro patterns?
For example, I want to do something like the following:
syntax "foo "? ("bar " <|> "baz ") "ahoy" : term
macro_rules
| `( $[$foo?]? $barz ahoy) => `("ahoy!")
where foo?
matches the optional "foo"
token and barz
matches the "bar "
or "baz "
alternation.
My current partial solutions is to define syntax kinds for the tokens and use them like so:
syntax fooTk := "foo "
syntax barzTk := "bar " <|> "baz "
syntax (fooTk)? barzTk "ahoy" : term
macro_rules
| `($[$foo?:fooTk]? $barz:barzTk ahoy) => `("ahoy!")
But this doesn't work with pre-existing syntaxes that were not defined in this manner. That is, where no such token syntax kind exists. For example, the :=
/where
token alternation in the class
/structure
syntax (and the class
/structure
alternation itself).
Sebastian Ullrich (May 16 2021 at 21:28):
You need to use separate patterns
Mac (May 16 2021 at 21:37):
Really? But that can easily cause exponential blow up? That is, each optional token doubles the number of patterns, each token alternative of size n multiplies the number by n. In my current use case I have 2 optional tokens and 1 alternative of size 2. Thus, I would already need 8 of essentially the same pattern to match. And then I have to duplicate code across the matches as well.
Mario Carneiro (May 16 2021 at 22:22):
is there a catch-all token kind so that you can do the secondary matching in a separate match
statement?
Mario Carneiro (May 16 2021 at 22:24):
something like this:
syntax "foo "? ("bar " <|> "baz ") "ahoy" : term
macro_rules
| `( $[$foo?]? $barz:stx ahoy) =>
match barz with
| `(bar) => `("bar!")
| `(baz) => `("baz!")
Mario Carneiro (May 16 2021 at 22:25):
(in rust, you would use the $e:tt
macro matcher for this purpose, where tt means "token tree" and is roughly equivalent to lean's Syntax
)
Mac (May 17 2021 at 00:33):
Mario Carneiro said:
is there a catch-all token kind so that you can do the secondary matching in a separate
match
statement?
In fact, for these examples (and my uses cases) all I need is a syntax kind that catches all keywords/tokens. I don't need to catch complex multi-token syntaxes. Then I could do something like this:
syntax "foo "? ("bar " <|> "baz ") "ahoy" : term
macro_rules
| `( $[$foo?:tok]? $barz:tok ahoy) => `("ahoy!")
Mac (May 17 2021 at 00:36):
I think a way to match any Syntax.atom
would work for this?
Gabriel Ebner (May 17 2021 at 07:06):
Mario Carneiro said:
something like this:
| `( $[$foo?]? $barz:stx ahoy) =>
How would this (matcher) work? I thought that the category annotations (like $bar:term
) only influence how the syntax in the pattern is parsed, and are completely ignored after that. That is, Syntax
objects don't store their category, so there's nothing to match on.
Mario Carneiro (May 17 2021 at 07:11):
The parser has already been specified, in the syntax
rule. It just doesn't have an elaborator, and this says that when elaborating this term constructor, just take the syntax without further inspection. A lot of lean built in notations just skip the pattern match altogether and use stx[0]
, stx[1]
etc, but this is a middle ground where foo
and ahoy
are matched but barz
is just a raw syntax
Gabriel Ebner (May 17 2021 at 07:16):
Ah, now I see what you mean. But it's still not clear to me how the parser could ever figure out how to parse foo $barz:stx ahoy
.
Gabriel Ebner (May 17 2021 at 07:16):
What's preventing you from doing declare_syntax_cat barz
?
Mario Carneiro (May 17 2021 at 07:24):
Gabriel Ebner said:
Ah, now I see what you mean. But it's still not clear to me how the parser could ever figure out how to parse
foo $barz:stx ahoy
.
The parsing is already done, by syntax "foo "? ("bar " <|> "baz ") "ahoy" : term
. That will produce a syntax tree like node [atom "foo", atom "bar", atom "ahoy"]
when the parser does its thing. Then macro expansion kicks in, and the `( $[$foo?]? $barz:stx ahoy)
pattern will match against this tree and bind atom "bar"
to the barz
variable (and some (atom "foo")
to the foo
variable), equivalent to approximately let barz := stx[1]; let foo := if stx[0].isSome then none else some stx[0];
.
Mario Carneiro (May 17 2021 at 07:27):
Gabriel Ebner said:
What's preventing you from doing
declare_syntax_cat barz
?
That was mentioned as a workaround, but it's a bit cumbersome (not to mention inefficient) to declare extensible syntax categories for every sub-parser; that makes the combinators kind of useless, and doesn't match the way lean itself does it.
Gabriel Ebner (May 17 2021 at 07:32):
Mario Carneiro said:
That will produce a syntax tree like
node [atom "foo", atom "bar", atom "ahoy"]
when the parser does its thing. Then macro expansion kicks in, and the`( $[$foo?]? $barz:stx ahoy)
pattern will match against this tree and bindatom "bar"
to thebarz
variable (andsome (atom "foo")
to thefoo
variable), equivalent to approximatelylet barz := stx[1]; let foo := if stx[0].isSome then none else some stx[0];
.
Ok, this operation makes sense. The syntax is ambiguous though, I would expect it to match a binary postfix operator ahoy
with two arguments $foo and $barz (where barz is in the syntax category stx).
Mario Carneiro (May 17 2021 at 07:34):
ah, I was assuming the example was just trying to have interesting combinators in it
Mario Carneiro (May 17 2021 at 07:34):
I'm not sure how "foo"?
is encoded in syntax, but just use whatever that encoding is
Mario Carneiro (May 17 2021 at 07:35):
I believe that the $[$foo]?
matcher is saying to match the syntax produced by the optional parser into an Option Syntax
Mario Carneiro (May 17 2021 at 07:36):
and that if then else on stx[0]
was my guess at what it does
Mario Carneiro (May 17 2021 at 07:37):
probably the encoding is actually node [stx]
vs node []
Gabriel Ebner (May 17 2021 at 07:38):
This is how you construct and destruct your foo bar ahoy term:
do
let a ←
do
let info ← MonadRef.mkInfoFromRefPos
getCurrMacroScope
getMainModule
pure
(Syntax.node (Name.mkStr Name.anonymous "termFooBarBazAhoy")
(Array.push
(Array.push
(Array.push Array.empty
(Syntax.node (Name.mkStr Name.anonymous "null") (Array.push Array.empty (Syntax.atom info "foo"))))
(Syntax.atom info "bar"))
(Syntax.atom info "ahoy")))
let discr : Syntax := a
if Syntax.isOfKind discr (Name.mkStr Name.anonymous "termFooBarBazAhoy") = true then
let discr : Syntax := Syntax.getArg discr 0;
if Syntax.matchesNull discr 1 = true then pure Unit.unit else
panicWithPosWithDecl "[anonymous]" "_check" 9 11 "unreachable code has been reached"
else panicWithPosWithDecl "[anonymous]" "_check" 9 11 "unreachable code has been reached" : MacroM Unit
Gabriel Ebner (May 17 2021 at 07:39):
I'm just pasting the output of:
open Lean in
#check show MacroM Unit from do
match ← `(foo bar ahoy) with
| `(foo bar ahoy) => ()
| _ => unreachable!
Gabriel Ebner (May 17 2021 at 07:44):
Though, looking closer at this matcher. I believe I've found a bug:
syntax "foo "? ("bar " <|> "baz ") "ahoy" : term
macro_rules
| `( foo bar ahoy) => `("ahoy!")
#check foo baz ahoy -- "ahoy!"
Sebastian Ullrich (May 17 2021 at 07:49):
I should have been more specific: syntax match
explicitly ignores tokens. After all we usually don't care whether you're using =>
or ⇒
. If you do care about a token, put it in its own syntax abbreviation so it gets its own syntax (and antiquotation) kind. If you cannot change the parser, you'll have to match manually.
Gabriel Ebner (May 17 2021 at 07:53):
This is a surprising difference between ?
and <|>
!
Sebastian Ullrich (May 17 2021 at 07:55):
Haha, I agree. Not sure if there is a better matching spec, but for semantic transformations ignoring tokens has worked out very well for us so far.
Sebastian Ullrich (May 17 2021 at 07:55):
For implementing syntactic transformations, we might want a match
flag that makes it care about tokens, but we haven't gotten to that point yet.
Sebastian Ullrich (May 17 2021 at 20:28):
Perhaps it would have been nicer to always check tokens and define any alternative notation as a macro for the "canonical" one. That would be a nontrivial change at this point though.
Last updated: Dec 20 2023 at 11:08 UTC