Zulip Chat Archive

Stream: lean4

Topic: Matching Tokens in Macro Patterns


Mac (May 16 2021 at 21:24):

How do I match optional and alternating tokens in macro patterns?

For example, I want to do something like the following:

syntax  "foo "? ("bar " <|> "baz ") "ahoy" : term
macro_rules
  | `( $[$foo?]? $barz ahoy) => `("ahoy!")

where foo? matches the optional "foo" token and barz matches the "bar " or "baz " alternation.

My current partial solutions is to define syntax kinds for the tokens and use them like so:

syntax fooTk := "foo "
syntax barzTk := "bar " <|> "baz "

syntax (fooTk)? barzTk "ahoy" : term
macro_rules
  | `($[$foo?:fooTk]? $barz:barzTk ahoy) => `("ahoy!")

But this doesn't work with pre-existing syntaxes that were not defined in this manner. That is, where no such token syntax kind exists. For example, the :=/where token alternation in the class/structure syntax (and the class/structure alternation itself).

Sebastian Ullrich (May 16 2021 at 21:28):

You need to use separate patterns

Mac (May 16 2021 at 21:37):

Really? But that can easily cause exponential blow up? That is, each optional token doubles the number of patterns, each token alternative of size n multiplies the number by n. In my current use case I have 2 optional tokens and 1 alternative of size 2. Thus, I would already need 8 of essentially the same pattern to match. And then I have to duplicate code across the matches as well.

Mario Carneiro (May 16 2021 at 22:22):

is there a catch-all token kind so that you can do the secondary matching in a separate match statement?

Mario Carneiro (May 16 2021 at 22:24):

something like this:

syntax  "foo "? ("bar " <|> "baz ") "ahoy" : term
macro_rules
  | `( $[$foo?]? $barz:stx ahoy) =>
    match barz with
    | `(bar) => `("bar!")
    | `(baz) => `("baz!")

Mario Carneiro (May 16 2021 at 22:25):

(in rust, you would use the $e:tt macro matcher for this purpose, where tt means "token tree" and is roughly equivalent to lean's Syntax)

Mac (May 17 2021 at 00:33):

Mario Carneiro said:

is there a catch-all token kind so that you can do the secondary matching in a separate match statement?

In fact, for these examples (and my uses cases) all I need is a syntax kind that catches all keywords/tokens. I don't need to catch complex multi-token syntaxes. Then I could do something like this:

syntax  "foo "? ("bar " <|> "baz ") "ahoy" : term
macro_rules
  | `( $[$foo?:tok]? $barz:tok ahoy) => `("ahoy!")

Mac (May 17 2021 at 00:36):

I think a way to match any Syntax.atom would work for this?

Gabriel Ebner (May 17 2021 at 07:06):

Mario Carneiro said:

something like this:

  | `( $[$foo?]? $barz:stx ahoy) =>

How would this (matcher) work? I thought that the category annotations (like $bar:term) only influence how the syntax in the pattern is parsed, and are completely ignored after that. That is, Syntax objects don't store their category, so there's nothing to match on.

Mario Carneiro (May 17 2021 at 07:11):

The parser has already been specified, in the syntax rule. It just doesn't have an elaborator, and this says that when elaborating this term constructor, just take the syntax without further inspection. A lot of lean built in notations just skip the pattern match altogether and use stx[0], stx[1] etc, but this is a middle ground where foo and ahoy are matched but barz is just a raw syntax

Gabriel Ebner (May 17 2021 at 07:16):

Ah, now I see what you mean. But it's still not clear to me how the parser could ever figure out how to parse foo $barz:stx ahoy.

Gabriel Ebner (May 17 2021 at 07:16):

What's preventing you from doing declare_syntax_cat barz?

Mario Carneiro (May 17 2021 at 07:24):

Gabriel Ebner said:

Ah, now I see what you mean. But it's still not clear to me how the parser could ever figure out how to parse foo $barz:stx ahoy.

The parsing is already done, by syntax "foo "? ("bar " <|> "baz ") "ahoy" : term. That will produce a syntax tree like node [atom "foo", atom "bar", atom "ahoy"] when the parser does its thing. Then macro expansion kicks in, and the `( $[$foo?]? $barz:stx ahoy) pattern will match against this tree and bind atom "bar" to the barz variable (and some (atom "foo") to the foo variable), equivalent to approximately let barz := stx[1]; let foo := if stx[0].isSome then none else some stx[0];.

Mario Carneiro (May 17 2021 at 07:27):

Gabriel Ebner said:

What's preventing you from doing declare_syntax_cat barz?

That was mentioned as a workaround, but it's a bit cumbersome (not to mention inefficient) to declare extensible syntax categories for every sub-parser; that makes the combinators kind of useless, and doesn't match the way lean itself does it.

Gabriel Ebner (May 17 2021 at 07:32):

Mario Carneiro said:

That will produce a syntax tree like node [atom "foo", atom "bar", atom "ahoy"] when the parser does its thing. Then macro expansion kicks in, and the `( $[$foo?]? $barz:stx ahoy) pattern will match against this tree and bind atom "bar" to the barz variable (and some (atom "foo") to the foo variable), equivalent to approximately let barz := stx[1]; let foo := if stx[0].isSome then none else some stx[0];.

Ok, this operation makes sense. The syntax is ambiguous though, I would expect it to match a binary postfix operator ahoy with two arguments $foo and $barz (where barz is in the syntax category stx).

Mario Carneiro (May 17 2021 at 07:34):

ah, I was assuming the example was just trying to have interesting combinators in it

Mario Carneiro (May 17 2021 at 07:34):

I'm not sure how "foo"? is encoded in syntax, but just use whatever that encoding is

Mario Carneiro (May 17 2021 at 07:35):

I believe that the $[$foo]? matcher is saying to match the syntax produced by the optional parser into an Option Syntax

Mario Carneiro (May 17 2021 at 07:36):

and that if then else on stx[0] was my guess at what it does

Mario Carneiro (May 17 2021 at 07:37):

probably the encoding is actually node [stx] vs node []

Gabriel Ebner (May 17 2021 at 07:38):

This is how you construct and destruct your foo bar ahoy term:

  do
    let a 
      do
        let info  MonadRef.mkInfoFromRefPos
        getCurrMacroScope
        getMainModule
        pure
            (Syntax.node (Name.mkStr Name.anonymous "termFooBarBazAhoy")
              (Array.push
                (Array.push
                  (Array.push Array.empty
                    (Syntax.node (Name.mkStr Name.anonymous "null") (Array.push Array.empty (Syntax.atom info "foo"))))
                  (Syntax.atom info "bar"))
                (Syntax.atom info "ahoy")))
    let discr : Syntax := a
    if Syntax.isOfKind discr (Name.mkStr Name.anonymous "termFooBarBazAhoy") = true then
        let discr : Syntax := Syntax.getArg discr 0;
        if Syntax.matchesNull discr 1 = true then pure Unit.unit else
          panicWithPosWithDecl "[anonymous]" "_check" 9 11 "unreachable code has been reached"
      else panicWithPosWithDecl "[anonymous]" "_check" 9 11 "unreachable code has been reached" : MacroM Unit

Gabriel Ebner (May 17 2021 at 07:39):

I'm just pasting the output of:

open Lean in
#check show MacroM Unit from do
  match  `(foo bar ahoy) with
    | `(foo bar ahoy) => ()
    | _ => unreachable!

Gabriel Ebner (May 17 2021 at 07:44):

Though, looking closer at this matcher. I believe I've found a bug:

syntax  "foo "? ("bar " <|> "baz ") "ahoy" : term
macro_rules
  | `( foo bar ahoy) => `("ahoy!")

#check foo baz ahoy -- "ahoy!"

Sebastian Ullrich (May 17 2021 at 07:49):

I should have been more specific: syntax match explicitly ignores tokens. After all we usually don't care whether you're using => or . If you do care about a token, put it in its own syntax abbreviation so it gets its own syntax (and antiquotation) kind. If you cannot change the parser, you'll have to match manually.

Gabriel Ebner (May 17 2021 at 07:53):

This is a surprising difference between ? and <|>!

Sebastian Ullrich (May 17 2021 at 07:55):

Haha, I agree. Not sure if there is a better matching spec, but for semantic transformations ignoring tokens has worked out very well for us so far.

Sebastian Ullrich (May 17 2021 at 07:55):

For implementing syntactic transformations, we might want a match flag that makes it care about tokens, but we haven't gotten to that point yet.

Sebastian Ullrich (May 17 2021 at 20:28):

Perhaps it would have been nicer to always check tokens and define any alternative notation as a macro for the "canonical" one. That would be a nontrivial change at this point though.


Last updated: Dec 20 2023 at 11:08 UTC