Zulip Chat Archive

Stream: lean4

Topic: Preventing Parser Backtracking


Mac (Sep 12 2021 at 21:19):

Is there a way to stop the Lean parser from backtracking after certain in a parse. That is, is there some way to tell the Pratt parser to treat syntax errors after a certain point as just that (syntax errors) rather than trying to parser it with a different (lower priority).

Alternatively, is there a way to tell Lean to use the syntax/errors from the higher priority parser (i.e., the first one tried), rather than the syntax/errors of the lower priority parser (i.e., the last one tried).

Mac (Sep 12 2021 at 21:28):

Here is a #mwe:

declare_syntax_cat foo (behavior := symbol)

macro (priority := low) x:term : foo => x
macro "bar " x:str " baz" : foo => x
macro "foo " x:foo : term => x

#check foo bar "hi" bap
/-
  Errors on `bar` with `unknown identifier 'bar'`
  Instead of on `baz` with `expected 'baz'`
  Also breaks semantic syntax highlighting
-/

Sebastian Ullrich (Sep 12 2021 at 21:56):

Is <|> an option in your case?

Mac (Sep 12 2021 at 22:09):

Sadly, no, the category is very categorical. To be specific, the case in questions is for the augmented do notation of Papyrus.

Mac (Sep 12 2021 at 22:10):

The regular doElem syntax preempts that of the augmented keywords (when syntax errors occur) which makes me think there is a bug when it is just a typo on my part (which I imagine would also flummox end-users). It also makes such typos hard to debug (as the errors are nonsensical).

Mac (Sep 12 2021 at 22:16):

In addition to the uniformative errors, the low priority parser also kicks in when its parse is longer than that of the higher priority parser. This cases a short instruction (e.g., ret i32 0) to get merged with a subsequent assignment (e.g., %foo = load i1, bar) because it ends up parsing them as two terms joined by the % operator.

Mac (Sep 12 2021 at 22:17):

I can likely get around this by just require pure Lean statements to be prefixed with some token (e.g., #), but I was curious if there was a way to avoid such a requirement.

Mario Carneiro (Sep 12 2021 at 23:10):

One approach that is used in the lean grammar is the selective use of notFollowedBy in the fallback parser to make it inapplicable when you use a keyword. So for example your term => doElem' parser would instead be syntax notFollowedBy("ret") term : doElem', where you extend the list to include all keywords you want to avoid. It's not perfect since you are still using an exhaustive list, but perhaps you can at least list your keywords.

Sebastian Ullrich (Sep 13 2021 at 15:20):

I don't think there is. But you are essentially writing a quasiquotation syntax, so I would definitely expect special tokens for introducing antiquotations there.

Mac (Sep 13 2021 at 19:15):

@Sebastian Ullrich Fair enough. I was actually wondering there was a way I could use the antiquotation syntax itself (i.e., $(x)), but I don't think there is a way to use that without also making things like $(x):ident valid, which makes no sense in this context.


Last updated: Dec 20 2023 at 11:08 UTC