Zulip Chat Archive
Stream: lean4
Topic: Lean fails to parse simple language
Jason Ganz (Nov 25 2023 at 13:08):
I'm defining a DSL, but Lean is having trouble parsing it. I've reduced the problem to a minimal test case:
declare_syntax_cat thing
syntax "thing% " thing* : term
syntax "a" thing "b" thing : thing
syntax thing "b" thing : thing
syntax ident : thing
example := thing% a x b x x
I get an expected 'b' error on the third x. But there is a correct parse as (a x b x) x. What's going on? Can this be fixed, to have it correctly parse my context-free language?
James Gallicchio (Nov 25 2023 at 13:24):
I think this language is still ambiguous if you have something like a x b x b x, right? If I make it unambiguous by making x b x too low a precedence to occur in the left of a _ b _, Lean parses fine:
declare_syntax_cat thing
syntax "thing% " thing : term
syntax:100 "a" thing:100 "b" thing : thing
syntax:90 thing "b" thing : thing
syntax:100 ident : thing
James Gallicchio (Nov 25 2023 at 13:31):
although I'm not sure how to get the left recursive variant of this working...
declare_syntax_cat thing
syntax "thing% " thing : term
syntax:100 "a" thing:90 "b" thing:100 : thing
syntax:90 thing:100 "b" thing:90 : thing
syntax:100 ident : thing
Jason Ganz (Nov 25 2023 at 13:33):
I see, the language is in fact ambiguous, but I didn't realize this would prevent Lean from parsing even unambiguous expressions. Thanks for the info!
James Gallicchio (Nov 25 2023 at 13:39):
I'm not sure how ambiguity has to do with what grammars Lean can/can't parse. You can definitely parse ambiguous grammars, and unambiguous grammars can definitely not parse the way you want (see above). So someone with more knowledge than me should chime in.
I think the _ b _ rule needs some extra combinators or something. I'm not sure how left recursion is handled.
Mac Malone (Nov 28 2023 at 00:36):
@Jason Ganz For your original example, the way Lean will end up parsing a x b x x as a thing is as follows:
- As
thingis a syntax category, it will select the longest matching parse of its alternatives (if equal, chooses the successful parse with the highest priority). Lean parses category syntax via a single leading parser and many optional trailing parsers. Trailing parser are those whose LHS is recursion (e.g.,thing "b" thing). This approach is Pratt parsing. - As the expression starts with the symbol
a, the only possible leading parser is"a" thing "b" thing. - It will then recurse on the first thing in
"a" **thing** "b" thing. - Since the next token is
x, the only possible leading parser isident. - Finished with the level 2 leading
thing, Lean will then attempt to match zero or more trailing"b" thing(i.e., parse manything "b" thing) . - It matches the
band recurses on the followingthing. - Once again, since the next token is
x, the only possible leading parser isident. - As there are no longer any matching trailing parsers (i.e., more trailing
"b" thing), Lean will stop and work back up the parse call stack. - Prior to this at failure this point, Lean has parsed
a x b xand the parse call stack is currently:"a" **thing** "b" thing->thing "b" **thing**(LHS parsed as theident,x) ->ident. - It thus works back up to
"a" **thing** "b" thingwith the finishedthingbeingx b x. - Now, the next required token is
b, but the next token in the expression isx, hence the error.
Mac Malone (Nov 28 2023 at 00:46):
More visually:
a x b x x
----------------------------------
"a" thing ~"b" thing~
|
ident ["b" thing] [~"b" thing~]
|
ident [~"b" thing~]
[..] is the RHS of the attempted trailing parser, and ~...~ is a failed match.
Last updated: May 02 2025 at 03:31 UTC