Zulip Chat Archive
Stream: lean4
Topic: Parser bug?
Thomas Murrills (Nov 10 2022 at 00:17):
I stumbled on what seems like a strange bug. The following works as expected:
import Lean
namespace Lean.Elab.Term.TestingParser
def aa := leading_parser "a"
def bb := leading_parser "b"
@[term_parser] def ab :=
leading_parser (aa >> Parser.optional Parser.ident) <|> (bb >> Parser.optional Parser.ident)
def parsedIndicator : TSyntax ``ab → Syntax
| `(ab|a$[$x:ident]?) => Syntax.mkNumLit "0"
| `(ab|b$[$x:ident]?) => Syntax.mkNumLit "1"
| _ => Syntax.mkNumLit "2"
@[term_elab ab] def elabab : TermElab := fun stx _ => do
match stx with
| `($x:ab) => elabTerm (parsedIndicator x) (some (← mkConst ``Nat))
| _ => unreachable!
#eval b xy -- 1
but if you disallow optional idents after the bb
parser, it misses the b
case and hits the _
case, as demonstrated by the 2
:
import Lean
namespace Lean.Elab.Term.TestingParser'
def aa := leading_parser "a"
def bb := leading_parser "b"
@[term_parser] def ab :=
leading_parser (aa >> Parser.optional Parser.ident) <|> bb
def parsedIndicator : TSyntax ``ab → Syntax
| `(ab|a$[$x:ident]?) => Syntax.mkNumLit "0"
| `(ab|b) => Syntax.mkNumLit "1"
| _ => Syntax.mkNumLit "2"
@[term_elab ab] def elabab : TermElab := fun stx _ => do
match stx with
| `($x:ab) => elabTerm (parsedIndicator x) (some (← mkConst ``Nat))
| _ => unreachable!
#eval b -- 2
Is this a bug, or just a gotcha? (I made sure I was on the latest nightly before posting.)
Mario Carneiro (Nov 10 2022 at 01:54):
This does appear to be a bug. cc: @Sebastian Ullrich Minimized:
import Lean
open Lean
def aa := leading_parser "a"
def bb := leading_parser "b"
def ab := leading_parser (aa >> aa) <|> bb
def f : TSyntax ``ab → Nat
| `(ab| a $_) => 0
| `(ab| b) => 1
| _ => 2
#print f
def f : TSyntax `ab → Nat :=
fun x =>
let_fun __discr := x;
if Syntax.isOfKind __discr.raw `ab = true then
let_fun __discr_1 := Syntax.getArg __discr.raw 0;
if Syntax.isOfKind __discr_1 `aa = true then
let_fun __discr_2 := Syntax.getArg __discr_1 0;
let_fun __discr := Syntax.getArg __discr.raw 1;
0
else
let_fun __discr_2 := Syntax.getArg __discr.raw 0;
let_fun __discr := Syntax.getArg __discr.raw 1;
2
else
let_fun __discr := x;
if Syntax.isOfKind __discr.raw `ab = true then
let_fun __discr_1 := Syntax.getArg __discr.raw 0;
if Syntax.isOfKind __discr_1 `bb = true then
let_fun __discr := Syntax.getArg __discr_1 0;
1
else
let_fun __discr := Syntax.getArg __discr.raw 0;
2
else
let_fun __discr := x;
2
The else if Syntax.isOfKind __discr.raw `ab = true
case is unreachable, because we already checked that Syntax.isOfKind __discr.raw `ab = true
is false, and the 1
return value only appears in that branch. In other similar syntax matches, this condition is deduplicated and the 1
would end up in the first if
case. I suspect this has to do with the fact that this is using a parser of arity 2 inside a <|>
, which is discouraged.
Sebastian Ullrich (Nov 10 2022 at 08:29):
Mario Carneiro said:
I suspect this has to do with the fact that this is using a parser of arity 2 inside a
<|>
, which is discouraged.
Yes. Don't do that.
Thomas Murrills (Nov 10 2022 at 19:45):
Is the proper workaround making a leading_parser
for every time you’d want to put a parser of arity >1 inside a <|>
? I noticed that worked, but wasn’t sure if this was the best option or if there was something better than <|>
to use in this situation.
Mario Carneiro (Nov 10 2022 at 20:08):
you can use group
Mario Carneiro (Nov 10 2022 at 20:10):
import Lean
open Lean
def aa := leading_parser "a"
def bb := leading_parser "b"
def ab := leading_parser Parser.group (aa >> aa) <|> bb
def f : TSyntax ``ab → Nat
| `(ab| a $_) => 0
| `(ab| b) => 1
| _ => 2
#print f
works as intended
Sebastian Ullrich (Nov 10 2022 at 20:11):
Though for all of this you should use syntax
instead, which automatically solves this issue
Last updated: Dec 20 2023 at 11:08 UTC