Zulip Chat Archive

Stream: lean4

Topic: Parser bug?


Thomas Murrills (Nov 10 2022 at 00:17):

I stumbled on what seems like a strange bug. The following works as expected:

import Lean
namespace Lean.Elab.Term.TestingParser

def aa := leading_parser "a"
def bb := leading_parser "b"
@[term_parser] def ab :=
  leading_parser (aa >> Parser.optional Parser.ident) <|> (bb >> Parser.optional Parser.ident)

def parsedIndicator : TSyntax ``ab  Syntax
| `(ab|a$[$x:ident]?) => Syntax.mkNumLit "0"
| `(ab|b$[$x:ident]?) => Syntax.mkNumLit "1"
| _ => Syntax.mkNumLit "2"

@[term_elab ab] def elabab : TermElab := fun stx _ => do
  match stx with
  | `($x:ab) => elabTerm (parsedIndicator x) (some ( mkConst ``Nat))
  | _ => unreachable!

#eval b xy    -- 1

but if you disallow optional idents after the bb parser, it misses the b case and hits the _ case, as demonstrated by the 2:

import Lean
namespace Lean.Elab.Term.TestingParser'

def aa := leading_parser "a"
def bb := leading_parser "b"
@[term_parser] def ab :=
  leading_parser (aa >> Parser.optional Parser.ident) <|> bb

def parsedIndicator : TSyntax ``ab  Syntax
| `(ab|a$[$x:ident]?) => Syntax.mkNumLit "0"
| `(ab|b) => Syntax.mkNumLit "1"
| _ => Syntax.mkNumLit "2"

@[term_elab ab] def elabab : TermElab := fun stx _ => do
  match stx with
  | `($x:ab) => elabTerm (parsedIndicator x) (some ( mkConst ``Nat))
  | _ => unreachable!

#eval b    -- 2

Is this a bug, or just a gotcha? (I made sure I was on the latest nightly before posting.)

Mario Carneiro (Nov 10 2022 at 01:54):

This does appear to be a bug. cc: @Sebastian Ullrich Minimized:

import Lean
open Lean

def aa := leading_parser "a"
def bb := leading_parser "b"
def ab := leading_parser (aa >> aa) <|> bb

def f : TSyntax ``ab  Nat
  | `(ab| a $_) => 0
  | `(ab| b) => 1
  | _ => 2
#print f
def f : TSyntax `ab  Nat :=
fun x =>
  let_fun __discr := x;
  if Syntax.isOfKind __discr.raw `ab = true then
    let_fun __discr_1 := Syntax.getArg __discr.raw 0;
    if Syntax.isOfKind __discr_1 `aa = true then
      let_fun __discr_2 := Syntax.getArg __discr_1 0;
      let_fun __discr := Syntax.getArg __discr.raw 1;
      0
    else
      let_fun __discr_2 := Syntax.getArg __discr.raw 0;
      let_fun __discr := Syntax.getArg __discr.raw 1;
      2
  else
    let_fun __discr := x;
    if Syntax.isOfKind __discr.raw `ab = true then
      let_fun __discr_1 := Syntax.getArg __discr.raw 0;
      if Syntax.isOfKind __discr_1 `bb = true then
        let_fun __discr := Syntax.getArg __discr_1 0;
        1
      else
        let_fun __discr := Syntax.getArg __discr.raw 0;
        2
    else
      let_fun __discr := x;
      2

The else if Syntax.isOfKind __discr.raw `ab = true case is unreachable, because we already checked that Syntax.isOfKind __discr.raw `ab = true is false, and the 1 return value only appears in that branch. In other similar syntax matches, this condition is deduplicated and the 1 would end up in the first if case. I suspect this has to do with the fact that this is using a parser of arity 2 inside a <|>, which is discouraged.

Sebastian Ullrich (Nov 10 2022 at 08:29):

Mario Carneiro said:

I suspect this has to do with the fact that this is using a parser of arity 2 inside a <|>, which is discouraged.

Yes. Don't do that.

Thomas Murrills (Nov 10 2022 at 19:45):

Is the proper workaround making a leading_parser for every time you’d want to put a parser of arity >1 inside a <|>? I noticed that worked, but wasn’t sure if this was the best option or if there was something better than <|> to use in this situation.

Mario Carneiro (Nov 10 2022 at 20:08):

you can use group

Mario Carneiro (Nov 10 2022 at 20:10):

import Lean
open Lean

def aa := leading_parser "a"
def bb := leading_parser "b"
def ab := leading_parser Parser.group (aa >> aa) <|> bb

def f : TSyntax ``ab  Nat
  | `(ab| a $_) => 0
  | `(ab| b) => 1
  | _ => 2
#print f

works as intended

Sebastian Ullrich (Nov 10 2022 at 20:11):

Though for all of this you should use syntax instead, which automatically solves this issue


Last updated: Dec 20 2023 at 11:08 UTC