Zulip Chat Archive

Stream: metaprogramming / tactics

Topic: parsing specific ident


Xavier Martinet (Mar 11 2022 at 15:41):

Hi,

I would like to parse not a tk but a specific ident, something whose syntax would be:

meta def lean.parser.specific_ident (nm : name) : lean.parser.name

I tried with things like this but it didn't work :(

meta def lean.parser.specific_ident (nm : name) : lean.parser name := do
  ident  lean.parser.ident,
  guardb (ident = nm),
  pure nm

#eval tactic.trace $ lean.parser.run_with_input (lean.parser.specific_ident `aa <|> pure `zz) "aa"
-- aa
#eval tactic.trace $ lean.parser.run_with_input (lean.parser.specific_ident `aa <|> pure `zz) "bb"
-- failed

-- but:
#eval tactic.trace $ lean.parser.run_with_input (failure <|> pure `zz) "bb"
-- zz
-- and even with guarb:
#eval tactic.trace $ lean.parser.run_with_input (guardb ff *> pure `aa <|> pure `zz) "bb"
-- zz

what is wrong? how come the failure from guardb inside lean.parser.specific_ident is not similar to the guardb when alone?

Arthur Paulino (Mar 13 2022 at 18:50):

Hi @Xavier Martinet! Sorry for the ultra late response. I am on my way back home today but I think I can try to help you tomorrow if nobody else does it first

Arthur Paulino (Mar 13 2022 at 18:56):

In the meantime, I see that you have two unanswered questions (two recent threads). Do you still need help on both of them? Have you made any progress since then?

Xavier Martinet (Mar 14 2022 at 10:37):

Arthur Paulino said:

In the meantime, I see that you have two unanswered questions (two recent threads). Do you still need help on both of them? Have you made any progress since then?

Thanks a million times! Actually the second one was connected to the first one, so if you help me on this one I should be fine :)

Jannis Limperg (Mar 14 2022 at 11:13):

I don't know about the Lean 3 parser, but some other parser combinator libraries do backtracking only for 'atomic' failures. That is, the parser p <|> q works like this:

  • If p succeeds, return its result.
  • If p fails without consuming input, try q.
  • If p fails and has already consumed input, fail without trying q.

Afaict that would explain the observed behaviour. If this is the issue, there should be a combinator which takes a parser p and acts like p, only that non-atomic failure is treated as atomic failure.

Yakov Pechersky (Mar 14 2022 at 12:06):

In regular parsing, it's called "try". I don't remember the lean.parser name but there is try_core iirc

Xavier Martinet (Mar 14 2022 at 16:00):

thx. Actually lean.parser.ident does consume input, but is still compatible with <|> :(

Xavier Martinet (Mar 14 2022 at 16:03):

Yakov Pechersky said:

In regular parsing, it's called "try". I don't remember the lean.parser name but there is try_core iirc

try_core is a tactic monad, not a lean.parser one. However it seems that its code should be reusable for a lean.parser - I will try and let you know

Yakov Pechersky (Mar 14 2022 at 17:29):

What are the underlying differences between lean.parser and just parser?

Xavier Martinet (Mar 14 2022 at 17:52):

it seems that parser is not used much in metaprogramming... lean.parser is an instance of the interaction_monad, that keeps track of the pos, from what I can see

Xavier Martinet (Mar 14 2022 at 18:07):

Ok, it seems that the issue comes from the parser_state.cur_pos.

In <|> aka lean.parser.parser_orelse, there is a check if pos₁ ≠ pos₂ then interaction_monad.result.exception e₁ ref₁ s'. If we got rid of it (see below), it works fine:

meta def lean.parser.parser_orelse' {α : Type}(p₁ p₂ : lean.parser α) : lean.parser α :=
λ s,
let pos₁ := parser_state.cur_pos s in
result.cases_on (p₁ s)
  interaction_monad.result.success
  -- (λ e₁ ref₁ s',
  --   let pos₂ := parser_state.cur_pos s' in
  --   if pos₁ ≠ pos₂ then
  --     interaction_monad.result.exception e₁ ref₁ s'
  --   else result.cases_on (p₂ s)
  --    interaction_monad.result.success
  --    interaction_monad.result.exception)
  (λ e₁ ref₁ s',
    let pos₂ := parser_state.cur_pos s' in
    result.cases_on (p₂ s)
     interaction_monad.result.success
     interaction_monad.result.exception)

meta def lean.parser.specific_ident (nm : name) : lean.parser name := do
  ident  lean.parser.ident,
  guardb (ident = nm),
  pure nm

#eval tactic.trace $ lean.parser.run_with_input (lean.parser.parser_orelse' (lean.parser.specific_ident `aa) (pure `zz)) "bb cc"
-- zz
#eval tactic.trace $ lean.parser.run_with_input (lean.parser.parser_orelse (lean.parser.specific_ident `aa) (pure `zz)) "bb cc"
-- failed

So I wonder: how to set lean_parser.cur_pos ?

Xavier Martinet (Mar 14 2022 at 18:10):

I think this lean_parser.cur_pos issue stems from the fact that lean.parser.ident does consume the string, as @Jannis Limperg mentionned. There should be a way not to consume the string, with a try_core as @Yakov Pechersky suggested. I will try it.

Jannis Limperg (Mar 14 2022 at 18:22):

I just looked through core and mathlib and I can't find this combinator. In fact, I can't even find the primitives you'd need to define this combinator. So this might actually be impossible (short of patching Lean). Maybe tk works for you?

Xavier Martinet (Mar 14 2022 at 18:28):

nope: tk takes a "registered" token, such as "at", not any string :(

Jannis Limperg (Mar 14 2022 at 18:51):

Okay, then I think your only option is to use your variant of parser_orelse. It's not pretty but it probably works.

Xavier Martinet (Mar 14 2022 at 19:00):

Even weirder: I don't understand AT ALL what is going on there.

In case of an exception, I store the previous parser_state, but for some reasons (parser_state not being pure?), it is not frozen:

meta def lean.parser.specific_ident (nm : name) : lean.parser name := λ s,
  let pos := parser_state.cur_pos s in
  result.cases_on (lean.parser.ident s)
    (λ nm' s',
      if nm' = nm then
        interaction_monad.result.success nm' s'
      else
        let new_pos := parser_state.cur_pos s in
        interaction_monad.result.exception
          (option.some $ λ _, format!"init: ({pos.line}, {pos.column}) - end: ({new_pos.line}, {new_pos.column})")
          pos
          s -- we save the previous state s as in `tactic.try_core` in the hope it gets frozen
    )
    (interaction_monad.result.exception)


meta def lean.parser.parser_orelse' {α : Type}(p₁ p₂ : lean.parser α) : lean.parser α :=
λ s,
let pos₁ := parser_state.cur_pos s in
result.cases_on (p₁ s)
  interaction_monad.result.success
  (λ e₁ ref₁ s',
    let pos₂ := parser_state.cur_pos s' in
    if pos₁  pos₂ then
      let e₁' := match e₁ with
      | option.none := "none"
      | option.some f := to_string $ f ()
      end in
      interaction_monad.result.exception (option.some $ λ _, format!"got exc: \"{e₁'}\" - init ({pos₁.line}, {pos₁.column}) ≠ end ({pos₂.line}, {pos₂.column}) (should be {ref₁})") ref₁ s'
    else result.cases_on (p₂ s)
     interaction_monad.result.success
     interaction_monad.result.exception)

#eval tactic.trace $ lean.parser.run_with_input (lean.parser.parser_orelse' (lean.parser.specific_ident `aa) (pure `zz)) "aa cc"
-- aa
#eval tactic.trace $ lean.parser.run_with_input (lean.parser.parser_orelse' (lean.parser.specific_ident `aa) (pure `zz)) "bb cc"
-- got exc: "init: (1, 0) - end: (1, 0)" - init (1, 0) ≠ end (1, 3) (should be (some ⟨1, 0⟩))
-- `s` is not frozen at all!
#eval tactic.trace $ lean.parser.run_with_input (lean.parser.parser_orelse (lean.parser.specific_ident `aa) (pure `zz)) "bb cc"
-- init: (1, 0) - end: (1, 0)

I mean, what the hell is going on there? is that because internally, the parser_state is some C++ pointer whose pointed value gets mutated, no matter what? If so, then how does lean.parser.with_input works ?

Gabriel Ebner (Mar 14 2022 at 19:17):

is that because internally, the parser_state is some C++ pointer whose pointed value gets mutated, no matter what?

Yes, that's indeed what's happening.

Xavier Martinet (Mar 14 2022 at 19:20):

Gabriel Ebner said:

is that because internally, the parser_state is some C++ pointer whose pointed value gets mutated, no matter what?

Yes, that's indeed what's happening.

is there a workaround? either to "freeze" the parser_state, or to set it to a specific value?

Gabriel Ebner (Mar 14 2022 at 19:21):

No obvious workaround. What do you actually want to parse?

Gabriel Ebner (Mar 14 2022 at 19:23):

I can think of one horrible hack: if you know the original string, then maybe you could cur_pos and with_input to temporarily switch the input to a substring of the original string starting with the current position.

Xavier Martinet (Mar 14 2022 at 19:23):

I would like to parse itactic such as "try {norm_num [] }", to have them simplified and written in a more "canonical" form

e.g., if try succeeds, it would be just "norm_num"

Xavier Martinet (Mar 14 2022 at 19:25):

Gabriel Ebner said:

I can think of one horrible hack: if you know the original string, then maybe you could cur_pos and with_input to temporarily switch the input to a substring of the original string starting with the current position.

I have thought about that, but it felt really ugly lol

maybe there is just no other way :( or get rid of the parser altogether and work with heuristic ("if the string starts with a try, then parse after char 4")

Gabriel Ebner (Mar 14 2022 at 19:25):

Another option is to factor out the first ident parser (which is I believe the only one where you need to check the name). Something like do i ← ident, dispatch_on_ident i.

Gabriel Ebner (Mar 14 2022 at 19:27):

On the C++ side there's the get_token_info function on the parser class which returns the info for the next token. Not sure if it's worth the work but it's possible to expose that on the Lean side.

Xavier Martinet (Mar 14 2022 at 19:27):

yup

still, it is sad that in Lean 3 there are things like that that are not inspectable within Lean itself :( hopefully Lean 4 is coming!

Mario Carneiro (Mar 14 2022 at 20:53):

Lean 4 is already "here" enough to play with things like this, and I would expect that you can do this kind of normalization since it isn't too different from what mathport is already doing


Last updated: Dec 20 2023 at 11:08 UTC