Zulip Chat Archive
Stream: metaprogramming / tactics
Topic: parsing specific ident
Xavier Martinet (Mar 11 2022 at 15:41):
Hi,
I would like to parse not a tk
but a specific ident
, something whose syntax would be:
meta def lean.parser.specific_ident (nm : name) : lean.parser.name
I tried with things like this but it didn't work :(
meta def lean.parser.specific_ident (nm : name) : lean.parser name := do
ident ← lean.parser.ident,
guardb (ident = nm),
pure nm
#eval tactic.trace $ lean.parser.run_with_input (lean.parser.specific_ident `aa <|> pure `zz) "aa"
-- aa
#eval tactic.trace $ lean.parser.run_with_input (lean.parser.specific_ident `aa <|> pure `zz) "bb"
-- failed
-- but:
#eval tactic.trace $ lean.parser.run_with_input (failure <|> pure `zz) "bb"
-- zz
-- and even with guarb:
#eval tactic.trace $ lean.parser.run_with_input (guardb ff *> pure `aa <|> pure `zz) "bb"
-- zz
what is wrong? how come the failure
from guardb
inside lean.parser.specific_ident
is not similar to the guardb
when alone?
Arthur Paulino (Mar 13 2022 at 18:50):
Hi @Xavier Martinet! Sorry for the ultra late response. I am on my way back home today but I think I can try to help you tomorrow if nobody else does it first
Arthur Paulino (Mar 13 2022 at 18:56):
In the meantime, I see that you have two unanswered questions (two recent threads). Do you still need help on both of them? Have you made any progress since then?
Xavier Martinet (Mar 14 2022 at 10:37):
Arthur Paulino said:
In the meantime, I see that you have two unanswered questions (two recent threads). Do you still need help on both of them? Have you made any progress since then?
Thanks a million times! Actually the second one was connected to the first one, so if you help me on this one I should be fine :)
Jannis Limperg (Mar 14 2022 at 11:13):
I don't know about the Lean 3 parser, but some other parser combinator libraries do backtracking only for 'atomic' failures. That is, the parser p <|> q
works like this:
- If
p
succeeds, return its result. - If
p
fails without consuming input, tryq
. - If
p
fails and has already consumed input, fail without tryingq
.
Afaict that would explain the observed behaviour. If this is the issue, there should be a combinator which takes a parser p
and acts like p
, only that non-atomic failure is treated as atomic failure.
Yakov Pechersky (Mar 14 2022 at 12:06):
In regular parsing, it's called "try". I don't remember the lean.parser name but there is try_core iirc
Xavier Martinet (Mar 14 2022 at 16:00):
thx. Actually lean.parser.ident
does consume input, but is still compatible with <|>
:(
Xavier Martinet (Mar 14 2022 at 16:03):
Yakov Pechersky said:
In regular parsing, it's called "try". I don't remember the lean.parser name but there is try_core iirc
try_core
is a tactic
monad, not a lean.parser
one. However it seems that its code should be reusable for a lean.parser
- I will try and let you know
Yakov Pechersky (Mar 14 2022 at 17:29):
What are the underlying differences between lean.parser
and just parser
?
Xavier Martinet (Mar 14 2022 at 17:52):
it seems that parser
is not used much in metaprogramming... lean.parser
is an instance of the interaction_monad
, that keeps track of the pos
, from what I can see
Xavier Martinet (Mar 14 2022 at 18:07):
Ok, it seems that the issue comes from the parser_state.cur_pos
.
In <|>
aka lean.parser.parser_orelse
, there is a check if pos₁ ≠ pos₂ then interaction_monad.result.exception e₁ ref₁ s'
. If we got rid of it (see below), it works fine:
meta def lean.parser.parser_orelse' {α : Type}(p₁ p₂ : lean.parser α) : lean.parser α :=
λ s,
let pos₁ := parser_state.cur_pos s in
result.cases_on (p₁ s)
interaction_monad.result.success
-- (λ e₁ ref₁ s',
-- let pos₂ := parser_state.cur_pos s' in
-- if pos₁ ≠ pos₂ then
-- interaction_monad.result.exception e₁ ref₁ s'
-- else result.cases_on (p₂ s)
-- interaction_monad.result.success
-- interaction_monad.result.exception)
(λ e₁ ref₁ s',
let pos₂ := parser_state.cur_pos s' in
result.cases_on (p₂ s)
interaction_monad.result.success
interaction_monad.result.exception)
meta def lean.parser.specific_ident (nm : name) : lean.parser name := do
ident ← lean.parser.ident,
guardb (ident = nm),
pure nm
#eval tactic.trace $ lean.parser.run_with_input (lean.parser.parser_orelse' (lean.parser.specific_ident `aa) (pure `zz)) "bb cc"
-- zz
#eval tactic.trace $ lean.parser.run_with_input (lean.parser.parser_orelse (lean.parser.specific_ident `aa) (pure `zz)) "bb cc"
-- failed
So I wonder: how to set lean_parser.cur_pos
?
Xavier Martinet (Mar 14 2022 at 18:10):
I think this lean_parser.cur_pos
issue stems from the fact that lean.parser.ident
does consume the string, as @Jannis Limperg mentionned. There should be a way not to consume the string, with a try_core
as @Yakov Pechersky suggested. I will try it.
Jannis Limperg (Mar 14 2022 at 18:22):
I just looked through core and mathlib and I can't find this combinator. In fact, I can't even find the primitives you'd need to define this combinator. So this might actually be impossible (short of patching Lean). Maybe tk
works for you?
Xavier Martinet (Mar 14 2022 at 18:28):
nope: tk
takes a "registered" token, such as "at", not any string :(
Jannis Limperg (Mar 14 2022 at 18:51):
Okay, then I think your only option is to use your variant of parser_orelse
. It's not pretty but it probably works.
Xavier Martinet (Mar 14 2022 at 19:00):
Even weirder: I don't understand AT ALL what is going on there.
In case of an exception, I store the previous parser_state
, but for some reasons (parser_state
not being pure?), it is not frozen:
meta def lean.parser.specific_ident (nm : name) : lean.parser name := λ s,
let pos := parser_state.cur_pos s in
result.cases_on (lean.parser.ident s)
(λ nm' s',
if nm' = nm then
interaction_monad.result.success nm' s'
else
let new_pos := parser_state.cur_pos s in
interaction_monad.result.exception
(option.some $ λ _, format!"init: ({pos.line}, {pos.column}) - end: ({new_pos.line}, {new_pos.column})")
pos
s -- we save the previous state s as in `tactic.try_core` in the hope it gets frozen
)
(interaction_monad.result.exception)
meta def lean.parser.parser_orelse' {α : Type}(p₁ p₂ : lean.parser α) : lean.parser α :=
λ s,
let pos₁ := parser_state.cur_pos s in
result.cases_on (p₁ s)
interaction_monad.result.success
(λ e₁ ref₁ s',
let pos₂ := parser_state.cur_pos s' in
if pos₁ ≠ pos₂ then
let e₁' := match e₁ with
| option.none := "none"
| option.some f := to_string $ f ()
end in
interaction_monad.result.exception (option.some $ λ _, format!"got exc: \"{e₁'}\" - init ({pos₁.line}, {pos₁.column}) ≠ end ({pos₂.line}, {pos₂.column}) (should be {ref₁})") ref₁ s'
else result.cases_on (p₂ s)
interaction_monad.result.success
interaction_monad.result.exception)
#eval tactic.trace $ lean.parser.run_with_input (lean.parser.parser_orelse' (lean.parser.specific_ident `aa) (pure `zz)) "aa cc"
-- aa
#eval tactic.trace $ lean.parser.run_with_input (lean.parser.parser_orelse' (lean.parser.specific_ident `aa) (pure `zz)) "bb cc"
-- got exc: "init: (1, 0) - end: (1, 0)" - init (1, 0) ≠ end (1, 3) (should be (some ⟨1, 0⟩))
-- `s` is not frozen at all!
#eval tactic.trace $ lean.parser.run_with_input (lean.parser.parser_orelse (lean.parser.specific_ident `aa) (pure `zz)) "bb cc"
-- init: (1, 0) - end: (1, 0)
I mean, what the hell is going on there? is that because internally, the parser_state
is some C++ pointer whose pointed value gets mutated, no matter what? If so, then how does lean.parser.with_input
works ?
Gabriel Ebner (Mar 14 2022 at 19:17):
is that because internally, the parser_state is some C++ pointer whose pointed value gets mutated, no matter what?
Yes, that's indeed what's happening.
Xavier Martinet (Mar 14 2022 at 19:20):
Gabriel Ebner said:
is that because internally, the parser_state is some C++ pointer whose pointed value gets mutated, no matter what?
Yes, that's indeed what's happening.
is there a workaround? either to "freeze" the parser_state
, or to set it to a specific value?
Gabriel Ebner (Mar 14 2022 at 19:21):
No obvious workaround. What do you actually want to parse?
Gabriel Ebner (Mar 14 2022 at 19:23):
I can think of one horrible hack: if you know the original string, then maybe you could cur_pos
and with_input
to temporarily switch the input to a substring of the original string starting with the current position.
Xavier Martinet (Mar 14 2022 at 19:23):
I would like to parse itactic such as "try {norm_num [] }"
, to have them simplified and written in a more "canonical" form
e.g., if try
succeeds, it would be just "norm_num"
Xavier Martinet (Mar 14 2022 at 19:25):
Gabriel Ebner said:
I can think of one horrible hack: if you know the original string, then maybe you could
cur_pos
andwith_input
to temporarily switch the input to a substring of the original string starting with the current position.
I have thought about that, but it felt really ugly lol
maybe there is just no other way :( or get rid of the parser altogether and work with heuristic ("if the string starts with a try
, then parse after char 4")
Gabriel Ebner (Mar 14 2022 at 19:25):
Another option is to factor out the first ident parser (which is I believe the only one where you need to check the name). Something like do i ← ident, dispatch_on_ident i
.
Gabriel Ebner (Mar 14 2022 at 19:27):
On the C++ side there's the get_token_info
function on the parser class which returns the info for the next token. Not sure if it's worth the work but it's possible to expose that on the Lean side.
Xavier Martinet (Mar 14 2022 at 19:27):
yup
still, it is sad that in Lean 3 there are things like that that are not inspectable within Lean itself :( hopefully Lean 4 is coming!
Mario Carneiro (Mar 14 2022 at 20:53):
Lean 4 is already "here" enough to play with things like this, and I would expect that you can do this kind of normalization since it isn't too different from what mathport is already doing
Last updated: Dec 20 2023 at 11:08 UTC