Zulip Chat Archive
Stream: Program verification
Topic: lawful parsers
Yakov Pechersky (Dec 10 2020 at 00:30):
Is there a parser (that is used) that violates this condition?
import data.nat.cast
import data.fintype.card
open parse_result
variables {α : Type} (p q : parser α) (cb : char_buffer) (n n' : ℕ) (err : dlist string)
lemma valid_parser_of_fail (h : ∃ err, p cb n = fail n' err) :
n' ≤ n := sorry
Yakov Pechersky (Dec 10 2020 at 00:32):
The larger #xy is trying to prove the following somewhat-definitional lemma about the alternative definition on parsers. I have the associated one done for done
lemma orelse_eq_done {n' : ℕ} : (p <|> q) cb n = done n' a ↔
(p cb n = done n' a ∨ ((∃ err, p cb n = fail n err) ∧ q cb n = done n' a)) := ...
but have hit a snag on the case shown below for the fail
case. If this the validity statement above is true, the whole proof can be much simpler. But I've identified precisely where I hit a contradiction when there is no validity guarantee.
Longer proof
Yakov Pechersky (Dec 10 2020 at 00:35):
If there is a more appropriate stream, like #metaprogramming / tactics , please let me know.
Yakov Pechersky (Dec 10 2020 at 00:35):
The even larger #xy is trying to parse directly into subtypes, which requires proofs that a particular parser is unable to produce certain states.
Yakov Pechersky (Dec 10 2020 at 00:37):
Of course, the subtype problem doesn't require a clear proof about how consistently orelse
produces fail
states, but I'm trying to be thorough in building out the theory.
Mario Carneiro (Dec 10 2020 at 09:07):
Of course parser
has no such assumption, so there are parsers that fail it
Mario Carneiro (Dec 10 2020 at 09:11):
lemma not_valid_parser_of_fail :
¬ ∀ {α : Type} (p : parser α) (cb : char_buffer) (n n' : ℕ)
(h : ∃ err, p cb n = fail n' err), n' ≤ n :=
λ H, by cases @H unit (λ _ _, fail 1 dlist.empty) buffer.nil 0 1 ⟨_, rfl⟩
Yakov Pechersky (Dec 10 2020 at 09:15):
Thanks. So I have defined a separate Prop that can label a parser
as valid
in this way, and, for example, have that:
@[simp] def valid : Prop :=
∀ ⦃cb : char_buffer⦄ ⦃n n' : ℕ⦄ ⦃err : dlist string⦄, p cb n = fail n' err → n' ≤ n
lemma orelse_valid (hp : p.valid) (hq : q.valid) : (p <|> q).valid := ...
Yakov Pechersky (Dec 10 2020 at 09:17):
But are there parsers that are in use often that break this assumption? failure
is valid
, for example. One could write a parser that always fails, and reports the error as having had occurred at a position greater than what was queried. But are such parsers useful, or are totally pathological?
Yakov Pechersky (Dec 10 2020 at 09:17):
I was surprised to see that decorate_error(s)
completely ignores where the error has occurred, and instead reports the error at the position of the query.
Mario Carneiro (Dec 10 2020 at 09:18):
er, actually that condition is the wrong way around, reasonable parsers do this all the time
lemma not_valid_parser_of_fail :
¬ ∀ {α : Type} (p : parser α) (cb : char_buffer) (n n' : ℕ)
(h : ∃ err, p cb n = fail n' err), n' ≤ n :=
λ H, by cases @H unit (parser.sat (λ _, true) >> failure) [' '].to_buffer 0 1 ⟨_, rfl⟩
Yakov Pechersky (Dec 10 2020 at 09:19):
So, for decorate_error msg p cb n = fail n' err -> n = n'
Mario Carneiro (Dec 10 2020 at 09:19):
It seems more likely that valid parsers satisfy p cb n = fail n' err -> n <= n'
Mario Carneiro (Dec 10 2020 at 09:21):
the return value is supposed to mean "I was successful up to here" so it makes sense that you wouldn't go before the start point
Yakov Pechersky (Dec 10 2020 at 09:21):
Yes, valid
is not propagated across any >>=
operation, because in a p >>= f
situation, p
can succeed and move the cursor forward, and then the failure is at f
.
Yakov Pechersky (Dec 10 2020 at 09:21):
Hmm.
Yakov Pechersky (Dec 10 2020 at 09:22):
I have a proof of the following statement, so I need to think about how to modify it:
lemma orelse_eq_fail_lt (hn : n' < n) : (p <|> q) cb n = fail n' err ↔
(p cb n = fail n' err) ∨
(q cb n = fail n' err ∧ (∃ (errp), p cb n = fail n errp)) := ...
Mario Carneiro (Dec 10 2020 at 09:23):
a valid parser should never have the hypothesis hn
Yakov Pechersky (Dec 10 2020 at 09:25):
protected def orelse (p q : parser α) : parser α :=
λ input pos, match p input pos with
| parse_result.fail pos₁ expected₁ :=
if pos₁ ≠ pos then parse_result.fail pos₁ expected₁ else
match q input pos with
| parse_result.fail pos₂ expected₂ :=
if pos₁ < pos₂ then
parse_result.fail pos₁ expected₁
else if pos₂ < pos₁ then
parse_result.fail pos₂ expected₂ -- this is the case that you say should never be valid, correct?
else -- pos₁ = pos₂
parse_result.fail pos₁ (expected₁ ++ expected₂)
| ok := ok
end
| ok := ok
end
Yakov Pechersky (Dec 10 2020 at 09:27):
I find it easier to understand after I rewrite pos1
as pos
in the pos1 = pos
branch:
protected def orelse (p q : parser α) : parser α :=
λ input pos, match p input pos with
| parse_result.fail pos₁ expected₁ :=
if pos₁ ≠ pos then parse_result.fail pos₁ expected₁ else
match q input pos with
| parse_result.fail pos₂ expected₂ :=
if pos < pos₂ then
parse_result.fail pos₁ expected₁
else if pos₂ < pos then
parse_result.fail pos₂ expected₂ -- this is the case that you say should never be valid, correct?
else -- pos = pos₂
parse_result.fail pos (expected₁ ++ expected₂)
| ok := ok
end
| ok := ok
end
Mario Carneiro (Dec 10 2020 at 09:27):
I suggest
def parse_result.pos {α} : parse_result α → ℕ
| (parse_result.done pos _) := pos
| (parse_result.fail pos _) := pos
def parser.valid {α} (p : parser α) : Prop :=
∀ ⦃cb : char_buffer⦄ ⦃n : ℕ⦄,
let res := (p cb n).pos in n ≤ res ∧ res ≤ cb.size
Mario Carneiro (Dec 10 2020 at 09:29):
yes, that branch looks like it should not happen
Yakov Pechersky (Dec 10 2020 at 09:30):
But that's the core Lean definition, so in trying to write a simplifying iff lemma for (p <|> q) cb n = fail n' err
I need to figure out how to represent that branch.
Mario Carneiro (Dec 10 2020 at 09:30):
Oh, another property you can claim is that the parser is extensional wrt values of the char_buffer before n
Mario Carneiro (Dec 10 2020 at 09:31):
Your simplification lemma should assume p
and q
are valid
Mario Carneiro (Dec 10 2020 at 09:31):
as long as every core parser satisfies the validity predicate you are licensed to assume it even if it's not in the type
Yakov Pechersky (Dec 10 2020 at 09:31):
Thanks, this is very helpful.
Yakov Pechersky (Dec 10 2020 at 09:32):
Worth PRing in some part of mathlib, or too program verification-y?
Mario Carneiro (Dec 10 2020 at 09:32):
I don't see why those are mutually exclusive
Mario Carneiro (Dec 10 2020 at 09:32):
I think it would certainly be useful
Yakov Pechersky (Dec 10 2020 at 09:33):
Good to know
Yakov Pechersky (Dec 10 2020 at 09:33):
I was surprised I couldn't find this anywhere:
def asum {α : Type*} {m : Type* → Type*} [alternative m] (xs : list (m α)) : m α :=
xs.foldl (<|>) failure
Mario Carneiro (Dec 10 2020 at 09:33):
have you seen data.hash_map
?
Mario Carneiro (Dec 10 2020 at 09:33):
I think that's called mfirst
Yakov Pechersky (Dec 10 2020 at 09:34):
Ah, so it is
Yakov Pechersky (Dec 10 2020 at 09:34):
I'll take a look at hash_map
.
Mario Carneiro (Dec 10 2020 at 09:35):
I would like to see more formalized data structures like that one in mathlib
Mario Carneiro (Dec 10 2020 at 09:36):
there is a long-abandoned mathlib branch called ordmap
which formalizes balanced 2-3 trees as an alternative to the broken rbmap
Yakov Pechersky (Dec 10 2020 at 09:36):
asum = mfirst id
Yakov Pechersky (Dec 10 2020 at 09:37):
But I guess it simplifies things like
asum (ch <$> ['a', 'b']) = mfirst ch ['a', 'b']
Mario Carneiro (Dec 10 2020 at 09:38):
it's also more efficient since you don't actually need to build up all the thunks
Mario Carneiro (Dec 10 2020 at 09:38):
the list can just be the data itself instead of some monad things
Yakov Pechersky (Dec 10 2020 at 09:40):
While we're on this topic, do we have monads in mathlib where >>
behaves more efficiently than >>= (λ _, ...)
?
Yakov Pechersky (Dec 10 2020 at 10:05):
Ah..., pure
isn't valid
by the definition that uses cb.size
:
protected def pure (a : α) : parser α :=
λ input pos, parse_result.done pos a
Mario Carneiro (Dec 10 2020 at 10:51):
Oh, you have to assume pos <= cb.size
Mario Carneiro (Dec 10 2020 at 10:52):
def parser.valid {α} (p : parser α) : Prop :=
∀ ⦃cb : char_buffer⦄ ⦃n : ℕ⦄,
n ≤ (p cb n).pos ∧ (n ≤ cb.size → (p cb n).pos ≤ cb.size)
Yakov Pechersky (Dec 13 2020 at 00:28):
I think it has to be the other way around in the implication:
def valid : Prop :=
∀ (cb : char_buffer) (n : ℕ), n ≤ (p cb n).pos ∧ ((p cb n).pos ≤ cb.size → n ≤ cb.size
Last updated: Dec 20 2023 at 11:08 UTC