## Stream: Program verification

### Topic: lawful parsers

#### Yakov Pechersky (Dec 10 2020 at 00:30):

Is there a parser (that is used) that violates this condition?

import data.nat.cast
import data.fintype.card

open parse_result

variables {α : Type} (p q : parser α) (cb : char_buffer) (n n' : ℕ) (err : dlist string)

lemma valid_parser_of_fail (h : ∃ err, p cb n = fail n' err) :
n' ≤ n := sorry


#### Yakov Pechersky (Dec 10 2020 at 00:32):

The larger #xy is trying to prove the following somewhat-definitional lemma about the alternative definition on parsers. I have the associated one done for done

lemma orelse_eq_done {n' : ℕ} : (p <|> q) cb n = done n' a ↔
(p cb n = done n' a ∨ ((∃ err, p cb n = fail n err) ∧ q cb n = done n' a)) := ...


but have hit a snag on the case shown below for the fail case. If this the validity statement above is true, the whole proof can be much simpler. But I've identified precisely where I hit a contradiction when there is no validity guarantee.

Longer proof

#### Yakov Pechersky (Dec 10 2020 at 00:35):

If there is a more appropriate stream, like #metaprogramming / tactics , please let me know.

#### Yakov Pechersky (Dec 10 2020 at 00:35):

The even larger #xy is trying to parse directly into subtypes, which requires proofs that a particular parser is unable to produce certain states.

#### Yakov Pechersky (Dec 10 2020 at 00:37):

Of course, the subtype problem doesn't require a clear proof about how consistently orelse produces fail states, but I'm trying to be thorough in building out the theory.

#### Mario Carneiro (Dec 10 2020 at 09:07):

Of course parser has no such assumption, so there are parsers that fail it

#### Mario Carneiro (Dec 10 2020 at 09:11):

lemma not_valid_parser_of_fail :
¬ ∀ {α : Type} (p : parser α) (cb : char_buffer) (n n' : ℕ)
(h : ∃ err, p cb n = fail n' err), n' ≤ n :=
λ H, by cases @H unit (λ _ _, fail 1 dlist.empty) buffer.nil 0 1 ⟨_, rfl⟩


#### Yakov Pechersky (Dec 10 2020 at 09:15):

Thanks. So I have defined a separate Prop that can label a parser as valid in this way, and, for example, have that:

@[simp] def valid : Prop :=
∀ ⦃cb : char_buffer⦄ ⦃n n' : ℕ⦄ ⦃err : dlist string⦄, p cb n = fail n' err → n' ≤ n

lemma orelse_valid (hp : p.valid) (hq : q.valid) : (p <|> q).valid := ...


#### Yakov Pechersky (Dec 10 2020 at 09:17):

But are there parsers that are in use often that break this assumption? failure is valid, for example. One could write a parser that always fails, and reports the error as having had occurred at a position greater than what was queried. But are such parsers useful, or are totally pathological?

#### Yakov Pechersky (Dec 10 2020 at 09:17):

I was surprised to see that decorate_error(s) completely ignores where the error has occurred, and instead reports the error at the position of the query.

#### Mario Carneiro (Dec 10 2020 at 09:18):

er, actually that condition is the wrong way around, reasonable parsers do this all the time

lemma not_valid_parser_of_fail :
¬ ∀ {α : Type} (p : parser α) (cb : char_buffer) (n n' : ℕ)
(h : ∃ err, p cb n = fail n' err), n' ≤ n :=
λ H, by cases @H unit (parser.sat (λ _, true) >> failure) [' '].to_buffer 0 1 ⟨_, rfl⟩


#### Yakov Pechersky (Dec 10 2020 at 09:19):

So, for decorate_error msg p cb n = fail n' err -> n = n'

#### Mario Carneiro (Dec 10 2020 at 09:19):

It seems more likely that valid parsers satisfy  p cb n = fail n' err -> n <= n'

#### Mario Carneiro (Dec 10 2020 at 09:21):

the return value is supposed to mean "I was successful up to here" so it makes sense that you wouldn't go before the start point

#### Yakov Pechersky (Dec 10 2020 at 09:21):

Yes, valid is not propagated across any >>= operation, because in a p >>= f situation, p can succeed and move the cursor forward, and then the failure is at f.

Hmm.

#### Yakov Pechersky (Dec 10 2020 at 09:22):

I have a proof of the following statement, so I need to think about how to modify it:

lemma orelse_eq_fail_lt (hn : n' < n) : (p <|> q) cb n = fail n' err ↔
(p cb n = fail n' err) ∨
(q cb n = fail n' err ∧ (∃ (errp), p cb n = fail n errp)) := ...


#### Mario Carneiro (Dec 10 2020 at 09:23):

a valid parser should never have the hypothesis hn

#### Yakov Pechersky (Dec 10 2020 at 09:25):

protected def orelse (p q : parser α) : parser α :=
λ input pos, match p input pos with
| parse_result.fail pos₁ expected₁ :=
if pos₁ ≠ pos then parse_result.fail pos₁ expected₁ else
match q input pos with
| parse_result.fail pos₂ expected₂ :=
if pos₁ < pos₂ then
parse_result.fail pos₁ expected₁
else if pos₂ < pos₁ then
parse_result.fail pos₂ expected₂ -- this is the case that you say should never be valid, correct?
else -- pos₁ = pos₂
parse_result.fail pos₁ (expected₁ ++ expected₂)
| ok := ok
end
| ok := ok
end


#### Yakov Pechersky (Dec 10 2020 at 09:27):

I find it easier to understand after I rewrite pos1 as pos in the pos1 = pos branch:

protected def orelse (p q : parser α) : parser α :=
λ input pos, match p input pos with
| parse_result.fail pos₁ expected₁ :=
if pos₁ ≠ pos then parse_result.fail pos₁ expected₁ else
match q input pos with
| parse_result.fail pos₂ expected₂ :=
if pos < pos₂ then
parse_result.fail pos₁ expected₁
else if pos₂ < pos then
parse_result.fail pos₂ expected₂ -- this is the case that you say should never be valid, correct?
else -- pos = pos₂
parse_result.fail pos (expected₁ ++ expected₂)
| ok := ok
end
| ok := ok
end


#### Mario Carneiro (Dec 10 2020 at 09:27):

I suggest

def parse_result.pos {α} : parse_result α → ℕ
| (parse_result.done pos _) := pos
| (parse_result.fail pos _) := pos

def parser.valid {α} (p : parser α) : Prop :=
∀ ⦃cb : char_buffer⦄ ⦃n : ℕ⦄,
let res := (p cb n).pos in n ≤ res ∧ res ≤ cb.size


#### Mario Carneiro (Dec 10 2020 at 09:29):

yes, that branch looks like it should not happen

#### Yakov Pechersky (Dec 10 2020 at 09:30):

But that's the core Lean definition, so in trying to write a simplifying iff lemma for (p <|> q) cb n = fail n' err I need to figure out how to represent that branch.

#### Mario Carneiro (Dec 10 2020 at 09:30):

Oh, another property you can claim is that the parser is extensional wrt values of the char_buffer before n

#### Mario Carneiro (Dec 10 2020 at 09:31):

Your simplification lemma should assume p and q are valid

#### Mario Carneiro (Dec 10 2020 at 09:31):

as long as every core parser satisfies the validity predicate you are licensed to assume it even if it's not in the type

#### Yakov Pechersky (Dec 10 2020 at 09:32):

Worth PRing in some part of mathlib, or too program verification-y?

#### Mario Carneiro (Dec 10 2020 at 09:32):

I don't see why those are mutually exclusive

#### Mario Carneiro (Dec 10 2020 at 09:32):

I think it would certainly be useful

Good to know

#### Yakov Pechersky (Dec 10 2020 at 09:33):

I was surprised I couldn't find this anywhere:

def asum {α : Type*} {m : Type* → Type*} [alternative m] (xs : list (m α)) : m α :=
xs.foldl (<|>) failure


#### Mario Carneiro (Dec 10 2020 at 09:33):

have you seen data.hash_map?

#### Mario Carneiro (Dec 10 2020 at 09:33):

I think that's called mfirst

Ah, so it is

#### Yakov Pechersky (Dec 10 2020 at 09:34):

I'll take a look at hash_map.

#### Mario Carneiro (Dec 10 2020 at 09:35):

I would like to see more formalized data structures like that one in mathlib

#### Mario Carneiro (Dec 10 2020 at 09:36):

there is a long-abandoned mathlib branch called ordmap which formalizes balanced 2-3 trees as an alternative to the broken rbmap

#### Yakov Pechersky (Dec 10 2020 at 09:36):

asum = mfirst id

#### Yakov Pechersky (Dec 10 2020 at 09:37):

But I guess it simplifies things like

asum (ch <\$> ['a', 'b']) = mfirst ch ['a', 'b']


#### Mario Carneiro (Dec 10 2020 at 09:38):

it's also more efficient since you don't actually need to build up all the thunks

#### Mario Carneiro (Dec 10 2020 at 09:38):

the list can just be the data itself instead of some monad things

#### Yakov Pechersky (Dec 10 2020 at 09:40):

While we're on this topic, do we have monads in mathlib where >> behaves more efficiently than >>= (λ _, ...)?

#### Yakov Pechersky (Dec 10 2020 at 10:05):

Ah..., pure isn't valid by the definition that uses cb.size:

protected def pure (a : α) : parser α :=
λ input pos, parse_result.done pos a


#### Mario Carneiro (Dec 10 2020 at 10:51):

Oh, you have to assume pos <= cb.size

#### Mario Carneiro (Dec 10 2020 at 10:52):

def parser.valid {α} (p : parser α) : Prop :=
∀ ⦃cb : char_buffer⦄ ⦃n : ℕ⦄,
n ≤ (p cb n).pos ∧ (n ≤ cb.size → (p cb n).pos ≤ cb.size)


#### Yakov Pechersky (Dec 13 2020 at 00:28):

I think it has to be the other way around in the implication:

def valid : Prop :=
∀ (cb : char_buffer) (n : ℕ), n ≤ (p cb n).pos ∧ ((p cb n).pos ≤ cb.size → n ≤ cb.size


Last updated: May 08 2021 at 22:13 UTC