Zulip Chat Archive

Stream: Program verification

Topic: lawful parsers

Yakov Pechersky (Dec 10 2020 at 00:30):

Is there a parser (that is used) that violates this condition?

import data.nat.cast
import data.fintype.card

open parse_result

variables {α : Type} (p q : parser α) (cb : char_buffer) (n n' : ℕ) (err : dlist string)

lemma valid_parser_of_fail (h : ∃ err, p cb n = fail n' err) :
  n' ≤ n := sorry

Yakov Pechersky (Dec 10 2020 at 00:32):

The larger #xy is trying to prove the following somewhat-definitional lemma about the alternative definition on parsers. I have the associated one done for done

lemma orelse_eq_done {n' : ℕ} : (p <|> q) cb n = done n' a ↔
  (p cb n = done n' a ∨ ((∃ err, p cb n = fail n err) ∧ q cb n = done n' a)) := ...

but have hit a snag on the case shown below for the fail case. If this the validity statement above is true, the whole proof can be much simpler. But I've identified precisely where I hit a contradiction when there is no validity guarantee.

Longer proof

import data.nat.cast
import data.fintype.card

open parse_result

variables {α : Type} (p q : parser α) (cb : char_buffer) (n n' : ℕ) (err : dlist string)

lemma valid_parser_of_fail (h : ∃ err, p cb n = fail n' err) :
  n' ≤ n := sorry

@[simp] lemma orelse_eq_orelse : p.orelse q = (p <|> q) := rfl

lemma orelse_eq_fail_le {n' : ℕ} (h : (p <|> q) cb n = fail n' err) : n' ≤ n :=
begin
  contrapose! h,
  rw ←orelse_eq_orelse,
  simp [parser.orelse, exists_and_distrib_left],
  set P : parse_result α := p cb n with hP,
  cases p cb n with posp resp posp errp,
  { simp [parser.orelse, exists_false, or_self, and_false, hP] },
  by_cases hp : posp = n,
  { simp only [hp, parser.orelse, true_and, eq_self_iff_true, not_true, if_false, ne.def,
               and_true, false_or, exists_eq', false_and, hP],
    cases q cb n with posq resq posq errq,
    { simp [parser.orelse] },
    { simp [parser.orelse],
      split_ifs with H H',
      { simp [ne_of_lt h], },
      {
        have : posq ≠ n' := ne_of_lt (lt_trans H' h),
        simp [this, iff_self] },
      { have : posq = n := le_antisymm (le_of_not_lt H) (le_of_not_lt H'),
        simp [ne_of_lt h, this, false_and], } } },
  { rcases lt_trichotomy posp n' with H|H|H,
    { simp [parser.orelse, hp, ne_of_lt H, hP] },
    { simp [parser.orelse, hp, ←H],
      -- this is the case where p cb n = p posp errp, and posp = n',
      -- so err should equal errp, since by defn of (p cb n <|>) propagates
      -- the errp to equal err when n ≠ posp
      -- this case needs valid_parser_if fail to show that the `err`s MUST match
      -- and somehow prove this false
      -- by already knowing that n' ≤ n
      have : n' ≤ n,
        { apply valid_parser_of_fail p cb n,
          use errp,
          rw [←H, ←hP] },
      exact absurd this (not_le_of_lt h) },
    { simp [parser.orelse, hp, ne_of_gt H, hP] } },
end

Yakov Pechersky (Dec 10 2020 at 00:35):

If there is a more appropriate stream, like #metaprogramming / tactics , please let me know.

Yakov Pechersky (Dec 10 2020 at 00:35):

The even larger #xy is trying to parse directly into subtypes, which requires proofs that a particular parser is unable to produce certain states.

Yakov Pechersky (Dec 10 2020 at 00:37):

Of course, the subtype problem doesn't require a clear proof about how consistently orelse produces fail states, but I'm trying to be thorough in building out the theory.

Mario Carneiro (Dec 10 2020 at 09:07):

Of course parser has no such assumption, so there are parsers that fail it

Mario Carneiro (Dec 10 2020 at 09:11):

lemma not_valid_parser_of_fail :
  ¬ ∀ {α : Type} (p : parser α) (cb : char_buffer) (n n' : ℕ)
    (h : ∃ err, p cb n = fail n' err), n' ≤ n :=
λ H, by cases @H unit (λ _ _, fail 1 dlist.empty) buffer.nil 0 1 ⟨_, rfl⟩

Yakov Pechersky (Dec 10 2020 at 09:15):

Thanks. So I have defined a separate Prop that can label a parser as valid in this way, and, for example, have that:

@[simp] def valid : Prop :=
  ∀ ⦃cb : char_buffer⦄ ⦃n n' : ℕ⦄ ⦃err : dlist string⦄, p cb n = fail n' err → n' ≤ n

lemma orelse_valid (hp : p.valid) (hq : q.valid) : (p <|> q).valid := ...

Yakov Pechersky (Dec 10 2020 at 09:17):

But are there parsers that are in use often that break this assumption? failure is valid, for example. One could write a parser that always fails, and reports the error as having had occurred at a position greater than what was queried. But are such parsers useful, or are totally pathological?

Yakov Pechersky (Dec 10 2020 at 09:17):

I was surprised to see that decorate_error(s) completely ignores where the error has occurred, and instead reports the error at the position of the query.

Mario Carneiro (Dec 10 2020 at 09:18):

er, actually that condition is the wrong way around, reasonable parsers do this all the time

lemma not_valid_parser_of_fail :
  ¬ ∀ {α : Type} (p : parser α) (cb : char_buffer) (n n' : ℕ)
    (h : ∃ err, p cb n = fail n' err), n' ≤ n :=
λ H, by cases @H unit (parser.sat (λ _, true) >> failure) [' '].to_buffer 0 1 ⟨_, rfl⟩

Yakov Pechersky (Dec 10 2020 at 09:19):

So, for decorate_error msg p cb n = fail n' err -> n = n'

Mario Carneiro (Dec 10 2020 at 09:19):

It seems more likely that valid parsers satisfy p cb n = fail n' err -> n <= n'

Mario Carneiro (Dec 10 2020 at 09:21):

the return value is supposed to mean "I was successful up to here" so it makes sense that you wouldn't go before the start point

Yakov Pechersky (Dec 10 2020 at 09:21):

Yes, valid is not propagated across any >>= operation, because in a p >>= f situation, p can succeed and move the cursor forward, and then the failure is at f.

Yakov Pechersky (Dec 10 2020 at 09:21):

Hmm.

Yakov Pechersky (Dec 10 2020 at 09:22):

I have a proof of the following statement, so I need to think about how to modify it:

lemma orelse_eq_fail_lt (hn : n' < n) : (p <|> q) cb n = fail n' err ↔
  (p cb n = fail n' err) ∨
  (q cb n = fail n' err ∧ (∃ (errp), p cb n = fail n errp)) := ...

Mario Carneiro (Dec 10 2020 at 09:23):

a valid parser should never have the hypothesis hn

Yakov Pechersky (Dec 10 2020 at 09:25):

protected def orelse (p q : parser α) : parser α :=
λ input pos, match p input pos with
| parse_result.fail pos₁ expected₁ :=
  if pos₁ ≠ pos then parse_result.fail pos₁ expected₁ else
  match q input pos with
  | parse_result.fail pos₂ expected₂ :=
    if pos₁ < pos₂ then
      parse_result.fail pos₁ expected₁
    else if pos₂ < pos₁ then
      parse_result.fail pos₂ expected₂ -- this is the case that you say should never be valid, correct?
    else -- pos₁ = pos₂
      parse_result.fail pos₁ (expected₁ ++ expected₂)
  | ok := ok
  end
  | ok := ok
end

Yakov Pechersky (Dec 10 2020 at 09:27):

I find it easier to understand after I rewrite pos1 as pos in the pos1 = pos branch:

protected def orelse (p q : parser α) : parser α :=
λ input pos, match p input pos with
| parse_result.fail pos₁ expected₁ :=
  if pos₁ ≠ pos then parse_result.fail pos₁ expected₁ else
  match q input pos with
  | parse_result.fail pos₂ expected₂ :=
    if pos < pos₂ then
      parse_result.fail pos₁ expected₁
    else if pos₂ < pos then
      parse_result.fail pos₂ expected₂ -- this is the case that you say should never be valid, correct?
    else -- pos = pos₂
      parse_result.fail pos (expected₁ ++ expected₂)
  | ok := ok
  end
  | ok := ok
end

Mario Carneiro (Dec 10 2020 at 09:27):

I suggest

def parse_result.pos {α} : parse_result α → ℕ
| (parse_result.done pos _) := pos
| (parse_result.fail pos _) := pos

def parser.valid {α} (p : parser α) : Prop :=
∀ ⦃cb : char_buffer⦄ ⦃n : ℕ⦄,
let res := (p cb n).pos in n ≤ res ∧ res ≤ cb.size

Mario Carneiro (Dec 10 2020 at 09:29):

yes, that branch looks like it should not happen

Yakov Pechersky (Dec 10 2020 at 09:30):

But that's the core Lean definition, so in trying to write a simplifying iff lemma for (p <|> q) cb n = fail n' err I need to figure out how to represent that branch.

Mario Carneiro (Dec 10 2020 at 09:30):

Oh, another property you can claim is that the parser is extensional wrt values of the char_buffer before n

Mario Carneiro (Dec 10 2020 at 09:31):

Your simplification lemma should assume p and q are valid

Mario Carneiro (Dec 10 2020 at 09:31):

as long as every core parser satisfies the validity predicate you are licensed to assume it even if it's not in the type

Yakov Pechersky (Dec 10 2020 at 09:31):

Thanks, this is very helpful.

Yakov Pechersky (Dec 10 2020 at 09:32):

Worth PRing in some part of mathlib, or too program verification-y?

Mario Carneiro (Dec 10 2020 at 09:32):

I don't see why those are mutually exclusive

Mario Carneiro (Dec 10 2020 at 09:32):

I think it would certainly be useful

Yakov Pechersky (Dec 10 2020 at 09:33):

Good to know

Yakov Pechersky (Dec 10 2020 at 09:33):

I was surprised I couldn't find this anywhere:

def asum {α : Type*} {m : Type* → Type*} [alternative m] (xs : list (m α)) : m α :=
xs.foldl (<|>) failure

Mario Carneiro (Dec 10 2020 at 09:33):

have you seen data.hash_map?

Mario Carneiro (Dec 10 2020 at 09:33):

I think that's called mfirst

Yakov Pechersky (Dec 10 2020 at 09:34):

Ah, so it is

Yakov Pechersky (Dec 10 2020 at 09:34):

I'll take a look at hash_map.

Mario Carneiro (Dec 10 2020 at 09:35):

I would like to see more formalized data structures like that one in mathlib

Mario Carneiro (Dec 10 2020 at 09:36):

there is a long-abandoned mathlib branch called ordmap which formalizes balanced 2-3 trees as an alternative to the broken rbmap

Yakov Pechersky (Dec 10 2020 at 09:36):

asum = mfirst id

Yakov Pechersky (Dec 10 2020 at 09:37):

But I guess it simplifies things like

asum (ch <$> ['a', 'b']) = mfirst ch ['a', 'b']

Mario Carneiro (Dec 10 2020 at 09:38):

it's also more efficient since you don't actually need to build up all the thunks

Mario Carneiro (Dec 10 2020 at 09:38):

the list can just be the data itself instead of some monad things

Yakov Pechersky (Dec 10 2020 at 09:40):

While we're on this topic, do we have monads in mathlib where >> behaves more efficiently than >>= (λ _, ...)?

Yakov Pechersky (Dec 10 2020 at 10:05):

Ah..., pure isn't valid by the definition that uses cb.size:

protected def pure (a : α) : parser α :=
λ input pos, parse_result.done pos a

Mario Carneiro (Dec 10 2020 at 10:51):

Oh, you have to assume pos <= cb.size

Mario Carneiro (Dec 10 2020 at 10:52):

def parser.valid {α} (p : parser α) : Prop :=
∀ ⦃cb : char_buffer⦄ ⦃n : ℕ⦄,
n ≤ (p cb n).pos ∧ (n ≤ cb.size → (p cb n).pos ≤ cb.size)

Yakov Pechersky (Dec 13 2020 at 00:28):

I think it has to be the other way around in the implication:

def valid : Prop :=
  ∀ (cb : char_buffer) (n : ℕ), n ≤ (p cb n).pos ∧ ((p cb n).pos ≤ cb.size → n ≤ cb.size

Last updated: May 02 2025 at 03:31 UTC