Documentation

Mathlib.Data.List.EditDistance.Defs

Levenshtein distances #

We define the Levenshtein edit distance levenshtein C xy ys between two List α, with a customizable cost structure C for the delete, insert, and substitute operations.

As an auxiliary function, we define suffixLevenshtein C xs ys, which gives the list of distances from each suffix of xs to ys. This is defined by recursion on ys, using the internal function Levenshtein.impl, which computes suffixLevenshtein C xs (y :: ys) using xs, y, and suffixLevenshtein C xs ys. (This corresponds to the usual algorithm using the last two rows of the matrix of distances between suffixes.)

After setting up these definitions, we prove lemmas specifying their behaviour, particularly

theorem suffixLevenshtein_eq_tails_map :
  (suffixLevenshtein C xs ys).1 = xs.tails.map fun xs' => levenshtein C xs' ys := ...

and

theorem levenshtein_cons_cons :
  levenshtein C (x :: xs) (y :: ys) =
    min (C.delete x + levenshtein C xs (y :: ys))
      (min (C.insert y + levenshtein C (x :: xs) ys)
        (C.substitute x y + levenshtein C xs ys)) := ...
structure Levenshtein.Cost (α : Type u_1) (β : Type u_2) (δ : Type u_3) :
Type (max (max u_1 u_2) u_3)
  • delete : αδ

    Cost to delete an element from a list.

  • insert : βδ

    Cost in insert an element into a list.

  • substitute : αβδ

    Cost to substitute one elemenet for another in a list.

A cost structure for Levenshtein edit distance.

Instances For
    @[simp]
    theorem Levenshtein.defaultCost_insert {α : Type u_1} [DecidableEq α] :
    ∀ (x : α), Levenshtein.Cost.insert Levenshtein.defaultCost x = 1
    @[simp]
    theorem Levenshtein.defaultCost_delete {α : Type u_1} [DecidableEq α] :
    ∀ (x : α), Levenshtein.Cost.delete Levenshtein.defaultCost x = 1
    @[simp]
    theorem Levenshtein.defaultCost_substitute {α : Type u_1} [DecidableEq α] (a : α) (b : α) :
    Levenshtein.Cost.substitute Levenshtein.defaultCost a b = if a = b then 0 else 1

    The default cost structure, for which all operations cost 1.

    Instances For
      def Levenshtein.impl {α : Type u_1} {β : Type u_2} {δ : Type u_3} [AddZeroClass δ] [Min δ] (C : Levenshtein.Cost α β δ) (xs : List α) (y : β) (d : { r // 0 < List.length r }) :
      { r // 0 < List.length r }

      (Implementation detail for levenshtein)

      Given a list xs and the Levenshtein distances from each suffix of xs to some other list ys, compute the Levenshtein distances from each suffix of xs to y :: ys.

      (Note that we don't actually need to know ys itself here, so it is not an argument.)

      The return value is a list of length x.length + 1, and it is convenient for the recursive calls that we bundle this list with a proof that it is non-empty.

      Instances For
        theorem Levenshtein.impl_cons {α : Type u_2} {β : Type u_3} {δ : Type u_1} [AddZeroClass δ] [Min δ] {C : Levenshtein.Cost α β δ} (x : α) (xs : List α) (y : β) (d : δ) (ds : List δ) (w : 0 < List.length (d :: ds)) (w' : 0 < List.length ds) :
        Levenshtein.impl C (x :: xs) y { val := d :: ds, property := w } = match Levenshtein.impl C xs y { val := ds, property := w' } with | { val := r, property := w } => { val := min (Levenshtein.Cost.delete C x + r[0]) (min (Levenshtein.Cost.insert C y + d) (Levenshtein.Cost.substitute C x y + ds[0])) :: r, property := (_ : 0 < Nat.succ (List.length r)) }
        theorem Levenshtein.impl_cons_fst_zero {α : Type u_2} {β : Type u_3} {δ : Type u_1} [AddZeroClass δ] [Min δ] {C : Levenshtein.Cost α β δ} (x : α) (xs : List α) (y : β) (d : δ) (ds : List δ) (w : 0 < List.length (d :: ds)) (h : 0 < List.length ↑(Levenshtein.impl C (x :: xs) y { val := d :: ds, property := w })) (w' : 0 < List.length ds) :
        (↑(Levenshtein.impl C (x :: xs) y { val := d :: ds, property := w }))[0] = match Levenshtein.impl C xs y { val := ds, property := w' } with | { val := r, property := w } => min (Levenshtein.Cost.delete C x + r[0]) (min (Levenshtein.Cost.insert C y + d) (Levenshtein.Cost.substitute C x y + ds[0]))
        theorem Levenshtein.impl_length {α : Type u_2} {β : Type u_3} {δ : Type u_1} [AddZeroClass δ] [Min δ] {C : Levenshtein.Cost α β δ} (xs : List α) (y : β) (d : { r // 0 < List.length r }) (w : List.length d = List.length xs + 1) :
        def suffixLevenshtein {α : Type u_1} {β : Type u_2} {δ : Type u_3} [AddZeroClass δ] [Min δ] (C : Levenshtein.Cost α β δ) (xs : List α) (ys : List β) :
        { r // 0 < List.length r }

        suffixLevenshtein C xs ys computes the Levenshtein distance (using the cost functions provided by a C : Cost α β δ) from each suffix of the list xs to the list ys.

        The first element of this list is the Levenshtein distance from xs to ys.

        Note that if the cost functions do not satisfy the inequalities

        Instances For
          theorem suffixLevenshtein_length {α : Type u_1} {β : Type u_2} {δ : Type u_3} [AddZeroClass δ] [Min δ] {C : Levenshtein.Cost α β δ} (xs : List α) (ys : List β) :
          theorem suffixLevenshtein_eq {α : Type u_1} {β : Type u_2} {δ : Type u_3} [AddZeroClass δ] [Min δ] {C : Levenshtein.Cost α β δ} (xs : List α) (y : β) (ys : List β) :
          def levenshtein {α : Type u_1} {β : Type u_2} {δ : Type u_3} [AddZeroClass δ] [Min δ] (C : Levenshtein.Cost α β δ) (xs : List α) (ys : List β) :
          δ

          levenshtein C xs ys computes the Levenshtein distance (using the cost functions provided by a C : Cost α β δ) from the list xs to the list ys.

          Note that if the cost functions do not satisfy the inequalities

          Instances For
            theorem suffixLevenshtein_nil_nil {α : Type u_2} {β : Type u_3} {δ : Type u_1} [AddZeroClass δ] [Min δ] {C : Levenshtein.Cost α β δ} :
            ↑(suffixLevenshtein C [] []) = [0]
            theorem List.eq_of_length_one {α : Type u_1} (x : List α) (w : List.length x = 1) :
            let_fun this := (_ : 0 < List.length x); x = [x[0]]
            theorem suffixLevenshtein_nil' {α : Type u_3} {β : Type u_1} {δ : Type u_2} [AddZeroClass δ] [Min δ] {C : Levenshtein.Cost α β δ} (ys : List β) :
            ↑(suffixLevenshtein C [] ys) = [levenshtein C [] ys]
            theorem suffixLevenshtein_cons₂ {α : Type u_1} {β : Type u_2} {δ : Type u_3} [AddZeroClass δ] [Min δ] {C : Levenshtein.Cost α β δ} (xs : List α) (y : β) (ys : List β) :
            theorem suffixLevenshtein_cons₁_aux {δ : Type u_1} {x : { r // 0 < List.length r }} {y : { r // 0 < List.length r }} (w₀ : (x)[0] = (y)[0]) (w : List.tail x = List.tail y) :
            x = y
            theorem suffixLevenshtein_cons₁ {α : Type u_1} {β : Type u_2} {δ : Type u_3} [AddZeroClass δ] [Min δ] {C : Levenshtein.Cost α β δ} (x : α) (xs : List α) (ys : List β) :
            suffixLevenshtein C (x :: xs) ys = { val := levenshtein C (x :: xs) ys :: ↑(suffixLevenshtein C xs ys), property := (_ : 0 < Nat.succ (List.length ↑(suffixLevenshtein C xs ys))) }
            theorem suffixLevenshtein_cons₁_fst {α : Type u_1} {β : Type u_2} {δ : Type u_3} [AddZeroClass δ] [Min δ] {C : Levenshtein.Cost α β δ} (x : α) (xs : List α) (ys : List β) :
            ↑(suffixLevenshtein C (x :: xs) ys) = levenshtein C (x :: xs) ys :: ↑(suffixLevenshtein C xs ys)
            theorem suffixLevenshtein_cons_cons_fst_get_zero {α : Type u_1} {β : Type u_2} {δ : Type u_3} [AddZeroClass δ] [Min δ] {C : Levenshtein.Cost α β δ} (x : α) (xs : List α) (y : β) (ys : List β) (w : 0 < List.length ↑(suffixLevenshtein C (x :: xs) (y :: ys))) :
            (↑(suffixLevenshtein C (x :: xs) (y :: ys)))[0] = match suffixLevenshtein C xs (y :: ys) with | { val := dx, property := property } => match suffixLevenshtein C (x :: xs) ys with | { val := dy, property := property_1 } => match suffixLevenshtein C xs ys with | { val := dxy, property := property_2 } => min (Levenshtein.Cost.delete C x + dx[0]) (min (Levenshtein.Cost.insert C y + dy[0]) (Levenshtein.Cost.substitute C x y + dxy[0]))
            theorem suffixLevenshtein_eq_tails_map {α : Type u_1} {β : Type u_2} {δ : Type u_3} [AddZeroClass δ] [Min δ] {C : Levenshtein.Cost α β δ} (xs : List α) (ys : List β) :
            ↑(suffixLevenshtein C xs ys) = List.map (fun xs' => levenshtein C xs' ys) (List.tails xs)
            @[simp]
            theorem levenshtein_nil_nil {α : Type u_2} {β : Type u_3} {δ : Type u_1} [AddZeroClass δ] [Min δ] {C : Levenshtein.Cost α β δ} :
            levenshtein C [] [] = 0
            @[simp]
            theorem levenshtein_nil_cons {α : Type u_3} {β : Type u_1} {δ : Type u_2} [AddZeroClass δ] [Min δ] {C : Levenshtein.Cost α β δ} (y : β) (ys : List β) :
            @[simp]
            theorem levenshtein_cons_nil {α : Type u_1} {β : Type u_3} {δ : Type u_2} [AddZeroClass δ] [Min δ] {C : Levenshtein.Cost α β δ} (x : α) (xs : List α) :
            @[simp]
            theorem levenshtein_cons_cons {α : Type u_1} {β : Type u_2} {δ : Type u_3} [AddZeroClass δ] [Min δ] {C : Levenshtein.Cost α β δ} (x : α) (xs : List α) (y : β) (ys : List β) :