Documentation

Mathlib.Data.List.EditDistance.Estimator

Estimator for Levenshtein distance. #

The usual algorithm for computing Levenshtein distances provides successively better lower bounds for the Levenshtein distance as it runs, as proved in Mathlib.Data.List.EditDistance.Bounds.

In this file we package that fact as an instance of

Estimator (Thunk.mk fun _ => levenshtein C xs ys) (LevenshteinEstimator C xs ys)

allowing us to use the Estimator framework for Levenshtein distances.

This is then used in the implementation of rewrite_search to avoid needing the entire edit distance calculation in unlikely search paths.

structure LevenshteinEstimator' {α : Type} {β : Type} {δ : Type} [CanonicallyLinearOrderedAddMonoid δ] (C : Levenshtein.Cost α β δ) (xs : List α) (ys : List β) :
  • pre_rev : List β

    The prefix of ys that is not is not involved in the bound, in reverse order.

  • suff : List β

    The suffix of ys, such that the distance from xs to ys is bounded below by the minimum distance from any suffix of xs to this suffix.

  • split : List.reverse s.pre_rev ++ s.suff = ys

    Witness that ys has been decomposed into a prefix and suffix.

  • distances : { r // 0 < List.length r }

    The distances from each suffix of xs to suff.

  • distances_eq : s.distances = suffixLevenshtein C xs s.suff

    Witness that distances are correct.

  • bound : δ ×

    The current bound on the pair (distance from xs to ys, length of ys).

  • bound_eq : s.bound = match s.pre_rev, (_ : List.reverse s.pre_rev ++ s.suff = ys) with | [], split => ((s.distances)[0], List.length ys) | x, split => (List.minimum_of_length_pos (_ : 0 < List.length s.distances), List.length s.suff)

    Predicate describing the current bound.

Data showing that the Levenshtein distance from xs to ys is bounded below by the minimum Levenshtein distance between some suffix of xs and a particular suffix of ys.

This bound is (non-strict) monotone as we take longer suffixes of ys.

This is an auxiliary definition for the later LevenshteinEstimator: this variant constructs a lower bound for the pair consisting of the Levenshtein distance from xs to ys, along with the length of ys.

Instances For
    instance estimator' {α : Type} {β : Type} {δ : Type} [CanonicallyLinearOrderedAddMonoid δ] (C : Levenshtein.Cost α β δ) (xs : List α) (ys : List β) :
    Estimator { fn := fun x => (levenshtein C xs ys, List.length ys) } (LevenshteinEstimator' C xs ys)
    def LevenshteinEstimator {α : Type} {β : Type} {δ : Type} [CanonicallyLinearOrderedAddMonoid δ] (C : Levenshtein.Cost α β δ) (xs : List α) (ys : List β) :

    An estimator for Levenshtein distances.

    Instances For
      instance instBotLevenshteinEstimator {α : Type} {β : Type} {δ : Type} [CanonicallyLinearOrderedAddMonoid δ] (C : Levenshtein.Cost α β δ) (xs : List α) (ys : List β) :

      The initial estimator for Levenshtein distances.