Zulip Chat Archive

Stream: lean4

Topic: API for string patterns


Sabrina Jewson (Dec 23 2025 at 18:36):

I wanted to work with String.Slice.startsWith and similar functions, but I discovered that these functions have no API. I’m here to gather feedback on the following API (to go in String.Slice.Pattern) which extends ForwardPattern and ToForwardSearcher with some proofs necessary to provide API for functions that make use of string patterns.

def LawfulForwardSearcherFrom {σ : Type}
  {s : Slice} [Std.Iterators.Iterator σ Id (SearchStep s)] [Std.Iterators.Finite σ Id]
  (it : Std.Iter (α := σ) (SearchStep s)) (pos : s.Pos) : Prop :=
  match it.step with
  | .yield it' (.rejected startPos endPos) _
  | .yield it' (.matched startPos endPos) _ =>
    pos = startPos  startPos  endPos  LawfulForwardSearcherFrom it' endPos
  | .skip it' _ => LawfulForwardSearcherFrom it' pos
  | .done _ => pos = s.endPos
termination_by it.finitelyManySteps

abbrev Searcher (σ : Slice  Type) : Type :=  s, Std.Iterators.Iterator (σ s) Id (SearchStep s)

def EquivSearcher {σ : Slice  Type} [Searcher σ] [ s, Std.Iterators.Finite (σ s) Id]
  {s₁ s₂ : Slice}
  (it₁ : Std.Iter (α := σ s₁) (SearchStep s₁))
  (it₂ : Std.Iter (α := σ s₂) (SearchStep s₂)) : Prop :=
  match it₁.step, it₂.step with
  | .yield it₁' (.rejected startPos₁ endPos₁) _, .yield it₂' (.rejected startPos₂ endPos₂) _
  | .yield it₁' (.matched startPos₁ endPos₁) _, .yield it₂' (.matched startPos₂ endPos₂) _ =>
    startPos₁.offset = startPos₂.offset  endPos₁.offset = endPos₂.offset 
      EquivSearcher it₁' it₂'
  | .done _, .done _ => True
  | .skip it₁' _, _ => EquivSearcher it₁' it₂
  | _, .skip it₂' _ => EquivSearcher it₁ it₂'
  | _, _ => False
termination_by (it₁.finitelyManySteps, it₂.finitelyManySteps)

class LawfulForwardPattern {ρ : Type} (pat : ρ) (σ : outParam (Slice  Type)) [Searcher σ]
  extends ToForwardSearcher pat σ, ForwardPattern pat where
  finite (s : Slice) : Std.Iterators.Finite (σ s) Id
  lawful (s : Slice) : LawfulForwardSearcherFrom (toSearcher s) s.startPos
  toSearcher_congr {s₁ s₂ : Slice} : s₁ == s₂  EquivSearcher (toSearcher s₁) (toSearcher s₂)
  step_toSearcher_ne_skip (s : Slice) it' h : (toSearcher s).step  .skip it' h
  startsWith_eq_default (s : Slice) : startsWith s = ForwardPattern.defaultStartsWith pat s
  dropPrefix?_eq_default (s : Slice) : dropPrefix? s = ForwardPattern.defaultDropPrefix? pat s

Some comments:

  • I’m not sure on the reasoning behind why ToForwardSearcher and ForwardPattern are two separate classes to begin with. I merged them in the lawful variant for simplicity but I don’t know if they should be separated again.
  • step_toSearcher_ne_skip is currently necessary because defaultStartsWith does not take into account skips. Alternatively, we could modify defaultStartsWith to properly handle skips.
  • I’m not sure whether this should go in core Lean or Batteries. I believe we may need some changes to core Lean, because I can’t figure out how to prove anything about memcmpStr_iff myself (which is necessary both to reason about String.Slice.beq and to construct LawfulForwardPattern for string slices) – I just get to a goal state involving memcmpStr.go✝ and then I’m stuck. If you can fill in this sorry then it would be appreciated :smile:
theorem String.Slice.Pattern.Internal.memcmpStr_iff (lhs rhs : String)
  (lstart : Pos.Raw) (rstart : Pos.Raw) (len : Pos.Raw)
  (h₁ : len.offsetBy lstart  lhs.rawEndPos)
  (h₂ : len.offsetBy rstart  rhs.rawEndPos) :
    memcmpStr lhs rhs lstart rstart len h₁ h₂ 
      lhs.toByteArray.extract lstart.byteIdx (len.offsetBy lstart).byteIdx =
        rhs.toByteArray.extract rstart.byteIdx (len.offsetBy rstart).byteIdx := sorry

Some examples of lemmas we could prove using this API:

@[simp]
theorem String.Slice.dropPrefix?_isSome_iff {s : Slice} {ρ : Type} {pat : ρ}
  {σ} [Pattern.Searcher σ] [Pattern.LawfulForwardPattern pat σ] :
  (s.dropPrefix? pat).isSome  s.startsWith pat

@[simp]
theorem String.Slice.copy_startsWith_eq {s : Slice} {ρ : Type} {pat : ρ}
  {σ} [Pattern.Searcher σ] [Pattern.LawfulForwardPattern pat σ] :
  s.copy.startsWith pat = s.startsWith pat

theorem Slice.dropWhile_eq_self {s : Slice} {ρ : Type} {pat : ρ}
  {σ} [Pattern.Searcher σ] [Pattern.LawfulForwardPattern pat σ]
  (h : ¬s.startsWith pat) : s.dropWhile pat = s

@[simp]
theorem Slice.takeWhile_append_dropWhile {s : Slice} {ρ : Type} {pat : ρ}
  {σ} [Pattern.Searcher σ] [Pattern.LawfulForwardPattern pat σ] :
  (s.takeWhile pat).copy ++ (s.dropWhile pat).copy = s.copy

Henrik Böving (Dec 23 2025 at 20:19):

API for these functions is on the roadmap of the FRO and will eventually be added by the stdlib team. When I wrote these function I did indeed design them to eventually work with a Lawful*Pattern type of thing.

Sabrina Jewson (Dec 23 2025 at 20:20):

I see. Would it be better to just wait in that case?

Henrik Böving (Dec 23 2025 at 20:24):

I’m not sure on the reasoning behind why ToForwardSearcher and ForwardPattern are two separate classes to begin with.

They exist because ForwardPattern allows users of the API to specialize certain behavior which can be implemented using just a searcher on its own. For example startsWith for string can be implemented using memcmp instead of a character by character step through the iterator. They could in principle also be implemented through default arguments to type classes I guess? It just seemed nicer to me to explicitly pull this performance optimization out of the root class.

Sabrina Jewson said:

I see. Would it be better to just wait in that case?

Likely yes.


Last updated: Feb 28 2026 at 14:05 UTC