Zulip Chat Archive
Stream: lean4
Topic: API for string patterns
Sabrina Jewson (Dec 23 2025 at 18:36):
I wanted to work with String.Slice.startsWith and similar functions, but I discovered that these functions have no API. I’m here to gather feedback on the following API (to go in String.Slice.Pattern) which extends ForwardPattern and ToForwardSearcher with some proofs necessary to provide API for functions that make use of string patterns.
def LawfulForwardSearcherFrom {σ : Type}
{s : Slice} [Std.Iterators.Iterator σ Id (SearchStep s)] [Std.Iterators.Finite σ Id]
(it : Std.Iter (α := σ) (SearchStep s)) (pos : s.Pos) : Prop :=
match it.step with
| .yield it' (.rejected startPos endPos) _
| .yield it' (.matched startPos endPos) _ =>
pos = startPos ∧ startPos ≤ endPos ∧ LawfulForwardSearcherFrom it' endPos
| .skip it' _ => LawfulForwardSearcherFrom it' pos
| .done _ => pos = s.endPos
termination_by it.finitelyManySteps
abbrev Searcher (σ : Slice → Type) : Type := ∀ s, Std.Iterators.Iterator (σ s) Id (SearchStep s)
def EquivSearcher {σ : Slice → Type} [Searcher σ] [∀ s, Std.Iterators.Finite (σ s) Id]
{s₁ s₂ : Slice}
(it₁ : Std.Iter (α := σ s₁) (SearchStep s₁))
(it₂ : Std.Iter (α := σ s₂) (SearchStep s₂)) : Prop :=
match it₁.step, it₂.step with
| .yield it₁' (.rejected startPos₁ endPos₁) _, .yield it₂' (.rejected startPos₂ endPos₂) _
| .yield it₁' (.matched startPos₁ endPos₁) _, .yield it₂' (.matched startPos₂ endPos₂) _ =>
startPos₁.offset = startPos₂.offset ∧ endPos₁.offset = endPos₂.offset ∧
EquivSearcher it₁' it₂'
| .done _, .done _ => True
| .skip it₁' _, _ => EquivSearcher it₁' it₂
| _, .skip it₂' _ => EquivSearcher it₁ it₂'
| _, _ => False
termination_by (it₁.finitelyManySteps, it₂.finitelyManySteps)
class LawfulForwardPattern {ρ : Type} (pat : ρ) (σ : outParam (Slice → Type)) [Searcher σ]
extends ToForwardSearcher pat σ, ForwardPattern pat where
finite (s : Slice) : Std.Iterators.Finite (σ s) Id
lawful (s : Slice) : LawfulForwardSearcherFrom (toSearcher s) s.startPos
toSearcher_congr {s₁ s₂ : Slice} : s₁ == s₂ → EquivSearcher (toSearcher s₁) (toSearcher s₂)
step_toSearcher_ne_skip (s : Slice) ⦃it' h⦄ : (toSearcher s).step ≠ .skip it' h
startsWith_eq_default (s : Slice) : startsWith s = ForwardPattern.defaultStartsWith pat s
dropPrefix?_eq_default (s : Slice) : dropPrefix? s = ForwardPattern.defaultDropPrefix? pat s
Some comments:
- I’m not sure on the reasoning behind why
ToForwardSearcherandForwardPatternare two separate classes to begin with. I merged them in the lawful variant for simplicity but I don’t know if they should be separated again. step_toSearcher_ne_skipis currently necessary becausedefaultStartsWithdoes not take into account skips. Alternatively, we could modifydefaultStartsWithto properly handle skips.- I’m not sure whether this should go in core Lean or Batteries. I believe we may need some changes to core Lean, because I can’t figure out how to prove anything about
memcmpStr_iffmyself (which is necessary both to reason aboutString.Slice.beqand to constructLawfulForwardPatternfor string slices) – I just get to a goal state involvingmemcmpStr.go✝and then I’m stuck. If you can fill in thissorrythen it would be appreciated :smile:
theorem String.Slice.Pattern.Internal.memcmpStr_iff (lhs rhs : String)
(lstart : Pos.Raw) (rstart : Pos.Raw) (len : Pos.Raw)
(h₁ : len.offsetBy lstart ≤ lhs.rawEndPos)
(h₂ : len.offsetBy rstart ≤ rhs.rawEndPos) :
memcmpStr lhs rhs lstart rstart len h₁ h₂ ↔
lhs.toByteArray.extract lstart.byteIdx (len.offsetBy lstart).byteIdx =
rhs.toByteArray.extract rstart.byteIdx (len.offsetBy rstart).byteIdx := sorry
Some examples of lemmas we could prove using this API:
@[simp]
theorem String.Slice.dropPrefix?_isSome_iff {s : Slice} {ρ : Type} {pat : ρ}
{σ} [Pattern.Searcher σ] [Pattern.LawfulForwardPattern pat σ] :
(s.dropPrefix? pat).isSome ↔ s.startsWith pat
@[simp]
theorem String.Slice.copy_startsWith_eq {s : Slice} {ρ : Type} {pat : ρ}
{σ} [Pattern.Searcher σ] [Pattern.LawfulForwardPattern pat σ] :
s.copy.startsWith pat = s.startsWith pat
theorem Slice.dropWhile_eq_self {s : Slice} {ρ : Type} {pat : ρ}
{σ} [Pattern.Searcher σ] [Pattern.LawfulForwardPattern pat σ]
(h : ¬s.startsWith pat) : s.dropWhile pat = s
@[simp]
theorem Slice.takeWhile_append_dropWhile {s : Slice} {ρ : Type} {pat : ρ}
{σ} [Pattern.Searcher σ] [Pattern.LawfulForwardPattern pat σ] :
(s.takeWhile pat).copy ++ (s.dropWhile pat).copy = s.copy
Henrik Böving (Dec 23 2025 at 20:19):
API for these functions is on the roadmap of the FRO and will eventually be added by the stdlib team. When I wrote these function I did indeed design them to eventually work with a Lawful*Pattern type of thing.
Sabrina Jewson (Dec 23 2025 at 20:20):
I see. Would it be better to just wait in that case?
Henrik Böving (Dec 23 2025 at 20:24):
I’m not sure on the reasoning behind why
ToForwardSearcherandForwardPatternare two separate classes to begin with.
They exist because ForwardPattern allows users of the API to specialize certain behavior which can be implemented using just a searcher on its own. For example startsWith for string can be implemented using memcmp instead of a character by character step through the iterator. They could in principle also be implemented through default arguments to type classes I guess? It just seemed nicer to me to explicitly pull this performance optimization out of the root class.
Sabrina Jewson said:
I see. Would it be better to just wait in that case?
Likely yes.
Last updated: Feb 28 2026 at 14:05 UTC