Zulip Chat Archive
Stream: lean4
Topic: String API recommendations
pandaman (Nov 30 2025 at 06:32):
I noticed that String.Iterator is being deprecated in v4.26.0-rc2 while updating lean-regex. I have two questions about the future directions of String APIs so that I can update the lean-regex APIs accordingly.
1. What should String APIs take as arguments?
So far, I have been using String.Iterator as an input to the regex searches, where
- the termination proofs don't require the iterator to be valid (which required some tricky lemmas)
- the correctness proofs assume a valid iterator
With the deprecation of String.Iterator, are we expected to migrate string functions to accept String.ValidPos, as described in this doc comment, and drop support for non-vaild positions? (I think it's fine as the functions have returned bogus results for non-valid iterators anyway)
One tricky part of the migration is that String.endPos.offsetBy 1 is a useful sentinel value for representing a search iterator that has consumed the entire input (example), but I guess we can use Pos.Raw specifically for it.
2. Future of String.Iterator.ValidFor
ValidFor from Batteries is a VERY useful mechanism to reason about code points (or Chars) in a string, and lean-regex uses it quite a lot. Is there a plan to have a counterpart for ValidPos in core? If not, is it okay to make an PR to Batteries to duplicate ValidFor for ValidPos?
Robin Arnez (Nov 30 2025 at 14:32):
For 1.: Yes, the recommended alternative to String.Iterator is String.ValidPos (which will be String.Pos in a future update). If you need sentinel values, I guess you can still use String.Pos.Raw or, well, Option s.ValidPos. There is also some stuff for termination at docs#String.ValidPos.instWellFoundedRelation
For 2.: The core version of ValidFor is docs#String.ValidPos.Splits (which will be String.Pos.Splits in a future update) which works on Strings instead of List Chars. If you want a literal equivalent, you can use something like pos.Splits (String.ofList s) (String.ofList t). The API of Splits is still a bit incomplete though.
Anyways, @Markus Himmel might have something more to say about this.
pandaman (Nov 30 2025 at 14:48):
Thank you for the answers!
Regarding a valid position + sentinel, I created a type for it, which seems to be working so far.
As for a ValidFor replacement, the most typical lemma I use is it.ValidFor l (m :: r) → it.next.ValidFor (l ++ [m]) r. The Splits counterpart looks a bit more mouthful, but I'll see if I can make it work. Thank you for the pointer!
pandaman (Nov 30 2025 at 15:01):
Oh, the ValidPos one looks much nicer: https://leanprover-community.github.io/mathlib4_docs/Init/Data/String/Lemmas/Splits.html#String.ValidPos.Splits.next
(Still trickier than ValidFor, though)
Markus Himmel (Dec 01 2025 at 06:36):
If you feel that Splits works less well than ValidFor in your application, please share some code. It's still early enough days to change the API if necessary.
pandaman (Dec 01 2025 at 13:14):
I'm still trying to understand the new APIs, but here is a case study migrating a simple search function and its proofs to ValidPos/Splits.
Observations
- Simple arguments work with
Splits(seefind_completenessAux) - However, I needed to reach List-based reasoning with
ofList/toListto prove theorems likeString.eq_of_append_eqorString.empty_or_eq_singleton_append. Being able to do pattern match/induction is so useful here - The biggest issue is that I really have no idea how to reason about
ValidPos.get. Previously,Pos.Raw.getwas defined viaPos.Raw.utf8GetAux, which allowed list-based reasoning and ultimately useful theorems likeIterator.ValidFor.curr. Now thatValidPos.getgoes through extracting a character from a byte array, I cannot prove a counterpart withSplits, namelysplits_get_singleton (p.Splits l (singleton c ++ r)) : p.get ... = cin the shared code.
pandaman (Dec 01 2025 at 13:20):
The last point is more about ValidPos.get vs Pos.Raw.get than ValidPos.Splits vs Iterator.ValidFor, but nonetheless I wouldn't be able to migrate lean-regex proofs to ValidPos without useful theorems about ValidPos.get (sorry if they are already added).
I hope this is a useful feedback. Thank you!
Markus Himmel (Dec 01 2025 at 13:32):
Here is how I would prove splits_get_singleton (which is a good theorem for the library):
theorem splits_get_singleton {s l r : String} {c : Char} {p : ValidPos s} (sp : p.Splits l (singleton c ++ r)) :
p.get sp.ne_endValidPos_of_singleton = c := by
obtain ⟨r', h⟩ := sp.exists_eq_singleton_append sp.ne_endValidPos_of_singleton
simp at h
exact h.1.symm
Markus Himmel (Dec 01 2025 at 13:35):
next_le_of_lt was part of lean4#11289, so it should show up in the next version.
pandaman (Dec 01 2025 at 13:40):
Thanks!
Markus Himmel (Dec 01 2025 at 13:41):
Some meta-information: the Splits API is very much still in (early) development; I am fully aware that tons of material is missing. Unfortunately I will have to take a break from working on it in December, but work will resume in January, and it's great to know what you're currently struggling with, so that I can make sure that these things become easy. Verification of the find function that is part of the standard library is part of the TODO list of course.
pandaman (Dec 02 2025 at 12:38):
Thanks! I managed to prove the completeness with the new lemmas: Playground link
I want to mention that I quite like the dependent ValidPos type, and I understand there is much room for future improvements. Thank you for great work!
pandaman (Dec 06 2025 at 06:16):
A few more useful lemmas for ValidPos (mostly ports from the Nat ordering):
namespace String.ValidPos
-- already in nightly
theorem next_le_of_lt {s} {pos pos' : ValidPos s} {h : pos ≠ s.endValidPos} (lt : pos < pos') : pos.next h ≤ pos' := sorry
theorem next_inj {s} {pos pos' : ValidPos s} {h : pos ≠ s.endValidPos} {h' : pos' ≠ s.endValidPos}
(eq : pos.next h = pos'.next h') :
pos = pos' := by
have eq' := (pos.splits_next h).eq_left (eq ▸ pos'.splits_next h')
simp only [append_singleton, push_inj] at eq'
exact ValidPos.ext (Eq.trans (eq'.1 ▸ pos.splits.offset_eq_rawEndPos) (pos'.splits.offset_eq_rawEndPos).symm)
theorem lt_of_le_of_ne {s} {pos pos' : ValidPos s} (le : pos ≤ pos') (ne : pos ≠ pos') : pos < pos' :=
Nat.lt_of_le_of_ne le (by simpa [ValidPos.ext_iff, Pos.Raw.ext_iff] using ne)
theorem le_of_lt_next {s} {pos pos' : ValidPos s} {h' : pos' ≠ s.endValidPos} (lt : pos < pos'.next h') : pos ≤ pos' :=
Decidable.by_contra (fun nle => Nat.not_lt_of_le (ValidPos.next_le_of_lt (Nat.lt_of_not_le nle)) lt)
theorem le_or_eq_of_le_next {s} {pos pos' : ValidPos s} {h' : pos' ≠ s.endValidPos} (le : pos ≤ pos'.next h') :
pos ≤ pos' ∨ pos = pos'.next h' :=
Decidable.byCases
(fun (eq : pos = pos'.next h') => .inr eq)
(fun (ne : pos ≠ pos'.next h') => .inl (le_of_lt_next (lt_of_le_of_ne le ne)))
theorem le_next_iff {s} {pos pos' : ValidPos s} {h' : pos' ≠ s.endValidPos} :
pos ≤ pos'.next h' ↔ pos ≤ pos' ∨ pos = pos'.next h' := by
refine ⟨le_or_eq_of_le_next, ?_⟩
intro h
cases h with
| inl le => exact le_trans le (le_of_lt pos'.lt_next)
| inr eq => exact eq ▸ le_refl _
theorem lt_next_iff {s} {pos pos' : ValidPos s} {h' : pos' ≠ s.endValidPos} : pos < pos'.next h' ↔ pos ≤ pos' :=
⟨le_of_lt_next, fun le => Nat.lt_of_le_of_lt le pos'.lt_next⟩
theorem le_iff_lt_or_eq {s} {pos pos' : ValidPos s} : pos ≤ pos' ↔ pos < pos' ∨ pos = pos' :=
Iff.trans Nat.le_iff_lt_or_eq (or_congr Iff.rfl (by simp [ValidPos.ext_iff, Pos.Raw.ext_iff]))
theorem lt_next_iff_lt_or_eq {s} {pos pos' : ValidPos s} (h' : pos' ≠ s.endValidPos) :
pos < pos'.next h' ↔ pos < pos' ∨ pos = pos' :=
lt_next_iff.trans le_iff_lt_or_eq
end String.ValidPos
pandaman (Dec 06 2025 at 06:26):
I have completed porting from Iterator to ValidPos with a scary +3,434 −4,321 PR :joy: The proofs are passing so it should be good... :octopus:
Reflecting on the experiences, I'm convinced that String-based Splits is useful enough to prove non-trivial properties like those in lean-regex, and List Char-based one is not necessary (while falling back to lists sometimes makes the reasoning easier). I'm sure that missing lemmas can be added gradually.
Again, I like the dependent ValidPos s, and I hope we have dependent slices, too. It removed the need for some verification conditions (e.g., it.next.s = it.s), which partly contributed to the reduction in the line count. (Not all reductions come from it, since rewriting proofs with grind also reduced line of proofs by a few hundred)
pandaman (Dec 06 2025 at 06:26):
Great thank you for improving the string APIs!
Last updated: Dec 20 2025 at 21:32 UTC