Zulip Chat Archive
Stream: lean4
Topic: Dependent String slices?
Thomas Murrills (Nov 24 2025 at 23:01):
I'm finding that one bit of friction with the new string API is remembering that the underlying string of a String.Slice is some particular existing string. By making some string into a String.Slice, we seem to forget that valid positions for the slice are also valid positions for the string it came from.
(Instead, they're valid positions for the field, and it's easy to lose the unfoldability that is necessary to know one is the same as the other.)
I've just started using this API, so first: am I missing something, or "holding it wrong"? :)
If not: would it make sense to have dependent slices? Something like
structure String.SliceOf (s : String) where
startInclusive : s.ValidPos
endExclusive : s.ValidPos
startInclusive_le_endExclusive : startInclusive ≤ endExclusive
or does this ruin performance somehow?
Markus Himmel (Nov 25 2025 at 06:25):
Yes, we want this. The reason why it doesn't exist yet is that adding this will multiply the size of the String API by a factor of two again, and we just didn't have time to do so. Also, it will be easier to do this in one step once the String.Slice API has stabilized a bit more. We are planning to use the names String.Substring and String.Slice.Subslice.
In the meantime, I'm trying to make sure that most of the interesting equalities someSlice.str = someString are definitional. Let me know if I missed one. We also have docs#String.Slice.Pos.str, String.Slice.Pos.ofStr (this one hasn't made it into an RC yet I believe), and docs#String.ValidPos.cast, which should offer a way of dealing with this reasonably (without descending right into DTT hell) even if the equality is not definitional. If this turns out to be annoying in your use case, I would be grateful if you could share some code with me, so that I can improve the API in core.
Thomas Murrills (Nov 25 2025 at 22:48):
That's great to hear! :)
Here's a simplified example of what I'm working with. The crux of it, I suppose, is that I'm starting with Syntax.Ranges. That makes it convenient to define something that converts it into an Option String.Slice.
The following example just puts brackets around the provided array of ranges. This only seems convenient with dependent slices; if I want to use Slice, I have to return the equality from my Syntax.Range.toSlice?, and then rewrite with ▸.
Also, a secondary question: what's the preferred way to construct a slice "by hand" when the positions and inequality are present? I'm not inclined to use replaceStartEnd, since I don't want to adjust an existing slice; it seems contrived to first construct the whole slice then replace the start and end. Further, is there something that attempts to figure out the inequality from the context, e.g. with a by grind autoparam?
(Note: String.replaceStart took some time to find, because I was looking for something that referenced going to the end of the string, not adjusting a start. In general, it was a bit of a surprise to think of the start and end positions of strings themselves (as opposed to slices) as things which can even become updated! :) )
And a final note: I seem forced here to first construct the slice for the range, then compare its position. Similarly to losing the connection to the string, if I use if h : prevEndPos.offset ≤ range.start, and then construct the slice, I lose the connection between the positions mentioned by h and the position in the new slice, and so can't construct the other slice up to the start of the range (the one which uses h).
It seems most of the pain here (which, to be honest, really isn't that much altogether :) ) is coming from having to the traverse the boundary between raw positions and valid positions.
I realize that the new API is just getting started, so I don't fault it at all for these kinds of things not being available yet! :grinning_face_with_smiling_eyes: But hopefully it helps to hear the kinds of things I'm attempting to reach for.
bracket ranges
Markus Himmel (Nov 26 2025 at 12:17):
First of all, thank you very much for engaging with the material in its current undocumented state and providing feedback. This kind of feedback is very valuable to me.
Disclaimer about everything I'm about to say: it will only be available starting in Lean 4.27.0.
Thomas Murrills said:
what's the preferred way to construct a slice "by hand" when the positions and inequality are present?
Thomas Murrills said:
Note:
String.replaceStarttook some time to find, because I was looking for something that referenced going to the end of the string, not adjusting a start.
As of lean4#11290, we now have a uniform family of functions String.slice, String.slice!, String.sliceFrom, String.sliceTo, String.Slice.slice, String.Slice.slice!, String.Slice.sliceFrom, String.Slice.sliceTo for taking slices delineated by one or two positions. So String.replaceStart is now called String.sliceFrom, which is hopefully a bit easier to understand.
About your example: I agree that if you like to go through the Option route, then having dependent slices is convenient. However, there is a different approach that works quite well with plain String.Slice, which you had no chance of discovering because the necessary functions were only added today (in lean4#11380). The idea is this: whenever you create a slice, there should be operations on positions that transport positions forwards and backwards across this operation. For example: String.slice creates a Slice from a String. Therefore, there should be a function String.Pos.slice that turns a s.Pos into a (s.slice ..).Pos, and in the other direction, there should be a function String.Pos.ofSlice that turns a (s.slice ..).Pos into a s.Pos. Following this approach for the general-purpose slicing functions, we get the following little zoo (sorry for sending a screenshot; I'd send a live-lean link but this stuff isn't in any nightly yet...):
Screenshot_20251126_111845.png
Notice that this gives an alternative answer to the problem of "forgetting" what string the slice is on: if I have a term of type (s.slice p0 p1 h).Pos, then the s is right there in the type, and calling String.Slice.Pos.ofSlice will recover the corresponding position on s for me.
This principle also applies to user-defined operations, like your operation that turns Syntax.Range into a slice. Here is how you could implement this (this should compile starting with nightly-2025-11-27):
import Lean
open Lean
structure Lean.Syntax.Range.IsValid (range : Syntax.Range) (s : String) : Prop where
isValid_start : range.start.IsValid s
isValid_stop : range.stop.IsValid s
le : range.start ≤ range.stop
theorem Lean.Syntax.Range.isValid_iff {range : Syntax.Range} {s : String} :
range.IsValid s ↔ range.start.IsValid s ∧ range.stop.IsValid s ∧ range.start ≤ range.stop := by
grind [IsValid]
instance {range : Syntax.Range} {s : String} : Decidable (range.IsValid s) :=
decidable_of_iff' _ Lean.Syntax.Range.isValid_iff
def Lean.Syntax.Range.toSlice (range : Syntax.Range) (s : String) (h : range.IsValid s) : String.Slice :=
s.slice (s.pos range.start h.isValid_start) (s.pos range.stop h.isValid_stop) h.le
def String.Pos.ofSyntaxRange {s : String} {range : Syntax.Range} {h} (pos : (range.toSlice s h).Pos) :
s.Pos :=
ofSlice pos
/-- Brackets the ranges in `ranges`. Assumes the ranges are already sorted and are disjoint. -/
def String.bracketRanges (text : String) (ranges : Array Syntax.Range) :
String := Id.run do
let mut out : String := ""
let mut prevEndPos := text.startPos
for range in ranges do
if h : range.IsValid text then
let slice := range.toSlice text h
if h' : prevEndPos ≤ String.Pos.ofSyntaxRange slice.startPos then
out := out ++ text.slice prevEndPos (String.Pos.ofSyntaxRange slice.startPos) h'
out := out ++ "⟪"
out := out ++ slice
out := out ++ "⟫"
prevEndPos := String.Pos.ofSyntaxRange slice.endPos
out := out ++ text.sliceFrom prevEndPos
return out
#eval "abcdef".bracketRanges #[⟨⟨0⟩,⟨1⟩⟩, ⟨⟨3⟩,⟨5⟩⟩] -- "⟪a⟫bc⟪de⟫f"
So here I'm defining a Lean.Syntax.Range.toSlice which takes a proof of validity instead of your Option-returning function. The advantage is that this allows us to easily define String.Pos.ofSyntaxRange, which transforms a position on range.toSlice s h into a position on s. In String.bracketRange we can then easily get hold of the required positions, and we're not even relying on any definitional equalities any more after String.Pos.ofSyntaxRange is defined.
This approach also offers a solution to your ordering problem, because it is easy to prove a lemma relating the ordering on s to the ordering on range.toSlice s h:
import Lean
open Lean
structure Lean.Syntax.Range.IsValid (range : Syntax.Range) (s : String) : Prop where
isValid_start : range.start.IsValid s
isValid_stop : range.stop.IsValid s
le : range.start ≤ range.stop
theorem Lean.Syntax.Range.isValid_iff {range : Syntax.Range} {s : String} :
range.IsValid s ↔ range.start.IsValid s ∧ range.stop.IsValid s ∧ range.start ≤ range.stop := by
grind [IsValid]
instance {range : Syntax.Range} {s : String} : Decidable (range.IsValid s) :=
decidable_of_iff' _ Lean.Syntax.Range.isValid_iff
def Lean.Syntax.Range.toSlice (range : Syntax.Range) (s : String) (h : range.IsValid s) : String.Slice :=
s.slice (s.pos range.start h.isValid_start) (s.pos range.stop h.isValid_stop) h.le
def String.Pos.ofSyntaxRange {s : String} {range : Syntax.Range} {h} (pos : (range.toSlice s h).Pos) :
s.Pos :=
ofSlice pos
theorem String.Pos.le_ofSyntaxRange_iff {s : String} {range : Syntax.Range} {p : s.Pos} {h}
{p' : (range.toSlice s h).Pos} :
p ≤ String.Pos.ofSyntaxRange p' ↔ p.offset ≤ p'.offset.offsetBy range.start := by
simp [Pos.le_iff, ofSyntaxRange] -- this is actually true by `Iff.rfl`.
/-- Brackets the ranges in `ranges`. Assumes the ranges are already sorted and are disjoint. -/
def String.bracketRanges' (text : String) (ranges : Array Syntax.Range) :
String := Id.run do
let mut out : String := ""
let mut prevEndPos := text.startPos
for range in ranges do
if h : range.IsValid text ∧ prevEndPos.offset ≤ range.start then
let slice := range.toSlice text h.1
out := out ++ text.slice prevEndPos (String.Pos.ofSyntaxRange slice.startPos)
(by simp [String.Pos.le_ofSyntaxRange_iff, h.2])
out := out ++ "⟪"
out := out ++ slice
out := out ++ "⟫"
prevEndPos := String.Pos.ofSyntaxRange slice.endPos
out := out ++ text.sliceFrom prevEndPos
return out
#eval "abcdef".bracketRanges' #[⟨⟨0⟩,⟨1⟩⟩, ⟨⟨3⟩,⟨5⟩⟩] -- "⟪a⟫bc⟪de⟫f"
Note that I'm not saying that this is strictly better or worse than having dependent slices/substring; I think we'll want both.
Last updated: Dec 20 2025 at 21:32 UTC