Zulip Chat Archive

Stream: lean4

Topic: Dependent String slices?


Thomas Murrills (Nov 24 2025 at 23:01):

I'm finding that one bit of friction with the new string API is remembering that the underlying string of a String.Slice is some particular existing string. By making some string into a String.Slice, we seem to forget that valid positions for the slice are also valid positions for the string it came from.

(Instead, they're valid positions for the field, and it's easy to lose the unfoldability that is necessary to know one is the same as the other.)

I've just started using this API, so first: am I missing something, or "holding it wrong"? :)

If not: would it make sense to have dependent slices? Something like

structure String.SliceOf (s : String) where
  startInclusive : s.ValidPos
  endExclusive : s.ValidPos
  startInclusive_le_endExclusive : startInclusive  endExclusive

or does this ruin performance somehow?

Markus Himmel (Nov 25 2025 at 06:25):

Yes, we want this. The reason why it doesn't exist yet is that adding this will multiply the size of the String API by a factor of two again, and we just didn't have time to do so. Also, it will be easier to do this in one step once the String.Slice API has stabilized a bit more. We are planning to use the names String.Substring and String.Slice.Subslice.

In the meantime, I'm trying to make sure that most of the interesting equalities someSlice.str = someString are definitional. Let me know if I missed one. We also have docs#String.Slice.Pos.str, String.Slice.Pos.ofStr (this one hasn't made it into an RC yet I believe), and docs#String.ValidPos.cast, which should offer a way of dealing with this reasonably (without descending right into DTT hell) even if the equality is not definitional. If this turns out to be annoying in your use case, I would be grateful if you could share some code with me, so that I can improve the API in core.

Thomas Murrills (Nov 25 2025 at 22:48):

That's great to hear! :)

Here's a simplified example of what I'm working with. The crux of it, I suppose, is that I'm starting with Syntax.Ranges. That makes it convenient to define something that converts it into an Option String.Slice.

The following example just puts brackets around the provided array of ranges. This only seems convenient with dependent slices; if I want to use Slice, I have to return the equality from my Syntax.Range.toSlice?, and then rewrite with .

Also, a secondary question: what's the preferred way to construct a slice "by hand" when the positions and inequality are present? I'm not inclined to use replaceStartEnd, since I don't want to adjust an existing slice; it seems contrived to first construct the whole slice then replace the start and end. Further, is there something that attempts to figure out the inequality from the context, e.g. with a by grind autoparam?

(Note: String.replaceStart took some time to find, because I was looking for something that referenced going to the end of the string, not adjusting a start. In general, it was a bit of a surprise to think of the start and end positions of strings themselves (as opposed to slices) as things which can even become updated! :) )

And a final note: I seem forced here to first construct the slice for the range, then compare its position. Similarly to losing the connection to the string, if I use if h : prevEndPos.offset ≤ range.start, and then construct the slice, I lose the connection between the positions mentioned by h and the position in the new slice, and so can't construct the other slice up to the start of the range (the one which uses h).

It seems most of the pain here (which, to be honest, really isn't that much altogether :) ) is coming from having to the traverse the boundary between raw positions and valid positions.

I realize that the new API is just getting started, so I don't fault it at all for these kinds of things not being available yet! :grinning_face_with_smiling_eyes: But hopefully it helps to hear the kinds of things I'm attempting to reach for.

bracket ranges

Markus Himmel (Nov 26 2025 at 12:17):

First of all, thank you very much for engaging with the material in its current undocumented state and providing feedback. This kind of feedback is very valuable to me.

Disclaimer about everything I'm about to say: it will only be available starting in Lean 4.27.0.

Thomas Murrills said:

what's the preferred way to construct a slice "by hand" when the positions and inequality are present?

Thomas Murrills said:

Note: String.replaceStart took some time to find, because I was looking for something that referenced going to the end of the string, not adjusting a start.

As of lean4#11290, we now have a uniform family of functions String.slice, String.slice!, String.sliceFrom, String.sliceTo, String.Slice.slice, String.Slice.slice!, String.Slice.sliceFrom, String.Slice.sliceTo for taking slices delineated by one or two positions. So String.replaceStart is now called String.sliceFrom, which is hopefully a bit easier to understand.

About your example: I agree that if you like to go through the Option route, then having dependent slices is convenient. However, there is a different approach that works quite well with plain String.Slice, which you had no chance of discovering because the necessary functions were only added today (in lean4#11380). The idea is this: whenever you create a slice, there should be operations on positions that transport positions forwards and backwards across this operation. For example: String.slice creates a Slice from a String. Therefore, there should be a function String.Pos.slice that turns a s.Pos into a (s.slice ..).Pos, and in the other direction, there should be a function String.Pos.ofSlice that turns a (s.slice ..).Pos into a s.Pos. Following this approach for the general-purpose slicing functions, we get the following little zoo (sorry for sending a screenshot; I'd send a live-lean link but this stuff isn't in any nightly yet...):
Screenshot_20251126_111845.png

Notice that this gives an alternative answer to the problem of "forgetting" what string the slice is on: if I have a term of type (s.slice p0 p1 h).Pos, then the s is right there in the type, and calling String.Slice.Pos.ofSlice will recover the corresponding position on s for me.

This principle also applies to user-defined operations, like your operation that turns Syntax.Range into a slice. Here is how you could implement this (this should compile starting with nightly-2025-11-27):

import Lean

open Lean

structure Lean.Syntax.Range.IsValid (range : Syntax.Range) (s : String) : Prop where
  isValid_start : range.start.IsValid s
  isValid_stop : range.stop.IsValid s
  le : range.start  range.stop

theorem Lean.Syntax.Range.isValid_iff {range : Syntax.Range} {s : String} :
    range.IsValid s  range.start.IsValid s  range.stop.IsValid s  range.start  range.stop := by
  grind [IsValid]

instance {range : Syntax.Range} {s : String} : Decidable (range.IsValid s) :=
  decidable_of_iff' _ Lean.Syntax.Range.isValid_iff

def Lean.Syntax.Range.toSlice (range : Syntax.Range) (s : String) (h : range.IsValid s) : String.Slice :=
  s.slice (s.pos range.start h.isValid_start) (s.pos range.stop h.isValid_stop) h.le

def String.Pos.ofSyntaxRange {s : String} {range : Syntax.Range} {h} (pos : (range.toSlice s h).Pos) :
    s.Pos :=
  ofSlice pos

/-- Brackets the ranges in `ranges`. Assumes the ranges are already sorted and are disjoint. -/
def String.bracketRanges (text : String) (ranges : Array Syntax.Range) :
    String := Id.run do
  let mut out : String := ""
  let mut prevEndPos := text.startPos
  for range in ranges do
    if h : range.IsValid text then
      let slice := range.toSlice text h
      if h' : prevEndPos  String.Pos.ofSyntaxRange slice.startPos then
        out := out ++ text.slice prevEndPos (String.Pos.ofSyntaxRange slice.startPos) h'
        out := out ++ "⟪"
        out := out ++ slice
        out := out ++ "⟫"
        prevEndPos := String.Pos.ofSyntaxRange slice.endPos
  out := out ++ text.sliceFrom prevEndPos
  return out

#eval "abcdef".bracketRanges #[⟨⟨0,1⟩⟩, ⟨⟨3,5⟩⟩] -- "⟪a⟫bc⟪de⟫f"

So here I'm defining a Lean.Syntax.Range.toSlice which takes a proof of validity instead of your Option-returning function. The advantage is that this allows us to easily define String.Pos.ofSyntaxRange, which transforms a position on range.toSlice s h into a position on s. In String.bracketRange we can then easily get hold of the required positions, and we're not even relying on any definitional equalities any more after String.Pos.ofSyntaxRange is defined.

This approach also offers a solution to your ordering problem, because it is easy to prove a lemma relating the ordering on s to the ordering on range.toSlice s h:

import Lean

open Lean

structure Lean.Syntax.Range.IsValid (range : Syntax.Range) (s : String) : Prop where
  isValid_start : range.start.IsValid s
  isValid_stop : range.stop.IsValid s
  le : range.start  range.stop

theorem Lean.Syntax.Range.isValid_iff {range : Syntax.Range} {s : String} :
    range.IsValid s  range.start.IsValid s  range.stop.IsValid s  range.start  range.stop := by
  grind [IsValid]

instance {range : Syntax.Range} {s : String} : Decidable (range.IsValid s) :=
  decidable_of_iff' _ Lean.Syntax.Range.isValid_iff

def Lean.Syntax.Range.toSlice (range : Syntax.Range) (s : String) (h : range.IsValid s) : String.Slice :=
  s.slice (s.pos range.start h.isValid_start) (s.pos range.stop h.isValid_stop) h.le

def String.Pos.ofSyntaxRange {s : String} {range : Syntax.Range} {h} (pos : (range.toSlice s h).Pos) :
    s.Pos :=
  ofSlice pos

theorem String.Pos.le_ofSyntaxRange_iff {s : String} {range : Syntax.Range} {p : s.Pos} {h}
    {p' : (range.toSlice s h).Pos} :
    p  String.Pos.ofSyntaxRange p'  p.offset  p'.offset.offsetBy range.start := by
  simp [Pos.le_iff, ofSyntaxRange] -- this is actually true by `Iff.rfl`.

/-- Brackets the ranges in `ranges`. Assumes the ranges are already sorted and are disjoint. -/
def String.bracketRanges' (text : String) (ranges : Array Syntax.Range) :
    String := Id.run do
  let mut out : String := ""
  let mut prevEndPos := text.startPos
  for range in ranges do
    if h : range.IsValid text  prevEndPos.offset  range.start then
      let slice := range.toSlice text h.1
      out := out ++ text.slice prevEndPos (String.Pos.ofSyntaxRange slice.startPos)
        (by simp [String.Pos.le_ofSyntaxRange_iff, h.2])
      out := out ++ "⟪"
      out := out ++ slice
      out := out ++ "⟫"
      prevEndPos := String.Pos.ofSyntaxRange slice.endPos
  out := out ++ text.sliceFrom prevEndPos
  return out

#eval "abcdef".bracketRanges' #[⟨⟨0,1⟩⟩, ⟨⟨3,5⟩⟩] -- "⟪a⟫bc⟪de⟫f"

Note that I'm not saying that this is strictly better or worse than having dependent slices/substring; I think we'll want both.


Last updated: Dec 20 2025 at 21:32 UTC