Zulip Chat Archive
Stream: lean4
Topic: the way String.extract handles invalid byte positions
Bulhwi Cha (Jun 05 2024 at 14:29):
Consider the following evaluations:
#eval "ㄱㄴㄷ".extract ⟨0⟩ ⟨1⟩ -- output: "ㄱㄴㄷ"
#eval "ㄱㄴㄷ".extract ⟨0⟩ ⟨2⟩ -- "ㄱㄴㄷ"
#eval "ㄱㄴㄷ".extract ⟨0⟩ ⟨3⟩ -- "ㄱ"
I think it'd be better if the first two outputs were "ㄱ"
instead of "ㄱㄴㄷ"
. But I'm not sure it's necessary. What do you think?
Damiano Testa (Jun 05 2024 at 14:35):
This seems similar to my troubles in this thread.
Sebastian Ullrich (Jun 05 2024 at 14:36):
Absent further motivation, the rule is garbage in, garbage out
Eric Wieser (Jun 05 2024 at 14:38):
Where the garbage here is using ⟨i⟩
to construct invalid string positions
Bulhwi Cha (Jun 05 2024 at 14:46):
Mario Carneiro said:
The difficult part about string manipulation is that everything uses the
Pos
type, which is not necessarily a valid UTF8 char boundary in the string, and the string functions have a variety of (mostly garbage and not at all consistent) ways to handle these invalid locations. We could painstakingly prove how each of them handles this case, but it is basically a misuse of the library so I think it is safe to just restrict to the case where the position is valid for the string.To that end, we introduce the
Pos.Valid
type, which asserts that the position is the UTF8 length of some initial substring. We can prove some strong theorems about the behavior of all the main functions, but with awkward input values. TheValidFor
type makes this a bit easier; it works especially well for theString.Iterator
type but we also have a version forSubstring
.
Bulhwi Cha (Jun 05 2024 at 14:51):
Thanks to @Mario Carneiro, we already have some tools in Batteries that make it less painful to prove theorems about string manipulations.
Bulhwi Cha (Jun 05 2024 at 15:13):
Still, some theorems I'm trying to prove can't use I think a few string functions are currently defined in a way that it's hard to prove theorems about them. But I'm not asking for refactoring. I just wanted to talk about this matter.Pos.Valid
or ValidFor
.
Bulhwi Cha (Jun 16 2024 at 14:02):
Bulhwi Cha said:
Still, some theorems I'm trying to prove can't useI think a few string functions are currently defined in a way that it's hard to prove theorems about them.Pos.Valid
orValidFor
.
Consider the following theorem:
theorem extract_eq_extract_append_extract (s : String) (i j k : Pos) (hle₁ : i ≤ j) (hle₂ : j ≤ k) :
s.extract i k = s.extract i j ++ s.extract j k := by
sorry
#eval "ㄱㄴㄷ".extract ⟨3⟩ ⟨6⟩ -- output: "ㄴ"
#eval "ㄱㄴㄷ".extract ⟨3⟩ ⟨4⟩ -- "ㄴㄷ"
#eval "ㄱㄴㄷ".extract ⟨4⟩ ⟨6⟩ -- ""
A mistake I made in the statement (fixed)
What hypotheses do I need to add to this theorem?
Bulhwi Cha (Jun 16 2024 at 14:10):
Sebastian Ullrich said:
the rule is garbage in, garbage out
Why do we let garbage in?
Mario Carneiro (Jun 16 2024 at 14:37):
because it's easier not to have additional hypotheses on the inputs for practical reasons
Mario Carneiro (Jun 16 2024 at 14:37):
but these functions should be treated as not having useful values when the inputs are not valid
Mario Carneiro (Jun 16 2024 at 14:38):
The theorem extract_eq_extract_append_extract
you wrote above should hold if i,j,k are all Pos.Valid
(if you fix the statement to use s.extract i k
on the left)
Mario Carneiro (Jun 16 2024 at 14:40):
It is also possible to characterize what extract
does in general, it basically sanitizes both positions by setting them to String.endPos
if not valid
Mario Carneiro (Jun 16 2024 at 14:42):
and you could take advantage of this observation to prove that extract_eq_extract_append_extract
still holds if k
is not valid, although it's not really necessary
Bulhwi Cha (Jun 16 2024 at 14:46):
Bulhwi Cha said:
Still, some theorems I'm trying to prove can't use
Pos.Valid
orValidFor
.
I was mistaken about Pos.Valid
. I can use it for the theorems I'm trying to prove.
Bulhwi Cha (Jun 16 2024 at 14:50):
Mario Carneiro said:
The theorem
extract_eq_extract_append_extract
you wrote above should hold if i,j,k are allPos.Valid
(if you fix the statement to uses.extract i k
on the left)
Oh, I haven't noticed. I fixed it.
Bulhwi Cha (Jun 16 2024 at 15:02):
theorem extract_eq_extract_append_extract (s : String) (i j k : Pos)
(hval₁ : i.Valid s) (hval₂ : j.Valid s) (hval₃ : k.Valid s) (hle₁ : i ≤ j) (hle₂ : j ≤ k) :
s.extract i k = s.extract i j ++ s.extract j k := by
sorry
Thank you very much for your explanation, Mario. You're amazing.
Last updated: May 02 2025 at 03:31 UTC