Zulip Chat Archive

Stream: lean4

Topic: the way String.extract handles invalid byte positions


Bulhwi Cha (Jun 05 2024 at 14:29):

Consider the following evaluations:

#eval "ㄱㄴㄷ".extract 0 1 -- output: "ㄱㄴㄷ"
#eval "ㄱㄴㄷ".extract 0 2 -- "ㄱㄴㄷ"
#eval "ㄱㄴㄷ".extract 0 3 -- "ㄱ"

I think it'd be better if the first two outputs were "ㄱ" instead of "ㄱㄴㄷ". But I'm not sure it's necessary. What do you think?

Damiano Testa (Jun 05 2024 at 14:35):

This seems similar to my troubles in this thread.

Sebastian Ullrich (Jun 05 2024 at 14:36):

Absent further motivation, the rule is garbage in, garbage out

Eric Wieser (Jun 05 2024 at 14:38):

Where the garbage here is using ⟨i⟩ to construct invalid string positions

Bulhwi Cha (Jun 05 2024 at 14:46):

Mario Carneiro said:

The difficult part about string manipulation is that everything uses the Pos type, which is not necessarily a valid UTF8 char boundary in the string, and the string functions have a variety of (mostly garbage and not at all consistent) ways to handle these invalid locations. We could painstakingly prove how each of them handles this case, but it is basically a misuse of the library so I think it is safe to just restrict to the case where the position is valid for the string.

To that end, we introduce the Pos.Valid type, which asserts that the position is the UTF8 length of some initial substring. We can prove some strong theorems about the behavior of all the main functions, but with awkward input values. The ValidFor type makes this a bit easier; it works especially well for the String.Iterator type but we also have a version for Substring.

Bulhwi Cha (Jun 05 2024 at 14:51):

Thanks to @Mario Carneiro, we already have some tools in Batteries that make it less painful to prove theorems about string manipulations.

Bulhwi Cha (Jun 05 2024 at 15:13):

Still, some theorems I'm trying to prove can't use Pos.Valid or ValidFor. I think a few string functions are currently defined in a way that it's hard to prove theorems about them. But I'm not asking for refactoring. I just wanted to talk about this matter.

Bulhwi Cha (Jun 16 2024 at 14:02):

Bulhwi Cha said:

Still, some theorems I'm trying to prove can't use Pos.Valid or ValidFor. I think a few string functions are currently defined in a way that it's hard to prove theorems about them.

Consider the following theorem:

theorem extract_eq_extract_append_extract (s : String) (i j k : Pos) (hle₁ : i  j) (hle₂ : j  k) :
    s.extract i k = s.extract i j ++ s.extract j k := by
  sorry

#eval "ㄱㄴㄷ".extract 3 6 -- output: "ㄴ"
#eval "ㄱㄴㄷ".extract 3 4 -- "ㄴㄷ"
#eval "ㄱㄴㄷ".extract 4 6 -- ""

A mistake I made in the statement (fixed)

What hypotheses do I need to add to this theorem?

Bulhwi Cha (Jun 16 2024 at 14:10):

Sebastian Ullrich said:

the rule is garbage in, garbage out

Why do we let garbage in?

Mario Carneiro (Jun 16 2024 at 14:37):

because it's easier not to have additional hypotheses on the inputs for practical reasons

Mario Carneiro (Jun 16 2024 at 14:37):

but these functions should be treated as not having useful values when the inputs are not valid

Mario Carneiro (Jun 16 2024 at 14:38):

The theorem extract_eq_extract_append_extract you wrote above should hold if i,j,k are all Pos.Valid (if you fix the statement to use s.extract i k on the left)

Mario Carneiro (Jun 16 2024 at 14:40):

It is also possible to characterize what extract does in general, it basically sanitizes both positions by setting them to String.endPos if not valid

Mario Carneiro (Jun 16 2024 at 14:42):

and you could take advantage of this observation to prove that extract_eq_extract_append_extract still holds if k is not valid, although it's not really necessary

Bulhwi Cha (Jun 16 2024 at 14:46):

Bulhwi Cha said:

Still, some theorems I'm trying to prove can't use Pos.Valid or ValidFor.

I was mistaken about Pos.Valid. I can use it for the theorems I'm trying to prove.

Bulhwi Cha (Jun 16 2024 at 14:50):

Mario Carneiro said:

The theorem extract_eq_extract_append_extract you wrote above should hold if i,j,k are all Pos.Valid (if you fix the statement to use s.extract i k on the left)

Oh, I haven't noticed. I fixed it.

Bulhwi Cha (Jun 16 2024 at 15:02):

theorem extract_eq_extract_append_extract (s : String) (i j k : Pos)
    (hval₁ : i.Valid s) (hval₂ : j.Valid s) (hval₃ : k.Valid s) (hle₁ : i  j) (hle₂ : j  k) :
    s.extract i k = s.extract i j ++ s.extract j k := by
  sorry

Thank you very much for your explanation, Mario. You're amazing.


Last updated: May 02 2025 at 03:31 UTC