Preliminary developments for strings #
This file contains the material about strings which we can write down without the results in
Init.Data.String.Decode, i.e., without knowing about the bijection between String and
List Char given by UTF-8 decoding and encoding.
Note that this file, despite being called Defs, contains quite a few lemmas.
Decodes an array of bytes that encode a string as UTF-8 into the corresponding string.
Equations
- String.fromUTF8 a h = { bytes := a, isValidUTF8 := h }
Instances For
Appends two strings. Usually accessed via the ++ operator.
The internal implementation will perform destructive updates if the string is not shared.
Examples:
"abc".append "def" = "abcdef""abc" ++ "def" = "abcdef""" ++ "" = ""
Instances For
Adds multiple repetitions of a character to the end of a string.
Returns s, with n repetitions of c at the end. Internally, the implementation repeatedly calls
String.push, so the string is modified in-place if there is a unique reference to it.
Examples:
Equations
- s.pushn c n = Nat.repeat (fun (s : String) => s.push c) n s
Instances For
Equations
Instances For
Appends all the strings in a list of strings, in order.
Use String.intercalate to place a separator string between the strings in a list.
Examples:
String.join ["gr", "ee", "n"] = "green"String.join ["b", "", "l", "", "ue"] = "blue"String.join [] = ""
Equations
- String.join l = List.foldl (fun (r s : String) => r ++ s) "" l
Instances For
Appends the strings in a list of strings, placing the separator s between each pair.
Examples:
", ".intercalate ["red", "green", "blue"] = "red, green, blue"" and ".intercalate ["tea", "coffee"] = "tea and coffee"" | ".intercalate ["M", "", "N"] = "M | | N"
Equations
- s.intercalate [] = ""
- s.intercalate (a :: as) = String.intercalate.go✝ a s as
Instances For
Equations
Instances For
Predicate for validity of positions inside a String.
There are multiple equivalent definitions for validity.
We say that a position is valid if the string obtained by taking all of the bytes up to, but
excluding, the given position, is valid UTF-8; see Pos.isValid_iff_isValidUTF8_extract_zero.
Similarly, a position is valid if the string obtained by taking all of the bytes starting at the
given position is valid UTF-8; see Pos.isValid_iff_isValidUTF8_extract_utf8ByteSize.
An equivalent condition is that the position is the length of the UTF-8 encoding of
some prefix of the characters of the string; see Pos.isValid_iff_exists_append and
Pos.isValid_iff_exists_take_data.
Another equivalent condition that can be checked efficiently is that the position is either the
end position or strictly smaller than the end position and the byte at the position satisfies
UInt8.IsUTF8FirstByte; see Pos.isValid_iff_isUTF8FirstByte.
Examples:
String.Pos.IsValid "abc" ⟨0⟩String.Pos.IsValid "abc" ⟨1⟩String.Pos.IsValid "abc" ⟨3⟩¬ String.Pos.IsValid "abc" ⟨4⟩String.Pos.IsValid "𝒫(A)" ⟨0⟩¬ String.Pos.IsValid "𝒫(A)" ⟨1⟩¬ String.Pos.IsValid "𝒫(A)" ⟨2⟩¬ String.Pos.IsValid "𝒫(A)" ⟨3⟩String.Pos.IsValid "𝒫(A)" ⟨4⟩
- isValidUTF8_extract_zero : (s.bytes.extract 0 off.byteIdx).IsValidUTF8
Instances For
The start position of s, as an s.ValidPos.
Equations
- s.startValidPos = { offset := 0, isValid := ⋯ }
Instances For
Equations
- String.instInhabitedValidPos = { default := s.startValidPos }
Equations
- String.instDecidableLeValidPos l r = decidable_of_iff' (l.offset ≤ r.offset) ⋯
Equations
- String.instDecidableLtValidPos l r = decidable_of_iff' (l.offset < r.offset) ⋯
A region or slice of some underlying string.
A slice consists of a string together with the start and end byte positions of a region of interest. Actually extracting a substring requires copying and memory allocation, while many slices of the same underlying string may exist with very little overhead. While this could be achieved by tracking the bounds by hand, the slice API is much more convenient.
String.Slice bundles proofs to ensure that the start and end positions always delineate a valid
string. For this reason, it should be preferred over Substring.Raw.
- str : String
The underlying strings.
The byte position of the start of the string slice.
The byte position of the end of the string slice.
The slice is not degenerate (but it may be empty).
Instances For
Equations
- One or more equations did not get rendered due to their size.
Returns a slice that contains the entire string.
Equations
- s.toSlice = { str := s, startInclusive := s.startValidPos, endExclusive := s.endValidPos, startInclusive_le_endExclusive := ⋯ }
Instances For
The number of bytes of the UTF-8 encoding of the string slice.
Equations
Instances For
Equations
- String.instHAddRawSlice = { hAdd := fun (p : String.Pos.Raw) (s : String.Slice) => { byteIdx := p.byteIdx + s.utf8ByteSize } }
Equations
- String.instHAddSliceRaw = { hAdd := fun (s : String.Slice) (p : String.Pos.Raw) => { byteIdx := s.utf8ByteSize + p.byteIdx } }
Equations
- String.instHSubRawSlice = { hSub := fun (p : String.Pos.Raw) (s : String.Slice) => { byteIdx := p.byteIdx - s.utf8ByteSize } }
The end position of a slice, as a Pos.Raw.
Equations
- s.rawEndPos = { byteIdx := s.utf8ByteSize }
Instances For
Accesses the indicated byte in the UTF-8 encoding of a string slice.
At runtime, this function is implemented by efficient, constant-time code.
Equations
- s.getUTF8Byte p h = s.str.getUTF8Byte (p.offsetBy s.startInclusive.offset) ⋯
Instances For
Accesses the indicated byte in the UTF-8 encoding of the string slice, or panics if the position is out-of-bounds.
Equations
- One or more equations did not get rendered due to their size.
Instances For
Equations
- String.instInhabitedPos = { default := s.startPos }
Equations
- String.instDecidableLePos l r = decidable_of_iff' (l.offset ≤ r.offset) ⋯
Equations
- String.instDecidableLtPos l r = decidable_of_iff' (l.offset < r.offset) ⋯
pos.IsAtEnd is just shorthand for pos = s.endValidPos that is easier to write if s is long.
Equations
- pos.IsAtEnd = (pos = s.endValidPos)