Zulip Chat Archive
Stream: lean4
Topic: String.containsSubstring
Adam Topaz (Jun 30 2023 at 05:52):
Do we have a function that checks whether a string contains a substring? E.g. ”abcd”.containsSubstr “bc”
returns true but ”abcd”.containsSubstr “cb”
returns false.
James Gallicchio (Jun 30 2023 at 05:53):
I think there's a function that returns the index of a substring?
James Gallicchio (Jun 30 2023 at 05:56):
but I can't find it now :thinking:
James Gallicchio (Jun 30 2023 at 06:04):
Must've been thinking of the isPrefix and isSuffix functions
Scott Morrison (Jun 30 2023 at 06:08):
It seems we don't. Rather embarrassing. :-)
James Gallicchio (Jun 30 2023 at 06:12):
Isn't the linear time algorithm not so straightforward?
Scott Morrison (Jun 30 2023 at 06:21):
Implementing the quadratic algorithm, with a big flashing warning on it, would have two great effects:
- Adam, who presumably is not running this on long strings, can do what he wants.
- The big flashing warning will provoke someone else into writing the linear algorithm. :-)
Adam Topaz (Jun 30 2023 at 06:25):
I'll get by with the following for now :)
def _root_.String.tails (S : String) : Array String := Id.run <| do
let mut out := #[]
for i in [:S.length] do
out := out.push <| S.extract ⟨i⟩ ⟨S.length⟩
return out
def _root_.String.containsStr (S T : String) : Bool := Id.run <| do
for tail in S.tails do
if T.isPrefixOf tail then return true
return false
Adam Topaz (Jun 30 2023 at 06:25):
Is there a better way to slice strings?
James Gallicchio (Jun 30 2023 at 06:30):
I'd recommend slicing into Substring, rather than String, since casting to String
actually copies the string
James Gallicchio (Jun 30 2023 at 06:31):
The most efficient quadratic implementation would use the String.substrEq
function and just iterate over the starting positions.
Mario Carneiro (Jun 30 2023 at 06:37):
Doesn't List.isInfix
have a decidability instance?
Mario Carneiro (Jun 30 2023 at 06:37):
in which case a braindead implementation is s.1 <:+: t.1
James Gallicchio (Jun 30 2023 at 06:38):
def String.isSubstr (pattern : String) (s : String) : Bool :=
aux 0
where aux (i : String.Pos) :=
if h : i.byteIdx < s.utf8ByteSize then
have : s.utf8ByteSize - (s.next i).byteIdx < s.utf8ByteSize - i.byteIdx :=
Nat.sub_lt_sub_left h (String.lt_next _ _)
String.substrEq pattern 0 s i pattern.utf8ByteSize
|| aux (s.next i)
else
false
termination_by aux i => s.utf8ByteSize - i.byteIdx
should be right?
James Gallicchio (Jun 30 2023 at 06:38):
looks like it works from some #evals :P
Mario Carneiro (Jun 30 2023 at 06:40):
yeah that looks correct
Mario Carneiro (Jun 30 2023 at 06:41):
It might be useful to have a version that returns the index though
James Gallicchio (Jun 30 2023 at 06:41):
Mario Carneiro said:
in which case a braindead implementation is
s.1 <:+: t.1
can I nerd-snipe you into proving equivalence of this with the above definition? would be a great lemma to have handy :)
Mario Carneiro (Jun 30 2023 at 06:41):
if you commit it to std, sure
James Gallicchio (Jun 30 2023 at 06:42):
will make a PR with both an index version and a bool version. String.indexOfSubstr
is a fine name to everyone?
Mario Carneiro (Jun 30 2023 at 06:42):
Also it should work with substrings
James Gallicchio (Jun 30 2023 at 08:07):
James Gallicchio (Jun 30 2023 at 08:20):
(now with 100% more CI approval!)
François G. Dorais (Jul 07 2023 at 03:47):
I had an implementation of the Knuth-Morris-Pratt matching algorithm in my stuff. I cleaned it up to work with Std and created a draft PR std4#178. It's pretty versatile and it can be used to implement a linear time string matching, in particular. It's very much a WIP. I would like some input on how to integrate it with Std and design a good API.
Last updated: Dec 20 2023 at 11:08 UTC