Zulip Chat Archive

Stream: lean4

Topic: String.containsSubstring


Adam Topaz (Jun 30 2023 at 05:52):

Do we have a function that checks whether a string contains a substring? E.g. ”abcd”.containsSubstr “bc” returns true but ”abcd”.containsSubstr “cb” returns false.

James Gallicchio (Jun 30 2023 at 05:53):

I think there's a function that returns the index of a substring?

James Gallicchio (Jun 30 2023 at 05:56):

but I can't find it now :thinking:

James Gallicchio (Jun 30 2023 at 06:04):

Must've been thinking of the isPrefix and isSuffix functions

Scott Morrison (Jun 30 2023 at 06:08):

It seems we don't. Rather embarrassing. :-)

James Gallicchio (Jun 30 2023 at 06:12):

Isn't the linear time algorithm not so straightforward?

Scott Morrison (Jun 30 2023 at 06:21):

Implementing the quadratic algorithm, with a big flashing warning on it, would have two great effects:

  • Adam, who presumably is not running this on long strings, can do what he wants.
  • The big flashing warning will provoke someone else into writing the linear algorithm. :-)

Adam Topaz (Jun 30 2023 at 06:25):

I'll get by with the following for now :)

def _root_.String.tails (S : String) : Array String := Id.run <| do
  let mut out := #[]
  for i in [:S.length] do
    out := out.push <| S.extract i S.length
  return out

def _root_.String.containsStr (S T : String) : Bool := Id.run <| do
  for tail in S.tails do
    if T.isPrefixOf tail then return true
  return false

Adam Topaz (Jun 30 2023 at 06:25):

Is there a better way to slice strings?

James Gallicchio (Jun 30 2023 at 06:30):

I'd recommend slicing into Substring, rather than String, since casting to String actually copies the string

James Gallicchio (Jun 30 2023 at 06:31):

The most efficient quadratic implementation would use the String.substrEq function and just iterate over the starting positions.

Mario Carneiro (Jun 30 2023 at 06:37):

Doesn't List.isInfix have a decidability instance?

Mario Carneiro (Jun 30 2023 at 06:37):

in which case a braindead implementation is s.1 <:+: t.1

James Gallicchio (Jun 30 2023 at 06:38):

def String.isSubstr (pattern : String) (s : String) : Bool :=
  aux 0
where aux (i : String.Pos) :=
  if h : i.byteIdx < s.utf8ByteSize then
    have : s.utf8ByteSize - (s.next i).byteIdx < s.utf8ByteSize - i.byteIdx :=
      Nat.sub_lt_sub_left h (String.lt_next _ _)
    String.substrEq pattern 0 s i pattern.utf8ByteSize
      || aux (s.next i)
  else
    false
termination_by aux i => s.utf8ByteSize - i.byteIdx

should be right?

James Gallicchio (Jun 30 2023 at 06:38):

looks like it works from some #evals :P

Mario Carneiro (Jun 30 2023 at 06:40):

yeah that looks correct

Mario Carneiro (Jun 30 2023 at 06:41):

It might be useful to have a version that returns the index though

James Gallicchio (Jun 30 2023 at 06:41):

Mario Carneiro said:

in which case a braindead implementation is s.1 <:+: t.1

can I nerd-snipe you into proving equivalence of this with the above definition? would be a great lemma to have handy :)

Mario Carneiro (Jun 30 2023 at 06:41):

if you commit it to std, sure

James Gallicchio (Jun 30 2023 at 06:42):

will make a PR with both an index version and a bool version. String.indexOfSubstr is a fine name to everyone?

Mario Carneiro (Jun 30 2023 at 06:42):

Also it should work with substrings

James Gallicchio (Jun 30 2023 at 08:07):

std4#172

James Gallicchio (Jun 30 2023 at 08:20):

(now with 100% more CI approval!)

François G. Dorais (Jul 07 2023 at 03:47):

I had an implementation of the Knuth-Morris-Pratt matching algorithm in my stuff. I cleaned it up to work with Std and created a draft PR std4#178. It's pretty versatile and it can be used to implement a linear time string matching, in particular. It's very much a WIP. I would like some input on how to integrate it with Std and design a good API.


Last updated: Dec 20 2023 at 11:08 UTC