Zulip Chat Archive

Stream: Is there code for X?

Topic: contiguous sublists


Hannah Fechtner (Nov 25 2024 at 01:13):

Hi, I'm looking to see if the following might be hiding in mathlib before I implement it:

Say I have a list L, and if a certain contiguous sublist b (so basically substring) is in it, I'd like lean to return some (a, b, c) where L = a ++ b ++ c, and otherwise return none. (The goal is to re-write b to something else, but that's a bit down the road)

It looks like List.sublist is non-contiguous; List.find only looks at a single element. Is there anything that looks for "substrings" in a list?

Eric Wieser (Nov 25 2024 at 02:11):

Are you aware of docs#Substring.findSubstr?, if strings are actually your motivation?

Edward Zhang (Nov 25 2024 at 02:49):

I think infix is what you are looking for. Try List.IsInfix. Just like the definition of prefix and suffix of a List. Infix means that l₁ is a contiguous substring of l₂.

Hannah Fechtner (Nov 25 2024 at 03:29):

Eric Wieser said:

Are you aware of docs#Substring.findSubstr?, if strings are actually your motivation?

I was not aware of that, thanks! I'll have a look at how that was implemented.

Alas, I do not think strings will work out :( I'm working on an algorithm which solves the braid word problem, and so I have a list of braid generators. It needs a more general alphabet that Char (currently each "letter" is of type Option \N \times Bool)

Kim Morrison (Nov 25 2024 at 03:32):

Does List.inInfix help? Perhaps we should add something that returns the actual three lists.

Hannah Fechtner (Nov 25 2024 at 03:32):

Edward Zhuang said:

I think infix is what you are looking for. Try List.IsInfix. Just like the definition of prefix and suffix of a List. Infix means that l₁ is a contiguous substring of l₂.

Yes, thank you, that's the word I'm looking for! It looks like it's defined, but there's nothing which returns the 3 parts if it is satisfied? That's okay, I'm only looking for infixes of length two, so it's not bad to code up myself

Hannah Fechtner (Nov 25 2024 at 03:37):

Kim Morrison said:

Does List.inInfix help? Perhaps we should add something that returns the actual three lists.

That would be nice, but it needn't be made on my account. I'll need something more general (I'm not trying to match an exact infix, but rather see if there is an infix of a specific form - i.e. [(some i, true), (none, false)] for any natural number i), and my guess it's going to be a bear to code up for the case of an n-length infix. Much less gross for an infix of length 2 and just customizing it off the bat for my use case

Edward Zhang (Nov 25 2024 at 04:01):

Kim Morrison said:

Does List.inInfix help? Perhaps we should add something that returns the actual three lists.

I'm not sure that adding a function to List that returns a list of 3 is general enough. Seems like expand the infix in return (if I'm not mistaken about what you're saying). I'd like to help with it.

Edward Zhang (Nov 25 2024 at 04:08):

Hannah Fechtner said:

Kim Morrison said:

Does List.inInfix help? Perhaps we should add something that returns the actual three lists.

That would be nice, but it needn't be made on my account. I'll need something more general (I'm not trying to match an exact infix, but rather see if there is an infix of a specific form - i.e. [(some i, true), (none, false)] for any natural number i), and my guess it's going to be a bear to code up for the case of an n-length infix. Much less gross for an infix of length 2 and just customizing it off the bat for my use case

Could you give a #mwe about the general idea? Confuse about what you mean by [(some i, true), (none, false)]. Personnally I think returning a 3-length (infix with customized prefixes and suffixes) make sense. And if you wanna a n-length in other form. You may need to code in Monad.

Kim Morrison (Nov 25 2024 at 04:22):

I will add something, this is an obvious gap.

Kim Morrison (Nov 25 2024 at 06:14):

You may also want to use @François G. Dorais's implementation of the KMP matcher, available in Batteries.Data.Array.Match, although it only seems to have a friendly frontend specialized for String.

François G. Dorais (Nov 25 2024 at 07:17):

@Kim Morrison @Hannah Fechtner The Stream frontend for KMP should work well for this task.

https://github.com/leanprover-community/batteries/blob/44e2d2e643fd2618b01f9a0592d7dcbd3ffa22de/Batteries/Data/Array/Match.lean#L93

Kim Morrison (Nov 25 2024 at 07:21):

Could we also just have Array and List interfaces for finding the first match?

Kim Morrison (Nov 25 2024 at 07:22):

I'm writing a slow-but-with-lemmas function for finding an infix list.

Kim Morrison (Nov 25 2024 at 07:23):

I think the right spec is

theorem dropInfix?_eq_some_iff [BEq α] {l i p s : List α} :
    dropInfix? l i = some (p, s) 
      ( i', l = p ++ i' ++ s  i' == i) 
        ( p' i' s', l = p' ++ i' ++ s'  i' == i  p'.length  p.length)

Kim Morrison (Nov 25 2024 at 07:24):

(We're missing List.dropPrefix? and List.dropSuffix? as well, despite having them for String, so I'll add those at the same time.)

François G. Dorais (Nov 25 2024 at 07:41):

Yes, I was just putting that on my todo list, but if you want to work on it then I won't stop you :-)

Kim Morrison (Nov 25 2024 at 07:42):

That theorem?

Kim Morrison (Nov 25 2024 at 07:42):

Or friendler Array and List frontends for KMP?

François G. Dorais (Nov 25 2024 at 07:42):

The friendly frontends.

Kim Morrison (Nov 25 2024 at 07:42):

I'll do the theorem, but if you wanted to do KMP frontends that would be nice.

François G. Dorais (Nov 25 2024 at 07:43):

Sounds good. I think I have time in the next couple of days.

Kim Morrison (Nov 25 2024 at 07:43):

It's not at all urgent, just nice to have.

François G. Dorais (Nov 25 2024 at 09:01):

batteries#1065

Kim Morrison (Nov 25 2024 at 11:50):

batteries#1066 (and all that!) now contains: dropInfix? l i, which returns the prefix and suffix for the first time i appears as a contiguous sublist of l, and a specification theorem

theorem dropInfix?_eq_some_iff [BEq α] {l i p s : List α} :
    dropInfix? l i = some (p, s) 
      -- `i` is an infix up to `==`
      ( i', l = p ++ i' ++ s  i' == i) 
        -- and there is no shorter prefix for which that is the case
        ( p' i' s', l = p' ++ i' ++ s'  i' == i  p'.length  p.length)

It also fills in dropPrefix? and dropSuffix?.

There's a note explaining that if you want computation, you need to use KMP; the point of these is they are verified.

Edward Zhang (Nov 25 2024 at 16:04):

Could we consider adding an implicit parameter e.g. useKmp to dropInfix/ dropPrefix/ dropSuffix, for not using kmp as default. If there are performance requirements, users can enable it.

Mario Carneiro (Nov 25 2024 at 16:38):

that sounds strictly worse than just using KMP always

François G. Dorais (Nov 25 2024 at 17:06):

We need a verification of KMP. I was planning on having some students work on that but we never got around to it.

Kim Morrison (Nov 25 2024 at 22:34):

Yes, the problem is that KMP is currently all partial, so no proofs are possible without a rewrite. The point of dropInfix? is to give a function for people who are "merely proving".

Kim Morrison (Nov 25 2024 at 22:35):

Once a verified KMP is available we can swap out the implementation, keeping the lemmas.

François G. Dorais (Nov 25 2024 at 23:48):

To avoid partial we need something like well-founded streams.

François G. Dorais (Nov 25 2024 at 23:52):

batteries#127

Kim Morrison (Nov 26 2024 at 01:41):

Or just implement it on Array...

Kim Morrison (Nov 27 2024 at 21:27):

@Hannah Fechtner, this function List.dropInfix? should now be available on Mathlib master.

Eric Wieser (Nov 27 2024 at 22:54):

batteries#994 seems quite similar in spirit to docs#List.dropInfix?

Eric Wieser (Nov 27 2024 at 22:57):

Also, regarding dropInfix it seems to me like dropInfix? l i = some (p, i', s) might be more useful, if we're opening the BEq equivalence relation can of worms

Kim Morrison (Nov 28 2024 at 00:47):

Eric Wieser said:

batteries#994 seems quite similar in spirit to docs#List.dropInfix?

Yes, it is a pity that hasn't seen follow-up from the author.


Last updated: May 02 2025 at 03:31 UTC