Zulip Chat Archive

Stream: lean4

Topic: List.index_of


Chris Lovett (Aug 27 2022 at 06:36):

Is there a List helper that returns the index of an item that is contained in the list? Ultimately I want to get the next item after the item I find in the list...

Chris Lovett (Aug 27 2022 at 07:07):

Tried this but getting a weird error:

def getIndex? (xs : List String) (s : String) : Option Nat := do
  for i in [0:xs.length] do
    if xs[i]! == s then return i

  return none

#eval getIndex? ["foo", "bar", "--input", "x"] "--input"

even return (none: Option Nat) doesn't help...?

saying:

application type mismatch
  pure none
argument
  none
has type
  Option ?m.547 : Type ?u.546
but is expected to have type
  Nat : Type

Chris Lovett (Aug 27 2022 at 07:10):

I realize I can do this, pure functional version but I'm still curious about that error:

def List.posOf? (xs : List String) (s : String) (i := 0): Option Nat :=
  match xs with
  | [] => none
  | a :: tail => if a = s then some i else posOf? tail s (i+1)

Sebastian Ullrich (Aug 27 2022 at 08:04):

Recall that Option is a monad, so inside do you are writing a program returning Nat in the Option monad

Leonardo de Moura (Aug 27 2022 at 17:37):

@Sebastian Ullrich is correct. Recall that Option can be used as a Monad and as data.

-- `Option` as a `Monad`
def getIndex? (xs : List String) (s : String) : Option Nat := do
  for i in [0:xs.length] do
    if xs[i]! == s then return i
  failure -- We have `Alternative Option`.

#eval getIndex? ["foo", "bar", "--input", "x"] "--input"
-- some 2

-- `Option` as data. We are forcing Lean to use the `Id` Monad.
def getIndex?' (xs : List String) (s : String) : Option Nat := Id.run do
  for i in [0:xs.length] do
    if xs[i]! == s then return i -- A coercion was introduced here, it is equivalent to `return some i`
  return none

#eval getIndex?' ["foo", "bar", "--input", "x"] "--input"
-- some 2

Leonardo de Moura (Aug 27 2022 at 17:40):

BTW, getIndex? is a bad example to include in our documentation. It is quadratic. We rarely use the a[i] notation with lists.

Marcus Rossel (Aug 29 2022 at 08:49):

Leonardo de Moura said:

We rarely use the a[i] notation with lists.

Is get? preferred over []? when using List?

Mario Carneiro (Aug 29 2022 at 08:50):

I think it's just that List doesn't lend itself well to random access

Mario Carneiro (Aug 29 2022 at 08:50):

If you need to do so anyway, use list[i]

Mario Carneiro (Aug 29 2022 at 08:50):

If you need random access you should probably be using Array

Mario Carneiro (Aug 29 2022 at 08:53):

In the example, the getIndex? implementation is unnecessarily inefficient - you can do it in linear time with a pattern matching definition. I think it is the for loop / use of list[i] that makes it a bad example here

Marcus Rossel (Aug 29 2022 at 09:02):

Mario Carneiro said:

I think it's just that List doesn't lend itself well to random access

I'm using []? in the context of (totally noncomputable) mathy definitions/proofs. I was just missing some lemma along the lines of get?_eq_getElem? as for Array. That's why I was unsure whether I should even be using []? in this context.

Mario Carneiro (Aug 29 2022 at 10:03):

Currently it seems like mathlib4 prefers List.get? over []? in lemmas, but we could change that

Chris Lovett (Aug 29 2022 at 20:15):

It does beg the question should the array indexing be removed from List ?

Mario Carneiro (Aug 30 2022 at 00:13):

No, List.nth (now List.get? / list[_]?) is a very important function in mathlib and we certainly don't want to remove it

Mario Carneiro (Aug 30 2022 at 00:14):

but the fact that it's linear time is something people should keep in mind

Tom (Aug 30 2022 at 03:41):

I agree with @Chris Lovett . I feel that will lead to inefficient programs (and gotchas for beginners). List seems so "front and center" in most functional languages, people will reach for it by default.
I think as a matter of language and library design, short notation should do the "right" thing (even in terms of big-O).
If the genericity is required something like you suggest (.nth typeclass) stands out more IMO.

Mario Carneiro (Aug 30 2022 at 03:55):

.nth isn't a typeclass, it's a function

Mario Carneiro (Aug 30 2022 at 03:56):

the typeclass is exactly the thing that provides the l[i]? notation

Mario Carneiro (Aug 30 2022 at 03:57):

I don't think that we should prevent people from using functions just because they have a bad big-O. If we did, we would want to remove List entirely

Mario Carneiro (Aug 30 2022 at 03:57):

In particular, List.get? is important as providing the specification for Array.get?

Mario Carneiro (Aug 30 2022 at 03:59):

I don't think there is any shortcut to learning that linked lists are an inefficient data structure for most purposes. As a specification type they work quite well however and lean is used for both mathematics and CS applications

Mario Carneiro (Aug 30 2022 at 04:00):

Note that the docstring on all these functions will say the big-O of each operation

Mario Carneiro (Aug 30 2022 at 04:01):

although we currently don't have very good support for type-dependent docstrings for typeclass functions

Tom (Aug 30 2022 at 04:11):

Mario Carneiro said:

.nth isn't a typeclass, it's a function

Sorry for the confusion; I was actually suggesting having one (or something similar).

I don't think we're disagreeing about needing to understand big-O. Let me rather present an alternative point of view: In C++, you can advance a "random access iterator" (effectively a pointer-like object) by saying x + n. However, this does not work with e.g. lists precisely for the reasons I indicated. For that, you would call std::advance(x, n).

The point is purely that of avoiding (in this case) performance gotchas. Making the short notations "obviously right". I can say for myself that I have written plenty of dumb code when I was tired... + is easy to overlook; std::distance has more semantic meaning.

I was trying to make the same observation for traversal: nth seems like a good name for getting the n'th element of a traversable container.

As an aside, for the cases when it might be useful - irrespective of big-O concerns - it would seem that making the syntax [] an alias for nth would still be possible.

Thanks!

Mario Carneiro (Aug 30 2022 at 04:19):

I understand your point, and reasonable languages can disagree on this. Haskell has the same indexing notation on arrays and lists: list !! n vs arr ! n. Rust and C++ both don't directly provide an operation for indexing on lists, although they could: Rust has list.iter().nth(n).unwrap(), which is a mouthful, and C++ has *std::advance(list.begin(), n). I'm inclined to offer the short notation despite the performance pitfall, because unlike Rust, C++ and even Haskell, in Lean reasoning about code is just as important as running it, and the List.nth operation (however it is spelled) is very important for this.

Chris Lovett (Aug 30 2022 at 08:14):

Then I still advocate for a big warning note in the docs as I've proposed. And even a metaprogram that can catch obvious degenerate code that puts square brackets in a for loop, which novice users will do from time to time. I also think list is a core data structure in any language, there are plenty of ways it is more efficient that other data structures if you use them for what they are designed for. It's just that array indexing is NOT what they were designed for, so why lead novice users off the cliff like that. nth(n) would be a lot more clear, which is probably why Rust went that way.

Mac (Aug 30 2022 at 13:22):

Mario Carneiro said:

I don't think that we should prevent people from using functions just because they have a bad big-O. If we did, we would want to remove List entirely

it should be noted that List is not universally bad big-O. There are algorithms / use-cases where List (i.e., linked lists) are more performant than Array (i.e., contiguous arrays). So, even optimizing for performance, keeping it would still be wise.

James Gallicchio (Aug 30 2022 at 14:52):

Just my two cents, LeanColls has an Indexed typeclass for "collections with efficient size and indexed access operations," and my brain bucks a little less if the [] notation is only used on those collections...


Last updated: Dec 20 2023 at 11:08 UTC