Zulip Chat Archive
Stream: std4
Topic: List.groupBy
Alexander Lucas (Jul 03 2023 at 19:32):
I'm not sure if it's appropriate to ask questions of this sort here, but anyway: does std4 have a groupBy
function on Lists (behaving like the identically-named function in F# or what have you)? It's such a widely used function in other languages that it seems like surely it's present in Lean 4, but I can't seem to find it. Perhaps the name is different or it's hiding in some other namespace than List
?
There is a function called List.groupBy, apparently, but it does something different than what it does is most other languages. In F#, it takes a List and turns it into a list of lists with each of the inner lists being composed from elements that have the same value under the function passes.
I'm looking for something like this https://c-cube.github.io/ocaml-containers/3.12/containers/CCList/index.html#val-group_by.
Mario Carneiro (Jul 03 2023 at 21:16):
The lean List.groupBy
is modeled after https://hackage.haskell.org/package/base-4.10.1.0/docs/Data-List.html#v:groupBy
Mario Carneiro (Jul 03 2023 at 21:17):
I notice that the linked groupBy function is basically a hashmap algorithm, as you can tell from the fact that it takes a hash
and eq
function. I would expect such a function to live somewhere on hashmaps
Alexander Lucas (Jul 03 2023 at 21:31):
Does that imply that I would have to convert my lists to hashmaps before I could use the function? I've just been copying a pasting around the following implementation, and I'll probably keep doing so if that's the case.
def group_by [BEq β] (projection : α → β) : List α → List (List α)
| [] => []
| xs@(x :: _) =>
let (l, r) := List.partition (fun a => projection a == projection x) xs
l :: group_by projection r
decreasing_by admit
Alexander Lucas (Jul 03 2023 at 21:34):
notably, the hash
and eq
arguments are optional in the linked function -- one typically wouldn't care what hash is used (it's an implementation detail that isn't exposed in other libraries implementing the same function), and the builtin notion of equality is default.
Mario Carneiro (Jul 03 2023 at 21:40):
it makes a big difference whether you are using a hash function or not, since one approach is linear time and the other is quadratic
Mario Carneiro (Jul 03 2023 at 21:42):
your description of "optional" sounds more like "supplied by a typeclass" rather than "does not need to be passed in"
Alexander Lucas (Jul 03 2023 at 21:42):
Thanks. Yes, that's certainly true. My point is that whether it's a hash function or like I've written, that needn't affect the function signature.
Mario Carneiro (Jul 03 2023 at 21:42):
it would, typeclasses are part of the function signature
Alexander Lucas (Jul 03 2023 at 21:44):
is there not a way to hash something without caring about its type? I suppose that's a difference from ocaml and other languages I'm used to.
Mario Carneiro (Jul 03 2023 at 21:44):
That's what typeclasses are for?
Mario Carneiro (Jul 03 2023 at 21:45):
The hash implementation is provided by the type because hashing depends on what is being stored
Mario Carneiro (Jul 03 2023 at 21:45):
hashing the pointer values isn't really stable
Mario Carneiro (Jul 03 2023 at 21:45):
although you can do that (unsafely) if you really want to
Alexander Lucas (Jul 03 2023 at 21:47):
Okay, sure, thanks; I'll look at that. So then, implementation aside, you're saying there is no analogue to the function that I've defined here in the standard library (i.e. that takes a list argument)?
Mario Carneiro (Jul 03 2023 at 21:47):
Not that I am aware of, but you are welcome to PR one if you like
Mario Carneiro (Jul 03 2023 at 21:47):
(although that won't be "implementation aside")
Alexander Lucas (Jul 03 2023 at 21:48):
Obviously.
Alexander Lucas (Jul 03 2023 at 21:49):
The code above was just to illustrate the interface.
Mario Carneiro (Jul 03 2023 at 21:49):
I assume the ordering is important?
Alexander Lucas (Jul 03 2023 at 21:51):
The ordering is typically not guaranteed for implementations of this function, since they do all use hashmaps with farily standard hashes as far as I can tell.
BTW thanks for your time.
Notification Bot (Jul 03 2023 at 21:51):
This topic was moved here from #new members > List.groupBy by Mario Carneiro.
Mario Carneiro (Jul 03 2023 at 21:52):
@Alexander Lucas could you point me to an implementation of this function?
Mario Carneiro (Jul 03 2023 at 21:53):
the docs you linked don't seem to have a src link or a precise description of what happens regarding ordering
Alexander Lucas (Jul 03 2023 at 21:55):
Of course: this is the implementation of the function that I linked docs for. https://github.com/c-cube/ocaml-containers/blob/master/src/core/CCList.ml
Alexander Lucas (Jul 03 2023 at 21:55):
I can try to find the F# implementation as well if you want.
Mario Carneiro (Jul 03 2023 at 21:56):
yes indeed it doesn't seem to give the results in any particular order
Mario Carneiro (Jul 03 2023 at 21:56):
yes please
Mario Carneiro (Jul 03 2023 at 21:57):
Especially if order is not preserved I would be a bit doubtful about putting this in the List
namespace without a scary name like groupByUnordered
Alexander Lucas (Jul 03 2023 at 21:58):
The F# Core version is here: https://github.com/dotnet/fsharp/blob/main/src/FSharp.Core/list.fs#L564-564. The code is a bit oblique compared to the previous one though.
Mario Carneiro (Jul 03 2023 at 21:58):
I suspect it is possible to use hashmaps to get a speedup while still resulting in the same order-preserved behavior of the quadratic algorithm you posted above
Alexander Lucas (Jul 03 2023 at 21:59):
That would be ideal. I haven't thought about implementing the function "seriously" beyond just posting it here, but I agree that it seems probably possible.
Mario Carneiro (Jul 03 2023 at 22:00):
hm, it seems to just redirect to Microsoft.FSharp.Primitives.Basics.List.groupBy
Alexander Lucas (Jul 03 2023 at 22:02):
The actual code may be in C# or C. I can look for that if its relevant, but based on experience I don't believe there's any order guarantee for that one either.
Alexander Lucas (Jul 03 2023 at 22:07):
Using a linked hash map of some sort would probably work. I'll look into it and maybe make a PR if I end up writing something.
Last updated: Dec 20 2023 at 11:08 UTC