Zulip Chat Archive

Stream: std4

Topic: collection typeclasses


James Gallicchio (Oct 04 2022 at 21:05):

Looking for feedback on adding a few collections typeclasses from LeanColls that I've found useful for writing code operating on collections.

The first typeclasses are to do with fundamental basic operations on collections:

  • Foldable (perhaps Fold?) for colls which have a folding function/internal iterator
  • Iterable for colls which can be externally iterated over (e.g. an iterator with a step function)
  • Enumerable for colls which can be constructed one element at a time

Some more for common types of collections:

  • Indexed for colls which have efficient indexed get/nth functions
  • Initable for colls efficiently constructable from a Fin n -> A initialization function
  • SetLike for types isomorphic to A -> Bool
  • MapLike for types isomorphic to A -> Option B
  • Queue for LIFO/FIFO queues (somewhat outdated)

And then there's a few classes for implementations of common collection operations

  • FoldableOps for operations implementable on any Foldable
  • IndexedOps for operations efficiently implementable on any Iterable

Tomas Skrivan (Oct 05 2022 at 05:44):

I have a similar set of typeclasses for development of SciLean. My main interest is working with collections that are multidimensional arrays, it is interesting to see how it reflects in different design choices.

The most prominent are:

Iterable - I think yours is better as it gives extra flexibility with iterator

Enumtype - type with an explicit isomorphism with Fin n. This one I use all over the place.

SetElem - accompanying buildin GetElem class

HasIntro - similar to yours Initiable but allows for arbitrary index type and carries lawfulness proof.


Fee comments to yours:

  1. I like the use of iterator in Iterable.
  2. I'm a bit confused that by default Foldable implies Iterable, plus through construction intermediate List which can be performance footgun. Wouldn't you accidentally convert Array, or even worse FloatArray, to List?
  3. Indexed and Initiable should use a generic type instead of Fin n for the index type
  4. MapLike should align with buildin a[i]? notation

In general, it would be great to converge to a certain set of these classes and add them to std4.

Also how should we approach lawfulness? Should we define Lawful*** version of all of these? Or define ***.Correct class?

I actually always want the lawful version of these. Is there a good example when you do not want lawful version? (That you do not want to provide proof does not count as you can always sorry it and at least document your intention that way)

James Gallicchio (Oct 05 2022 at 06:03):

Feedback very appreciated, I'm gonna clone scilean and play around with it!

Tomas Skrivan (Oct 05 2022 at 06:06):

Also another class of interest might be:

PlainDataType - type that allows for conversion from/to byte(s). This allows to define DataArray α with appropriate bit compatification, thus DataArray Bool is bit array or DataArray (Fin 8 × Fin 4 × Fin 3) consumes one byte per element.

James Gallicchio (Oct 05 2022 at 06:19):

Tomas Skrivan said:

I'm a bit confused that by default Foldable implies Iterable, plus through construction intermediate List which can be performance footgun. Wouldn't you accidentally convert Array, or even worse FloatArray, to List?

Yeah, you definitely could end up with bad performance if you're not careful. All of the collections in LeanColls that have faster Iterable implementations do override it. I wasn't sure whether to explicitly extend it or not, since most users aren't going to be declaring their own Iterable.

What I did with the *Ops typeclasses was to provide a default implementation given the minimum implementable subset, and then I explicitly said instance : *Ops A := default for collections where the default was what I wanted. Perhaps we do the same with Iterable.

James Gallicchio (Oct 05 2022 at 06:23):

Tomas Skrivan said:

Also how should we approach lawfulness? Should we define Lawful*** version of all of these? Or define ***.Correct class?

I don't know much about Lean typeclass best practices, I'm not really sure what the difference is. for Foldable, the only "law" I have is that they abide parametricity, which is a relatively weak result but enough to let me write some theorems generic over folds :D

Tomas Skrivan (Oct 05 2022 at 06:32):

I think when it comes to "lawfulness" these classes usually interact in pairs. For example Initiable with GetElem or with Size. I'm not sure how to structure this. For example I have PowType (for types X^I that are morally I -> X) that bundles get/set/init and that they interact together, but I'm not sure if that is the best approach.

Tomas Skrivan (Oct 05 2022 at 06:35):

James Gallicchio said:

Tomas Skrivan said:

I'm a bit confused that by default Foldable implies Iterable, plus through construction intermediate List which can be performance footgun. Wouldn't you accidentally convert Array, or even worse FloatArray, to List?

Yeah, you definitely could end up with bad performance if you're not careful. All of the collections in LeanColls that have faster Iterable implementations do override it. I wasn't sure whether to explicitly extend it or not, since most users aren't going to be declaring their own Iterable.

What I did with the *Ops typeclasses was to provide a default implementation given the minimum implementable subset, and then I explicitly said instance : *Ops A := default for collections where the default was what I wanted. Perhaps we do the same with Iterable.

I'm actually a quite confused by your Foldable class. Every Iterable can be Foldable, no? Is there an instance for it? You have the reverse, which I find a bit questionable and dangerous.

James Gallicchio (Oct 05 2022 at 06:37):

Not every Iterable can be Foldable. Folding is guaranteed to be finite, whereas iteration can be infinite. The corresponding built-in classes are ForIn (for fold) and Stream (for Iterable)

James Gallicchio (Oct 05 2022 at 06:39):

Tomas Skrivan said:

I think when it comes to "lawfulness" these classes usually interact in pairs. For example Initiable with GetElem or with Size. I'm not sure how to structure this. For example I have PowType (for types X^I that are morally I -> X) that bundles get/set/init and that they interact together, but I'm not sure if that is the best approach.

I see... hm. I'm not sure either. My suspicion is that nearly all lawfulness results in collection-world can be stated relative to get (most of my collections in LeanColls have loads of results of this form, but I haven't tried to make a typeclass for that yet...)

James Gallicchio (Oct 05 2022 at 06:42):

Relevant: I'm a bit surprised by FunType including an extensionality requirement. It seems a bit overly restrictive, since there's tons of collections that don't have canonical forms. We could quotient by the equivalence relation of being equal on all elements, but it seems better to just always state correctness relative to get?

Tomas Skrivan (Oct 05 2022 at 06:44):

James Gallicchio said:

Not every Iterable can be Foldable. Folding is guaranteed to be finite, whereas iteration can be infinite. The corresponding built-in classes are ForIn (for fold) and Stream (for Iterable)

Ahh makes total sense. Well I would also suggest to maybe get ready to be aligned with mathlib. There is finite which should imply finite termination of iteration. Thus maybe Foldable should be Iterable + finite ?

James Gallicchio (Oct 05 2022 at 06:47):

I'm a bit hesitant to prefer Iterable over Foldable as the default iteration class; Rust had a big back and forth between internal/external iterators and ended up settling on internal iteration as preferred, because it tended to produce faster bytecode. I suspect (but have not measured this) that the same is true in Lean, but maybe I should test that hypothesis :joy:

Tomas Skrivan (Oct 05 2022 at 06:49):

James Gallicchio said:

Relevant: I'm a bit surprised by FunType including an extensionality requirement. It seems a bit overly restrictive, since there's tons of collections that don't have canonical forms. We could quotient by the equivalence relation of being equal on all elements, but it seems better to just always state correctness relative to get?

You are probably right that it is too restrictive in general. All I care about right now are multi-dimensional array like types and really want them to form algebraic structures. So for example there should be only one zero matrix, the one with zero elements.

Tomas Skrivan (Oct 05 2022 at 06:52):

James Gallicchio said:

I'm a bit hesitant to prefer Iterable over Foldable as the default iteration class; Rust had a big back and forth between internal/external iterators and ended up settling on internal iteration as preferred, because it tended to produce faster bytecode. I suspect (but have not measured this) that the same is true in Lean, but maybe I should test that hypothesis :joy:

Ohh I see, then it is a definitely good idea to keep them separate if they have different performance behavior. I think testing performance right now, as the compiler is undergoing massive changes, is not going to be really indicative.

James Gallicchio (Oct 05 2022 at 06:53):

This makes complete sense -- then there's definitely a use case for a typeclass of GetElem with extensionality.

Naming these typeclasses is going to be a nightmare :face_palm: maybe something like LawfulGetExt for the extensionality law.

James Gallicchio (Oct 05 2022 at 06:55):

Tomas Skrivan said:

I think testing performance right now, as the compiler is undergoing massive changes, is not going to be really indicative.

Yeah... I'm waiting until at least after the new code generator is done. It's the main reason I haven't been doing any performance testing yet. :/ but I think we can nail down a solid user interface before even thinking about (constant factor) performance

James Gallicchio (Oct 05 2022 at 06:58):

@Tomas Skrivan do you have a built-in range class for iterating over the elements of Fin n? I've done a bit of work on making a Range iterator that works for common array stuff, and I'm assuming SciLean has something similar

Tomas Skrivan (Oct 05 2022 at 07:12):

To iterate over Enumtype ι (i.e. ι isomoprhic to Fin n) I use forIn notation

Example of creating 3x3 identity matrix we will iterate over all Fin 3 × Fin 3

#eval Id.run do
  let mut a : ^(Fin 3 × Fin 3) := 0
  for ((i,j),li) in Enumtype.fullRange a.Index do
    a[li] := if i = j then 1 else 0
  a

The forIn notation gives you (id, ld) : ι × (Fin (numOf ι)), id is the structure index and li is linear index that actually corresponds how the matrix is stored in memory and you can use both indices to access matrix elements, a[id], a[li] of if id = (i,j) you can write a[i,j].

(right now on github li is Nat but I have recently changed it to Fin)

Tomas Skrivan (Oct 05 2022 at 07:46):

Also have you considered using zipper for modifying lists and trees? Or do you already have it?

James Gallicchio (Oct 05 2022 at 08:07):

I'd consider it a separate data structure. Didn't have a use case for it yet so I didn't implement anything on it. If we want generic, fast, persistent sequences, there's some radix-tree stuff I've been working on implementing in Lean that should cover those use cases!

James Gallicchio (Oct 05 2022 at 08:10):

Tomas Skrivan said:

#eval Id.run do
  let mut a : ^(Fin 3 × Fin 3) := 0
  for ((i,j),li) in Enumtype.fullRange a.Index do
    a[li] := if i = j then 1 else 0
  a

The forIn notation gives you (id, ld) : ι × (Fin (numOf ι)), id is the structure index and li is linear index that actually corresponds how the matrix is stored in memory and you can use both indices to access matrix elements, a[id], a[li] of if id = (i,j) you can write a[i,j].

This is really interesting. Is there a huge cost to actually doing the indexing calculation at each iteration of the loop? I know most c compilers can optimize those kinds of calculations in loops, and it feels a bit not-ergonomic for the user to not write [i,j] everywhere

Tomas Skrivan (Oct 05 2022 at 08:38):

I agree that it is not the most ergonometric thing. In the future, my hope is to add some optimization pass that replaces a[i,j] with a[li].

I have no clue if the conversion adds some cost but it might and dealing with it looked like an interesting problem :)

The conversion between structured and linear index can be somewhat complicated. For example, I deal with 3d regular grids and when I loop over all edges of the grid, the structured index holds info if the edge is X, Y or Z aligned and it's coordinates. Once I start doing finite element analysis over this grid it will get even more complicated.

James Gallicchio (Oct 05 2022 at 08:50):

Oh boy............

Is there a good reason to linearize stuff like that? I'd naively have just stored the X, Y, and Z edges as 3 separate 3-dim arrays.

Tomas Skrivan (Oct 05 2022 at 09:17):

Of course, if you have some data stored on every edge and this data satisfies some linear system. Then you need to build the corresponding matrix, thus you need linear index.

With finite element method this gets even more complicated, you have some data on points, some data on edges, faces and cells. They all interact through one big linear system. Thus you need linear index for the union of points, edges, faces and cells. This is actually the major pain in the ass in implementing finite element analysis and I'm soo happy to use Lean because it allows me to abstract all this annoyance away.

James Gallicchio (Oct 25 2022 at 14:32):

I'm looking to add an API for fixed-length arrays, any gripes with the name ArrayN? Inspired from @Tomas Skrivan's ArrayN

Michael George (Jan 02 2023 at 13:53):

I think a good way to structure the laws is in terms of multiset operations, since collections are, well, collections. I think the fundamental concept here is multiplicity (the number of a given item in a collection). From this, you can define membership, set operations like union, intersection, etc, correctness for iterators, etc. I think this is more fundamental than iteration and especially folding.

Michael George (Jan 02 2023 at 14:01):

Regarding lawfulness, one example of a potentially non-lawful data structure would be something like a Bloom filter, although that's arguably not a collection. AFAICT the choice of whether to provide "non-lawful" instances seems to vary for different parts of the library. For example, if I understand correctly, the algebraic hierarchy largely bundles the laws with the structures so that you can't have a group that doesn't follow the group law, while there is a distinction between Monad and LawfulMonad. I'm not sure if there's a good reason for this other than mathematicians are used to thinking more about the laws first and the implementations second while programmers are used to thinking more about the implementations first and the laws second.

Michael George (Jan 02 2023 at 14:05):

Regarding naming, I wonder if there is benefit in designing a Lawful class, with curried instances like Lawful SortedSet, Lawful PriorityQueue, Lawful Dictionary etc. I haven't thought carefully enough about exactly how the dependent types would work out here; my gut says it would be possible but probably not trivial to figure out the right type for Lawful. It's also not clear to me whether this provides benefit. It is maybe better to just bundle lawfulness axioms, so that instances of the corresponding classes are all lawful. I think using notation typeclasses allows for unlawful instances.

James Gallicchio (Jan 02 2023 at 17:33):

You definitely could make a universe-polymorphic Lawful typeclass but I think the universe inference algorithm would quickly become unable to guess the right universes for everything :joy: but that's orthogonal to the typeclass design

I definitely should have split out lawful classes; it's quite nice to have the separation between implementation and laws when developing the library, so we don't feel like we need to prove the laws for each implementation when implementing it...

James Gallicchio (Jan 02 2023 at 17:47):

@Michael George was interested in meeting to discuss collection design stuff, would anyone else want to join a ~1hr meeting?

I want to discuss

  • Naming conventions
  • Goal list for interfaces
  • Goal list for implementations
  • Top-level class hierarchy

If interested, add your availability here: https://www.when2meet.com/?18110122-sgZI3

James Gallicchio (Jan 02 2023 at 17:53):

(@Tomas Skrivan @Mario Carneiro @Wojciech Nawrocki @Mac pinging you because this might be of interest, and I'd appreciate your input!)

James Gallicchio (Jan 02 2023 at 17:55):

I'll put together a doc with agenda & summary of what SciLean/LeanColls/Std4 have right now

Wojciech Nawrocki (Jan 02 2023 at 22:19):

The mathematical model. I definitely agree that the mathematical model for collections should be something like lists up to permutation. One question which isn't entirely answered is whether to use lists up to permutation literally, or take the quotient and use multisets. So far I only attempted the former approach which worked out fine most of the time, except that proofs using permutations are really ugly because Lean doesn't (yet?) have generalized rewriting so we must construct them by hand, like here. A quotient would turn this into normal rewriting by equality. It would be good to try it and see if things actually become easier or something else breaks; I haven't tried so far.

Wojciech Nawrocki (Jan 02 2023 at 22:23):

Laws vs implementations. I also agree that there should be a Foldable typeclass which just exposes the operations but doesn't contain any laws. This is convenient for programming - you don't always want to prove things correct. In mathematics you pretty much do - a group is lawful by definition; a (multiplicative) group without laws is not a group but rather something like HasMul+HasOne+HasInv. And then a LawfulFoldable which says that the Foldable instance is compatible with the (multiset or whatever) model. I am not seeing any benefit to having a generic Lawful TypeClass however. The laws are pretty specific to what TypeClass is, so there wouldn't be much you could do by parameterizing over TypeClass.

Michael George (Jan 03 2023 at 20:01):

It seems the same either way, we just call them HasFold and Foldable (mathlib convention) or Foldable and LawfulFoldable (std convention)

Michael George (Jan 03 2023 at 20:09):

Re modeling: Lists or quotients of lists feel very implementation focused, whereas multisets feel very interface / specification focused. My gut says that the latter will lead to more natural definitions and proofs, but I don't have a lot of lean experience yet so there may be other conventions or considerations I'm not aware of.

Michael George (Jan 03 2023 at 20:16):

For example, the following seem very natural to me:

/-- The `HasMultiplicity` typeclass supports the `#(a ∈ S)` notation. -/
class HasMultiplicity (α : outParam <| Type u) (β : Type v) where
  /-- `multiplicity a S` is the number of occurrences of `a` in `S`. -/
  multiplicity : α  β  Nat
export HasMultiplicity (multiplicity)

/-- `#(a ∈ S)` gives the multiplicity of `a` in the multiset `S` -/
notation:max " #( " a:55 " ∈ " S:55 " ) " => multiplicity a S


/-- The standard definition of `⊆` in terms of multiplicity -/
class LawfulMultiSubset [HasMultiplicity α β] [HasSubset β] where
  /-- `S₁ ⊆ S₂` means every `a ∈ S₁` is also in `S₂` (counting multiplicity). -/
  subset_defn (S₁ S₂ : β) : S₁  S₂   a, #(a  S₁)  #( a  S₂)

/-- The standard definition of `∪` in terms of multiplicity -/
class LawfulMultiUnion [HasMultiplicity α β] [Union β] where
  /-- `a ∈ S₁ ∪ S₂` means that `a ∈ S₁` or `a ∈ S₂` (counting multiplicity). -/
  union_defn (S₁ S₂ : β):  a : α, #(a  S₁  S₂) = #(a  S₁) + #(a  S₂)

Then operations like insert have natural laws:

  • insert a S = S \cup { a }
  • for poll: S = {fst (poll S)} ∪ snd (poll S)

James Gallicchio (Jan 03 2023 at 20:26):

It does seem like multiplicity would be clean for describing a lot of collection operations. I haven't seen this used before in other formal languages, do you know if it has?

Definitely worth trying out. I can't really visualize how this would shape the hierarchy, but we can talk about that when we meet!

Speaking of which, it looks like Monday morning (EST) is when we are all the most free, so let's tentatively schedule for 11am EST

Michael George (Jan 03 2023 at 20:31):

I don't know of other examples of this design. This idea comes from trying to apply ideas that went into the design of the algebraic hierarchy to the problem of collections.

Mario Carneiro (Jan 03 2023 at 21:59):

one of the issues with using multiplicity-based characterizations of multisets, at least as far as the mathlib multiset formalization is concerned, is that it adds "unnecessary" DecidableEq assumptions to a lot of definitions and theorems, since you can't count the number of occurrences of a in a multiset unless the base type has decidable equality

Mario Carneiro (Jan 03 2023 at 22:00):

For a lot of data structures in std you don't really want to be using DecidableEq in the first place, because there is instead some equivalence relation implied by a BEq or Ord instance which the data structure is supposed to be using

Michael George (Jan 03 2023 at 22:27):

I see, you really want a sort of multisetoid. I'm haven't yet learned how quotients work in lean so this is a little beyond my current understanding

Mario Carneiro (Jan 03 2023 at 22:33):

this is what multiset is in lean

Mario Carneiro (Jan 03 2023 at 22:34):

a multiset is a list-up-to-permutation, i.e. a quotient of List A by List.Perm

Michael George (Jan 03 2023 at 23:11):

I was talking about the quotient of the set of elements by the equivalence defined by BEq. That is, ‘multiplicity’ should satisfy ‘a == b’ implies ‘a’ and ‘b’ have the same multiplicity in ‘S’

Mario Carneiro (Jan 03 2023 at 23:35):

you can define that as a function on multisets, by having a function like countP (f : A -> Bool) (x : Multiset A) which counts the number of elements of x satisfying f

Mario Carneiro (Jan 03 2023 at 23:35):

where f is (. == a) for example

Michael George (Jan 04 2023 at 15:45):

I think there's some confusion because when I say "multiset" I'm talking about the concept of multiset, rather than a particular implementation. The multiset I'm talking about should be a class, rather than a structure. In the same way that a set is defined by the \in relation, a multiset is defined by the multiplicity function. I initially thought that multiplicity should have type \alpha \to Nat, but there's two problems with this. The first, which you pointed out, is that the domain should really be the set of equivalence classes of \alpha by some relation (which for our purposes should be decidable). That's what I meant by "multisetoid". The second is that the codomain should really be a (potentially infinite) cardinality, rather than a Nat. This only matters for infinite data structures, but I guess those are useful.

Probably the only requirement for the codomain is that it's a partial order, I haven't thought about it. Probably something about decidability too.

Michael George (Jan 04 2023 at 15:51):

The reason I mention quotients is that I don't know how (=) is defined in lean; maybe it doesn't need to be baked into the interfaces for the data structures because you can already form quotients. I think there's also some difference between how we're thinking of the equivalence relation; I'm thinking of both the equivalence relation on elements and the order as being part of the data structure definition, with a requirement that they are compatible, whereas it seems like you're suggesting that the order is not part of the data structure, but is rather extracted from the partial order (and the compatibility becomes a lemma rather than a field)

James Gallicchio (Jan 04 2023 at 18:23):

(Needless to say I am a bit lost.......)

James Gallicchio (Jan 04 2023 at 18:24):

We can definitely flesh out both a model-based hierarchy and a property-based hierarchy and see what works out. Prove equivalences and such. Unless Mario et al are pretty confident it would be a pain to work with

James Gallicchio (Jan 08 2023 at 23:38):

@Wojciech Nawrocki @Michael George are you both still available for tomorrow at 11am? I think (?) this Jitsi link should work for then https://meet.jit.si/moderated/fb7a9cfff6d2836340d5adf794798f347c69312ecc59058412194e8abc418406

James Gallicchio (Jan 08 2023 at 23:39):

Oh @Mac I see you on the when2meet too :) does that time work for you still?

Mario Carneiro (Jan 08 2023 at 23:57):

I can't attend that time because it overlaps with the maintainer meeting. (Not sure if I was invited though. If you don't need me around feel free to ignore.)

Wojciech Nawrocki (Jan 09 2023 at 00:47):

@Mario Carneiro did you see https://www.when2meet.com/?18110122-sgZI3

James Gallicchio (Jan 09 2023 at 01:07):

Oh, what time is that at? We can potentially schedule for another time...

Mario Carneiro (Jan 09 2023 at 01:08):

11:30 EST

Mario Carneiro (Jan 09 2023 at 01:11):

I added my availability to that when2meet link

Mario Carneiro (Jan 09 2023 at 01:13):

We could do it during the std meeting on tuesday, although I'm thinking of cancelling regular meetings entirely because there isn't enough development / attendance to make it worthwhile

James Gallicchio (Jan 09 2023 at 01:15):

Hrm, okay, let's shoot for 10:30 instead. I think an hour is plenty, but if we go over we go over

James Gallicchio (Jan 09 2023 at 01:20):

@Mario Carneiro @Wojciech Nawrocki @Michael George @Mac slightly revised meeting time -- 10:30am EST tomorrow!
Jitsi link: https://meet.jit.si/moderated/fb7a9cfff6d2836340d5adf794798f347c69312ecc59058412194e8abc418406

Michael George (Jan 09 2023 at 01:29):

I'll have to move another meeting but I should be able to

Tomas Skrivan (Jan 09 2023 at 14:26):

Sorry, I won't be able to join, I'm currently on a vacation.

Mac (Jan 10 2023 at 13:50):

Sorry for missing the meeting. I didn't see the message before it was already to late. :sad:

James Gallicchio (Jan 10 2023 at 17:39):

No worries -- the main things we decided were (1) that ToList would be the model for all collections proofs (2) that implementations and users have very different expectations for the collections proof API (3) we can start experimenting with lawfulness classes, try to incorporate Wojciech's hashmap proofs and some of the LeanColls/SciLean stuff to see if a good lawfulness API emerges

Alex Keizer (Jun 26 2023 at 12:29):

Has there been any progress on this since? I find that with the different List-like things we currently have in mathlib (e.g, List, Array, Vector, Bitvec, and I'm sure there are others still), I would love it if there were typeclasses like Foldable, instead of the current situation were foldl/foldr and such are just functions reimplemented on each different type (and often not implemented at all).

James Gallicchio (Jun 26 2023 at 12:38):

No progress since, but I'm starting to work on it again. At the meeting we decided to use ToList as the base of the hierarchy instead of a Foldable typeclass. But I think the LeanColls classes like FoldableOps and IndexedOps were somewhat uncontroversial, so I'm going to put together a PR branch to Std and get feedback from everyone on whether the definitions are okay with everyone.

James Gallicchio (Jun 26 2023 at 12:41):

(I want to start writing more code that relies on those typeclasses, just to see how the experience is, so having it on a branch that we are refining is an easy way for me to just change my project lakefiles)

Alex Keizer (Jun 26 2023 at 12:47):

I wouldn't mind helping out with that PR, if you have a clear idea of what needs to be done.

James Gallicchio (Jun 26 2023 at 12:55):

That would be awesome. I'll add you to a branch in a bit -- I want to get the following cleaned up first:

  • ToList
  • Fold
  • FoldUntil
  • FoldableOps -> FoldOps (shorter name better :) with default impls derived from FoldUntil and Fold
  • Indexed (though we need to think about this class carefully -- see above discussions of SciLean's PowType and Enumtype)
  • IndexedOps with default impl

James Gallicchio (Jun 26 2023 at 12:56):

We also need to get the remaining PRs merged that move permutations into Std, so shepherding that would be super helpful (I am gonna be busy for the next few hours)

James Gallicchio (Jun 26 2023 at 14:15):

oh, i have a branch already that we can use :joy: https://github.com/leanprover/std4/pull/20

James Gallicchio (Jun 26 2023 at 14:32):

@Alex Keizer added you to my fork. if anyone else wants access let me know

Alex Keizer (Jun 26 2023 at 15:01):

Some comments after reading the LeanColls source:

  • With Iterable, we're tying the concept of an iterator together with the concept of a collection that can be iterated over. Would it make sense to separate that into Iterator and Iterable/ToIterator classes? That would also reduce some of the duplication between Iterable and Iterable'

  • Do we want to tie in the Indexed class with the existing GetElem class for the notation?

Then, I was reading back this thread and got a bit confused by your earlier statement

I'm a bit hesitant to prefer Iterable over Foldable as the default iteration class; Rust had a big back and forth between internal/external iterators and ended up settling on internal iteration as preferred, because it tended to produce faster bytecode. I suspect (but have not measured this) that the same is true in Lean

In my experience, Rust seems to heavily make use of its Iterator trait, which would be external iteration, right? Am I misunderstanding something here?

James Gallicchio (Jun 26 2023 at 15:07):

Rust's Iterator added an internal iteration method a few years ago, try_fold. So Iterator is a trait with both internal and external iteration.

I've almost never used Rust though so I don't really know how Iterator works -- this is the post I read: https://medium.com/@veedrac/rust-is-slow-and-i-am-the-cure-32facc0fdcb

James Gallicchio (Jun 26 2023 at 15:15):

Oh, we also need to upstream Multiset and Finset from Mathlib for defining set-like and map-like interfaces

Alex Keizer (Jun 26 2023 at 16:15):

Do you want to upstream everything in one PR, or maybe start minimally?

James Gallicchio (Jun 26 2023 at 16:28):

I think Mario wants small PRs for upstreaming, makes it easier to review

Alex Keizer (Jun 26 2023 at 16:28):

James Gallicchio said:

Rust's Iterator added an internal iteration method a few years ago, try_fold. So Iterator is a trait with both internal and external iteration.

I've almost never used Rust though so I don't really know how Iterator works -- this is the post I read: https://medium.com/@veedrac/rust-is-slow-and-i-am-the-cure-32facc0fdcb

I didn't know about try_fold, but I would still argue that doesn't constitute a preference for internal iteration. Things like for-loops still get desugared to external iteration, and I feel a lot of idiomatic Rust still primarily uses external iteration.
Not that it matters that much, I think for Lean we should look more at which style is more flexible/ergonomic. AFAIU, Lean isn't as suited to micro-optimization as Rust anyway.

Kyle Miller (Jun 26 2023 at 16:34):

James Gallicchio said:

Oh, we also need to upstream Multiset and Finset from Mathlib for defining set-like and map-like interfaces

What's the design idea here?

James Gallicchio (Jun 26 2023 at 16:40):

For collections where the order or multiplicity of elements shouldn't matter (e.g. HashSet), I think those are the types we will want to use as models.

James Gallicchio (Jun 26 2023 at 16:42):

As Wojciech described above, proving non-= congruences manually is a pain, and working with quotients like Multiset/Finset is probably nicer.

James Gallicchio (Jun 26 2023 at 16:46):

Those collections will of course also be ToListable, and we can have lawfulness theorems that the ToMultiset or ToFinset models agree with the canonical implementation derived fromToList. But I am guessing we won't ever want to use the ToList model for e.g. HashSet. So we need pretty fleshed out theories for the multiset/finset models as well (stolen from Mathlib).

Kyle Miller (Jun 26 2023 at 16:50):

Is the point of ToMultiset and ToFinset programming applications or mathematical/CS-proving ones?

For example, I don't think you could ever write a general mapM for Finset or Multiset. If the underlying type is orderable, you could write one that does ToList by sorting the elements and running the list mapM, but very few monads are symmetric (which you'd need for non-orderable types), and also that sorting is overhead you probably don't want to incur.

James Gallicchio (Jun 26 2023 at 16:55):

yes, but that seems fine/desirable... you can define an interface for mapM but not define its lawfulness with respect to ToMultiset, because it doesn't have a well-defined result when the order is arbitrary.

James Gallicchio (Jun 26 2023 at 16:57):

In my mind, the models are there for CS-proving. But they're not essential for programming applications, because we'll have separate classes for the operations (FoldOps/IndexedOps/MapLike/SetLike)

James Gallicchio (Jun 26 2023 at 17:32):

@Kyle Miller the branch's current contents are somewhat arbitrary experimentation :rofl: so I wouldn't read too much into it yet


Last updated: Dec 20 2023 at 11:08 UTC