Zulip Chat Archive
Stream: std4
Topic: collection typeclasses
James Gallicchio (Oct 04 2022 at 21:05):
Looking for feedback on adding a few collections typeclasses from LeanColls that I've found useful for writing code operating on collections.
The first typeclasses are to do with fundamental basic operations on collections:
Foldable
(perhapsFold
?) for colls which have a folding function/internal iteratorIterable
for colls which can be externally iterated over (e.g. an iterator with a step function)Enumerable
for colls which can be constructed one element at a time
Some more for common types of collections:
Indexed
for colls which have efficient indexedget
/nth
functionsInitable
for colls efficiently constructable from aFin n -> A
initialization functionSetLike
for types isomorphic toA -> Bool
MapLike
for types isomorphic toA -> Option B
Queue
for LIFO/FIFO queues (somewhat outdated)
And then there's a few classes for implementations of common collection operations
FoldableOps
for operations implementable on anyFoldable
IndexedOps
for operations efficiently implementable on anyIterable
Tomas Skrivan (Oct 05 2022 at 05:44):
I have a similar set of typeclasses for development of SciLean. My main interest is working with collections that are multidimensional arrays, it is interesting to see how it reflects in different design choices.
The most prominent are:
Iterable
- I think yours is better as it gives extra flexibility with iterator
Enumtype
- type with an explicit isomorphism with Fin n
. This one I use all over the place.
SetElem
- accompanying buildin GetElem
class
HasIntro
- similar to yours Initiable
but allows for arbitrary index type and carries lawfulness proof.
Fee comments to yours:
- I like the use of iterator in
Iterable
. - I'm a bit confused that by default
Foldable
impliesIterable
, plus through construction intermediateList
which can be performance footgun. Wouldn't you accidentally convertArray
, or even worseFloatArray
, toList
? Indexed
andInitiable
should use a generic type instead ofFin n
for the index typeMapLike
should align with buildina[i]?
notation
In general, it would be great to converge to a certain set of these classes and add them to std4.
Also how should we approach lawfulness? Should we define Lawful***
version of all of these? Or define ***.Correct
class?
I actually always want the lawful version of these. Is there a good example when you do not want lawful version? (That you do not want to provide proof does not count as you can always sorry it and at least document your intention that way)
James Gallicchio (Oct 05 2022 at 06:03):
Feedback very appreciated, I'm gonna clone scilean and play around with it!
Tomas Skrivan (Oct 05 2022 at 06:06):
Also another class of interest might be:
PlainDataType
- type that allows for conversion from/to byte(s). This allows to define DataArray α
with appropriate bit compatification, thus DataArray Bool
is bit array or DataArray (Fin 8 × Fin 4 × Fin 3)
consumes one byte per element.
James Gallicchio (Oct 05 2022 at 06:19):
Tomas Skrivan said:
I'm a bit confused that by default Foldable implies Iterable, plus through construction intermediate List which can be performance footgun. Wouldn't you accidentally convert Array, or even worse FloatArray, to List?
Yeah, you definitely could end up with bad performance if you're not careful. All of the collections in LeanColls that have faster Iterable
implementations do override it. I wasn't sure whether to explicitly extend it or not, since most users aren't going to be declaring their own Iterable
.
What I did with the *Ops
typeclasses was to provide a default implementation given the minimum implementable subset, and then I explicitly said instance : *Ops A := default
for collections where the default was what I wanted. Perhaps we do the same with Iterable.
James Gallicchio (Oct 05 2022 at 06:23):
Tomas Skrivan said:
Also how should we approach lawfulness? Should we define
Lawful***
version of all of these? Or define***.Correct
class?
I don't know much about Lean typeclass best practices, I'm not really sure what the difference is. for Foldable
, the only "law" I have is that they abide parametricity, which is a relatively weak result but enough to let me write some theorems generic over folds :D
Tomas Skrivan (Oct 05 2022 at 06:32):
I think when it comes to "lawfulness" these classes usually interact in pairs. For example Initiable
with GetElem
or with Size
. I'm not sure how to structure this. For example I have PowType
(for types X^I
that are morally I -> X
) that bundles get/set/init and that they interact together, but I'm not sure if that is the best approach.
Tomas Skrivan (Oct 05 2022 at 06:35):
James Gallicchio said:
Tomas Skrivan said:
I'm a bit confused that by default Foldable implies Iterable, plus through construction intermediate List which can be performance footgun. Wouldn't you accidentally convert Array, or even worse FloatArray, to List?
Yeah, you definitely could end up with bad performance if you're not careful. All of the collections in LeanColls that have faster
Iterable
implementations do override it. I wasn't sure whether to explicitly extend it or not, since most users aren't going to be declaring their ownIterable
.What I did with the
*Ops
typeclasses was to provide a default implementation given the minimum implementable subset, and then I explicitly saidinstance : *Ops A := default
for collections where the default was what I wanted. Perhaps we do the same with Iterable.
I'm actually a quite confused by your Foldable
class. Every Iterable
can be Foldable
, no? Is there an instance for it? You have the reverse, which I find a bit questionable and dangerous.
James Gallicchio (Oct 05 2022 at 06:37):
Not every Iterable
can be Foldable
. Folding is guaranteed to be finite, whereas iteration can be infinite. The corresponding built-in classes are ForIn
(for fold) and Stream
(for Iterable)
James Gallicchio (Oct 05 2022 at 06:39):
Tomas Skrivan said:
I think when it comes to "lawfulness" these classes usually interact in pairs. For example
Initiable
withGetElem
or withSize
. I'm not sure how to structure this. For example I havePowType
(for typesX^I
that are morallyI -> X
) that bundles get/set/init and that they interact together, but I'm not sure if that is the best approach.
I see... hm. I'm not sure either. My suspicion is that nearly all lawfulness results in collection-world can be stated relative to get
(most of my collections in LeanColls have loads of results of this form, but I haven't tried to make a typeclass for that yet...)
James Gallicchio (Oct 05 2022 at 06:42):
Relevant: I'm a bit surprised by FunType
including an extensionality requirement. It seems a bit overly restrictive, since there's tons of collections that don't have canonical forms. We could quotient by the equivalence relation of being equal on all elements, but it seems better to just always state correctness relative to get
?
Tomas Skrivan (Oct 05 2022 at 06:44):
James Gallicchio said:
Not every
Iterable
can beFoldable
. Folding is guaranteed to be finite, whereas iteration can be infinite. The corresponding built-in classes areForIn
(for fold) andStream
(for Iterable)
Ahh makes total sense. Well I would also suggest to maybe get ready to be aligned with mathlib. There is finite
which should imply finite termination of iteration. Thus maybe Foldable
should be Iterable
+ finite
?
James Gallicchio (Oct 05 2022 at 06:47):
I'm a bit hesitant to prefer Iterable over Foldable as the default iteration class; Rust had a big back and forth between internal/external iterators and ended up settling on internal iteration as preferred, because it tended to produce faster bytecode. I suspect (but have not measured this) that the same is true in Lean, but maybe I should test that hypothesis :joy:
Tomas Skrivan (Oct 05 2022 at 06:49):
James Gallicchio said:
Relevant: I'm a bit surprised by
FunType
including an extensionality requirement. It seems a bit overly restrictive, since there's tons of collections that don't have canonical forms. We could quotient by the equivalence relation of being equal on all elements, but it seems better to just always state correctness relative toget
?
You are probably right that it is too restrictive in general. All I care about right now are multi-dimensional array like types and really want them to form algebraic structures. So for example there should be only one zero matrix, the one with zero elements.
Tomas Skrivan (Oct 05 2022 at 06:52):
James Gallicchio said:
I'm a bit hesitant to prefer Iterable over Foldable as the default iteration class; Rust had a big back and forth between internal/external iterators and ended up settling on internal iteration as preferred, because it tended to produce faster bytecode. I suspect (but have not measured this) that the same is true in Lean, but maybe I should test that hypothesis :joy:
Ohh I see, then it is a definitely good idea to keep them separate if they have different performance behavior. I think testing performance right now, as the compiler is undergoing massive changes, is not going to be really indicative.
James Gallicchio (Oct 05 2022 at 06:53):
This makes complete sense -- then there's definitely a use case for a typeclass of GetElem
with extensionality.
Naming these typeclasses is going to be a nightmare :face_palm: maybe something like LawfulGetExt
for the extensionality law.
James Gallicchio (Oct 05 2022 at 06:55):
Tomas Skrivan said:
I think testing performance right now, as the compiler is undergoing massive changes, is not going to be really indicative.
Yeah... I'm waiting until at least after the new code generator is done. It's the main reason I haven't been doing any performance testing yet. :/ but I think we can nail down a solid user interface before even thinking about (constant factor) performance
James Gallicchio (Oct 05 2022 at 06:58):
@Tomas Skrivan do you have a built-in range class for iterating over the elements of Fin n
? I've done a bit of work on making a Range
iterator that works for common array stuff, and I'm assuming SciLean has something similar
Tomas Skrivan (Oct 05 2022 at 07:12):
To iterate over Enumtype ι
(i.e. ι
isomoprhic to Fin n
) I use forIn
notation
Example of creating 3x3 identity matrix we will iterate over all Fin 3 × Fin 3
#eval Id.run do
let mut a : ℝ^(Fin 3 × Fin 3) := 0
for ((i,j),li) in Enumtype.fullRange a.Index do
a[li] := if i = j then 1 else 0
a
The forIn
notation gives you (id, ld) : ι × (Fin (numOf ι))
, id
is the structure index and li
is linear index that actually corresponds how the matrix is stored in memory and you can use both indices to access matrix elements, a[id]
, a[li]
of if id = (i,j)
you can write a[i,j]
.
(right now on github li
is Nat
but I have recently changed it to Fin
)
Tomas Skrivan (Oct 05 2022 at 07:46):
Also have you considered using zipper for modifying lists and trees? Or do you already have it?
James Gallicchio (Oct 05 2022 at 08:07):
I'd consider it a separate data structure. Didn't have a use case for it yet so I didn't implement anything on it. If we want generic, fast, persistent sequences, there's some radix-tree stuff I've been working on implementing in Lean that should cover those use cases!
James Gallicchio (Oct 05 2022 at 08:10):
Tomas Skrivan said:
#eval Id.run do let mut a : ℝ^(Fin 3 × Fin 3) := 0 for ((i,j),li) in Enumtype.fullRange a.Index do a[li] := if i = j then 1 else 0 a
The
forIn
notation gives you(id, ld) : ι × (Fin (numOf ι))
,id
is the structure index andli
is linear index that actually corresponds how the matrix is stored in memory and you can use both indices to access matrix elements,a[id]
,a[li]
of ifid = (i,j)
you can writea[i,j]
.
This is really interesting. Is there a huge cost to actually doing the indexing calculation at each iteration of the loop? I know most c compilers can optimize those kinds of calculations in loops, and it feels a bit not-ergonomic for the user to not write [i,j]
everywhere
Tomas Skrivan (Oct 05 2022 at 08:38):
I agree that it is not the most ergonometric thing. In the future, my hope is to add some optimization pass that replaces a[i,j]
with a[li]
.
I have no clue if the conversion adds some cost but it might and dealing with it looked like an interesting problem :)
The conversion between structured and linear index can be somewhat complicated. For example, I deal with 3d regular grids and when I loop over all edges of the grid, the structured index holds info if the edge is X, Y or Z aligned and it's coordinates. Once I start doing finite element analysis over this grid it will get even more complicated.
James Gallicchio (Oct 05 2022 at 08:50):
Oh boy............
Is there a good reason to linearize stuff like that? I'd naively have just stored the X, Y, and Z edges as 3 separate 3-dim arrays.
Tomas Skrivan (Oct 05 2022 at 09:17):
Of course, if you have some data stored on every edge and this data satisfies some linear system. Then you need to build the corresponding matrix, thus you need linear index.
With finite element method this gets even more complicated, you have some data on points, some data on edges, faces and cells. They all interact through one big linear system. Thus you need linear index for the union of points, edges, faces and cells. This is actually the major pain in the ass in implementing finite element analysis and I'm soo happy to use Lean because it allows me to abstract all this annoyance away.
James Gallicchio (Oct 25 2022 at 14:32):
I'm looking to add an API for fixed-length arrays, any gripes with the name ArrayN
? Inspired from @Tomas Skrivan's ArrayN
Michael George (Jan 02 2023 at 13:53):
I think a good way to structure the laws is in terms of multiset operations, since collections are, well, collections. I think the fundamental concept here is multiplicity (the number of a given item in a collection). From this, you can define membership, set operations like union, intersection, etc, correctness for iterators, etc. I think this is more fundamental than iteration and especially folding.
Michael George (Jan 02 2023 at 14:01):
Regarding lawfulness, one example of a potentially non-lawful data structure would be something like a Bloom filter, although that's arguably not a collection. AFAICT the choice of whether to provide "non-lawful" instances seems to vary for different parts of the library. For example, if I understand correctly, the algebraic hierarchy largely bundles the laws with the structures so that you can't have a group that doesn't follow the group law, while there is a distinction between Monad
and LawfulMonad
. I'm not sure if there's a good reason for this other than mathematicians are used to thinking more about the laws first and the implementations second while programmers are used to thinking more about the implementations first and the laws second.
Michael George (Jan 02 2023 at 14:05):
Regarding naming, I wonder if there is benefit in designing a Lawful
class, with curried instances like Lawful SortedSet
, Lawful PriorityQueue
, Lawful Dictionary
etc. I haven't thought carefully enough about exactly how the dependent types would work out here; my gut says it would be possible but probably not trivial to figure out the right type for Lawful
. It's also not clear to me whether this provides benefit. It is maybe better to just bundle lawfulness axioms, so that instances of the corresponding classes are all lawful. I think using notation typeclasses allows for unlawful instances.
James Gallicchio (Jan 02 2023 at 17:33):
You definitely could make a universe-polymorphic Lawful
typeclass but I think the universe inference algorithm would quickly become unable to guess the right universes for everything :joy: but that's orthogonal to the typeclass design
I definitely should have split out lawful classes; it's quite nice to have the separation between implementation and laws when developing the library, so we don't feel like we need to prove the laws for each implementation when implementing it...
James Gallicchio (Jan 02 2023 at 17:47):
@Michael George was interested in meeting to discuss collection design stuff, would anyone else want to join a ~1hr meeting?
I want to discuss
- Naming conventions
- Goal list for interfaces
- Goal list for implementations
- Top-level class hierarchy
If interested, add your availability here: https://www.when2meet.com/?18110122-sgZI3
James Gallicchio (Jan 02 2023 at 17:53):
(@Tomas Skrivan @Mario Carneiro @Wojciech Nawrocki @Mac pinging you because this might be of interest, and I'd appreciate your input!)
James Gallicchio (Jan 02 2023 at 17:55):
I'll put together a doc with agenda & summary of what SciLean/LeanColls/Std4 have right now
Wojciech Nawrocki (Jan 02 2023 at 22:19):
The mathematical model. I definitely agree that the mathematical model for collections should be something like lists up to permutation. One question which isn't entirely answered is whether to use lists up to permutation literally, or take the quotient and use multisets. So far I only attempted the former approach which worked out fine most of the time, except that proofs using permutations are really ugly because Lean doesn't (yet?) have generalized rewriting so we must construct them by hand, like here. A quotient would turn this into normal rewriting by equality. It would be good to try it and see if things actually become easier or something else breaks; I haven't tried so far.
Wojciech Nawrocki (Jan 02 2023 at 22:23):
Laws vs implementations. I also agree that there should be a Foldable
typeclass which just exposes the operations but doesn't contain any laws. This is convenient for programming - you don't always want to prove things correct. In mathematics you pretty much do - a group is lawful by definition; a (multiplicative) group without laws is not a group but rather something like HasMul
+HasOne
+HasInv
. And then a LawfulFoldable
which says that the Foldable
instance is compatible with the (multiset or whatever) model. I am not seeing any benefit to having a generic Lawful TypeClass
however. The laws are pretty specific to what TypeClass
is, so there wouldn't be much you could do by parameterizing over TypeClass
.
Michael George (Jan 03 2023 at 20:01):
It seems the same either way, we just call them HasFold
and Foldable
(mathlib convention) or Foldable
and LawfulFoldable
(std convention)
Michael George (Jan 03 2023 at 20:09):
Re modeling: Lists or quotients of lists feel very implementation focused, whereas multisets feel very interface / specification focused. My gut says that the latter will lead to more natural definitions and proofs, but I don't have a lot of lean experience yet so there may be other conventions or considerations I'm not aware of.
Michael George (Jan 03 2023 at 20:16):
For example, the following seem very natural to me:
/-- The `HasMultiplicity` typeclass supports the `#(a ∈ S)` notation. -/
class HasMultiplicity (α : outParam <| Type u) (β : Type v) where
/-- `multiplicity a S` is the number of occurrences of `a` in `S`. -/
multiplicity : α → β → Nat
export HasMultiplicity (multiplicity)
/-- `#(a ∈ S)` gives the multiplicity of `a` in the multiset `S` -/
notation:max " #( " a:55 " ∈ " S:55 " ) " => multiplicity a S
/-- The standard definition of `⊆` in terms of multiplicity -/
class LawfulMultiSubset [HasMultiplicity α β] [HasSubset β] where
/-- `S₁ ⊆ S₂` means every `a ∈ S₁` is also in `S₂` (counting multiplicity). -/
subset_defn (S₁ S₂ : β) : S₁ ⊆ S₂ ↔ ∀ a, #(a ∈ S₁) ≤ #( a ∈ S₂)
/-- The standard definition of `∪` in terms of multiplicity -/
class LawfulMultiUnion [HasMultiplicity α β] [Union β] where
/-- `a ∈ S₁ ∪ S₂` means that `a ∈ S₁` or `a ∈ S₂` (counting multiplicity). -/
union_defn (S₁ S₂ : β): ∀ a : α, #(a ∈ S₁ ∪ S₂) = #(a ∈ S₁) + #(a ∈ S₂)
Then operations like insert
have natural laws:
insert a S = S \cup { a }
- for
poll
:S = {fst (poll S)} ∪ snd (poll S)
James Gallicchio (Jan 03 2023 at 20:26):
It does seem like multiplicity would be clean for describing a lot of collection operations. I haven't seen this used before in other formal languages, do you know if it has?
Definitely worth trying out. I can't really visualize how this would shape the hierarchy, but we can talk about that when we meet!
Speaking of which, it looks like Monday morning (EST) is when we are all the most free, so let's tentatively schedule for 11am EST
Michael George (Jan 03 2023 at 20:31):
I don't know of other examples of this design. This idea comes from trying to apply ideas that went into the design of the algebraic hierarchy to the problem of collections.
Mario Carneiro (Jan 03 2023 at 21:59):
one of the issues with using multiplicity-based characterizations of multisets, at least as far as the mathlib multiset formalization is concerned, is that it adds "unnecessary" DecidableEq
assumptions to a lot of definitions and theorems, since you can't count the number of occurrences of a
in a multiset unless the base type has decidable equality
Mario Carneiro (Jan 03 2023 at 22:00):
For a lot of data structures in std you don't really want to be using DecidableEq
in the first place, because there is instead some equivalence relation implied by a BEq
or Ord
instance which the data structure is supposed to be using
Michael George (Jan 03 2023 at 22:27):
I see, you really want a sort of multisetoid. I'm haven't yet learned how quotients work in lean so this is a little beyond my current understanding
Mario Carneiro (Jan 03 2023 at 22:33):
this is what multiset
is in lean
Mario Carneiro (Jan 03 2023 at 22:34):
a multiset is a list-up-to-permutation, i.e. a quotient of List A
by List.Perm
Michael George (Jan 03 2023 at 23:11):
I was talking about the quotient of the set of elements by the equivalence defined by BEq. That is, ‘multiplicity’ should satisfy ‘a == b’ implies ‘a’ and ‘b’ have the same multiplicity in ‘S’
Mario Carneiro (Jan 03 2023 at 23:35):
you can define that as a function on multisets, by having a function like countP (f : A -> Bool) (x : Multiset A)
which counts the number of elements of x
satisfying f
Mario Carneiro (Jan 03 2023 at 23:35):
where f
is (. == a)
for example
Michael George (Jan 04 2023 at 15:45):
I think there's some confusion because when I say "multiset" I'm talking about the concept of multiset, rather than a particular implementation. The multiset I'm talking about should be a class, rather than a structure. In the same way that a set is defined by the \in
relation, a multiset is defined by the multiplicity
function. I initially thought that multiplicity
should have type \alpha \to Nat
, but there's two problems with this. The first, which you pointed out, is that the domain should really be the set of equivalence classes of \alpha
by some relation (which for our purposes should be decidable). That's what I meant by "multisetoid". The second is that the codomain should really be a (potentially infinite) cardinality, rather than a Nat
. This only matters for infinite data structures, but I guess those are useful.
Probably the only requirement for the codomain is that it's a partial order, I haven't thought about it. Probably something about decidability too.
Michael George (Jan 04 2023 at 15:51):
The reason I mention quotients is that I don't know how (=)
is defined in lean; maybe it doesn't need to be baked into the interfaces for the data structures because you can already form quotients. I think there's also some difference between how we're thinking of the equivalence relation; I'm thinking of both the equivalence relation on elements and the order as being part of the data structure definition, with a requirement that they are compatible, whereas it seems like you're suggesting that the order is not part of the data structure, but is rather extracted from the partial order (and the compatibility becomes a lemma rather than a field)
James Gallicchio (Jan 04 2023 at 18:23):
(Needless to say I am a bit lost.......)
James Gallicchio (Jan 04 2023 at 18:24):
We can definitely flesh out both a model-based hierarchy and a property-based hierarchy and see what works out. Prove equivalences and such. Unless Mario et al are pretty confident it would be a pain to work with
James Gallicchio (Jan 08 2023 at 23:38):
@Wojciech Nawrocki @Michael George are you both still available for tomorrow at 11am? I think (?) this Jitsi link should work for then https://meet.jit.si/moderated/fb7a9cfff6d2836340d5adf794798f347c69312ecc59058412194e8abc418406
James Gallicchio (Jan 08 2023 at 23:39):
Oh @Mac I see you on the when2meet too :) does that time work for you still?
Mario Carneiro (Jan 08 2023 at 23:57):
I can't attend that time because it overlaps with the maintainer meeting. (Not sure if I was invited though. If you don't need me around feel free to ignore.)
Wojciech Nawrocki (Jan 09 2023 at 00:47):
@Mario Carneiro did you see https://www.when2meet.com/?18110122-sgZI3
James Gallicchio (Jan 09 2023 at 01:07):
Oh, what time is that at? We can potentially schedule for another time...
Mario Carneiro (Jan 09 2023 at 01:08):
11:30 EST
Mario Carneiro (Jan 09 2023 at 01:11):
I added my availability to that when2meet link
Mario Carneiro (Jan 09 2023 at 01:13):
We could do it during the std meeting on tuesday, although I'm thinking of cancelling regular meetings entirely because there isn't enough development / attendance to make it worthwhile
James Gallicchio (Jan 09 2023 at 01:15):
Hrm, okay, let's shoot for 10:30 instead. I think an hour is plenty, but if we go over we go over
James Gallicchio (Jan 09 2023 at 01:20):
@Mario Carneiro @Wojciech Nawrocki @Michael George @Mac slightly revised meeting time -- 10:30am EST tomorrow!
Jitsi link: https://meet.jit.si/moderated/fb7a9cfff6d2836340d5adf794798f347c69312ecc59058412194e8abc418406
Michael George (Jan 09 2023 at 01:29):
I'll have to move another meeting but I should be able to
Tomas Skrivan (Jan 09 2023 at 14:26):
Sorry, I won't be able to join, I'm currently on a vacation.
Mac (Jan 10 2023 at 13:50):
Sorry for missing the meeting. I didn't see the message before it was already to late. :sad:
James Gallicchio (Jan 10 2023 at 17:39):
No worries -- the main things we decided were (1) that ToList would be the model for all collections proofs (2) that implementations and users have very different expectations for the collections proof API (3) we can start experimenting with lawfulness classes, try to incorporate Wojciech's hashmap proofs and some of the LeanColls/SciLean stuff to see if a good lawfulness API emerges
Alex Keizer (Jun 26 2023 at 12:29):
Has there been any progress on this since? I find that with the different List
-like things we currently have in mathlib (e.g, List
, Array
, Vector
, Bitvec
, and I'm sure there are others still), I would love it if there were typeclasses like Foldable
, instead of the current situation were foldl
/foldr
and such are just functions reimplemented on each different type (and often not implemented at all).
James Gallicchio (Jun 26 2023 at 12:38):
No progress since, but I'm starting to work on it again. At the meeting we decided to use ToList
as the base of the hierarchy instead of a Foldable
typeclass. But I think the LeanColls classes like FoldableOps
and IndexedOps
were somewhat uncontroversial, so I'm going to put together a PR branch to Std and get feedback from everyone on whether the definitions are okay with everyone.
James Gallicchio (Jun 26 2023 at 12:41):
(I want to start writing more code that relies on those typeclasses, just to see how the experience is, so having it on a branch that we are refining is an easy way for me to just change my project lakefiles)
Alex Keizer (Jun 26 2023 at 12:47):
I wouldn't mind helping out with that PR, if you have a clear idea of what needs to be done.
James Gallicchio (Jun 26 2023 at 12:55):
That would be awesome. I'll add you to a branch in a bit -- I want to get the following cleaned up first:
ToList
Fold
FoldUntil
FoldableOps -> FoldOps
(shorter name better :) with default impls derived fromFoldUntil
andFold
Indexed
(though we need to think about this class carefully -- see above discussions of SciLean'sPowType
andEnumtype
)IndexedOps
with default impl
James Gallicchio (Jun 26 2023 at 12:56):
We also need to get the remaining PRs merged that move permutations into Std, so shepherding that would be super helpful (I am gonna be busy for the next few hours)
James Gallicchio (Jun 26 2023 at 14:15):
oh, i have a branch already that we can use :joy: https://github.com/leanprover/std4/pull/20
James Gallicchio (Jun 26 2023 at 14:32):
@Alex Keizer added you to my fork. if anyone else wants access let me know
Alex Keizer (Jun 26 2023 at 15:01):
Some comments after reading the LeanColls
source:
-
With
Iterable
, we're tying the concept of an iterator together with the concept of a collection that can be iterated over. Would it make sense to separate that intoIterator
andIterable/ToIterator
classes? That would also reduce some of the duplication betweenIterable
andIterable'
-
Do we want to tie in the
Indexed
class with the existingGetElem
class for the notation?
Then, I was reading back this thread and got a bit confused by your earlier statement
I'm a bit hesitant to prefer Iterable over Foldable as the default iteration class; Rust had a big back and forth between internal/external iterators and ended up settling on internal iteration as preferred, because it tended to produce faster bytecode. I suspect (but have not measured this) that the same is true in Lean
In my experience, Rust seems to heavily make use of its Iterator
trait, which would be external iteration, right? Am I misunderstanding something here?
James Gallicchio (Jun 26 2023 at 15:07):
Rust's Iterator
added an internal iteration method a few years ago, try_fold
. So Iterator
is a trait with both internal and external iteration.
I've almost never used Rust though so I don't really know how Iterator
works -- this is the post I read: https://medium.com/@veedrac/rust-is-slow-and-i-am-the-cure-32facc0fdcb
James Gallicchio (Jun 26 2023 at 15:15):
Oh, we also need to upstream Multiset
and Finset
from Mathlib for defining set-like and map-like interfaces
Alex Keizer (Jun 26 2023 at 16:15):
Do you want to upstream everything in one PR, or maybe start minimally?
James Gallicchio (Jun 26 2023 at 16:28):
I think Mario wants small PRs for upstreaming, makes it easier to review
Alex Keizer (Jun 26 2023 at 16:28):
James Gallicchio said:
Rust's
Iterator
added an internal iteration method a few years ago,try_fold
. SoIterator
is a trait with both internal and external iteration.I've almost never used Rust though so I don't really know how
Iterator
works -- this is the post I read: https://medium.com/@veedrac/rust-is-slow-and-i-am-the-cure-32facc0fdcb
I didn't know about try_fold
, but I would still argue that doesn't constitute a preference for internal iteration. Things like for
-loops still get desugared to external iteration, and I feel a lot of idiomatic Rust still primarily uses external iteration.
Not that it matters that much, I think for Lean we should look more at which style is more flexible/ergonomic. AFAIU, Lean isn't as suited to micro-optimization as Rust anyway.
Kyle Miller (Jun 26 2023 at 16:34):
James Gallicchio said:
Oh, we also need to upstream
Multiset
andFinset
from Mathlib for defining set-like and map-like interfaces
What's the design idea here?
James Gallicchio (Jun 26 2023 at 16:40):
For collections where the order or multiplicity of elements shouldn't matter (e.g. HashSet), I think those are the types we will want to use as models.
James Gallicchio (Jun 26 2023 at 16:42):
As Wojciech described above, proving non-= congruences manually is a pain, and working with quotients like Multiset/Finset is probably nicer.
James Gallicchio (Jun 26 2023 at 16:46):
Those collections will of course also be ToList
able, and we can have lawfulness theorems that the ToMultiset
or ToFinset
models agree with the canonical implementation derived fromToList
. But I am guessing we won't ever want to use the ToList
model for e.g. HashSet
. So we need pretty fleshed out theories for the multiset/finset models as well (stolen from Mathlib).
Kyle Miller (Jun 26 2023 at 16:50):
Is the point of ToMultiset
and ToFinset
programming applications or mathematical/CS-proving ones?
For example, I don't think you could ever write a general mapM
for Finset
or Multiset
. If the underlying type is orderable, you could write one that does ToList
by sorting the elements and running the list mapM
, but very few monads are symmetric (which you'd need for non-orderable types), and also that sorting is overhead you probably don't want to incur.
James Gallicchio (Jun 26 2023 at 16:55):
yes, but that seems fine/desirable... you can define an interface for mapM
but not define its lawfulness with respect to ToMultiset
, because it doesn't have a well-defined result when the order is arbitrary.
James Gallicchio (Jun 26 2023 at 16:57):
In my mind, the models are there for CS-proving. But they're not essential for programming applications, because we'll have separate classes for the operations (FoldOps
/IndexedOps
/MapLike
/SetLike
)
James Gallicchio (Jun 26 2023 at 17:32):
@Kyle Miller the branch's current contents are somewhat arbitrary experimentation :rofl: so I wouldn't read too much into it yet
Last updated: Dec 20 2023 at 11:08 UTC