Zulip Chat Archive

Stream: lean4

Topic: Interface for containers


Tomas Skrivan (Dec 03 2021 at 16:41):

The recent talk of mathlib4/standard library and my interest in multi-dimensional arrays prompted me to write a general interface for containers. See my fork of mathlib4

I have defined three core classes:

  1. Iterable (ι : Type) - anything that can be iterated on. i.e. has start and successor (can be empty! I want Fin 0 to be iterable)
  2. Enumtype (ι : Type) - something with an explicit bijection with Fin n
  3. Cont (C ι α : Type) - a container c : C that is indexed by ι and has elements of type α

What have I done:

  1. Common interface with notation like A[i,j,k]. See example

  2. ForIn loops like:

    let mut m : Matrix 4 4 := λ (i,j) => j.1.toFloat
    for ((i,j), li) in allIdx m do
      m := set m (i,j) li.toFloat

example 1
example 2

  1. Lazy containers: Any function f : (ι → α) can be interpreted as a container, with notation f : ι ↦ α and "lambda" abstraction cont i => f i

  2. Lazy operations like matrix multiplication and broadcasting. Writing a*x + y does not actually execute any code, but builds and expression that is evaluated only when needed. This way we can get fast BLAS like operations. This effectively creates expression templates, I would be really interested in writing a tactic that optimizes these expressions.

    let mut m : Matrix 4 4 := intro λ (i,j) => j.1.toFloat
    let mut v : Vector 4 := intro λ i : Fin 4 => i.1.toFloat
    let u : Fin 4  Float := m*v     --- `u` is not `Vector 4`

What I need help with:

  1. Notation for slices, I want to be able to write A[2, :]. That should be expanded to fun i => A[2,i].
    The · notation does not work, it consumes to much. f A[2,·] get expanded to fun i => f A[2,i] but I want f (fun i => A[2,i])
    I wrote bunch of ideas how this should work.

  2. Nicer notation for mutation in for loops
    Absolute must is:

    let mut m : Matrix 4 4 := λ (i,j) => j.1.toFloat
    for ((i,j), li) in allIdx m do
      m[i,j] := li.toFloat
      m[i,j] += 1.0

I would love to have also this i.e. the binder in for loop has mutable reference. Accessing element of m in the loop through m[i,j] would force a copy of m before the loop and m[i,j] would always refer to this copy.
Now I think about it, it should just be a syntactic sugar for mapIdx.

    let mut m : Matrix 4 4 := λ (i,j) => j.1.toFloat
    for (a, (i,j), li) in m do
       a := li.toFloat
  1. Binder notation for products: I want to write cont (i,j) => f i j and ∑ (i,j), A[i,j]. This kind of notation only works for fun (i,j) => f i j.
  2. Proofs! I have omitted every proof I have encountered, except rfl or simp. Right now, I want to get the interface right only then I start proving something. Also, I'm not very experienced with proving in Lean.

Things to change:

  1. Rename Cont to Container because "cont" can be interpreted as "continuous"
  2. The definition of Cont should be changed to Cont {C : Sort} (c : C) (ι α : Type), this way we can consider stuff like l : List α as a container with iterators ofl as index type
    See my attempt

Henrik Böving (Dec 03 2021 at 16:51):

Another little thing regarding naming, it would be nice if your Enumtype would be called Fintype instead since this is what is used in Lean 3 as well and I don't think there is a reason to change it?

But your stuff sounds awesome! Sadly I'm not knowledgeable enough to help with any notation magic though.

Tomas Skrivan (Dec 03 2021 at 16:52):

Ohh there is also ContainerView

    def ContView {κ} (C : Type u) [ContData C] (tr : κ  (indexOf C)) := C

This should allow reindexing of C with index type κ but you are still able to mutate it. You can always turn C into κ ↦ α by cont j => C[tr j], but then you can't mutate it.

I didn't test it yet, but the use case in my head was setting up a matrix by initializing its sub-matrices.

Tomas Skrivan (Dec 03 2021 at 16:54):

Another little thing regarding naming, it would be nice if your Enumtype would be called Fintype instead since this is what is used in Lean 3 as well and I don't think there is a reason to change it?

If I'm not mistaken, there is a big difference. Fintype just have a proof of existence of such bijection. Enumtype is actually holding it.

If I'm mistaken, then I will of course rename it to Fintype

Henrik Böving (Dec 03 2021 at 16:56):

Ah that's true yes, fintype contains a finset along with a proof that all elements of the type in question are in it.

Tomas Skrivan (Dec 03 2021 at 17:04):

I would say that Enumtype should extend Fintype.

Mac (Dec 03 2021 at 17:11):

@Thomas Eckl Cont is also a problematic name because it is usually used in functional languages for the continuation monad (e.g., Haskell).

Mac (Dec 03 2021 at 17:13):

@Tomas Skrivan what do you see as the overlap/difference between your Iterable and ForIn?

Mac (Dec 03 2021 at 17:14):

Also, not that the Lean core already has a Range type.

Tomas Skrivan (Dec 03 2021 at 17:21):

Mac said:

Tomas Skrivan what do you see as the overlap/difference between your Iterable and ForIn?

You are right that you can probably get one from the other, but as API user, it is much easier to provide instance of Iterable then instance of ForIn. I was looking at definition of ForIn for quite some time and it took me couple of trials and errors to get it working.

Tomas Skrivan (Dec 03 2021 at 17:25):

Mac said:

Also, not that the Lean core already has a Range type.

Yes, I know, but Lean's Range is defined with start : Nat | step : Nat | stop : Nat but my Range is defined with two elements of the index set. You do not need to know the distance between them or their global indices(that are meaning full only for Enumtype).

Tomas Skrivan (Dec 03 2021 at 17:32):

Also, Iterable can be infinite, so you can iterate over lines of file etc. Over something you do not the size beforehand.
I provide default ForIn, in order to prove termination I just run for maximum ofUSize.size steps(effectively infinite loop). The user is encouraged to provide an instance of Iterable.UpperBound, if they want to prove that they traversed the whole type.

Kyle Miller (Dec 03 2021 at 18:01):

Tomas Skrivan said:

Fintype just have a proof of existence of such bijection. Enumtype is actually holding it.

Henrik Böving said:

fintype contains a finset along with a proof that all elements of the type in question are in it.

A fintype is like an enumtype that's forgotten the exact order of the elements of the type -- deep within there's a list with all the elements enumerated (but you're prevented from accessing it unless you can prove you're doing it in a way that order doesn't matter). enumtype is in mathlib is usually written directly as an equivalence α ≃ fin (fintype.card α).

You can computably enumerate the elements of a fintype by a fin equivalence, but the equivalence is wrapped up in trunc, which forces you to prove that your computation doesn't depend on the enumeration order. docs#fintype.trunc_equiv_fin

Kyle Miller (Dec 03 2021 at 18:02):

(One consequence to fintype not being just a proof of the existence of a bijection is that different fintype instances for the same type might not be definitionally equal. They're all provably equal, though.)

Kyle Miller (Dec 03 2021 at 18:04):

Oh, found the mathlib name for Enumtype: docs#fin_enum

Tomas Skrivan (Dec 03 2021 at 18:05):

Kyle Miller said:

Oh, found the mathlib name for Enumtype: docs#fin_enum (I like Enumtype more though)

So for the sake of consistency with mathlib, should I rename Enumtype to FinEnum? (I don't like the name though ...)

Kyle Miller (Dec 03 2021 at 18:07):

Yeah, probably. I actually retract liking Enumtype more because "enumerable type" can suggest countably infinite cardinality.

Tomas Skrivan (Dec 03 2021 at 18:08):

fair point

Kyle Miller (Dec 03 2021 at 18:08):

I'm not sure fin_enum is used very much, and its name could probably be changed.

Kyle Miller (Dec 03 2021 at 18:08):

Another option is EFintype for enumerable fintype

Tomas Skrivan (Dec 03 2021 at 18:09):

What about Index ?

Tomas Skrivan (Dec 03 2021 at 18:10):

That actually holds the meaning how I want to use it(might not be everyone)

Mario Carneiro (Dec 03 2021 at 18:10):

That sounds like a typeclass for the index operator

Mario Carneiro (Dec 03 2021 at 18:13):

It's unclear to me why you need FinEnum though. I would think a finite collection can more easily support an iterator like interface rather than something based on indexes

Mario Carneiro (Dec 03 2021 at 18:14):

i.e. something like rust's ExactSizeIterator

Mario Carneiro (Dec 03 2021 at 18:16):

Also, this architecture is fairly heavyweight. I'm dubious of putting it too deep in the dependency hierarchy

Mario Carneiro (Dec 03 2021 at 18:17):

(I don't think this is a problem for inclusion in mathlib4 since it can be a leaf file if it turns out not to be used)

Tomas Skrivan (Dec 03 2021 at 18:17):

When working with matrices/tensors, you really want indices. Even more, you want the strucured index (i,j,k), to convey meaning, and the linear index i + n * (j + m * k), to address memory. When writing high-performance code, you do not want to reconstruct one type of index from another at every iteration, but you want to build both indices iterativelly as you are traversing the tensor.

Kyle Miller (Dec 03 2021 at 18:19):

@Mario Carneiro It can also be used to go back and forth between an array indexed by a cartesian product and an array of arrays, for example, since it lets you use arbitrary enumerable types as indexes into the array.

Kyle Miller (Dec 03 2021 at 18:20):

But like Thomas is saying, you can also then operate directly on the underlying fin indices for performance.

Mario Carneiro (Dec 03 2021 at 18:21):

@Tomas Skrivan FinEnum only supports the linear index, and it has nothing to do with memory layout (since it's a typeclass), so I'm not sure your typeclass expresses your constraints

Tomas Skrivan (Dec 03 2021 at 18:23):

Look at my example using FinEnum/Enumtype to convey row/colum major ordering for matrix index type.

Tomas Skrivan (Dec 03 2021 at 18:24):

And the definition of row/colum major matrices.

Mario Carneiro (Dec 03 2021 at 18:26):

Also, talk of high performance code seems inappropriate here since this code is not optimized for speed by your own claim. Can lean actually see through these abstractions the way you want? I would be looking at compiler output to make sure I don't go astray, because lean's abstractions are not zero cost

Kyle Miller (Dec 03 2021 at 18:27):

With a Sufficiently Smart Compiler, Google's language Dex is able to optimize these things, and as I understand it its array abstraction works in a similar way.

Tomas Skrivan (Dec 03 2021 at 18:28):

My idea is to write the code in the correct form and then write a compiler to properly unfold these definitions and inline code as necessary.

Mario Carneiro (Dec 03 2021 at 18:29):

Maybe rather than an interface for containers, you mean an interface for matrices?

Mario Carneiro (Dec 03 2021 at 18:30):

that was not initially clear to me

Tomas Skrivan (Dec 03 2021 at 18:30):

The ×ₗ product is defined here.

Tomas Skrivan (Dec 03 2021 at 18:31):

Mario Carneiro said:

Maybe rather than an interface for containers, you mean an interface for matrices?

Yes matrices is my primary concern, but look at my attempt for List. But that requires me to change the definition of Cont to Cont {C : Sort} (c : C) (ι α : Type)

Mario Carneiro (Dec 03 2021 at 18:33):

What I mean is that by trying to broaden the applicability to arbitrary containers you will make a suboptimal solution for both matrices and containers

Mario Carneiro (Dec 03 2021 at 18:33):

matrices have a very particular structure and you should exploit that

Mario Carneiro (Dec 03 2021 at 18:34):

hashmaps and lists don't have that structure

Kyle Miller (Dec 03 2021 at 18:34):

Maybe instead of "containers" it would be clearer to say "finitely indexable objects".

Multidimensional arrays (which this is mainly addressing) also tend to be called "tensors" (and which mathlib calls holors, which I've never heard anywhere else).

Mario Carneiro (Dec 03 2021 at 18:35):

folks objected to "tensor" because that has a meaning in physics, to do with coordinate transformations

Tomas Skrivan (Dec 03 2021 at 18:37):

Yes, I also object to tensor as it clashes with tensors with physics and in general with tensor product.

Kyle Miller (Dec 03 2021 at 18:37):

I get that, I've just never seen "holor" before, and there's not much out there about them.

Tomas Skrivan (Dec 03 2021 at 18:39):

Mario Carneiro said:

What I mean is that by trying to broaden the applicability to arbitrary containers you will make a suboptimal solution for both matrices and containers

I will drop support for arbitrary containers once it means a sacrifice for matrices, but I have not encounter that yet.

Mario Carneiro (Dec 03 2021 at 18:46):

I'm pretty dubious about the use of a dependent type for list iterators. You should just keep the data for the list in the iterator. (For reverse iteration, see https://en.wikipedia.org/wiki/Zipper_(data_structure) .)

Jad Ghalayini (Dec 03 2021 at 20:54):

Mario Carneiro said:

folks objected to "tensor" because that has a meaning in physics, to do with coordinate transformations

It is standard CS terminology though, and most physics tensors are in bijection with multidimensional arrays over R

Tomas Skrivan (Dec 03 2021 at 21:40):

What about calling it Table?

Tomas Skrivan (Dec 04 2021 at 11:10):

Mario Carneiro said:

I'm pretty dubious about the use of a dependent type for list iterators. You should just keep the data for the list in the iterator. (For reverse iteration, see https://en.wikipedia.org/wiki/Zipper_(data_structure) .)

After some more thinking, I agree that it is a bad idea. I will stick with matrix like objects and rename Cont to Table. As I do not like the word tensor, I feel table is quite appropriate name but I'm open to other suggestions.


Last updated: Dec 20 2023 at 11:08 UTC