Zulip Chat Archive

Stream: lean4

Topic: DArray as primitives for compact data structures


Joachim Breitner (Sep 24 2023 at 13:25):

Lean currently provides Array (α : Type) backed by an efficient implementation. I’d like to make the case that it would be useful to (also) have DArray (α : ∀ (i:Nat), Type), with the same representation.

Conceptually, and in terms of proving things, it could look like

structure DArray : Type u where mk ::
  size : Nat
  vals : (i : Fin size)  α i

DArray.empty : DArray α
DArray.get (a : DArray α) (i : Fin a.size) : α i
DArray.push (a : DArray α) (v : α a.size) : DArray α
DArray.set. (a : DArray α) (i : Fin a.size) (v : α i.val) : DArray α

and all the other array operations you’d expect, just with the more general type – and annotated with the usual @[extern] annotations so that this is not the runtime representation.

The benefit of such a data structure is that it unlocks libraries to implement almost arbitrary complex compact type-safe data structures.

Consider places where you’d currently use Array (α × β). This is rather expensive: An array with n entries costs $3 + 4 n$ words, and an access requires two indirections. You could improve things a bit by using Array α × Array β, so now you have $6 + 2n$, but still two indirections.

With DArray you can write a data structure that's isomorphic to Array (α × β) like this:

def Nat.even : Nat  Bool
def Alternating (i : Nat) := if i.even then α else β
structure ArrayPair where
  arr : DArray (Alternating α β)
  heven : arr.size.even

and now the cost is $3 + 2n$ words, and accessing is only one indirection.

The implementation is a bit nasty, with all those array calculations, but at least it’s doable.

Of course, this is a rather simple example, I wonder what other kind of data structures might be possible with this… Maybe a library doing more complex Array-of-Tuples transformations.

It’s _almost_ possible to define this in a pure library, simply using the existing lean_array_* functions in @[extern …]; only the constructor/projections themselves would need a bit of extra code. Preliminary experiments at <https://github.com/nomeata/lean-darray>. But given how this is tied to the FFI, it should probably be provided by the code, if at all?

Does this sound useful? Should I turn this into a Lean RFC?

Joachim Breitner (Sep 24 2023 at 13:40):

(Another use case I can think of and that we talked about here some time ago is tabulating/memoizing dependent functions with enumerable domain, like Nat.fib.)

Kyle Miller (Sep 24 2023 at 18:35):

There have been some other discussions about dependent arrays before. One point that should be mentioned is that the reason this can't "just" be Array (Sigma α) is that the runtime representation of this would be an array of pairs. The trick is that we can encode the first nat in each pair using the position of the element itself.

James Gallicchio (Sep 26 2023 at 22:41):

I think if we're doing any core changes, I'd want the most primitive array you can get in C (which is really just a fixed-length block of memory, that you can address in somewhat arbitrary ways, that can have uninitialized data, etc). But I think last time it was discussed, core doesn't want to change any of the basic FFI types at the moment...

Joachim Breitner (Sep 27 2023 at 08:24):

That’s reasonable. And I guess one can implement this as a separate library with judicious use of coerce and implemented_by.


Last updated: Dec 20 2023 at 11:08 UTC