## Stream: general

### Topic: fast data structures?

#### Kevin Lacker (Oct 27 2020 at 18:31):

(Re-asking here after crickets in new-members.) While trying to clean up this implementation of rewrite_search I am trying to figure out which data structures are the fastest and had a couple questions. I would expect that it's the ones implemented in the VM. It looks like d_array is supposed to be, for example. My first question is, does this work quickly as I'd expect? When I look at the code some of the methods don't seem like they would be efficient. For example from https://github.com/leanprover-community/lean/blob/0ec2a19/library/init/data/array/basic.lean#L11:

/-- write a i v sets the ith member of a to be v. Has builtin VM implementation. -/
def write (a : d_array n α) (i : fin n) (v : α i) : d_array n α :=
{data := λ j, if h : i = j then eq.rec_on h v else a.read j}


That implementation looks like it will be O(n) to evaluate a read after doing n writes. Is it actually being replaced with something else more efficient by the VM?

My second question is whether there are other efficient data structures. In particular a good map structure like a hashmap or a treemap would be nice. There are a bunch of different tree-like structures lying around and I was wondering if there's conventional wisdom on which are the best to use.

#### Johan Commelin (Oct 27 2020 at 18:32):

I think that indeed the definition of write is replaced in the VM

#### Gabriel Ebner (Oct 27 2020 at 18:33):

https://github.com/leanprover-community/lean/blob/016b855d61d4e8959a2ad689678b0d7dc80138bb/src/library/vm/vm_array.cpp#L189

#### Kevin Lacker (Oct 27 2020 at 18:35):

why is it even defined in lean at all if it's replaced by the vm? I guess that part is just used as formalization?

#### Kevin Lacker (Oct 27 2020 at 18:36):

in that directory i see rb_map, that looks promising

#### Mario Carneiro (Oct 27 2020 at 18:47):

There is native.rb_map for a good rb_map implementation

#### Mario Carneiro (Oct 27 2020 at 18:47):

The pure rb_map in core is kind of broken

#### Kevin Lacker (Oct 27 2020 at 18:47):

yikes. how is it "kind of" broken?

#### Mario Carneiro (Oct 27 2020 at 18:48):

There is also docs#hash_map if you want a hash map

#### Gabriel Ebner (Oct 27 2020 at 18:48):

Isn't the hash map even slower?

than what?

#### Mario Carneiro (Oct 27 2020 at 18:49):

The operations of rb_map are fine, but the invariant is wrong so you can't construct the full API

#### Kevin Lacker (Oct 27 2020 at 18:49):

this hash_map uses a hash function written in lean iiuc, so it sort of can't be all that fast

#### Gabriel Ebner (Oct 27 2020 at 18:49):

Than native.rb_map.

#### Mario Carneiro (Oct 27 2020 at 18:50):

Oh sure, anything written in C++ is going to be faster just because it's compiled

#### Mario Carneiro (Oct 27 2020 at 18:50):

I wasn't comparing those anyway

#### Mario Carneiro (Oct 27 2020 at 18:51):

This is sort of the reason my plan to formalize Okasaki's pure functional data structures book fell through

#### Kevin Lacker (Oct 27 2020 at 18:51):

lean has its own implementation of an array, huh. https://github.com/leanprover-community/lean/blob/016b855d61d4e8959a2ad689678b0d7dc80138bb/src/library/parray.h i wonder why not use STL data structures

#### Mario Carneiro (Oct 27 2020 at 18:52):

besides the fact that we still have no memoization primitive, the performance of the lean 3 VM is sufficiently bad that it's kind of pointless

#### Mario Carneiro (Oct 27 2020 at 18:52):

I wonder why not use STL data structures

Which one?

#### Mario Carneiro (Oct 27 2020 at 18:52):

parray is a persistent array, std::array would not be a replacement

#### Kevin Lacker (Oct 27 2020 at 18:53):

oh i see. can I get a regular old non-persistent array, or is that philosophically forbidden?

#### Mario Carneiro (Oct 27 2020 at 18:54):

I think it would be useful to have such a thing, but it is very difficult to use properly in lean

#### Mario Carneiro (Oct 27 2020 at 18:55):

specifically, you have to use the array linearly, and unfortunately lean sometimes holds on to extra references in places where it probably shouldn't and this kills your performance

#### Mario Carneiro (Oct 27 2020 at 18:55):

If you use a parray completely linearly, I believe it acts basically like an array

#### Mario Carneiro (Oct 27 2020 at 18:56):

IIRC there is an option you can set to see when non-destructive updates are being performed

#### Kevin Lacker (Oct 27 2020 at 19:00):

ok. so when you say the native.rb_map is good and the pure rb_map in core is kind of broken, which one is the good one? I think I only see https://github.com/leanprover-community/lean/blob/0ec2a19/library/init/meta/rb_map.lean#L15 in the lean codebase

#### Gabriel Ebner (Oct 27 2020 at 19:34):

The other one is called rbmap (without the underscore).

#### Kevin Lacker (Oct 27 2020 at 19:37):

so rb_map good, rbmap bad? which is which here

#### Mario Carneiro (Oct 27 2020 at 19:38):

the C++ one has a native in the name

#### Kevin Lacker (Oct 27 2020 at 19:40):

ok cool. yet another question - what's the difference between d_array and array? They look the same to me. A comment at https://github.com/leanprover-community/lean/blob/0ec2a19/library/init/data/array/basic.lean#L138 says that array is "a non-dependent array", but I have never heard the phrase "dependent array" before and google does not enlighten me

#### Mario Carneiro (Oct 27 2020 at 19:40):

Look at the types

#### Mario Carneiro (Oct 27 2020 at 19:41):

d_array allows the type of each element of the array to be different

#### Mario Carneiro (Oct 27 2020 at 19:41):

You won't find such a feature in most programming languages because they lack dependent types

#### Kevin Lacker (Oct 27 2020 at 19:43):

i am familiar with this feature, from such obscure language as python, javascript, php, ruby... ;-)

#### Mario Carneiro (Oct 27 2020 at 19:43):

for example, d_array 100 (λ i, array i.1 ℕ) is a two-dimensional triangular array

#### Mario Carneiro (Oct 27 2020 at 19:44):

python, javascript, etc cannot express this

#### Mario Carneiro (Oct 27 2020 at 19:44):

they can say "this is an array of things" but they can't have a strongly typed version

#### Kevin Lacker (Oct 27 2020 at 19:45):

OK i get it - so probably array is saner for most applications

yes

#### Mario Carneiro (Oct 27 2020 at 19:46):

it's useful for proving properties though, for example if you want the elements of the array to satisfy some invariant that depends on the index

#### Kevin Lacker (Oct 27 2020 at 20:25):

yet another array question. the array's length is part of its type signature. is there a way to have something akin to a java "arraylist" where it can be of any length, but its implementation is an array rather than a list, so that random access is efficient? i just want to do something like, pass arrays around as arguments without describing what the length is all the time

#### Mario Carneiro (Oct 27 2020 at 20:26):

See the buffer type

#### Johan Commelin (Oct 27 2020 at 20:43):

Why is there a has_lt instance in

protected meta def mk (key : Type) [has_lt key] [decidable_rel ((<) : key → key → Prop)] (data : Type) : rb_lmap key data :=
rb_map.mk key (list data)


Because rb_map.mk doesn't need it...

#### Gabriel Ebner (Oct 28 2020 at 08:20):

There's a has_lt key in docs#native.rb_map.mk

#### Johan Commelin (Oct 28 2020 at 08:48):

Ooh, you are right

Last updated: May 08 2021 at 19:11 UTC