Zulip Chat Archive

Stream: general

Topic: fast data structures?


view this post on Zulip Kevin Lacker (Oct 27 2020 at 18:31):

(Re-asking here after crickets in new-members.) While trying to clean up this implementation of rewrite_search I am trying to figure out which data structures are the fastest and had a couple questions. I would expect that it's the ones implemented in the VM. It looks like d_array is supposed to be, for example. My first question is, does this work quickly as I'd expect? When I look at the code some of the methods don't seem like they would be efficient. For example from https://github.com/leanprover-community/lean/blob/0ec2a19/library/init/data/array/basic.lean#L11:

/-- `write a i v` sets the `i`th member of `a` to be `v`. Has builtin VM implementation. -/
def write (a : d_array n α) (i : fin n) (v : α i) : d_array n α :=
{data := λ j, if h : i = j then eq.rec_on h v else a.read j}

That implementation looks like it will be O(n) to evaluate a read after doing n writes. Is it actually being replaced with something else more efficient by the VM?

My second question is whether there are other efficient data structures. In particular a good map structure like a hashmap or a treemap would be nice. There are a bunch of different tree-like structures lying around and I was wondering if there's conventional wisdom on which are the best to use.

view this post on Zulip Johan Commelin (Oct 27 2020 at 18:32):

I think that indeed the definition of write is replaced in the VM

view this post on Zulip Gabriel Ebner (Oct 27 2020 at 18:33):

https://github.com/leanprover-community/lean/blob/016b855d61d4e8959a2ad689678b0d7dc80138bb/src/library/vm/vm_array.cpp#L189

view this post on Zulip Kevin Lacker (Oct 27 2020 at 18:35):

why is it even defined in lean at all if it's replaced by the vm? I guess that part is just used as formalization?

view this post on Zulip Kevin Lacker (Oct 27 2020 at 18:36):

in that directory i see rb_map, that looks promising

view this post on Zulip Mario Carneiro (Oct 27 2020 at 18:47):

There is native.rb_map for a good rb_map implementation

view this post on Zulip Mario Carneiro (Oct 27 2020 at 18:47):

The pure rb_map in core is kind of broken

view this post on Zulip Kevin Lacker (Oct 27 2020 at 18:47):

yikes. how is it "kind of" broken?

view this post on Zulip Mario Carneiro (Oct 27 2020 at 18:48):

There is also docs#hash_map if you want a hash map

view this post on Zulip Gabriel Ebner (Oct 27 2020 at 18:48):

Isn't the hash map even slower?

view this post on Zulip Mario Carneiro (Oct 27 2020 at 18:48):

than what?

view this post on Zulip Mario Carneiro (Oct 27 2020 at 18:49):

The operations of rb_map are fine, but the invariant is wrong so you can't construct the full API

view this post on Zulip Kevin Lacker (Oct 27 2020 at 18:49):

this hash_map uses a hash function written in lean iiuc, so it sort of can't be all that fast

view this post on Zulip Gabriel Ebner (Oct 27 2020 at 18:49):

Than native.rb_map.

view this post on Zulip Mario Carneiro (Oct 27 2020 at 18:50):

Oh sure, anything written in C++ is going to be faster just because it's compiled

view this post on Zulip Mario Carneiro (Oct 27 2020 at 18:50):

I wasn't comparing those anyway

view this post on Zulip Mario Carneiro (Oct 27 2020 at 18:51):

This is sort of the reason my plan to formalize Okasaki's pure functional data structures book fell through

view this post on Zulip Kevin Lacker (Oct 27 2020 at 18:51):

lean has its own implementation of an array, huh. https://github.com/leanprover-community/lean/blob/016b855d61d4e8959a2ad689678b0d7dc80138bb/src/library/parray.h i wonder why not use STL data structures

view this post on Zulip Mario Carneiro (Oct 27 2020 at 18:52):

besides the fact that we still have no memoization primitive, the performance of the lean 3 VM is sufficiently bad that it's kind of pointless

view this post on Zulip Mario Carneiro (Oct 27 2020 at 18:52):

I wonder why not use STL data structures

Which one?

view this post on Zulip Mario Carneiro (Oct 27 2020 at 18:52):

parray is a persistent array, std::array would not be a replacement

view this post on Zulip Kevin Lacker (Oct 27 2020 at 18:53):

oh i see. can I get a regular old non-persistent array, or is that philosophically forbidden?

view this post on Zulip Mario Carneiro (Oct 27 2020 at 18:54):

I think it would be useful to have such a thing, but it is very difficult to use properly in lean

view this post on Zulip Mario Carneiro (Oct 27 2020 at 18:55):

specifically, you have to use the array linearly, and unfortunately lean sometimes holds on to extra references in places where it probably shouldn't and this kills your performance

view this post on Zulip Mario Carneiro (Oct 27 2020 at 18:55):

If you use a parray completely linearly, I believe it acts basically like an array

view this post on Zulip Mario Carneiro (Oct 27 2020 at 18:56):

IIRC there is an option you can set to see when non-destructive updates are being performed

view this post on Zulip Kevin Lacker (Oct 27 2020 at 19:00):

ok. so when you say the native.rb_map is good and the pure rb_map in core is kind of broken, which one is the good one? I think I only see https://github.com/leanprover-community/lean/blob/0ec2a19/library/init/meta/rb_map.lean#L15 in the lean codebase

view this post on Zulip Gabriel Ebner (Oct 27 2020 at 19:34):

The other one is called rbmap (without the underscore).

view this post on Zulip Kevin Lacker (Oct 27 2020 at 19:37):

so rb_map good, rbmap bad? which is which here

view this post on Zulip Mario Carneiro (Oct 27 2020 at 19:38):

the C++ one has a native in the name

view this post on Zulip Kevin Lacker (Oct 27 2020 at 19:40):

ok cool. yet another question - what's the difference between d_array and array? They look the same to me. A comment at https://github.com/leanprover-community/lean/blob/0ec2a19/library/init/data/array/basic.lean#L138 says that array is "a non-dependent array", but I have never heard the phrase "dependent array" before and google does not enlighten me

view this post on Zulip Mario Carneiro (Oct 27 2020 at 19:40):

Look at the types

view this post on Zulip Mario Carneiro (Oct 27 2020 at 19:41):

d_array allows the type of each element of the array to be different

view this post on Zulip Mario Carneiro (Oct 27 2020 at 19:41):

You won't find such a feature in most programming languages because they lack dependent types

view this post on Zulip Kevin Lacker (Oct 27 2020 at 19:43):

i am familiar with this feature, from such obscure language as python, javascript, php, ruby... ;-)

view this post on Zulip Mario Carneiro (Oct 27 2020 at 19:43):

for example, d_array 100 (λ i, array i.1 ℕ) is a two-dimensional triangular array

view this post on Zulip Mario Carneiro (Oct 27 2020 at 19:44):

python, javascript, etc cannot express this

view this post on Zulip Mario Carneiro (Oct 27 2020 at 19:44):

they can say "this is an array of things" but they can't have a strongly typed version

view this post on Zulip Kevin Lacker (Oct 27 2020 at 19:45):

OK i get it - so probably array is saner for most applications

view this post on Zulip Mario Carneiro (Oct 27 2020 at 19:45):

yes

view this post on Zulip Mario Carneiro (Oct 27 2020 at 19:46):

it's useful for proving properties though, for example if you want the elements of the array to satisfy some invariant that depends on the index

view this post on Zulip Kevin Lacker (Oct 27 2020 at 20:25):

yet another array question. the array's length is part of its type signature. is there a way to have something akin to a java "arraylist" where it can be of any length, but its implementation is an array rather than a list, so that random access is efficient? i just want to do something like, pass arrays around as arguments without describing what the length is all the time

view this post on Zulip Mario Carneiro (Oct 27 2020 at 20:26):

See the buffer type

view this post on Zulip Johan Commelin (Oct 27 2020 at 20:43):

Why is there a has_lt instance in

protected meta def mk (key : Type) [has_lt key] [decidable_rel ((<) : key  key  Prop)] (data : Type) : rb_lmap key data :=
rb_map.mk key (list data)

Because rb_map.mk doesn't need it...

view this post on Zulip Gabriel Ebner (Oct 28 2020 at 08:20):

There's a has_lt key in docs#native.rb_map.mk

view this post on Zulip Johan Commelin (Oct 28 2020 at 08:48):

Ooh, you are right


Last updated: May 08 2021 at 19:11 UTC