Zulip Chat Archive
Stream: general
Topic: fast data structures?
Kevin Lacker (Oct 27 2020 at 18:31):
(Re-asking here after crickets in new-members.) While trying to clean up this implementation of rewrite_search
I am trying to figure out which data structures are the fastest and had a couple questions. I would expect that it's the ones implemented in the VM. It looks like d_array
is supposed to be, for example. My first question is, does this work quickly as I'd expect? When I look at the code some of the methods don't seem like they would be efficient. For example from https://github.com/leanprover-community/lean/blob/0ec2a19/library/init/data/array/basic.lean#L11:
/-- `write a i v` sets the `i`th member of `a` to be `v`. Has builtin VM implementation. -/
def write (a : d_array n α) (i : fin n) (v : α i) : d_array n α :=
{data := λ j, if h : i = j then eq.rec_on h v else a.read j}
That implementation looks like it will be O(n) to evaluate a read after doing n writes. Is it actually being replaced with something else more efficient by the VM?
My second question is whether there are other efficient data structures. In particular a good map structure like a hashmap or a treemap would be nice. There are a bunch of different tree-like structures lying around and I was wondering if there's conventional wisdom on which are the best to use.
Johan Commelin (Oct 27 2020 at 18:32):
I think that indeed the definition of write
is replaced in the VM
Gabriel Ebner (Oct 27 2020 at 18:33):
Kevin Lacker (Oct 27 2020 at 18:35):
why is it even defined in lean at all if it's replaced by the vm? I guess that part is just used as formalization?
Kevin Lacker (Oct 27 2020 at 18:36):
in that directory i see rb_map
, that looks promising
Mario Carneiro (Oct 27 2020 at 18:47):
There is native.rb_map
for a good rb_map implementation
Mario Carneiro (Oct 27 2020 at 18:47):
The pure rb_map
in core is kind of broken
Kevin Lacker (Oct 27 2020 at 18:47):
yikes. how is it "kind of" broken?
Mario Carneiro (Oct 27 2020 at 18:48):
There is also docs#hash_map if you want a hash map
Gabriel Ebner (Oct 27 2020 at 18:48):
Isn't the hash map even slower?
Mario Carneiro (Oct 27 2020 at 18:48):
than what?
Mario Carneiro (Oct 27 2020 at 18:49):
The operations of rb_map
are fine, but the invariant is wrong so you can't construct the full API
Kevin Lacker (Oct 27 2020 at 18:49):
this hash_map uses a hash function written in lean iiuc, so it sort of can't be all that fast
Gabriel Ebner (Oct 27 2020 at 18:49):
Than native.rb_map
.
Mario Carneiro (Oct 27 2020 at 18:50):
Oh sure, anything written in C++ is going to be faster just because it's compiled
Mario Carneiro (Oct 27 2020 at 18:50):
I wasn't comparing those anyway
Mario Carneiro (Oct 27 2020 at 18:51):
This is sort of the reason my plan to formalize Okasaki's pure functional data structures book fell through
Kevin Lacker (Oct 27 2020 at 18:51):
lean has its own implementation of an array, huh. https://github.com/leanprover-community/lean/blob/016b855d61d4e8959a2ad689678b0d7dc80138bb/src/library/parray.h i wonder why not use STL data structures
Mario Carneiro (Oct 27 2020 at 18:52):
besides the fact that we still have no memoization primitive, the performance of the lean 3 VM is sufficiently bad that it's kind of pointless
Mario Carneiro (Oct 27 2020 at 18:52):
I wonder why not use STL data structures
Which one?
Mario Carneiro (Oct 27 2020 at 18:52):
parray
is a persistent array, std::array
would not be a replacement
Kevin Lacker (Oct 27 2020 at 18:53):
oh i see. can I get a regular old non-persistent array, or is that philosophically forbidden?
Mario Carneiro (Oct 27 2020 at 18:54):
I think it would be useful to have such a thing, but it is very difficult to use properly in lean
Mario Carneiro (Oct 27 2020 at 18:55):
specifically, you have to use the array linearly, and unfortunately lean sometimes holds on to extra references in places where it probably shouldn't and this kills your performance
Mario Carneiro (Oct 27 2020 at 18:55):
If you use a parray
completely linearly, I believe it acts basically like an array
Mario Carneiro (Oct 27 2020 at 18:56):
IIRC there is an option you can set to see when non-destructive updates are being performed
Kevin Lacker (Oct 27 2020 at 19:00):
ok. so when you say the native.rb_map
is good and the pure rb_map
in core is kind of broken, which one is the good one? I think I only see https://github.com/leanprover-community/lean/blob/0ec2a19/library/init/meta/rb_map.lean#L15 in the lean codebase
Gabriel Ebner (Oct 27 2020 at 19:34):
The other one is called rbmap
(without the underscore).
Kevin Lacker (Oct 27 2020 at 19:37):
so rb_map
good, rbmap
bad? which is which here
Mario Carneiro (Oct 27 2020 at 19:38):
the C++ one has a native
in the name
Kevin Lacker (Oct 27 2020 at 19:40):
ok cool. yet another question - what's the difference between d_array
and array
? They look the same to me. A comment at https://github.com/leanprover-community/lean/blob/0ec2a19/library/init/data/array/basic.lean#L138 says that array
is "a non-dependent array", but I have never heard the phrase "dependent array" before and google does not enlighten me
Mario Carneiro (Oct 27 2020 at 19:40):
Look at the types
Mario Carneiro (Oct 27 2020 at 19:41):
d_array
allows the type of each element of the array to be different
Mario Carneiro (Oct 27 2020 at 19:41):
You won't find such a feature in most programming languages because they lack dependent types
Kevin Lacker (Oct 27 2020 at 19:43):
i am familiar with this feature, from such obscure language as python, javascript, php, ruby... ;-)
Mario Carneiro (Oct 27 2020 at 19:43):
for example, d_array 100 (λ i, array i.1 ℕ)
is a two-dimensional triangular array
Mario Carneiro (Oct 27 2020 at 19:44):
python, javascript, etc cannot express this
Mario Carneiro (Oct 27 2020 at 19:44):
they can say "this is an array of things" but they can't have a strongly typed version
Kevin Lacker (Oct 27 2020 at 19:45):
OK i get it - so probably array
is saner for most applications
Mario Carneiro (Oct 27 2020 at 19:45):
yes
Mario Carneiro (Oct 27 2020 at 19:46):
it's useful for proving properties though, for example if you want the elements of the array to satisfy some invariant that depends on the index
Kevin Lacker (Oct 27 2020 at 20:25):
yet another array question. the array's length is part of its type signature. is there a way to have something akin to a java "arraylist" where it can be of any length, but its implementation is an array rather than a list, so that random access is efficient? i just want to do something like, pass arrays around as arguments without describing what the length is all the time
Mario Carneiro (Oct 27 2020 at 20:26):
See the buffer
type
Johan Commelin (Oct 27 2020 at 20:43):
Why is there a has_lt
instance in
protected meta def mk (key : Type) [has_lt key] [decidable_rel ((<) : key → key → Prop)] (data : Type) : rb_lmap key data :=
rb_map.mk key (list data)
Because rb_map.mk
doesn't need it...
Gabriel Ebner (Oct 28 2020 at 08:20):
There's a has_lt key
in docs#native.rb_map.mk
Johan Commelin (Oct 28 2020 at 08:48):
Ooh, you are right
Last updated: Dec 20 2023 at 11:08 UTC