Zulip Chat Archive

Stream: lean4

Topic: When are Arrays faster than Lists (and vice versa)?


Thomas Murrills (Mar 19 2023 at 21:36):

In particular there are these pieces of insight from the docs which I'd appreciate help in understanding (which might be obvious if I actually had a CS background):

Arrays perform best when unshared; as long as they are used "linearly" all updates will be performed destructively on the array, so it has comparable performance to mutable arrays in imperative programming languages.

List α works well as a persistent data structure, when many copies of the tail are shared. When the value is not shared, Array α will have better performance because it can do destructive updates.

What does it mean to be "shared"? I mean, intuitively, I can guess that it means "used by multiple processes" or "appearing in multiple expressions" somehow, but more importantly, how do you actually tell if you're sharing something or not by looking at the code? What does sharing mean precisely in this context?

Likewise, what's ""linearly"" alluding to here?

Alex Keizer (Mar 19 2023 at 21:48):

The basic distinction is whether you care about old values after mutation. Using something linearly roughly means using it only once. For example,

let xs := [3,4,5]
let ys := 2 :: xs
let zs := 1 :: xs

Here, xs is a shared tail of ys and zs, so it is not used linearly.

let xs1 := #[5]
let xs2 := xs1.push 4
let xs3 := xs2.push 3
let xs4 := xs3.push 2

Here, we use values xs1, xs2, xs3 only once. So, even though these are different variables, the compiler is free to destroy the old values (we never look at them again) and do in-place mutation.

Thomas Murrills (Mar 19 2023 at 21:57):

Hmm, okay, thanks! I think the root of my confusion here might've sprung from not knowing how the compiler/flow of computation works at a deeper level...for example, if we have just

let xs := [3,4,5]
let ys := 2 :: xs

(no zs) we do still care about xs—it's just inside of ys. But I gather a "future use" of ys here doesn't count as a "future use" of {xs as a variable}...or rather, all future uses of xs in this way factor through uses of ys, so the compiler can forget about xs. (Something about reference counting?)

Wojciech Nawrocki (Mar 20 2023 at 02:10):

@Thomas Murrills that's right, as long as you never use xs again (in code! using it again to specify Properties is okay), it counts as linear use (strictly speaking 'linear' means exactly once, to ensure that values can be overwritten in-place we just need 'affine' which means at most once; but if we never use the value then it may as well not exist, so saying 'linear' also makes sense).

Jannis Limperg (Mar 20 2023 at 09:47):

Thomas Murrills said:

(Something about reference counting?)

Indeed this optimisation is driven by refcounting. When xs.push x is executed and the refcount of xs is 1 (i.e. we're the only ones using it), then xs is destructively updated. If the refcount is >1, a new copy of xs is created, which for arrays involves copying every element. If you do this in a loop, it's bad.

But in situations where arrays are used linearly, they're much better. Perhaps even in situations where they are used non-linearly but the involved arrays are small.

James Gallicchio (Mar 20 2023 at 19:00):

Array should also only be used when updates are inplace (like map/set) or at the end (push). Luckily that is the common case.

For non-inplace updates in the middle of the sequence there are much better data structures, but we're currently missing them in Std :)

James Gallicchio (Mar 20 2023 at 19:02):

(also, fwiw, both Array and List are "destructively updated" when used linearly -- the performance difference is mostly due to Array having good cache locality, e.g. all the data is stored contiguously in memory rather than scattered across many locations with pointers between them)

Thomas Murrills (Mar 21 2023 at 22:41):

I see! This is all very useful insight, thanks so much everyone! :)


Last updated: Dec 20 2023 at 11:08 UTC