Zulip Chat Archive
Stream: general
Topic: Folding over `Array` vs. over `List`
nrs (Nov 13 2024 at 19:11):
I've recently switched to arrays for a performance-critical use case and I'm trying to see to what extent I can preserve the intuitions I've built up on List
. In general, when I'm working with a list I can usually achieve whatever transformation I want by either directly pattern matching on .cons
or by first matching on a pseudo coalgebra match pseudoCoalg mylist with | ...
then using the constructors of the resulting datatype to recurse on the list. Since we're at all using Array
instead of List
, I'm wondering what methods of recursing over Array
preserves the time complexity advantages that Array
has over List
? If we hack a pattern-match-like method using get
to simulate the way we recurse on List
, will we lose the benefits of using Array
at all? Is there a recommended way of recursing over Array
s, or are we relatively unconstrained?
Kyle Miller (Nov 13 2024 at 20:10):
I think you'll have to ask a more specific/concrete question to get a useful response, since this is a very open-ended topic.
nrs (Nov 13 2024 at 20:17):
Kyle Miller said:
I think you'll have to ask a more specific/concrete question to get a useful response
hm I was afraid it would come off as slightly rambling-ish. as an example, https://hackage.haskell.org/package/vector-0.13.2.0/docs/Data-Vector.html has annotations for the time complexity of most functions so you can have a more-or-less general ideas of what sort of operations will be costly. since Lean is in its infancy still we obviously can't expect such detail but I was wondering whether people had general guidelines they sticked to for consuming/transforming Lean arrays or recursing over them, as opposed to the straightforward ways of recursing over List
. Any general thoughts on the subjects are appreciated!
as a side node, there are (scarce) comments about time complexity here and there in the docs if you grep for O(
Kyle Miller (Nov 13 2024 at 20:26):
I still think you need to give specific examples of recursion strategies to get a useful response there, but the basic operations for Array are well-understood. You have to ensure that you only have one reference to the Array, and then pushing and popping from the end is O(1). An Array is represented as a contiguous block of memory, so for other operations you need to know what interacting with such data structures are like. For example, deleting the first element is O(n) since it needs to shift all the elements over.
Kyle Miller (Nov 13 2024 at 20:30):
There's a very small amount of information at https://lean-lang.org/lean4/doc/array.html, which mentions that it's a contiguous block of memory ("it is similar to C++ std::vector<T>
"). I'm not sure if this manual says anything about "functional but in-place" (FBIP), the compiler optimization that makes it possible to work with Array like a functional data structure even though it's being mutated at runtime.
Asei Inoue (Nov 13 2024 at 20:31):
@Kyle Miller
Is the FBIP not relevant to the List? Is the List immutable when there is only one reference to it?
Kyle Miller (Nov 13 2024 at 20:31):
(Matching on pseudo coalgebras is a foreign concept here, which is one reason I'm suggesting that you should give specific applications rather than ask for a general overview.)
Kyle Miller (Nov 13 2024 at 20:32):
@Asei Inoue It's certainly relevant, but that's only for reusing individual cons cells, which is just an O(1) optimization. The Array.push optimization saves O(n) time.
Kyle Miller (Nov 13 2024 at 20:35):
In fact, if I'm interpreting the IR correctly for List.reverse
, if every cell of the list has just one reference, it will not allocate any new memory — it reuses the old list.
nrs (Nov 13 2024 at 20:36):
edit: fixing an implementation detail
Kyle Miller (Nov 13 2024 at 20:38):
@Asei Inoue
def reverseAux._rarg (x_1 : obj) (x_2 : obj) : obj :=
case x_1 : obj of
List.nil →
ret x_2 -- returns accumulated reversed list
List.cons →
let x_3 : u8 := isShared x_1;
case x_3 : u8 of
Bool.false →
let x_4 : obj := proj[1] x_1; -- extract the tail of the list
set x_1[1] := x_2; -- reuses x_1, setting its tail to the accumulated reversed list
let x_5 : obj := reverseAux._rarg x_4 x_1; -- tail call, continues with tail of list and x_1 as accumulated reversed list
ret x_5
Bool.true →
/- ... shared case, constructs a new cell ... -/
There are no reference count increases or decreases in this code either.
Asei Inoue (Nov 13 2024 at 20:39):
@Kyle Miller
is this Lean code? what is ret
?
Kyle Miller (Nov 13 2024 at 20:40):
This is IR ("intermediate representation"). It's what the compiler transforms the Lean code into before emitting C.
Kyle Miller (Nov 13 2024 at 20:40):
There are many stages of it, but this is the last one.
Kyle Miller (Nov 13 2024 at 20:40):
set_option trace.compiler.ir true
enables seeing it.
Asei Inoue (Nov 13 2024 at 20:40):
how can I see IR? Thank you.
nrs (Nov 13 2024 at 21:22):
Kyle Miller said:
(Matching on pseudo coalgebras is a foreign concept here, which is one reason I'm suggesting that you should give specific applications rather than ask for a general overview.)
here's an example of what I mean by pseudo coalgebra, it's not an official term or anything, it's just meant to indicate that it's very loosely inspired by the haskell anamorphism/unfold recursion scheme but eager evaluation and the fact that induction and coinduction don't coincide in Lean types makes it such that we can't use the terminology literally. what is captured by using lists in this manner is just the fact of being able to pattern match at all, which (unless i just don't fully understand arrays yet) doesn't happen with arrays, which are more meant to be thought of as datatypes one uses a for
loop to consume instead of a pattern match.
inductive Tree' (α : Type) (F : Type -> Type)
| leaf : F α -> Tree' α F
| node : α -> Tree' α F -> Tree' α F -> Tree' α F
def printer : Tree' Nat List -> String
| .leaf _ => ""
| .node n ta tb => " ( " ++ printer ta ++ n.repr ++ printer tb ++ " ) "
instance : ToString (Tree' Nat List) where
toString := printer
def partition : List Nat -> Tree' Nat List
| x :: xs => .node x (.leaf $ keepWhile (fun n => n < x) xs) (.leaf $ keepWhile (fun n => n ≥ x) xs)
| [] => .leaf []
instance : Inhabited $ Tree' Nat List where
default := .leaf []
partial def elim (l : List Nat) (pcoalg : List Nat -> Tree' Nat List) : Tree' Nat List :=
match pcoalg l with
| .node x (.leaf la) (.leaf lb) => .node x (elim la pcoalg) (elim lb pcoalg)
| _ => .leaf []
#eval elim [5, 7, 8, 9, 2] partition
A fold following this sort-of-anamorphism/unfold will flatten the list into a sorted list (the way the printer is implemented as a fold, when combined with the unfold, kind of reminds of the hylomorphism recursion scheme)
Kyle Miller (Nov 13 2024 at 21:32):
I think if you're wanting to write code like this you should use List, since there's not much point using Array if the first step is to explode it into a Tree, or to allocate new Arrays in partition
.
With Array, you could write partition
to partition the array in-place with a for loop (or with recursion if you prefer, that's common in Lean) like you would in an imperative algorithm. The imperative way would also avoid doing 2n comparisons instead of just n.
Kyle Miller (Nov 13 2024 at 21:33):
But yes, you don't do pattern matches on Array, it shouldn't be thought of as being an inductive datatype for programming purposes. (For proving purposes however, it's defined to be a wrapper around List.)
nrs (Nov 13 2024 at 21:37):
I see thanks a lot for the comments!! it helps to have an idea that thinking in these terms is counterproductive and precisely the whole point is to achieve performance improvements over List
Kyle Miller (Nov 13 2024 at 21:51):
List does have its place to be sure. For example, if you want to have a reference to both xs
and ys := x :: xs
at the same time, there is no problem, but if you want a reference to both as
and bs := Array.push as a
, this will copy the whole array.
Eric Wieser (Nov 13 2024 at 22:02):
Though you can use docs#Subarray for that use-case, right?
Kyle Miller (Nov 13 2024 at 22:03):
Not for pushing, but yeah, you can use it for having multiple views into the same Array
Last updated: May 02 2025 at 03:31 UTC