Zulip Chat Archive
Stream: lean4
Topic: Array foldl timeout
Ioannis Konstantoulas (Jul 23 2023 at 18:34):
Continuing tests with Lean data structures, I came across the following timeout with an array of 1 billion natural 1s:
def constArray := Array.mkArray (1000*1000*1000) 1

def arraySum (a : Array Nat) : Nat :=
  Array.foldl (. + .) 0 a

def target₁ := arraySum constArray

def main : IO Unit :=
  IO.println s!"Hello, {target₁}!"
Running with latest nightly, I get:
lake build && time ( ./build/bin/performance )
[1/4] Building Main
[1/4] Compiling Performance
[4/4] Linking performance
[2] 193754 killed ( ./build/bin/performance; )
( ./build/bin/performance; ) 4.88s user 92.96s system 92% cpu 1:45.32 total
I presume this kill is some kind of timeout from the runtime, because I did not see a segfault (can someone confirm?). If instead of a foldl I manually iterate over the array entries and sum them, I get the same timeout:
def manualSum (a : Array Nat) : Nat := Id.run do
  let mut sum := 0
  for i in a do
    sum := sum + i
  return sum

def target₁ := manualSum constArray
lake build && time ( ./build/bin/performance )
[2] 194957 killed ( ./build/bin/performance; )
( ./build/bin/performance; ) 4.70s user 77.73s system 93% cpu 1:28.15 total
The curious thing is, if I reduce the array size from 1 Billion to 100 Million, the above code executes successfully in approximately 1.2 seconds on my machine. So something is happening when accessing the high indices. Is Array element access not constant time?
Mario Carneiro (Jul 23 2023 at 18:43):
this most likely means you are running out of memory
Ioannis Konstantoulas (Jul 23 2023 at 18:44):
But I have 16 Gigabytes and I am not running anything, my PC is quite capable.
Mario Carneiro (Jul 23 2023 at 18:45):
the array there is about 8 gigabytes, two copies of it are enough to run out of RAM
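The back-of-the-envelope arithmetic, assuming 8 bytes per array entry (one machine word per boxed or tagged element), can be checked directly:

```lean
-- 10^9 entries × 8 bytes per entry ≈ 8 GB per copy;
-- two live copies ≈ 16 GB, exceeding 16 GB of RAM once
-- the OS and other processes are accounted for.
#eval 1000 * 1000 * 1000 * 8  -- bytes in one copy
```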
Ioannis Konstantoulas (Jul 23 2023 at 18:45):
I see, is there no swapping happening automatically?
Mario Carneiro (Jul 23 2023 at 18:45):
that depends on the OS settings
Ioannis Konstantoulas (Jul 23 2023 at 18:46):
Also, why are there two copies? Is the fold not happening in place?
Mario Carneiro (Jul 23 2023 at 18:46):
It is likely to be preinitialized and then copied at "runtime"
Mario Carneiro (Jul 23 2023 at 18:46):
you can avoid this behavior using set_option compiler.extract_closed false
Ioannis Konstantoulas (Jul 23 2023 at 18:47):
Thank you!
Notification Bot (Jul 23 2023 at 18:47):
Ioannis Konstantoulas has marked this topic as resolved.
Notification Bot (Jul 23 2023 at 18:51):
Ioannis Konstantoulas has marked this topic as unresolved.
Ioannis Konstantoulas (Jul 23 2023 at 18:51):
I added this option to the main file, but it is still getting killed. Weird.
Mario Carneiro (Jul 23 2023 at 18:52):
what is the full program?
Mario Carneiro (Jul 23 2023 at 18:52):
you probably need it on target1
Mario Carneiro (Jul 23 2023 at 18:53):
and constArray
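One way to make sure the option covers both definitions might be to scope it per definition with the `set_option ... in` syntax, for example:

```lean
-- Scope the option to each definition that builds or consumes the array:
set_option compiler.extract_closed false in
def constArray := Array.mkArray (1000*1000*1000) 1

set_option compiler.extract_closed false in
def target₁ := manualSum constArray
```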
Ioannis Konstantoulas (Jul 23 2023 at 19:01):
Sorry for the delay, my computer crashed.
The full program is:
Performance.lean:
set_option compiler.extract_closed false
def constArray := Array.mkArray (1000*1000*1000) 1

--def arraySum (a : Array Nat) : Nat :=
--  Array.foldl (. + .) 0 a

set_option compiler.extract_closed false
def manualSum (a : Array Nat) : Nat := Id.run do
  let mut sum := 0
  for i in a do
    sum := sum + i
  return sum

set_option compiler.extract_closed false
def target₁ := manualSum constArray
Main.lean:
import «Performance»
set_option compiler.extract_closed false
def main : IO Unit :=
  IO.println s!"Hello, {target₁}!"
I am not really sure how these options work or where I can read about them.
Mac Malone (Jul 23 2023 at 20:59):
@Ioannis Konstantoulas Have you tried creating a similar array in C? I imagine the allocator may be having some trouble figuring out how to properly page such a large contiguous data structure.
Arend Mellendijk (Jul 23 2023 at 21:12):
FWIW when I run the last thing you wrote on my machine with 48 gigs it works and has a peak memory usage of ~16GB
Ioannis Konstantoulas (Jul 24 2023 at 11:14):
Mac said:
Ioannis Konstantoulas Have you tried creating a similar array in C? I imagine the allocator may be having some trouble figuring out how to properly page such a large contiguous data structure.
Hello; C does not have a problem with this:
#include <stdio.h>
#include <stdlib.h>

#define SIZE 1000000000

int main(void) {
    unsigned int sum = 0;
    unsigned int i = 0;
    unsigned int *array = (unsigned int *) malloc((SIZE) * sizeof(int));
    if (array == NULL) {
        printf("Malloc failed!\n");
        return 1;
    }
    for (i = 0; i < SIZE; i++) array[i] = 1;
    for (i = 0; i < SIZE; i++) sum += array[i];
    printf("%u\n", sum);
    return 0;
}
> gcc myar.c && ./a.out
1000000000
Ioannis Konstantoulas (Jul 24 2023 at 11:35):
Note: although I allocated with sizeof(int), the same success occurs if I change everything to long.
Leonardo de Moura (Jul 24 2023 at 13:29):
In C, int is a 32-bit value. In Lean, Nat is an arbitrary precision number. Moreover, in the current Lean runtime, the Array type is represented as an array of pointers / boxed values. The types ByteArray and FloatArray are represented as arrays of scalar values.
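A minimal sketch of using the scalar representation, assuming the current `ByteArray` API (`mkEmpty`, `push`, `get!`, `size`): each entry occupies one raw byte instead of one word-sized boxed/tagged slot.

```lean
-- Build a ByteArray of n ones; entries are stored as raw bytes,
-- not as Nat values behind pointers.
def mkConstBytes (n : Nat) : ByteArray := Id.run do
  let mut b := ByteArray.mkEmpty n
  for _ in [0:n] do
    b := b.push 1
  return b

-- Sum the bytes with unboxed UInt64 arithmetic.
def byteSum (b : ByteArray) : UInt64 := Id.run do
  let mut sum : UInt64 := 0
  for i in [0:b.size] do
    sum := sum + (b.get! i).toUInt64
  return sum
```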
Ioannis Konstantoulas (Jul 24 2023 at 14:52):
The manual led me to believe that for not too big Nats, the storage is in an 8 byte long, so a 1 billion element array of (1 : Nat) should take approximately 8 GB of RAM, plus the data structure overhead. But if the Array is a collection of pointers to values allocated elsewhere, it makes sense that the total usage will be double that, amounting to approximately 16 GB.
So, currently, to create an array of ints and perform integer or modular arithmetic on it, would it make sense to create a large ByteArray, assemble UIntXX values out of consecutive array elements, perform arithmetic, and decompose back and store into bytes?
Daniil Kisel (Jul 24 2023 at 16:04):
If all values in the array are 1 and less than 2^63 (?), they will be stored inside the tagged "pointers" and won't consume twice as much memory.
Ioannis Konstantoulas (Jul 24 2023 at 16:25):
Daniil Kisel said:
If all values in the array are 1 and less than 2^63 (?), they will be stored inside the tagged "pointers" and won't consume twice as much memory.
Well, then why are we seeing 16 GB of memory consumption in the above code, even with the compiler.extract_closed false option?
Daniil Kisel (Jul 24 2023 at 17:00):
Maybe array doubles the capacity of the underlying array? Looking at valgrind, compiler.extract_closed doesn't affect memory consumption (because the array is not modified), and while an array of size 2^20 uses +20508KB, with size 2^20 + 1 it uses +32796KB (where + means compared to allocating an array with one element).
Though lean_mk_array doesn't extend capacity, so my guess is wrong. Maybe the cause is in the allocator Lean uses? Wrong again, it uses malloc for allocations bigger than 4096.
Daniil Kisel (Jul 24 2023 at 17:24):
@Ioannis Konstantoulas have you tried the C version with (SIZE)*sizeof(size_t) instead of (SIZE)*sizeof(int)?
Ioannis Konstantoulas (Jul 24 2023 at 18:37):
Daniil Kisel said:
Ioannis Konstantoulas have you tried the C version with (SIZE)*sizeof(size_t) instead of (SIZE)*sizeof(int)?
Yes, there is no difference; I also tested a bigger array to run out of memory deliberately, and the C program above automatically used swap space and terminated successfully.
Henrik Böving (Jul 24 2023 at 18:46):
Using swap isn't exactly a choice of your program; the OS can decide to swap. However, if you allocate too much too fast, it can also decide to murder your process because its swapping can't keep up.
Last updated: Dec 20 2023 at 11:08 UTC