Zulip Chat Archive
Stream: lean4
Topic: speed tests
Matthew Ballard (Jun 15 2023 at 20:33):
I got sidetracked benchmarking Lean against jabbalaci/SpeedTests. The best result I can get is
./build/bin/Main 8.33s user 0.02s system 95% cpu 8.745 total
vs the example C code for comparison
./main 1.80s user 0.00s system 99% cpu 1.808 total
Can anyone improve on this?
Mac Malone (Jun 15 2023 at 21:15):
What is the base speed of a no-op Lean program on your machine? For example, def main : IO UInt32 := return 0
. That would provide an idea of how much time there is to save. For tests like these, Lean has a semi-expensive start-up cost that may hurt the metric.
Henrik Böving (Jun 15 2023 at 21:22):
λ time ./main
0
1
3435
438579088
./main 2.41s user 0.00s system 99% cpu 2.423 total
0
1
3435
438579088
./build/bin/jabbalaci 5.36s user 0.01s system 99% cpu 5.395 total
although its already not so pretty anymore^^
Henrik Böving (Jun 15 2023 at 21:22):
let see how far I can push it
Henrik Böving (Jun 15 2023 at 21:27):
./build/bin/jabbalaci 4.21s user 0.01s system 99% cpu 4.227 total
Henrik Böving (Jun 15 2023 at 21:29):
If i compile with gcc instead of clang that is almost C level speed: ./main 3.90s user 0.00s system 99% cpu 3.919 total
`
Henrik Böving (Jun 15 2023 at 21:45):
./build/bin/jabbalaci 3.65s user 0.01s system 99% cpu 3.668 total
beating gcc now
Henrik Böving (Jun 15 2023 at 22:17):
Okay I can't get it better anymore :( Seems like clang is out of reach for now, here is my result:
partial def cacheAux (arr : Array UInt32) (i : Nat) : Array UInt32 :=
if i == 10 then
arr
else
cacheAux (arr.push <| UInt32.ofNat <| i ^ i) (i + 1)
@[noinline]
def cache : Array UInt32 := Id.run do
let mut arr := Array.mkEmpty 10
arr := arr.push 0
cacheAux arr 1
@[inline]
def final : UInt32 := 440000000
partial def isMunchhausenAux (n total number : UInt32) : Bool :=
if n == 0 then
total == number
else
let digit := n % 10
let total := total + cache.uget digit.toUSize sorry
if total > number then
false
else
let n := n / 10
isMunchhausenAux n total number
@[inline]
def isMunchhausen (number : UInt32) : Bool :=
isMunchhausenAux number 0 number
partial def mainAux (curr : UInt32) : IO Unit := do
if curr == final then
return ()
else
if isMunchhausen curr then
IO.println curr
mainAux (curr + 1)
else
mainAux (curr + 1)
def main : IO Unit := do
mainAux 0
Henrik Böving (Jun 15 2023 at 22:32):
But looking at the cart from the README (given the fact that my run times of the C are pretty close to theirs I conjecture my Lean would be similar on their setup) it seems to me like we are in a pretty good spot, it is faster than quite a lot of the languages and very close to the true systems languages like C/C++/Rust so it appears our code generator is quite acceptable :D
Matthew Ballard (Jun 15 2023 at 23:13):
Mac said:
What is the base speed of a no-op Lean program on your machine? For example,
def main : IO UInt32 := return 0
. That would provide an idea of how much time there is to save. For tests like these, Lean has a semi-expensive start-up cost that may hurt the metric.
./build/bin/Main 0.02s user 0.02s system 9% cpu 0.455 total
Matthew Ballard (Jun 15 2023 at 23:13):
Excellent!
Matthew Ballard (Jun 15 2023 at 23:31):
Henrik Böving said:
Okay I can't get it better anymore :( Seems like clang is out of reach for now, here is my result:
partial def cacheAux (arr : Array UInt32) (i : Nat) : Array UInt32 := if i == 10 then arr else cacheAux (arr.push <| UInt32.ofNat <| i ^ i) (i + 1) @[noinline] def cache : Array UInt32 := Id.run do let mut arr := Array.mkEmpty 10 arr := arr.push 0 cacheAux arr 1 @[inline] def final : UInt32 := 440000000 partial def isMunchhausenAux (n total number : UInt32) : Bool := if n == 0 then total == number else let digit := n % 10 let total := total + cache.uget digit.toUSize sorry if total > number then false else let n := n / 10 isMunchhausenAux n total number @[inline] def isMunchhausen (number : UInt32) : Bool := isMunchhausenAux number 0 number partial def mainAux (curr : UInt32) : IO Unit := do if curr == final then return () else if isMunchhausen curr then IO.println curr mainAux (curr + 1) else mainAux (curr + 1) def main : IO Unit := do mainAux 0
This gives
./build/bin/Main 2.89s user 0.01s system 92% cpu 3.141 total
on my machine.
Matthew Ballard (Jun 15 2023 at 23:33):
Though gcc
does (slightly) better than clang
there
Mac Malone (Jun 16 2023 at 00:04):
@Henrik Böving One key thing Lean is missing is a primitive array. Currently the cache is built of boxed UInt32
and unboxed on access, which hurts performance (the allocation of the boxed cache objects is probably itself a significant chunk of time wasted).
Matthew Ballard (Jun 16 2023 at 00:11):
I was running with Nat
instead of UInt32
which is the main performance gain left on the table.
Matthew Ballard (Jun 16 2023 at 00:12):
It cut the time in half.
Mac Malone (Jun 16 2023 at 00:16):
@Matthew Ballard Yes, but unfortunately the elements of cache : Array UInt32
are still boxed.
Matthew Ballard (Jun 16 2023 at 00:24):
What values are unboxed by default?
Mac Malone (Jun 16 2023 at 00:25):
UInt8/16/32/64, USize, Float
Mac Malone (Jun 16 2023 at 00:26):
Inductive types which are essentially enums (e.g., Bool
) are also represented by unboxed numerical tag of smallest size that fits (e.g. UInt8
for Bool
)
Mac Malone (Jun 16 2023 at 00:27):
Another bit of overhead is that Lean has to check for IO errors on each print, but since this function only prints 4 times, that is likely not much.
Matthew Ballard (Jun 16 2023 at 00:42):
More comparisons on my machine: rust
./target/release/rust 1.73s user 0.01s system 86% cpu 2.019 total
and go
./main 3.21s user 0.02s system 94% cpu 3.413 total
Mac Malone (Jun 16 2023 at 00:43):
@Matthew Ballard Neat! So Lean is faster than Go on your machine.
Matthew Ballard (Jun 16 2023 at 00:47):
I was going to do java but I have to get a working runtime :lol:
Matthew Ballard (Jun 16 2023 at 17:38):
One more comment: following this discussion, it is interesting to see the difference between v1 at 93.8s and v2 at 3.5s for the Haskell code.
Max Nowak 🐉 (Jun 17 2023 at 09:03):
This is really interesting! I wonder about more benchmarks comparing Lean to C/C++/Rust.
Last updated: Dec 20 2023 at 11:08 UTC