Zulip Chat Archive

Stream: lean4

Topic: speed tests


Matthew Ballard (Jun 15 2023 at 20:33):

I got sidetracked benchmarking Lean against jabbalaci/SpeedTests. The best result I can get is

./build/bin/Main  8.33s user 0.02s system 95% cpu 8.745 total

vs the example C code for comparison

./main  1.80s user 0.00s system 99% cpu 1.808 total

Can anyone improve on this?

Mac Malone (Jun 15 2023 at 21:15):

What is the base speed of a no-op Lean program on your machine? For example, def main : IO UInt32 := return 0. That would provide an idea of how much time there is to save. For tests like these, Lean has a semi-expensive start-up cost that may hurt the metric.

Henrik Böving (Jun 15 2023 at 21:22):

λ time ./main
0
1
3435
438579088
./main  2.41s user 0.00s system 99% cpu 2.423 total
0
1
3435
438579088
./build/bin/jabbalaci  5.36s user 0.01s system 99% cpu 5.395 total

although its already not so pretty anymore^^

Henrik Böving (Jun 15 2023 at 21:22):

let see how far I can push it

Henrik Böving (Jun 15 2023 at 21:27):

./build/bin/jabbalaci  4.21s user 0.01s system 99% cpu 4.227 total

Henrik Böving (Jun 15 2023 at 21:29):

If i compile with gcc instead of clang that is almost C level speed: ./main 3.90s user 0.00s system 99% cpu 3.919 total
`

Henrik Böving (Jun 15 2023 at 21:45):

./build/bin/jabbalaci 3.65s user 0.01s system 99% cpu 3.668 total beating gcc now

Henrik Böving (Jun 15 2023 at 22:17):

Okay I can't get it better anymore :( Seems like clang is out of reach for now, here is my result:

partial def cacheAux (arr : Array UInt32) (i : Nat) : Array UInt32 :=
  if i == 10 then
    arr
  else
    cacheAux (arr.push <| UInt32.ofNat <| i ^ i) (i + 1)

@[noinline]
def cache : Array UInt32 := Id.run do
  let mut arr := Array.mkEmpty 10
  arr := arr.push 0
  cacheAux arr 1

@[inline]
def final : UInt32 := 440000000

partial def isMunchhausenAux (n total number : UInt32) : Bool :=
  if n == 0 then
    total == number
  else
    let digit := n % 10
    let total := total + cache.uget digit.toUSize sorry
    if total > number then
      false
    else
      let n := n / 10
      isMunchhausenAux n total number

@[inline]
def isMunchhausen (number : UInt32) : Bool :=
  isMunchhausenAux number 0 number

partial def mainAux (curr : UInt32) : IO Unit := do
  if curr == final then
    return ()
  else
    if isMunchhausen curr then
      IO.println curr
      mainAux (curr + 1)
    else
      mainAux (curr + 1)

def main : IO Unit := do
  mainAux 0

Henrik Böving (Jun 15 2023 at 22:32):

But looking at the cart from the README (given the fact that my run times of the C are pretty close to theirs I conjecture my Lean would be similar on their setup) it seems to me like we are in a pretty good spot, it is faster than quite a lot of the languages and very close to the true systems languages like C/C++/Rust so it appears our code generator is quite acceptable :D

Matthew Ballard (Jun 15 2023 at 23:13):

Mac said:

What is the base speed of a no-op Lean program on your machine? For example, def main : IO UInt32 := return 0. That would provide an idea of how much time there is to save. For tests like these, Lean has a semi-expensive start-up cost that may hurt the metric.

./build/bin/Main  0.02s user 0.02s system 9% cpu 0.455 total

Matthew Ballard (Jun 15 2023 at 23:13):

Excellent!

Matthew Ballard (Jun 15 2023 at 23:31):

Henrik Böving said:

Okay I can't get it better anymore :( Seems like clang is out of reach for now, here is my result:

partial def cacheAux (arr : Array UInt32) (i : Nat) : Array UInt32 :=
  if i == 10 then
    arr
  else
    cacheAux (arr.push <| UInt32.ofNat <| i ^ i) (i + 1)

@[noinline]
def cache : Array UInt32 := Id.run do
  let mut arr := Array.mkEmpty 10
  arr := arr.push 0
  cacheAux arr 1

@[inline]
def final : UInt32 := 440000000

partial def isMunchhausenAux (n total number : UInt32) : Bool :=
  if n == 0 then
    total == number
  else
    let digit := n % 10
    let total := total + cache.uget digit.toUSize sorry
    if total > number then
      false
    else
      let n := n / 10
      isMunchhausenAux n total number

@[inline]
def isMunchhausen (number : UInt32) : Bool :=
  isMunchhausenAux number 0 number

partial def mainAux (curr : UInt32) : IO Unit := do
  if curr == final then
    return ()
  else
    if isMunchhausen curr then
      IO.println curr
      mainAux (curr + 1)
    else
      mainAux (curr + 1)

def main : IO Unit := do
  mainAux 0

This gives

./build/bin/Main  2.89s user 0.01s system 92% cpu 3.141 total

on my machine.

Matthew Ballard (Jun 15 2023 at 23:33):

Though gcc does (slightly) better than clang there

Mac Malone (Jun 16 2023 at 00:04):

@Henrik Böving One key thing Lean is missing is a primitive array. Currently the cache is built of boxed UInt32 and unboxed on access, which hurts performance (the allocation of the boxed cache objects is probably itself a significant chunk of time wasted).

Matthew Ballard (Jun 16 2023 at 00:11):

I was running with Nat instead of UInt32 which is the main performance gain left on the table.

Matthew Ballard (Jun 16 2023 at 00:12):

It cut the time in half.

Mac Malone (Jun 16 2023 at 00:16):

@Matthew Ballard Yes, but unfortunately the elements of cache : Array UInt32 are still boxed.

Matthew Ballard (Jun 16 2023 at 00:24):

What values are unboxed by default?

Mac Malone (Jun 16 2023 at 00:25):

UInt8/16/32/64, USize, Float

Mac Malone (Jun 16 2023 at 00:26):

Inductive types which are essentially enums (e.g., Bool) are also represented by unboxed numerical tag of smallest size that fits (e.g. UInt8 for Bool)

Mac Malone (Jun 16 2023 at 00:27):

Another bit of overhead is that Lean has to check for IO errors on each print, but since this function only prints 4 times, that is likely not much.

Matthew Ballard (Jun 16 2023 at 00:42):

More comparisons on my machine: rust

./target/release/rust  1.73s user 0.01s system 86% cpu 2.019 total

and go

./main  3.21s user 0.02s system 94% cpu 3.413 total

Mac Malone (Jun 16 2023 at 00:43):

@Matthew Ballard Neat! So Lean is faster than Go on your machine.

Matthew Ballard (Jun 16 2023 at 00:47):

I was going to do java but I have to get a working runtime :lol:

Matthew Ballard (Jun 16 2023 at 17:38):

One more comment: following this discussion, it is interesting to see the difference between v1 at 93.8s and v2 at 3.5s for the Haskell code.

Max Nowak 🐉 (Jun 17 2023 at 09:03):

This is really interesting! I wonder about more benchmarks comparing Lean to C/C++/Rust.


Last updated: Dec 20 2023 at 11:08 UTC