Zulip Chat Archive
Stream: lean4
Topic: Comparing `Lean.HashMap` and `Std.HashMap` performance
Jovan Gerbscheid (Sep 05 2024 at 17:08):
In version 4.11.0, the new Std.HashMap
was introduced in Lean. I decided to compare the performance to Lean.HashMap
for constructing a HashMap
, and got some surprising results. I would have expected to see similar performance, but I got very different performance, and the performance was also very different in 4.12.0-rc1 compared to 4.11.0. The by far fastest one out of the 4 combinations was Lean.HashMap
in 4.11.0, and the next fastest was Std.HashMap
in 4.12.0-rc1. Can anyone explain this enormous difference?
Here are the results (which I got from Lean 4 Web):
import Lean
def foo (x : Nat) : IO Nat := do
let mut m : Std.HashMap Nat Nat := {}
for i in [:x] do
m := m.insert i x
return m.size
def foo' (x : Nat) : IO Nat := do
let mut m : Lean.HashMap Nat Nat := {}
for i in [:x] do
m := m.insert i x
return m.size
/-
4.11.0: 511ms
4.12.0-rc1: 185ms
-/
#eval discard <| timeit "" do foo 100000
/-
4.11.0: 37ms
4.12.0-rc1: 292ms
-/
#eval discard <| timeit "" do foo' 100000
def beatit (fn : IO α) : IO Unit := do
let b0 ← IO.getNumHeartbeats
let _ ← fn
let b1 ← IO.getNumHeartbeats
IO.print s! "{b1-b0} beats"
/-
4.11.0: 7219058 beats
4.12.0-rc1: 2500354 beats
-/
#eval beatit do foo 100000
/-
4.11.0: 100155 beats
4.12.0-rc1: 1583409 beats
-/
#eval beatit do foo' 100000
Henrik Böving (Sep 05 2024 at 17:58):
With a setup like this you are essentially measuring the performance of the interpreter for the for loop implementation, not the hashmap itself. If you run this code:
import Lean
@[noinline]
def foo (x : Nat) : IO Nat := do
let mut m : Std.HashMap Nat Nat := {}
for i in [:x] do
m := m.insert i x
return m.size
@[noinline]
def foo' (x : Nat) : IO Nat := do
let mut m : Lean.HashMap Nat Nat := {}
for i in [:x] do
m := m.insert i x
return m.size
def beatit (x : String) (fn : IO α) : IO Unit := do
IO.print x
let b0 ← IO.getNumHeartbeats
let _ ← fn
let b1 ← IO.getNumHeartbeats
IO.println s! "{b1-b0} beats"
def main : IO Unit := do
let size := 1000000
discard <| timeit "new" do foo size
discard <| timeit "old" do foo' size
beatit "new " do foo size
beatit "old " do foo' size
as a compiled binary I measure:
new 45.3ms
old 40.1ms
new 1000008 beats
old 1000025 beats
on 4.11 and on 4.12:
new 41.5ms
old 38.9ms
new 1000008 beats
old 1000025 beats
So there is barely an allocation difference and the runtime is also quite tight, maybe even within noise
Markus Himmel (Sep 05 2024 at 18:01):
Benchmarks executed in the editor like this are very sensitive with regard to imports, because if you happen to import a module which uses the exact specializations you are using, then you will run precompiled code, and otherwise you will run interpreted code. Interpreted code is orders of magnitude slower. You can inspect the generated IR with set_option compiler.ir.result true
, and if you see something like Std.DHashMap.Internal.AssocList.replace._at.Lean.Elab.Tactic.BVDecide.LRAT.trim.M.registerIdMap._spec_6
, then you are running precompiled code specialized for an imported module (in this case some implementation detail of the new bv_decide
tactic).
A better way to benchmark would be compiling with lake build
and running the resulting executable. In this case, there isn't any difference across toolchains and hash map versions for what you're testing.
Jovan Gerbscheid (Sep 05 2024 at 19:26):
Thank for the explanation. I'm working on a tactic for Mathlib, and the performance of HashMap.insert is quite important for it. However, I understand that code in mathlib doesn't get compiled to a binary, and that there is no workaround for this.
Jovan Gerbscheid (Sep 05 2024 at 19:27):
And how do I easily compile and run a lean file for a test like this? (given that I'm working in a copy of mathlib where files don't get compiled)
Last updated: May 02 2025 at 03:31 UTC