Zulip Chat Archive

Stream: lean4

Topic: Comparing `Lean.HashMap` and `Std.HashMap` performance


Jovan Gerbscheid (Sep 05 2024 at 17:08):

In version 4.11.0, the new Std.HashMap was introduced in Lean. I decided to compare the performance to Lean.HashMap for constructing a HashMap, and got some surprising results. I would have expected to see similar performance, but I got very different performance, and the performance was also very different in 4.12.0-rc1 compared to 4.11.0. The by far fastest one out of the 4 combinations was Lean.HashMap in 4.11.0, and the next fastest was Std.HashMap in 4.12.0-rc1. Can anyone explain this enormous difference?

Here are the results (which I got from Lean 4 Web):

import Lean

def foo (x : Nat) : IO Nat := do
  let mut m : Std.HashMap Nat Nat := {}
  for i in [:x] do
    m := m.insert i x
  return m.size

def foo' (x : Nat) : IO Nat := do
  let mut m : Lean.HashMap Nat Nat := {}
  for i in [:x] do
    m := m.insert i x
  return m.size

/-
4.11.0:     511ms
4.12.0-rc1: 185ms
-/
#eval discard <| timeit "" do foo  100000
/-
4.11.0:      37ms
4.12.0-rc1: 292ms
-/
#eval discard <| timeit "" do foo' 100000



def beatit (fn : IO α) : IO Unit := do
  let b0  IO.getNumHeartbeats
  let _  fn
  let b1  IO.getNumHeartbeats
  IO.print s! "{b1-b0} beats"

/-
4.11.0:     7219058 beats
4.12.0-rc1: 2500354 beats
-/
#eval beatit do foo  100000
/-
4.11.0:      100155 beats
4.12.0-rc1: 1583409 beats
-/
#eval beatit do foo' 100000

Henrik Böving (Sep 05 2024 at 17:58):

With a setup like this you are essentially measuring the performance of the interpreter for the for loop implementation, not the hashmap itself. If you run this code:

import Lean

@[noinline]
def foo (x : Nat) : IO Nat := do
  let mut m : Std.HashMap Nat Nat := {}
  for i in [:x] do
    m := m.insert i x
  return m.size

@[noinline]
def foo' (x : Nat) : IO Nat := do
  let mut m : Lean.HashMap Nat Nat := {}
  for i in [:x] do
    m := m.insert i x
  return m.size

def beatit (x : String) (fn : IO α) : IO Unit := do
  IO.print x
  let b0  IO.getNumHeartbeats
  let _  fn
  let b1  IO.getNumHeartbeats
  IO.println s! "{b1-b0} beats"

def main : IO Unit := do
  let size := 1000000
  discard <| timeit "new" do foo  size
  discard <| timeit "old" do foo' size
  beatit "new " do foo  size
  beatit "old " do foo' size

as a compiled binary I measure:

new 45.3ms
old 40.1ms
new 1000008 beats
old 1000025 beats

on 4.11 and on 4.12:

new 41.5ms
old 38.9ms
new 1000008 beats
old 1000025 beats

So there is barely an allocation difference and the runtime is also quite tight, maybe even within noise

Markus Himmel (Sep 05 2024 at 18:01):

Benchmarks executed in the editor like this are very sensitive with regard to imports, because if you happen to import a module which uses the exact specializations you are using, then you will run precompiled code, and otherwise you will run interpreted code. Interpreted code is orders of magnitude slower. You can inspect the generated IR with set_option compiler.ir.result true, and if you see something like Std.DHashMap.Internal.AssocList.replace._at.Lean.Elab.Tactic.BVDecide.LRAT.trim.M.registerIdMap._spec_6, then you are running precompiled code specialized for an imported module (in this case some implementation detail of the new bv_decide tactic).

A better way to benchmark would be compiling with lake build and running the resulting executable. In this case, there isn't any difference across toolchains and hash map versions for what you're testing.

Jovan Gerbscheid (Sep 05 2024 at 19:26):

Thank for the explanation. I'm working on a tactic for Mathlib, and the performance of HashMap.insert is quite important for it. However, I understand that code in mathlib doesn't get compiled to a binary, and that there is no workaround for this.

Jovan Gerbscheid (Sep 05 2024 at 19:27):

And how do I easily compile and run a lean file for a test like this? (given that I'm working in a copy of mathlib where files don't get compiled)


Last updated: May 02 2025 at 03:31 UTC