Zulip Chat Archive
Stream: lean4
Topic: Efficient hashmap counter from list
Jason Rute (Feb 11 2024 at 13:15):
Let's say I want to make an efficient hashmap counter from a list. Is this good, or am I copying the hashmap at each iteration? It isn't clear to me how fold or tail recursion work with change-in-place.
def counterFromList (features : List String) : Std.HashMap String Nat :=
features.foldl (init := Std.mkHashMap) fun hshMap str =>
let cnt? := hshMap.find? str
let newCnt := match cnt? with
| none => 1
| some cnt => cnt + 1
hshMap.insert str newCnt
Jason Rute (Feb 11 2024 at 13:27):
(Also, if I'm being dumb and I should use a persistent map instead, feel free to tell me that also.)
Marc Huisinga (Feb 11 2024 at 15:38):
You can test this as follows:
import Std
@[noinline]
def insert (map : Std.HashMap String Nat) (x : String) (y : Nat) :=
let map := dbgTraceIfShared "" map
map.insert x y
def counterFromList (features : List String) : Std.HashMap String Nat :=
features.foldl (init := Std.mkHashMap) fun hshMap str =>
let cnt? := hshMap.find? str
let newCnt := match cnt? with
| none => 1
| some cnt => cnt + 1
insert hshMap str newCnt
-- output: shared RC
#eval (counterFromList ["asdf", "foo", "bar"]).findD "asdf" 0
The hash map is shared in the first iteration but not in the following iterations. The empty hash map is probably shared in the first iteration because closed term extraction persists it and so it needs to be copied once instead of mutated.
Marc Huisinga (Feb 11 2024 at 15:46):
We need the @[noinline]
trick because the old compiler can do some strange reorderings with dbgTraceIfShared
operations. For example, take the same example with Lean.HashMap
instead of Std.HashMap
:
import Lean
def counterFromList (features : List String) : Lean.HashMap String Nat :=
features.foldl (init := ∅) fun hshMap str =>
let hshMap := dbgTraceIfShared "1" hshMap
let cnt? := hshMap.find? str
let hshMap := dbgTraceIfShared "2" hshMap
let newCnt := match cnt? with
| none => 1
| some cnt => cnt + 1
let hshMap := dbgTraceIfShared "3" hshMap
hshMap.insert str newCnt
-- output:
-- shared RC 2
-- shared RC 3
-- shared RC 2
-- shared RC 3
-- shared RC 2
-- shared RC 3
#eval (counterFromList ["asdf", "foo", "bar"]).findD "asdf" 0
Here, the old compiler incorrectly moves the find?
call in-between the third dbgTraceIfShared
and the insert
call, effectively yielding the following code where hshMap1
is shared every iteration between the second dbgTraceIfShared
operation and find?
:
def counterFromList (features : List String) : Lean.HashMap String Nat :=
features.foldl (init := ∅) fun hshMap str =>
let hshMap1 := dbgTraceIfShared "1" hshMap
let hshMap2 := dbgTraceIfShared "2" hshMap1
let hshMap3 := dbgTraceIfShared "3" hshMap2
let cnt? := hshMap1.find? str
let newCnt := match cnt? with
| none => 1
| some cnt => cnt + 1
hshMap3.insert str newCnt
Last updated: May 02 2025 at 03:31 UTC