Zulip Chat Archive

Stream: lean4

Topic: Efficient hashmap counter from list


Jason Rute (Feb 11 2024 at 13:15):

Let's say I want to make an efficient hashmap counter from a list. Is this good, or am I copying the hashmap at each iteration? It isn't clear to me how fold or tail recursion work with change-in-place.

def counterFromList (features : List String) : Std.HashMap String Nat :=
  features.foldl (init := Std.mkHashMap) fun hshMap str =>
    let cnt? := hshMap.find? str
    let newCnt := match cnt? with
    | none => 1
    | some cnt => cnt + 1
    hshMap.insert str newCnt

Jason Rute (Feb 11 2024 at 13:27):

(Also, if I'm being dumb and I should use a persistent map instead, feel free to tell me that also.)

Marc Huisinga (Feb 11 2024 at 15:38):

You can test this as follows:

import Std

@[noinline]
def insert (map : Std.HashMap String Nat) (x : String) (y : Nat) :=
  let map := dbgTraceIfShared "" map
  map.insert x y

def counterFromList (features : List String) : Std.HashMap String Nat :=
  features.foldl (init := Std.mkHashMap) fun hshMap str =>
    let cnt? := hshMap.find? str
    let newCnt := match cnt? with
      | none => 1
      | some cnt => cnt + 1
    insert hshMap str newCnt

-- output: shared RC
#eval (counterFromList ["asdf", "foo", "bar"]).findD "asdf" 0

The hash map is shared in the first iteration but not in the following iterations. The empty hash map is probably shared in the first iteration because closed term extraction persists it and so it needs to be copied once instead of mutated.

Marc Huisinga (Feb 11 2024 at 15:46):

We need the @[noinline] trick because the old compiler can do some strange reorderings with dbgTraceIfShared operations. For example, take the same example with Lean.HashMap instead of Std.HashMap:

import Lean

def counterFromList (features : List String) : Lean.HashMap String Nat :=
  features.foldl (init := ) fun hshMap str =>
    let hshMap := dbgTraceIfShared "1" hshMap
    let cnt? := hshMap.find? str
    let hshMap := dbgTraceIfShared "2" hshMap
    let newCnt := match cnt? with
      | none => 1
      | some cnt => cnt + 1
    let hshMap := dbgTraceIfShared "3" hshMap
    hshMap.insert str newCnt

-- output:
-- shared RC 2
-- shared RC 3
-- shared RC 2
-- shared RC 3
-- shared RC 2
-- shared RC 3
#eval (counterFromList ["asdf", "foo", "bar"]).findD "asdf" 0

Here, the old compiler incorrectly moves the find? call in-between the third dbgTraceIfShared and the insert call, effectively yielding the following code where hshMap1 is shared every iteration between the second dbgTraceIfShared operation and find?:

def counterFromList (features : List String) : Lean.HashMap String Nat :=
  features.foldl (init := ) fun hshMap str =>
    let hshMap1 := dbgTraceIfShared "1" hshMap
    let hshMap2 := dbgTraceIfShared "2" hshMap1
    let hshMap3 := dbgTraceIfShared "3" hshMap2
    let cnt? := hshMap1.find? str
    let newCnt := match cnt? with
      | none => 1
      | some cnt => cnt + 1
    hshMap3.insert str newCnt

Last updated: May 02 2025 at 03:31 UTC