Zulip Chat Archive

Stream: lean4

Topic: Hashable Expr diverges from BEq Expr

Thomas Murrills (May 02 2024 at 23:05):

I noticed that hash (e : Expr) conflates .lam and .forallE, but e₁ == e₂ doesn’t.

I bring this up because HashMap relies on BEq and Hashable, but I’m not sure if this is actually of any consequence. After all, a lambda with exactly the arguments as a forallE is not impossible, but not very common; plus, the HashMap operations seem to behave alright with respect to collisions.

Just curious, is there any specific reason we want to consider lambdas and foralls the same when hashing?

Adam Topaz (May 02 2024 at 23:17):

I’m surprised that the hash doesn’t mix the name of the constructor. Is this the case for automatically derived instances for other inductive types?

Eric Wieser (May 02 2024 at 23:20):

This is of course the right way around; as long as a == b -> hash a = hash b, then everything is lawful :)

Adam Topaz (May 02 2024 at 23:23):

deriving Hashable seems to have issues as well, but not quite the same:

inductive Foo where
  | a : Nat → Foo
  | b : Nat → Foo
deriving Hashable

#eval hash (Foo.a 0) -- 3866779316627607737
#eval hash (Foo.b 0) -- 7999773623304365860

inductive Bar where
  | a : Nat → Bar
  | b : Nat → Bar
deriving Hashable

#eval hash (Bar.a 0) -- 3866779316627607737
#eval hash (Bar.b 0) -- 7999773623304365860

Kyle Miller (May 02 2024 at 23:33):

@Adam Topaz Hashes don't need to be globally uniquish, just uniquish within the same type. The use case is being able to do a quick check to avoid doing a more expensive == check.

Adam Topaz (May 02 2024 at 23:34):

Yeah I understand.

Adam Topaz (May 02 2024 at 23:35):

I just found it surprising that the name of the type seems to have no bearing on the hash.

Kyle Miller (May 02 2024 at 23:37):

The name of the type doesn't matter though, for the use case

Eric Wieser (May 02 2024 at 23:40):

Presumably the idea being that if you really did have a mixture of multiple types, you'd wrap them in a Sum type and could then use the hash of its constructor indices anyway.

Adam Topaz (May 02 2024 at 23:41):

It seems that the index of the constructor is what's used, not the name, FWIW.

Adam Topaz (May 02 2024 at 23:42):

Oh yeah, I just saw that sneaky work "indices" in Eric's message :)

Adam Topaz (May 02 2024 at 23:44):

Anyway, this is irrelevant to the original question concerning Expr.

Kyle Miller (May 02 2024 at 23:57):

Just to check, indeed there is a collision in something reasonable: a motive and the type of a recursor.

open Lean Elab in
elab "#hashes " t:term : command => Command.runTermElabM fun _ => do
  let e ← Term.elabTermAndSynthesize t none
  e.forEach fun node => do
    logInfo m!"{hash node}: {node}"

set_option pp.funBinderTypes true
#hashes id <| fun b => Bool.recOn (motive := fun _ => Bool) b true false
/-
2384416661: id fun (b : Bool) ↦ Bool.recOn b true false
2927525288: id
1812194161: @id
3551570259: Bool → Bool
183048209: Bool
2428420263: fun (b : Bool) ↦ Bool.recOn b true false
2175286235: Bool.recOn #0 true false
4279372883: Bool.recOn #0 true
1771607792: Bool.recOn #0
1088603745: Bool.recOn
149233892: @Bool.recOn
3551570259: fun (x : Bool) ↦ Bool
351589370: #0
3057640288: true
1995999671: false
-/

Notice 3551570259 for both fun (x : Bool) ↦ Bool and Bool → Bool

Kyle Miller (May 03 2024 at 10:18):

Created an issue: lean4#4060

Last updated: May 02 2025 at 03:31 UTC