Zulip Chat Archive

Stream: batteries

Topic: Extensional equivalence for `HashMap`

Timo Carlin-Burns (Feb 15 2024 at 22:20):

Currently, HashMap has a very weak notion of equality, so that even empty hashmaps with different capacities are unequal. Will HashMap eventually be defined as a quotient? One complication I can imagine is choosing the precise equivalence relation for the quotient:

If two HashMaps are equal whenever they can't be distinguished through HashMap.find?, then HashMap.toList, which exposes the internal order, is not easily definable.
- Even if we switched to an IndexMap implementation which has mostly predictable internal order, O(1) removal would still cause internal order to be exposed.
- Alternatively, I suppose we could do something similar to Multiset and have a toList which returns something quotiented by List.Perm. If this is done in an ergonomic way, it could be a nice improvement over most languages where relying on internal HashMap order is a correctness and forward-compatibility issue, but not caught by the type system.
If two HashMaps are equal whenever they can't be distinguished by toList, that might be surprising, because it doesn't reflect the usual meaning of map1 == map2 in languages like Rust.

One possible benefit of using a quotient type is that we could soundly store a random seed in the hashmap to mitigate HashDOS attacks without having to move HashMap.empty into the IO monad.

James Gallicchio (Feb 15 2024 at 23:56):

Do you have a reason for wanting to discuss hashmap equality? I sorta suspect that is a bit of a code smell.

For any map type from K to V we can give an equivalence on the model K -> Option V (find?), maybe call it find?_equiv even. It is decidable for hashmaps, even.

Timo Carlin-Burns (Feb 15 2024 at 23:58):

In implementing the simp lemma find?_empty : (∅ : HashMap α β).find? k = none, I realized that this lemma has misleading generality, because it doesn't actually characterize all empty hashmaps (i.e. those with size = 0), but only empty hashmaps with the default initial capacity.

James Gallicchio (Feb 15 2024 at 23:59):

Sure, but that's not an issue with equality, just with how the simp lemma was written, no?

James Gallicchio (Feb 16 2024 at 00:02):

(in particular you can make this simp lemma work for any initial capacity by explicitly providing that argument)

Timo Carlin-Burns (Feb 16 2024 at 00:04):

I admit that I don't personally have a use case (besides the random seeding idea) for talking about equality of hashmaps. But in most other parts of Lean I've seen (mathlib), we make sure to have a nice notion of equality on our types. I suppose you're saying for computational use cases, this is not important. Does that mean for HashMap we might end up with BEq reflecting the nice notion of equality and Eq reflecting the weak one?

Timo Carlin-Burns (Feb 16 2024 at 00:06):

Another example: Wouldn't it be nice if one could write a simp lemma insert_insert : (m.insert k v).insert k v' = m.insert k v'?

Timo Carlin-Burns (Feb 16 2024 at 00:10):

Actually I supposed that equality is true intensionally in any reasonable HashMap implementation... But insert_remove_eq_remove wouldn't be (because the insertion could have triggered growing)

James Gallicchio (Feb 16 2024 at 00:13):

That lemma is just a very strong claim in general. I think in LeanColls my goal has been to express those lemmas in terms of the models being equivalent, never using equality directly on the collection, because it vastly simplifies the implementor's job.

I think there is a way to then quotient by the equivalence in a way that is generic over the collection type, where then those lemmas become equalities, but again I'm unsure of the benefit

Timo Carlin-Burns (Feb 16 2024 at 00:19):

Looking forward to checking out LeanColls in more detail then! Does LeanColls make it easy to verify algorithms in terms of models of the data structures they use rather than their implementations? It seems like we're probably missing relevant tooling (like grw) for working with anything but equalities.

James Gallicchio (Feb 16 2024 at 08:08):

Timo Carlin-Burns said:

Does LeanColls make it easy to verify algorithms in terms of models of the data structures they use rather than their implementations?

oh not at all, it's a shell of a library, the only thing it accomplishes is some abstraction over the existing data structures in std...

James Gallicchio (Feb 16 2024 at 08:12):

a lot of algorithms that operate over data structures do have some equivalence relation they want to quotient the data by. I just am not sure that equivalence relation is always the same for a given data structure, which is what gives me pause here.

like maybe some algorithm does want to expose the internal order of the hashmaps, in which case having a version that is not quotiented is essential. or maybe you want to treat a red-black map as an ordered (partial,finite) map rather than a regular (partial,finite) map.

James Gallicchio (Feb 16 2024 at 08:19):

grw is going to be very useful precisely because quotients can be awkward to verify with...

Last updated: May 02 2025 at 03:31 UTC