Zulip Chat Archive

Stream: lean4

Topic: Design principles behind Std.HashMap


pandaman (Sep 09 2024 at 10:18):

The 4.11.0 release note states:

Lean 4.11 features a brand-new implementation of hash tables, concretely usable as both dictionaries (Std.HashMap) and sets (Std.HashSet). This new implementation has been designed from the ground up for both efficiency and correctness, and it features an extensible, carefully-engineered framework for specifying and proving properties of its API.

I'm curious about the design principle and challenges faced during the development of the new hash map implementation. Were there any trade off between speed, verification of correctness, and usability of API? I would love to hear more about the thought process behind these decisions, especially how you set up a framework for specification and verification purposes.

Thanks!

Kim Morrison (Sep 09 2024 at 10:23):

Perhaps you would find the implementation notes, in Std.Data.DHashMap.Internal.Defs helpful!

Kim Morrison (Sep 09 2024 at 10:26):

Very briefly: Markus did a lot of engineering so that all operations have a corresponding operation on a List-based reference model, and there is infrastructure in place to uniformly transfer results from the reference model to the runtime model. (This was a lot of work!) But the result is that there are essentially no tradeoffs of performance or user facing API in exchange for verifiability. The cost to this was the work (done now), and that whenever we want to add more operations it has to be done inside this reference model framework, which is slightly cumbersome, but doesn't pose any particular challenge.

pandaman (Sep 09 2024 at 10:29):

Hi. Are you referring to this comment? This indeed looks good read. Thank you for the summary and pointer!

Henrik Böving (Sep 09 2024 at 11:13):

You can see the comment in action here: https://github.com/leanprover/lean4/pull/5244, even though the PR is large it is mostly copying theorems around so that all variations of the HashMap type have the theorems, the actual implementation work is rather miniscule.


Last updated: May 02 2025 at 03:31 UTC