This is an internal implementation file of the hash map. Users of the hash map should not rely on the contents of this file.
File contents: Definition of all operations on Raw₀
, definition of WFImp
.
Hash map implementation notes #
This is a simple separate-chaining hash table. The data of the hash map (DHashMap.Raw
) consists of
a cached size and an array of buckets, where each bucket is an AssocList α β
(which is the same as
a List ((a : α) × β a)
but with one less level of indirection). The number of buckets is always a
power of two. The hash map doubles its size upon inserting an element such that the number of
elements is more than 75% of the number of buckets.
Because we need DHashMap.Raw
to be nested-inductive-safe, we cannot bundle the fact that there is
at least one bucket. We there for define a type Raw₀
which is just a Raw
together with the fact
that the size is positive. Almost all internal work on the hash map happens on Raw₀
so that we do
not have to perform size check all of the time. The operations defined on Raw
perform this size
check once to transform the Raw
into a Raw₀
and then operate on that (therefore, each operation
on Raw
will only do a single size check). The operations defined on DHashMap
conclude that the
size is positive from the well-formedness predicate, use that to build a Raw₀
and then operate on
that. So the operations on DHashMap
are exactly the same as the operations on Raw₀
and the
operations on Raw
are the same as the operations on Raw₀
as long as we can prove that the size
check will succeed.
The operations on Raw₀
are defined in this file. They are built for performance. The IR is
hand-checked to ensure that there are no unnecessary allocations, inlining problems or linearity
bugs. Note that for many operations the IR only becomes meaningful once it has been specialized with
concrete Hashable
and BEq
instances as is the case in actual applications.
Of the internal files, only Internal.Index
, Internal.Defs
and Internal.AssocList.Basic
contain
any executable code. The rest of the files set up the verification framework which is described in
the remainder of this text.
The basic idea is to translate statements about hash maps into statements about lists using the
function toListModel
defined in this file. The function toListModel
simply concatenates all
buckets into a List ((a : α) × β a)
. The file Internal.List.Associative
then contains a complete
verification of associative lists. The theorems relating the operations on Raw₀
to the operations
on List ((a : α) × β a)
are located in Internal.WF
and have such names as
contains_eq_containsKey
or toListModel_insert
. In the file Internal.RawLemmas
we then state
all of the lemmas for Raw₀
and use a tactic to apply the results from Internal.WF
to reduce to
the results from Internal.List.Associative
. From there we can state the actual lemmas for
DHashMap.Raw
, DHashMap
, HashMap.Raw
, HashMap
, HashSet.Raw
and HashSet
in the
non-internal *.Lemmas
files and immediately reduce them to the results about Raw₀
in
Internal.RawLemmas
.
There are some additional indirections to this high-level strategy. First, we have an additional
layer of so-called "model functions" on Raw₀
, defined in the file Internal.Model
. These have the
same signature as their counterparts defined in this file, but may have a slightly simpler
implementation. For example, Raw₀.erase
has a linearity optimization which is not present in the
model function Raw₀.eraseₘ
. We prove that the functions are equal to their model implementations
in Internal.Model
, then verify the model implementation. This makes the verification more robust
against implementation details, future performance improvements, etc.
Second, reducing hash maps to lists only works if the hash map is well-formed. Our internal
well-formedness predicate is called Raw.WFImp
(defined in this file) and states that (a) each
bucket only contains items with the correct hash, (b) the cached size is equal to the actual number
of elements in the buckets, and (c) after concatenating the buckets the keys in the resulting list
are pairwise not equal. The third condition is a priori stronger than the slightly more natural
condition that the keys in each bucket are pairwise not equal, but they are equivalent in
well-behaved cases and our condition works better. The user-visible well-formedness predicate
Raw.WF
is equivalent to WFImp
, as is shown in Internal.WF
. The user-visible version exists to
postpone the proofs that all operations preserve well-formedness to a later file so that it is
possible to import DHashMap.Basic
without pulling in all of the proof machinery.
The framework works very hard to make adding and verifying new operations very easy and
maintainable. To this end, we provide theorems apply_bucket
, apply_bucket_with_proof
,
toListModel_updateBucket
and toListModel_updateAllBuckets
, which do all of the heavy lifting in
a general way. The verification for each actual operation in Internal.WF
is then extremely
straightward, requiring only to plug in some results about lists. See for example the functions
containsₘ_eq_containsKey
and the section on eraseₘ
for prototypical examples of this technique.
Here is a summary of the steps required to add and verify a new operation:
- Write the executable implementation
- Implement the operation
AssocList.operation
on associative lists inInternal.AssocList.Basic
- Implement the operation
Raw₀.operation
onRaw₀
inInternal.Defs
- Implement the operation
Raw.operation
/DHashMap.operation
onDHashMap.Raw
andDHashMap
inDHashMap.Basic
. If your operation modifies the hash map, this will involve adding a new constructoroperation₀
toRaw.WF
. In that case, don't forget to provide a well-formedness lemmaRaw.WF.operation
(which differs fromRaw.WF.operation₀
in that it is about the operation onRaw
, not onRaw₀
(remember, these differ by a bounds check)). - Implement the operation
Raw.operation
/HashMap.operation
onHashMap.Raw
andHashMap
inHashMap.Basic
. - Implement the operation
Raw.operation
/HashSet.operation
onHashSet.Raw
andHashSet
inHashSet.Basic
- Write the list implementation
- Implement the operation
List.operation
on lists inInternal.List.Associative
- Connect the implementation on lists and associative lists in
Internal.AssocList.Lemmas
via a lemmaAssocList.operation_eq
.
- Write the model implementation
- Write the model implementation
Raw₀.operationₘ
inInternal.Model
- Prove that the model implementation is equal to the actual implementation in
Internal.Model
via a lemmaoperation_eq_operationₘ
.
- Verify the model implementation
- In
Internal.WF
, proveoperationₘ_eq_List.operation
(for access operations) orwfImp_operationₘ
andtoListModel_operationₘ
- Immediately deduce the corresponding results
operation_eq_List.operation
orwfImp_operation
/toListModel_operation
by combining the results from steps 3 and 4. - If you added a new constructor to
Raw.WF
earlier, fix up the proof ofRaw.WF.out
.
- Connect
Raw
andRaw₀
- Write
Raw.operation_eq
andRaw.operation_val
inInternal.Raw
.
- Prove
Raw₀
versions of user-facing lemmas
- State all of the user-facing lemmas that you want in
Internal.RawLemmas
. This will generally involve connecting the operation to ALL existing operations or deciding that there is nothing to be said about a certain pair. - Prove the corresponding results about lists in
List.Associative
- Use the tactic provided in
Internal.RawLemmas
to deduce theRaw₀
results from the list results. You will need to hook some of the results you proved in step 4 into the tactic. You might also have to prove that your list operation is invariant under permutation and add that to the tactic.
- State and prove the user-facing lemmas
- Restate all of your lemmas for
DHashMap.Raw
inDHashMap.RawLemmas
and prove them using the provided tactic after hooking in youroperation_eq
andoperation_val
from step 5. - Restate all of your lemmas for
DHashMap
inDHashMap.Lemmas
and prove them by reducing toRaw₀
. - Restate all of your lemmas for
HashMap.Raw
inHashMap.RawLemmas
and prove them by reducing toDHashMap.Raw
. - Restate all of your lemmas for
HashMap
inHashMap.Lemmas
and prove them by reducing toDHashMap
. - Restate all of your lemmas for
HashSet.Raw
inHashSet.RawLemmas
and prove them by reducing toHashMap.Raw
. - Restate all of your lemmas for
HashSet
inHashSet.Lemmas
and prove them by reducing toHashMap
.
This sounds like a lot of work (and it is if you have to add a lot of user-facing lemmas), but the framework is set up in such a way that each step is really easy and the proofs are all really short and clean.
Internal implementation detail of the hash map
Equations
- Std.DHashMap.Internal.toListModel buckets = buckets.toList.flatMap Std.DHashMap.Internal.AssocList.toList
Instances For
Internal implementation detail of the hash map
Equations
- Std.DHashMap.Internal.computeSize buckets = Array.foldl (fun (d : Nat) (b : Std.DHashMap.Internal.AssocList α β) => d + b.length) 0 buckets
Instances For
Internal implementation detail of the hash map
Equations
- Std.DHashMap.Internal.Raw₀ α β = { m : Std.DHashMap.Raw α β // 0 < m.buckets.size }
Instances For
Internal implementation detail of the hash map
Equations
- One or more equations did not get rendered due to their size.
Instances For
Copies all the entries from buckets
into a new hash map with a larger capacity.
Equations
- Std.DHashMap.Internal.Raw₀.expand ⟨data_2, hd⟩ = Std.DHashMap.Internal.Raw₀.expand.go 0 data_2 ⟨mkArray (data_2.size * 2) Std.DHashMap.Internal.AssocList.nil, ⋯⟩
Instances For
Inner loop of expand
. Copies elements source[i:]
into target
,
destroying source
in the process.
Equations
- One or more equations did not get rendered due to their size.
Instances For
Internal implementation detail of the hash map
Equations
- Std.DHashMap.Internal.Raw₀.get ⟨{ size := size, buckets := buckets }, h⟩ a hma_2 = Std.DHashMap.Internal.AssocList.getCast a buckets[(Std.DHashMap.Internal.mkIdx buckets.size h (hash a)).val] hma_2
Instances For
Internal implementation detail of the hash map
Equations
- Std.DHashMap.Internal.Raw₀.get! ⟨{ size := size, buckets := buckets }, hm⟩ a = Std.DHashMap.Internal.AssocList.getCast! a buckets[(Std.DHashMap.Internal.mkIdx buckets.size hm (hash a)).val]
Instances For
Internal implementation detail of the hash map
Equations
- Std.DHashMap.Internal.Raw₀.map f ⟨{ size := size, buckets := buckets }, hm⟩ = ⟨{ size := size, buckets := Array.map (Std.DHashMap.Internal.AssocList.map f) buckets }, ⋯⟩
Instances For
Internal implementation detail of the hash map
Equations
- Std.DHashMap.Internal.Raw₀.Const.get! ⟨{ size := size, buckets := buckets }, h⟩ a = Std.DHashMap.Internal.AssocList.get! a buckets[(Std.DHashMap.Internal.mkIdx buckets.size h (hash a)).val]
Instances For
Internal implementation detail of the hash map
Equations
- One or more equations did not get rendered due to their size.
Instances For
Internal implementation detail of the hash map
Equations
- One or more equations did not get rendered due to their size.
Instances For
Internal implementation detail of the hash map
Equations
- Std.DHashMap.Internal.Raw₀.getKey! ⟨{ size := size, buckets := buckets }, hm⟩ a = Std.DHashMap.Internal.AssocList.getKey! a buckets[(Std.DHashMap.Internal.mkIdx buckets.size hm (hash a)).val]
Instances For
This is the actual well-formedness predicate for hash maps. Users should never need to interact
with this and should use WF
instead.
- buckets_hash_self : IsHashSelf m.buckets
Internal implementation detail of the hash map
- size_eq : m.size = (toListModel m.buckets).length
Internal implementation detail of the hash map
- distinct : List.DistinctKeys (toListModel m.buckets)
Internal implementation detail of the hash map