Zulip Chat Archive

Stream: new members

Topic: Proof that HashSets don't have any overlap

aron (Apr 09 2025 at 18:11):

I'm making a data structure to store disjoint sets of keys that point to values:

import Std.Data.HashMap
import Std.Data.HashSet
import Mathlib
open Std

variable {k v : Type*} [instBeq : BEq k] [instHash : Hashable k]

instance : Hashable (HashSet k) where
  hash := hash ∘ HashSet.toArray


abbrev DisjointSetMap' (k : Type u) (v : Type u') [instBeq : BEq k] [instHash : Hashable k] :=
  HashMap (HashSet k) v


structure DisjointSetMap (k : Type*) [BEq k] [Hashable k] (v : Type*) where
  /-- This maps from _sets_ of `k`s to `v`s -/
  innerMap : DisjointSetMap' k v
  /-- How to merge values together when key sets are merged -/
  mergeFn : v → v → (keysNew : HashSet k) → v

However, for this data structure to be correct, I want to include proofs in the DisjointSetMap structure that:

- none of the hashsets are empty
- no two hashsets have any overlap. In other words, for every single key there is at most one key set that contains that key

Here's the full code, including my attempt at a proof so far:

import Std.Data.HashMap
import Std.Data.HashSet
import Mathlib
open Std


variable {k v : Type*} [instBeq : BEq k] [instHash : Hashable k]


instance : Hashable (HashSet k) where
  hash := hash ∘ HashSet.toArray




def Std.HashSet.intersection (a : HashSet k) (b : HashSet k) : HashSet k :=
  a.fold (init := ∅) fun s x => if b.contains x then s.insert x else s

def Std.HashSet.intersection_mem_both {k : Type u} [BEq k] [Hashable k] [EquivBEq k] [LawfulHashable k] (a : HashSet k) (b : HashSet k) : ∀ v ∈ (a.intersection b), v ∈ a ∧ v ∈ b := by
  rw [intersection]
  suffices ∀ a : List k, ∀ q : HashSet k, ∀ (v : k), v ∈ List.foldl (fun s x => if b.contains x = true then s.insert x else s) q a → v ∈ q ∨ a.contains v ∧ v ∈ b by
    simpa [HashSet.fold_eq_foldl_toList, HashSet.mem_iff_contains] using this a.toList ∅
  intro a q
  induction a generalizing q with
  | nil => simp
  | cons hd tl ih =>
    simp
    intro v hv
    obtain (h|h) := ih _ _ hv
    · split at h
      · next ht =>
        rw [HashSet.mem_insert] at h
        obtain (h|h) := h
        · refine Or.inr ⟨Or.inl (BEq.symm h), ?_⟩
          rwa [HashSet.mem_iff_contains, ← HashSet.contains_congr h]
        · exact Or.inl h
      · exact Or.inl h
    · exact Or.inr ⟨Or.inr h.1, h.2⟩




/-- Gets the items in `a` that are not in `b` -/
def Std.HashSet.diff (a : HashSet k) (b : HashSet k) : HashSet k :=
  a.filter (not ∘ b.contains)







abbrev DisjointSetMap' (k : Type u) (v : Type u') [instBeq : BEq k] [instHash : Hashable k] :=
  HashMap (HashSet k) v





inductive NoEmptyKeys (map : DisjointSetMap' k v) : Prop where
  | mk
    (keySet : HashSet k)
    (keySetInMap : keySet ∈ map)
    (keySetNotEmpty : keySet ≠ ∅)
    : NoEmptyKeys map

/-- @TODO: Need help with this 👇 -/
inductive NoKeyOverlap (map : DisjointSetMap' k v) : Prop where
  | mk
    (allPairsDisjoint : ∀ (keySet1 keySet2 : HashSet k),
                keySet1 ∈ map →
                keySet2 ∈ map →
                keySet1 ≠ ∅ →
                keySet2 ≠ ∅ →
                keySet1 ≠ keySet2 →
                ∀ (key : k), key ∈ keySet1 → key ∉ keySet2 →
                NoKeyOverlap map)
    : NoKeyOverlap map



/-- The vacuously true case for `NoKeyOverlap` where the map is empty. -/
def NoKeyOverlap.empty : NoKeyOverlap (HashMap.empty : DisjointSetMap' k v) :=
  by
  constructor
  intro keySet1 keySet2 h1
  simp at h1







structure DisjointSetMap (k : Type*) [BEq k] [Hashable k] (v : Type*) where
  /-- This maps from _sets_ of `k`s to `v`s -/
  innerMap : DisjointSetMap' k v
  /-- A proof that the inner map has no key overlap -/
  noKeyOverlap : NoKeyOverlap innerMap
  /-- How to merge values together when key sets are merged -/
  mergeFn : v → v → (keysNew : HashSet k) → v









namespace DisjointSetMap





def getOuterMapFromInnerMap (innerMap : HashMap (HashSet k) v) : HashMap k (HashSet k × v) :=
  innerMap
  |>.fold
    (fun (map : HashMap k (HashSet k × v)) (keySet : HashSet k) value =>
      keySet.fold (fun map' key => map'.insert key (keySet, value)) map)
    HashMap.empty



/-- A map with a separate key for each key in `d.innerMap`'s key sets.

A projection from the inner map to a map of individual keys to key sets _and_ values; so we can:
  a) find items in the map by a single key by doing `d.outerMap[key]`
  b) which then returns the full set of keys that the single key belongs to, as well as the value stored for that set. The values in this map have type `HashSet k × v`
-/
def outerMap (d : DisjointSetMap k v) : HashMap k (HashSet k × v) :=
  getOuterMapFromInnerMap d.innerMap


def empty (mergeFn : v → v → (keysNew : HashSet k) → v) : DisjointSetMap k v :=
  { innerMap := HashMap.empty
    noKeyOverlap := NoKeyOverlap.mk ∅ ∅ (∅ ∈ innerMap) (∅ ∈ innerMap) (by simp) (by simp)
    mergeFn }




def addSet (d : DisjointSetMap k v) (newKeySet : HashSet k) (val : v) : DisjointSetMap k v :=
  let overlappingSets :=
    d.innerMap.fold (init := (newKeySet, [])) fun acc currSet value =>
      let intersection := acc.1.intersection currSet
      if intersection == ∅ then
        -- The no overlap case, so there's nothing to add here
        acc
      else
        -- If there is some overlap, we snowball the current set in the accumulated set, and include the value of the newly merged set in the list of values to merge later
        let union := acc.1.union currSet
        (union, value :: acc.2)


  match overlappingSets with
  | (_, []) =>
    -- No overlaps, just insert the new set with its value
    { d with innerMap := d.innerMap.insert newKeySet val }

  | (mergedSet, valuesToMerge) =>
    -- Merge all overlapping sets and their values
    let mergedValue :=
      valuesToMerge.foldl (init := val)
        (fun acc value => d.mergeFn acc value mergedSet)

    -- Remove old sets
    let newInnerMap :=
      d.innerMap.fold (init := HashMap.empty) fun newInnerMap currSet _ =>
        let hasOverlap := currSet.any newKeySet.contains
        if hasOverlap then
          newInnerMap.erase currSet
        else
          newInnerMap

    -- And insert the new snowballed merged set with its combined value
    { d with innerMap := newInnerMap.insert mergedSet mergedValue }




/-- This merges multiple sets without adding a new value. If none of the keys are in the map then this is a no-op because we have no value to set for it! -/
def union (d : DisjointSetMap k v) (keysToMerge : HashSet k) : DisjointSetMap k v :=
  let overlappingSets :=
    d.innerMap.fold (init := (keysToMerge, [])) fun acc currSet value =>
      let intersection := acc.1.intersection currSet
      if intersection == ∅ then
        -- The no overlap case, so there's nothing to add here
        acc
      else
        -- If there is some overlap, we snowball the current set in the accumulated set, and include the value of the newly merged set in the list of values to merge later
        let union := acc.1.union currSet
        (union, value :: acc.2)


  match overlappingSets with
  | (_, []) =>
    -- No overlaps, none of the keys are in the map so we do nothing because we have no value to set for it
    d

  | (mergedSet, firstVal :: restValsToMerge) =>
    -- Merge all overlapping sets and their values
    let mergedValue :=
      restValsToMerge.foldl (init := firstVal)
        (fun acc value => d.mergeFn acc value mergedSet)

    -- Remove old sets
    let newInnerMap :=
      d.innerMap.fold (init := HashMap.empty) fun newInnerMap currSet _ =>
        let hasOverlap := currSet.any keysToMerge.contains
        if hasOverlap then
          newInnerMap.erase currSet
        else
          newInnerMap

    -- And insert the new snowballed merged set with its combined value
    { d with innerMap := newInnerMap.insert mergedSet mergedValue }


def find? (d : DisjointSetMap k v) (key : k) : Option v :=
  d.outerMap[key]? |>.map Prod.snd



def find (d : DisjointSetMap k v) (key : k) (h : key ∈ d.outerMap) : v :=
  d.outerMap[key]'h |>.2

end DisjointSetMap

but I don't think I'm going down the right path here. Is there a better and ideally simpler way to prove this property?

Matt Diamond (Apr 10 2025 at 01:40):

why are your predicates defined as inductives? why not define them like this?

def NoEmptyKeys (map : DisjointSetMap' k v) : Prop :=
∀ keySet ∈ map, ¬keySet.isEmpty

def NoKeyOverlap (map : DisjointSetMap' k v) : Prop :=
∀ (keySet1 keySet2 : HashSet k), keySet1 ∈ map → keySet2 ∈ map → (keySet1.intersection keySet2).isEmpty

Matt Diamond (Apr 10 2025 at 01:41):

also, notice that NoKeyOverlap doesn't need to check if the sets are nonempty because NoEmptyKeys already takes care of that (assuming you include NoEmptyKeys in the DisjointSetMap structure)

aron (Apr 10 2025 at 11:54):

Matt Diamond said:

why are your predicates defined as inductives?

Good point – because I needed them to be inductives for this other data structure I made, but you're right that since this property isn't recursive it can just be a regular function

aron (Apr 10 2025 at 12:00):

Your definition of NoKeyOverlap looks good, but I probably also need a proof that keySet1 and keySet2 are not the same key set, right?

So something like:

def NoKeyOverlap (map : DisjointSetMap' k v) : Prop :=
  ∀ (keySet1 keySet2 : HashSet k),
  keySet1 ∈ map →
  keySet2 ∈ map →
  keySet1 ≠ keySet2 → -- <-- distinctness proof
  (keySet1.intersection keySet2).isEmpty
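
For reference, here is a rough sketch of how both predicates could sit in the structure as proof fields. It is only a sketch under assumptions: the `[BEq (HashSet k)]` and `[Hashable (HashSet k)]` instances are taken as given rather than defined, the universes are fixed to `Type` for brevity, and disjointness is stated membership-wise (no key of one set is in the other) so that the block does not depend on the custom `intersection` function from the full code.

```lean
import Std.Data.HashMap
import Std.Data.HashSet
open Std

/-- Every key set stored in the map has at least one member. -/
def NoEmptyKeys {k v : Type} [BEq k] [Hashable k]
    -- Assumption: `HashSet k` is itself usable as a `HashMap` key.
    [BEq (HashSet k)] [Hashable (HashSet k)]
    (map : HashMap (HashSet k) v) : Prop :=
  ∀ keySet : HashSet k, keySet ∈ map → keySet.isEmpty = false

/-- Two *distinct* key sets in the map never share a key. -/
def NoKeyOverlap {k v : Type} [BEq k] [Hashable k]
    [BEq (HashSet k)] [Hashable (HashSet k)]
    (map : HashMap (HashSet k) v) : Prop :=
  ∀ keySet1 keySet2 : HashSet k,
    keySet1 ∈ map →
    keySet2 ∈ map →
    keySet1 ≠ keySet2 →
    ∀ key : k, key ∈ keySet1 → key ∉ keySet2

/-- The map together with both invariants and the merge function. -/
structure DisjointSetMap (k v : Type) [BEq k] [Hashable k]
    [BEq (HashSet k)] [Hashable (HashSet k)] where
  innerMap : HashMap (HashSet k) v
  noEmptyKeys : NoEmptyKeys innerMap
  noKeyOverlap : NoKeyOverlap innerMap
  mergeFn : v → v → (keysNew : HashSet k) → v
```

With plain `def`s like these, every operation that builds a new `DisjointSetMap` (such as `addSet` or `union`) has to supply both proofs for the map it returns.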

Johannes Tantow (Apr 10 2025 at 12:03):

Why would you need that? If they are equal their intersection won't be empty (as long as not both are empty)

@Johannes Tantow right, but if you don't require keySet1 and keySet2 to be different sets, then you could never construct a valid proof: you cannot prove (keySet1.intersection keySet2).isEmpty for every pair of key sets, because it fails when keySet1 = keySet2 and that set is non-empty
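
A tiny sketch of that failure case, stated membership-wise instead of through `intersection` (the name `KeysDisjoint` is made up for this illustration and is not part of the code above): a set with at least one member can never be disjoint from itself.

```lean
import Std.Data.HashSet
open Std

/-- Hypothetical helper: no key of `s` is also a key of `t`. -/
def KeysDisjoint {k : Type} [BEq k] [Hashable k] (s t : HashSet k) : Prop :=
  ∀ key : k, key ∈ s → key ∉ t

/-- If `x ∈ s`, then `s` is not disjoint from itself:
    instantiating the disjointness claim at `x` gives `x ∉ s`. -/
example {k : Type} [BEq k] [Hashable k] (s : HashSet k) (x : k) (hx : x ∈ s) :
    ¬ KeysDisjoint s s :=
  fun h => h x hx hx
```

So without the `keySet1 ≠ keySet2` hypothesis, any map containing a non-empty key set would make the predicate unprovable.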

Johannes Tantow (Apr 10 2025 at 16:37):

I missed the forall, true. I thought it was just a property of two general hash sets

Johannes Tantow (Apr 10 2025 at 16:41):

In that case, you are right and can ignore everything I said

Johannes Tantow (Apr 10 2025 at 16:55):

Though perhaps map.keys.Pairwise (fun x y => (x.intersection y).isEmpty) might be more succinct, but that is probably not an easier proof.
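
A minimal sketch of what that `Pairwise` formulation might look like as a definition. The name `NoKeyOverlapPairwise` is hypothetical, the `HashSet` instances are again assumed rather than defined, and disjointness is written membership-wise so that the block stands alone without the custom `intersection`.

```lean
import Std.Data.HashMap
import Std.Data.HashSet
open Std

/-- Hypothetical name: the overlap invariant stated over the list of keys. -/
def NoKeyOverlapPairwise {k v : Type} [BEq k] [Hashable k]
    -- Assumption: `HashSet k` is itself usable as a `HashMap` key.
    [BEq (HashSet k)] [Hashable (HashSet k)]
    (map : HashMap (HashSet k) v) : Prop :=
  map.keys.Pairwise fun (s t : HashSet k) =>
    ∀ key : k, key ∈ s → key ∉ t
```

`List.Pairwise` only relates each list entry to the entries after it, so no explicit `keySet1 ≠ keySet2` hypothesis appears: a key set is never compared with itself. Because disjointness is symmetric, checking each pair in one order is enough.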

Last updated: May 02 2025 at 03:31 UTC
