Zulip Chat Archive

Stream: new members

Topic: How to prove relationship between two HashMaps?

aron (Feb 02 2025 at 19:09):

I've got a function that takes as input a map containing Set ks as keys and outputs a separate map which breaks out every k as its own key, with the original Set k as the value.

How can I prove the relationship between these two maps, that any value in the output map is guaranteed to exist as a key in the input map?

This is what I've got so far:

import Lean.Data.RBTree
import Std.Data.HashMap
import Std.Data.HashSet
import Mathlib

instance [BEq k] [Hashable k] : Hashable (Std.HashSet k) where
  hash a := Std.HashSet.toArray a |> hash


def getOuterMapFromInnerMap {k v : Type u} [BEq k] [Hashable k] (innerMap : Std.HashMap (Std.HashSet k) v) : Std.HashMap k (Std.HashSet k) :=
  innerMap.keys
  |> List.foldl
    (fun (map : Std.HashMap k (Std.HashSet k)) (keySet : Std.HashSet k) =>
      keySet.fold (fun map' key => map'.insert key keySet) map)
    Std.HashMap.empty


theorem relationShipOuterToInnerMap {k v : Type u} [BEq k] [Hashable k] (innerMap : Std.HashMap (Std.HashSet k) v) : innerMap.keys |> List.Forall (fun keySet => keySet.toList |> List.Forall (fun key => key ∈ getOuterMapFromInnerMap innerMap)) :=
  by
  simp [getOuterMapFromInnerMap]
  simp [Std.HashSet.fold, Std.HashMap.fold]
  simp [Std.DHashMap.fold]
  -- ?

I don't know how to prove the theorem, or even whether I've formulated the theorem in the best way

Johannes Tantow (Feb 03 2025 at 07:44):

You don't appear to use RBMaps at all, but instead use HashMaps. Foldl for HashMaps is a bit tricky to prove as there are currently no lemmas yet in Std.
What is supposed to happen if a value of k appears in multiple HashSets that are used as keys?

aron (Feb 03 2025 at 17:13):

Oops yeah you're right I meant to write HashMaps.

What is supposed to happen if a value of k appears in multiple HashSets that are used as keys?

I'm intending to use this to store disjoint sets, so any given k should only ever appear in one set; although yeah I should probably create a refinement type to carry the proof of that fact along with the map.

Johannes Tantow (Feb 04 2025 at 08:29):

I think proving this is rather difficult, because the way the hashMaps are designed. You might need to go down to the AssocList model that is below all the HashMap proofs, which gets tedious.

Johannes Tantow (Feb 04 2025 at 08:31):

If you want to use disjoint sets perhaps a different datastructure would be simpler, namely the union-find or disjoint-set datastructure. There is a description of how to proving the correctness here (in Coq): https://usr.lmf.cnrs.fr/%7Ejcf/ftp/publis/puf-wml07.pdf . Maybe that is simpler depending on your use case

Markus Himmel (Feb 04 2025 at 08:32):

In fact, there is a verified Union Find available in Batteries as docs#Batteries.UnionFind

Johannes Tantow (Feb 04 2025 at 08:37):

Even better :grinning: Forgot to check there

Markus Himmel (Feb 04 2025 at 08:38):

Unfortunately there is no theory for toList and fold on Std.HashMap at the moment, so proving anything about them will be very difficult. This is on the Lean FRO's TODO list for this year, but it will take a while until we get to it. If anyone is interested in contributing results in this direction to the Lean standard libray, feel free to contact me and I'll help you get started.

Johannes Tantow (Feb 04 2025 at 08:41):

The result would be m.fold f init = m.toList.foldr/foldl f init (don't know which direction works) or is there anything more related to fold? I could try that in the next days

Johannes Tantow (Feb 04 2025 at 08:41):

Though I guess the proof above would still be hard

Markus Himmel (Feb 04 2025 at 08:43):

Johannes Tantow said:

The result would be m.fold f init = m.toList.foldr/foldl f init (don't know which direction works) or is there anything more related to fold? I could try that in the next days

Yes, but this is only useful if you also have lemmas about toList, which are also missing

Johannes Tantow (Feb 04 2025 at 08:46):

toList would require similar lemmas/proofs to keys I guess. I see also that my proposed result does not type check as the input the function is curried in fold and uncurried in the foldl.

aron (Feb 04 2025 at 09:08):

Markus Himmel said:

In fact, there is a verified Union Find available in Batteries as docs#Batteries.UnionFind

Oh yeah – but... I don't quite understand that type. It has no type parameters for what sets of things it stores? and besides, it doesn't look like it allows for using its sets as keys to point to values?

Markus Himmel (Feb 04 2025 at 09:14):

Yes, you would have to map the elements of type k you care about to (small) natural numbers in order to use this.

aron (Feb 04 2025 at 09:15):

Johannes Tantow said:

If you want to use disjoint sets perhaps a different datastructure would be simpler, namely the union-find or disjoint-set datastructure.

This is effectively what I'm trying to do; here's what my data structure looks like so far:

structure DisjointSet (k : Type u) [BEq k] [Hashable k] (valType : Type u) where
  /-- This maps from sets of `k`s to `v`s -/
  innerMap : Std.HashMap (Std.HashSet k) valType
  /-- This is a projection of the inner map, into a map of keys to sets of keys -/
  outerMap := getOuterMapFromInnerMap innerMap
  /-- The merge function allows for conditionally not adding an item into the list, by returning a `None` -/
  mergeFn : valType → valType → (keysNew : Std.HashSet k) → Option valType

aron (Feb 04 2025 at 09:15):

Markus Himmel said:

Yes, you would have to map the elements of type k you care about to (small) natural numbers in order to use this.

Hmm I don't love that :oh_no:

aron (Feb 04 2025 at 09:17):

also because it still won't let me associate values with the sets, so I'd have to store the values separately and check whether they need to be merged every time I do a union operation; so I may as well make my own data structure to store the sets anyway

Johannes Tantow (Feb 04 2025 at 09:20):

I would think that you don't have to use hash maps of sets in order to implement union find, which would simplify the proofs. Also := is only a default value, best to leave it out of the structure

aron (Feb 04 2025 at 09:25):

I wanted to use hashmaps because I figured it'll be simpler to implement than a traditional union-find, I don't really care about performance atm.

Also := is only a default value, best to leave it out of the structure

Ah I didn't know that, I thought it was syntax for a computed field. Does Lean have a computed field feature for records types?

Actually I suppose if I name the function DisjointSet.outerMap it will automatically let me do d.outerMap on any DisjointSet d so I don't need it to be part of the structure definition!

aron (Feb 04 2025 at 09:52):

I just came up with a pretty simple solution to my problem. I can just include the innerMap's keys' values in the outerMap along with each key set. That way I can always retrieve the value in a single hop directly from the outer map without needing to use a proof that a value in outerMap exists as a key of innerMap:

def getOuterMapFromInnerMap {k v : Type u} [BEq k] [Hashable k] (innerMap : Std.HashMap (Std.HashSet k) v) : Std.HashMap k (Std.HashSet k × v) :=
  innerMap
  |>.fold
    (fun (map : Std.HashMap k (Std.HashSet k × v)) (keySet : Std.HashSet k) value =>
      keySet.fold (fun map' key => map'.insert key (keySet, value)) map)
    Std.HashMap.empty

Aaron Liu (Feb 04 2025 at 12:02):

aron said:

Ah I didn't know that, I thought it was syntax for a computed field. Does Lean have a computed field feature for records types?

I think I saw a computed field once in docs#Lean.Expr

Last updated: May 02 2025 at 03:31 UTC