Zulip Chat Archive

Stream: general

Topic: Lemmas about HashSet


Michiel Huttener (Jul 29 2025 at 13:59):

In some project, I'd like to use HashSets and show that the program satisfies some properties. The proof boils down to the two lemmas below:

import Std.Data.HashSet

universe u

variable {α : Type u} [BEq α] [Hashable α]

namespace Std.HashSet

theorem mem_union_iff {a : α} (m₁ m₂ : HashSet α) :
    a  m₁.union m₂  a  m₁  a  m₂ := sorry

theorem contains_toArray_iff {a : α} (m : HashSet α) :
    m.contains a  m.toArray.contains a := sorry

end Std.HashSet

Currently, I'm unable to show this. There seems to be a stack of data structures involved (HashSet, HashMap and DHashMap) which makes things harder, and I also haven't found a good way to show results about .fold, which both .union and .toArray use. I'd appreciate all feedback/guidance to show this!

Secondly, I guess lemmas like these belong upstream in Lean itself, and they're probably on someone's to-do list. I'd be more than happy to contribute, if someone is willing to give some pointers/insights.

pandaman (Jul 29 2025 at 14:06):

At the current state, Std.HashSet.fold_eq_foldl_toList seems useful since it allows us to use induction over lists instead of HashSet. I'm curious about proper ways to prove, too.

Johannes Tantow (Jul 29 2025 at 14:37):

import Std.Data.HashSet

universe u

variable {α : Type u} [BEq α] [Hashable α] [EquivBEq α] [LawfulHashable α]

namespace Std.HashSet

theorem mem_union_iff {a : α} (m₁ m₂ : HashSet α) :
    a  m₁.union m₂  a  m₁  a  m₂ := by
    simp [union, fold_eq_foldl_toList]
    rw [mem_iff_contains (m:= m₂),  contains_toList]
    generalize m₂.toList = l
    induction l generalizing m₁ with
    | nil => simp
    | cons hd tl ih =>
      simp [ih]
      rw [BEq.comm]
      grind

This solves the first lemma if you can provide the two instances I added. Without these, proofs about Hash objects arent really possible.

Johannes Tantow (Jul 29 2025 at 14:41):

The second one is more difficult as keysArray is not verified and has to be done from the basics.

Johannes Tantow (Jul 29 2025 at 14:58):

The general proccess was a bit described recently here: #mathlib4 > ExtTreeMap - Filter / MergeWith

Markus Himmel (Jul 29 2025 at 15:25):

There is already an open PR that redefines union to use a better implementation and gets started with the verification: lean4#7823. The PR has stalled, and I expect that @Paul Lezeau would be happy if you were interested in taking over that PR, @Michiel Huttener.

Michiel Huttener (Jul 29 2025 at 18:55):

Great; I'll look into it!

Johannes Tantow (Aug 18 2025 at 09:52):

The PR lean#9685 is currently being merged into the lean4 repo, which adds theorems about toArray so you should be able to use it to prove the other theorem either in the next nightly or in a later release.


Last updated: Dec 20 2025 at 21:32 UTC