Zulip Chat Archive

I've been able to prove some theorems about HashSets. But there is one theorem I can't work out how to prove, without which I can't prove some further theorems, namely: ∀(s : HashSet k), s.isEmpty = true → s = ∅

import Std.Data.HashSet
import Mathlib
open Std

variable {k : Type u} {v : Type w} [instBeq : BEq k] [instHash : Hashable k]


theorem HashSet.insert_not_empty {item : k} [LawfulHashable k] [EquivBEq k] : ∀ (m : HashSet k), m.insert item ≠ ∅ := by
  intro m h
  apply congr_fun ∘ congr_arg HashSet.contains at h
  specialize h item
  simp at h

theorem HashSet.empty_isEmpty_eq_true : (∅ : HashSet k).isEmpty = true := by
  simp

theorem HashSet.isEmpty_false_neq_empty { s : HashSet k} : s.isEmpty = false → s ≠ ∅ := by
  intro h hempty
  have : (∅ : HashSet k).isEmpty = true := HashSet.empty_isEmpty_eq_true
  rw [hempty] at h
  contradiction

theorem HashSet.isEmpty_true_eq_empty : ∀ (s : HashSet k), s.isEmpty = true → s = ∅ := by
  intro s h
  if hh : (s = ∅) then
    exact hh
  else
    exfalso
    sorry

theorem HashSet.eq_empty_isEmpty_true { s : HashSet k} : s = ∅ → s.isEmpty = true := by
  intro h
  rw [h]
  exact HashSet.isEmpty_empty

theorem HashSet.nempty_isEmpty_eq_false {s : HashSet k} : s ≠ ∅ → s.isEmpty = false := by
  intro h
  induction hsize : s.size with
  | succ n ih =>
    have ssize_ne_zero : s.size ≠ 0 := by simp [hsize]
    have ssize_zero_false : (s.size == 0) = false := by simp [ssize_ne_zero]
    have : s.isEmpty = (s.size == 0) := HashSet.isEmpty_eq_size_eq_zero
    rw [ssize_zero_false] at this
    exact this

  | zero =>
    have : s.isEmpty = (s.size == 0) := HashSet.isEmpty_eq_size_eq_zero
    contrapose h
    simp
    simp at h
    exact HashSet.isEmpty_true_eq_empty s h

theorem HashSet.only_nempty_isEmpty_eq_false {s : HashSet k} : s ≠ ∅ ↔ s.isEmpty = false := by
  apply Iff.intro
  exact HashSet.nempty_isEmpty_eq_false
  exact HashSet.isEmpty_false_neq_empty

theorem HashSet.only_empty_isEmpty { s : HashSet k} : s = ∅ ↔ s.isEmpty = true := by
  apply Iff.intro
  exact HashSet.eq_empty_isEmpty_true
  exact HashSet.isEmpty_true_eq_empty s

Markus Himmel (May 23 2025 at 11:16):

It's not true. s := HashSet.emptyWithCapacity 1000 satisfies s.isEmpty = true but s ≠ ∅. ∅ is defined as HashSet.emptyWithCapacity 8.

Markus Himmel (May 23 2025 at 11:17):

You can use docs#Std.ExtHashSet if you need a hash set whose equality coincides with equality as finite sets. There we have docs#Std.ExtHashSet.isEmpty_iff .

aron (May 23 2025 at 13:55):

Oh ok. What's the difference between a regular HashSet and an ExtHashSet then? Do they behave more or less the same at runtime?

From the docs#Std.ExtHashSet:

In contrast to regular hash sets, [Std.ExtHashSet](https://leanprover-community.github.io/mathlib4_docs/Std/Data/ExtHashSet/Basic.html#Std.ExtHashSet) offers several extensionality lemmas and therefore has more lemmas about equality of hash maps. This however also makes it lose the ability to iterate freely over hash sets.

Is this the only difference? more equality lemmas and not being able to iterate over items one at a time? does the latter mean you can't fold over them either?

Markus Himmel (May 23 2025 at 13:56):

Yes, folding over them isn't possible at the moment. In principle it's possible to fold as long as you can prove that the result of your fold doesn't depend on the order you encounter the elements in, but we don't provide this function at the moment.

Markus Himmel (May 23 2025 at 13:57):

At runtime, HashSet and ExtHashSet are the same thing.

aron (May 23 2025 at 14:00):

mm ok. so what's the actual difference between them? now I feel silly having spent so much time trying to work with HashSets when there was an almost identical version of it I didn't know about and don't know why I'd pick one over the other :confused:

Eric Wieser (May 23 2025 at 14:03):

The difference is what = means

Markus Himmel (May 23 2025 at 14:04):

ExtHashSet is the quotient of HashSet by equivalence of hash sets, where two hash sets are considered equivalent if they represent the same finite set.

A hash set is internally represented as an array of buckets. An array of 8 empty buckets and an array of 1024 empty buckets both represent an empty set. They are not equal as HashSets, but they are equal as ExtHashSets.

Markus Himmel (May 23 2025 at 14:05):

All (safe) functions in Lean must respect equality, so if you use ExtHashSet, you get more equalities, but you cannot have functions that are able to detect the internal representation of the hash set in any way. All kinds of iteration are examples of this, because the iteration order depends on the exact way the data is represented.

aron (May 23 2025 at 14:36):

I've just realised that ExtHashSet doesn't have a toList or toArray function, presumably for the same reason that those would reveal its internal ordering. Which also means I can't make a Hashable instance for it either :confused: I don't think this is going to be usable for my needs unfortunately

aron (May 23 2025 at 14:39):

Markus Himmel said:

because the iteration order depends on the exact way the data is represented.

Presumably the internal ordering would depend on the hashed value, so couldn't the iteration order similarly use that hashed value ordering? since ExtHashSet already requires a Hashable instance, could that not be used to implement fold/toList/toArray/etc?

Mario Carneiro (May 23 2025 at 16:24):

no, the internal ordering in hash tables is not a pure function of the hashed value, it also has to do with insertion order

Mario Carneiro (May 23 2025 at 16:25):

if you have two elements with the same hash and which compare unequal, there is nothing in a hashmap's available typeclasses to be able to consistently order those two elements

Mario Carneiro (May 23 2025 at 16:26):

I think there should be a function ExtHashSet.toMultiset but this will have to be in mathlib

Mario Carneiro (May 23 2025 at 16:30):

There is another alternative, probably much simpler for your application, which is to use the extensional equality relation on HashSet spelled ~m but without actually doing the quotient to get ExtHashSet. So the theorem becomes ∀ (s : HashSet k), s.isEmpty = true → s ~m ∅ which is docs#Std.HashSet.equiv_empty_iff_isEmpty

Mario Carneiro (May 23 2025 at 16:33):

in other words, just never talk about the = relation on HashSets, use ~m where you would have used = normally and then it's as if you are using ExtHashSet but you can still call fold later

aron (May 23 2025 at 17:16):

Mm ok interesting! That might do the trick. But hm this ~m syntax doesn't seem to work for me?

import Std.Data.HashSet
import Mathlib
open Std

theorem HashSet.isEmpty_true_eq_empty : ∀ (s : HashSet k), s.isEmpty = true → (s ~m ∅) := by sorry

aron (May 23 2025 at 17:18):

ah nvm it works when I open the Std.HashSet namespace

import Std.Data.HashSet
import Mathlib
open Std
open HashSet

theorem HashSet.isEmpty_true_eq_empty : ∀ (s : HashSet k), s.isEmpty = true → (s ~m ∅) := by sorry

Last updated: Feb 28 2026 at 14:05 UTC

leanprover-community / mathlib