Zulip Chat Archive

Stream: lean4

Topic: deriving Hashable gives opaque definition

Scott Morrison (Nov 24 2023 at 04:54):

How do people feel about this:

import Std
open Std

structure Foo where
  n : Nat
deriving BEq, Hashable

-- This is fine, `mixHash` is opaque:
#eval mixHash 1 2 -- 16582581243253999004
example : mixHash 1 2 = 16582581243253999004 := rfl -- type mismatch

-- And this is the consequence of the `Hashable` deriver using `mixHash`:
#eval hash ({ n := 1 } : Foo) -- 18217991133577325693
example : hash ({ n := 1 } : Foo) = 18217991133577325693 := rfl -- type mismatch

-- But this seems bad?
#eval ((HashMap.empty : HashMap Foo Nat).find? { n := 1 }) -- none
example : ((HashMap.empty : HashMap Foo Nat).find? { n := 1 }) = none := rfl -- type mismatch

I understand that mixHash is not something one should be doing kernel computation on.

But perhaps the default deriving handler using mixHash then counts as a footgun? Or should I have known better?

Scott Morrison (Nov 24 2023 at 04:55):

(Same issue affects Lean.HashMap, of course.)

Arthur Adjedj (Nov 24 2023 at 07:21):

Scott Morrison said:

I understand that mixHash is not something one should be doing kernel computation on.

While not desirable, it would perhaps be better to have an internal computation of it for such occasions. Looking at the c++ code for the function, it should be possible to write the equivalent code in Lean as well.

Mario Carneiro (Nov 24 2023 at 18:02):

I'm not sure it would be a good idea to have hash computations be simulated in the kernel as part of the proof of such a simple fact. Just use simp lemmas please

Scott Morrison (Nov 24 2023 at 23:51):

Mario, saying "just use simp lemmas" doesn't really cut it here. I want to write a verified algorithm that uses a look up table of some kind, and I need that lookup to be fast. The algorithm won't be proof producing, instead I'll ask the kernel for the rfl proof of the output of the algorithm. Are you saying I should never use HashMap in that situation?

Shreyas Srinivas (Nov 25 2023 at 00:00):

You want efficient hash functions. Those are usually not trivial to prove correct. There could be an entire math project dedicated to proving hash functions "correct" for some suitable criterion.

François G. Dorais (Nov 25 2023 at 00:03):

There are long textbook chapters dedicated to that topic, there should also be (unfortunately longer, in most cases) verifications thereof!

Shreyas Srinivas (Nov 25 2023 at 00:08):

Textbook chapters? You could write several volumes of books on hash functions. One will end up calling on probability theory (balls and bins problems all the way), information theory, number theory, finite field polynomials, and what not.

Scott Morrison (Nov 25 2023 at 00:11):

@François G. Dorais , does your thumbs up above mean "I think HashMap (even with simple hash functions) should never be used in a verified algorithm"? Or the opposite?

Shreyas Srinivas (Nov 25 2023 at 00:14):

btw hash function computations without bit level optimisations can be ... super inefficient to say the least.

François G. Dorais (Nov 25 2023 at 00:14):

That thumbs up was almost certainly referring to the "just use simp lemmas" part, but I did enjoy other parts of your comment :)

François G. Dorais (Nov 25 2023 at 00:18):

I do think all algorithms (and proofs) should be verified in some way. Whether Lean is the right tool is debatable but that's up to us to make it work, or fail...

Shreyas Srinivas (Nov 25 2023 at 00:21):

That's true. I would like to see imperative programs written and verified with lean, something that the do notation doesn't seem to support. So I am just not sure our abstractions are there yet.

James Gallicchio (Nov 25 2023 at 00:49):

Scott Morrison said:

I want to write a verified algorithm that uses a look up table of some kind, and I need that lookup to be fast. The algorithm won't be proof producing, instead I'll ask the kernel for the rfl proof of the output of the algorithm.

(Is this a concrete thing you're trying to do? More context could be useful)

James Gallicchio (Nov 25 2023 at 00:54):

I do think it's a bit weird to want "running in the kernel but fast!" rather than normal kernel proofs or proof by reflection

Mario Carneiro (Nov 25 2023 at 02:51):

running hashmaps in the kernel will be very far from fast

Mario Carneiro (Nov 25 2023 at 02:52):

Scott Morrison said:

Mario, saying "just use simp lemmas" doesn't really cut it here. I want to write a verified algorithm that uses a look up table of some kind, and I need that lookup to be fast. The algorithm won't be proof producing, instead I'll ask the kernel for the rfl proof of the output of the algorithm. Are you saying I should never use HashMap in that situation?

Yes, that is what I am saying. HashMap does not have the right performance profile for this situation, it will not be O(1) lookup because the kernel doesn't understand arrays and will resort to linear scans over the array for each lookup

Mario Carneiro (Nov 25 2023 at 02:52):

Use a RBMap instead

Mac Malone (Nov 25 2023 at 02:55):

@Scott Morrison Like James, I am rather confused as to what you are trying to do here. It is possible to write verify HashMap without kernel computations about hash. In fact, I happen to be working on a project that is currently requiring me to do exactly this. I am almost close to done and the implementation of hash as never posed a problem once. Verifying HashMap itself works fine by just assuming LawfulHashable and showing LawfulHashable still seems possible with mixHash for most use cases I can think of.

Last updated: May 02 2025 at 03:31 UTC