Zulip Chat Archive
Stream: lean4
Topic: No Hashable instance for ByteArray?
Mac (Oct 18 2021 at 15:28):
In Lake, I am trying to take the hash of a binary file -- in particular, the lean executable (for #23) -- but this appears infeasible as there is no Hashable ByteArray
instance. As such, I cannot hash
the result of IO.FS.readBinFile
. Is there some better approach I am missing?
Mac (Oct 18 2021 at 15:42):
My current workaround is:
def computeByteHash (bytes : ByteArray) :=
bytes.toList.foldl (init := 1723) fun h b => mixHash h (hash b.toNat)
but I imagine this is quite inefficient
Leonardo de Moura (Oct 18 2021 at 17:49):
@Mac I will try to improve the ByteArray
primitives.
Leonardo de Moura (Oct 19 2021 at 00:01):
@Mac I pushed new primitives for ByteArray
and other goodies.
Now, the main loop of
def computeByteHash (bytes : ByteArray) :=
bytes.foldl (init := 1723) fun h b => mixHash h (hash b.toNat)
is implemented by the following IR
def ByteArray.foldlMUnsafe.fold._at.computeByteHash._spec_1 (x_1 : @& obj) (x_2 : usize) (x_3 : usize) (x_4 : u64) : u64 :=
let x_5 : u8 := USize.decEq x_2 x_3;
case x_5 : u8 of
Bool.false →
let x_6 : u8 := ByteArray.uget x_1 x_2 ◾;
let x_7 : obj := UInt8.toNat x_6;
let x_8 : u64 := UInt64.ofNat x_7;
dec x_7;
let x_9 : u64 := mixHash x_4 x_8;
let x_10 : usize := 1;
let x_11 : usize := USize.add x_2 x_10;
let x_12 : u64 := ByteArray.foldlMUnsafe.fold._at.computeByteHash._spec_1 x_1 x_11 x_3 x_9;
ret x_12
Bool.true →
ret x_4
If the performance is not good enough, I see two options for improving it:
1- We add primitives for reading UInt64
(and UInt32
, UInt16
) from a ByteArray
2- We implement the hash function for ByteArray in C.
Option 1 is my preferred choice right now.
Mac (Oct 19 2021 at 00:14):
@Leonardo de Moura cool!
The one thing that appears missing to me in that solution is a Hashable
instance forUInt8
(and UInt16
, UInt31
, etc.) so that the byte isn't converted to/from a Nat
for hashing.
Otherwise, the performance is fine and, as this hashing approach is just a stop-gap solution until we have a proper cryptographic hash function for Lean, it may not be worth it to spend much time on this (though I imagine these improvements will also be useful elsewhere).
Leonardo de Moura (Oct 19 2021 at 00:15):
@Mac I will add the missing instances.
Leonardo de Moura (Oct 19 2021 at 00:22):
@Mac Pushed it.
Leonardo de Moura (Oct 19 2021 at 00:22):
def ByteArray.foldlMUnsafe.fold._at.computeByteHash._spec_1 (x_1 : @& obj) (x_2 : usize) (x_3 : usize) (x_4 : u64) : u64 :=
let x_5 : u8 := USize.decEq x_2 x_3;
case x_5 : u8 of
Bool.false →
let x_6 : u8 := ByteArray.uget x_1 x_2 ◾;
let x_7 : u64 := UInt8.toUInt64 x_6;
let x_8 : u64 := mixHash x_4 x_7;
let x_9 : usize := 1;
let x_10 : usize := USize.add x_2 x_9;
let x_11 : u64 := ByteArray.foldlMUnsafe.fold._at.computeByteHash._spec_1 x_1 x_10 x_3 x_8;
ret x_11
Bool.true →
ret x_4
Mac (Oct 19 2021 at 00:26):
@Leonardo de Moura cool, thanks!
Last updated: Dec 20 2023 at 11:08 UTC