Zulip Chat Archive

Stream: lean4

Topic: No Hashable instance for ByteArray?


Mac (Oct 18 2021 at 15:28):

In Lake, I am trying to take the hash of a binary file -- in particular, the lean executable (for #23) -- but this appears infeasible as there is no Hashable ByteArray instance. As such, I cannot hash the result of IO.FS.readBinFile. Is there some better approach I am missing?

Mac (Oct 18 2021 at 15:42):

My current workaround is:

def computeByteHash (bytes : ByteArray) :=
   bytes.toList.foldl (init := 1723) fun h b => mixHash h (hash b.toNat)

but I imagine this is quite inefficient

Leonardo de Moura (Oct 18 2021 at 17:49):

@Mac I will try to improve the ByteArray primitives.

Leonardo de Moura (Oct 19 2021 at 00:01):

@Mac I pushed new primitives for ByteArray and other goodies.
Now, the main loop of

def computeByteHash (bytes : ByteArray) :=
   bytes.foldl (init := 1723) fun h b => mixHash h (hash b.toNat)

is implemented by the following IR

def ByteArray.foldlMUnsafe.fold._at.computeByteHash._spec_1 (x_1 : @& obj) (x_2 : usize) (x_3 : usize) (x_4 : u64) : u64 :=
  let x_5 : u8 := USize.decEq x_2 x_3;
  case x_5 : u8 of
  Bool.false 
    let x_6 : u8 := ByteArray.uget x_1 x_2 ◾;
    let x_7 : obj := UInt8.toNat x_6;
    let x_8 : u64 := UInt64.ofNat x_7;
    dec x_7;
    let x_9 : u64 := mixHash x_4 x_8;
    let x_10 : usize := 1;
    let x_11 : usize := USize.add x_2 x_10;
    let x_12 : u64 := ByteArray.foldlMUnsafe.fold._at.computeByteHash._spec_1 x_1 x_11 x_3 x_9;
    ret x_12
  Bool.true 
    ret x_4

If the performance is not good enough, I see two options for improving it:
1- We add primitives for reading UInt64 (and UInt32, UInt16) from a ByteArray
2- We implement the hash function for ByteArray in C.

Option 1 is my preferred choice right now.

Mac (Oct 19 2021 at 00:14):

@Leonardo de Moura cool!

The one thing that appears missing to me in that solution is a Hashable instance forUInt8 (and UInt16, UInt31, etc.) so that the byte isn't converted to/from a Nat for hashing.

Otherwise, the performance is fine and, as this hashing approach is just a stop-gap solution until we have a proper cryptographic hash function for Lean, it may not be worth it to spend much time on this (though I imagine these improvements will also be useful elsewhere).

Leonardo de Moura (Oct 19 2021 at 00:15):

@Mac I will add the missing instances.

Leonardo de Moura (Oct 19 2021 at 00:22):

@Mac Pushed it.

Leonardo de Moura (Oct 19 2021 at 00:22):

def ByteArray.foldlMUnsafe.fold._at.computeByteHash._spec_1 (x_1 : @& obj) (x_2 : usize) (x_3 : usize) (x_4 : u64) : u64 :=
  let x_5 : u8 := USize.decEq x_2 x_3;
  case x_5 : u8 of
  Bool.false 
    let x_6 : u8 := ByteArray.uget x_1 x_2 ◾;
    let x_7 : u64 := UInt8.toUInt64 x_6;
    let x_8 : u64 := mixHash x_4 x_7;
    let x_9 : usize := 1;
    let x_10 : usize := USize.add x_2 x_9;
    let x_11 : u64 := ByteArray.foldlMUnsafe.fold._at.computeByteHash._spec_1 x_1 x_10 x_3 x_8;
    ret x_11
  Bool.true 
    ret x_4

Mac (Oct 19 2021 at 00:26):

@Leonardo de Moura cool, thanks!


Last updated: Dec 20 2023 at 11:08 UTC