Zulip Chat Archive

Stream: lean4

Topic: dependent array output size


Anders Christiansen Sørby (Oct 12 2021 at 14:16):

I'm trying to add type safety to a hasher output

/- Ideally I want to guarantee the correct output size
 def Blake3Hash : Type := { r : Array UInt8 // r.size = BLAKE3_OUT_LEN }
But for now this seems to work
-/
def Blake3Hash : Type := Array UInt8

def hash (input : ByteArray) : IO Blake3Hash := do
  let hasher  initHasher
  let hasher  hasherUpdate hasher input (USize.ofNat input.size)
  let output  hasherFinalize hasher (USize.ofNat BLAKE3_OUT_LEN)
  return output.data

How do I prove / assert that output data has length BLAKE3_OUT_LEN? In the lean source I see several places that a constrined type like { r : Array UInt8 // r.size = BLAKE3_OUT_LEN } can be filled by a tuple of the value and a tactic.

Something like:
return (output.data, by erw [Array.size]; apply data.property)
But this (unsprisingly) leads to a hole:

Blake3> Blake3.lean:69:2: error: application type mismatch
Blake3>   pure (output.data, ?m.806)
Blake3> argument
Blake3>   (output.data, ?m.806)
Blake3> has type
Blake3>   Array UInt8 × ?m.805 : Type ?u.802
Blake3> but is expected to have type
Blake3>   Blake3Hash : Type
Blake3> Blake3.lean:69:31: error: tactic 'rewrite' failed, equality or iff proof expected
Blake3>   Nat
Blake3> input : ByteArray
Blake3> hasher : Blake3Hasher
Blake3> output : ByteArray
Blake3>  ?m.805

Johan Commelin (Oct 12 2021 at 14:18):

Why does hash have IO _ as its return type? Shouldn't it be a pure function?

Johan Commelin (Oct 12 2021 at 14:20):

I think it will be hard to prove something once you've wrapped yourself in the IO monad.

Gabriel Ebner (Oct 12 2021 at 14:25):

return (output.data, by erw [Array.size]; apply data.property)

You're also using the wrong parentheses: ( ) is for products only, you need ⟨ ⟩ here.

Gabriel Ebner (Oct 12 2021 at 14:27):

def Blake3Hash : Type := Array UInt8

Note that Array UInt8 has a different runtime representation than ByteArray (which is more efficient).

Reid Barton (Oct 12 2021 at 14:33):

I'm guessing the hash function is implemented via FFI. In this case you can do a runtime check that the output size is correct using something like if h : output.size = BLAKE3_OUT_LEN then ... else ... with a panic in the else case.

Reid Barton (Oct 12 2021 at 14:36):

In the then case, you get the "proof" h that you need to build something of type { r : Array UInt8 // r.size = BLAKE3_OUT_LEN }.

Anders Christiansen Sørby (Oct 12 2021 at 14:38):

Thanks. I were hoping it would be possible to do it purely.
It's not easy to tell the difference between ( ) and ⟨ ⟩

Anders Christiansen Sørby (Oct 12 2021 at 14:45):

How do I go from the IO FFI monad stuff to the pure data? This is what I have:

def hash (input : ByteArray) : Blake3Hash :=
  let output := (do
    let hasher  initHasher
    let hasher  hasherUpdate hasher input (USize.ofNat input.size)
    hasherFinalize hasher (USize.ofNat BLAKE3_OUT_LEN)
    )
  if h : output.size = BLAKE3_OUT_LEN
  then output.data
  else panic "Incorrect output size"

Mario Carneiro (Oct 12 2021 at 15:17):

You need to return IO Blake3Hash if you are using IO functions

Mario Carneiro (Oct 12 2021 at 15:18):

Possibly you can cheat and make some non-IO extern functions if you have reason to believe they are pure

Sebastian Ullrich (Oct 12 2021 at 15:21):

You can also cheat even more and assume the FFI already returns the subtype with the proof, though this specific runtime check doesn't exactly look costly

Anders Christiansen Sørby (Oct 12 2021 at 15:34):

How do I do that? These functions are in essence pure, but somewhat stateful on the hasher. They don't do any real IO operations.

Anders Christiansen Sørby (Oct 12 2021 at 15:36):

Can I remove the IO monad from the FFI calls? That lead to errors previously.

Sebastian Ullrich (Oct 12 2021 at 15:38):

IO enforces ordering and exactly-once execution. It looks like this API is using the hasher linearly, so that might not be an issue and you can drop the IO. You need to adjust the called functions of course.

Anders Christiansen Sørby (Oct 13 2021 at 14:42):

Removing IO leads to this:

   >   Inhabited Blake3Hasher
       > Blake3.lean:38:0: error: failed to synthesize instance
       >   Inhabited Blake3Hasher
       > Blake3.lean:41:0: error: failed to synthesize instance
       >   Inhabited Blake3Hasher
       > Blake3.lean:47:0: error: failed to synthesize instance

Anders Christiansen Sørby (Oct 13 2021 at 14:43):

Is this because I'm using a constant? How can I ensure the foreign type is Inhabited?

Sebastian Ullrich (Oct 13 2021 at 14:46):

You can use PointedType for that, see e.g. https://github.com/leanprover/lean4/blob/66fcfcce3716774dacbd35e1ea0f5c75356df311/tests/compiler/foreign/main.lean#L1-L3

Anders Christiansen Sørby (Oct 13 2021 at 15:06):

That helped. Now I get another error:

       > Blake3.lean:71:4: error: type mismatch
       >   output
       > has type
       >   ByteArray : Type
       > but is expected to have type
       >   Blake3Hash : Type
       > Blake3.lean:73:4: error: failed to synthesize instance
       >   Inhabited Blake3Hash

where

def Blake3Hash : Type := { r : Array UInt8 // r.size = BLAKE3_OUT_LEN }

Do I need to declare an Inhabited instance?

Sebastian Ullrich (Oct 13 2021 at 15:10):

Yes. There is no trivial Inhabited instance for Subtype since you need to prove something about the element.

Sebastian Ullrich (Oct 13 2021 at 15:10):

Btw, do you solely compile on the cmdline :) ? Not in an editor?

Anders Christiansen Sørby (Oct 13 2021 at 17:39):

I just use the cmdline for ease.

Anders Christiansen Sørby (Oct 14 2021 at 00:19):

How do I create an Inhabited instance?
I've tried so far to:

def Blake3Hash : Type := { r : ByteArray // r.size = BLAKE3_OUT_LEN }
instance : Inhabited Blake3Hash where
  default := ⟨(List.replicate BLAKE3_OUT_LEN 0).toByteArray

I see that Subtype just needs a theorem proving the property of the ByteArray size. What is the simplest way and what is "the most explicit" way (For better understanding)?

Yakov Pechersky (Oct 14 2021 at 00:39):

The proof will be rfl, or if not, rely on a theorem that the size of a toBytes of a replicate is what you expect.

Scott Morrison (Oct 14 2021 at 00:42):

Presumably you want to fill in the sorry in

constant BLAKE3_OUT_LEN : Nat

@[simp] theorem List.to_ByteArray_size : (L : List UInt8)  L.toByteArray.size = L.length
| [] => rfl
| a::l => sorry

def Blake3Hash : Type := { r : ByteArray // r.size = BLAKE3_OUT_LEN }
instance : Inhabited Blake3Hash where
  default := ⟨(List.replicate BLAKE3_OUT_LEN 0).toByteArray, by simp

Scott Morrison (Oct 14 2021 at 01:00):

The API for ByteArray is missing essentially everything for proving things about it, as it hasn't been needed much yet. But the missing pieces are:

@[simp] theorem ByteArray.size_empty : ByteArray.empty.size = 0 :=
rfl

@[simp] theorem ByteArray.size_push (B : ByteArray) (a : UInt8) : (B.push a).size = B.size + 1 :=
by { cases B; simp only [ByteArray.push, ByteArray.size, Array.size_push] }

@[simp] theorem List.to_ByteArray_size : (L : List UInt8)  L.toByteArray.size = L.length
| [] => rfl
| a::l => by simp [List.toByteArray, to_ByteArray_loop_size]
where to_ByteArray_loop_size :
  (L : List UInt8)  (B : ByteArray)  (List.toByteArray.loop L B).size = L.length + B.size
| [], B => by simp [List.toByteArray.loop]
| a::l, B => by
    simp [List.toByteArray.loop, to_ByteArray_loop_size]
    rw [Nat.add_succ, Nat.succ_add]

constant BLAKE3_OUT_LEN : Nat

def Blake3Hash : Type := { r : ByteArray // r.size = BLAKE3_OUT_LEN }
instance : Inhabited Blake3Hash where
  default := ⟨(List.replicate BLAKE3_OUT_LEN 0).toByteArray, by simp

Scott Morrison (Oct 14 2021 at 01:00):

(The proof of any obvious fact that conceivably could be proved by simp should be proved by simp, and if simp doesn't do it, you should add the missing @[simp] lemmas! :-)

Mario Carneiro (Oct 14 2021 at 01:30):

I would guess that (List.replicate BLAKE3_OUT_LEN 0).toByteArray is not a good way to initialize an array. Is there an Array.mk?

Mario Carneiro (Oct 14 2021 at 01:36):

Unfortunately lean_alloc_sarray is not exposed to lean

Mario Carneiro (Oct 14 2021 at 01:44):

The best function I can write with the current API is

def ByteArray.fromElem (a : UInt8) (n : Nat) : ByteArray :=
  let rec go : Nat  ByteArray  ByteArray
  | 0, arr => arr
  | n+1, arr => arr.push a
  go n empty

Mac (Oct 14 2021 at 03:25):

@Mario Carneiro there is Array.mkEmpty if you want to pre-allocate an array of a given size.

Mario Carneiro (Oct 14 2021 at 03:26):

That's not a scalar array though; you have to reallocate and copy everything over to turn it into a ByteArray

Sebastian Ullrich (Oct 14 2021 at 06:02):

ByteArray.mkEmpty then :) ?

Mario Carneiro (Oct 14 2021 at 06:02):

mkWithCapacity would also be nice

Sebastian Ullrich (Oct 14 2021 at 06:08):

I'm not sure there are good use cases for ByteArray.mk in general. As you said, one should stick with scalar arrays from the beginning instead of converting to and from boxed arrays.

Mario Carneiro (Oct 14 2021 at 07:09):

I think ByteArray.mk is fine, like toByteArray; it's a conversion function in case you have an Array UInt8 and need a ByteArray

Mario Carneiro (Oct 14 2021 at 07:09):

it's just not the first function you should reach for

Anders Christiansen Sørby (Oct 14 2021 at 11:44):

How do I use the hypothesis h in the then branch to ensure that this type checks?

def hash (input : ByteArray) : Blake3Hash :=
  let hasher := initHasher
  let hasher := hasherUpdate hasher input (USize.ofNat input.size)
  let output := hasherFinalize hasher (USize.ofNat BLAKE3_OUT_LEN)
  if h : output.size = BLAKE3_OUT_LEN then
    output
  else
    panic "Incorrect output size"

Sebastian Ullrich (Oct 14 2021 at 12:23):

If you e.g. use a _ placeholder or open a tactic block in the then branch, you should see that h is in the context with the expected type. I highly recommend using an editor extension to incrementally and interactively develop Lean code :) .


Last updated: Dec 20 2023 at 11:08 UTC