Zulip Chat Archive

Stream: mathlib4

Topic: Q. about ByteSlice.getOp


Nicolas Rouquette (Jan 30 2022 at 21:24):

Given the definition of ByteSlice as:

structure ByteSlice := (arr : ByteArray) (off len : Nat)

I am surprised that the function below does not check for the possibility that idx >= self.len.
Shouldn't there be an error of some kind in that case?

/-- Index into a byte slice. The `getOp` function allows the use of the `buf[i]` notation. -/
@[inline] def getOp (self : ByteSlice) (idx : Nat) : UInt8 := self.arr.get! (self.off + idx)
  • Nicolas.

Marcus Rossel (Jan 31 2022 at 10:07):

The get! function does that check and calls panic! if the index is out of bounds.

Wojciech Nawrocki (Jan 31 2022 at 14:18):

get! will succeed if the slice is in the middle of the array and the access is OOB for the slice but within bounds of the array, no? So this looks like a bug to me.

Marcus Rossel (Jan 31 2022 at 14:50):

(deleted)

Gabriel Ebner (Jan 31 2022 at 16:00):

I wouldn't call it a bug in getOp. The result is just unspecified if you access the array outside of its bounds, and here it happens to return a value other than default.

Gabriel Ebner (Jan 31 2022 at 16:03):

Other functions don't panic either if you pass them out-of-bounds indices; e.g. modify will silently do nothing.

Gabriel Ebner (Jan 31 2022 at 16:03):

A very unfortunate (and related) problem is that you can't prove panic-freeness of functions using these basic primitives in Lean 4 though.

Nicolas Rouquette (Feb 04 2022 at 22:39):

Allowing an out-of-bound access seems to me an open door for a vulnerability that could be potentially serious for an application.
If this vulnerability is intrinsically tied to our inability to prove panic-freeness, then is it really reasonable to use panic in mathlib4?

Panic in lean corresponds to throwing an exception in Java/Scala. In Scala, functional programming guidelines suggest using an appropriate type to encapsulate the distinction between a value and an error; see: https://docs.scala-lang.org/scala3/book/fp-functional-error-handling.html

So, in the case of ByteSlice, instead of the panic-vulnerable interface:

 def getOp (self : ByteSlice) (idx : Nat) : UInt8

We would have instead a monad like Either and a type of Error so that we could define a panic-free interface like this:

 def getOp (self : ByteSlice) (idx : Nat) : Either Error UInt8

Then presumably we could write a theorem that says that if the index is within bound, the result must be a Right, otherwise it must be a Left.

Mario Carneiro (Feb 05 2022 at 00:12):

You shouldn't use getOp if you need error checking

Mario Carneiro (Feb 05 2022 at 00:13):

there are other functions like get? for that

Mario Carneiro (Feb 05 2022 at 00:14):

I don't see the sense in which this could cause a vulnerability, except in the general sense that an out of bounds read is probably a bug and if you aren't using the dependently typed get then you are making assertions that could be false

Nicolas Rouquette (Feb 05 2022 at 22:53):

Mario Carneiro said:

You shouldn't use getOp if you need error checking

How so?

Looking at the definition, it is unclear to me what error checking takes place and where.

/-- Index into a byte slice. The `getOp` function allows the use of the `buf[i]` notation. -/
@[inline] def getOp (self : ByteSlice) (idx : Nat) : UInt8 := self.arr.get! (self.off + idx)

If self.off + idx > self.arr.size, does self.arr.get! ... raise an error?
If idx > self.len, what catches that error?

Looking at lean4's ByteArray, I would have expected a dependently typed index argument, something like this:

def getOp: (self : @& ByteSlice)  (@& Fin self.len)  UInt8
  | self⟩, idx => self.arr.get! (self.off + idx)

Isn't it preferable to use the type system as a mechanism to prevent errors?

Mario Carneiro (Feb 06 2022 at 05:59):

Nicolas Rouquette said:

Mario Carneiro said:

You shouldn't use getOp if you need error checking

How so?

Looking at the definition, it is unclear to me what error checking takes place and where.

/-- Index into a byte slice. The `getOp` function allows the use of the `buf[i]` notation. -/
@[inline] def getOp (self : ByteSlice) (idx : Nat) : UInt8 := self.arr.get! (self.off + idx)

If self.off + idx > self.arr.size, does self.arr.get! ... raise an error?
If idx > self.len, what catches that error?

The error handling behavior is inherited from get! here. If you use Array.get! with an out of bounds index, the program will not halt, you will get default : A back, and an error message will be printed on stderr (which is usually reserved for low level errors, and gives users an uncomfortable experience, so it should be avoided when possible). In other words, this is an assert statement which you should only use if you are sure it will not occur in your code. However note that memory safety is not compromised here, and a bounds check is still performed.

Nicolas Rouquette said:

Looking at lean4's ByteArray, I would have expected a dependently typed index argument, something like this:

def getOp: (self : @& ByteSlice)  (@& Fin self.len)  UInt8
  | self⟩, idx => self.arr.get! (self.off + idx)

Isn't it preferable to use the type system as a mechanism to prevent errors?

It's certainly possible to define such a thing yourself. But self.len is not the right bound here, as you would see if you deferred to Array.get instead:

def getOp: (self : @& ByteSlice)  (@& Fin self.len)  UInt8
  | self⟩, idx => self.arr.get self.off + idx, _

At the _ you will see that it expects a proof that self.off + idx < self.arr.size, and your assumption idx < self.len does not help to prove this.

Mario Carneiro (Feb 06 2022 at 06:04):

I think it is reasonable to add more proofs to these data structures, to assert that off + len <= arr.size in ByteSlice and possibly also off <= arr.size in ByteSliceT. If we did so, then it would make more sense to provide an "unchecked" dependently typed get function. Currently a dependently typed get function would be useless because knowing that it is in bounds of the slice doesn't mean it is in bounds of the underlying array

Mario Carneiro (Feb 06 2022 at 06:08):

However, I don't think that getOp (the one that gets the a[i] syntax) should be dependently typed. Following the lead of lean 4 core, this should just take a nat and it should have unspecified out of bounds behavior (it shouldn't do any checks it does not need to for memory safety)

Nicolas Rouquette (Feb 06 2022 at 20:46):

Mario Carneiro said:

I think it is reasonable to add more proofs to these data structures, to assert that off + len <= arr.size in ByteSlice and possibly also off <= arr.size in ByteSliceT. If we did so, then it would make more sense to provide an "unchecked" dependently typed get function. Currently a dependently typed get function would be useless because knowing that it is in bounds of the slice doesn't mean it is in bounds of the underlying array

Thanks for the explanation! While I understand in principle the suggestion, it is still unclear to me how to do this.

So, starting from the definition without assertions:

structure ByteSliceT := (arr : ByteArray) (off : Nat)

Is it correct that to add support for these assertions, we have to effectively turn them into proof obligations?

I tried this:

structure ByteSliceT where
  arr : ByteArray
  off : Nat
  isLt : Nat.lt off arr.size

def a: ByteArray := (((ByteArray.empty.push 10).push 20).push 30).push 40
#eval a -- [10, 20, 30, 40]

def b1 : ByteSliceT := ByteSliceT.mk a 1 (Nat.lt 1 a.size)

application type mismatch
  { arr := a, off := 1, isLt := Nat.lt 1 (ByteArray.size a) }
argument
  Nat.lt 1 (ByteArray.size a)
has type
  Prop : Type
but is expected to have type
  Nat.lt 1 (ByteArray.size a) : Prop

It seems that the structure is close what what would be needed since it would reject this:

def b3 : ByteSliceT := ByteSliceT.mk a 20 (Nat.lt 1 a.size)

application type mismatch
  { arr := a, off := 20, isLt := Nat.lt 1 (ByteArray.size a) }
argument
  Nat.lt 1 (ByteArray.size a)
has type
  Prop : Type
but is expected to have type
  Nat.lt 20 (ByteArray.size a) : Prop

There is something that I am missing but I am not sure what it is.

Mario Carneiro (Feb 06 2022 at 22:40):

first, you want the requirement to be off <= arr.size not off < arr.size. An empty byte slice at the end of the array should be okay

Mario Carneiro (Feb 06 2022 at 22:42):

The issue with b1 is that you gave it 1 < a.size instead of a proof of 1 < a.size. In this case, it reduces to 1 < 4 so (by decide) should work as a proof

Mario Carneiro (Feb 06 2022 at 22:44):

The structure is otherwise correct. Most likely you would also want a "smart constructor" mkByteSliceT that does not take a proof, and instead sets the offset to min off a.size which can be proven to always be in bounds. That way users can use mkByteSliceT when they don't want to do the side proof, at the cost of doing a bounds check in the constructor

Nicolas Rouquette (Feb 07 2022 at 00:59):

Makes sense, though I am still struggling with how to write this properly:

def mkByteSliceT (arr : ByteArray) (off : Nat) :=
  if off <= arr.size then
    (
      have h : Nat.lt off arr.size
      | ByteSliceT.mk {arr:=arr, off:=off, isLT:=h}
    )
  else
    panic! s"Error: offset, {off}, exceeds array size, {arr.size}"

Although I've read the doc several times, it is still unclear to me how to combine these aspects of computation and proof together.

Nicolas Rouquette (Feb 10 2022 at 04:17):

In case it is not obvious, I am stuck at how to do invoke ByteSliteT.mk from mkByteSliceT.
That is, how can I convert a confirmation that off <= arr.size into a proof suitable for calling ByteSliteT.mk where ByteSliceT is:

structure ByteSliceT where
  arr : ByteArray
  off : Nat
  isLt : Nat.lt off arr.size

Mario Carneiro (Feb 10 2022 at 04:30):

again, you don't want isLt; it should be isLe

Mario Carneiro (Feb 10 2022 at 04:30):

structure ByteSliceT where
  arr : ByteArray
  off : Nat
  isLe : off <= arr.size

Mario Carneiro (Feb 10 2022 at 04:37):

structure ByteSliceT where
  arr : ByteArray
  off : Nat
  isLe : off  arr.size

def mkByteSliceT (arr : ByteArray) (off : Nat) : ByteSliceT :=
  arr, min off arr.size, min_le_right _ _

Mario Carneiro (Feb 10 2022 at 04:39):

I don't think we want to panic on out of bounds input here. It is reasonable to want to create a suffix starting after the end of the array, and this simply doesn't span any elements

Nicolas Rouquette (Feb 11 2022 at 05:29):

Thanks for the clarification.

Reasonable behavior can be subjective unless there is a way to describe what guarantees we can provide.

Considering the signature of the function:

def mkByteSliceT (arr : ByteArray) (off : Nat) : ByteSliceT

In conventional programming, there is the notion of pre/post condition to specify some properties about a function.
It seems to me that with Lean, we do not need pre/post conditions per se, rather, it's the implementation of the function
that, if written in a logical way, enables lean to "reason" about its properties as you've done for the constructor's 3rd argument:

def mkByteT (arr : ByteArray) (off : Nat) : ByteSliceT :=
  arr, min off arr.size, min_le_right _ _

Suppose we wanted another constructor whose contract is that it either succeeds with the offset as requested or fails if the offset is too large.
I agree that it is probably undesirable to use panic for this.
In Scala, we would instead have a return type like this: Either[String, ByteStringT]

Are there recommendations for how to do this in lean4?

-- what is the result type?
-- what is the implementation?
def mkByteSliceAsRequested (arr : ByteArray) (off : Nat) : ??? :=
  ???

Mario Carneiro (Feb 11 2022 at 05:31):

We don't need a function with a precondition, because the constructor already serves that purpose

Mario Carneiro (Feb 11 2022 at 05:31):

a version that panics might be useful but it would still have to return a valid result, so it would probably just be a guarded version of mkByteSliceT

Mario Carneiro (Feb 11 2022 at 05:33):

because unless you return an option or other enriched return type, even if you "fail" you are still required to not violate the type system because consistency of the logic depends on it


Last updated: Dec 20 2023 at 11:08 UTC