Zulip Chat Archive
Stream: mathlib4
Topic: Q. about ByteSlice.getOp
Nicolas Rouquette (Jan 30 2022 at 21:24):
Given the definition of ByteSlice
as:
structure ByteSlice := (arr : ByteArray) (off len : Nat)
I am surprised that the function below does not check for the possibility that idx >= self.len
.
Shouldn't there be an error of some kind in that case?
/-- Index into a byte slice. The `getOp` function allows the use of the `buf[i]` notation. -/
@[inline] def getOp (self : ByteSlice) (idx : Nat) : UInt8 := self.arr.get! (self.off + idx)
- Nicolas.
Marcus Rossel (Jan 31 2022 at 10:07):
The get!
function does that check and calls panic!
if the index is out of bounds.
Wojciech Nawrocki (Jan 31 2022 at 14:18):
get!
will succeed if the slice is in the middle of the array and the access is OOB for the slice but within bounds of the array, no? So this looks like a bug to me.
Marcus Rossel (Jan 31 2022 at 14:50):
(deleted)
Gabriel Ebner (Jan 31 2022 at 16:00):
I wouldn't call it a bug in getOp
. The result is just unspecified if you access the array outside of its bounds, and here it happens to return a value other than default
.
Gabriel Ebner (Jan 31 2022 at 16:03):
Other functions don't panic either if you pass them out-of-bounds indices; e.g. modify
will silently do nothing.
Gabriel Ebner (Jan 31 2022 at 16:03):
A very unfortunate (and related) problem is that you can't prove panic-freeness of functions using these basic primitives in Lean 4 though.
Nicolas Rouquette (Feb 04 2022 at 22:39):
Allowing an out-of-bound access seems to me an open door for a vulnerability that could be potentially serious for an application.
If this vulnerability is intrinsically tied to our inability to prove panic-freeness, then is it really reasonable to use panic in mathlib4?
Panic in lean corresponds to throwing an exception in Java/Scala. In Scala, functional programming guidelines suggest using an appropriate type to encapsulate the distinction between a value and an error; see: https://docs.scala-lang.org/scala3/book/fp-functional-error-handling.html
So, in the case of ByteSlice
, instead of the panic-vulnerable interface:
def getOp (self : ByteSlice) (idx : Nat) : UInt8
We would have instead a monad like Either
and a type of Error
so that we could define a panic-free interface like this:
def getOp (self : ByteSlice) (idx : Nat) : Either Error UInt8
Then presumably we could write a theorem that says that if the index is within bound, the result must be a Right
, otherwise it must be a Left
.
Mario Carneiro (Feb 05 2022 at 00:12):
You shouldn't use getOp
if you need error checking
Mario Carneiro (Feb 05 2022 at 00:13):
there are other functions like get?
for that
Mario Carneiro (Feb 05 2022 at 00:14):
I don't see the sense in which this could cause a vulnerability, except in the general sense that an out of bounds read is probably a bug and if you aren't using the dependently typed get
then you are making assertions that could be false
Nicolas Rouquette (Feb 05 2022 at 22:53):
Mario Carneiro said:
You shouldn't use
getOp
if you need error checking
How so?
Looking at the definition, it is unclear to me what error checking takes place and where.
/-- Index into a byte slice. The `getOp` function allows the use of the `buf[i]` notation. -/
@[inline] def getOp (self : ByteSlice) (idx : Nat) : UInt8 := self.arr.get! (self.off + idx)
If self.off + idx > self.arr.size
, does self.arr.get! ...
raise an error?
If idx > self.len
, what catches that error?
Looking at lean4's ByteArray
, I would have expected a dependently typed index argument, something like this:
def getOp: (self : @& ByteSlice) → (@& Fin self.len) → UInt8
| ⟨self⟩, idx => self.arr.get! (self.off + idx)
Isn't it preferable to use the type system as a mechanism to prevent errors?
Mario Carneiro (Feb 06 2022 at 05:59):
Nicolas Rouquette said:
Mario Carneiro said:
You shouldn't use
getOp
if you need error checkingHow so?
Looking at the definition, it is unclear to me what error checking takes place and where.
/-- Index into a byte slice. The `getOp` function allows the use of the `buf[i]` notation. -/ @[inline] def getOp (self : ByteSlice) (idx : Nat) : UInt8 := self.arr.get! (self.off + idx)
If
self.off + idx > self.arr.size
, doesself.arr.get! ...
raise an error?
Ifidx > self.len
, what catches that error?
The error handling behavior is inherited from get!
here. If you use Array.get!
with an out of bounds index, the program will not halt, you will get default : A
back, and an error message will be printed on stderr (which is usually reserved for low level errors, and gives users an uncomfortable experience, so it should be avoided when possible). In other words, this is an assert
statement which you should only use if you are sure it will not occur in your code. However note that memory safety is not compromised here, and a bounds check is still performed.
Nicolas Rouquette said:
Looking at lean4's
ByteArray
, I would have expected a dependently typed index argument, something like this:def getOp: (self : @& ByteSlice) → (@& Fin self.len) → UInt8 | ⟨self⟩, idx => self.arr.get! (self.off + idx)
Isn't it preferable to use the type system as a mechanism to prevent errors?
It's certainly possible to define such a thing yourself. But self.len
is not the right bound here, as you would see if you deferred to Array.get
instead:
def getOp: (self : @& ByteSlice) → (@& Fin self.len) → UInt8
| ⟨self⟩, idx => self.arr.get ⟨self.off + idx, _⟩
At the _
you will see that it expects a proof that self.off + idx < self.arr.size
, and your assumption idx < self.len
does not help to prove this.
Mario Carneiro (Feb 06 2022 at 06:04):
I think it is reasonable to add more proofs to these data structures, to assert that off + len <= arr.size
in ByteSlice
and possibly also off <= arr.size
in ByteSliceT
. If we did so, then it would make more sense to provide an "unchecked" dependently typed get
function. Currently a dependently typed get function would be useless because knowing that it is in bounds of the slice doesn't mean it is in bounds of the underlying array
Mario Carneiro (Feb 06 2022 at 06:08):
However, I don't think that getOp
(the one that gets the a[i]
syntax) should be dependently typed. Following the lead of lean 4 core, this should just take a nat and it should have unspecified out of bounds behavior (it shouldn't do any checks it does not need to for memory safety)
Nicolas Rouquette (Feb 06 2022 at 20:46):
Mario Carneiro said:
I think it is reasonable to add more proofs to these data structures, to assert that
off + len <= arr.size
inByteSlice
and possibly alsooff <= arr.size
inByteSliceT
. If we did so, then it would make more sense to provide an "unchecked" dependently typedget
function. Currently a dependently typed get function would be useless because knowing that it is in bounds of the slice doesn't mean it is in bounds of the underlying array
Thanks for the explanation! While I understand in principle the suggestion, it is still unclear to me how to do this.
So, starting from the definition without assertions:
structure ByteSliceT := (arr : ByteArray) (off : Nat)
Is it correct that to add support for these assertions, we have to effectively turn them into proof obligations?
I tried this:
structure ByteSliceT where
arr : ByteArray
off : Nat
isLt : Nat.lt off arr.size
def a: ByteArray := (((ByteArray.empty.push 10).push 20).push 30).push 40
#eval a -- [10, 20, 30, 40]
def b1 : ByteSliceT := ByteSliceT.mk a 1 (Nat.lt 1 a.size)
application type mismatch
{ arr := a, off := 1, isLt := Nat.lt 1 (ByteArray.size a) }
argument
Nat.lt 1 (ByteArray.size a)
has type
Prop : Type
but is expected to have type
Nat.lt 1 (ByteArray.size a) : Prop
It seems that the structure is close what what would be needed since it would reject this:
def b3 : ByteSliceT := ByteSliceT.mk a 20 (Nat.lt 1 a.size)
application type mismatch
{ arr := a, off := 20, isLt := Nat.lt 1 (ByteArray.size a) }
argument
Nat.lt 1 (ByteArray.size a)
has type
Prop : Type
but is expected to have type
Nat.lt 20 (ByteArray.size a) : Prop
There is something that I am missing but I am not sure what it is.
Mario Carneiro (Feb 06 2022 at 22:40):
first, you want the requirement to be off <= arr.size
not off < arr.size
. An empty byte slice at the end of the array should be okay
Mario Carneiro (Feb 06 2022 at 22:42):
The issue with b1
is that you gave it 1 < a.size
instead of a proof of 1 < a.size
. In this case, it reduces to 1 < 4
so (by decide)
should work as a proof
Mario Carneiro (Feb 06 2022 at 22:44):
The structure is otherwise correct. Most likely you would also want a "smart constructor" mkByteSliceT
that does not take a proof, and instead sets the offset to min off a.size
which can be proven to always be in bounds. That way users can use mkByteSliceT
when they don't want to do the side proof, at the cost of doing a bounds check in the constructor
Nicolas Rouquette (Feb 07 2022 at 00:59):
Makes sense, though I am still struggling with how to write this properly:
def mkByteSliceT (arr : ByteArray) (off : Nat) :=
if off <= arr.size then
(
have h : Nat.lt off arr.size
| ByteSliceT.mk {arr:=arr, off:=off, isLT:=h}
)
else
panic! s"Error: offset, {off}, exceeds array size, {arr.size}"
Although I've read the doc several times, it is still unclear to me how to combine these aspects of computation and proof together.
Nicolas Rouquette (Feb 10 2022 at 04:17):
In case it is not obvious, I am stuck at how to do invoke ByteSliteT.mk
from mkByteSliceT
.
That is, how can I convert a confirmation that off <= arr.size
into a proof suitable for calling ByteSliteT.mk
where ByteSliceT
is:
structure ByteSliceT where
arr : ByteArray
off : Nat
isLt : Nat.lt off arr.size
Mario Carneiro (Feb 10 2022 at 04:30):
again, you don't want isLt
; it should be isLe
Mario Carneiro (Feb 10 2022 at 04:30):
structure ByteSliceT where
arr : ByteArray
off : Nat
isLe : off <= arr.size
Mario Carneiro (Feb 10 2022 at 04:37):
structure ByteSliceT where
arr : ByteArray
off : Nat
isLe : off ≤ arr.size
def mkByteSliceT (arr : ByteArray) (off : Nat) : ByteSliceT :=
⟨arr, min off arr.size, min_le_right _ _⟩
Mario Carneiro (Feb 10 2022 at 04:39):
I don't think we want to panic on out of bounds input here. It is reasonable to want to create a suffix starting after the end of the array, and this simply doesn't span any elements
Nicolas Rouquette (Feb 11 2022 at 05:29):
Thanks for the clarification.
Reasonable behavior can be subjective unless there is a way to describe what guarantees we can provide.
Considering the signature of the function:
def mkByteSliceT (arr : ByteArray) (off : Nat) : ByteSliceT
In conventional programming, there is the notion of pre/post condition to specify some properties about a function.
It seems to me that with Lean, we do not need pre/post conditions per se, rather, it's the implementation of the function
that, if written in a logical way, enables lean to "reason" about its properties as you've done for the constructor's 3rd argument:
def mkByteT (arr : ByteArray) (off : Nat) : ByteSliceT :=
⟨arr, min off arr.size, min_le_right _ _⟩
Suppose we wanted another constructor whose contract is that it either succeeds with the offset as requested or fails if the offset is too large.
I agree that it is probably undesirable to use panic
for this.
In Scala, we would instead have a return type like this: Either[String, ByteStringT]
Are there recommendations for how to do this in lean4?
-- what is the result type?
-- what is the implementation?
def mkByteSliceAsRequested (arr : ByteArray) (off : Nat) : ??? :=
???
Mario Carneiro (Feb 11 2022 at 05:31):
We don't need a function with a precondition, because the constructor already serves that purpose
Mario Carneiro (Feb 11 2022 at 05:31):
a version that panics might be useful but it would still have to return a valid result, so it would probably just be a guarded version of mkByteSliceT
Mario Carneiro (Feb 11 2022 at 05:33):
because unless you return an option or other enriched return type, even if you "fail" you are still required to not violate the type system because consistency of the logic depends on it
Last updated: Dec 20 2023 at 11:08 UTC