Zulip Chat Archive

Stream: lean4

Topic: Isn't Fin a bad type for safe array lookup?


Evgeniy Kuznetsov (Dec 17 2021 at 12:49):

Hello everyone)

As from a dependently typed general-purpose language, I expect from Lean4 ability for writing programs with provable correctness, but the implicit behavior of Fin could break safe array access. In some cases Lean4 loses in safety even compared with C++ (the latter has at least warnings and runtime checks) and produce total code with undesired behavior:

import Mathlib

def n: Nat := 2

-- silent truncation of a constant value
def val_4: Fin (n+1) := 4
#eval val_4 -- output: 0

-- silent overflow
def val_1: Fin n := 1, by simp
def overflowed_value: Fin n := val_1 + val_1
#eval overflowed_value -- output: 0

-- silent underflow
def val_0: Fin n := 0, by simp
def underflowed_value : Fin n := val_0 - val_1
#eval underflowed_value -- output: 1

-- Array becomes a circular buffer
def msgs: Array String := #[
  "It's not even a runtime error!",
  "Lean is a functional programming language that makes it easy to write correct and maintainable code"
]
#eval msgs.get (val_1 + val_1) -- output: "It's not even a runtime error!"
#eval msgs.get (val_0 - val_1) -- output: "Lean is a functional programming language that makes it easy to write correct and maintainable code"

section -- with Mathlib4 imported

  -- silent downcast
  def val_n_plus_1: Fin (n+2) := n+1, by simp
  def downcasted_val_n_plus_1: Fin (n+1) := val_n_plus_1
  #eval downcasted_val_n_plus_1 -- output: 0

  -- explicit negative index
  #eval msgs.get (-val_1)  -- output: "Lean is a functional programming language that makes it easy to write correct and maintainable code"
end
// MSVC with EnableAllWarnings (/Wall) and Debug configuration
#include <iostream>
#include <string>
#include <array>

int main()
{
    uint8_t val_256 = 256;
    std::cout << "val_256 = " << +val_256 << "\n"; // warning C4305: 'initializing': truncation from 'int' to 'uint8_t'
                                                   // warning C4309: 'initializing': truncation of constant value
                                                   // output: val_256 = 0
    uint8_t val_255 = 255, val_1 = 1;
    uint8_t overflowed_value = val_255 + val_1;
    std::cout << "(val_255 + val_1) = " << +overflowed_value << "\n"; // output: (val_255 + val_1) = 0

    uint8_t val_0 = 0;
    uint8_t underflowed_value = val_0 - val_1;
    std::cout << "(val_0 - val_1) = " << +underflowed_value << "\n"; // output: (val_0 - val_1) = 255

    std::array<std::string, 2> msgs = {
        "It's not even a runtime error!",
        "Lean is a functional programming language that makes it easy to write correct and maintainable code"
    };

    std::cout << "msgs[val_1 + val_1] = " << msgs[val_1 + val_1] << "\n"; // warning C4365: 'argument': conversion from 'int' to 'std::array<std::string,2>::size_type', signed/unsigned mismatch
                                                                          // runtime exception: 'array subscript out of range'
    std::cout << "msgs[val_0 - val_1] = " << msgs[val_0 - val_1] << "\n"; // warning C4365: 'argument': conversion from 'int' to 'std::array<std::string,2>::size_type', signed/unsigned mismatch
                                                                          // runtime exception: 'array subscript out of range'

    uint16_t val_257 = 257;
    uint8_t downcasted_val_257 = val_257;
    std::cout << "downcasted_val_257 = " << +downcasted_val_257 << "\n"; // warning C4244: 'initializing': conversion from 'uint16_t' to 'uint8_t', possible loss of data
                                                                         // output: downcasted_val_257 = 1

    std::cout << "msgs[- val_1] = " << msgs[- val_1] << "\n"; // warning C4365: 'argument': conversion from 'int' to 'std::array<std::string,2>::size_type', signed/unsigned mismatch
                                                              // runtime exception: 'array subscript out of range'
}

I wouldn't imagine someones expecting their indexes silently overflowing/underflowing/truncating modulo size of a container, except for circular buffers, and almost all arrays aren't circular buffers.

Wouldn't that be better to have another one bounded Nat type without modular arithmetic?

Kevin Buzzard (Dec 17 2021 at 12:57):

What structure fin should have on it is a contentious topic. Like you I am not a fan of all this extra stuff -- in Lean 3 + mathlib fin n has an addition (which wraps) and it wouldn't surprise me if someone put a multiplication on it too. In mathematics there is a structure Z/nZ\mathbb{Z}/n\mathbb{Z} which is a ring, so has wrapping addition and multiplication and all the behaviour being exhibited by Fin in your snippet.

Evgeniy Kuznetsov (Dec 17 2021 at 13:13):

I'm not claiming that Fin is broken in some sense, just that it may doesn't fit the role of array index type.

Tomas Skrivan (Dec 17 2021 at 13:17):

Well it depends what do you mean by safe array access. It prevents you from out of bound memory access but it does not prevent you from accidentally accessing the incorrect element. To prevent this issue you can abandon Fin and use Int64 together with Array.uget to access elements.

I agree that this behavior of Fin might be confusing if you do not think about it as Z/nZ as Kevin described. In my code, I actually use this structure all the time and I'm quite happy with it.

I think that the main question is, what should be the default interface for Array. For example the notation a[i] uses Array.get! where i is Nat and out of bound access returns default value of what ever the array is holding. You get another set of unexpected behaviour then using Fin and Array.get

Tomas Skrivan (Dec 17 2021 at 13:24):

Ohh, I didn't read the question in title properly. What would you suggest to use instead of Fin?

I see two options

  1. Ditch algebraic structure on Fin.
  2. Use Nat or Int and always provide proof of valid index.

I find both options clunky to use and rather accept this sometimes unintuitive behavior of Fin.

Tomas Skrivan (Dec 17 2021 at 13:27):

And honestly, the only unexpected case is the val_4 evaluating to 0 instead of 1, as 4=1 mod 3.

Evgeniy Kuznetsov (Dec 17 2021 at 13:28):

my mistake) it's really 1

Tomas Skrivan (Dec 17 2021 at 13:30):

Uff, I was almost about to report a bug :)

Henrik Böving (Dec 17 2021 at 13:31):

I've come across this before and even tried to come up with a Fin that would not allow this behavior for constants, however there is an issue I stumbled upon while doing this. In Lean 4 the number notation is controlled by a type class, if you want to allow a number literal n to be converted to your type T you have to implement OfNat T n. You can of course also universally quantify over the n here to allow any literal to be converted as is for example done for the natural numbers:

@[defaultInstance 100] /- low prio -/
instance (n : Nat) : OfNat Nat n where
  ofNat := n

And similarly also for Fin, and I tried to come up with an OfNat instance that would fail type class resolution if this happens like this:

instance (n i : Nat) (h : n + 1 < i := by decide) : OfNat (NFin (no_index (n+1))) i where
  ofNat := NFin.mk i h

#check (12 : NFin 13)

but this does not work because if we check the definition of autoParam (which is the name of the feature used to generate h) we find that

/--
  Gadget for automatic parameter support. This is similar to the `optParam` gadget, but it uses
  the given tactic.
  Like `optParam`, this gadget only affects elaboration.
  For example, the tactic will *not* be invoked during type class resolution. -/

and this is where I didn't know how to proceed anymore.

Evgeniy Kuznetsov (Dec 17 2021 at 13:32):

Tomas Skrivan said:

Ohh, I didn't read the question in title properly. What would you suggest to use instead of Fin?

I see two options

  1. Ditch algebraic structure on Fin.
  2. Use Nat or Int and always provide proof of valid index.

I find both options clunky to use and rather accept this sometimes unintuitive behavior of Fin.

I think bounded Nat type with safe arithmetic operations and sugar and without implicit truncation would be enough.

Tomas Skrivan (Dec 17 2021 at 13:34):

What is 'safe arithmetics+without implicit truncation'?

Yakov Pechersky (Dec 17 2021 at 13:34):

You have saturating addition, which gives you a valid monoid.

Yakov Pechersky (Dec 17 2021 at 13:36):

And multiplication still makes sense, in that it's iterated addition. So distributivity still works and you get a semiring.

Tomas Skrivan (Dec 17 2021 at 13:36):

Cool, I wasn't aware of saturation arithmetics. That looks like a very reasonable path to take.

Yakov Pechersky (Dec 17 2021 at 13:37):

But it shouldn't be called Fin. Because that's a choice that's been argued over many times, and it's probably easier to just have a new type.

Evgeniy Kuznetsov (Dec 17 2021 at 13:40):

Tomas Skrivan said:

What is 'safe arithmetics+without implicit truncation'?

I mean checked addition/subtraction without modular arithmetic.

Yakov Pechersky (Dec 17 2021 at 13:43):

Evgeniy, is 10 - 6, when operating in Index 5, 4 or 0?

Evgeniy Kuznetsov (Dec 17 2021 at 13:48):

Yakov Pechersky said:

Evgeniy, is 10 - 6, when operating in Index 5, 4 or 0?

Didn't you think both 10 and 6 should be unrepresentable in Index 5?

Yakov Pechersky (Dec 17 2021 at 13:54):

I am trying to figure out our specification. If a user provides the syntax "10 - 6" to represent a term of index 5, we have several options. Runtime error is always possible. Another is to interpret that whole syntax as a term of nat, and then truncate, which gives 4. Or to interpret the subtraction as happening in Index 5, then interpret each numerical syntax as terms in index 5, which gives 4 - 4 = 0.

Tomas Skrivan (Dec 17 2021 at 13:55):

Evgeniy Kuznetsov said:

I mean checked addition/subtraction without modular arithmetic.

Checked by proof or at runtime?
If by proof then the go to access function should probably be Array.uget and maybe having a[i] expand to a.uget i (by linarith). I would be interested if that would work in practice or linarith(or any other appropriate tactic) would fail anytime the index operations would get a bit more involved.
At runtime, you would be breaking the purity of the code and would have to embed everything in a monad.

Evgeniy Kuznetsov (Dec 17 2021 at 14:13):

Tomas Skrivan said:

Evgeniy Kuznetsov said:

I mean checked addition/subtraction without modular arithmetic.

Checked by proof or at runtime?
If by proof then the go to access function should probably be Array.uget and maybe having a[i] expand to a.uget i (by linarith). I would be interested if that would work in practice or linarith(or any other appropriate tactic) would fail anytime the index operations would get a bit more involved.
At runtime, you would be breaking the purity of the code and would have to embed everything in a monad.

By proof.
I agree with you that in comparison with Array.get and Array.get! functions Array.uget and Array.get? are more predictable.
I just had a thought that one time in a docs will appear a chapter "Avoid Array.get" how it was already happened with partial functions in haskell

Evgeniy Kuznetsov (Dec 17 2021 at 14:30):

Yakov Pechersky said:

I am trying to figure out our specification. If a user provides the syntax "10 - 6" to represent a term of index 5, we have several options. Runtime error is always possible. Another is to interpret that whole syntax as a term of nat, and then truncate, which gives 4. Or to interpret the subtraction as happening in Index 5, then interpret each numerical syntax as terms in index 5, which gives 4 - 4 = 0.

Perhaps implicit coercion from Nat should be prohibited, except for very primitive scenarios like:

def i : Index 5 := 4

when it 'obvious' for elaborator

Tomas Skrivan (Dec 17 2021 at 14:35):

By proof.

I would be interested in yours and others experience with proving correctness of arrays indices.

I got soo fed up with it and gave up on proving it, it was slowing me too much to get anything meaningful done. I ended up defining a macro:

macro:max "!" noWs t:term : term => `(⟨$t, sorry⟩)

that allows me to write a.get !i that converts i : Nat to Fin a.size by omitting the proof. I will come back to these proofs once I start debugging my program.

Yakov Pechersky (Dec 17 2021 at 14:41):

Tomas, you're asking for a runtime proof generator for propositions of inequalities. That's what norm_num does in mathlib3

Chris B (Dec 17 2021 at 19:07):

I may be missing something, but I'm not sure how saturation as opposed to wrapping is going to solve the issue of getting the wrong element or having unexpected results from coercions. If you have an array of length 10, and you try to get the 4+8th element, returning the last element using saturating addition is generally going to be just as wrong as returning the element you'd get by wrapping.

I know almost nothing about C++, but the comparison seems kind of apples to oranges since the array size is part of the type, and the indices are constants.

Reid Barton (Dec 17 2021 at 19:11):

Or more realistically, if you are comparing elements i and i+1 of an array for each i, you probably don't want to compare the last element to the first element or to itself.

Kyle Miller (Dec 17 2021 at 19:12):

Maybe an idea is rather than saturation, have, basically, Option (Fin n) where values that go outside the range become None.

An efficient implementation for array access might be

/-- Fin but where arithmetic operations going outside range become invalid values,
and values are represented as `USize`. -/
def SFin (n : USize) := Quot (λ (a b : USize) => a = b  (a  n  b  n))

This uses all values outside the range as equivalent sentinel values.

Chris B (Dec 17 2021 at 19:33):

fwiw Mathlib4 does provide checked arithmetic for Fin n that ends up in Option (Fin n); there's also a method pattern taken from rust where you get (Bool x Fin n) back, returning the result and a flag for over/underflow: https://github.com/leanprover-community/mathlib4/blob/6276f24a77e6d7d8caa0c0a3f82e2ccf9a7e4289/Mathlib/Data/Fin/Basic.lean#L127

Mac (Dec 17 2021 at 22:22):

@Evgeniy Kuznetsov One think that it might help to remember is that Fin is used in the Lean core to implement native unsigned integers (e.g, UInt8/UInt32/USize), so it is important that Fin shares their behavior. Since such things do silently overflow and underflow at runtime, Fin needs to do so as well.

To handle array indices in a safe manner, I think @Kyle Miller's suggestion is great alternative.

Reid Barton (Dec 19 2021 at 20:24):

Mac said:

Evgeniy Kuznetsov One think that it might help to remember is that Fin is used in the Lean core to implement native unsigned integers (e.g, UInt8/UInt32/USize), so it is important that Fin shares their behavior. Since such things do silently overflow and underflow at runtime, Fin needs to do so as well.

I don't see why this follows, since e.g. USize is a structure wrapper around Fin.

Mac (Dec 19 2021 at 20:47):

@Reid Barton yes, but they just directly inherit the Lean-level implementations of operations (e.g., add, mul, etc.) from Fin. The wrapper is merely for the code generator.

Reid Barton (Dec 19 2021 at 20:47):

but anyways surely the correct approach is to first decide what behavior we want, then adjust the implementation to match?

Mac (Dec 19 2021 at 20:50):

@Reid Barton UInt8/UInt32/USize/etc. have a fixed native behavior, Fin is the Lean abstraction of this behavior (so that the modulo arithmetic does not have to be copied to each of UInt8/UInt32/USize/etc.). I don't think there is any behavior there that is changeable (but I may be wrong).

Mac (Dec 19 2021 at 20:53):

One thing that could be done though is to leave the name Fin for mathematics and use something like APNat/APInt for the codegen version.

Kevin Buzzard (Dec 20 2021 at 00:59):

I think the lean 4 definition of fin does not agree with the lean 3 definition? Maybe this was because of mathematicians?


Last updated: Dec 20 2023 at 11:08 UTC