Zulip Chat Archive

Stream: mathlib4

Topic: Why did the community decide to use "special values"?


Kent Van Vels (Aug 04 2023 at 03:12):

This question is not meant to be a criticism in any way. I play around with lean and mathlib 4 and it is far beyond anything I could have done. I am curious about the choice to define, (1:Nat)-(100:Nat) as (0:Nat). I know that C and C++ leave this (unsigned integer underflow) as "undefined" and this provides some flexibility but also some "footguns" to the language. For example, for all unsigned integers u we have (u - 100) + 100 because, in some sense, integer underflow needn't be considered, it is undefined. So to a C++ program (compiler) the expression (u-100)+100 is also a proof that u \ge 100.

Were any similar approaches tried in Mathlib? Is there anything written that I could read that discusses this, and similar (like 1/0 = 0), choices? Perhaps using the some/none approach. I read about this online somewhere and now I can't find a link to it.

Any responses will be appreciated.

Scott Morrison (Aug 04 2023 at 03:23):

https://xenaproject.wordpress.com/2020/07/05/division-by-zero-in-type-theory-a-faq/

Kent Van Vels (Aug 04 2023 at 03:39):

Thanks. I remember that I read this earlier. I fall into the category of "OK so I can see that it can be made to work. But I still feel a bit uncomfortable about all this."

Mac Malone (Aug 04 2023 at 04:42):

Kent Van Vels said:

I know that C and C++ leave this (unsigned integer underflow) as "undefined"

It may be worth noting that while the C/C++ standard leaves some things "undefined" (i.e., implementation defined), the implementations themselves (e.g., GCC, Clang) do have defined behaviors for such special cases. Since Lean is both the language and the compiler, leaving it "undefined" is not an option -- there has to be some concrete implementation.

Eric Wieser (Aug 04 2023 at 04:42):

Kent Van Vels said:

I know that C and C++ leave this (unsigned integer underflow) as "undefined" and this provides some flexibility but also some "footguns" to the language. For example, for all unsigned integers u we have (u - 100) + 100 because, in some sense, integer underflow needn't be considered, it is undefined.

This is false, unsigned integer overflow is always defined as modular arithmetic

Mac Malone (Aug 04 2023 at 04:43):

@Eric Wieser Note that the example was underflow, not overflow.

Eric Wieser (Aug 04 2023 at 04:44):

The same comment applies

Eric Wieser (Aug 04 2023 at 04:44):

signed integer under/overflow is the one that is undefined, see Wikipedia

Mac Malone (Aug 04 2023 at 04:46):

The standard is a bit weird in how it phrases this, though, because it only makes reference to "overflow", not "underflow", but the behavior mentioned )modulo the max + 1) does seem to logically apply to underflow as well (though I imagine a lawyer could argue their way out of it :rofl: ).

Kent Van Vels (Aug 04 2023 at 07:55):

@Eric Wieser Thanks the correction. I did get the unsigned/signed integer case backwards.

It doesn't refute your point in any way, @Mac Malone,but I am reasonably sure that GCC and Clang leave overflow undefined. This file exhibits undefined behavior and, interestingly, under different optimization level on g++-9 we get different outputs.

Indeed, on -O3 we get "yes" and on -O0 we get "no" printed to the console.

#include <iostream>
#include <limits>

bool test(int n){
  const int temp = n - 10000;
  return n >= temp;
}

int main(){
  const int n = std::numeric_limits<int>::min();
  if (test(n)) {
    std::cout << "yes\n";
  } else {
    std::cout << "no\n";
  }
  return 0;
}

Sebastian Ullrich (Aug 04 2023 at 08:09):

This is a common misconception: underflow is usually defined as what happens when your floats get too close to zero. You can't underflow an int, you can only negatively overflow it.

Anne Baanen (Aug 04 2023 at 10:26):

Kent Van Vels said:

Eric Wieser Thanks the correction. I did get the unsigned/signed integer case backwards.

It doesn't refute your point in any way, Mac Malone,but I am reasonably sure that GCC and Clang leave overflow undefined. This file exhibits undefined behavior and, interestingly, under different optimization level on g++-9 we get different outputs.

Signed integer overflow is undefined behaviour in the standard, but most compilers let you choose to define it, e.g. using the -fwrapv or -ftrapv flags.

Anne Baanen (Aug 04 2023 at 10:27):

(But we're getting somewhat off topic here!)

Eric Rodriguez (Aug 04 2023 at 10:34):

I will remark that sometimes these special values do help mathematically; if 1/0 != 0 then we would not have docs#Rat.distribSMul, which is a genuinely important result allowing us to remove a specific diamond to do with rat-algebras

Kevin Buzzard (Aug 04 2023 at 10:50):

All this underflow stuff is fine but at the end of the day the type of sub is X -> X -> X in Lean so you have to return a natural if you subtract two naturals and that's basically the end of the matter other than the argument over the junk value returned when the subtraction is mathematically invalid. Any other design decision (e.g. partial functions, returning option X etc) is harder to work with in practice, in a functional language. Moreover the junk value choice doesn't actually matter to mathematicians; all we're doing is defining a new function which extends subtraction. Errors are caught in other places, not in the act of subtraction itself; this is the counterintuitive part, because mathematicians usually catch the issue earlier.

Anne Baanen (Aug 04 2023 at 11:25):

One specific solution that I haven't seen come up too often in these sorts of discussion is the "bundled" one. Namely: supplying a proof at each point you mention subtraction is annoying, so why not make it part of the arguments? So we would write the type of subtraction as ∀ (a : ℕ), Fin (a+1) → ℕ. My gut feeling is that this is still more annoying than junk values, yet other parts of the library seem to fare no worse by doing this bundling. (For instance, docs#Ideal.map could just as well be defined for any function, not just ring homomorphisms.) I would love to learn why this is!

Kent Van Vels (Aug 04 2023 at 11:35):

I am thankful for everyone's input. I am just learning the library and I am impressed. I have been doing some digging and there are "partial-subtraction" psub and "partial-predecessor" function defined on the natural numbers. I will play around with that to get an idea of how burdensome the some/none paradigm is. Thanks!

Richard Copley (Aug 05 2023 at 17:33):

Mac Malone said:

It may be worth noting that while the C/C++ standard leaves some things "undefined" (i.e., implementation defined), the implementations themselves (e.g., GCC, Clang) do have defined behaviors for such special cases.

"Undefined behaviour" and "implementation defined" are very different! If a code path invokes undefined behaviour, the compiler can assume that the code path is unreachable. This enables certain optimizations. Here's an example on Compiler Explorer where a function's side effects are optimized away entirely.

Mario Carneiro (Aug 06 2023 at 00:37):

Note that lean also has "undefined behavior": obviously C code with UB is UB, but also False.elim: the compiler will optimize out branches where one side is a False.elim application


Last updated: Dec 20 2023 at 11:08 UTC