Zulip Chat Archive

Stream: lean4 dev

Topic: IntN Lean representation?

Andrew Kent (Dec 27 2023 at 23:48):

I'm starting to tinker with how signed fixed-width integer types IntN (similar to Lean's UIntN types) could be efficiently represented.

Any knee-jerk thoughts on how the underlying Lean representation (i.e., non-C definitions) might be best modeled?

At the moment I'm playing around with a type Fin2C n that is a sort of play on Fin but focused on supporting two's complement arithmetic:

/--
`Fin2C n` is an integer `i` with the constraint that `-n ≤ i < n`.
-/
structure Fin2C (n : Nat)where
  /-- If `i : Fin2C n`, then `i.val : ℤ` is the described number. It can also be
  written as `i.1` or just `i` when the target type is known. -/
  val  : Int
  /-- If `i : Fin2C n`, then `i.2` is a proof that `-n ≥ i.1`. -/
  isGe : LE.le (-↑n) val
  /-- If `i : Fin2C n`, then `i.3` is a proof that `i.1 < n`. -/
  isLt : LT.lt val n

from which you would define Int64 as Fin2C 2^63.

But a more generic, arbitrary finite interval over ℤ could also work:

/--
`FinZ n m` is an integer `i` with the constraint that `n ≤ i < n + m`.
-/
structure FinZ (n : Int) (m : Nat) where
  /-- If `i : FinZ n m`, then `i.val : ℤ` is the described number. It can also be
  written as `i.1` or just `i` when the target type is known. -/
  val  : Int
  /-- If `i : FinZ n m`, then `i.2` is a proof that `n ≤ i.1`. -/
  isGe : LE.le n val
  /-- If `i : FinZ n m`, then `i.3` is a proof that `i.1 < n + m`. -/
  isLt : LT.lt val (n + m)

from which you would define Int64 as FinZ -2^63 2^64.

Anyway, it's been a while since I've hacked at length in Lean and I'm not far enough along to have strong opinions about which approach is best... any experience-informed biases you have I'd love to hear =)

(I am just starting to define various Fin2C n defs/lemmas by translating corresponding Fin n definitions at the moment... hoping that trying both approaches will prove to be elucidating.)

Mario Carneiro (Dec 27 2023 at 23:51):

my recommendation would be to just have IntN := UIntN with different instances

Andrew Kent (Dec 28 2023 at 00:34):

Mario Carneiro said:

my recommendation would be to just have IntN := UIntN with different instances

Ah, that would gives us some nice representation properties "for free" (e.g., we immediately get the compiler using the correct size bitvectors I suspect).

Is that the right long term representation, though? I.e., would that make some proofs involving IntN values potentially more painful? I.e., since there's no trivial projection to either Nat or Int from an underlying UIntN/Fin representation of an IntN.

(Ah... but perhaps the last point is less of a concern since most proofs about IntN can't rely on Int theories/lemmas/etc? I haven't done tons of complex proofs over IntN types in practice... so I'm speculating a little here...)

Mario Carneiro (Dec 28 2023 at 00:35):

There is a natural map from UIntN to IntN which just wraps

Mario Carneiro (Dec 28 2023 at 00:35):

and this map is the identity function which is even nicer

Mario Carneiro (Dec 28 2023 at 00:36):

so the only thing that IntN really adds is the toInt function

Andrew Kent (Dec 28 2023 at 01:17):

Ah, yah so using UIntN under the hood for IntN means sign insensitive operations can just re-use UIntN logic:

addition
subtraction
multiplication
negation
bitwise AND
bitwise OR
bitwise XOR
bitwise NOT
left shift

and for sign sensitive operations, we would leverage IntN.toInt and IntN.ofInt definitions and then define those operations in whatever terms are most convenient, e.g.:

division := fun (n m : IntN) : IntN := IntN.ofInt (n.toInt / m.toInt)
modulo := fun (n m : IntN) : IntN := IntN.ofInt (n.toInt % m.toInt)
right shift ...
comparisons (except equality) ...

Is reasoning about those projections toInt/ofInt or how the UIntN operation semantics correspond to IntN values going to be too frustrating for lemmas about IntN types? Or is that a reasonable pill to have to swallow once so-to-speak (i.e., for each primitive IntN operations' lemmas) and then it's no longer an issue anyone has to worry about?

James Gallicchio (Dec 28 2023 at 07:14):

I'm not sure how else you would do it. The bigger annoyance is that signed div/mod have really mathematically ugly semantics, which makes it a pain to spec.

Joe Hendrix (Dec 28 2023 at 07:21):

I did a quick test and it looks like a structure lets you reuse the compiler representation, but more safety on the operation. e.g.,

structure Int8 where
  ofUInt ::
  toUInt : UInt8
  deriving DecidableEq


def Int8.add (x y : Int8) : Int8 := ⟨x.asUInt + y.asUInt⟩

François G. Dorais (Dec 28 2023 at 09:07):

Don't forget comparison functions, which are also sign sensitive.

Henrik Böving (Dec 28 2023 at 14:20):

Joe Hendrix said:

I did a quick test and it looks like a structure lets you reuse the compiler representation, but more safety on the operation. e.g.,
structure Int8 where
  ofUInt ::
  toUInt : UInt8
  deriving DecidableEq


def Int8.add (x y : Int8) : Int8 := ⟨x.asUInt + y.asUInt⟩

Yes, in general if your struct has one runtime relevant field and any number of irrelevant fields (such as types, proofs etc.) it will be represented as just the runtime relvant field without any wrapping.

François G. Dorais (Dec 28 2023 at 14:41):

You can also avoid some indirection using structure Int8 extends UInt8 but there isn't much to gain in this case.

Tomas Skrivan (Jan 02 2024 at 08:34):

I have an implementation of Int64 but without a spec. (... I should have used UInt64 instead of USize)

Mario Carneiro (Jan 02 2024 at 08:35):

maybe rename it to ISize?

Tomas Skrivan (Jan 02 2024 at 08:37):

Sure if you really care to have the same size as USize but I would rather use the familiar Int64 and use UInt64 for the internal representation.

Andrew Kent (Jan 02 2024 at 16:05):

Anyone know off the top of their head how to safely cast an int8_t to a bitwise-identical uint8_t?

I was wondering how one nicely uses these these sorts of representations that use the unsigned ints under the hood (e.g., here) with native C/C++ code that uses actual signed integers explicitly... and at least this morning all I'm coming up with are various expressions that I think end up including undefined behavior, lol: reinterpret_cast<uint8_t&>((int8_t&)i8), *reinterpret_cast<uint8_t*>((int8_t*)&i8).

I'll have to look at this more tonight...

James Gallicchio (Jan 02 2024 at 17:34):

I think in C at least, int8_t x = ...; uint8_t y = (uint8_t) x; is not UB? I have no familiarity with C++ though.

Joe Hendrix (Jan 02 2024 at 19:54):

It's a shame I can't find an official C++ standard to link to, but the unofficial standards seem to define casts to unsigned as using modulo arithmetic to preserve the expected bit pattern, (uint8_t) should always be correct.

Andrew Kent (Jan 02 2024 at 22:22):

Joe Hendrix said:

It's a shame I can't find an official C++ standard to link to, but the unofficial standards seem to define casts to unsigned as using modulo arithmetic to preserve the expected bit pattern, (uint8_t) should always be correct.

Ah, okay. I may have read something that got me a little paranoid here. The C-style casts do seem to work in practice… wish it was easier to find a confident/official answer.

Sebastian Ullrich (Jan 02 2024 at 22:32):

I believe that's just standard conversion? https://en.cppreference.com/w/cpp/language/implicit_conversion#Integral_conversions

Andrew Kent (Jan 02 2024 at 23:41):

Sebastian Ullrich said:

I believe that's just standard conversion? https://en.cppreference.com/w/cpp/language/implicit_conversion#Integral_conversions

Ah, thanks for the reference link. I think you're right.

According to that ref (and experiments seem to confirm):

intN_t => uintN_t is what we'd intuitively think/expect
uintN_t => intN_t is technically implementation defined pre C++20 but specified to be what we'd expect C++20 and beyond

Sounds like a non-issue then :octopus: (And perhaps you'd want some tests to confirm the compiler implementation does the latter case as expected if we're not using -std=c++20 or more recent...?)

Tobias Grosser (Jan 03 2024 at 03:09):

I did not look up references, but AFAIR the only way that always works is to cast to an array of chars, which alias which everything and then do a memcpy. A good compiler should optimize this away.

Andrew Kent (Jan 10 2024 at 16:13):

RFC Github issue is up: https://github.com/leanprover/lean4/issues/3162

James Gallicchio (Jan 16 2024 at 04:19):

@Thea Brick said she has an implementation generic across the bit count (originally for wasm) that might be useful here as a model for the IntXs

Thea Brick (Jan 16 2024 at 04:24):

James Gallicchio said:

Thea Brick said she has an implementation generic across the bit count (originally for wasm) that might be useful here as a model for the IntXs

link: https://github.com/T-Brick/Numbers

Last updated: May 02 2025 at 03:31 UTC