Zulip Chat Archive
Stream: lean4 dev
Topic: IntN Lean representation?
Andrew Kent (Dec 27 2023 at 23:48):
I'm starting to tinker with how signed fixed-width integer types IntN
(similar to Lean's UIntN
types) could be efficiently represented.
Any knee-jerk thoughts on how the underlying Lean representation (i.e., non-C definitions) might be best modeled?
At the moment I'm playing around with a type Fin2C n
that is a sort of play on Fin
but focused on supporting two's complement arithmetic:
/--
`Fin2C n` is an integer `i` with the constraint that `-n ≤ i < n`.
-/
structure Fin2C (n : Nat)where
/-- If `i : Fin2C n`, then `i.val : ℤ` is the described number. It can also be
written as `i.1` or just `i` when the target type is known. -/
val : Int
/-- If `i : Fin2C n`, then `i.2` is a proof that `-n ≥ i.1`. -/
isGe : LE.le (-↑n) val
/-- If `i : Fin2C n`, then `i.3` is a proof that `i.1 < n`. -/
isLt : LT.lt val n
from which you would define Int64
as Fin2C 2^63
.
But a more generic, arbitrary finite interval over ℤ could also work:
/--
`FinZ n m` is an integer `i` with the constraint that `n ≤ i < n + m`.
-/
structure FinZ (n : Int) (m : Nat) where
/-- If `i : FinZ n m`, then `i.val : ℤ` is the described number. It can also be
written as `i.1` or just `i` when the target type is known. -/
val : Int
/-- If `i : FinZ n m`, then `i.2` is a proof that `n ≤ i.1`. -/
isGe : LE.le n val
/-- If `i : FinZ n m`, then `i.3` is a proof that `i.1 < n + m`. -/
isLt : LT.lt val (n + m)
from which you would define Int64
as FinZ -2^63 2^64
.
Anyway, it's been a while since I've hacked at length in Lean and I'm not far enough along to have strong opinions about which approach is best... any experience-informed biases you have I'd love to hear =)
(I am just starting to define various Fin2C n
defs/lemmas by translating corresponding Fin n
definitions at the moment... hoping that trying both approaches will prove to be elucidating.)
Mario Carneiro (Dec 27 2023 at 23:51):
my recommendation would be to just have IntN := UIntN
with different instances
Andrew Kent (Dec 28 2023 at 00:34):
Mario Carneiro said:
my recommendation would be to just have
IntN := UIntN
with different instances
Ah, that would gives us some nice representation properties "for free" (e.g., we immediately get the compiler using the correct size bitvectors I suspect).
Is that the right long term representation, though? I.e., would that make some proofs involving IntN
values potentially more painful? I.e., since there's no trivial projection to either Nat
or Int
from an underlying UIntN
/Fin
representation of an IntN
.
(Ah... but perhaps the last point is less of a concern since most proofs about IntN
can't rely on Int
theories/lemmas/etc? I haven't done tons of complex proofs over IntN
types in practice... so I'm speculating a little here...)
Mario Carneiro (Dec 28 2023 at 00:35):
There is a natural map from UIntN
to IntN
which just wraps
Mario Carneiro (Dec 28 2023 at 00:35):
and this map is the identity function which is even nicer
Mario Carneiro (Dec 28 2023 at 00:36):
so the only thing that IntN
really adds is the toInt
function
Andrew Kent (Dec 28 2023 at 01:17):
Ah, yah so using UIntN
under the hood for IntN
means sign insensitive operations can just re-use UIntN logic:
- addition
- subtraction
- multiplication
- negation
- bitwise AND
- bitwise OR
- bitwise XOR
- bitwise NOT
- left shift
and for sign sensitive operations, we would leverage IntN.toInt
and IntN.ofInt
definitions and then define those operations in whatever terms are most convenient, e.g.:
- division
:= fun (n m : IntN) : IntN := IntN.ofInt (n.toInt / m.toInt)
- modulo
:= fun (n m : IntN) : IntN := IntN.ofInt (n.toInt % m.toInt)
- right shift ...
- comparisons (except equality) ...
Is reasoning about those projections toInt/ofInt or how the UIntN operation semantics correspond to IntN values going to be too frustrating for lemmas about IntN
types? Or is that a reasonable pill to have to swallow once so-to-speak (i.e., for each primitive IntN operations' lemmas) and then it's no longer an issue anyone has to worry about?
James Gallicchio (Dec 28 2023 at 07:14):
I'm not sure how else you would do it. The bigger annoyance is that signed div/mod have really mathematically ugly semantics, which makes it a pain to spec.
Joe Hendrix (Dec 28 2023 at 07:21):
I did a quick test and it looks like a structure lets you reuse the compiler representation, but more safety on the operation. e.g.,
structure Int8 where
ofUInt ::
toUInt : UInt8
deriving DecidableEq
def Int8.add (x y : Int8) : Int8 := ⟨x.asUInt + y.asUInt⟩
François G. Dorais (Dec 28 2023 at 09:07):
Don't forget comparison functions, which are also sign sensitive.
Henrik Böving (Dec 28 2023 at 14:20):
Joe Hendrix said:
I did a quick test and it looks like a structure lets you reuse the compiler representation, but more safety on the operation. e.g.,
structure Int8 where ofUInt :: toUInt : UInt8 deriving DecidableEq def Int8.add (x y : Int8) : Int8 := ⟨x.asUInt + y.asUInt⟩
Yes, in general if your struct has one runtime relevant field and any number of irrelevant fields (such as types, proofs etc.) it will be represented as just the runtime relvant field without any wrapping.
François G. Dorais (Dec 28 2023 at 14:41):
You can also avoid some indirection using structure Int8 extends UInt8
but there isn't much to gain in this case.
Tomas Skrivan (Jan 02 2024 at 08:34):
I have an implementation of Int64 but without a spec. (... I should have used UInt64 instead of USize)
Mario Carneiro (Jan 02 2024 at 08:35):
maybe rename it to ISize
?
Tomas Skrivan (Jan 02 2024 at 08:37):
Sure if you really care to have the same size as USize
but I would rather use the familiar Int64
and use UInt64
for the internal representation.
Andrew Kent (Jan 02 2024 at 16:05):
Anyone know off the top of their head how to safely cast an int8_t
to a bitwise-identical uint8_t
?
I was wondering how one nicely uses these these sorts of representations that use the unsigned ints under the hood (e.g., here) with native C/C++ code that uses actual signed integers explicitly... and at least this morning all I'm coming up with are various expressions that I think end up including undefined behavior, lol: reinterpret_cast<uint8_t&>((int8_t&)i8)
, *reinterpret_cast<uint8_t*>((int8_t*)&i8)
.
I'll have to look at this more tonight...
James Gallicchio (Jan 02 2024 at 17:34):
I think in C at least, int8_t x = ...; uint8_t y = (uint8_t) x;
is not UB? I have no familiarity with C++ though.
Joe Hendrix (Jan 02 2024 at 19:54):
It's a shame I can't find an official C++ standard to link to, but the unofficial standards seem to define casts to unsigned as using modulo arithmetic to preserve the expected bit pattern, (uint8_t)
should always be correct.
Andrew Kent (Jan 02 2024 at 22:22):
Joe Hendrix said:
It's a shame I can't find an official C++ standard to link to, but the unofficial standards seem to define casts to unsigned as using modulo arithmetic to preserve the expected bit pattern,
(uint8_t)
should always be correct.
Ah, okay. I may have read something that got me a little paranoid here. The C-style casts do seem to work in practice… wish it was easier to find a confident/official answer.
Sebastian Ullrich (Jan 02 2024 at 22:32):
I believe that's just standard conversion? https://en.cppreference.com/w/cpp/language/implicit_conversion#Integral_conversions
Andrew Kent (Jan 02 2024 at 23:41):
Sebastian Ullrich said:
I believe that's just standard conversion? https://en.cppreference.com/w/cpp/language/implicit_conversion#Integral_conversions
Ah, thanks for the reference link. I think you're right.
According to that ref (and experiments seem to confirm):
intN_t
=>uintN_t
is what we'd intuitively think/expectuintN_t
=>intN_t
is technically implementation defined pre C++20 but specified to be what we'd expect C++20 and beyond
Sounds like a non-issue then :octopus: (And perhaps you'd want some tests to confirm the compiler implementation does the latter case as expected if we're not using -std=c++20 or more recent...?)
Tobias Grosser (Jan 03 2024 at 03:09):
I did not look up references, but AFAIR the only way that always works is to cast to an array of chars, which alias which everything and then do a memcpy. A good compiler should optimize this away.
Andrew Kent (Jan 10 2024 at 16:13):
RFC Github issue is up: https://github.com/leanprover/lean4/issues/3162
James Gallicchio (Jan 16 2024 at 04:19):
@Thea Brick said she has an implementation generic across the bit count (originally for wasm) that might be useful here as a model for the IntX
s
Thea Brick (Jan 16 2024 at 04:24):
James Gallicchio said:
Thea Brick said she has an implementation generic across the bit count (originally for wasm) that might be useful here as a model for the
IntX
s
link: https://github.com/T-Brick/Numbers
Last updated: May 02 2025 at 03:31 UTC