Zulip Chat Archive

Stream: new members

Topic: type class or library of machine integers


Jonathan Protzenko (Jan 31 2023 at 23:22):

I am trying to reason about machine integers in a generic fashion... (UInt8, ..., USize)

all of these types have exactly the same operators (ofNatCore, size, val, etc.) but I don't seem to be able to find a type class that would allow me to write generic operations

this is what I came up with (sketch):

class MachineInteger (t: Type) where
  size: Nat
  val: t -> Fin size
  ofNatCore: (n:Nat) -> LT.lt n size -> t

-- @**Chris Bailey** showed me this style
set_option hygiene false in
run_cmd
  for typeName in [`UInt8, `UInt16, `UInt32, `UInt64, `USize].map Lean.mkIdent do
  Lean.Elab.Command.elabCommand ( `(
    namespace $typeName
    instance: MachineInteger $typeName where
      size := size
      val := val
      ofNatCore := ofNatCore
    end $typeName
  ))

-- Sample generic operation
def scalar_cast { src dst: Type } [ MachineInteger src ] [ MachineInteger dst ] (x: src): Option dst :=
  if h: MachineInteger.val x < MachineInteger.size dst then
    .some (MachineInteger.ofNatCore (MachineInteger.val x).val h)
  else
    .none

I'm particularly curious to hear @Sebastian Ullrich's thoughts on the matter, since I assume you must have done something comparable for electrolysis

Eric Wieser (Jan 31 2023 at 23:39):

Would it be a bad idea to use something like

structure UInt (size : Nat) where
  val : Fin size

def UInt8 := UInt UInt8.size
def UInt8.mk (x) : UInt8 := UInt.mk x

instead of the current

structure UInt8 where
  val : Fin UInt8.size

That might make this kind of generalization easier

Mario Carneiro (Jan 31 2023 at 23:40):

yes, that would be a bad idea

Mario Carneiro (Jan 31 2023 at 23:41):

these types exist mainly to facilitate ABI issues (in C type signatures and the like), and the compiler has a much harder time recognizing these types when a simple whnf doesn't yield a constant symbol

Mario Carneiro (Jan 31 2023 at 23:43):

the other thing is, we would want such a typeclass to have a really big interface, not a minimal one, including at least all the operations that have compiler builtins

Mario Carneiro (Jan 31 2023 at 23:43):

or we can just keep using that macro to stamp out lots of copies of the theorems and not bother with a typeclass

Jonathan Protzenko (Jan 31 2023 at 23:57):

makes sense... I guess I'm not bound to those types (which seem indeed to be compiler internals), so I could just redefine my own library of machine integers in the style that Eric proposes, which would indeed make things easier (and would remove one layer of perhaps unnecessary abstraction), but then I would lose the benefit of the operations on those builtin types, which are defined e.g. here https://github.com/leanprover-community/mathlib4/blob/master/Mathlib/Data/UInt.lean

Mario Carneiro (Feb 01 2023 at 00:44):

If you define your own machine integers, then they won't be machine integers anymore, which usually defeats the purpose for using machine integer types in the first place

Mario Carneiro (Feb 01 2023 at 00:45):

a typeclass is a better way to abstract over the types without actually redefining them

Mario Carneiro (Feb 01 2023 at 00:46):

(just to be clear, my "that would be a bad idea" was directed at Eric Wieser 's suggestion to change the definition of the UInt* types, not your suggestion of having a MachineInteger typeclass)

Jonathan Protzenko (Feb 01 2023 at 04:49):

ok, sounds like I'm on the right track then... how inefficient are Nats? right now for the Lean backend of Aeneas, I need to mimic Rust's semantics of error-on-overflow, which I do by running the operation on Nats, checking that it fits, then casting back to the right size, e.g.

def USize.checked_mul (n: USize) (m: USize): result USize :=
  if h: n.val * m.val < USize.size then
    .ret  n.val * m.val, h 
  else
    .fail integerOverflow

are Nats really running as succ (succ ...) like in Coq, or is there an efficient bignum implementation under the hood? if it's the former, I probably have a problem, but if it's the latter, it's probably fine for now

Henrik Böving (Feb 01 2023 at 06:07):

They have efficient bignum representations internally.

Mario Carneiro (Feb 01 2023 at 06:15):

A Nat is represented as a tagged pointer (i.e. the type says lean_object* but it's actually a usize with low bit 1) if it is less than 2^31 or 2^63 depending on architecture. For larger natural numbers, it falls back to a GMP bignum (or a homebrew bignum if you disable GMP) representation. In any case, lean 4 will never use unary representation for numerals (even though the inductive declaration suggests that it would), since the kernel is also clued in on this bignum stuff.

Mario Carneiro (Feb 01 2023 at 06:17):

It's still a bit less efficient than if you use actual fixed-width integer types since these don't need the tagging scheme

Mario Carneiro (Feb 01 2023 at 06:23):

That implementation of checked_mul sounds about as efficient as you are likely to get. The only thing I could see which might improve it would be to do the operation on the double-width fixed width integer type, when one exists, since a multiplication can't overflow that. For added fun, that gives you a nontrivial proof condition to show that it is as good as doing the multiplication over Nat

Sebastian Ullrich (Feb 01 2023 at 08:31):

Jonathan Protzenko said:

I'm particularly curious to hear Sebastian Ullrich's thoughts on the matter, since I assume you must have done something comparable for electrolysis

Hah, those were quite different times https://github.com/Kha/electrolysis/blob/master/thys/core/pre.lean#L74-L78. No built-in bounded nats to be found.

Jonathan Protzenko (Feb 01 2023 at 16:28):

Mario Carneiro said:

That implementation of checked_mul sounds about as efficient as you are likely to get. The only thing I could see which might improve it would be to do the operation on the double-width fixed width integer type, when one exists, since a multiplication can't overflow that. For added fun, that gives you a nontrivial proof condition to show that it is as good as doing the multiplication over Nat

yes that's how I would implement it in C, with the caveat that for the max type (e.g. uint64) you need a couple intrinsics to do that efficiently... (since there's no uint128 by default)

since this is bignums under the hood, I'm not worried about performance just now -- we don't intend to execute the Lean code, just verify it, and as long as the simplifier's performance isn't shot, I think we're good!

Jonathan Protzenko (Feb 01 2023 at 16:28):

No built-in bounded nats to be found.

different times indeed!


Last updated: Dec 20 2023 at 11:08 UTC