Zulip Chat Archive
Stream: lean4
Topic: runtime constructor layout
Yuri (Jul 07 2024 at 22:14):
While reading the constructor layout in the FFI documentation, I see that Char
is stored as a non-scalar field.
From the FFI manual:
structure S where
ptr_1 : Array Nat
usize_1 : USize
sc64_1 : UInt64
ptr_2 : { x : UInt64 // x > 0 } -- wrappers don't count as scalars
sc64_2 : Float -- `Float` is 64 bit
sc8_1 : Bool
sc16_1 : UInt16
sc8_2 : UInt8
sc64_3 : UInt64
usize_2 : USize
ptr_3 : Char -- trivial wrapper around `UInt32`
sc32_1 : UInt32
sc16_2 : UInt16
And the corresponding layout:
S.ptr_1 - lean_ctor_get(val, 0)
S.ptr_2 - lean_ctor_get(val, 1)
S.ptr_3 - lean_ctor_get(val, 2)
S.usize_1 - lean_ctor_get_usize(val, 3)
S.usize_2 - lean_ctor_get_usize(val, 4)
S.sc64_1 - lean_ctor_get_uint64(val, sizeof(void*)*5)
S.sc64_2 - lean_ctor_get_float(val, sizeof(void*)*5 + 8)
S.sc64_3 - lean_ctor_get_uint64(val, sizeof(void*)*5 + 16)
S.sc32_1 - lean_ctor_get_uint32(val, sizeof(void*)*5 + 24)
S.sc16_1 - lean_ctor_get_uint16(val, sizeof(void*)*5 + 28)
S.sc16_2 - lean_ctor_get_uint16(val, sizeof(void*)*5 + 30)
S.sc8_1 - lean_ctor_get_uint8(val, sizeof(void*)*5 + 32)
S.sc8_2 - lean_ctor_get_uint8(val, sizeof(void*)*5 + 33)
Considering that a Char
is a trivial constructor around UInt32
, why is it stored as a non-scalar field?
structure Char where
/-- The underlying unicode scalar value as a `UInt32`. -/
val : UInt32
/-- The value must be a legal codepoint. -/
valid : val.isValidChar
As I understand it, Prop
is erased and Char
becomes a trivial structure taking the shape of its only field, which is the UInt32 scalar.
I was expecting to see lean_ctor_get_uint32
right before S.sc32_1
.
Marcus Rossel (Jul 08 2024 at 11:55):
This is kind of a nonanswer, but as ptr_2
states: wrappers don't count as scalars
. Why that is the case ¯\_(ツ)_/¯
Yuri (Jul 08 2024 at 12:19):
Marcus Rossel said:
This is kind of a nonanswer, but as
ptr_2
states:wrappers don't count as scalars
. Why that is the case ¯\_(ツ)_/¯
It is my understanding that in some cases Prop
is not erased completely but passed as lean_box(0)
at runtime: effectively being ignored instead of erased. I guess this could explain why { x : UInt64 // x > 0 }
is a non-trivial structure.
But by this logic both Char
and Subtype
would be non-trivial:
structure Subtype {α : Sort u} (p : α → Prop) where
val : α
property : p val -- Prop
vs
structure Char where
val : UInt32
valid : val.isValidChar -- Prop
Yuri (Jul 08 2024 at 12:23):
Or maybe the simplest explanation is that this is an optimization to-be-done once the new code generator is in place.
Yuri (Jul 08 2024 at 12:28):
I just checked the generated code and when Subtype is used as a function parameter, it is 'trivial' and the runtime representation is α
.
Only when stored in a structure it is not seen as 'trivial'.
def mkAa (char: Char) (ptr_2 : { x : UInt64 // x > 0 }): S :=
S.mk
(Array.mkEmpty 0) -- ptr_1
0 -- usize_1 : USize
0 -- sc64_1 : UInt64
ptr_2 -- ptr_2 : { x : UInt64 // x > 0 }
0 -- sc64_2 : Float
false -- sc8_1 : Bool
0 -- sc16_1 : UInt16
0 -- sc8_2 : UInt8
0 -- sc64_3 : UInt64
0 -- usize_2 : USize
char -- ptr_3 : Char
0 -- sc32_1 : UInt32
0 -- sc16_2 : UInt16
And the generated code:
LEAN_EXPORT lean_object *l_mkAa(uint32_t x_1, uint64_t x_2)
{
_start:
{
size_t x_3;
uint64_t x_4;
uint16_t x_5;
uint8_t x_6;
uint32_t x_7;
lean_object *x_8;
double x_9;
uint8_t x_10;
lean_object *x_11;
lean_object *x_12;
lean_object *x_13;
x_3 = 0;
x_4 = 0;
x_5 = 0;
x_6 = 0;
x_7 = 0;
x_8 = l_mkAa___closed__1;
x_9 = l_mkAa___closed__2;
x_10 = 0;
x_11 = lean_box_uint64(x_2); // <--- boxing the `uint64_t` representation of the Subtype value
x_12 = lean_box_uint32(x_1);
x_13 = lean_alloc_ctor(0, 3, sizeof(size_t) * 2 + 34);
lean_ctor_set(x_13, 0, x_8);
lean_ctor_set(x_13, 1, x_11); // <--- setting in the boxed fields section of the constructor
lean_ctor_set(x_13, 2, x_12);
lean_ctor_set_usize(x_13, 3, x_3);
lean_ctor_set_uint64(x_13, sizeof(void *) * 5, x_4);
lean_ctor_set_float(x_13, sizeof(void *) * 5 + 8, x_9);
lean_ctor_set_uint8(x_13, sizeof(void *) * 5 + 32, x_10);
lean_ctor_set_uint16(x_13, sizeof(void *) * 5 + 28, x_5);
lean_ctor_set_uint8(x_13, sizeof(void *) * 5 + 33, x_6);
lean_ctor_set_uint64(x_13, sizeof(void *) * 5 + 16, x_4);
lean_ctor_set_usize(x_13, 4, x_3);
lean_ctor_set_uint32(x_13, sizeof(void *) * 5 + 24, x_7);
lean_ctor_set_uint16(x_13, sizeof(void *) * 5 + 30, x_5);
return x_13;
}
}
Marcus Rossel (Jul 08 2024 at 12:55):
Yuri de Wit said:
But by this logic both
Char
andSubtype
would be non-trivial.
And isn't that exactly how things are according to the documentation? (That is, I don't understand the problem.)
Yuri (Jul 08 2024 at 13:01):
Marcus Rossel said:
Yuri de Wit said:
But by this logic both
Char
andSubtype
would be non-trivial.And isn't that exactly how things are according to the documentation? (That is, I don't understand the problem.)
Maybe there is a problem. But my goal is to understand the generated code in the context of FFI and stumbled on this behavior.
Marcus Rossel (Jul 08 2024 at 13:01):
Ohhh, I see. I think the docs might be inconsistent. If I understand the following correctly, it does not match the documentation example you posted above:
An inductive type with a trivial structure, that is,
* it is none of the types described above
* it is not markedunsafe
* it has a single constructor with a single parameter of relevant typeis represented by the representation of that parameter's type.
For example,
{ x : α // p }
, theSubtype
structure of a value of typeα
and an irrelevant proof, is represented by the representation ofα
.
I feel like this suggests that Char
should be represented as a scalar. It is unclear to me what exactly counts as a "relevant" type though.
Yuri (Jul 08 2024 at 13:02):
Yes, this was the source of my confusion.
Marcus Rossel (Jul 08 2024 at 13:06):
@Mario Carneiro wrote the layout docs, so perhaps he has some insights on this?
Daniil Kisel (Jul 08 2024 at 13:07):
Marcus Rossel said:
It is unclear to me what exactly counts as a "relevant" type though.
They are described below in the same doc:
A universe
Sort u
, type constructor... → Sort u
, or propositionp : Prop
is irrelevant and is either statically erased (see above) or represented as alean_object *
with the runtime valuelean_box(0)
Marcus Rossel (Jul 08 2024 at 13:08):
Daniil Kisel said:
Marcus Rossel said:
It is unclear to me what exactly counts as a "relevant" type though.
They are described below in the same doc:
A universe
Sort u
, type constructor... → Sort u
, or propositionp : Prop
is irrelevant and is either statically erased (see above) or represented as alean_object *
with the runtime valuelean_box(0)
Yeah, I saw that, but wasn't sure if that's a complete list.
Yuri (Jul 08 2024 at 13:10):
My conclusion so far aside from potential doc fixes:
- trivial structures/inductives like Subtype and Char are represented as their underlying type when used as function parameters
- the structure layout does not see them as trivial and store them as regular boxed fields.
Yuri (Jul 08 2024 at 21:56):
It turns out that not only trivial structures are stored as boxed fields in a structure, but all of the scalars too: UInt8, UInt16, UInt32, UInt64, Float, USize.
def mkProd (x : UInt64) (y : UInt64) (z : Float) : (UInt64 × Float) :=
⟨ x + y, 2 * z ⟩
def mkProd2 (x : UInt8) (y : UInt8) (z : UInt16) : (UInt8 × UInt16) :=
⟨ x + y, 2 * z ⟩
def mkProd3 (x : USize) (y : Int) : (USize × Int) :=
⟨ x, y ⟩
LEAN_EXPORT lean_object* l_mkProd(uint64_t x_1, uint64_t x_2, double x_3) {
_start:
{
uint64_t x_4; double x_5; double x_6; lean_object* x_7; lean_object* x_8; lean_object* x_9;
x_4 = lean_uint64_add(x_1, x_2);
x_5 = l_mkProd___closed__1;
x_6 = lean_float_mul(x_5, x_3);
x_7 = lean_box_uint64(x_4);
x_8 = lean_box_float(x_6);
x_9 = lean_alloc_ctor(0, 2, 0);
lean_ctor_set(x_9, 0, x_7); // <-----
lean_ctor_set(x_9, 1, x_8); // <-----
return x_9;
}
}
LEAN_EXPORT lean_object* l_mkProd2(uint8_t x_1, uint8_t x_2, uint16_t x_3) {
_start:
{
uint8_t x_4; uint16_t x_5; uint16_t x_6; lean_object* x_7; lean_object* x_8; lean_object* x_9;
x_4 = lean_uint8_add(x_1, x_2);
x_5 = 2;
x_6 = lean_uint16_mul(x_5, x_3);
x_7 = lean_box(x_4);
x_8 = lean_box(x_6);
x_9 = lean_alloc_ctor(0, 2, 0);
lean_ctor_set(x_9, 0, x_7); // <-----
lean_ctor_set(x_9, 1, x_8); // <-----
return x_9;
}
}
LEAN_EXPORT lean_object* l_mkProd3(size_t x_1, lean_object* x_2) {
_start:
{
lean_object* x_3; lean_object* x_4;
x_3 = lean_box_usize(x_1);
x_4 = lean_alloc_ctor(0, 2, 0);
lean_ctor_set(x_4, 0, x_3); // <-----
lean_ctor_set(x_4, 1, x_2); // <-----
return x_4;
}
}
Mario Carneiro (Jul 08 2024 at 21:57):
I wrote that part of the docs exactly because it's nonintuitive and arguably a bug. There isn't bandwidth to fix it, but at least we can document it
Daniil Kisel (Jul 08 2024 at 21:58):
Yuri de Wit said:
It turns out that not only trivial structures are stored as boxed fields in a structure, but all of the scalars too: UInt8, UInt16, UInt32, UInt64, Float, USize.
They are stored boxed when their type is not fixed in the structure. For example, Prod
is generic and thus its arguments are always boxed.
Mario Carneiro (Jul 08 2024 at 21:59):
Yes, someone brought up that this was not sufficiently clear in the docs I wrote, as I used a nonparametric structure as an example
Yuri (Jul 08 2024 at 22:01):
@Daniil Kisel yes, you are right: it is the polymorphic case!
Mario Carneiro (Jul 08 2024 at 22:01):
for parametric types, all type parameters are stored boxed, and the type layout is not recomputed when you insert concrete types. So UInt32 x UInt32
will be stored as a ctor cell with two lean_object*
fields, each storing a boxed integer (i.e. using no additional allocation but with the shift left trick) and taking up 16 bytes of data space (plus the header)
Mario Carneiro (Jul 08 2024 at 22:01):
whereas a struct Foo := (x y : UInt32)
is stored as a ctor cell with 8 bytes of scalar space (plus the header)
Yuri (Jul 08 2024 at 22:04):
Mario Carneiro said:
whereas a
struct Foo := (x y : UInt32)
is stored as a ctor cell with 8 bytes of scalar space (plus the header)
LEAN_EXPORT lean_object *l_mkFoo(uint32_t x_1, uint32_t x_2)
{
_start:
{
lean_object *x_3;
x_3 = lean_alloc_ctor(0, 0, 8);
lean_ctor_set_uint32(x_3, 0, x_1);
lean_ctor_set_uint32(x_3, 4, x_2);
return x_3;
}
}
Mario Carneiro (Jul 08 2024 at 22:04):
and struct Foo := (x y : Char)
is again stored as 16 bytes of data space
Mario Carneiro (Jul 08 2024 at 22:05):
this is the "arguably a bug" part
Yuri (Jul 08 2024 at 22:05):
Mario Carneiro said:
I wrote that part of the docs exactly because it's nonintuitive and arguably a bug. There isn't bandwidth to fix it, but at least we can document it
@Mario Carneiro what bug are you referring to: the fact that trivial structures/inductives are not stored in constructors as their wrapped value? Or do you have something different in mind?
Mario Carneiro (Jul 08 2024 at 22:06):
even though Foo.mk : Char -> Char -> Foo
here will take its arguments unboxed as uint32_t
in the function call ABI
Mario Carneiro (Jul 08 2024 at 22:10):
@Yuri de Wit note, my response to your question outran the question
Yuri (Jul 08 2024 at 22:13):
For my own sake:
structure Foo2 := (x y : Char)
def mkFoo2 (x : Char) (y : Char) : Foo2 :=
{ x := x, y := y }
It is generated like this:
LEAN_EXPORT lean_object *l_mkFoo2(uint32_t x_1, uint32_t x_2)
{
_start:
{
lean_object *x_3;
lean_object *x_4;
lean_object *x_5;
x_3 = lean_box_uint32(x_1);
x_4 = lean_box_uint32(x_2);
x_5 = lean_alloc_ctor(0, 2, 0);
lean_ctor_set(x_5, 0, x_3);
lean_ctor_set(x_5, 1, x_4);
return x_5;
}
}
But should be something like (in addition to the get side):
LEAN_EXPORT lean_object *l_mkFoo2(uint32_t x_1, uint32_t x_2)
{
_start:
{
lean_object *x_3;
x_3 = lean_alloc_ctor(0, 0, 8);
lean_box_uint32(x_3, 0, x_1);
lean_box_uint32(x_3, 4, x_2);
return x_3;
}
}
Last updated: May 02 2025 at 03:31 UTC