Zulip Chat Archive

Stream: lean4

Topic: FFI for fixed-length arrays


James Gallicchio (Apr 05 2022 at 08:27):

Okay, I got something working:

Lean: https://github.com/JamesGallicchio/LeanColls/blob/main/LeanColls/Array.lean
C: https://github.com/JamesGallicchio/LeanColls/blob/main/bindings/leancolls_array.c

Still need to implement the FFI interface for a few things, but I'm pretty happy it works at all :rofl:

That said, not sure if what I'm doing is the right approach. If someone has a moment to take a quick look, I'd greatly appreciate it!

Mario Carneiro (Apr 05 2022 at 08:31):

I think it would be better to have this be a wrapper around Array instead of List

Mario Carneiro (Apr 05 2022 at 08:32):

are there zero cost conversions to and from LeanColls.Array?

James Gallicchio (Apr 05 2022 at 08:35):

Mario Carneiro said:

I think it would be better to have this be a wrapper around Array instead of List

Part of me agrees, but part of me wants to completely replace the built-in array (see ArrayBuffer for UBA built on top of this Array), which I think means it is more reasonable to have a list representation. Unsure.

Mario Carneiro (Apr 05 2022 at 08:35):

That implementation seems like you would have no way to actually get the length

Mario Carneiro (Apr 05 2022 at 08:36):

I expect you will need the garbage collection functionalities of the original implementation even if it's basically a plain C array

James Gallicchio (Apr 05 2022 at 08:37):

Mario Carneiro said:

are there zero cost conversions to and from LeanColls.Array?

to/from LeanColls.Array and List? No, I'm not sure that's even possible (might be wrong?)

Mario Carneiro (Apr 05 2022 at 08:37):

but I assume that the existing Array implementation supports something that is essentially a single contiguous array, so you would just copy into a contiguous array when converting Array -> LeanColls.Array and then both conversions become zero cost after that

Mario Carneiro (Apr 05 2022 at 08:37):

no, zero cost between Array and LeanColls.Array

Mario Carneiro (Apr 05 2022 at 08:38):

that's why I suggested that LeanColls.Array wraps Array instead of List

James Gallicchio (Apr 05 2022 at 08:38):

Mario Carneiro said:

That implementation seems like you would have no way to actually get the length

Yeah, the length (where relevant) gets passed in to functions like it does in C world (because it has to be a parameter of the function). I'm still trying to figure out how to get Lean to erase it where irrelevant, though with aggressive enough inlining by the compiler it shouldn't matter

Mario Carneiro (Apr 05 2022 at 08:39):

wouldn't LC.Array need to be a dependent type for that?

Mario Carneiro (Apr 05 2022 at 08:39):

you declared it as just LC.Array A and not LC.Array A n

Mario Carneiro (Apr 05 2022 at 08:40):

oh whoops nvm

Mario Carneiro (Apr 05 2022 at 08:41):

you need a bounds check in leancolls_array_get, it can safely be called out of bounds but that yields UB in your implementation

James Gallicchio (Apr 05 2022 at 08:41):

Mario Carneiro said:

but I assume that the existing Array implementation supports something that is essentially a single contiguous array, so you would just copy into a contiguous array when converting Array -> LeanColls.Array and then both conversions become zero cost after that

Ah, yeah, I could implement conversions between Array and this one. Lean's built-in seems to have specializations for small types and I'm not entirely sure how that works yet, which is what discourages me atm...

James Gallicchio (Apr 05 2022 at 08:43):

I think the only way to call leancolls_array_get from Lean is by passing a Fin n which should ensure it's always within bounds, no?

Mario Carneiro (Apr 05 2022 at 08:45):

oh I see, you have some misleadingly named functions

Mario Carneiro (Apr 05 2022 at 08:46):

why does getExternal have a definition at all? It's a lie

Mario Carneiro (Apr 05 2022 at 08:46):

those functions shouldn't be private in any case

Mario Carneiro (Apr 05 2022 at 08:47):

you should just slap the extern directly on get

James Gallicchio (Apr 05 2022 at 08:47):

One thing I'm particularly unsure about is this function for ensuring all mutations are safe to perform in place. It seems like there's kind of two cases where you aren't allowed to modify in place:
1) The object is shared. Here I want the implementation to just panic, because using these Arrays persistently should be eagerly caught.
2) The object is persistent. It seems like constants in a program just get compiled to always be marked as persistent even when they're only used once? Which I'm not sure how to avoid

Mario Carneiro (Apr 05 2022 at 08:47):

persistent objects can't be modified, because they are shared with the whole program

Mario Carneiro (Apr 05 2022 at 08:48):

it would not be sound to do so even if they are "unshared" because the next time you call the function you will get that constant again but its value is different

James Gallicchio (Apr 05 2022 at 08:50):

Mario Carneiro said:

you should just slap the extern directly on get

Okay, so that was my first thought as well -- but there's kinda two steps happening here?

First step is to erase index/length parameters from Nat to USize (which is safe because of memory constraints). Second step is to call the extern'd function.

I wanted the getExternal to be a constant rather than a definition, but it can't automatically generate an Inhabited instance which made everything sad.

Mario Carneiro (Apr 05 2022 at 08:51):

you can just change the signature of the extern function to match get

James Gallicchio (Apr 05 2022 at 08:51):

Mario Carneiro said:

it would not be sound to do so even if they are "unshared" because the next time you call the function you will get that constant again but its value is different

What determines whether an expression is compiled to a persistent object? It seems kinda arbitrary

Mario Carneiro (Apr 05 2022 at 08:52):

Lean will automatically lift any subexpression with no free variables to a constant

Mario Carneiro (Apr 05 2022 at 08:53):

which basically means that Array.mk will probably be a copy from a global instead of a constructor

James Gallicchio (Apr 05 2022 at 08:54):

Mario Carneiro said:

you can just change the signature of the extern function to match get

Yeah. I'll try this out and see what it would look like.

James Gallicchio (Apr 05 2022 at 08:55):

Mario Carneiro said:

Lean will automatically lift any subexpression with no free variables to a constant

Huh, okay. What if I don't want it to do that..? I could see a use-case here where you don't want to have, say, an extra copy of a very large array floating around just because it's technically a closed expression...

Mario Carneiro (Apr 05 2022 at 08:56):

there is some trick here involving Unit -> A functions that I forget

Mario Carneiro (Apr 05 2022 at 08:57):

@Leonardo de Moura will know if there is a way to disable constant lifting in a function

James Gallicchio (Apr 05 2022 at 08:58):

It just seems like not what I'd expect/want as a programmer. If I want an expression to not be recomputed every call, I'll lift it manually.

Mario Carneiro (Apr 05 2022 at 08:59):

It's one of the advantages of pure functional programming. lots of subexpressions can be cached or precomputed

Mario Carneiro (Apr 05 2022 at 09:00):

I don't think you would want to lift it manually considering that there are a huge number of such liftings in the average lean program

Mario Carneiro (Apr 05 2022 at 09:00):

Most imperative languages lift constants if they can as well (C / C++ / Rust all do this)

James Gallicchio (Apr 05 2022 at 09:02):

Fair enough! Maybe the solution is to just be less aggressive with marking things persistent, maybe by making sure that the things which are marked are actually used twice

James Gallicchio (Apr 05 2022 at 09:02):

Anyways -- I'll try to get benchmarks down soon to see how the performance looks, before I even try to optimize the emitted bytecode

Sebastian Ullrich (Apr 05 2022 at 09:03):

Mario Carneiro said:

Leonardo de Moura will know if there is a way to disable constant lifting in a function

That's set_option compiler.extract_closed false

Mario Carneiro (Apr 05 2022 at 09:04):

Have you considered using a variation on Fin n that uses USize instead of Nat?

Mario Carneiro (Apr 05 2022 at 09:04):

that would avoid the boxing overhead

Mario Carneiro (Apr 05 2022 at 09:04):

and might actually be enough to get something that looks like a C loop over an array after enough inlining

Mario Carneiro (Apr 05 2022 at 09:05):

although I guess set is still hard to optimize

Sebastian Ullrich (Apr 05 2022 at 09:06):

Mario Carneiro said:

I believe it should be possible to bind to those functions directly, but it doesn't seem to be working:

...
-- could not find native implementation of external declaration 'Nat.toUSize!' (symbols 'l_Nat_toUSize_x21___boxed' or 'l_Nat_toUSize_x21')

The interpreter only speaks the always-boxed subset of the Lean ABI, so externs must always be compiled first before evaluation

Mario Carneiro (Apr 05 2022 at 09:07):

I tried putting it in a separate file but it didn't make a difference

Mario Carneiro (Apr 05 2022 at 09:07):

how does lean do it?

Mario Carneiro (Apr 05 2022 at 09:08):

maybe the interpreter is just ignoring those extern declarations on things like USize.add

Sebastian Ullrich (Apr 05 2022 at 09:08):

By compiling it first, into a binary named lean :)

Sebastian Ullrich (Apr 05 2022 at 09:09):

That is, you would need https://github.com/leanprover/lake/pull/47 to make it work in a different module

Mario Carneiro (Apr 05 2022 at 09:10):

BTW @James Gallicchio I'm also wondering whether these functions can instead be exposed as regular functions on Array instead of a new type

James Gallicchio (Apr 05 2022 at 15:41):

oh definitely -- the goal here was to sidestep Arrayentirely, and reimplement dynamic arrays in Lean

James Gallicchio (Apr 05 2022 at 15:43):

Mario Carneiro said:

Have you considered using a variation on Fin n that uses USize instead of Nat?

That's a good idea. {u : USize // u < n}. I'll think about it.

James Gallicchio (Apr 05 2022 at 15:53):

James Gallicchio said:

I wanted the getExternal to be a constant rather than a definition, but it can't automatically generate an Inhabited instance which made everything sad.

Bumping this -- is there a way to give Lean an inhabitant without calling that inhabitant the definition?

Jannis Limperg (Apr 05 2022 at 16:12):

private unsafe def getExternal [Inhabited α] ... : α

If you have an arbitrary type α, that's the only way to ensure inhabitation. But I think your primitive should rather take a proof that the index is in bounds and then you can give a reference implementation (in terms of lists) that doesn't need an Inhabited constraint.

James Gallicchio (Apr 05 2022 at 16:27):

Yeah, that makes sense. I guess this is the tradeoff we make to avoid soundness issues :/

James Gallicchio (Apr 05 2022 at 16:29):

It just feels .. slightly not ideal that, in Lean world, it "looks" super cheap to go from the list representation to the array representation, when in reality that is a very expensive operation

Mario Carneiro (Apr 05 2022 at 16:29):

I thought you didn't need Inhabited if the constant is unsafe

Mario Carneiro (Apr 05 2022 at 16:34):

go figure.

@[extern "foo"] unsafe constant something : α
-- failed to synthesize
--   Inhabited α

@[extern "foo"] unsafe constant loop : α := loop -- ok

Arthur Paulino (Apr 05 2022 at 17:15):

Mario Carneiro said:

@[extern "foo"] unsafe constant something : α
-- failed to synthesize
--   Inhabited α

I think that's a small bug

Leonardo de Moura (Apr 05 2022 at 18:29):

Yes, we can clearly skip the Inhabitant synthesis code for unsafe. The current implementation always executes the synthesis code.

Leonardo de Moura (Apr 05 2022 at 18:57):

@James Gallicchio Sorry for the delay. I had to take the day off yesterday.

Right now the behaviors just don't match up, which seems like a recipe for issues

The C and Lean behavior should match. Are you observing a mismatch in a particular example?

Maybe this does actually mod?

It should. We may have a bug in the current implementation, but our intention was to "mod" in C too.

James Gallicchio (Apr 05 2022 at 19:05):

Ah, no, I was reading the wrong thing there, sorry! It looks like it has the mod behavior in both C and Lean

Tomas Skrivan (Apr 05 2022 at 19:58):

What would be the best way to implement Fin n with USize? My main concern is: should n : USize or n : Nat or it does not matter?

My attempt:

structure UFin (n : USize) where
  val : USize
  property : val < n

@[export ufin_add]
def add {n} (x y : UFin n) : UFin n := ⟨(x.1 + y.1) % n, sorry
@[export ufin_sub]
def sub {n} (x y : UFin n) : UFin n := ⟨(x.1 + (n - y.1)) % n, sorry
@[export ufin_mul]
def mul {n} (x y : UFin n) : UFin n := ⟨(x.1 * y.1) % n, sorry
@[export ufin_div]
def div {n} (x y : UFin n) : UFin n := ⟨(x.1 / y.1), sorry

instance : Add (UFin n) := add
instance : Sub (UFin n) := sub
instance : Mul (UFin n) := mul
instance : Div (UFin n) := div

instance : ToString (UFin n) := λ n => toString n.1
instance : OfNat (UFin m) n := Nat.toUSize n % m, sorry

#eval (3 : UFin 5) - (4 : UFin 5)
#eval (3 : UFin 5) + (4 : UFin 5)

Example of the generated C code:

LEAN_EXPORT size_t ufin_sub(size_t x_1, size_t x_2, size_t x_3) {
_start:
{
size_t x_4; size_t x_5; size_t x_6;
x_4 = lean_usize_sub(x_1, x_3);
x_5 = lean_usize_add(x_2, x_4);
x_6 = lean_usize_mod(x_5, x_1);
return x_6;
}
}

With n : Nat the generated code is:

LEAN_EXPORT size_t ufin_sub(lean_object* x_1, size_t x_2, size_t x_3) {
_start:
{
size_t x_4; size_t x_5; size_t x_6; size_t x_7;
x_4 = lean_usize_of_nat(x_1);
x_5 = lean_usize_sub(x_4, x_3);
x_6 = lean_usize_add(x_2, x_5);
x_7 = lean_usize_modn(x_6, x_1);
lean_dec(x_1);
return x_7;
}
}

You probably want to avoid the reference counting lean_dec(x_1) and lean_usize_modn that unboxes Nat inside.

Tomas Skrivan (Apr 05 2022 at 19:59):

Also not sure if I got those operations right for n > sqrt(USize.size).

Sebastian Ullrich (Apr 05 2022 at 20:04):

If you care about performance, you definitely want to avoid that modulo (by directly proving val < n for each index access, most probably)

Tomas Skrivan (Apr 05 2022 at 20:08):

Right, so having x y : UFin n you would not write x + y but ⟨x.1 + y.1, ...⟩ where you provide a custom proof.

Mario Carneiro (Apr 05 2022 at 20:19):

it doesn't make much sense to add two array indexes anyway

Mario Carneiro (Apr 05 2022 at 20:21):

I would imagine that in user code you wouldn't have a UFin n in the first place if you are doing anything other than handling this index opaquely; you would just have a USize and combine this with the proof of val < n at the last minute when calling get

Tomas Skrivan (Apr 05 2022 at 20:33):

I'm quite often writing code like λ i => if (i=0) ∨ (i=(n-1)) then 0 else u[i+1] - u[i-1] where the type is automatically deduced to Fin n where n is the size of array u.

Mario Carneiro (Apr 05 2022 at 20:40):

there are redundant bounds checks in that expression though

James Gallicchio (Apr 06 2022 at 17:27):

I think that the built-in Array in Lean does the right thing here, where arrays use Nat for indexing (because it's way easier that way), but behind the scenes we are making (technically unsafe) assumptions that everything fits in USize because otherwise you'd have already hit an out-of-memory error

Mario Carneiro (Apr 07 2022 at 00:53):

the problem is that it impacts the ABI of the function if you use Nat - you don't get unboxed values anymore. This applies to both extern functions like get and also lean functions that just call each other normally: in addition to the cost of boxing and unboxing this is an optimization barrier, and you need a whole lot of things to go right for lean loops to look like C loops and trigger LLVM loop optimizations

James Gallicchio (Apr 07 2022 at 18:14):

Yeah. Maybe instead I implement the interface directly on USize and then provide a separate interface for using Nats with it

James Gallicchio (Apr 08 2022 at 11:35):

So, I'm revamping my implementation to work around some boxing costs with ArrayBuffer as defined.

I need a type whose ABI representation could be any arbitrary value (not just leanbox(0)). Does something like True serve that purpose? Its value per the ABI would be leanbox(0), but does it have to be? or can it be any arbitrary value of the right width?

James Gallicchio (Apr 08 2022 at 11:38):

If not -- is there a way to mark a recursor noncomputable?

inductive Uninit (α)
| uninit : Uninit α
| init : (α)  Uninit α

attribute [noncomputable] Uninit.rec

This doesn't work because noncomputable is not an attribute. But it would definitely suffice my use-case, since it is not possible to distinguish the two constructors in any produced program.

James Gallicchio (Apr 08 2022 at 11:40):

I could also go the extern route, but if I conjecture an Uninit type by constant then I can't conjecture values of that type since it can't prove the type is inhabited...

James Gallicchio (Apr 08 2022 at 11:47):

Having an Uninit type would be useful for other FFIs as well; it's nice to say at the type-level that some value is junk, instead of having to ensure everything is known everywhere

James Gallicchio (Apr 09 2022 at 13:09):

Also, should allocations for external objects go through lean_alloc or directly call malloc? And if I do use lean_alloc I assume the finalize function should just calllean_dealloc?

James Gallicchio (Apr 09 2022 at 13:25):

After poking around the codebase, I think my real question is why the small object allocator exists

Sebastian Ullrich (Apr 09 2022 at 13:27):

To make Lean go fast

James Gallicchio (Apr 09 2022 at 13:29):

I kinda assumed most malloc implementations already had paging for small allocations but that must be a wrong assumption

Sebastian Ullrich (Apr 09 2022 at 13:33):

mimalloc would not have been so successful then. The Lean allocator is very similar to it.

Sebastian Ullrich (Apr 09 2022 at 13:33):

So yes, Lean objects should always allocated with lean_alloc so that Lean uses the correct deallocator on them.

James Gallicchio (Apr 09 2022 at 13:56):

Actually, I think I am not able to use the lean allocator right now because I need a realloc. For now I'll just use malloc, might try my hand at implementing a realloc on Lean's allocator later

Sebastian Ullrich (Apr 09 2022 at 14:06):

I assume you are not talking about the lean_external_object itself then, those should have a fixed size I believe

James Gallicchio (Apr 09 2022 at 15:09):

Oh, yeah -- I'm using lean_alloc_external for the external object itself. But I wasn't sure if I should also use lean_alloc for the array allocation

Sebastian Ullrich (Apr 09 2022 at 16:56):

It's just a void * to Lean, you are responsible for it

James Gallicchio (Apr 09 2022 at 18:52):

James Gallicchio said:

I need a type whose ABI representation could be any arbitrary value (not just leanbox(0)). Does something like () serve that purpose? Its value per the ABI would be leanbox(0), but does it have to be?

RE: this, I think it just depends on whether Unit.rec actually checks its value, no? I know the compiler optimizes away matches with only one live branch, so I assume that's true here as well and Unit.rec doesn't check the value?


Last updated: Dec 20 2023 at 11:08 UTC