Zulip Chat Archive

Stream: lean4

Topic: FFI, Rust and mutability


Arthur Paulino (Feb 19 2025 at 23:10):

Hi all,

I am creating Lean bindings for a Rust library whose main type works on the basis of mutation and doesn't implement Clone. Thus I can't create deep copies of these objects.

So the behavior I would naively get is as if by doing let set' := set.insert 42 I would end up with set' == set. That is, I mutated set:double_exclamation:

I know I am crawling into cursed territory here. But how would you model this at the Lean API level in order to prevent catastrophes?

I have a working solution right now, but I don't want to spoil it and cause unwanted bias. I'm looking for new approaches.

Thanks in advance!

Aaron Liu (Feb 19 2025 at 23:13):

I would treat it like a file, which also doesn't have Clone.

Arthur Paulino (Feb 19 2025 at 23:16):

What does treating it like a file look like? What do you mean, more specifically?

Aaron Liu (Feb 19 2025 at 23:20):

When accessing a file, you open a file handler, and then do stuff by calling methods on the file handler, and then when you are done, you close the file handler. Everything is through the file handler.

So, I guess what I mean is, use some kind of handler which will be your Lean object that can be duplicated.

Arthur Paulino (Feb 19 2025 at 23:23):

Like, you'd accumulate the operations in an array and then apply them all at once?

Aaron Liu (Feb 19 2025 at 23:25):

Put the operations in a monad, it what I meant, but that might work too.

Arthur Paulino (Feb 19 2025 at 23:27):

Hm, there's a caveat. These Rust functions that mutate the original object return things that I need in order to perform the next operations. So the accumulation approach won't work.

The monad approach follows the path I chose. Maybe there's no escape from a monad.

Chris Wong (Feb 20 2025 at 00:07):

I believe the two viable paths are linear types or monad, and Lean doesn't have the former, so it looks like our hand is forced here.

Arthur Paulino (Feb 20 2025 at 23:26):

I found a much simpler solution: return a tuple as if the function wouldn't perform inplace-mutation, keep this function private and then expose a macro that does the rewrite automatically :magic_wand:

No need for a monad. And the resulting (macro-expanded) code can be formally verified.

/--
**Mutates** the `ConstraintSystemBuilder` and returns a `ChannelId` and the
mutated `ConstraintSystemBuilder`.
-/
@[extern "c_constraint_system_builder_add_channel"]
private opaque addChannel : ConstraintSystemBuilder  ChannelId × ConstraintSystemBuilder

/-- Simulates inplace mutation with a syntactical overwrite. -/
macro "let" ch:ident "≪" "add_channel" csb:ident rst:term : term =>
  `(let ($ch, $csb) := addChannel $csb; $rst)

And then:

let a  add_channel builder

I'll be following this pattern to deal with Rust APIs, which are often based on mutation.

Eric Wieser (Feb 20 2025 at 23:36):

Can you share a toy example?

Arthur Paulino (Feb 20 2025 at 23:43):

Sure

/-- Imagine this function mutates the input `Nat` -/
private opaque cursedFunction : Nat  Bool × Nat

macro "let" bool:ident "≪" "cursed_function" nat:ident rst:term : term =>
  `(let ($bool, $nat) := cursedFunction $nat; $rst)

#check
  let someNat := 42
  let b  cursed_function someNat
  b

Aaron Liu (Feb 20 2025 at 23:48):

What happens if you try to use someNat after?

Arthur Paulino (Feb 20 2025 at 23:50):

It will contain the mutated value, which is expected, given the syntactical rewrite (which in fact did nothing because someNat was already mutated)

Arthur Paulino (Feb 20 2025 at 23:52):

What I want to prevent is:

let someNat := 42
let (x, y) := cursedFunction someNat
let (p, q) := cursedFunction someNat

Such that (x, y) != (p, q)

Arthur Paulino (Feb 20 2025 at 23:57):

You may ask if rewriting is indeed necessary. In my experiments, it's necessary otherwise Lean will cache the call to the cursedFunction and simulate idempotency behind the curtains. And also, it keeps the resulting expanded code healthy, as if cursedFunction wasn't cursed at all

Arthur Paulino (Feb 21 2025 at 00:01):

And here's a trick to satisfy the unused variable linter:

/-- Imagine this function mutates the input `Nat` -/
private opaque cursedFunction : Nat  Bool × Nat

macro "let" bool:ident "≪" "cursed_function" nat:ident rst:term : term =>
  `(let ($bool, $nat) := cursedFunction $nat; $rst)

macro "let" bool:ident "≪" "cursed_function" nat:ident "." rst:term : term =>
  `(let ($bool, $nat) := cursedFunction $nat; let _ := $nat; $rst)

#check
  let someNat := 42
  let b₁  cursed_function someNat
  let b₂  cursed_function someNat. -- no "unused variable `someNat`"!
  b₂ && b₁
-- let someNat := 42;
-- match cursedFunction someNat with
-- | (b₁, someNat) =>
--   match cursedFunction someNat with
--   | (b₂, someNat) =>
--     let x := someNat;
--     b₂ && b₁ : Bool

Aaron Liu (Feb 21 2025 at 00:03):

/-- Imagine this function mutates the input `Nat` -/
private opaque cursedFunction : Nat  Bool × Nat

macro "let" bool:ident "≪" "cursed_function" nat:ident rst:term : term =>
  `(let ($bool, $nat) := cursedFunction $nat; $rst)

#check
  let someNat := 42
  let b  cursed_function someNat
  by
    clear someNat
    exact someNat

Arthur Paulino (Feb 21 2025 at 00:10):

That's interesting. If you really want you can break the implementation. I could implement an elaborator to erase the previous value completely, but the current macro is enough for me not to shoot my own foot

Mario Carneiro (Feb 21 2025 at 00:27):

I think the better way to implement something like this if you really need to manage a linear object in lean is to wrap the external object in a lean_external_object which keeps track of a generation id (one immutable one per handle and one which is bumped every time you use a mutating method) in addition to the linear resource, and if you attempt to use a handle that is not current you panic and/or do nothing

Mario Carneiro (Feb 21 2025 at 00:28):

then the opaque you expose to lean code will actually take and return a handle

Mario Carneiro (Feb 21 2025 at 00:30):

if copying is possible but expensive you can also do what most objects in lean do and do the copy if the handle is shared when doing the mutating operation

Mario Carneiro (Feb 21 2025 at 00:30):

that way you don't have to panic but instead you "just" have a performance cliff

Mario Carneiro (Feb 21 2025 at 00:34):

Another way to implement sound linear values in lean, without generation ids, is for a handle to basically be a Option<Box<T>>, i.e. owning the linear resource or being null. Then when you perform a mutating operation you take the value out of the old handle and put it in the new one to be returned, and uses of the newly null out of date handle panic and/or do nothing

Arthur Paulino (Feb 21 2025 at 00:39):

My first ideas involved this panicking approach if the code wasn't right. But I would rather have some solution in which you can't get the code wrong (or you need to try slightly harder in order to do it)

Mario Carneiro (Feb 21 2025 at 00:39):

you can just use a state monad and don't provide any rollback function

Mario Carneiro (Feb 21 2025 at 00:40):

that's basically what IO does

Mario Carneiro (Feb 21 2025 at 00:40):

your let macro is just reinventing do notation and I'm not sure why

Mario Carneiro (Feb 21 2025 at 00:41):

the point of the panics is not just to avoid getting the code wrong, it's to make it sound to embed in lean at all

Mario Carneiro (Feb 21 2025 at 00:41):

mutating the input is unsound behavior

Mario Carneiro (Feb 21 2025 at 00:42):

but you can still treat the old values as if they still exist for logical reasoning purposes; the panics are there to ensure you can never observe the difference

Mario Carneiro (Feb 21 2025 at 00:44):

so the opaques are state-threading pure functions and you can axiomatize these if you want to reason about them, and the monad is there to guide people to correct usage of the functions when actually writing code, in the style of your macro

Aaron Liu (Feb 21 2025 at 00:48):

Have your working environment be a monad that can create, modify, and destroy instances of your type. Then, have all your accessors return a value wrapped in your monad. That way you never have direct access to this type that cannot be duplicated. Since you don't ever have it directly, you can never duplicate it. You can probably even do this without opaque by writing a coinductive definition for your monad and all the accessors.

Arthur Paulino (Feb 21 2025 at 00:49):

My previous solution was indeed a stateful monad. But this macro is more ergonomic.

Mario's input made me reconsider the panics. The best approach is probably a mix of the two

Arthur Paulino (Feb 21 2025 at 01:04):

Mario Carneiro said:

I think the better way to implement something like this if you really need to manage a linear object in lean is to wrap the external object in a lean_external_object which keeps track of a generation id (one immutable one per handle and one which is bumped every time you use a mutating method) in addition to the linear resource, and if you attempt to use a handle that is not current you panic and/or do nothing

Is there a place containing some documentation on these IDs?

Chris Wong (Feb 21 2025 at 01:39):

It might be my Haskell mindset leaking, but I don't think the ergonomic cost of monad notation is that high. Especially given that the Lean compiler itself is built around it.

Arthur Paulino (Feb 21 2025 at 02:00):

Monad notation is fine. Having to be in a monad... not as cool as not having to

Chris Wong (Feb 21 2025 at 03:10):

The generation idea seems clever, but if I'm reading it correctly, it might be less safe. You could mutate the object 2^64 times and then the counter would overflow. Whereas with a refcount you would need to deliberately leak memory to break it.

Arthur Paulino (Feb 21 2025 at 09:02):

You can also panic if the incremented counter is zero.

If you check lean.h, there are several other ways Lean panics if some variables reach a certain value.

Chris Wong (Feb 21 2025 at 09:20):

Yeah, but my point is, it would be reasonable to mutate something 2^64 times :stuck_out_tongue:

Mario Carneiro (Feb 23 2025 at 17:31):

if you try to mutate something 2^64 times your computer will break

Mario Carneiro (Feb 23 2025 at 17:32):

keep in mind that there are similar caveats about types like Nat, which claim to be infinite but I can clearly cause my computer to crash by evaluating graham's number

Mario Carneiro (Feb 23 2025 at 17:33):

lean's type theory is not sound wr.t. liveness of produced programs because of modeling decisions like this

Niels Voss (Feb 24 2025 at 03:34):

Assuming you mutated a variable 1 billion times per second it would take >500 years for the counter to overflow

Niels Voss (Feb 24 2025 at 04:09):

A while back when this thread was discussed #lean4 > Semantics of IO don't match provable properties? I was trying to think of if there was a way to get Linear Types into Lean without having to bolt on an extra typechecker like in Ground Zero (they have to write hott def instead of def to avoid inconsistencies). I looked up how Linear Haskell worked, and apparently, there's a new function type A #-> B which is like A -> B but only uses a once.

Do you think such a #-> could be created using opaque trickery? I was thinking that #-> could be an opaque type that is equal at runtime to -> known to satisfy the following axioms:

  • A function f : A #-> B can be converted to a function g : A -> B
  • Given f : A #-> B and g : B #-> C, there is a function gf : A #-> C, and this operation is associative.
  • If a type A in Lean has an instance of the Cloneable typeclass, which would be automatically derived for all inductive types except those that contain "outside" types like IO.RealWorld, a function f : A -> B can be converted to a function f : A #-> B
  • Given f : A #-> B x C and g : B #-> D and h : C #-> D, we can get a function of type A #-> D
  • Certain opaque functions are known to be linear, like we would have a opaque IO.println (s : String) : IO.RealWorld #-> IO.RealWorld.
  • Certain external functions only accept linear functions as inputs. (Without this, there would be no reason to ever use linear functions).

If we have this, we can then define a LStateM σ α := σ #-> α × σ. This also gets us a LIO version of the IO monad which respects the proving semantics.

This would let "bad" functions which use a variable twice (or have time-traveling tokens like in the IO thread) be defined but prevent them from ever actually being run.

Again, I have put very little thought into this, so I apologize if this is neither feasible nor useful.

Arthur Paulino (Feb 24 2025 at 15:29):

I was able to simplify the mutation counts. In fact, we don't even need to count mutations. This is enough

typedef struct {
    bool outdated;
    /* A reference to the underlying mutable object */
    void *object_ref;
    /* A pointer to a function that can free `object_ref` */
    void (*finalizer)(void *);
} linear_object;

finalizer calls a Rust function that triggers the default drop routine.

When the Rust object mutates, we do

static inline void mark_outdated(linear_object *linear) {
    linear->outdated = true;
}

If we want to create a new reference, we do

static inline linear_object *linear_bump(linear_object *linear) {
    mark_outdated(linear);
    return linear_object_init(linear->object_ref, linear->finalizer);
}

Before using the object, we need to

static inline void assert_linearity(linear_object *linear) {
    if (LEAN_UNLIKELY(linear->outdated)) {
        lean_internal_panic("Non-linear usage of linear object");
    }
}

And to free the object from memory:

static inline void free_linear_object(linear_object *linear) {
    // Only finalize `object_ref` if `linear` is the latest linear object reference.
    // By doing this, we avoid double-free attempts.
    if (LEAN_UNLIKELY(!linear->outdated)) {
        linear->finalizer(linear->object_ref);
    }
    free(linear);
}

As per the higher level guardrails in Lean itself, I've become unsatisfied with all solutions that don't involve core support for linear types.

Arthur Paulino (Feb 24 2025 at 15:43):

The solution involving syntactical manipulation via macros is the one with the least bad UX I've found so far, as there's no need to define things in monads. The monad approach becomes especially annoying when you need to do things in another monad, which then would need some kind of lifting.

Arthur Paulino (Feb 24 2025 at 15:45):

Is there a tactic that clears all "dead" binders from my context? I mean those with daggers at the end of their names.

Mario Carneiro (Feb 24 2025 at 16:06):

Arthur Paulino said:

If we want to create a new reference, we do

static inline linear_object *linear_bump(linear_object *linear) {
    mark_outdated(linear);
    return linear_object_init(linear->object_ref, linear->finalizer);
}

This one seems wrong, cloning a handle on the lean side shouldn't need anything to be marked as outdated. Think of it as lean having a Rc<T> and you do Rc::from_mut to optimistically get mutable access only when doing a mutating operation and Rc::clone everywhere else

Arthur Paulino (Feb 24 2025 at 16:12):

Yeah I've gone back and forth on that one. At first I had separated functions but in the end everytime I wanted to call the second function I had to call the first one first. So I made a function that does just that.

linear_bump is supposed to be called after the Rust object is mutated, as the old reference needs to be invalidated.

Arthur Paulino (Feb 24 2025 at 16:19):

These are the cases:

  • The Rust function needs &mut self: the C code does a linear_bump
  • The Rust function takes ownership of self: a wrapper Rust function calls std::mem::take and the C code just calls mark_outdated, not generating a new valid reference for Lean
  • The Rust function takes a shared reference &self: this is fine and the C code doesn't need to do anything so the Lean reference is healthy

Mario Carneiro (Feb 24 2025 at 17:05):

is there a reason you can't have outdated double up with object_ref and make it a nullable pointer?

Mario Carneiro (Feb 24 2025 at 17:06):

Do outdated objects still get finalized?

Arthur Paulino (Feb 24 2025 at 17:22):

Mario Carneiro said:

is there a reason you can't have outdated double up with object_ref and make it a nullable pointer?

Sorry, I don't follow the idea
Mario Carneiro said:

Do outdated objects still get finalized?

Yeah, they just don't get to free object_ref

Arthur Paulino (Feb 24 2025 at 17:42):

Mario Carneiro said:

is there a reason you can't have outdated double up with object_ref and make it a nullable pointer?

Oh I see now. Instead of using a bool to keep track of the data validity, we can just use objcet_ref and make it point to NULL to invalidate the linear_object. That's even better. Thanks!

Arthur Paulino (Mar 11 2025 at 00:14):

Is there a way to mark an opaque whose definition is extern (in C) to not cache at runtime? I need the C code to always run

Tomas Skrivan (Mar 11 2025 at 01:19):

Maybe something similar to what dbg_trace is doing?

@[never_extract, extern "lean_dbg_trace"]
def dbgTrace {α : Type u} (s : String) (f : Unit  α) : α := f ()

I'm just guessing here though.

Arthur Paulino (Mar 11 2025 at 01:22):

Whoa, you just hit the nail on the head. Thank you!!


Last updated: May 02 2025 at 03:31 UTC