Zulip Chat Archive
Stream: lean4
Topic: FFI, Rust and mutability
Arthur Paulino (Feb 19 2025 at 23:10):
Hi all,
I am creating Lean bindings for a Rust library whose main type works on the basis of mutation and doesn't implement Clone
. Thus I can't create deep copies of these objects.
So the behavior I would naively get is as if by doing let set' := set.insert 42
I would end up with set' == set
. That is, I mutated set
:double_exclamation:
I know I am crawling into cursed territory here. But how would you model this at the Lean API level in order to prevent catastrophes?
I have a working solution right now, but I don't want to spoil it and cause unwanted bias. I'm looking for new approaches.
Thanks in advance!
Aaron Liu (Feb 19 2025 at 23:13):
I would treat it like a file, which also doesn't have Clone
.
Arthur Paulino (Feb 19 2025 at 23:16):
What does treating it like a file look like? What do you mean, more specifically?
Aaron Liu (Feb 19 2025 at 23:20):
When accessing a file, you open a file handler, and then do stuff by calling methods on the file handler, and then when you are done, you close the file handler. Everything is through the file handler.
So, I guess what I mean is, use some kind of handler which will be your Lean object that can be duplicated.
Arthur Paulino (Feb 19 2025 at 23:23):
Like, you'd accumulate the operations in an array and then apply them all at once?
Aaron Liu (Feb 19 2025 at 23:25):
Put the operations in a monad, it what I meant, but that might work too.
Arthur Paulino (Feb 19 2025 at 23:27):
Hm, there's a caveat. These Rust functions that mutate the original object return things that I need in order to perform the next operations. So the accumulation approach won't work.
The monad approach follows the path I chose. Maybe there's no escape from a monad.
Chris Wong (Feb 20 2025 at 00:07):
I believe the two viable paths are linear types or monad, and Lean doesn't have the former, so it looks like our hand is forced here.
Arthur Paulino (Feb 20 2025 at 23:26):
I found a much simpler solution: return a tuple as if the function wouldn't perform inplace-mutation, keep this function private and then expose a macro that does the rewrite automatically :magic_wand:
No need for a monad. And the resulting (macro-expanded) code can be formally verified.
/--
**Mutates** the `ConstraintSystemBuilder` and returns a `ChannelId` and the
mutated `ConstraintSystemBuilder`.
-/
@[extern "c_constraint_system_builder_add_channel"]
private opaque addChannel : ConstraintSystemBuilder → ChannelId × ConstraintSystemBuilder
/-- Simulates inplace mutation with a syntactical overwrite. -/
macro "let" ch:ident "≪" "add_channel" csb:ident rst:term : term =>
`(let ($ch, $csb) := addChannel $csb; $rst)
And then:
let a ≪ add_channel builder
I'll be following this pattern to deal with Rust APIs, which are often based on mutation.
Eric Wieser (Feb 20 2025 at 23:36):
Can you share a toy example?
Arthur Paulino (Feb 20 2025 at 23:43):
Sure
/-- Imagine this function mutates the input `Nat` -/
private opaque cursedFunction : Nat → Bool × Nat
macro "let" bool:ident "≪" "cursed_function" nat:ident rst:term : term =>
`(let ($bool, $nat) := cursedFunction $nat; $rst)
#check
let someNat := 42
let b ≪ cursed_function someNat
b
Aaron Liu (Feb 20 2025 at 23:48):
What happens if you try to use someNat
after?
Arthur Paulino (Feb 20 2025 at 23:50):
It will contain the mutated value, which is expected, given the syntactical rewrite (which in fact did nothing because someNat
was already mutated)
Arthur Paulino (Feb 20 2025 at 23:52):
What I want to prevent is:
let someNat := 42
let (x, y) := cursedFunction someNat
let (p, q) := cursedFunction someNat
Such that (x, y) != (p, q)
Arthur Paulino (Feb 20 2025 at 23:57):
You may ask if rewriting is indeed necessary. In my experiments, it's necessary otherwise Lean will cache the call to the cursedFunction and simulate idempotency behind the curtains. And also, it keeps the resulting expanded code healthy, as if cursedFunction
wasn't cursed at all
Arthur Paulino (Feb 21 2025 at 00:01):
And here's a trick to satisfy the unused variable linter:
/-- Imagine this function mutates the input `Nat` -/
private opaque cursedFunction : Nat → Bool × Nat
macro "let" bool:ident "≪" "cursed_function" nat:ident rst:term : term =>
`(let ($bool, $nat) := cursedFunction $nat; $rst)
macro "let" bool:ident "≪" "cursed_function" nat:ident "." rst:term : term =>
`(let ($bool, $nat) := cursedFunction $nat; let _ := $nat; $rst)
#check
let someNat := 42
let b₁ ≪ cursed_function someNat
let b₂ ≪ cursed_function someNat. -- no "unused variable `someNat`"!
b₂ && b₁
-- let someNat := 42;
-- match cursedFunction someNat with
-- | (b₁, someNat) =>
-- match cursedFunction someNat with
-- | (b₂, someNat) =>
-- let x := someNat;
-- b₂ && b₁ : Bool
Aaron Liu (Feb 21 2025 at 00:03):
/-- Imagine this function mutates the input `Nat` -/
private opaque cursedFunction : Nat → Bool × Nat
macro "let" bool:ident "≪" "cursed_function" nat:ident rst:term : term =>
`(let ($bool, $nat) := cursedFunction $nat; $rst)
#check
let someNat := 42
let b ≪ cursed_function someNat
by
clear someNat
exact someNat
Arthur Paulino (Feb 21 2025 at 00:10):
That's interesting. If you really want you can break the implementation. I could implement an elaborator to erase the previous value completely, but the current macro is enough for me not to shoot my own foot
Mario Carneiro (Feb 21 2025 at 00:27):
I think the better way to implement something like this if you really need to manage a linear object in lean is to wrap the external object in a lean_external_object
which keeps track of a generation id (one immutable one per handle and one which is bumped every time you use a mutating method) in addition to the linear resource, and if you attempt to use a handle that is not current you panic and/or do nothing
Mario Carneiro (Feb 21 2025 at 00:28):
then the opaque
you expose to lean code will actually take and return a handle
Mario Carneiro (Feb 21 2025 at 00:30):
if copying is possible but expensive you can also do what most objects in lean do and do the copy if the handle is shared when doing the mutating operation
Mario Carneiro (Feb 21 2025 at 00:30):
that way you don't have to panic but instead you "just" have a performance cliff
Mario Carneiro (Feb 21 2025 at 00:34):
Another way to implement sound linear values in lean, without generation ids, is for a handle to basically be a Option<Box<T>>
, i.e. owning the linear resource or being null. Then when you perform a mutating operation you take the value out of the old handle and put it in the new one to be returned, and uses of the newly null out of date handle panic and/or do nothing
Arthur Paulino (Feb 21 2025 at 00:39):
My first ideas involved this panicking approach if the code wasn't right. But I would rather have some solution in which you can't get the code wrong (or you need to try slightly harder in order to do it)
Mario Carneiro (Feb 21 2025 at 00:39):
you can just use a state monad and don't provide any rollback function
Mario Carneiro (Feb 21 2025 at 00:40):
that's basically what IO
does
Mario Carneiro (Feb 21 2025 at 00:40):
your let
macro is just reinventing do notation and I'm not sure why
Mario Carneiro (Feb 21 2025 at 00:41):
the point of the panics is not just to avoid getting the code wrong, it's to make it sound to embed in lean at all
Mario Carneiro (Feb 21 2025 at 00:41):
mutating the input is unsound behavior
Mario Carneiro (Feb 21 2025 at 00:42):
but you can still treat the old values as if they still exist for logical reasoning purposes; the panics are there to ensure you can never observe the difference
Mario Carneiro (Feb 21 2025 at 00:44):
so the opaques are state-threading pure functions and you can axiomatize these if you want to reason about them, and the monad is there to guide people to correct usage of the functions when actually writing code, in the style of your macro
Aaron Liu (Feb 21 2025 at 00:48):
Have your working environment be a monad that can create, modify, and destroy instances of your type. Then, have all your accessors return a value wrapped in your monad. That way you never have direct access to this type that cannot be duplicated. Since you don't ever have it directly, you can never duplicate it. You can probably even do this without opaque
by writing a coinductive definition for your monad and all the accessors.
Arthur Paulino (Feb 21 2025 at 00:49):
My previous solution was indeed a stateful monad. But this macro is more ergonomic.
Mario's input made me reconsider the panics. The best approach is probably a mix of the two
Arthur Paulino (Feb 21 2025 at 01:04):
Mario Carneiro said:
I think the better way to implement something like this if you really need to manage a linear object in lean is to wrap the external object in a
lean_external_object
which keeps track of a generation id (one immutable one per handle and one which is bumped every time you use a mutating method) in addition to the linear resource, and if you attempt to use a handle that is not current you panic and/or do nothing
Is there a place containing some documentation on these IDs?
Chris Wong (Feb 21 2025 at 01:39):
It might be my Haskell mindset leaking, but I don't think the ergonomic cost of monad notation is that high. Especially given that the Lean compiler itself is built around it.
Arthur Paulino (Feb 21 2025 at 02:00):
Monad notation is fine. Having to be in a monad... not as cool as not having to
Chris Wong (Feb 21 2025 at 03:10):
The generation idea seems clever, but if I'm reading it correctly, it might be less safe. You could mutate the object 2^64 times and then the counter would overflow. Whereas with a refcount you would need to deliberately leak memory to break it.
Arthur Paulino (Feb 21 2025 at 09:02):
You can also panic if the incremented counter is zero.
If you check lean.h, there are several other ways Lean panics if some variables reach a certain value.
Chris Wong (Feb 21 2025 at 09:20):
Yeah, but my point is, it would be reasonable to mutate something 2^64 times :stuck_out_tongue:
Mario Carneiro (Feb 23 2025 at 17:31):
if you try to mutate something 2^64 times your computer will break
Mario Carneiro (Feb 23 2025 at 17:32):
keep in mind that there are similar caveats about types like Nat
, which claim to be infinite but I can clearly cause my computer to crash by evaluating graham's number
Mario Carneiro (Feb 23 2025 at 17:33):
lean's type theory is not sound wr.t. liveness of produced programs because of modeling decisions like this
Niels Voss (Feb 24 2025 at 03:34):
Assuming you mutated a variable 1 billion times per second it would take >500 years for the counter to overflow
Niels Voss (Feb 24 2025 at 04:09):
A while back when this thread was discussed #lean4 > Semantics of IO don't match provable properties? I was trying to think of if there was a way to get Linear Types into Lean without having to bolt on an extra typechecker like in Ground Zero (they have to write hott def
instead of def
to avoid inconsistencies). I looked up how Linear Haskell worked, and apparently, there's a new function type A #-> B
which is like A -> B
but only uses a
once.
Do you think such a #->
could be created using opaque trickery? I was thinking that #->
could be an opaque type that is equal at runtime to ->
known to satisfy the following axioms:
- A function
f : A #-> B
can be converted to a functiong : A -> B
- Given
f : A #-> B
andg : B #-> C
, there is a functiongf : A #-> C
, and this operation is associative. - If a type
A
in Lean has an instance of theCloneable
typeclass, which would be automatically derived for all inductive types except those that contain "outside" types likeIO.RealWorld
, a functionf : A -> B
can be converted to a functionf : A #-> B
- Given
f : A #-> B x C
andg : B #-> D
andh : C #-> D
, we can get a function of typeA #-> D
- Certain opaque functions are known to be linear, like we would have a
opaque IO.println (s : String) : IO.RealWorld #-> IO.RealWorld
. - Certain external functions only accept linear functions as inputs. (Without this, there would be no reason to ever use linear functions).
If we have this, we can then define a LStateM σ α := σ #-> α × σ
. This also gets us a LIO
version of the IO
monad which respects the proving semantics.
This would let "bad" functions which use a variable twice (or have time-traveling tokens like in the IO thread) be defined but prevent them from ever actually being run.
Again, I have put very little thought into this, so I apologize if this is neither feasible nor useful.
Arthur Paulino (Feb 24 2025 at 15:29):
I was able to simplify the mutation counts. In fact, we don't even need to count mutations. This is enough
typedef struct {
bool outdated;
/* A reference to the underlying mutable object */
void *object_ref;
/* A pointer to a function that can free `object_ref` */
void (*finalizer)(void *);
} linear_object;
finalizer
calls a Rust function that triggers the default drop
routine.
When the Rust object mutates, we do
static inline void mark_outdated(linear_object *linear) {
linear->outdated = true;
}
If we want to create a new reference, we do
static inline linear_object *linear_bump(linear_object *linear) {
mark_outdated(linear);
return linear_object_init(linear->object_ref, linear->finalizer);
}
Before using the object, we need to
static inline void assert_linearity(linear_object *linear) {
if (LEAN_UNLIKELY(linear->outdated)) {
lean_internal_panic("Non-linear usage of linear object");
}
}
And to free the object from memory:
static inline void free_linear_object(linear_object *linear) {
// Only finalize `object_ref` if `linear` is the latest linear object reference.
// By doing this, we avoid double-free attempts.
if (LEAN_UNLIKELY(!linear->outdated)) {
linear->finalizer(linear->object_ref);
}
free(linear);
}
As per the higher level guardrails in Lean itself, I've become unsatisfied with all solutions that don't involve core support for linear types.
Arthur Paulino (Feb 24 2025 at 15:43):
The solution involving syntactical manipulation via macros is the one with the least bad UX I've found so far, as there's no need to define things in monads. The monad approach becomes especially annoying when you need to do things in another monad, which then would need some kind of lifting.
Arthur Paulino (Feb 24 2025 at 15:45):
Is there a tactic that clears all "dead" binders from my context? I mean those with daggers at the end of their names.
Mario Carneiro (Feb 24 2025 at 16:06):
Arthur Paulino said:
If we want to create a new reference, we do
static inline linear_object *linear_bump(linear_object *linear) { mark_outdated(linear); return linear_object_init(linear->object_ref, linear->finalizer); }
This one seems wrong, cloning a handle on the lean side shouldn't need anything to be marked as outdated. Think of it as lean having a Rc<T>
and you do Rc::from_mut
to optimistically get mutable access only when doing a mutating operation and Rc::clone
everywhere else
Arthur Paulino (Feb 24 2025 at 16:12):
Yeah I've gone back and forth on that one. At first I had separated functions but in the end everytime I wanted to call the second function I had to call the first one first. So I made a function that does just that.
linear_bump
is supposed to be called after the Rust object is mutated, as the old reference needs to be invalidated.
Arthur Paulino (Feb 24 2025 at 16:19):
These are the cases:
- The Rust function needs
&mut self
: the C code does alinear_bump
- The Rust function takes ownership of
self
: a wrapper Rust function callsstd::mem::take
and the C code just callsmark_outdated
, not generating a new valid reference for Lean - The Rust function takes a shared reference
&self
: this is fine and the C code doesn't need to do anything so the Lean reference is healthy
Mario Carneiro (Feb 24 2025 at 17:05):
is there a reason you can't have outdated
double up with object_ref
and make it a nullable pointer?
Mario Carneiro (Feb 24 2025 at 17:06):
Do outdated objects still get finalized?
Arthur Paulino (Feb 24 2025 at 17:22):
Mario Carneiro said:
is there a reason you can't have
outdated
double up withobject_ref
and make it a nullable pointer?
Sorry, I don't follow the idea
Mario Carneiro said:
Do outdated objects still get finalized?
Yeah, they just don't get to free object_ref
Arthur Paulino (Feb 24 2025 at 17:42):
Mario Carneiro said:
is there a reason you can't have
outdated
double up withobject_ref
and make it a nullable pointer?
Oh I see now. Instead of using a bool
to keep track of the data validity, we can just use objcet_ref
and make it point to NULL
to invalidate the linear_object
. That's even better. Thanks!
Arthur Paulino (Mar 11 2025 at 00:14):
Is there a way to mark an opaque
whose definition is extern
(in C) to not cache at runtime? I need the C code to always run
Tomas Skrivan (Mar 11 2025 at 01:19):
Maybe something similar to what dbg_trace
is doing?
@[never_extract, extern "lean_dbg_trace"]
def dbgTrace {α : Type u} (s : String) (f : Unit → α) : α := f ()
I'm just guessing here though.
Arthur Paulino (Mar 11 2025 at 01:22):
Whoa, you just hit the nail on the head. Thank you!!
Last updated: May 02 2025 at 03:31 UTC