Zulip Chat Archive

Stream: lean4 dev

Topic: Proposal: Customizable runtime representations

Parth Shastri (Jun 26 2025 at 17:21):

The Lean core has a handful of types that are equivalent to existing types but have alternate runtime representations. For example, String is equivalent to List Char but has an optimized runtime representation. This feature is implemented by special single-field structures where the constructor and projection are overridden to convert to and from the separate runtime representation. I propose that this feature be extended to users (e.g. library authors).

Currently, the only way to express this is to use an opaque type and opaque functions to go between the opaque type and the existing type. For the structure trick used by the core types, the reduction and eta laws make it so that round tripping is definitionally the identity (e.g. String.data ∘ .mk = id and .mk ∘ String.data = id hold by rfl). However, when using opaque types, the round trip can only be a propositional equality. While it is possible to work with this limitations for some types, it becomes extremely cumbersome when there are dependent functions out of the opaque type.
For example, consider an opaque MySigma type that has the same operations as Sigma α β (mk, fst, snd). While we may have the propositional equality fst (mk x y) = x, we cannot have the propositional equality snd (mk x y) = y since the lhs has type β (fst (mk x y)) while the rhs has type β x. Thus, we are forced to either cast or use HEq instead, which makes it harder to work with MySigma. The problem can become even worse if we use a more complicated type.

I see two approaches to remedy this problem. One is to support overriding the constructor and projection for arbitrary structures, which would require that the compiler check for the presence of overrides instead of only special casing the builtin types.
The other approach is to have a single structure that allows for swapping out the representation like

structure Encoded (eqv : α ≃ β) where
  encode ::
  decode : α

The same way that the compiler recognizes that the runtime representation of String should be string objects and not List Char, the compiler would treat the runtime representation of Encoded as β instead of α, using the functions in the equivalence as the implementations of encode and decode. Using an opaque β and the equivalence implemented by FFI, Encoded would allow custom runtime representations while having encode and decode definitionally round trip to the identity, solving the issue that arises when using opaque by itself.

Mac Malone (Jun 26 2025 at 17:25):

I believe this is lean4#2292.

Eric Wieser (Jun 26 2025 at 17:43):

Is this already possible to do today with @[implemented_by] on the constructor and projection?

Parth Shastri (Jun 26 2025 at 17:45):

Ah I didn't see that when I searched for existing discussions. In my testing, overriding fields in structures with more than one field also doesn't work, as the compiler emits lean_ctor_get even when the projections are overridden, so it isn't possible today.

Robin Arnez (Jun 26 2025 at 17:46):

There's a fun exception, which is that if you use MutQuot as a name instead of Encoded it will not do trivial structure optimizations

Robin Arnez (Jun 26 2025 at 17:47):

Only works on the old compiler though

Robin Arnez (Jun 26 2025 at 17:57):

but yeah the new compiler will definitely be able support this better, although I'd argue @[override_repr β] would be better than @[opaque_repr], e.g. because of unboxing

Parth Shastri (Jun 26 2025 at 17:59):

lean4#2292 does seem sufficient to write Encoded as it does more than just inhibit the trivial structure optimization, as the compiler does not normally seem to respect implemented_by or extern on projections. I don't understand the motivation behind inhibiting the (mk x).val = x optimization.

Robin Arnez (Jun 26 2025 at 18:03):

Parth Shastri schrieb:

motivation behind inhibiting the (mk x).val = x optimization

Which optimization are you referring to?

Robin Arnez (Jun 26 2025 at 18:06):

Parth Shastri schrieb:

does seem sufficient to write Encoded

It works but we'd certainly want a more complete approach, e.g. error if you try to add an implemented_by or extern to a constructor or projection of a type without such an attribute or making the constructors and projections noncomputable when using such an attribute to prevent misuse. Also specifying which repr to use would probably work better for future plans of unboxing

Parth Shastri (Jun 26 2025 at 18:07):

I'm referring to the test from lean4#2292: https://github.com/leanprover/lean4/pull/2292/files#diff-aaf48b1a7816072080ae55e2197fd72215ae78cb586a1bff7db0a2d36283a2f5

Cameron Zwarich (Jun 26 2025 at 22:21):

The specific check that lean4#2292 is about is (in the new compiler) this line of code:
https://github.com/leanprover/lean4/blob/b1a306cf696ead9724e9d3879885d4e119800fe6/src/Lean/Compiler/LCNF/MonoTypes.lean#L55

Parth Shastri (Jun 26 2025 at 23:52):

That's not enough right? Currently even if you don't run into the trivial structure case, you still cannot override projections.

structure S where
  x : Nat
  y : Nat

attribute [extern "S_mk"] S.mk
attribute [extern "S_x"] S.x
attribute [extern "S_y"] S.y

def S.casesOnImp {motive : S → _} s (mk : ∀ x y, motive ⟨x, y⟩) : motive s := mk s.x s.y
@[csimp] theorem S.casesOn_eq_casesOnImp : @casesOn = @casesOnImp := rfl

def test (s : S) : Nat := S.x s

Compiling this to C gives

LEAN_EXPORT lean_object* l_test(lean_object* x_1) {
_start:
{
lean_object* x_2;
x_2 = lean_ctor_get(x_1, 0);
lean_inc(x_2);
return x_2;
}
}

showing that extern did not actually work.
I think the key line is actually https://github.com/leanprover/lean4/blob/b1a306cf696ead9724e9d3879885d4e119800fe6/src/Lean/Compiler/LCNF/ToLCNF.lean#L668. It should probably be changed to only reduce the projection if its implementation isn't overridden.

Cameron Zwarich (Jun 27 2025 at 00:25):

I was just talking about the "trivial structure" part of lean4#2292. To do as you propose, we would probably need to make the change you mentioned as well. We would also need to support casesOn (which is turned into cases at this point) for these types, because we definitely don't want to support implemented_by for casesOn(which is regrettably currently used internally for computed fields).

Mario Carneiro (Jun 29 2025 at 14:36):

Cameron Zwarich said:

we definitely don't want to support implemented_by for casesOn(which is regrettably currently used internally for computed fields).

We could just say that computed fields are incompatible with opaque_repr. To my knowledge none of the built in overridden types e.g. UInt64, String, Array (which have the behavior I want @[opaque_repr] to generalize) have computed fields.

Mario Carneiro (Jun 29 2025 at 14:37):

If you don't want to override casesOn, then how would you compile them? Would it always unfold to an application of the projections and then the projections are overridden?

Parth Shastri (Jul 03 2025 at 03:02):

My actual use case is to implement coinductives that are represented at runtime by Thunks. As such, the constructor and projection of the structure would only be needed in automatically generated definitions like the corecursor, which would have their implementations overridden to use Thunks instead. I could therefore make the constructor and projection private, and not generate casesOn at all since I am manually creating the structure from meta code. Would this currently be sound, or are there any optimizations that look at the type of the field (function in this case) and would cause problems?

Last updated: Feb 28 2026 at 14:05 UTC