Zulip Chat Archive

Stream: lean4

Topic: Help with my array language design.


Jan Keller (Mar 28 2025 at 23:54):

Hi everyone,

I'm relatively new to Lean4, so I apologize if the example is a bit lengthy. I tried to reduce it, but smaller examples didn't have the same problem. The core issue seems to revolve around how contexts and shapes interact in the term construction.

I'm working on implementing an array language in Lean4 where every term inherently has a shape, effectively making all terms arrays. I'm encountering some challenges with Lean4's type unification when using the Tm.bld constructor, and I'd appreciate any insights or suggestions the community might have.
Problem Overview:
In my design, the Tm.bld constructor is supposed to build terms by appending dimensions to the context. However, whenever I use variables (Var), Lean4 isn't able to unify the necessary types, forcing me to specify them manually or supply proofs.
Example Code:
source.lean

Any guidance, examples, or tips on language design would be greatly appreciated. Thanks in advance for any help!

Best regards,
Jan Keller

Kevin Buzzard (Mar 28 2025 at 23:55):

(You can get nicely-formatted code within a spoiler block by putting ````lean before it and ```` after it)

Tomas Skrivan (Mar 29 2025 at 02:38):

I'm having a bit of a hard time understanding what the code is supposed to do. Longer names and maybe a few words on what each type is supposed to do would be helpful.

So far this is my understanding:

Shape r - an element n1 :: n2 :: ... :: [n_r] of type Shape r is supposed to represent the shape of an array/tensor of rank r with dimensions n1, n2, ..., n_r.

This type is effectively Vector Nat r with the only difference that Shape 0 is an empty type. You might want to switch to using Vector as it has functions like append already defined.

Ty - is some sort of type? It can be float, but I have no idea what Ty.Idx is supposed to mean.

Ctx n - represents context with n variables and Γ : Ctx n stores type information of all those variables?

Here again, Ctx n is just Vector Ty n, and you might be better off using Vector here.
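
For illustration, both could become plain abbreviations over Vector. A rough sketch, assuming the Array-backed Vector from the standard library and repeating your Ty so the snippet stands alone:

-- sketch only: Shape and Ctx as length-indexed vectors
inductive Ty : Type where
  | float : Ty
  | Idx : Nat → Ty

abbrev Shape (r : Nat) := Vector Nat r   -- r dimension sizes
abbrev Ctx   (n : Nat) := Vector Ty n    -- the types of n variables

-- appending shapes or contexts then comes for free via Vector.append:
example (xs : Shape r) (ys : Shape r') : Shape (r + r') := xs ++ ys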

I do not really understand what Var and Tm are supposed to mean and what t6₂ and t6₃ are supposed to define.

Anyway, I'm happy to help, as I share an interest in array languages!

Tomas Skrivan (Mar 29 2025 at 02:41):

Also, my high-level impression from your code is that you are overusing dependent types. Having everything indexed by its size sounds great when you start using a dependently typed language, but one soon learns that it can cause more trouble than benefit and should be used in moderation.

Tomas Skrivan (Mar 29 2025 at 02:50):

It is unclear what your aim is. Do you want to implement an array language in Lean? Or do you want to write a specification of an array language and prove some properties of it?

If the former, then I would recommend having a look at how Lean itself defines expressions (the core lambda calculus terms) and, for example, how it defines a local context (the connection between variable names/ids and their types and values).
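
As a very rough sketch of that style (made-up names, not Lean's actual Expr/LocalContext API): terms are not indexed by anything, variables are plain ids, and a separate context maps ids to types.

structure FVarId where
  id : Nat
  deriving BEq, Repr

inductive Ty where
  | float
  | idx (n : Nat)
  deriving Repr

-- untyped terms, in the spirit of Lean's Expr
inductive Tm where
  | fvar (x : FVarId)                   -- a free variable, referenced by id
  | bld  (dims : List Nat) (body : Tm)  -- build an array over the given dims
  deriving Repr

-- the analogue of a local context: which id has which type
structure LCtx where
  decls : List (FVarId × Ty)

def LCtx.find? (Γ : LCtx) (x : FVarId) : Option Ty :=
  (Γ.decls.find? (·.1 == x)).map (·.2)

Well-typedness is then a separate check on top of this instead of being baked into the indices.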

Tomas Skrivan (Mar 29 2025 at 03:11):

You can of course embed an array language in Lean and work only with types Tensor X shape where the definition of Tensor could look something like this:

code
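
One possible shape for such a definition (just a guess, not necessarily what was posted): a flat array paired with a proof that its size is the product of the dimensions.

structure Tensor (X : Type) (shape : List Nat) where
  data  : Array X
  hsize : data.size = shape.foldl (· * ·) 1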

Jannis Limperg (Mar 29 2025 at 18:18):

Here's a fix for the original example (with some miscellaneous style improvements):

inductive Shape : Nat → Type where
  | S : Nat → Shape 1
  | C : Nat → Shape n → Shape (n + 1)
  deriving Repr

@[simp]
def Shape.append : Shape n → Shape m → Shape (m + n)
  | .S x, ys => .C x ys
  | .C x xs, ys => .C x (.append xs ys)

notation x "::" xs => Shape.C x xs
notation "["x"]" => Shape.S x
notation x "++" y => Shape.append x y

inductive Ty : Type where
  | float : Ty
  | Idx : Nat → Ty
  deriving Repr

inductive Ctx : Nat → Type where
  | nil : Ctx 0
  | cons : Ty → Ctx n → Ctx (n + 1)
  deriving Repr

@[simp]
def Ctx.append : Ctx n → Ctx m → Ctx (m + n)
  | .nil, ys => ys
  | .cons x xs, ys => .cons x (append xs ys)

@[simp]
def Ctx.fromDims : Shape n → Ctx n
  | .S x => .cons (.Idx x) .nil
  | .C x xs => .cons (.Idx x) (.fromDims xs)

@[simp]
def Ctx.appDims (c : Ctx n) (s : Shape m) : Ctx (m + n) :=
  .append c (.fromDims s)

inductive Var : Ctx n → Type where
  | zero : {ty : Ty} → {ctx : Ctx (n)} → Var (Ctx.cons ty ctx)
  | succ : {ty : Ty} → {ctx : Ctx (n)} → Var ctx → Var (Ctx.cons ty ctx)
  deriving Repr

notation "H" => Var.zero
notation "T" v => Var.succ v

@[simp]
def Ctx.get (Γ : Ctx n) (v : Var Γ) : Ty :=
  match v with
  | .zero (ty := ty) => ty
  | .succ (ctx := ctx) v' => Ctx.get ctx v'

inductive Tm : Ctx n → (s : Shape m) → Ty → Type where
  | bld : (dims : Shape n) → Tm (Ctx.appDims Γ dims) s τ → Tm Γ (dims ++ s) τ
  | var : (p : Var Γ) → Tm Γ [1] (Ctx.get Γ p)
  deriving Repr

def t6₂ : Tm Ctx.nil (50::10::[1]) (.Idx 10) :=
  Tm.bld (50::[10]) (.var (T H))

def t6₃ : Tm Ctx.nil (50::10::[1]) (.Idx 10) :=
  let e1 : Tm (Ctx.appDims Ctx.nil (50::[10])) [1] (.Idx 10) := (.var (T H))
  Tm.bld (50::[10]) e1

I mostly switched from 1 + n to n + 1. This helps because addition recurses on the right in Lean, so you get nicer defeqs. The rest then falls out more or less automatically.
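
For example, with recursion on the right argument this is a definitional equality, while the 1 + n version is not:

example (n : Nat) : n + 1 = Nat.succ n := rfl
-- example (n : Nat) : 1 + n = Nat.succ n := rfl  -- fails: only propositionally equal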

A suggestion: maybe Var should be indexed not by the context but by the length of the context, i.e. Var n with n : Nat.
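
A rough sketch of that variant, replacing the Var and Ctx.get above (Var n is then essentially a bounded de Bruijn index, and the type lookup lives entirely in Ctx.get):

inductive Var : Nat → Type where
  | zero : Var (n + 1)
  | succ : Var n → Var (n + 1)
  deriving Repr

def Ctx.get : Ctx n → Var n → Ty
  | .cons ty _,  .zero   => ty
  | .cons _ ctx, .succ v => ctx.get v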

In general, like Tomas, I'm not convinced that indexing everything by length is worth it, but if your type system is simple enough (no dependencies, no type variables), it may work out without too much pain.

Jan Keller (Mar 30 2025 at 17:03):

Hey, thanks a bunch for your insights! You're spot on: indexing the contexts and shapes just isn't needed and only makes things more complicated. That addition recursing on the right argument is pretty cool too; I had always assumed it worked just like in Idris or Coq. And yeah, the shape type is basically just a vector.

Indexing the Var type by the size of the context is a smart simplification, since I don't need to carry the type around in the variable itself.

Really appreciate your help with this!

With best regards,
Jan Keller


Last updated: May 02 2025 at 03:31 UTC