Zulip Chat Archive

Stream: lean4

Topic: Help with my array language design.


Jan Keller (Mar 28 2025 at 23:54):

Hi everyone,

I'm relatively new to Lean4, so I apologize if the example is a bit lengthy. I tried to reduce it, but smaller examples didn't have the same problem. The core issue seems to revolve around how contexts and shapes interact in the term construction.

I'm working on implementing an array language in Lean4 where every term inherently has a shape, effectively making all terms arrays. I'm encountering some challenges with Lean4's type unification when using the Tm.bld constructor, and I'd appreciate any insights or suggestions the community might have.
Problem Overview:
In my design, the Tm.bld constructor is supposed to build terms by appending dimensions to the context. However, whenever I use variables (Var), Lean4 isn't able to unify the necessary types, forcing me to specify them manually or supply proofs.
Example Code:
source.lean

Any guidance, examples, or tips on language design would be greatly appreciated. Thanks in advance for any help!

Best regards,
Jan Keller

Kevin Buzzard (Mar 28 2025 at 23:55):

(You can get nicely-formatted code within a spoiler block by putting ````lean before it and ```` after it)

Tomas Skrivan (Mar 29 2025 at 02:38):

I'm having a bit of a hard time understanding what the code is supposed to do. Longer names and maybe a few words on what each type is supposed to do would be helpful.

So far this is my understanding:

Shape r - an element n1 :: n2 :: ... :: [n_r] of type Shape r is supposed to represent the shape of an array/tensor of rank r with dimensions n1, n2, ..., n_r.

This type is effectively Vector Nat r with the only difference that Shape 0 is an empty type. You might want to switch to using Vector as it has functions like append already defined.

Ty - is some sort of type? It can be float, but I have no idea what Ty.Idx is supposed to mean.

Ctx n - represents context with n variables and Γ : Ctx n stores type information of all those variables?

Here again, Ctx n is just Vector Ty n, and you might be better off using Vector here.
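
For illustration, both could become plain abbreviations over Vector. A rough sketch, assuming the Array-backed Vector from the standard library and repeating your Ty so the snippet stands alone:

-- sketch only: Shape and Ctx as length-indexed vectors
inductive Ty : Type where
  | float : Ty
  | Idx : Nat → Ty

abbrev Shape (r : Nat) := Vector Nat r   -- r dimension sizes
abbrev Ctx   (n : Nat) := Vector Ty n    -- the types of n variables

-- appending shapes or contexts then comes for free via Vector.append:
example (xs : Shape r) (ys : Shape r') : Shape (r + r') := xs ++ ys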

I do not really understand what Var and Tm are supposed to mean and what t6₂ and t6₃ are supposed to define.

Anyway, I'm happy to help, as I share an interest in array languages!

Tomas Skrivan (Mar 29 2025 at 02:41):

Also, my high-level impression from your code is that you are overusing dependent types. Having everything indexed by its size sounds great when you start using a dependently typed language, but one soon learns that it can cause more trouble than benefit and should be used in moderation.

Tomas Skrivan (Mar 29 2025 at 02:50):

It is unclear what your aim is. Do you want to implement an array language in Lean? Or do you want to write a specification of an array language and prove some properties of it?

If the former, then I would recommend having a look at how Lean itself defines expressions (the core lambda calculus terms) and, for example, how it defines a local context (the connection between variable names/ids and their types and values).
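
As a very rough sketch of that style (made-up names, not Lean's actual Expr/LocalContext API): terms are not indexed by anything, variables are plain ids, and a separate context maps ids to types.

structure FVarId where
  id : Nat
  deriving BEq, Repr

inductive Ty where
  | float
  | idx (n : Nat)
  deriving Repr

-- untyped terms, in the spirit of Lean's Expr
inductive Tm where
  | fvar (x : FVarId)                   -- a free variable, referenced by id
  | bld  (dims : List Nat) (body : Tm)  -- build an array over the given dims
  deriving Repr

-- the analogue of a local context: which id has which type
structure LCtx where
  decls : List (FVarId × Ty)

def LCtx.find? (Γ : LCtx) (x : FVarId) : Option Ty :=
  (Γ.decls.find? (·.1 == x)).map (·.2)

Well-typedness is then a separate check on top of this instead of being baked into the indices.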

Tomas Skrivan (Mar 29 2025 at 03:11):

You can of course embed an array language in Lean and work only with types Tensor X shape where the definition of Tensor could look something like this:

code
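
One possible shape for such a definition (just a guess, not necessarily what was posted): a flat array paired with a proof that its size is the product of the dimensions.

structure Tensor (X : Type) (shape : List Nat) where
  data  : Array X
  hsize : data.size = shape.foldl (· * ·) 1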

Jannis Limperg (Mar 29 2025 at 18:18):

Here's a fix for the original example (with some miscellaneous style improvements):

inductive Shape : Nat → Type where
  | S : Nat → Shape 1
  | C : Nat → Shape n → Shape (n + 1)
  deriving Repr

@[simp]
def Shape.append : Shape n → Shape m → Shape (m + n)
  | .S x, ys => .C x ys
  | .C x xs, ys => .C x (.append xs ys)

notation x "::" xs => Shape.C x xs
notation "["x"]" => Shape.S x
notation x "++" y => Shape.append x y

inductive Ty : Type where
  | float : Ty
  | Idx : Nat → Ty
  deriving Repr

inductive Ctx : Nat → Type where
  | nil : Ctx 0
  | cons : Ty → Ctx n → Ctx (n + 1)
  deriving Repr

@[simp]
def Ctx.append : Ctx n → Ctx m → Ctx (m + n)
  | .nil, ys => ys
  | .cons x xs, ys => .cons x (append xs ys)

@[simp]
def Ctx.fromDims : Shape n → Ctx n
  | .S x => .cons (.Idx x) .nil
  | .C x xs => .cons (.Idx x) (.fromDims xs)

@[simp]
def Ctx.appDims (c : Ctx n) (s : Shape m) : Ctx (m + n) :=
  .append c (.fromDims s)

inductive Var : Ctx n → Type where
  | zero : {ty : Ty} → {ctx : Ctx (n)} → Var (Ctx.cons ty ctx)
  | succ : {ty : Ty} → {ctx : Ctx (n)} → Var ctx → Var (Ctx.cons ty ctx)
  deriving Repr

notation "H" => Var.zero
notation "T" v => Var.succ v

@[simp]
def Ctx.get (Γ : Ctx n) (v : Var Γ) : Ty :=
  match v with
  | .zero (ty := ty) => ty
  | .succ (ctx := ctx) v' => Ctx.get ctx v'

inductive Tm : Ctx n → (s : Shape m) → Ty → Type where
  | bld : (dims : Shape n) → Tm (Ctx.appDims Γ dims) s τ → Tm Γ (dims ++ s) τ
  | var : (p : Var Γ) → Tm Γ [1] (Ctx.get Γ p)
  deriving Repr

def t6₂ : Tm Ctx.nil (50::10::[1]) (.Idx 10) :=
  Tm.bld (50::[10]) (.var (T H))

def t6₃ : Tm Ctx.nil (50::10::[1]) (.Idx 10) :=
  let e1 : Tm (Ctx.appDims Ctx.nil (50::[10])) [1] (.Idx 10) := (.var (T H))
  Tm.bld (50::[10]) e1

I mostly switched from 1 + n to n + 1. This helps because addition recurses on the right in Lean, so you get nicer defeqs. The rest then falls out more or less automatically.
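
For example, with recursion on the right argument this is a definitional equality, while the 1 + n version is not:

example (n : Nat) : n + 1 = Nat.succ n := rfl
-- example (n : Nat) : 1 + n = Nat.succ n := rfl  -- fails: only propositionally equal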

A suggestion: maybe Var should be indexed not by the context but by the length of the context, i.e. Var n with n : Nat.
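
A rough sketch of that variant, replacing the Var and Ctx.get above (Var n is then essentially a bounded de Bruijn index, and the type lookup lives entirely in Ctx.get):

inductive Var : Nat → Type where
  | zero : Var (n + 1)
  | succ : Var n → Var (n + 1)
  deriving Repr

def Ctx.get : Ctx n → Var n → Ty
  | .cons ty _,  .zero   => ty
  | .cons _ ctx, .succ v => ctx.get v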

In general, like Tomas, I'm not convinced that indexing everything by length is worth it, but if your type system is simple enough (no dependencies, no type variables), it may work out without too much pain.

Jan Keller (Mar 30 2025 at 17:03):

Hey, thanks a bunch for your insights! You're spot on: indexing the contexts and shapes just isn't needed and only makes things more complicated. That addition recursing on the right argument is pretty cool too; I had always assumed it worked just like in Idris or Coq. And yeah, the shape type is basically just a vector.

Indexing the Var type by the size of the context is a smart simplification, since I don't need to carry the type around in the variable itself.

Really appreciate your help with this!

With best regards,
Jan Keller


Last updated: May 02 2025 at 03:31 UTC