Zulip Chat Archive

Stream: lean4

Topic: How you annotate the AST in lean4 compiler?


xiao (Mar 04 2025 at 22:31):

I am going to annotate the AST of some DSL. What is the best practice? And how did you do that in lean compiler? The basic example is from Haskell.

inductive Exp where
  | Const: Nat -> Exp
  | Add: Exp -> Exp -> Exp

inductive Inst where
  | Ann: String -> Inst
  | Push: Nat -> Inst
  | Add | Mult
  deriving Repr

abbrev Prog := List Inst

def Exp.compile: Exp -> Prog
  | .Const n => [.Push n]
  | .Add a b => a.compile ++ b.compile ++ [.Add]

def Exp.eval: Exp -> Nat
  | .Const n => n
  | .Add a b => a.eval + b.eval

Now, I want to make two variants: one with some line number annotation and the other with a new constructor Mult.

inductive ExpA where
  | Const: String -> Nat -> ExpA
  | Add: String -> ExpA -> ExpA -> ExpA

def ExpA.compile: ExpA -> Prog
  | .Const s n => [.Ann s, .Push n]
  | .Add s a b => [.Ann s] ++ a.compile ++ b.compile ++ [.Add]

def ExpA.eval: ExpA -> Nat
  | .Const _ n => n
  | .Add _ a b => a.eval + b.eval

inductive ExpM where
  | Const: Nat -> ExpM
  | Add: ExpM -> ExpM -> ExpM
  | Mult: ExpM -> ExpM -> ExpM

def ExpM.compile: ExpM -> Prog
  | .Const n => [.Push n]
  | .Add a b => a.compile ++ b.compile ++ [.Add]
  | .Mult a b => a.compile ++ b.compile ++ [.Mult]

def ExpM.eval: ExpM -> Nat
  | .Const n => n
  | .Add a b => a.eval + b.eval
  | .Mult a b => a.eval * b.eval

Apparently, the same logic for Const and Add is repeated many times. I understand the W-type is itself the lfp operator in lean. So I tried the following

inductive ExpF (T: Type) where
  | Const: Nat -> ExpF T
  | Add: T -> T -> ExpF T

inductive Exp where
  | In: ExpF Exp -> Exp

def Exp.const n := Exp.In (ExpF.Const n)
def Exp.add a b := Exp.In (ExpF.Add a b)

def Exp.cata {B} (f: ExpF B -> B): Exp -> B
  | .In (.Const n) => f (.Const n)
  | .In (.Add a b) => f (.Add (a.cata f) (b.cata f))

def ExpF.compile: ExpF Prog -> Prog
  | .Const n => [.Push n]
  | .Add a b => a ++ b ++ [.Add]

def ExpF.eval: ExpF Nat -> Nat
  | .Const n => n
  | .Add a b => a + b

def Exp.eval := Exp.cata ExpF.eval
def Exp.compile := Exp.cata ExpF.compile

def e23: Exp := .add (.const 2) (.const 3)

example: e23.eval = 5 := by eq_refl
example: e23.compile = [.Push 2, .Push 3, .Add] := by eq_refl

And the other two variants are

inductive ExpA where
  | Ann: String -> ExpF ExpA -> ExpA

def ExpA.cata
  {B: Type}
  (f: String -> ExpF B -> B)
: ExpA -> B
  | .Ann s (.Const n) => f s (.Const n)
  | .Ann s (.Add a b) => f s (.Add (a.cata f) (b.cata f))


def ExpA.eval := ExpA.cata (fun _ b => ExpF.eval b)
def ExpA.compile := ExpA.cata (fun s b => .Ann s :: ExpF.compile b)


def a2: ExpA := (.Ann "pos1" (.Const 2))
def a3: ExpA := (.Ann "pos2" (.Const 3))
def a23: ExpA := (.Ann "pos3" (.Add a2 a3))

example: a23.eval = 5 := by eq_refl
example: a23.compile = [
  Inst.Ann "pos3",
  Inst.Ann "pos1",
  Inst.Push 2,
  Inst.Ann "pos2",
  Inst.Push 3,
  Inst.Add
] := by eq_refl


inductive ExpM where
  | In: ExpF ExpM -> ExpM
  | Mult: ExpM -> ExpM -> ExpM


def ExpM.mult := ExpM.Mult
def ExpM.const n := ExpM.In (.Const n)
def ExpM.add a b := ExpM.In (.Add a b)

def ExpM.cata {B} (f: ExpF B -> B) (mult: B -> B -> B): ExpM -> B
  | .In (.Const n) => f (.Const n)
  | .In (.Add n1 n2) => f (.Add (ExpM.cata f mult n1) (ExpM.cata f mult n2))
  | .Mult a1 a2 => mult (a1.cata f mult) (a2.cata f mult)

def ExpM.eval := ExpM.cata ExpF.eval Nat.mul
def ExpM.compile := ExpM.cata ExpF.compile (fun a b => a ++ b ++ [.Mult])

def m523 := ExpM.mult (.const 5) (.add (.const 2) (.const 3))
example: m523.eval = 5 * (2 + 3) := by eq_refl
example: m523.compile = [
  Inst.Push 5, Inst.Push 2, Inst.Push 3, Inst.Add, Inst.Mult
] := by eq_refl

This time I can reuse the repeated logic, but the cons are that I have to define the catamorphism every time and I have to manually prove the induction rules (recursors). (I think there could be some macro or elaboration helping with that.)

def Exp.ind.{u}
  {P: Exp -> Sort u}
  (const: forall n, P (.const n))
  (add: forall a b, P a -> P b -> P (.add a b))
:
  forall (e: Exp), P e
:= fun e => match e with
  | .In (.Const a) => const a
  | .In (.Add a b) => add a b
    (Exp.ind const add a) (Exp.ind const add b)

def ExpM.ind.{u}
  {P: ExpM -> Sort u}
  (const: forall n, P (.const n))
  (add: forall a b, P a -> P b -> P (.add a b))
  (mult: forall a b, P a -> P b -> P (.mult a b))
: forall e, P e
  | .const n => const n
  | .add a b => add a b (a.ind const add mult) (b.ind const add mult)
  | .mult a b => mult a b (a.ind const add mult) (b.ind const add mult)

Which one should I choose? I know Haskell is even doing something like

inductive ExpF (C A O: Type) where
  | Const: C -> Nat -> ExpF C A O
  | Add: A -> ExpF C A O -> ExpF C A O -> ExpF C A O
  | Out: O -> ExpF C A O

abbrev Exp := ExpF Unit Unit Empty
def Exp.const (n: Nat): Exp := .Const Unit.unit n
def Exp.add (a b: Exp): Exp := .Add Unit.unit a b

def ExpF.compile {C A: Type}: ExpF C A Prog -> Prog
  | .Const _ n => [.Push n]
  | .Add _ a b => a.compile ++ b.compile ++ [.Add]
  | .Out p => p

inductive ExpM where
  | In: ExpF Unit Unit ExpM -> ExpM
  | Mult: ExpM -> ExpM -> ExpM

Is this how you annotate the AST of lean? And is there any general suggestion on tree-decoration?
(I aslo asked this in the general channel, but no one has responded.)

Chris Bailey (Mar 05 2025 at 00:19):

how did you do that in lean compiler?

I'm not sure if the compiler specifically has a different representation, but the Lean.Syntax type just has the annotation data included in the constructors. The approach you take in the first example is sort of fundamentally different in that the annotation purports to be its own constructor.

Chris Bailey (Mar 05 2025 at 00:31):

Just eyeballing this, you could probably tackle some of it with a derive handler. If you have your heart set on this kind of representation, I think the non-metaprogramming approaches are things like recursion schemes (which do not work the way you would expect in Lean because of universes) or "tagless finally".

xiao (Mar 05 2025 at 00:58):

I'm not sure if the compiler specifically has a different representation, but the Lean.Syntax type just has the annotation data included in the constructors. The approach you take in the first example is sort of fundamentally different in that the annotation purports to be its own constructor.

I used the product functor to add the annotation to all nodes of the AST. If you think there could be some leaves without the annotation, you can use a coproduct functor to add the annotations optionally. Though I named it as Ann, it is actually the in morphism of the initial algebra.


Last updated: May 02 2025 at 03:31 UTC