Expressions
Expressions (terms of type Expr
) are an abstract syntax tree for Lean programs.
This means that each term which can be written in Lean has a corresponding Expr
.
For example, the application f e
is represented by the expression Expr.app ⟦f⟧ ⟦e⟧
,
where ⟦f⟧
is a representation of f
and ⟦e⟧
a representation of e
.
Similarly, the term Nat
is represented by the expression Expr.const `Nat []
.
(The backtick and empty list are discussed below.)
The ultimate purpose of a Lean tactic block
is to generate a term which serves as a proof of the theorem we want to prove.
Thus, the purpose of a tactic is to produce (part of) an Expr
of the right type.
Much metaprogramming therefore comes down to manipulating expressions:
constructing new ones and taking apart existing ones.
Once a tactic block is finished, the Expr
is sent to the kernel,
which checks whether it is well-typed and whether it really has the type claimed by the theorem.
As a result, tactic bugs are not fatal:
if you make a mistake, the kernel will ultimately catch it.
However, many internal Lean functions also assume that expressions are well-typed,
so you may crash Lean before the expression ever reaches the kernel.
To avoid this, Lean provides many functions which help with the manipulation of expressions.
This chapter and the next survey the most important ones.
Let's get concrete and look at the
Expr
type:
import Lean
namespace Playground
inductive Expr where
| bvar : Nat → Expr -- bound variables
| fvar : FVarId → Expr -- free variables
| mvar : MVarId → Expr -- meta variables
| sort : Level → Expr -- Sort
| const : Name → List Level → Expr -- constants
| app : Expr → Expr → Expr -- application
| lam : Name → Expr → Expr → BinderInfo → Expr -- lambda abstraction
| forallE : Name → Expr → Expr → BinderInfo → Expr -- (dependent) arrow
| letE : Name → Expr → Expr → Expr → Bool → Expr -- let expressions
-- less essential constructors:
| lit : Literal → Expr -- literals
| mdata : MData → Expr → Expr -- metadata
| proj : Name → Nat → Expr → Expr -- projection
end Playground
What is each of these constructors doing?
bvar
is a bound variable. For example, thex
infun x => x + 2
or∑ x, x²
. This is any occurrence of a variable in an expression where there is a binder above it. Why is the argument aNat
? This is called a de Bruijn index and will be explained later. You can figure out the type of a bound variable by looking at its binder, since the binder always has the type information for it.fvar
is a free variable. These are variables which are not bound by a binder. An example isx
inx + 2
. Note that you can't just look at a free variablex
and tell what its type is, there needs to be a context which contains a declaration forx
and its type. A free variable has an ID that tells you where to look for it in aLocalContext
. In Lean 3, free variables were called "local constants" or "locals".mvar
is a metavariable. There will be much more on these later, but you can think of it as a placeholder or a 'hole' in an expression that needs to be filled at a later point.sort
is used forType u
,Prop
etc.const
is a constant that has been defined earlier in the Lean document.app
is a function application. Multiple arguments are done using partial application:f x y ↝ app (app f x) y
.lam n t b
is a lambda expression (fun ($n : $t) => $b
). Theb
argument is called the body. Note that you have to give the type of the variable you are binding.forallE n t b
is a dependent arrow expression (($n : $t) → $b
). This is also sometimes called a Π-type or Π-expression and is often written∀ $n : $t, $b
. Note that the non-dependent arrowα → β
is a special case of(a : α) → β
whereβ
doesn't depend ona
. TheE
on the end offorallE
is to distinguish it from theforall
keyword.letE n t v b
is a let binder (let ($n : $t) := $v in $b
).lit
is a literal, this is a number or string literal like4
or"hello world"
. Literals help with performance: we don't want to represent the expression(10000 : Nat)
asNat.succ $ ... $ Nat.succ Nat.zero
.mdata
is just a way of storing extra information on expressions that might be useful, without changing the nature of the expression.proj
is for projection. Suppose you have a structure such asp : α × β
, rather than storing the projectionπ₁ p
asapp π₁ p
, it is expressed asproj Prod 0 p
. This is for efficiency reasons ([todo] find link to docstring explaining this).
You've probably noticed
that you can write many Lean programs which do not have an obvious corresponding Expr
.
For example, what about match
statements, do
blocks or by
blocks?
These constructs, and many more, must indeed first be translated into expressions.
The part of Lean which performs this (substantial) task is called the elaborator
and is discussed in its own chapter.
The benefit of this setup is that once the translation to Expr
is done,
we have a relatively simple structure to work with.
(The downside is that going back from Expr
to a high-level Lean program can be challenging.)
The elaborator also fills in any implicit or typeclass instance arguments
which you may have omitted from your Lean program.
Thus, at the Expr
level, constants are always applied to all their arguments, implicit or not.
This is both a blessing
(because you get a lot of information which is not obvious from the source code)
and a curse
(because when you build an Expr
, you must supply any implicit or instance arguments yourself).
De Bruijn Indexes
Consider the lambda expression (λ x : ℕ => λ y : ℕ => x + y) y
.
When we evaluate it naively, by replacing x
with y
in the body of the outer lambda, we obtain λ y : ℕ => y + y
.
But this is incorrect: the lambda is a function with two arguments that adds one argument to the other, yet the evaluated version adds its argument to itself.
The root of the issue is that the name y
is used for both the variable outside the lambdas and the variable bound by the inner lambda.
Having different variables use the same name means that when we evaluate, or β-reduce, an application, we must be careful not to confuse the different variables.
To address this issue, Lean does not, in fact, refer to bound variables by name.
Instead, it uses de Bruijn indexes.
In de Bruijn indexing,
each variable bound by a lam
or a forallE
is converted into a number #n
.
The number says how many binders up the Expr
tree we should skip
to find the binder that binds this variable.
So our above example would become
(replacing inessential parts of the expression with _
for brevity):
app (lam `x _ (lam `y _ (app (app `plus #1) #0) _) _) (fvar _)
The fvar
represents y
and the lambdas' variables are now represented by #0
and #1
.
When we evaluate this application, we replace the bound variable belonging to lam `x
(here #1
) with the argument fvar _
, obtaining
(lam `y _ (app (app `plus (fvar _)) #0) _)
This is pretty-printed as
λ y_1 => y + y_1
Note that Lean has automatically chosen a name y_1
for the remaining bound variable that does not clash with the name of the fvar
y
.
The chosen name is based on the name suggestion y
contained in the lam
.
If a de Bruijn index is too large for the number of binders preceding it,
we say it is a loose bvar
;
otherwise we say it is bound.
For example, in the expression lam `x _ (app #0 #1)
the bvar
#0
is bound by the preceding binder and #1
is loose.
The fact that Lean calls all de Bruijn indexes bvar
s ("bound variables")
points to an important invariant:
outside of some very low-level functions,
Lean expects that expressions do not contain any loose bvar
s.
Instead, whenever we would be tempted to introduce a loose bvar
,
we immediately convert it into an fvar
("free variable").
(Hence, Lean's binder representation is "locally nameless".)
Precisely how that works is discussed in the next chapter.
If there are no loose bvar
s in an expression, we say that the expression is closed.
The process of replacing all instances of a loose bvar
with an Expr
is called instantiation.
Going the other way is called abstraction.
If you are familiar with the standard terminology around variables, Lean's terminology may be confusing, so here's a map: Lean's "bvars" are usually called just "variables"; Lean's "loose" is usually called "free"; and Lean's "fvars" might be called "local hypotheses".
Universe Levels
Some expressions involve universe levels, represented by the Lean.Level
type.
A universe level is a natural number,
a universe parameter (introduced with a universe
declaration),
a universe metavariable or the maximum of two universes.
They are relevant for two kinds of expressions.
First, sorts are represented by Expr.sort u
, where u
is a Level
.
Prop
is sort Level.zero
; Type
is sort (Level.succ Level.zero)
.
Second, universe-polymorphic constants have universe arguments.
A universe-polymorphic constant is one whose type contains universe parameters.
For example, the List.map
function is universe-polymorphic,
as the pp.universes
pretty-printing option shows:
set_option pp.universes true in
#check @List.map
The .{u_1,u_2}
suffix after List.map
means that List.map
has two universe arguments, u_1
and u_2
.
The .{u_1}
suffix after List
(which is itself a universe-polymorphic constant)
means that List
is applied to the universe argument u_1
, and similar for .{u_2}
.
In fact, whenever you use a universe-polymorphic constant,
you must apply it to the correct universe arguments.
This application is represented by the List Level
argument of Expr.const
.
When we write regular Lean code, Lean infers the universes automatically,
so we do not need think about them much.
But when we construct Expr
s,
we must be careful to apply each universe-polymorphic constant to the right universe arguments.
Constructing Expressions
Constants
The simplest expressions we can construct are constants.
We use the const
constructor and give it a name and a list of universe levels.
Most of our examples only involve non-universe-polymorphic constants,
in which case the list is empty.
We also show a second form where we write the name with double backticks. This checks that the name in fact refers to a defined constant, which is useful to avoid typos.
open Lean
def z' := Expr.const `Nat.zero []
#eval z' -- Lean.Expr.const `Nat.zero []
def z := Expr.const ``Nat.zero []
#eval z -- Lean.Expr.const `Nat.zero []
The double-backtick variant also resolves the given name, making it fully-qualified.
To illustrate this mechanism, here are two further examples.
The first expression, z₁
, is unsafe:
if we use it in a context where the Nat
namespace is not open,
Lean will complain that there is no constant called zero
in the environment.
In contrast, the second expression, z₂
,
contains the fully-qualified name Nat.zero
and does not have this problem.
open Nat
def z₁ := Expr.const `zero []
#eval z₁ -- Lean.Expr.const `zero []
def z₂ := Expr.const ``zero []
#eval z₂ -- Lean.Expr.const `Nat.zero []
Function Applications
The next class of expressions we consider are function applications.
These can be built using the app
constructor,
with the first argument being an expression for the function
and the second being an expression for the argument.
Here are two examples. The first is simply a constant applied to another. The second is a recursive definition giving an expression as a function of a natural number.
def one := Expr.app (.const ``Nat.succ []) z
#eval one
-- Lean.Expr.app (Lean.Expr.const `Nat.succ []) (Lean.Expr.const `Nat.zero [])
def natExpr: Nat → Expr
| 0 => z
| n + 1 => .app (.const ``Nat.succ []) (natExpr n)
Next we use the variant mkAppN
which allows application with multiple arguments.
def sumExpr : Nat → Nat → Expr
| n, m => mkAppN (.const ``Nat.add []) #[natExpr n, natExpr m]
As you may have noticed, we didn't show #eval
outputs for the two last functions.
That's because the resulting expressions can grow so large
that it's hard to make sense of them.
Lambda Abstractions
We next use the constructor lam
to construct a simple function which takes any natural number x
and returns Nat.zero
.
The argument BinderInfo.default
says that x
is an explicit argument
(rather than an implicit or typeclass argument).
def constZero : Expr :=
.lam `x (.const ``Nat []) (.const ``Nat.zero []) BinderInfo.default
#eval constZero
-- Lean.Expr.lam `x (Lean.Expr.const `Nat []) (Lean.Expr.const `Nat.zero [])
-- (Lean.BinderInfo.default)
As a more elaborate example which also involves universe levels,
here is the Expr
that represents List.map (λ x => Nat.add x 1) []
(broken up into several definitions to make it somewhat readable):
def nat : Expr := .const ``Nat []
def addOne : Expr :=
.lam `x nat
(mkAppN (.const ``Nat.add []) #[.bvar 0, mkNatLit 1])
BinderInfo.default
def mapAddOneNil : Expr :=
mkAppN (.const ``List.map [levelZero, levelZero])
#[nat, nat, addOne, .app (.const ``List.nil [levelZero]) nat]
With a little trick (more about which in the Elaboration chapter),
we can turn our Expr
into a Lean term, which allows us to inspect it more easily.
elab "mapAddOneNil" : term => return mapAddOneNil
#check mapAddOneNil
-- List.map (fun x => Nat.add x 1) [] : List Nat
set_option pp.universes true in
set_option pp.explicit true in
#check mapAddOneNil
-- @List.map.{0, 0} Nat Nat (fun x => x.add 1) (@List.nil.{0} Nat) : List.{0} Nat
#reduce mapAddOneNil
-- []
In the next chapter we explore the MetaM
monad,
which, among many other things,
allows us to more conveniently construct and destruct larger expressions.
Exercises
- Create expression
1 + 2
withExpr.app
. - Create expression
1 + 2
withLean.mkAppN
. - Create expression
fun x => 1 + x
. - [De Bruijn Indexes] Create expression
fun a, fun b, fun c, (b * a) + c
. - Create expression
fun x y => x + y
. - Create expression
fun x, String.append "hello, " x
. - Create expression
∀ x : Prop, x ∧ x
. - Create expression
Nat → String
. - Create expression
fun (p : Prop) => (λ hP : p => hP)
. - [Universe levels] Create expression
Type 6
.