Simp, made simple.
This is the second blog post in a series of three. In the first blog post, we introduced the notion of a simproc, which can be thought of as a form of "modular" simp lemma. In this sequel, we give a more detailed exposition of the inner workings of the simp tactic in preparation of our third post, where we will see how to write new simprocs.
Throughout this post, we will assume that the reader has at least a little exposure to some of the core concepts that underpin metaprogramming in Lean, e.g.
Exprand theMetaMmonad. For those that don't, the book Metaprogramming in Lean 4 is a great introduction to the topic.
How simp works
In this section we present some of the inner workings of simp.
First we give an overview of the way simp works, then we delve into the specifics by introducing:
- the
SimpMmonad, which is the metaprogramming monad holding the information relevant to asimpcall; -
Step, the Lean representation of a single simplification step; -
Simproc, the Lean representation of simprocs.
All the simp-specific declarations introduced in this section are in the Lean.Meta or Lean.Meta.Simp namespace.
Overview
When calling simp in a proof, we give it a simp context.
This is made of a few different things, but for our purposes think of it as the set of lemmas/simprocs simp is allowed to rewrite with, i.e
$$\text{default lemmas/simprocs} + \text{explicitly added ones} - \text{explicitly removed ones}.$$
For example, simp [foo, -bar] means "Simplify using the standard simp lemmas/simprocs except bar, with foo added".
A perhaps surprising fact is that every simp lemma is internally turned into a simproc for simp's consumption.
From now on, we will refer to simp lemmas/simprocs as procedures.
Each procedure in a simp context comes annotated with extra data, such as priority, as well as the stage at which the procedure should be called.
Procedures have two possible stages:
-
Postprocedures are called on an expression
eafter subexpressions ofeare simplified. Procedures are by default postprocedures as oftentimes a procedure can only trigger after the inner expressions have been simplified. -
Preprocedures are called on an expression
ebefore subexpressions ofeare simplified. Preprocedures are mostly used when the simplification order induced by a postprocedure would otherwise be inefficient by visiting irrelevant subexpressions first. Preprocedures are associated with the↓symbol in several syntaxes throughout the simp codebase.
The general rule of thumb is that postprocedures simplify from the inside-out, while preprocedures simplify from the outside-in.
Roughly speaking, when traversing an expression e, simp does the following in order:
- Run preprocedures on
e; - Traverse subexpressions of
e(note that the preprocedures might have changedeby this point); - Run postprocedures on
e.
We call this the simplification loop.
graph TD e["e"] e1["e₁"] e2["e₂"] e -->|pre| e1 e -->|pre| e2 e1 -->|post| e e2 -->|post| e
In the figure above, the simplification loop does the following:
- Preprocedures on
e - Preprocedures on
e₁ - Postprocedures on
e₁(as it has no children) - Preprocedures on
e₂ - Postprocedures on
e₂(as it has no children) - Postprocedures on
e(as it has no further children)
The above loop is merely an approximation of the true simplification loop: each procedure actually gets to decide whether to go to step 1 or 3 after it was triggered, as we shall see in the coming subsection.
Step
Step is the type that represents a single step in the simplification loop. In simp's algorithm, a step intuitively corresponds to two pieces of information:
1) The result of simplifying an expression e,
2) The location of what should be simplified next, and in which direction (pre or post).
The result of simplifying e is encoded as an expression e' and a proof that e = e'.
This is encapsulated by the Result
structure:
structure Result where /-- The new expression `e'` -/ expr : Expr /-- The proof that `e = e'` -/ proof? : Option Expr := none -- Note, `Result` currently has an extra `cache` field that is both deprecated -- and irrelevant to our current discussion.
The proof is optional: If
proof?is set to its defaultnonevalue, the equality is assumed to be definitional.The proof is allowed to be an arbitrary
Expr, i.e. nothing ensures that it actually is a proof thate = e'. It is up to the author of the procedure to make sure that the generated proof terms are valid.
After simplifying the current expression e to a new expression e',
there are a few possible options for the next location:
- Simplify
e'further. Preprocedures are tried one'. - Simplify subexpressions of
e'. Preprocedures are tried on each child expression ofe'in turn. Ife'has no child, then we run postprocedures one'. - Don't simplify further. Preprocedures are tried on the next child of the parent expression. If there is no such child, then postprocedures are tried on the parent expression.
The three possibilities above correspond to the three constructors of Step:
inductive Step where /-- Try preprocedures on the simplified expression. -/ | visit (e : Result) /-- For `pre` procedures, continue transformation by visiting subexpressions, and then executing `post` procedures. For `post` procedures, this is equivalent to returning `visit`. -/ | continue (e? : Option Result := none) /-- Returns the result without visiting any subexpressions. -/ | done (r : Result) -- Note: the docstrings here are simplified versions of the real docstrings (which can be found in the Lean source code)
If a procedure fails to simplify an expression, it should return
continue none. Bothvisitanddonesignify success.
Whenever a simproc is called on a given expression, it outputs a Step, which determines what will happen next during the simp call.
Since every simproc call is running a metaprogram to produce the output Step, the constructor that ends up being used may vary according to the input,
e.g. in some cases a simproc may use visit and in others use continue.
To make this more concrete, let's take a look at how these are used in the simprocs Nat.reduceDvd and reduceIte that we looked at in the
previous blog post.
As a reminder
Nat.reduceDvdtakes expressions of the forma | bwherea,bare explicit natural numbers, and returnsTrueorFalse.
reduceItetakes expressions of the formif h then a else band outputsa(resp.b) ifhcan be simplified toTrue(resp.False).
The constructors do the following:
-
continueindicates that the simproc is done with this expression. As a result, simp will not attempt to simplify the expression again using the same simproc to prevent the simplification procedure from looping. This is often used as the "default" output if a simproc was unable to find a simplification in a given expression. For example: -
Nat.reduceDvduses this when the expression is not of the forma | bwherea,bare explicit natural numbers. -
reduceIteuse this when the expression is not of the formif h then a else bwherehis an expression that can be simplified toTrueorFalse(note that the simplification ofhis handled by a differentsimpcall).
This only applies for the expression at hand: if this is a pre-procedure then the simproc may still end up being called on subexpressions.
For example, when calling simp on if RiemannHypothesis then 0 else if 1 + 1 = 2 then 0 else 0, the simproc reduceIte runs twice: once on the outer if ... then ... else, where it uses continue, and once on the inner if ... then ... else, which gets simplified to 0.
-
doneindicates thatsimpis done with a given expression. WhenNat.reduceDvdis called on an expression of the forma | bwherea,bare explicit natural numbers, it simplifies it toTrueorFalse. Either way, the output is in simp normal form and there is no need to simplify it further. ThusNat.reduceDvdusesdonein such a case. -
visitindicates (for a pre-procedure) that a simplification has been done but that pre-procedures should be tried again on the simplified expression. WhenreduceIteis called on a expressions of the formif p then a else bwherepcan simplified toTrue(resp.False), it outputsa(resp.b). Sinceaandbcould be arbitrarily complicated expressions, it makes sense to try and simplify them further. ThusreduceIteusesvisitin such a case.
The SimpM monad
In this section, we take a look at another key component of the internals of simp, namely the SimpM monad.
SimpM
SimpM is the monad that tracks the current context simp is running in (what simp theorems are available, etc) and what has been done so far (e.g. number of steps taken, theorems used).
In particular it also captures the MetaM context.
Let's go through this in more detail. The monad SimpM is defined using monad transformers as follows:
abbrev SimpM := ReaderT Simp.MethodsRef <| ReaderT Simp.Context <| StateRefT Simp.State MetaM
Let's go through these steps one by one.
1) The monad MetaM. This is one of the fundamental monads for metaprogramming in Lean.
The state of MetaM allows one to access things like:
-
Information about the file we're running in (e.g. name, imports, etc)
-
Information about what definitions/theorems we're allowed to use
-
What local variables/declarations we have access to
2) The first monad transformer application: StateRefT Simp.State MetaM.
The idea here is the following: since the goal of the SimpM monad is to track the state of a simp call
(i.e. what's happening, as the program runs), we need to capture more information than what MetaM gives us.
Specifically, we want a monad that can track what's happening via the following structure:
lean
structure Simp.State where
cache : Cache
congrCache : CongrCache
dsimpCache : ExprStructMap Expr
usedTheorems : UsedSimps
numSteps : Nat
diag : Diagnostics
This is something we can achieve using the StateRefT monad transformer, which takes as input a state type (Simp.State in our case) and a monad, and creates a new monad that can read and write this state.
In other words, StateRefT Simp.State MetaM is a souped up version of MetaM that can now track extra information by storing (and updating) a term of type Simp.State.
3) The second monad transformer application: ReaderT Simp.Context <| StateRefT Simp.State MetaM.
The SimpM monad should also be able to access the "context" that simp is running in, e.g. which simp theorems it has access to and so on.
This is captured by the type Simp.Context.
Here, the situation is not quite the same as when we were adding a Simp.State state to MetaM:
while we will often want to change the state during the simp call, we will rarely need to change the context.
In programmer lingo, the context should be immutable.
Thus, we use a different monad transformer called ReaderT, which is almost identical to StateT, but outputs a new monad where one can only read the type passed as parameter.
For completeness: when working with
ReaderT, one can still locally override the value of the variable that the monad keeps track of by usingwithReader. Intuitively, the difference betweenState(Ref)TandReaderTis the following:
In
State(Ref)T, one has access to a global variable that can be modified at will,In
ReaderT, given a programx : ReaderT a m, one can only choose to executexwith a different context. In particular, the context before and after the execution ofxstays the same.
4) The final monad transformer application: ReaderT Simp.MethodsRef <| ReaderT Simp.Context <| StateRefT Simp.State MetaM.
This outputs a monad that has access to Simp.Method (passed via a ref).
This captures the "pre" and "post" procedures that a given simp call can use, as well as the discharger that a given simp call can use, etc.
Simprocs
Let's now formally define what a simproc is.
Recall that, intuitively, a simproc takes in an expression and outputs a simplification step, possibly after modifying the current SimpM state (e.g. by adding new goals to be closed by the discharger).
This behavior is partially encapsulated by the Simproc type:
abbrev Simproc := Expr → SimpM Step
Concretely, it is helpful to think of a simproc as a function (or rather metaprogram) of the form
def mySimproc (e : Expr) : SimpM Step := do -- Various manipulations involving the expression `e` ... let step : Step := ... ... return step
The picture above is however a slight oversimplification, as simp does not consume bare elements of type Simproc.
Instead, a simproc is an element of type Simproc annotated with the extra data mentioned in the overview subsection, like whether the simproc is pre or post, and what kind of expression it matches on.
As we shall see in the next blog post, when defining a simproc, one always provides a pattern that the simproc will activate on.
For example, a simproc involving addition might match on the pattern _ + _.
On a final note, there is a DSimproc type to encode simprocs that only simplify along definitional equalities:
abbrev DSimproc := Expr → SimpM DStep
Just like Simproc, DSimproc is built by combining the SimpM monad with a type of steps (DStep).
DStep is exactly analogous to Step, except that each occurrence of Result has been replaced by Expr.
Indeed, if an equality e = e' really is definitional, then you don't need to remember its proof as it is rfl.
inductive DStep where /-- Return expression without visiting any subexpressions. -/ | done (e : Expr) /-- Visit expression (which should be different from current expression) instead. The new expression `e` is passed to `pre` again. -/ | visit (e : Expr) /-- Continue transformation with the given expression (defaults to current expression). For `pre`, this means visiting the children of the expression. For `post`, this is equivalent to returning `done`. -/ | continue (e? : Option Expr := none) deriving Inhabited, Repr
Note: The above snippet is a simplification and the constructors as shown actually belong to Lean.TransformStep, which Lean.Meta.Simp.DStep is an abbrev of.
The reader should note that
DStepcan only be used for writing dsimprocs, even though it might be the case that a given simplification step in a (non d)simproc happens to produce an expression that is definitionally equal to the original one.
Exploring the SimpM monad via simprocs
In the next blog post, we will cover in detail how to implement simprocs that are useful for proving theorems in Lean.
In the meantime, to whet the reader's appetite, let's explore the internals of the SimpM monad using fake simprocs that print info instead of simplifying.
Throughout this section, all code is assumed to be prefaced with
open Lean Elab Meta Simp.
More specifically, let's try to use simprocs to output information about the state of SimpM during a given simp call.
The first thing we may want to print out is the expression that is currently being traversed. As a simproc, this would correspond to
def printExpressions (e : Expr) : SimpM Step := do Lean.logInfo m!"{e}" return .continue -- declare the simproc simproc_decl printExpr (_) := printExpressions
The last line is needed to "declare" the simproc officially - this is where we can specify priority, whether this is a pre/post procedure and what expression we're matching on (here, we match on the pattern _, i.e. on everything!).
More on this in the next post.
Next, let's print out all the theorems that have been used by simp "so far".
def printUsedTheorems (e : Expr) : SimpM Step := do -- Read the current `Simp.State` from `SimpM` let simpState ← getThe Simp.State -- Read the current results simp has used. These are stored in a datatype -- called `Simp.Origin`, which includes simp theorems, but also other -- terms that have been given by the user to `simp`. let simps := simpState.usedTheorems.map.toList.map Prod.fst -- Get all the names let names := simps.map Origin.key -- Only print if at least one result has been used so far. unless names.isEmpty do Lean.logInfo m!"{names}" return .continue simproc_decl printThms (_) := printUsedTheorems
We encourage the reader to add these simprocs to their simp calls to see what's happening within.
example : Even (if 2 ^ 4 % 9 ∣ 6 then 2 ^ 3 else 4) := by simp [printThms, ↓printThms] example : Even (if 2 ^ 4 % 9 ∣ 6 then 2 ^ 3 else 4) := by simp [printExpr, ↓printExpr]
Exercise: try accessing more information about the current
SimpMstate, e.g.
- The number of simp theorems that are currently available to the tactic (and try varying the imports of the file to see what happens!)
- The name of the discharger tactic that the current simp call is using.