Zulip Chat Archive

Stream: lean4

Topic: Is `simp` still an ordered rewriting system in Lean 4?

Niels Voss (Aug 14 2025 at 00:51):

It may seem that commutativity and left-commutativity are problematic, in that repeated application of either causes looping. But the simplifier detects identities that permute their arguments, and uses a technique known as ordered rewriting. This means that the system maintains an internal ordering of terms, and only applies the identity if doing so decreases the order. With the three identities mentioned above, this has the effect that all the parentheses in an expression are associated to the right, and the expressions are ordered in a canonical (though somewhat arbitrary) way. Two expressions that are equivalent up to associativity and commutativity are then rewritten to the same canonical form.

I also think that at one point I may have heard that simp assigned natural number scores to various expressions, and only applied a rewrite if it decreased the score, though I'm not sure if this was true.

I'm wondering if this is still the case in Lean 4, or does Lean 4 always just replace the LHS with the RHS?

Aaron Liu (Aug 14 2025 at 00:52):

I think simp still compares expressions with docs#Lean.Expr.lt to prevent infinite looping with commutativity theorems

Aaron Liu (Aug 14 2025 at 00:52):

or maybe it's the quick lt

Niels Voss (Aug 14 2025 at 00:53):

Oh I see. So it doesn't compute numerical scores, but there's still a linear order on expressions (implemented in C++)?

Aaron Liu (Aug 14 2025 at 00:58):

I will see if I can find where this happens in the code

Niels Voss (Aug 14 2025 at 01:07):

Well I found the core algorithm in https://github.com/leanprover/lean4/blob/f8c743e37d993b3c95754e4865b88cab8fbea2ce/src/library/expr_lt.cpp#L197 (and there's another copy of it in stage0) with a simple search, though I don't know where it is invoked

Niels Voss (Aug 14 2025 at 01:18):

It seems to be defined by recursion on Expr. I could be wrong but:

Expressions of different types (e.g. fvar vs bvar) are compared by some ordering that I'm not sure about
Literals are compared by their value (I'm guessing alphabetical order is used for strings)
Bound variables are compared by their de brujin index
Constants are compared first by their name and next by their universe level expression
Function applications are compared first by their function (making a recursive call) and next by their argument (again, making a recursive call)
Lambdas and Pis are compared first by their "binding domain" and next by their body
Free variables are compared by their index in the local context
Metavariables are compared by their ids
Sorts and lets seem to be more complicated

Niels Voss (Aug 14 2025 at 01:19):

There's also a "quick mode" which first uses hashes for comparison, and only uses the extended algorithm if the hashes were equal

Henrik Böving (Aug 14 2025 at 07:23):

A couple of remarks on this:

When talking about "ordered term rewriting" people usually mean term rewriting systems ordered by a relation that is stable under substitution (and often more properties, called a "rewrite ordering"). These orderings are inherently not linear and not even total. The two most popular orderings used for these purposes are KBO and LPO
Expr.lt is not called directly from within simp to the best of my understanding
simp will use an LPO implementation called acLt in a particular situation:

if thm.perm then
  /-
  We use `.reduceSimpleOnly` because this is how we indexed the discrimination tree.
  See issue #1815
  -/
  if !(← acLt rhs e .reduceSimpleOnly) then
    trace[Meta.Tactic.simp.rewrite] "{← ppSimpTheorem thm}, perm rejected {e} ==> {rhs}"
    return none
trace[Meta.Tactic.simp.rewrite] "{← ppSimpTheorem thm}:{indentExpr e}\n==>{indentExpr rhs}"
let rhs ← if type.hasBinderNameHint then rhs.resolveBinderNameHint else pure rhs
recordSimpTheorem thm.origin
return some { expr := rhs, proof? }

where the perm flag on a simp theorem that indicates:

`perm` is true if lhs and rhs are identical modulo permutation of variables.

which is exactly what is referred to in the quote above.

And lastly, simp in Lean 4 will totally rewrite in loops to its heart's content, after all this special situation from above often does not occur in cyclic rewrite systems:

opaque f : Nat → Nat
opaque g : Nat → Nat

@[simp]
theorem thm1 : f x = g x := sorry

@[simp]
theorem thm2 : g x = f x := sorry

theorem foo : f (g x) = g (g x) := by simp

yields

tactic 'simp' failed, nested error:
maximum recursion depth has been reached
use `set_option maxRecDepth <num>` to increase limit
use `set_option diagnostics true` to get diagnostic information

Yan Yablonovskiy 🇺🇦 (Aug 15 2025 at 00:21):

https://leanprover.github.io/theorem_proving_in_lean4/Tactics/#using-the-simplifier (3rd major paragraph)

I think the same quote was pasted into the Lean 4 version, but surroundings aren't necessarily the same. So id guess it was purposeful and the remark would still apply.

Henrik Böving (Aug 15 2025 at 10:24):

Just to be clear about this as I can't quite tell if there is any confusion left: simp is not in general an ordered rewriting system but as noted in the initial quote, my analysis of the code and this part of the reference manual, falls back to ordered rewriting in a very particular situation:

But the simplifier detects identities that permute their arguments, and uses a technique known as ordered rewriting.

In all other situations it will just perform normal left-to-right rewriting without any safe guards for rewrite loops, confluence etc. (all things that a full ordered rewriting system is able to do). In fact if simp was an ordered rewriting system it would behave completely differently to how we are used to working with it and things like engineering particular simp normal forms would be a very different endeavour.

Last updated: Feb 28 2026 at 14:05 UTC