Zulip Chat Archive

Stream: lean4 dev

Topic: Core.transform vs Expr.replace usage in new codegen

Simon Dima (Jun 10 2025 at 13:41):

The new compiler performs a number of transformations on the input expression such as

inlining auxiliary matchers
inlining macro_inline functions
removing ._unsafe_rec from function names
performing the registered csimp replacements.

The first three are done in LCNF/ToDecl.lean, just once before the expression is passed to toLCNF, and use Core.transform (or Meta.transform, which looks like essentially the same thing) to traverse and rewrite the expression. The csimp replacements are done inside toLCNF via CSimp.replaceConstants, which uses Expr.replace for the rewriting logic. I have some questions:

What exactly does Expr.replace do? From the little I can glean from the C++ implementation it looks like a lower-powered version of the transform functions, and so it should perform the replacements "in depth". Is replace f e equivalent to transform e (fun e => pure match f e with | .none => .continue | .some e' => .done e')?
Is there a specific reason for performing the csimp replacements "at the leaf level" in ToLCNF.lean, instead of running them once on the whole expression in ToDecl like macroInline et al.?

Simon Dima (Jun 13 2025 at 15:02):

@Cameron Zwarich, you're probably the specialist regarding this code

Cameron Zwarich (Jun 13 2025 at 15:15):

Apologies for the late reply. I had something partially written up and then got sidetracked making progress on enabling the new compiler by default (hopefully next week).

The main reason for using CSimp.replaceConstants in the new compiler is that it was already there and used by the old compiler. After we switch to the new compiler, we'll go back and revisit all of these, and move any dependencies on C++ functions to Lean functions (either existing or ported) instead.

What exactly does Expr.replace do? From the little I can glean from the C++ implementation it looks like a lower-powered version of the transform functions, and so it should perform the replacements "in depth". Is replace f e equivalent to transform e (fun e => pure match f e with | .none => .continue | .some e' => .done e')?

That seems about right. It also looks like it uses a different caching strategy (pointer equality vs. structural equality).

Is there a specific reason for performing the csimp replacements "at the leaf level" in ToLCNF.lean, instead of running them once on the whole expression in ToDecl like macroInline et al.?

I don't think there's any reason because it seemed like the simplest thing to do at the time. One constant gets replaced by another, so why not do it at the constant level rather than the expr or decl level? Does this ordering of transformations cause some deleterious effects somehow?

Simon Dima (Jun 13 2025 at 16:24):

Apologies for the late reply. I had something partially written up and then got sidetracked making progress on enabling the new compiler by default (hopefully next week).

No problem at all, I'm thankful for the work you're doing and the helpful answers!

I don't think there's any reason because it seemed like the simplest thing to do at the time. One constant gets replaced by another, so why not do it at the constant level rather than the expr or decl level? Does this ordering of transformations cause some deleterious effects somehow?

I don't know of any bad effects, I just found it confusing when looking at it for the first time for a number of small reasons.

macroInline, inlineMatchers and removeUnsafeRecNames seem like a similar kind of thing to CSimp.replaceConstants -- expression transformations that apply various rewritings to special constants -- so I think I would have expected them to all happen at the same spot in ToDecl.
replaceConstants doesn't use the features of the ToLCNF.M monad to run, and I think it would make the code's organization simpler to have the big where block for toLCNF only contain the logic/processing which needs to be there; I guess this is similar to the previous point, separating "preparing the Expr for conversion to LCNF" from "converting a prepared Expr to LCNF".
CSimp.replaceConstants has its own logic to traverse an Expr in order to find the places where it should rewrite; using it within a larger process that does its own Expr traversal feels strange, since then the whole "rewrite in depth" power of replaceConstants isn't needed when it's just called on expressions of the shape .const ...
Due to the way information is passed between the various visitAsdf functions in toLCNF, the result of the replaceConstants call in visitApp is lost when control passes to visitAppDefaultConst and so replaceConstants needs to be called again on the same argument, which again feels a bit strange.

To be clear, I'm trying to describe why I was confused here, not to make a normative judgement on how the code should be written; it ain't broke, so "it was the simplest thing to do" is a perfectly satisfying explanation of why the code is what it is :)

Simon Dima (Jun 13 2025 at 16:28):

After we switch to the new compiler, we'll go back and revisit all of these, and move any dependencies on C++ functions to Lean functions (either existing or ported) instead.

Does that mean that on medium-term timescales the entire way from a Lean source file to a compiled-to-C file will be entirely done by code written in Lean, with only the kernel (and maybe the logic for the lean CLI) remaining in C++?

Kyle Miller (Jun 13 2025 at 16:43):

Simon Dima said:

What exactly does Expr.replace do?

You can find a reference implementation here: docs#Lean.Expr.replaceNoCache (minus the pointer-based caching)

It's different from Core.transform in one significant way, which is that Core.transform processes .app chains differently. While Expr.replace shows each app to the function (so it sees all prefixes of an application), Core.transform uses Expr.withApp to skip to the function and arguments immediately.

Kyle Miller (Jun 13 2025 at 16:52):

Practically, it's not very different. I'd imagine that Expr.replace is faster than Core.transform for simpler replacements (like replacing constants), but I haven't measured anything. For tasks like "replace all n-argument replacements of f", I think Core.transform saves a little bit of quadratic complexity (with Expr.replace you have to look back n apps on each app to see if it's a match).

Kyle Miller (Jun 13 2025 at 16:56):

Since you mention Meta.transform: that one is very different. It manages local contexts as it enters bindings, guaranteeing that each expression you see has all its bvars instantiated as local variables. That means you can use inferType/isDefEq/whnf, etc. when computing replacements.

You can't use Meta.transform for doing csimp replacement, since the transformation can produce terms that are no longer type correct. Type correctness can easily depend on the exact definition of the functions being applied.

Simon Dima (Jun 13 2025 at 17:01):

Kyle Miller said:

You can't use Meta.transform for doing csimp replacement, since the transformation can produce terms that are no longer type correct. Type correctness can easily depend on the exact definition of the functions being applied.

That seems like a good reason to do csimp replacements as late as possible then, since toLCNF does use MetaM for type inference quite a bit!

Actually, removing the _unsafe_rec suffix from every name is also a nontrivial transformation of terms with respect to typing. Can we be sure that it's safe to do that and keep using Meta functions afterwards?

Kyle Miller (Jun 13 2025 at 17:01):

Going slightly off topic (edit: I guess with your last message, it's still on topic after all!): It would be nice if csimp lemmas could be more conditional (e.g. to replace a function with a function that's best suited for the given types; perhaps there's a much better algorithm if the types are hashable for instance). That needs some Meta computation, but there's that type incorrectness problem.

One trick there is to have something like

def csimpReplace {α : Type _} (orig _new : α) : α := orig

and then the csimp transformation could replace a whole expression x with x' by first replacing it with csimpReplace x x', and then later having a pass that actually replaces it with x'.

Simon Dima (Jun 13 2025 at 17:15):

Rephrasing to make sure I get it: using that you could at some point run a pass that annotates the expression tree with information about csimp replacements to be performed later, in a way that preserves typeability and doesn't require any other state since the information is stored in the Expr. That's neat!

Simon Dima (Jun 13 2025 at 17:20):

My intuitive solution would have been along the line of inserting casts; everywhere the type of some term depends on orig and changes via the replacement, casting along the equality between orig and _new should be able to fix that, and would later be compiled to a no-op
I wouldn't want to have to implement that, though :sweat_smile:

Kyle Miller (Jun 13 2025 at 17:50):

Yes, inserting casts is tricky business!

One example: Consider let x := v; b and the type T of b depends on the value of v to be type correct. If you have a rewrite v = v', how do you cast b? The normal way to create a cast, using Eq.rec, needs fun x => T to be type correct ("motive not type correct"), but, worse, x might not even appear in T directly. As far as I know, the only things you can do are (1) say that v cannot be rewritten (except by definitional equalities) or (2) zeta reduce to b[x:=v] and ignore such lets entirely.

Cameron Zwarich (Jun 13 2025 at 19:25):

Does that mean that on medium-term timescales the entire way from a Lean source file to a compiled-to-C file will be entirely done by code written in Lean, with only the kernel (and maybe the logic for the leanCLI) remaining in C++?

After enabling the new compiler, the only things remaining in C++ will be the kernel (which includes some pieces of the shared data structures like Expr) and some utility functions that the new compiler relies upon. I think it should be very easy to rewrite all of the latter in Lean, so we're talking about a timeframe of a few weeks to a few months I think.

Simon Dima (Jun 16 2025 at 08:46):

Simon Dima said:

Actually, removing the _unsafe_rec suffix from every name is also a nontrivial transformation of terms with respect to typing. Can we be sure that it's safe to do that and keep using Meta functions afterwards?

@Cameron Zwarich, what do you think, could there be cases where removeUnsafeRecNames breaks typing because something relies on a definitional equality involving a recursive function?

Last updated: Feb 28 2026 at 14:05 UTC