Zulip Chat Archive
Stream: lean4 dev
Topic: Core.transform vs Expr.replace usage in new codegen
Simon Dima (Jun 10 2025 at 13:41):
The new compiler performs a number of transformations on the input expression such as
- inlining auxiliary matchers
- inlining
macro_inlinefunctions - removing
._unsafe_recfrom function names - performing the registered
csimpreplacements.
The first three are done in LCNF/ToDecl.lean, just once before the expression is passed to toLCNF, and use Core.transform (or Meta.transform, which looks like essentially the same thing) to traverse and rewrite the expression. The csimp replacements are done inside toLCNF via CSimp.replaceConstants, which uses Expr.replace for the rewriting logic. I have some questions:
- What exactly does
Expr.replacedo? From the little I can glean from the C++ implementation it looks like a lower-powered version of thetransformfunctions, and so it should perform the replacements "in depth". Isreplace f eequivalent totransform e (fun e => pure match f e with | .none => .continue | .some e' => .done e')? - Is there a specific reason for performing the
csimpreplacements "at the leaf level" inToLCNF.lean, instead of running them once on the whole expression inToDecllikemacroInlineet al.?
Simon Dima (Jun 13 2025 at 15:02):
@Cameron Zwarich, you're probably the specialist regarding this code
Cameron Zwarich (Jun 13 2025 at 15:15):
Apologies for the late reply. I had something partially written up and then got sidetracked making progress on enabling the new compiler by default (hopefully next week).
The main reason for using CSimp.replaceConstants in the new compiler is that it was already there and used by the old compiler. After we switch to the new compiler, we'll go back and revisit all of these, and move any dependencies on C++ functions to Lean functions (either existing or ported) instead.
- What exactly does
Expr.replacedo? From the little I can glean from the C++ implementation it looks like a lower-powered version of thetransformfunctions, and so it should perform the replacements "in depth". Isreplace f eequivalent totransform e (fun e => pure match f e with | .none => .continue | .some e' => .done e')?
That seems about right. It also looks like it uses a different caching strategy (pointer equality vs. structural equality).
- Is there a specific reason for performing the
csimpreplacements "at the leaf level" inToLCNF.lean, instead of running them once on the whole expression inToDecllikemacroInlineet al.?
I don't think there's any reason because it seemed like the simplest thing to do at the time. One constant gets replaced by another, so why not do it at the constant level rather than the expr or decl level? Does this ordering of transformations cause some deleterious effects somehow?
Simon Dima (Jun 13 2025 at 16:24):
Apologies for the late reply. I had something partially written up and then got sidetracked making progress on enabling the new compiler by default (hopefully next week).
No problem at all, I'm thankful for the work you're doing and the helpful answers!
I don't think there's any reason because it seemed like the simplest thing to do at the time. One constant gets replaced by another, so why not do it at the constant level rather than the expr or decl level? Does this ordering of transformations cause some deleterious effects somehow?
I don't know of any bad effects, I just found it confusing when looking at it for the first time for a number of small reasons.
macroInline,inlineMatchersandremoveUnsafeRecNamesseem like a similar kind of thing toCSimp.replaceConstants-- expression transformations that apply various rewritings to special constants -- so I think I would have expected them to all happen at the same spot inToDecl.replaceConstantsdoesn't use the features of theToLCNF.Mmonad to run, and I think it would make the code's organization simpler to have the bigwhereblock fortoLCNFonly contain the logic/processing which needs to be there; I guess this is similar to the previous point, separating "preparing the Expr for conversion to LCNF" from "converting a prepared Expr to LCNF".CSimp.replaceConstantshas its own logic to traverse an Expr in order to find the places where it should rewrite; using it within a larger process that does its own Expr traversal feels strange, since then the whole "rewrite in depth" power ofreplaceConstantsisn't needed when it's just called on expressions of the shape.const ...- Due to the way information is passed between the various
visitAsdffunctions intoLCNF, the result of thereplaceConstantscall invisitAppis lost when control passes tovisitAppDefaultConstand soreplaceConstantsneeds to be called again on the same argument, which again feels a bit strange.
To be clear, I'm trying to describe why I was confused here, not to make a normative judgement on how the code should be written; it ain't broke, so "it was the simplest thing to do" is a perfectly satisfying explanation of why the code is what it is :)
Simon Dima (Jun 13 2025 at 16:28):
After we switch to the new compiler, we'll go back and revisit all of these, and move any dependencies on C++ functions to Lean functions (either existing or ported) instead.
Does that mean that on medium-term timescales the entire way from a Lean source file to a compiled-to-C file will be entirely done by code written in Lean, with only the kernel (and maybe the logic for the lean CLI) remaining in C++?
Kyle Miller (Jun 13 2025 at 16:43):
Simon Dima said:
- What exactly does
Expr.replacedo?
You can find a reference implementation here: docs#Lean.Expr.replaceNoCache (minus the pointer-based caching)
It's different from Core.transform in one significant way, which is that Core.transform processes .app chains differently. While Expr.replace shows each app to the function (so it sees all prefixes of an application), Core.transform uses Expr.withApp to skip to the function and arguments immediately.
Kyle Miller (Jun 13 2025 at 16:52):
Practically, it's not very different. I'd imagine that Expr.replace is faster than Core.transform for simpler replacements (like replacing constants), but I haven't measured anything. For tasks like "replace all n-argument replacements of f", I think Core.transform saves a little bit of quadratic complexity (with Expr.replace you have to look back n apps on each app to see if it's a match).
Kyle Miller (Jun 13 2025 at 16:56):
Since you mention Meta.transform: that one is very different. It manages local contexts as it enters bindings, guaranteeing that each expression you see has all its bvars instantiated as local variables. That means you can use inferType/isDefEq/whnf, etc. when computing replacements.
You can't use Meta.transform for doing csimp replacement, since the transformation can produce terms that are no longer type correct. Type correctness can easily depend on the exact definition of the functions being applied.
Simon Dima (Jun 13 2025 at 17:01):
Kyle Miller said:
You can't use
Meta.transformfor doing csimp replacement, since the transformation can produce terms that are no longer type correct. Type correctness can easily depend on the exact definition of the functions being applied.
That seems like a good reason to do csimp replacements as late as possible then, since toLCNF does use MetaM for type inference quite a bit!
Actually, removing the _unsafe_rec suffix from every name is also a nontrivial transformation of terms with respect to typing. Can we be sure that it's safe to do that and keep using Meta functions afterwards?
Kyle Miller (Jun 13 2025 at 17:01):
Going slightly off topic (edit: I guess with your last message, it's still on topic after all!): It would be nice if csimp lemmas could be more conditional (e.g. to replace a function with a function that's best suited for the given types; perhaps there's a much better algorithm if the types are hashable for instance). That needs some Meta computation, but there's that type incorrectness problem.
One trick there is to have something like
def csimpReplace {α : Type _} (orig _new : α) : α := orig
and then the csimp transformation could replace a whole expression x with x' by first replacing it with csimpReplace x x', and then later having a pass that actually replaces it with x'.
Simon Dima (Jun 13 2025 at 17:15):
Rephrasing to make sure I get it: using that you could at some point run a pass that annotates the expression tree with information about csimp replacements to be performed later, in a way that preserves typeability and doesn't require any other state since the information is stored in the Expr. That's neat!
Simon Dima (Jun 13 2025 at 17:20):
My intuitive solution would have been along the line of inserting casts; everywhere the type of some term depends on orig and changes via the replacement, casting along the equality between orig and _new should be able to fix that, and would later be compiled to a no-op
I wouldn't want to have to implement that, though :sweat_smile:
Kyle Miller (Jun 13 2025 at 17:50):
Yes, inserting casts is tricky business!
One example: Consider let x := v; b and the type T of b depends on the value of v to be type correct. If you have a rewrite v = v', how do you cast b? The normal way to create a cast, using Eq.rec, needs fun x => T to be type correct ("motive not type correct"), but, worse, x might not even appear in T directly. As far as I know, the only things you can do are (1) say that v cannot be rewritten (except by definitional equalities) or (2) zeta reduce to b[x:=v] and ignore such lets entirely.
Cameron Zwarich (Jun 13 2025 at 19:25):
Does that mean that on medium-term timescales the entire way from a Lean source file to a compiled-to-C file will be entirely done by code written in Lean, with only the kernel (and maybe the logic for the
leanCLI) remaining in C++?
After enabling the new compiler, the only things remaining in C++ will be the kernel (which includes some pieces of the shared data structures like Expr) and some utility functions that the new compiler relies upon. I think it should be very easy to rewrite all of the latter in Lean, so we're talking about a timeframe of a few weeks to a few months I think.
Simon Dima (Jun 16 2025 at 08:46):
Simon Dima said:
Actually, removing the
_unsafe_recsuffix from every name is also a nontrivial transformation of terms with respect to typing. Can we be sure that it's safe to do that and keep usingMetafunctions afterwards?
@Cameron Zwarich, what do you think, could there be cases where removeUnsafeRecNames breaks typing because something relies on a definitional equality involving a recursive function?
Last updated: Dec 20 2025 at 21:32 UTC