Zulip Chat Archive
Stream: new members
Topic: Is elaboration part of trusted code?
Jakub Nowak (Oct 05 2025 at 17:37):
Is Lean's elaboration, i.e. the transformation from Lean's syntax to Lean.Expr, trusted code? Are there proofs that these transformations preserve semantics?
Chris Bailey (Oct 05 2025 at 17:39):
Jakub Nowak said:
Is Lean's elaboration, i.e. the transformation from Lean's syntax to Lean.Expr, trusted code?
No.
Jakub Nowak (Oct 05 2025 at 17:41):
I want to understand why it isn't.
Henrik Böving (Oct 05 2025 at 17:49):
The answer to this is a bit nuanced. Generally speaking, when proof assistants talk about their trusted code base, they usually refer to the thing that says "given some expression in the core lambda calculus, does this type check". In this sense it is not part of your trusted code base, because the elaborator is of course just the translation to this core calculus. On the other hand, it is very much possible that the elaborator mistranslates one of your statements. However, you must of course ask yourself: what is the likelihood that the elaborator mistranslates one of your statements in a way that:
- Actually still typechecks
- Is still provable in the way that you expect it to be
Both of these occurring is extremely unlikely, so we usually don't care too much about elaboration bugs in terms of correctness (in terms of UX etc. we do of course very much care about them)
One nice property of this elaborator/kernel architecture is that it is actually entirely irrelevant how your proof ends up being elaborated; the only things you need to be certain are elaborated correctly are your definitions and the theorem statements you are using. One easy way to convince yourself of the correctness of your definitions is to just do proofs about properties you know they have. But in the end, if you want to be 100% sure, and not just 99.999% sure, that what you are proving is what you are intending to prove, you need to read the Lean.Expr.
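For illustration, a minimal toy sketch of that last idea (the names are made up, nothing deep):

```lean
-- If the elaborator had silently mangled this definition, the
-- sanity-check theorems below would most likely stop going through.
def double (n : Nat) : Nat := n + n

-- Concrete check, closed by computation.
theorem double_two : double 2 = 4 := rfl

-- A property we "know" `double` has; proving it increases confidence
-- that the elaborated definition matches our intent.
theorem double_eq_two_mul (n : Nat) : double n = 2 * n := by
  unfold double
  omega
```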
Chris Bailey (Oct 05 2025 at 17:49):
Because it's not used to type check declarations, and it's not needed to read the type of a declaration back to you so you can determine whether what was checked is what you intended. In practice most people do rely on some elaboration/delaboration for the second one, but it's not necessary.
Jakub Nowak (Oct 05 2025 at 20:14):
Thanks @Henrik Böving for the clear explanation.
Do you maybe have some reading material about this design of Lean?
I'm not entirely convinced by that approach. I could imagine a situation where there's some assumption that most humans have, and whoever wrote the elaboration had that assumption, and every person who writes Lean definitions also has that assumption and doesn't notice its absence from the definitions.
To me, the elaboration part would also benefit from multiple implementations to cross-check against, like with the kernel. But I definitely agree that it is of less importance than the kernel.
Henrik Böving (Oct 05 2025 at 20:16):
This is not really a particular design of Lean; Rocq follows the same principle, so there isn't really something to read about it, no.
Having a second implementation of Lean's elaborator is, I would say, completely impossible unless you are copying bit by bit most of what Lean's elaborator does. The algorithms and the employed heuristics, in combination with the amount of code that relies on them, are just far too complex and too sensitive to changes.
Jakub Nowak (Oct 05 2025 at 20:17):
Chris Bailey said:
Because it's not used to type check declarations, and it's not needed to read the type of a declaration back to you so you can determine whether what was checked is what you intended. In practice most people do rely on some elaboration/delaboration for the second one, but it's not necessary.
I'm not sure if I understand, but what you describe is that we shouldn't trust elaboration and should always read the produced Lean.Expr? I never did that when programming in Lean; do people do that?
Henrik Böving (Oct 05 2025 at 20:18):
No, people do not generally inspect the generated Lean.Expr by hand.
Jakub Nowak (Oct 05 2025 at 20:25):
I started pondering these questions about Lean elaboration after reading that paper: https://arxiv.org/pdf/1904.11818v2
TL;DR: they use a model of computation called L, and they wrote a translation from Rocq to L. They also proved that computing a function in Rocq and translating the result to L is the same as translating the function and its argument to L and then evaluating the application in L.
The reason they did that was to establish confidence in the correctness of their translation, but I think that having something like this also makes one more confident in the correctness of Rocq itself, because now it's being cross-checked in a different model of computation.
Something similar could be done for Lean too.
Henrik Böving (Oct 05 2025 at 20:31):
this also makes one more confident in the correctness of Rocq itself, because now it's being cross-checked in a different model of computation.
Why would that be the case?
Jakub Nowak (Oct 05 2025 at 20:36):
Now you have three parts: Rocq evaluation, L evaluation, and the translation from Rocq to L. Having a bug in only one of these would make the aforementioned proof not type-check. You would need a bug in at least two of them for it to slip by.
Henrik Böving (Oct 05 2025 at 20:36):
But evaluation doesn't matter to most proofs that are done in Lean
Jakub Nowak (Oct 05 2025 at 20:37):
I agree that evaluation doesn't matter for proofs, but it doesn't change the fact that it does for definitions, and so for the statements of theorems too.
Henrik Böving (Oct 05 2025 at 20:38):
A lot of definitions that are commonly used in Lean don't really have executable semantics either: reals don't compute, polynomials don't compute, etc.
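To illustrate, a minimal sketch assuming Mathlib is imported (the commented-out line is the part that fails):

```lean
import Mathlib.Data.Real.Basic

-- `Real` has no executable representation, so definitions like this
-- have to be marked `noncomputable` and cannot be run.
noncomputable def myConst : ℝ := 1 / 3

-- #eval myConst   -- rejected: there is no code to execute
-- ...but reasoning about it works as usual:
example : myConst = 1 / 3 := rfl
```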
Jakub Nowak (Oct 05 2025 at 20:40):
"Don't compute" means that the corresponding Lean.Expr doesn't have a normal form?
Henrik Böving (Oct 05 2025 at 20:42):
They are not executable as compiled code. The Expr does have a normal form, but I don't know if it is in any way helpful for you to look at for any non-trivial thing.
Kenny Lau (Oct 05 2025 at 20:52):
Jakub Nowak said:
I agree that evaluation doesn't matter for proofs, but it doesn't change the fact that it does for definitions, and so for the statements of theorems too.
I think you might be a bit confused as to how definitions are made; the part where you can actually call them is quite separate from the part where you prove properties about them, at least to my understanding.
Kenny Lau (Oct 05 2025 at 20:52):
being computable and being able to call functions wouldn't increase/decrease my trust because producing a correct result for the first 1000 integers doesn't actually guarantee that it's correct
Kenny Lau (Oct 05 2025 at 20:53):
and most of the time we don't compute the first 1000 outputs to check whether it's correct anyway, we prove theorems about them
Jakub Nowak (Oct 05 2025 at 21:21):
I never meant to mention execution of the code; I think I confused you with my incorrect choice of words.
I realised that proving "computing a function in Rocq and translating the result to L is the same as translating the function and its argument to L and then evaluating the application in L" basically boils down to saying that application in Rocq translates to application in L and not much else, so it really doesn't give you anything.
Chris Bailey (Oct 05 2025 at 22:04):
Jakub Nowak said:
Chris Bailey said:
Because it's not used to type check declarations, and it's not needed to read the type of a declaration back to you so you can determine whether what was checked is what you intended. In practice most people do rely on some elaboration/delaboration for the second one, but it's not necessary.
I'm not sure if I understand, but what you describe is that we shouldn't trust elaboration and should always read the produced Lean.Expr? I never did that when programming in Lean; do people do that?
This is introducing a second axis to the initial question: "is elaboration/delaboration, by design, part of the trusted code base" versus "how much trust do Lean users place in elaboration/delaboration when working with Lean on a day to day basis".
Jakub Nowak said:
Thanks Henrik Böving for the clear explanation.
Do you maybe have some reading material about this design of Lean?
I'm not entirely convinced by that approach. I could imagine a situation where there's some assumption that most humans have, and whoever wrote the elaboration had that assumption, and every person who writes Lean definitions also has that assumption and doesn't notice its absence from the definitions.
Maybe an example would be helpful?
I think what you're asking about, broadly, is how we know that the elaboration of complex definitions works correctly if those compilation steps are not trusted and we're not reading the output, especially for complex recursive or monadic functions that, when elaborated, tend to lose readability very quickly. The answer is that you prove characterizing lemmas: you write and prove theorems that show your function exhibits the properties you want it to have, where those theorem statements are generally going to be (relative to the definition) extremely concise.
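A toy sketch of that idea (my own example, not anything from a library): the elaborated term of an accumulator-style function is not something you would want to read, but a short characterizing lemma pins down its behaviour.

```lean
-- An accumulator-style sum whose elaborated Expr nobody wants to read.
def sumAux (acc : Nat) : List Nat → Nat
  | [] => acc
  | x :: xs => sumAux (acc + x) xs

def sumList (xs : List Nat) : Nat := sumAux 0 xs

-- Characterizing lemma: the accumulator just adds on at the end.
theorem sumAux_eq (acc : Nat) (xs : List Nat) :
    sumAux acc xs = acc + sumAux 0 xs := by
  induction xs generalizing acc with
  | nil => simp [sumAux]
  | cons x xs ih =>
    simp only [sumAux]
    rw [ih (acc + x), ih (0 + x)]
    omega

-- Concrete sanity check.
example : sumList [1, 2, 3] = 6 := rfl
```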
Jakub Nowak (Oct 05 2025 at 22:11):
Chris Bailey said:
The answer is that you prove characterizing lemmas: you write and prove theorems that show your function exhibits the properties you want it to have, where those theorem statements are generally going to be (relative to the definition) extremely concise.
But why can't it be the case that you hit exactly the same bug in both the definition and the characterising lemmas?
Imagine I replace the elaborator with a buggy one that just elaborates every term to the constant True. The whole of mathlib would still compile successfully (well, if you ignore #guard_msgs tests).
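Purely to illustrate the shape of such a failure (this is a custom toy term elaborator of my own, not a modification of Lean's actual elaborator):

```lean
import Lean
open Lean

/-- A deliberately broken toy elaborator: it ignores the term it is
given and always produces the constant `True`. -/
elab "bogus% " _t:term : term => return mkConst ``True

-- `bogus% (0 = 1)` elaborates to `True`, so this "proof" is accepted.
-- Reading the implementation above (or the produced `Expr`) is what
-- reveals that something is wrong.
example : bogus% (0 = 1) := True.intro
```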
Kenny Lau (Oct 05 2025 at 22:13):
well then you would be able to prove 0 = 1 as well because that just means True
Kenny Lau (Oct 05 2025 at 22:14):
you're right that mathlib itself contains only true theorems, but everyone in the process of writing mathlib constantly bumps into false and unprovable theorems (which never make it into mathlib)
Jakub Nowak (Oct 05 2025 at 22:23):
Yeah, but in my example of just elaborating everything to True you don't have to prove 0 = 1 to notice something is wrong. You can just read the implementation of this elaboration and decide that elaborating everything to True seems wrong.
A real-world situation would be having an elaboration that looks correct to reviewers and one that allows proving statements that look correct to reviewers. But maybe some of these statements are not actually correct?
Chris Bailey (Oct 05 2025 at 22:45):
Jakub Nowak said:
Yeah, but in my example of just elaborating everything to True you don't have to prove 0 = 1 to notice something is wrong. You can just read the implementation of this elaboration and decide that elaborating everything to True seems wrong.
A real-world situation would be having an elaboration that looks correct to reviewers and one that allows proving statements that look correct to reviewers. But maybe some of these statements are not actually correct?
If you've reached this level of bug where everything outside the kernel is not only buggy, but both catastrophically and deceptively buggy, then I think by definition you have to look at the result of elaboration and how it's interacting with the kernel.
The reference type checkers include a pretty printer, and you would only have to check types, so the output is really quite readable for theorem statements. It's basically what you get with pp.notation false.
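Roughly like this (the exact rendering is from memory and may differ between versions):

```lean
-- With notation display turned off, the pretty printer shows a
-- statement much closer to the underlying expression.
set_option pp.notation false in
#check ∀ n : Nat, n + 0 = n
-- prints something like: ∀ (n : Nat), Eq (HAdd.hAdd n 0) n
```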
Michael Gehlke (Oct 06 2025 at 10:08):
Henrik Böving said:
The algorithms and the employed heuristics, in combination with the amount of code that relies on them, are just far too complex and too sensitive to changes.
Isn't this just recreating the persistent problem with proofs?
We have some solid logical core that we believe but our process for translating higher mathematics into that core is at a level of complexity which introduces errors -- whether that's what happened with calculus kicking off the drive for modern formalism or the issues with Voevodsky's work that inspired his efforts.
Elaboration seems like a critical step in formal verification systems.
Henrik Böving (Oct 06 2025 at 11:20):
It doesn't introduce errors. When I say the algorithms are unstable, I mean that very subtle changes can cause them to time out on a lot of stuff instead of providing an (almost always) correct result. Whenever we make a change to e.g. the definitional equality engine, Kim is busy for days making subtle changes to mathlib so that the defeq algorithm terminates again in certain situations. So what happens when you have two implementations of Lean's elaborator is that (given enough engineering time) both will likely turn out to be correct but accept a different set of surface-level terms, unless they mirror each other perfectly. The problems being solved are just too difficult and heuristic-driven to have two actually separate implementations that do the same thing on all inputs.
Michael Gehlke (Oct 06 2025 at 11:33):
As I understand your description:
- small changes cause algorithm instability, eg, timeouts
- and it's hard to know where those will occur because the system itself is complex
- and trying to create another copy will cause it to accept different terms (because of timeouts?)
What is driving the conclusion that this software is correct and doesn't include precisely the subtle translation bugs that have plagued higher level mathematical proofs historically?
To my perception, this is a critical step in the trust chain (translating my code into the internal type theory, which is then checked), but I'm not seeing what's assuring it. (And I'm happy to believe that's just my own lack of deep knowledge about Lean's internals, eg, deep insight about Lean.Expr.)
Do you know a good source to learn more about Lean's elaboration process?
Henrik Böving (Oct 06 2025 at 11:58):
What is driving the conclusion that this software is correct and doesn't include precisely the subtle translation bugs that have plagued higher level mathematical proofs historically?
As pointed out above, it is highly unlikely that the elaborator will, due to a bug, produce an expression that still type checks and behaves as you, the user who is conducting the proof, expect it to. By far the most likely result in case of an elaborator bug of any form is that the generated expression doesn't type check anymore, which is sure to be caught by the kernel (which in turn very much can be re-implemented multiple times, and people have done so). Even if the situation arises where the elaborator does end up producing a wrong term that type checks, it is very likely that the user who is conducting the proof will not be able to continue the proof how they intended to, because they do of course have an idea of what they want to do, but suddenly the underlying expression is different, so it doesn't work anymore.
This is not to say that a second implementation of the elaborator would not aid in increasing the trust you can have in the elaborator; it's just extremely unlikely that a bug in the elaborator triggers and goes unnoticed by both the kernel and the user to begin with.
Michael Gehlke (Oct 06 2025 at 12:05):
I'm not sure I follow that chain of logic: why would the elaborator error stop your proof from going through?
Naively, the number of math theorems that were thought to be true but had subtle counterexamples suggests that if, eg, the elaborator slightly restricts the type during translation, then you might exclude such counterexamples, thereby making the proof work as you thought (but hiding the fact that your theorem isn't actually true).
Aaron Liu (Oct 06 2025 at 12:09):
I feel like the chance that an elaborator error gives you a statement that still typechecks is very small
Henrik Böving (Oct 06 2025 at 12:09):
eg, the elaborator slightly restricts the type during translation, then you might exclude such counterexamples
The elaborators are very general pieces of software that are not geared towards a particular math framework or something like that. The likelihood of an elaborator producing a statement (due to a bug) that still type checks and then even happens to include a restriction to the proof statement which simplifies your proof is just tiny.
Aaron Liu (Oct 06 2025 at 12:14):
And even if it still typechecks, the chance that the error has modified the statement in a way that makes your originally flawed proof work, and also that you won't catch the modification while you're in the process of typing up the proof in Lean, is even smaller still.
Michael Gehlke (Oct 06 2025 at 12:14):
Okay -- what indicates that?
I'm happy to go read more about how the elaborator works if somebody can point me in that direction; I acknowledge that you don't feel that's a likely scenario, but I hope you can understand why that's not enough to fulfill my interest in the topic.
Aaron Liu (Oct 06 2025 at 12:17):
Well, for example, there's also a pretty printer that will take the expression and pretty-print it, and if the elaborator does something weird, the output of this won't match what you expect.
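For instance, one low-tech way to see what the elaborator actually produced (a sketch with made-up names; the printed output is from memory):

```lean
-- `#print` shows the fully elaborated statement and proof term of a
-- declaration, which is one way to spot an elaboration surprise.
theorem add_zero' (n : Nat) : n + 0 = n := rfl

#print add_zero'
-- prints roughly: theorem add_zero' : ∀ (n : Nat), n + 0 = n := fun n => rfl
```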
Aaron Liu (Oct 06 2025 at 12:23):
It's also basically impossible to prove any sufficiently complicated theorem in Lean without looking at either the tactic state or expected type window (which both contain pretty-printer output).
Jakub Nowak (Oct 21 2025 at 22:28):
Henrik Böving said:
This is not really a particular design of Lean; Rocq follows the same principle, so there isn't really something to read about it, no.
But in Rocq the translation from Rocq to typed lambda calculus is verified: https://metarocq.github.io/#pcuic
Here it says there's a proof that it preserves call-by-value semantics: https://github.com/MetaRocq/metarocq/blob/9.0/pcuic/theories/README.md
Chris Bailey (Oct 21 2025 at 23:16):
But in Rocq the translation from Rocq to typed lambda calculus is verified
The second link is broken, but are you sure that's what that says anyway? The proof statement is about a translation between TemplateRocq and PCUIC, the description of TemplateRocq is "It takes Rocq terms and constructs a representation of their syntax tree as an inductive data type. The representation is based on the kernel’s term representation", emphasis mine.
This is a gigantic project and it's not immediately obvious how much of the Rocq syntax <-> kernel term pipeline they are or aren't covering.
Jakub Nowak (Oct 22 2025 at 00:06):
Chris Bailey said:
The second link is broken
Works for me. :astonished:
Chris Bailey said:
"[...] The representation is based on the kernel’s term representation", emphasis mine.
Ah yes, you're right. So I think this project is only about writing a verified kernel.
Chris Bailey said:
it's not immediately obvious how much of the Rocq syntax <-> kernel term pipeline they are or aren't covering.
It looks like they have OCaml code for quoting and unquoting, so I guess that part is not verified.
Last updated: Dec 20 2025 at 21:32 UTC