Zulip Chat Archive
Stream: lean4
Topic: Question about verification of the C IR output
Eduardo Sandalo Porto (Sep 05 2025 at 18:02):
Hi! I'm doing a course on formal verification for software security organized by professor Lucas Cordeiro, and we have to present a short talk on a subject related to the course.
For my talk, I'm trying to use ESBMC to verify the C output generated by the Lean compiler for Lean programs. As of now I've only been finding false-positives (most times, the verifier believes a null-pointer dereference can happen where it can't). I probably won't find any issues using simple techniques, since the Lean compiler is remarkably well written, so I'd also like to talk about what's done by the Lean maintainers to assure the IR generation works correctly.
This leads me to ask - what are the techniques used to verify/assure correctness for the produced Lean IR? Are any tools or testing used, or is it mostly done through manual auditing? I also wonder about any formal techniques done to verify the C/C++ parts of the Lean compiler code base, such as the trusted kernel. Do any of these parts, especially the kernel, use any tools or methods for verification? Although I've found a lot of mentions to the trusted kernel throughout the language manual, I couldn't find a section explaining exactly what techniques allows us to trust it.
Thank you for your time!
Niels Voss (Sep 05 2025 at 18:26):
As far as I know, the trusted kernel has not actually been formally verified at all. There are however multiple reimplementations of the trusted kernel so that you don't actually have to just trust Lean's kernel, but can instead run your code through other checkers as well.
Niels Voss (Sep 05 2025 at 18:28):
I'm sure you've probably already seen this, but there is this paper (which I haven't had time to read yet) which describes how reference counting works in Lean: https://arxiv.org/abs/1908.05647
Henrik Böving (Sep 05 2025 at 18:46):
so I'd also like to talk about what's done by the Lean maintainers to assure the IR generation works correctly.
Lean is a bootstrapping programming language with a lot of non trivial code in it. So quite often when mistakes during development of the code generator are made they can be easily noticed (though of course not necessarily easily be fixed) by just recompiling lean with itself and then running the hundreds of tests that we have as they are quite likely to hit a miscompilation that causes a segfault etc. if there is one.
Are any tools or testing used, or is it mostly done through manual auditing?
We don't do things like run model checkers etc. over the generated IR no. As Niels points out for some of the transformations that the compiler does there is theoretical work that demonstrates that the transformations are correct (though the practically implemented ones of course employ additional unverified tricks).
I also wonder about any formal techniques done to verify the C/C++ parts of the Lean compiler code base, such as the trusted kernel.
Markus Himmel and Paul Reichert have previously used fuzzers to find bugs in the implementation of the Lean kernel. We also live in a world where companies spend large sums of money trying to get AIs to do proofs and these AIs occasionally learn to generate bugs that trigger bugs in the code generator (which can be abused to do proofs through native_decide if you opt into the "trust the code generator" axiom).
I couldn't find a section explaining exactly what techniques allows us to trust it.
The general argument for why you can trust the kernel is that:
- It is a very small code base that is rarely ever touched and people have been using it for quite long so the chance of more bugs existing is quite small.
- Thanks to the modular design of Lean people can actually just implement another Lean kernel that implements the same theoretically proven algorithm (+ potential optimizations of course) and run their proofs against that kernel as well.
I would generally expect model checkers to struggle with a program like the Lean kernel as it is operating on a ton of pointers (that are on top of that often reference counted), usually in a recursive manner and as such has a gigantic search space for solvers to go through.
Chris Bailey (Sep 05 2025 at 20:04):
Eduardo Sandalo Porto said:
Do any of these parts, especially the kernel, use any tools or methods for verification? Although I've found a lot of mentions to the trusted kernel throughout the language manual, I couldn't find a section explaining exactly what techniques allows us to trust it.
There are different axes of trustworthiness in this case, "is the implementation sound" which is about logic errors in the kernel implementation (and in dependencies), "is the transformation from source code to executable sound" concerning problems in the implementing language's toolchain, "is the execution sound" which concerns the execution environment (dylibs, OS, hardware), and "is the type theory sound".
Leaving the type theory question aside since it sounds like you're focusing more on the implementation side, the short answer given the current state of affairs is (1) the existence of multiple implementations using different languages and toolchains, and (2) relatively small code size. Both of these are fairly good (but of course not perfect) protection against compiler bugs and logic errors. Also the type theory gives a relatively well defined spec of sorts to start from, and we have mathlib to act as a large test suite. I think Mario's lean4lean has some proofs relating the execution to the theory.
Verifiable guarantees about execution environments and toolchain soundness (for C, C++, Rust, etc.) are of course desirable, but extremely difficult to come by and would involve people from the far reaches of the computing universe participating. Even then, you will often have a bootstrapping problem with verification tool which are themselves unverified or compiled with an unverified toolchain.
There is probably a bright shining future in which some kind of proof framework (whether Lean's kernel or something else) is implemented with a completely verified path from the monitor down to the hardware, but we're very far from that, and without any better base camps to start climbing the mountain from, the pragmatic choices of multiple independent implementations and small code size are two pretty reasonable places to start.
Eduardo Sandalo Porto (Sep 06 2025 at 18:27):
Thank you very much for your answers! I think Niel's, Henrik's and Chris' answers complement each other well, and I've gotten a good idea of what's done to keep the implementation and execution sides of Lean correct. I'll be taking a look at the specific papers you've mentioned and continuing my experiments with tools intended for C and C++, and maybe report back in case of any interesting findings.
Niels Voss (Sep 06 2025 at 18:44):
There is probably a bright shining future in which some kind of proof framework (whether Lean's kernel or something else) is implemented with a completely verified path from the monitor down to the hardware, but we're very far from that, and without any better base camps to start climbing the mountain from, the pragmatic choices of multiple independent implementations and small code size are two pretty reasonable places to start.
I believe the most practical way to achieve this would be to verify Lean's kernel in a language designed for down-to-the-hardware verification like CakeML or MM0, or alternatively translate Lean proofs into a simpler, generic proof format like MM0's. However, this only checks the correctness of Lean's kernel, not Lean's compiler or generated Lean code.
Niels Voss (Sep 06 2025 at 18:48):
I do think that if we were to tackle the problem of verifying compiled Lean code, it would be better to verify the generated C rather than verify the compiler, because the semantics of Lean are probably very hard to describe. Additionally, the Lean compiler is implemented in Lean, so even if you verified it, it wouldn't actually be sufficient to ensure its correctness. I consider the actual proof-verification part of Lean (which only depends on the reliability of the kernel) to be very robust because of the existence of external checkers, so you could even use Lean itself to express the statement that a particular bit of C code is correct.
Last updated: Dec 20 2025 at 21:32 UTC