Zulip Chat Archive
Stream: general
Topic: Trusted codebase size
Martin Dvořák (Feb 26 2025 at 13:08):
For years, I have been saying that Lean has a small core and the soundness of proofs in Lean depends on the core only (assuming hardware is flawless, and so on). Lately, however, people have been asking me for a concrete number, which I don't know.
How many lines of code do we really have to trust? Consider native_decide to be outside the trusted codebase, but the code that ensures Lean.ofReduceBool appears in #print axioms to be part of it. Consider the VS Code extension (and its backend support) to be outside the trusted codebase, even though a dangerous bug introduced into the VS Code extension could potentially be used to deceive people. Do we get to a reasonable number?
As far as I know, the size of the core increased from Lean 3 to Lean 4, and the trust in it has decreased accordingly. I understand the role of external checkers; you can perhaps also inform me what their code size is.
Floris van Doorn (Feb 26 2025 at 13:32):
I am not a great authority here, but this is what I've gathered. Some things might be wrong here.
I believe the kernel in the Lean codebase has not been intentionally designed to be as small as possible, and the boundary between what is the kernel and what isn't is not an entirely hard line. I believe in the Lean codebase it's about 20k lines of code(?)
There are external typecheckers (lean4lean, nanoda_lib) that work well in practice (i.e. they have the required performance optimizations to be able to check Mathlib), and they're about 6k-8k lines of code. This also includes things like a pretty printer, so a minimal checker can be smaller.
Martin Dvořák (Jan 19 2026 at 18:09):
@Leonardo de Moura answered the question in his talk today with the estimate of 8000 lines.
James E Hanson (Jan 19 2026 at 19:07):
Floris van Doorn said:
the boundary between what is the kernel and what isn't is not an entirely hard line.
A good example of this is the fact that the kernel is written with the assumption that certain things are defined specifically as they are in Prelude (such as the hard-coded behavior of some operations on Nat).
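For concreteness, a small sketch of what that hard-coding buys (my illustration, not from the original message): the kernel reduces arithmetic on Nat literals using fast bignum operations rather than unary unfolding, which is only sound if the Prelude definitions are what the kernel assumes.

```lean
-- Sketch: the kernel accelerates arithmetic on `Nat` literals, so this
-- closes by `rfl` without an astronomically long unary reduction. The
-- shortcut is sound only because `Nat.add`/`Nat.mul` are defined in
-- Prelude exactly as the kernel assumes.
example : 100000 * 100000 = 10000000000 := rfl
```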
Bbbbbbbbba (Jan 19 2026 at 19:25):
Could these things be easily checked with asserts?
Thomas C. (Jan 19 2026 at 19:30):
@Martin Dvořák by any chance, would the talk you're referring to be accessible somewhere?
Martin Dvořák (Jan 19 2026 at 19:35):
Thomas C. said:
Martin Dvořák by any chance, would the talk you're referring to be accessible somewhere?
It was recorded. I think it will soon appear here:
https://www.youtube.com/channel/UCWe5B7Ikr0AI9727doEUxPg
Chris Bailey (Jan 19 2026 at 20:29):
Bbbbbbbbba said:
Could these things be easily checked with asserts?
It's not currently easy, no. I think there is a plan upstream to make forward progress on this at some point.
James E Hanson (Jan 19 2026 at 21:10):
Chris Bailey said:
It's not currently easy, no.
Why is it not easy?
Chris Bailey (Jan 19 2026 at 21:41):
Not all of the relevant items are declared (and therefore exported) simply. For some of them, you can just require e.g. defEq (x + 0) x and defEq (x + y.succ) (succ (x + y)) and be done with addition. Those are easy enough to construct by hand in the checker, and they're robust against changes in Lean.
For things like div or mod, the compiled definition in Lean is currently not so simple to construct, and changes in things like the elaboration of recursive functions or auxiliary declarations might change what gets exported in ways that are equal to what was there before, but not definitionally equal. That would be a very annoying and unpredictable breakage in the checker, and the by-hand construction is not much fun.
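For addition, the checks described above can be stated directly in Lean; a sketch (these hold by rfl because Prelude's Nat.add recurses on its second argument):

```lean
-- Definitional equalities an external checker could require of `Nat.add`:
example (x : Nat)   : x + 0 = x := rfl
example (x y : Nat) : x + Nat.succ y = Nat.succ (x + y) := rfl
```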
Joachim Breitner (Jan 19 2026 at 22:10):
Would it suffice to check that these functions
- have the right propositional equations (by requiring such a theorem to be proven to the kernel) so that if they reduce literals to a literal, it has to be the right one, and
- are defined without any axioms or opaques, so that, by some metatheory theorem, they do normalize to a ground value when applied to ground values? Or does no such theorem exist?
Chris Bailey (Jan 19 2026 at 22:51):
I think option 1 would work, that was my guess for how to do it in our side discussion with Mario before he introduced the other option of changing the definitions so that the definitional equalities work. If it were possible to get the definitions to work out that would be nicer of course, but I don't know how practical it is. On the bright side I think the relevant lemmas for the alternative route are already there (e.g. Nat.div_eq, Nat.mod_eq) though with seemingly inconsistent names (see Nat.pow_eq).
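For reference, the lemmas mentioned can be inspected in core (statements elided here; a sketch):

```lean
-- Propositional unfolding lemmas for `Nat` division and modulus live in
-- core; `Nat.pow_eq` has a different flavor, hence the naming complaint.
#check @Nat.div_eq
#check @Nat.mod_eq
#check @Nat.pow_eq
```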
Joachim Breitner (Jan 20 2026 at 07:20):
Sorry, I meant checking both: the first to know that the function can't go wrong, and the second to know that the equalities on literals actually hold definitionally.
Last updated: Feb 28 2026 at 14:05 UTC