Zulip Chat Archive

Stream: lean4

Topic: What is the kernel, really?—and other fundamental questions


Thomas Murrills (Mar 29 2023 at 22:35):

I'm trying to develop an understanding of how the lean 4 pipeline works (if that's the right term). What I know so far is that syntax gets parsed by the parser, then elaborated into expressions (and/or side-effects) by the elaborator. These operate in an environment which provides access to existing declarations and the like (via CoreM).

Now we hit what I don't know: At some point (at this point?), these expressions get "sent" to the kernel. (By what?) The kernel then does something with those expressions. At some (other?) point, lean code gets compiled into an intermediate representation, and this in turn gets compiled into lower-level code. .oleans are involved somehow. (So are .ileans, which as I understand it are used for e.g. jump-to-definition interactive behavior by the language server.) C code is generated too.

So, to turn this into a few questions:

  1. What is the kernel, really? What does the kernel actually receive, what's its role, and what does it produce? "Where" does it run? The metaprogramming book mentions it performs "core tasks like type inference and definitional equality checks"—but this is a bit confusing to me, since we seem to do those during elaboration all the time. Are we communicating back and forth with the kernel behind the scenes during elaboration? What else does the kernel do, and what does it talk to?
  2. Where do things like the environment, elaborator, and parser "live" and operate with respect to the kernel? ("Where" is e.g. the environment stored?) "When" do they interact with the kernel, and what orchestrates their actions with respect to one another?
  3. What's the final destination of lean code, and what stops does it make along the way? Somehow oleans are produced—what's their role in all this? And eventually C code is produced—when, and by what? Are lower-level executables produced somehow as well? How do they relate to each other (and to the kernel)? (Is this the domain of lake?)
  4. How are the products of compilation re-used? When I evaluate lean ... in the command line or use #eval via the language server in vs code, what is it that has the role of finding which pieces of compiled code to use, linking them up, and running them to produce output?

I know this is a lot of questions—I'd really appreciate even partial answers, or pointers to some documentation that lays out the roles of lean's component parts. I'd also appreciate any confusion I've exhibited here being pointed out—maybe parts of these questions don't make sense, or maybe I really should be asking a different question. All info is welcome—the motivation here is that I feel like I don't know how lean "really works" at this level yet, and I'd like to! :) Thanks for reading and thanks in advance.

Henrik Böving (Mar 29 2023 at 22:56):

  1. The kernel is (one of the very few) parts of Lean that are written in C++ (lives here: https://github.com/leanprover/lean4/tree/master/src/kernel) ...it is in essence "just" an implementation of Leans type theory that is done in a very minimal fashion so we can be sure nothing is going wrong. Yes the elaborator can do things like type checking and defeq as well but those can be quite a bit smarter than the kernel (And in particular can work with meta variables) we talk to it simply via FFI really. I do not knnow myself where exactly we talk to the kernel but a bit of clever grepping could probably figure that one out. It should be at some point after elaboration is done and basically checks for us that we did not mess up.
  2. The kernel does not care about things like the elaborator or parser. The environment is accessed via C FFI into Lean (instead of from Lean like when we send stuff to it), after all the environment is just another Lean type that gets compiled into C. You can take a look at the FFI interface here https://github.com/leanprover/lean4/blob/master/src/kernel/environment.cpp you will find that the things declared extern here are at some point @[export] tagged in the Lean code base. And as to when they interact with the kernel...well the kernel just accesses them when it wants to after you asked it to verify something
  3. There is a nice visualization by Sebastian from his gotta prove fast talk that visualizes the rough idea: image.png As you can see after the elaborator the path of Lean expressions basically splits up. They get sent to the kernel on one hand and to the compiler on the other hand. Now I'm saying the compiler currently there are in fact two compilers, there is the rewrite which is purely in Lean and can be found here: https://github.com/leanprover/lean4/tree/master/src/Lean/Compiler/LCNF sadly it is not yet done so not yet active. THe currently active one is split into two parts the LambdaRC part that does our reference counting optimizations written in pure Lean: https://github.com/leanprover/lean4/tree/master/src/Lean/Compiler/IR and the part that does "regular" optimizations like code simplification etc. that one is here: https://github.com/leanprover/lean4/tree/master/src/library/compiler and written in C++ (the LCNF directory is currently a rewrite of the C++ but better, the next step would be to translate it to the LambdaRC IR so we can throw the C++ out but we've been stuck on that for a while. However I do have hopes that we might start making progress on this in April again). Anyways once we have been in the compiler the finally optimized IR can by now actually go one of 3 ways:

    • to our interpreter which I have to admit I forgot the path for but it is also written in C++, not too much optimized and we want to make it better in the future too. The interpreter is what we for example use for #eval
    • to the C code generator here: https://github.com/leanprover/lean4/blob/master/src/Lean/Compiler/IR/EmitC.lean which, as you might guess makes it spit out C. You can view that C in build/ir/*.c if you are using lake to compile your thing. The C then gets compiled by a slightly custom configured clang called leanc into a binary whihc is what lake build is essentially a frontend for in bigger projects
    • to the LLVM code generator here: https://github.com/leanprover/lean4/blob/master/src/Lean/Compiler/IR/EmitLLVM.lean whihc, as you might guess makes it spit out LLVM! It is however not enabled per default because we are still working on this as well, it does however pass all tests (of some bootstrapping stages at least..)
  4. When you #eval it is just an interpreter so it just has to use the generated IR (Lean autoomatically generates optimized IR for all definitions, a thing we might not want to keep enabled forever as you can see in #lean4 dev) and runs that through the interpreter. When you use lean just normally it does interpretation as well, if you call it the way lake does (you can see what it does with lake build --verbose) it will make C stuff which gets linked together by lake.

There is afaik barely any documentation about all of this /o\ @Thomas Murrills

If you have further questions about the compiler I can talk about that a lot, for the other components you were interested in someone else will have to explain^^

Thomas Murrills (Mar 29 2023 at 23:05):

Wow, thank you very much for this answer @Henrik Böving! :) I might take you up on that offer re: the compiler once I read through & digest it!

Kevin Buzzard (Mar 29 2023 at 23:07):

I have a completely dumb question about the compiler: why rewrite in lean, what are the advantages, and won't it be slower than the C++ one?

Wojciech Nawrocki (Mar 29 2023 at 23:10):

As far as the metaprogramming book goes, AFAIK the phrasing

Expressions (terms of type Expr) carry the data used to communicate with the Lean kernel for core tasks such as type inference and definitional equality checks.

is a bit unfortunate because the kernel doesn't do type inference, rather the elaborator does it. The kernel is just there to check that terms are type-correct. But someone should definitely correct me if I am misunderstanding.

Henrik Böving (Mar 29 2023 at 23:15):

As for the initial "why" for Lean in particular thats something Sebastian or Leo will have to explain.

But in general using higher level programming languages usually makes things easier for the developer since they dont have to keep so much stuff in mind. For example in C++ you have to think about memory, about having your data manipulated from other functions etc. So its just a nicer experience to develop Lean instead, especially if you like functional programming which is...well let's say barely possible in a nice way with C++.

As for the speed aspect, Lean does a lot of things to get very close to C++ level performance:

  • Lean code actually gets translated to C and then compiled with the same compiler as C++. However it is of course not as good as handwritten C but!
  • Leans runtime, that is, the thing that manages stuff like memory for you, as well as the code generator have quite a few tricks up their sleeve to get you very fast again. In the end its of course unlikely Lean code can keep up with handwritten and optimized C++ code but we can get quite a bit of the way there.

Henrik Böving (Mar 29 2023 at 23:20):

Oh and if you were referring only to the code generator part instead of the whole system (both of those are often called compiler) we wanted to do that one in Lean because we have a lot of feature ideas that we wanted to do, we saw quite a few issues with the old implementation and since it is not a critical part and Lean is more fun than C++ we decided to do it in Lean basically^^

Scott Morrison (Mar 29 2023 at 23:58):

Another reason to rewrite the compiler in Lean is just the usual for PL development: "to eat one's own dogfood".

Arthur Paulino (Mar 30 2023 at 12:27):

Kevin Buzzard said:

I have a completely dumb question about the compiler: why rewrite in lean, what are the advantages, and won't it be slower than the C++ one?

Here's another reason for that: it allows more people to build meta tools in Lean, about Lean code, more easily. For example, the entire yatima stack, https://github.com/lurk-lab/yatima, which allows us to

  1. content-address any Lean 4 source
  2. build an alternative kernel for Lean 4 that consumes the content-addressed data
  3. compile any Lean 4 source to a language called Lurk

So we're compiling our alternative kernel to Lurk, which will allow us to create and verify zero-knowledge proofs of Lean 4 typechecking

Jannis Limperg (Mar 30 2023 at 12:54):

Wojciech Nawrocki said:

As far as the metaprogramming book goes, AFAIK the phrasing

Expressions (terms of type Expr) carry the data used to communicate with the Lean kernel for core tasks such as type inference and definitional equality checks.

is a bit unfortunate because the kernel doesn't do type inference, rather the elaborator does it. The kernel is just there to check that terms are type-correct. But someone should definitely correct me if I am misunderstanding.

Good point, I'll fix this.

Notification Bot (Mar 30 2023 at 12:57):

14 messages were moved from this topic to #lean4 > Status of precompilation by Sebastian Ullrich.

Mario Carneiro (Mar 30 2023 at 16:42):

Arthur Paulino said:

Kevin Buzzard said:

I have a completely dumb question about the compiler: why rewrite in lean, what are the advantages, and won't it be slower than the C++ one?

Here's another reason for that: it allows more people to build meta tools in Lean, about Lean code, more easily.

I have to second this point. Even if the lean elaborator is slower, it is more configurable at every level, and this is absolutely a deliverable of the language itself. You would be very hard-pressed to rewrite the lean 4 elaborator in C++ while still keeping all user behaviors the same, where user behaviors includes being able to call / adapt elaborator-internal functions.


Last updated: Dec 20 2023 at 11:08 UTC