## Stream: general

### Topic: Lean 4 at ETH Zurich Compiler Social - Wed 12. Dec

#### Tobias Grosser (Nov 28 2018 at 12:48):

Dear all,

I am very pleased to announce that on WEDNESDAY 12th December, we will have on our pre-christmas compiler social as special guest Sebastian Ulrich who presents:

"Towards Lean 4: An Optimized Object Model for an Interactive Theorem Prover"

Lean 4, the next version of the Lean theorem prover, will move most of its frontend code from C++ to Lean itself. To ensure that the resulting code is reasonably efficient, Lean 4 will feature a new code generation backend together with a new object and memory management model. In this talk, I will discuss the general and ITP-specific constraints, such as performance, language interoperability, and startup time, that led us to this model and how we are planning to solve them with it.

Sebastian Ullrich is a second-year PhD student at Gregor Snelting's programming paradigms group at the Karlsruhe Institute of Technology (KIT). He is working on the Lean theorem prover together with Leonardo de Moura (Microsoft Research). Sebastian's research interests are program verification, the design of interactive theorem proving frontends, and macro expansion.

https://www.meetup.com/llvm-compiler-and-code-generation-socials-zurich/events/256742083/

#### Andrew Ashworth (Nov 28 2018 at 12:54):

It would be awesome if this was recorded for us poor plebians who cannot fly to Europe :)

#### Tobias Grosser (Nov 28 2018 at 12:59):

It will (and published if @Sebastian Ullrich agrees ;-))

Sure thing :)

#### Tobias Grosser (Dec 09 2018 at 20:19):

This is happening this Wednesday! We already have 24 people signed up! Whoever is close by is very much invited.

#### Tobias Grosser (Dec 12 2018 at 06:39):

This is happening tonight.

#### Kenny Lau (Dec 12 2018 at 08:47):

is there livestreaming?

No.

#### Tobias Grosser (Dec 12 2018 at 09:16):

But there will be a recording.

#### Edward Ayers (Dec 14 2018 at 13:51):

Any news on when/where the recording is released?

#### Tobias Grosser (Dec 14 2018 at 20:49):

Beginning next week. Still need to recode it.

#### Tobias Grosser (Dec 18 2018 at 20:41):

The upload is fresh so it takes another 30min for the HD version to become public.

#### Tobias Grosser (Dec 18 2018 at 20:42):

Thanks @Sebastian Ullrich for this very interesting talk.

#### Patrick Massot (Dec 18 2018 at 21:56):

I think I understood it very superficially, but it was still interesting and fun to watch.

#### Kevin Buzzard (Dec 19 2018 at 07:55):

My 16 year old son explained parts of it to me. Apparently malloc is expensive

#### Kevin Buzzard (Dec 19 2018 at 09:19):

I deduce from this that you're probably not paying the electricity bills

#### Scott Morrison (Dec 19 2018 at 22:49):

At 44:28, someone in the audiences says "But it can't be after 2019, because [indecipherable]" --- does anyone know that that was?

#### Wojciech Nawrocki (Dec 19 2018 at 23:33):

@Sebastian Ullrich regarding the region mmaping, I presume if you have ASLR on and thus the heap base is randomized, you would indeed get a very low probability of collision, but with systems that always place the heap at the same address (ASLR off in Linux, etc), you'd get a collision every time? Or you could randomize the addresses by manually mmaping every new region in the first place, using a well-distributed hash function as a postfix for the region address :D But that would introduce severe memory fragmentation and would probably be a nightmare for cross-platform support.

#### Patrick Massot (Dec 20 2018 at 00:35):

At 44:28, someone in the audiences says "But it can't be after 2019, because [indecipherable]" --- does anyone know that that was?

Because the time line at the beginning says Lean 4 appears in 201X. That's the same remark as always

#### Keeley Hoek (Dec 20 2018 at 07:44):

What do you mean by "severe memory fragmentation"?

#### Wojciech Nawrocki (Dec 20 2018 at 14:54):

@Keeley Hoek if they want to have their regions uniformly distributed across the address space so that the probability of collision is low when directly loading them into memory, they will naturally be all over the place, as opposed to the usual behaviour of the heap which just expands by adding more pages into a contiguous region

#### Sebastian Ullrich (Dec 20 2018 at 15:24):

@Wojciech Nawrocki Ah, I hadn't thought of page table fragmentation before. Not sure if it could become an issue. Anyway, we've thought about collecting all object files of a package into a single big object file. Then you have less fragmentation, even fewer collisions, and more sharing of object graph parts as an added bonus.

#### Wojciech Nawrocki (Dec 20 2018 at 15:50):

@Sebastian Ullrich Emacs does something kinda similar, so if you're in need of a good horror story, here's theirs :D

#### Sebastian Ullrich (Dec 20 2018 at 16:09):

@Wojciech Nawrocki Nice. The "portable dumper" proposal indeed sounds quite similar.

#### Keeley Hoek (Dec 21 2018 at 04:59):

I see. Though, so long as the regions are quite a bit larger than 4Kib, shouldn't there be essentially no performance penalty?

#### Tobias Grosser (Dec 21 2018 at 09:56):

There is a risk of increased TLB misses, I guess.

#### Mario Carneiro (Dec 22 2018 at 06:07):

@Sebastian Ullrich 38:27

...and some flags like, does it have a metavariable in there, does it have a parameter. So this is just bookkeeping metadata that we use for optimizations. But the nice thing is that the lean type doesn't have to be this ugly. The lean type just says the const constructor just takes a name and a list of universes, and because all the other data is at the end of the object we can just hide it from lean. So if the compiler tries to access any of these fields it will still use the correct offset.

I was metaphorically shouting "don't do that" at my monitor when I heard this. This is how you get irreplicable "lean magic" that the compiler does and you (the lean user) can't. I won't insist on changing the way you handle this particular case, but I would much rather you provide the means for users to do the same thing as you are doing in the compiler. In this case, if you wanted to actually write this in lean, the expr type would actually be the full ugly thing with all these fields, but there would be invariants built in (before or after the initial construction) asserting that these metadata fields are correct (i.e. actually track properties they claim to). You write some nice front end constructors like expr.const that automatically populate the metadata fields with the only values they are allowed to have, and a front end recursor that doesn't give you the metadata when you don't care. Then you hook the front end recursor into the equation compiler so that pattern matching is unaffected. (This last part can't currently be done in lean, but this is a nice use case.)

Working in C++ means that you have the ability to "cheat" and hide things from lean. Avoid the temptation, and lean will be better.

#### Sebastian Ullrich (Dec 25 2018 at 15:52):

@Mario Carneiro These optimizations are completely local to expr.cpp, there isn't a single line in the whole C++ codebase that "cheats" around it. You get exactly the same interface from Lean as you get from C++. The fields really are cheap enough to compute in the expr constructors, and for accessing them we should have primitives just like in Lean 3.

#### Mario Carneiro (Dec 25 2018 at 16:51):

I just want lean to get similar kinds of support for being able to define a complicated inner representation and shielding it from outside users as you can do in C++

#### Mario Carneiro (Dec 25 2018 at 17:34):

I'm not really opposed to this practice, besides the fact that this makes it necessary to make expr meta, but it's one place to try dogfooding. I'm sure you will have plenty of other examples in lean 4 that will require you to address this though

#### Kenny Lau (Dec 26 2018 at 06:03):

@Mario Carneiro and how does this work with second incompleteness?

#### Mario Carneiro (Dec 26 2018 at 06:56):

huh?

Last updated: May 14 2021 at 06:16 UTC