Zulip Chat Archive

Stream: lean4

Topic: Performance aware coding in Lean


Ioannis Konstantoulas (Jul 02 2023 at 13:32):

Hello all; I have some questions regarding performance-aware coding in Lean.

1) I heard (or misunderstood) in the Lean Discord that recursive calls to
functions push data to the heap for various reasons. Are there ways (e.g.
tags/attributes) to instruct the compiler to treat a specific tail-recursive
function defined on concrete types (not polymorphically) specially by pushing to
the stack for performance? Alternatively, am I misunderstanding the whole thing?

2) Sometimes I want to load static data (e.g. a large table of precomputed
floats, but still one that fits in a decent cache) at compile time whose size
will never change. Is an Array Float still the best data structure, or can we
do better in terms of performance by telling the compiler that this data is
totally unchangeable?

3) Are there any internal methods in Lean to measure and report the time of an
expression evaluation or a program? Additionally, is there a way to do timeouts
dependent on clock duration?

4) Is there any way to write inline assembly (after perhaps declaring an
architecture and platform), or another way to call specialized CPU instructions?
What about GPU coding?

Context: this is all for curiosity's sake. I have not read how Lean transpiles
to C code and how that is subsequently compiled. I am using the Lean 4 v4.0 release
on Linux 6.3.9 on an Intel i7 machine.

Henrik Böving (Jul 02 2023 at 13:41):

  1. You misunderstood. While the Lean compiler itself does not guarantee tail-call elimination right now (it hopefully will in the future), it does emit C code in the end, and modern C compilers are fully capable of TCO, so you will get TCO almost all the time if your code is written properly (see the first sketch after this list). Where did you hear about this heap claim?
  2. The Lean compiler does understand that if you write def x : Array Float := ..., then x is a constant, and it will attempt to treat it as such. I don't think we put it into the .data section right now, though. Instead, the Array object is created for you at program initialization time, and from then on you can just use it. Generally speaking, the compiler tries to do this with all closed terms it can find (see the second sketch after this list).
  3. What kind of expression evaluation do you mean here? You can benchmark compiled programs like any others with perf plus a perf data viewer like hotspot. I don't think we have instrumented the bytecode VM behind #eval; it is also not particularly optimized, so if you want actually fast code you basically always want to use the compiled version. Lastly there is #reduce; using that for anything performance-related will usually not end well either.
  4. No, but the LLVM backend might enable that in the future
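
A minimal sketch of the tail-call point in 1 (the function name is illustrative, not from this thread):

```lean
-- The recursive call is the last thing `sumTo` does (tail position), so the
-- C code Lean emits can be compiled into a loop by the C compiler.
def sumTo (n : Nat) (acc : Nat := 0) : Nat :=
  match n with
  | 0     => acc
  | m + 1 => sumTo m (acc + m + 1)
```

And a sketch of the closed-term point in 2, assuming a hypothetical precomputed table:

```lean
-- Hypothetical static table: the right-hand side is a closed term, so the
-- compiled program builds the Array once at initialization and reuses it.
def sinTable : Array Float :=
  (Array.range 256).map fun i =>
    Float.sin (Float.ofNat i * 3.141592653589793 / 128.0)

-- Lookups just read from the already-initialized array; nothing is rebuilt.
def sinLookup (i : Nat) : Float :=
  sinTable[i % 256]!
```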

Further remarks: Using a Lean nightly is what almost everyone here does; it gets you the actual latest stuff and gives you access to the current ecosystem, which is almost exclusively developed against nightlies. Furthermore, for performance-aware coding in Lean, the most important thing you can probably do is use our efficient data structures like Array and HashMap in a linear way, i.e. keep a single live reference so updates can happen in place.
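
A minimal sketch of what "linear" means here (the function is illustrative):

```lean
-- Linear use of Array: `acc` is the only live reference when `push` runs,
-- so each push can grow the array in place instead of copying it.
def squaresUpTo (n : Nat) : Array Nat := Id.run do
  let mut acc : Array Nat := Array.mkEmpty n
  for i in [0:n] do
    acc := acc.push (i * i)
  return acc
```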

Ioannis Konstantoulas (Jul 02 2023 at 14:01):

Thank you for the reply!

Regarding 1), it was an offhand remark in a functional programming Discord server. The commenter said something about data in polymorphic functions not being representable on the stack, or something like that. I had no doubts about TCO; it was more about controlling where data is placed in recursive functions.

On 2), when I def x that way, I can still grow the Array, right? So the allocation still needs to happen on the heap?

On 3), for instance I'd like to run a (compiled) program and have it output the time elapsed for one of its actions or effects.

Regarding optimized data structures, I know Array and HashMap are the fundamental ones. Are there any others that I should be aware of?

Ioannis Konstantoulas (Jul 02 2023 at 14:02):

Regarding nightly, I am not doing anything so advanced that I need the latest features! I am still learning to swim in the shallows :blush:

Tomas Skrivan (Jul 02 2023 at 14:33):

One catch with Array Float is that it stores boxed floats, so it is very cache-unfriendly. Use FloatArray, which stores all the floats in one contiguous block of memory.

Both Array and FloatArray support setting the capacity, i.e. specifying the expected size of the array up front so you minimize reallocation.
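
A minimal sketch, assuming FloatArray.mkEmpty is the capacity-reserving constructor (the other names are illustrative):

```lean
-- FloatArray stores unboxed floats in one contiguous block. Reserving the
-- capacity up front means the pushes below should not trigger reallocation.
def samples : FloatArray := Id.run do
  let mut xs := FloatArray.mkEmpty 1024  -- capacity hint (assumed API)
  for i in [0:1024] do
    xs := xs.push (Float.ofNat i / 1024.0)
  return xs
```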

Kyle Miller (Jul 02 2023 at 14:35):

Ioannis Konstantoulas said:

Regarding nightly, I am not doing anything so advanced that I need the latest features!

You probably still want to be on the latest version, since there are improvements and bug fixes too! 4.0.0 has had a few releases, but even the newest is about a year old, and Lean 4 has only been public for about 2.5 years.

Kyle Miller (Jul 02 2023 at 14:37):

In addition to what @Tomas Skrivan said, regarding 2: the array will be copied the very first time you push anything onto it, even if you set the capacity. This is because it is a global definition, which needs to retain the original array. After that, though, if you keep growing that (copied) array it can do so in place.
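
A sketch of that behaviour (the names are illustrative):

```lean
-- `table` is a global definition, so its original value must be retained.
def table : Array Nat := Array.mkEmpty 16

def extended : Array Nat := Id.run do
  let mut xs := table   -- still shares the global array
  xs := xs.push 0       -- first push: copies, because the global keeps the original
  xs := xs.push 1       -- later pushes grow the (now unshared) copy in place
  return xs
```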

Ioannis Konstantoulas (Jul 02 2023 at 15:46):

Oh, I see, thank you for the information about nightly!

James Gallicchio (Jul 02 2023 at 16:39):

RE: timing, if you are within the IO monad you can use IO.monoMsNow to get a monotonically increasing clock value in milliseconds; the difference between two readings gives you the elapsed wall-clock time.
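
A minimal sketch (timeIt and busyWork are made-up names, not a standard API):

```lean
-- Run an IO action and report the elapsed wall-clock milliseconds, measured
-- with the monotonic clock.
def timeIt {α : Type} (label : String) (act : IO α) : IO α := do
  let start ← IO.monoMsNow
  let result ← act
  let stop ← IO.monoMsNow
  IO.println s!"{label}: {stop - start} ms"
  return result

-- Some work to measure.
def busyWork : IO Nat := do
  let mut s : Nat := 0
  for i in [0:1000000] do
    s := s + i
  return s

def main : IO Unit := do
  let total ← timeIt "sum" busyWork
  IO.println s!"total = {total}"
```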


Last updated: Dec 20 2023 at 11:08 UTC