Zulip Chat Archive

Stream: lean4

Topic: Two questions about building Lean with compiler optimization


James E Hanson (Jan 15 2026 at 21:12):

I've been playing around with building Lean with some compiler optimization settings. With the most aggressive settings I could get to work, I was able to reduce the build time of Mathlib on my computer by 13-14% (from about 59 minutes to about 51 minutes).

  1. Is there any particular reason why I shouldn't be using such a build? I know that aggressive compiler optimization can change program behavior and make crashes harder to debug, but is that much of an issue here?
  2. How would I go about building Lean for profiling and profile-guided optimization with LLVM? I've tried to set this up a couple times now and haven't been able to get it to work.

Henrik Böving (Jan 15 2026 at 21:48):

I'd say the answer to the first question depends on what kind of optimizations you are building it with (care to share maybe?). It might of course also make it more difficult to help you with whatever issues you might experience.

We don't have a PGO setup at the moment. It would quite likely also be very annoying to maintain the long term future. The kind of C code that we generate does fluctuate quite a bit due to changes we make to the compiler and we would have to keep mathlib and potentially other repositories profiled etc. with these builds which would be quite a lot of effort for us to maintain for, often, not too much benefit. Check out for exampe what the rust people have done https://blog.rust-lang.org/inside-rust/2020/11/11/exploring-pgo-for-the-rust-compiler/#final-numbers-and-a-benchmarking-plot-twist

James E Hanson (Jan 15 2026 at 22:19):

Henrik Böving said:

(care to share maybe?)

Sure, although it might be a bit embarrassing because I really don't know what I'm doing. This was the product of talking to free-tier ChatGPT about what compiler flags can increase runtime performance if you literally don't care about anything else (e.g., compile time, binary size, runtime safety, ease of debugging).

{
    "name": "optimized",
    "displayName": "Increase runtime speed",
    "cacheVariables": {
        "CMAKE_BUILD_TYPE": "Release",
        "CMAKE_C_COMPILER": "clang-20",
        "CMAKE_CXX_COMPILER": "clang++-20",
        "CMAKE_INTERPROCEDURAL_OPTIMIZATION": "ON",
        "LEAN_USE_MIMALLOC": "ON",
        "CMAKE_C_FLAGS": "-O3 -ffast-math -funroll-loops -falign-functions=32 -falign-loops=32 -march=native -flto -fno-plt -fno-stack-protector -D_FORTIFY_SOURCE=0 -DMI_SECURE=0 -DMI_PADDING=0 -DNDEBUG",
        "CMAKE_CXX_FLAGS": "-O3 -ffast-math -funroll-loops -falign-functions=32 -falign-loops=32 -march=native -flto -fno-plt -fno-stack-protector -D_FORTIFY_SOURCE=0 -DMI_SECURE=0 -DMI_PADDING=0 -DNDEBUG",
        "CMAKE_EXE_LINKER_FLAGS": "-fuse-ld=lld -Wl,-O3,--as-needed,--gc-sections"
    },
    "generator": "Unix Makefiles",
    "binaryDir": "${sourceDir}/build/optimized"
}

If I'm understanding correctly though, "LEAN_USE_MIMALLOC": "ON", at least is redundant, and I'm not even sure if -ffast-math does anything from the point of view of Mathlib as a benchmark (because why would there be any floating-point calculations there?). I'm sure some or most of this is stupid from a cost/benefit perspective.

Henrik Böving (Jan 15 2026 at 22:27):

You will probably be able to reap most benefits from just

-march=native -flto -fno-plt -fno-stack-protector

the remainder is mostly relevant for numeric work loads. We can't (or don't want to) use a few of these but they are certainly on the list at some point.

James E Hanson (Jan 15 2026 at 22:34):

Does that mean that the generic compiler optimization flags like -O2 and -O3 aren't doing much or anything?

Henrik Böving (Jan 15 2026 at 22:42):

Not at all, they are doing a ton of heavy lifting and I would expect -O0 to -O3 is a much larger change than what you are observing here but these flags are different:

  • march=native tells the compiler to emit a binary specifically tuned for your CPU, we can't do that on a build that we release to the general public. We could play with weaker versions of this though.
  • LTO is link time optimization and allows the compiler to do a more global optimization approach after (or rather while) linking the binary from all the object files but this comes at a significant compile time cost
  • disabling the stack protector is giving up on a security feature that is pretty standard to have
  • no PLT might actually be something we can just do

James E Hanson (Jan 15 2026 at 22:48):

Henrik Böving said:

We don't have a PGO setup at the moment.

Would it be possible for me to do this by hand, rather than by just modifying the make configuration?

James E Hanson (Jan 15 2026 at 22:53):

I guess what I should ask is how feasible would it be to do it by hand?

Henrik Böving (Jan 15 2026 at 22:58):

I don't understand what you mean with doing it by hand

James E Hanson (Jan 15 2026 at 23:04):

I guess what I'm envisioning is something like compiling the stage1 Lean build to C and then using Clang's PGO from there.

Henrik Böving (Jan 15 2026 at 23:05):

Well yes but why would that not involve touching the makefile? :D Everythign else seems horrible

James E Hanson (Jan 15 2026 at 23:06):

I guess what I meant by 'modifying the makefile' there was modifying it so that it just produces the Lean/Lake binaries with profiling directly.

James E Hanson (Jan 15 2026 at 23:07):

When you say you don't have PGO set up at the moment, do you mean that the Lean compiler is not able to do PGO, or do you mean that the compilation of Lean itself isn't configured for PGO?

James E Hanson (Jan 15 2026 at 23:08):

James E Hanson said:

I guess what I'm envisioning is something like compiling the stage1 Lean build to C and then using Clang's PGO from there.

Also, how difficult would this be exactly?

Henrik Böving (Jan 15 2026 at 23:10):

The component of Lean that produce the C has absolutely no means to do PGO and won't have for a very long time. But you can just do PGO on the C code and with what clang is doing of course. That would require touching the makefile to give clang the profile information at the appropriate spots (and of course collecting and aggregating relevant profile data to begin with which probably takes hours).

James E Hanson (Jan 15 2026 at 23:16):

Maybe I should have picked this up by now, but does compiling Lean code always go through a C compilation?

Henrik Böving (Jan 15 2026 at 23:17):

It can go through just LLVM without C in theory but yes Lean compiles to C.

Henrik Böving (Jan 15 2026 at 23:18):

James E Hanson said:

Henrik Böving said:

(care to share maybe?)

Sure, although it might be a bit embarrassing because I really don't know what I'm doing. This was the product of talking to free-tier ChatGPT about what compiler flags can increase runtime performance if you literally don't care about anything else (e.g., compile time, binary size, runtime safety, ease of debugging).

{
    "name": "optimized",
    "displayName": "Increase runtime speed",
    "cacheVariables": {
        "CMAKE_BUILD_TYPE": "Release",
        "CMAKE_C_COMPILER": "clang-20",
        "CMAKE_CXX_COMPILER": "clang++-20",
        "CMAKE_INTERPROCEDURAL_OPTIMIZATION": "ON",
        "LEAN_USE_MIMALLOC": "ON",
        "CMAKE_C_FLAGS": "-O3 -ffast-math -funroll-loops -falign-functions=32 -falign-loops=32 -march=native -flto -fno-plt -fno-stack-protector -D_FORTIFY_SOURCE=0 -DMI_SECURE=0 -DMI_PADDING=0 -DNDEBUG",
        "CMAKE_CXX_FLAGS": "-O3 -ffast-math -funroll-loops -falign-functions=32 -falign-loops=32 -march=native -flto -fno-plt -fno-stack-protector -D_FORTIFY_SOURCE=0 -DMI_SECURE=0 -DMI_PADDING=0 -DNDEBUG",
        "CMAKE_EXE_LINKER_FLAGS": "-fuse-ld=lld -Wl,-O3,--as-needed,--gc-sections"
    },
    "generator": "Unix Makefiles",
    "binaryDir": "${sourceDir}/build/optimized"
}

If I'm understanding correctly though, "LEAN_USE_MIMALLOC": "ON", at least is redundant, and I'm not even sure if -ffast-math does anything from the point of view of Mathlib as a benchmark (because why would there be any floating-point calculations there?). I'm sure some or most of this is stupid from a cost/benefit perspective.

All of the arguments you're playing with up here poke the flags used for the compilation of the C/C++


Last updated: Feb 28 2026 at 14:05 UTC