Zulip Chat Archive

Stream: lean4

Topic: Lean 4 Toolchain much bigger than lean3


MohanadAhmed (Jul 27 2023 at 14:24):

In the elan/toolchains folder if you ask for the sizes of each toolchain you get something similar to the following:

Mohanad@LAPTOP-OTL5MR4E MINGW64 ~/.elan/toolchains
$ du -h --max-depth=1 | sort -n
5.3G    .
26M     ./leanprover-community--lean---3.43.0
26M     ./leanprover-community--lean---3.50.0
26M     ./leanprover-community--lean---3.51.0
26M     ./leanprover-community--lean---3.51.1
739M    ./stable
906M    ./leanprover--lean4---nightly-2023-06-20
907M    ./leanprover--lean4---nightly
907M    ./leanprover--lean4---nightly-2023-07-12
907M    ./nightly
908M    ./leanprover--lean4---nightly-2023-07-15

All the lean 3 toolchains are very small about 26M compared to the above 800MB sizes of the lean4 toolchains. I find this surprising. Looking into the folders the bulk of the size is coming from lib\lean\Lean. What is Lean4 doing that requires it to be very big (and according to my subjective feeling slower)?

I have a vague idea that "lean is now extensible in the lean language and most of its components are compiled from lean into .oleans that are packed with the toolchain. The main in these "written in lean components" being the subfolders of lib\lean\Lean:

 Compiler | Data | Elab | Linter | Meta | Parser | ParserCompiler | PrettyPrinter | Server | Util | Widget

I am trying to understand, so I have no clear questions but feel free to chime in with any thoughts. Perhaps a few "vague questions":

  1. where can I read about the change in architecture (or only the new architecture)
  2. what makes things compiled into oleans bigger in size than whatever lean 3 was doing to compile them ? Is there anywhere I can read about this?

Floris van Doorn (Jul 27 2023 at 14:51):

This thread contains some info why Lean 4's oleans are bigger, and how we can reduce some of that size: https://leanprover.zulipchat.com/#narrow/stream/270676-lean4/topic/olean.20dump.20tool

Henrik Böving (Jul 27 2023 at 17:31):

Its not only the Oleans, large chunks of this amount are caused by shared objects such as LLVM that we ship

Scott Morrison (Jul 27 2023 at 23:37):

MohanadAhmed said:

and according to my subjective feeling slower

This subjective feeling is mostly wrong. We've stopped running the comparison bot against mathlib3, but there was overall about a 1.8x wallclock speedup compiling mathlib4 vs mathlib3 (and a 3.3x speedup in CPU time).

Scott Morrison (Jul 27 2023 at 23:38):

(and that's before people started seriously refactoring the slow files that were ported at the very end)

Eric Wieser (Jul 27 2023 at 23:38):

Presumably the larger oleans mean that Lean is more I/O bound than it used to be?

Eric Wieser (Jul 27 2023 at 23:39):

And so on systems with slow hard drives, Lean4 could be slower than Lean 3

Mac Malone (Jul 28 2023 at 00:28):

@Eric Wieser while this may be a factor, Lean does memory map oleans and share them across executions, so that should significantly decrease the amount of drive I/O that occurs.

Mario Carneiro (Jul 28 2023 at 00:47):

not compared to lean 3 because lean 3 didn't use interprocess memory in the first place

Mario Carneiro (Jul 28 2023 at 00:47):

lean 3 was its own build system so everything was running in the same process


Last updated: Dec 20 2023 at 11:08 UTC