Zulip Chat Archive

Stream: general

Topic: Reducing executable size


James Sully (Aug 03 2024 at 10:50):

I'm writing a program called sand (https://github.com/sullyj3/sand), a systemd service for countdown timers.
The executable is colossal, it weighs 89mb. At first I thought this was just the current state of lean 4, but I just noticed that lake is only 13mb. What could cause such a blow-up? How should I go about diagnosing it?

Henrik Böving (Aug 03 2024 at 14:23):

The first thing that you can always do is to just call strip on the binary which will get rid of a whole bunch of stuff. Apart from that you'll mostly have to analyze what exactly is going on in your binary

Note that lake is of course not 13 MB. you most likely just found the wrappers around the binaries int he elan folder:

total 88396
drwxr-xr-x. 1 nix nix       86 Apr  4 19:50 .
drwxr-xr-x. 1 nix nix       90 Jan  9  2024 ..
-rwxr-xr-x. 7 nix nix 12927160 Apr  4 19:49 elan
-rwxr-xr-x. 7 nix nix 12927160 Apr  4 19:49 lake
-rwxr-xr-x. 7 nix nix 12927160 Apr  4 19:49 lean
-rwxr-xr-x. 7 nix nix 12927160 Apr  4 19:49 leanc
-rwxr-xr-x. 7 nix nix 12927160 Apr  4 19:49 leanchecker
-rwxr-xr-x. 7 nix nix 12927160 Apr  4 19:49 leanmake
-rwxr-xr-x. 7 nix nix 12927160 Apr  4 19:49 leanpkg

note that they all have the same size.

The real lake binary that you'll find is much smaller:

λ du -sh lake
8.1M    lake

but also links against Lean through shared objects:

λ ldd lake
    linux-vdso.so.1 (0x00007f932fe26000)
    libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f932fa00000)
    libInit_shared.so => /home/nix/Desktop/formal_verification/lean/lean4/build/release/stage1/bin/./../lib/lean/libInit_shared.so (0x00007f932fdff000)
    libleanshared.so => /home/nix/Desktop/formal_verification/lean/lean4/build/release/stage1/bin/./../lib/lean/libleanshared.so (0x00007f932a000000)
    libgmp.so.10 => /lib64/libgmp.so.10 (0x00007f932fd5b000)
    libm.so.6 => /lib64/libm.so.6 (0x00007f932fc77000)
    libc.so.6 => /lib64/libc.so.6 (0x00007f9329e0f000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f932fe28000)
    libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f932f9d2000)

so a static variant would be much chonkier as well.

James Sully (Aug 03 2024 at 15:48):

Yeah, the 89mb is actually after stripping, before is 91

James Sully (Aug 03 2024 at 15:52):

It looks like my exe doesn't dynamically link to lean

 ldd ./.lake/build/bin/sand
        linux-vdso.so.1 (0x000072f269caa000)
        libm.so.6 => /usr/lib/libm.so.6 (0x000072f265315000)
        libdl.so.2 => /usr/lib/libdl.so.2 (0x000072f269c66000)
        libpthread.so.0 => /usr/lib/libpthread.so.0 (0x000072f269c61000)
        libc.so.6 => /usr/lib/libc.so.6 (0x000072f265129000)
        librt.so.1 => /usr/lib/librt.so.1 (0x000072f269c5c000)
        /lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x000072f269cac000)

Could that make up the difference?

James Sully (Aug 03 2024 at 15:59):

Oh I think batteries is the culprit

James Sully (Aug 03 2024 at 16:00):

I might have to ditch it, I'm only using hashmap

Mario Carneiro (Aug 03 2024 at 16:03):

This seems very suspicious to me, batteries is very small compared to lean core

James Sully (Aug 03 2024 at 16:08):

I'm basing that on the fact that when I build this commit, the executable is 84mb, but when I remove import Batteries from Main.lean it drops to 5.8mb

James Sully (Aug 03 2024 at 16:10):

it does seem surprising that batteries could be that big. I assume something must be going wrong

Sebastian Ullrich (Aug 03 2024 at 16:22):

You're probably including a big chunk of the Lean library

James Sully (Aug 03 2024 at 16:26):

Sebastian Ullrich said:

You're probably including a big chunk of the Lean library

Do you mean transitively via batteries?

Mario Carneiro (Aug 03 2024 at 16:27):

batteries uses minimal imports (of lean), but perhaps you should be as well instead of doing import Batteries

James Sully (Aug 03 2024 at 16:29):

Ah, ok, I guess I should just import Batteries.Data.HashMap.Basic

James Sully (Aug 03 2024 at 16:30):

I guess I just blithely assumed I'd only end up with the symbols I used from batteries in the executable

James Sully (Aug 03 2024 at 16:34):

hm, that doesn't seem to fix it.

Henrik Böving (Aug 03 2024 at 16:35):

If you are on a nightly or wait a couple of days you'll also be able to use Lean's built-in Std.HashMap and drop your batteries dependency (if that's all you need from batteries)

James Sully (Aug 03 2024 at 16:36):

It is, yeah.

James Sully (Aug 03 2024 at 16:36):

I look forward to that, thanks for the heads up

James Sully (Aug 03 2024 at 16:39):

I made a minimal repo illustrating the blow up if that's helpful. I'm still not totally clear on whether this is expected (even y'know, for now)
https://github.com/sullyj3/big_bin_repro_repo

James Sully (Aug 04 2024 at 13:45):

I've been on a quest to eliminate import Lean from my transitive dependencies, but I've just realised

Socket.lean -> Alloy.C -> Lean

The binaries from socket and alloy exhibit this large file size.

I assume I'm out of luck on this front until module importing works more parsimoniously, so I'm going to put this project on hold for now

Eric Wieser (Aug 04 2024 at 14:22):

Isn't this fixable by reducing the imports in Alloy.C?

James Sully (Aug 04 2024 at 14:42):

I'm not sure. I would assume it's already importing only what's needed. @Mac Malone ?

Eric Wieser (Aug 04 2024 at 15:33):

Looking at the most recent source, Alloy.C does not import Lean

James Sully (Aug 04 2024 at 15:41):

Right, I'm not talking about the entirety of Lean

If you lake build in lean4-alloy/examples/my_add, the binary is 83mb. I can't see how that can be anything other than the imports of Alloy.C, namely various submodules of Lean

As far as I can tell most imports of Lean.<Anything> pull in just a whole bunch of stuff

James Sully (Aug 04 2024 at 15:46):

perhaps I should have said something like "I've been on a quest to eliminate import Lean.*" to be clearer

James Sully (Aug 04 2024 at 16:01):

The vague hazy impression I have in my head right now is:

  • currently when importing a module, lean includes everything in the module, regardless of whether it's used, as well as the full contents of all modules it transitively imports. not much in the way of dead code elimination is done.
  • because of this, importing can quickly balloon executable sizes, even when doing few imports from your perspective.
  • this is due to the early state of lean4
  • this will presumably be addressed at some point in the future, but it's not a priority right now

do I have this right? please correct me if I'm wrong

Eric Wieser (Aug 04 2024 at 16:04):

Does your binary need to invoke the interpreter? If not, then you could probably use some form of link-time optimization to removed all the unused part of the binary

Eric Wieser (Aug 04 2024 at 16:04):

If you do need the interpreter, then you're in trouble because the code you execute could ask for arbitrary symbols from within the Lean shared library

James Sully (Aug 04 2024 at 16:07):

I don't, no. Has anyone ever done link time optimisation with lake, or is this more of a spitball idea? I don't really have the sophistication to be blazing trails here

James Sully (Aug 04 2024 at 16:10):

I mean, unless alloy invokes the interpreter. I don't know what the interpreter is to be honest

Sebastian Ullrich (Aug 04 2024 at 16:18):

Because of technical limitations, I believe it actually is the case right now that if you depend on any module in Lean, all Lean code will be linked into your binary. This isn't something you can avoid by setting link flags.

Mario Carneiro (Aug 04 2024 at 16:24):

I'm pretty sure lean does do basic trimming of unused things, this comes for free when using a C linker. However, the initialize_Module functions are always called if you import a module, even if you don't call anything in it, and the initializers themselves can pull in large parts of lean

James Sully (Aug 04 2024 at 16:35):

Sebastian Ullrich said:

Because of technical limitations, I believe it actually is the case right now that if you depend on any module in Lean, all Lean code will be linked into your binary. This isn't something you can avoid by setting link flags.

Thanks, I'm glad to have that confirmed

James Sully (Aug 04 2024 at 16:41):

I'm using ToJson/FromJson from Lean as well

Mac Malone (Aug 04 2024 at 17:32):

James Sully said:

If you lake build in lean4-alloy/examples/my_add, the binary is 83mb. I can't see how that can be anything other than the imports of Alloy.C, namely various submodules of Lean

Unfortunately, Alloy defines a custom DSL with custom elaborators and interfaces with the Lean compiler and server. Thus, anything depending on Alloy will transitively depend on most of Lean. For current Lean, binary bloat is a problem for anything that does metaprogramming. This is planned to be resolved in the future with the module systesm, but that is still in the planning stage (as far as I understand).

James Sully (Aug 05 2024 at 01:59):

Gotcha, thanks


Last updated: May 02 2025 at 03:31 UTC