Zulip Chat Archive
Stream: general
Topic: Reducing executable size
James Sully (Aug 03 2024 at 10:50):
I'm writing a program called sand (https://github.com/sullyj3/sand), a systemd service for countdown timers.
The executable is colossal, it weighs 89mb. At first I thought this was just the current state of lean 4, but I just noticed that lake is only 13mb. What could cause such a blow-up? How should I go about diagnosing it?
Henrik Böving (Aug 03 2024 at 14:23):
The first thing that you can always do is to just call strip
on the binary which will get rid of a whole bunch of stuff. Apart from that you'll mostly have to analyze what exactly is going on in your binary
Note that lake
is of course not 13 MB. you most likely just found the wrappers around the binaries int he elan folder:
total 88396
drwxr-xr-x. 1 nix nix 86 Apr 4 19:50 .
drwxr-xr-x. 1 nix nix 90 Jan 9 2024 ..
-rwxr-xr-x. 7 nix nix 12927160 Apr 4 19:49 elan
-rwxr-xr-x. 7 nix nix 12927160 Apr 4 19:49 lake
-rwxr-xr-x. 7 nix nix 12927160 Apr 4 19:49 lean
-rwxr-xr-x. 7 nix nix 12927160 Apr 4 19:49 leanc
-rwxr-xr-x. 7 nix nix 12927160 Apr 4 19:49 leanchecker
-rwxr-xr-x. 7 nix nix 12927160 Apr 4 19:49 leanmake
-rwxr-xr-x. 7 nix nix 12927160 Apr 4 19:49 leanpkg
note that they all have the same size.
The real lake binary that you'll find is much smaller:
λ du -sh lake
8.1M lake
but also links against Lean through shared objects:
λ ldd lake
linux-vdso.so.1 (0x00007f932fe26000)
libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f932fa00000)
libInit_shared.so => /home/nix/Desktop/formal_verification/lean/lean4/build/release/stage1/bin/./../lib/lean/libInit_shared.so (0x00007f932fdff000)
libleanshared.so => /home/nix/Desktop/formal_verification/lean/lean4/build/release/stage1/bin/./../lib/lean/libleanshared.so (0x00007f932a000000)
libgmp.so.10 => /lib64/libgmp.so.10 (0x00007f932fd5b000)
libm.so.6 => /lib64/libm.so.6 (0x00007f932fc77000)
libc.so.6 => /lib64/libc.so.6 (0x00007f9329e0f000)
/lib64/ld-linux-x86-64.so.2 (0x00007f932fe28000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f932f9d2000)
so a static variant would be much chonkier as well.
James Sully (Aug 03 2024 at 15:48):
Yeah, the 89mb is actually after stripping, before is 91
James Sully (Aug 03 2024 at 15:52):
It looks like my exe doesn't dynamically link to lean
▷ ldd ./.lake/build/bin/sand
linux-vdso.so.1 (0x000072f269caa000)
libm.so.6 => /usr/lib/libm.so.6 (0x000072f265315000)
libdl.so.2 => /usr/lib/libdl.so.2 (0x000072f269c66000)
libpthread.so.0 => /usr/lib/libpthread.so.0 (0x000072f269c61000)
libc.so.6 => /usr/lib/libc.so.6 (0x000072f265129000)
librt.so.1 => /usr/lib/librt.so.1 (0x000072f269c5c000)
/lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x000072f269cac000)
Could that make up the difference?
James Sully (Aug 03 2024 at 15:59):
Oh I think batteries is the culprit
James Sully (Aug 03 2024 at 16:00):
I might have to ditch it, I'm only using hashmap
Mario Carneiro (Aug 03 2024 at 16:03):
This seems very suspicious to me, batteries is very small compared to lean core
James Sully (Aug 03 2024 at 16:08):
I'm basing that on the fact that when I build this commit, the executable is 84mb, but when I remove import Batteries
from Main.lean
it drops to 5.8mb
James Sully (Aug 03 2024 at 16:10):
it does seem surprising that batteries could be that big. I assume something must be going wrong
Sebastian Ullrich (Aug 03 2024 at 16:22):
You're probably including a big chunk of the Lean
library
James Sully (Aug 03 2024 at 16:26):
Sebastian Ullrich said:
You're probably including a big chunk of the
Lean
library
Do you mean transitively via batteries?
Mario Carneiro (Aug 03 2024 at 16:27):
batteries uses minimal imports (of lean), but perhaps you should be as well instead of doing import Batteries
James Sully (Aug 03 2024 at 16:29):
Ah, ok, I guess I should just import Batteries.Data.HashMap.Basic
James Sully (Aug 03 2024 at 16:30):
I guess I just blithely assumed I'd only end up with the symbols I used from batteries in the executable
James Sully (Aug 03 2024 at 16:34):
hm, that doesn't seem to fix it.
Henrik Böving (Aug 03 2024 at 16:35):
If you are on a nightly or wait a couple of days you'll also be able to use Lean's built-in Std.HashMap
and drop your batteries dependency (if that's all you need from batteries)
James Sully (Aug 03 2024 at 16:36):
It is, yeah.
James Sully (Aug 03 2024 at 16:36):
I look forward to that, thanks for the heads up
James Sully (Aug 03 2024 at 16:39):
I made a minimal repo illustrating the blow up if that's helpful. I'm still not totally clear on whether this is expected (even y'know, for now)
https://github.com/sullyj3/big_bin_repro_repo
James Sully (Aug 04 2024 at 13:45):
I've been on a quest to eliminate import Lean
from my transitive dependencies, but I've just realised
Socket.lean
-> Alloy.C
-> Lean
The binaries from socket and alloy exhibit this large file size.
I assume I'm out of luck on this front until module importing works more parsimoniously, so I'm going to put this project on hold for now
Eric Wieser (Aug 04 2024 at 14:22):
Isn't this fixable by reducing the imports in Alloy.C?
James Sully (Aug 04 2024 at 14:42):
I'm not sure. I would assume it's already importing only what's needed. @Mac Malone ?
Eric Wieser (Aug 04 2024 at 15:33):
Looking at the most recent source, Alloy.C
does not import Lean
James Sully (Aug 04 2024 at 15:41):
Right, I'm not talking about the entirety of Lean
If you lake build
in lean4-alloy/examples/my_add
, the binary is 83mb. I can't see how that can be anything other than the imports of Alloy.C
, namely various submodules of Lean
As far as I can tell most imports of Lean.<Anything> pull in just a whole bunch of stuff
James Sully (Aug 04 2024 at 15:46):
perhaps I should have said something like "I've been on a quest to eliminate import Lean.*
" to be clearer
James Sully (Aug 04 2024 at 16:01):
The vague hazy impression I have in my head right now is:
- currently when importing a module, lean includes everything in the module, regardless of whether it's used, as well as the full contents of all modules it transitively imports. not much in the way of dead code elimination is done.
- because of this, importing can quickly balloon executable sizes, even when doing few imports from your perspective.
- this is due to the early state of lean4
- this will presumably be addressed at some point in the future, but it's not a priority right now
do I have this right? please correct me if I'm wrong
Eric Wieser (Aug 04 2024 at 16:04):
Does your binary need to invoke the interpreter? If not, then you could probably use some form of link-time optimization to removed all the unused part of the binary
Eric Wieser (Aug 04 2024 at 16:04):
If you do need the interpreter, then you're in trouble because the code you execute could ask for arbitrary symbols from within the Lean shared library
James Sully (Aug 04 2024 at 16:07):
I don't, no. Has anyone ever done link time optimisation with lake, or is this more of a spitball idea? I don't really have the sophistication to be blazing trails here
James Sully (Aug 04 2024 at 16:10):
I mean, unless alloy invokes the interpreter. I don't know what the interpreter is to be honest
Sebastian Ullrich (Aug 04 2024 at 16:18):
Because of technical limitations, I believe it actually is the case right now that if you depend on any module in Lean
, all Lean
code will be linked into your binary. This isn't something you can avoid by setting link flags.
Mario Carneiro (Aug 04 2024 at 16:24):
I'm pretty sure lean does do basic trimming of unused things, this comes for free when using a C linker. However, the initialize_Module
functions are always called if you import a module, even if you don't call anything in it, and the initializers themselves can pull in large parts of lean
James Sully (Aug 04 2024 at 16:35):
Sebastian Ullrich said:
Because of technical limitations, I believe it actually is the case right now that if you depend on any module in
Lean
, allLean
code will be linked into your binary. This isn't something you can avoid by setting link flags.
Thanks, I'm glad to have that confirmed
James Sully (Aug 04 2024 at 16:41):
I'm using ToJson
/FromJson
from Lean
as well
Mac Malone (Aug 04 2024 at 17:32):
James Sully said:
If you
lake build
inlean4-alloy/examples/my_add
, the binary is 83mb. I can't see how that can be anything other than the imports ofAlloy.C
, namely various submodules ofLean
Unfortunately, Alloy defines a custom DSL with custom elaborators and interfaces with the Lean compiler and server. Thus, anything depending on Alloy will transitively depend on most of Lean. For current Lean, binary bloat is a problem for anything that does metaprogramming. This is planned to be resolved in the future with the module systesm, but that is still in the planning stage (as far as I understand).
James Sully (Aug 05 2024 at 01:59):
Gotcha, thanks
Last updated: May 02 2025 at 03:31 UTC