Zulip Chat Archive
Stream: lean4
Topic: Precompile specific modules
Mario Carneiro (Nov 02 2022 at 08:58):
Right now, none of mathlib is using the precompileModules
flag. How do I enable it? More importantly, we don't want all modules to be precompiled, only those with performance sensitive tactics in them. Can I make a list of modules that I want to have precompiled in the lakefile?
Sebastian Ullrich (Nov 02 2022 at 09:08):
Selective precompilation gets pretty complicated when you skip some modules but then downstream modules (e.g. from someone depending on mathlib) want to get precompiled after all; http://leanprover.github.io/talks/WITS2022.pdf#page=23. It would be great to first quantify the overhead of precompiling everything, especially for modules that don't contain all that much computable code anyway. Though the fact that on Windows we have to pass the entire closure of DLLs in the command line does make scaling a bit more difficult.
Mario Carneiro (Nov 02 2022 at 09:09):
In that case, let's assume that the set of precompiled modules is downward-closed. It shouldn't be too hard to automate that in lake given some set of desired modules
Sebastian Ullrich (Nov 02 2022 at 09:10):
Isn't basically all of mathlib in the downstream of important tactics?
Mario Carneiro (Nov 02 2022 at 09:10):
er, upward-closed
Mario Carneiro (Nov 02 2022 at 09:11):
that is, only the tactic files and their dependencies would get precompiled, not dependents
Sebastian Ullrich (Nov 02 2022 at 09:12):
Yeah, it shouldn't be too hard to implement that
Gabriel Ebner (Nov 02 2022 at 19:02):
Note that the current precompilation logic in Lake has some big performance and functionality issues (https://github.com/leanprover/lake/pull/132#issuecomment-1286372892), and I'm currently looking into fixing that.
Mario Carneiro (Nov 16 2022 at 01:00):
I have a much simpler version of this problem. How can I precompile mathlib completely in a test file? I want to profile compiled tactics but I have no idea how to compile any tactics in std or mathlib4.
Wojciech Nawrocki (Nov 16 2022 at 01:10):
I'm not quite sure if this is what you're asking but here is something to load in the compiled version of a dependency when running tests. You may need to run lake build DepLib:sharedLib
first.
Wojciech Nawrocki (Nov 16 2022 at 01:11):
I guess the tldr almost-lake-less variant is lake env lean --load-dynlib=<somelib.so>
Mario Carneiro (Nov 16 2022 at 01:12):
Aha, I also found something like that. It runs, hard to say if it's working
Mario Carneiro (Nov 16 2022 at 01:52):
Help! I'm not having any success with perf-profiling tactics even after all these steps, there are no functions from mathlib showing up in the trace. I tried modifying the mathlib lakefile to this:
import Lake
open Lake DSL
package mathlib where
precompileModules := true
@[default_target]
lean_lib Mathlib where
moreLeanArgs := #["-DwarningAsError=true"]
moreLeancArgs := #["-UNDEBUG", "-Og", "-ggdb", "-g3", "-fno-omit-frame-pointer"]
moreLinkArgs := #[
"-fuse-ld=ld",
"-L/home/mario/.elan/toolchains/leanprover--lean4---nightly-2022-11-14/lib",
"-Lbuild/lib"
]
@[default_target]
lean_exe runLinter where
root := `scripts.runLinter
supportInterpreter := true
meta if get_config? doc = some "on" then -- do not download and build doc-gen4 by default
require «doc-gen4» from git "https://github.com/leanprover/doc-gen4" @ "main"
require std from git "https://github.com/leanprover/std4" @ "main"
require Qq from git "https://github.com/gebner/quote4" @ "master"
and then
lake env lean --load-dynlib ./build/lib/libMathlib-Tactic-Ring.so test/ring.lean
takes 1.7 s while
lake env lean test/ring.lean
takes 1.9 s, which is a performance difference small enough that it might be noise; and perf record --call-graph dwarf lake ...
records only stack frames from leanlibshared.so
.
Gabriel Ebner (Nov 16 2022 at 02:39):
I couldn't get perf report
to work either... hotspot
works just fine though. I only had to add precompiledModules := true
to the lakefile and then ran perf
on your example command.
Gabriel Ebner (Nov 16 2022 at 02:46):
The perf difference is so small because around 60% of the runtime is spent inside lean_is_expr_def_eq
(a large part of that seems to be in the code generated by ~q(...)
). Also really surprising: 23% of the time is spent in elaborating the statements.
Gabriel Ebner (Nov 16 2022 at 02:46):
(The 60% and 23% are probably not disjoint.)
Gabriel Ebner (Nov 16 2022 at 02:47):
Another cost center: 10% of the time is spent processing the imports.
Gabriel Ebner (Nov 16 2022 at 02:55):
Another exciting discovery: Simp.tryTheoremWithExtraArgs?
takes 25% of the runtime.
Mario Carneiro (Nov 16 2022 at 03:03):
What does the performance look like for a more extreme example like (1 + x) ^ 50 = (x + 1) ^ 50
for a sufficiently large value of 50
? Ideally it should not spend much time in ~q
and isDefEq
since that's only used during reification
Mario Carneiro (Nov 16 2022 at 03:04):
ran
perf
on your example command.
ran perf how? I used perf record --call-graph dwarf lake env lean ...
and then perf script > test.perf
, is that what you mean?
Wojciech Nawrocki (Nov 16 2022 at 03:08):
@Mario Carneiro it works for me with lake env perf record --call-graph dwarf,60000 lean --load-dynlib=build/lib/libMathlib-Tactic-Ring.so test/ring.lean
which avoids profiling lake itself.
Gabriel Ebner (Nov 16 2022 at 03:08):
lake env perf record --call-graph=dwarf,60000 lean --load-dynlib=./build/lib/libMathlib-Tactic-Ring.so test/ring2.lean
hotspot
Gabriel Ebner (Nov 16 2022 at 03:09):
The n=100 result is interesting. We now spend 17% of the time in Lean.getMaxHeight
.
Mario Carneiro (Nov 16 2022 at 03:10):
is compiling making a bigger difference?
Gabriel Ebner (Nov 16 2022 at 03:12):
6.5s vs 6.1s
Gabriel Ebner (Nov 16 2022 at 03:12):
But we're breaking the FoldConsts
cache here.
Mario Carneiro (Nov 16 2022 at 03:13):
oh, is it because the cache size is only 8192?
Gabriel Ebner (Nov 16 2022 at 03:13):
We're also breaking the Expr.find?
cache.
Gabriel Ebner (Nov 16 2022 at 03:14):
oh, is it because the cache size is only 8192?
Effectively it's even less because we take the pointer address modulo 8192 as the key. I think you can see the issue here.
Mario Carneiro (Nov 16 2022 at 03:15):
oof
Gabriel Ebner (Nov 16 2022 at 03:15):
For Expr.replace I added a >>> 4
and it's a much smaller issue.
Mario Carneiro (Nov 16 2022 at 03:15):
Maybe pick a prime number cache size? Or is the mod too much
Gabriel Ebner (Nov 16 2022 at 03:17):
Picking an odd cache size would already be an improvement I think (because then the modulo doesn't align perfectly with the pointer alignment..).
Gabriel Ebner (Nov 16 2022 at 03:17):
Or maybe just hash the pointer.
Gabriel Ebner (Nov 16 2022 at 03:18):
It would also be interesting to see whether different proof terms for ring
would be helpful here. (I always like nested lets.)
Mario Carneiro (Nov 16 2022 at 03:19):
I don't think ring
has a lot of pointer duplication in its proofs unless it is 'discovered' by ShareCommon
Mario Carneiro (Nov 16 2022 at 03:20):
except in the expression terms
Mario Carneiro (Nov 16 2022 at 03:21):
Can the typechecker check a let
without doing a full expression traversal to replace the value everywhere? It seems like that would defeat the optimization
Mario Carneiro (Nov 16 2022 at 03:22):
BTW if you are in the mood I also would like a performance gut check on the LinearCombination
PR. It was noticeably slow which got me on the topic
Gabriel Ebner (Nov 16 2022 at 03:23):
Ah I see where you're coming from. In Lean 3 the lets were substituted in the kernel iirc. But in Lean 4 the type checker adds let-bindings to the lctx now.
Gabriel Ebner (Nov 16 2022 at 03:23):
Mario Carneiro said:
BTW if you are in the mood I also would like a performance gut check on the
LinearCombination
PR. It was noticeably slow which got me on the topic
I think you should try installing hotspot
.
Mario Carneiro (Nov 16 2022 at 03:24):
I did, trying to replicate your result now
Mario Carneiro (Nov 16 2022 at 03:27):
Gabriel Ebner said:
Ah I see where you're coming from. In Lean 3 the lets were substituted in the kernel iirc. But in Lean 4 the type checker adds let-bindings to the lctx now.
Doesn't it still have to traverse the expression to replace the bvars with fvars?
Mario Carneiro (Nov 16 2022 at 03:30):
Oh hey, it works :tada: , mathlib functions are showing up now
Gabriel Ebner (Nov 16 2022 at 04:10):
Oh yeah the bvar replacement overhead is still there ofc. But this should be only linear.
Mario Carneiro (Nov 16 2022 at 04:24):
only if all the lets are in a block, right? If there was some arbitrary pattern it would be quadratic
Last updated: Dec 20 2023 at 11:08 UTC