Zulip Chat Archive
Stream: lean4
Topic: Stack trace profiling
Mario Carneiro (Feb 07 2022 at 17:16):
Has anyone had success using perf
to get a meaningful profile of a lean 4 application? I've been trying to run mathport in perf, but the stack trace seems to break down very often, with only one or two functions before hitting some [unknown] ([unknown])
, and it's rare to see any functions at all defined in the Mathport
package in the call stack.
Steps:
- add
moreLeancArgs := #["-UNDEBUG", "-Og", "-ggdb", "-g3", "-fno-omit-frame-pointer"]
to lakefile.lean sudo sh -c "echo 0 > /proc/sys/kernel/kptr_restrict"
sudo sh -c "echo -1 > /proc/sys/kernel/perf_event_paranoid"
perf record -g --call-graph fp -e cycles -F 999 -- cmd args...
perf script -F +pid > test.perf
- View in profiler.firefox.com
The perf file looks like this:
mathport 2429065/2429071 1263112.249188: 4429237 cycles:
1b5f5c7 lean_expr_loose_bvar_range+0x77 (/home/mario/Documents/lean/mathport/build/bin/mathport)
mathport 2429065/2429071 1263112.250253: 4413503 cycles:
1b61131 l_Lean_mkLambda+0x111 (/home/mario/Documents/lean/mathport/build/bin/mathport)
mathport 2429065/2429071 1263112.251281: 4380424 cycles:
19d775d lean::hash+0xd (/home/mario/Documents/lean/mathport/build/bin/mathport)
d000007fdb84006b [unknown] ([unknown])
mathport 2429065/2429071 1263112.252260: 4365599 cycles:
1b5ac94 l_Lean_Expr_mkData+0x134 (/home/mario/Documents/lean/mathport/build/bin/mathport)
mathport 2429065/2429071 1263112.253293: 4377971 cycles:
19dc2fc lean::replace_rec_fn::apply+0xfc (/home/mario/Documents/lean/mathport/build/bin/mathport)
mathport 2429065/2429071 1263112.254344: 4350249 cycles:
19d7d3e lean::mk_lambda+0xe (/home/mario/Documents/lean/mathport/build/bin/mathport)
mathport 2429065/2429071 1263112.255319: 4324505 cycles:
399d527 lean_apply_2+0xc7 (/home/mario/Documents/lean/mathport/build/bin/mathport)
24af839 l_Lean_Meta_mkLambdaFVars+0x279 (/home/mario/Documents/lean/mathport/build/bin/mathport)
1 [unknown] ([unknown])
Gabriel Ebner (Feb 07 2022 at 17:21):
I usually do --call-graph dwarf,65528
which records more of the stack.
Sebastian Ullrich (Feb 07 2022 at 17:21):
-fno-omit-fp
probably doesn't change much unless you also recompile Lean with it. But then you may as well download the Lean 4 "Linux" (not "Linux release") artifact, which is not linked using lld and thus should work with perf --call-graph dwarf
: https://github.com/llvm/llvm-project/issues/53156
Leonardo de Moura (Feb 07 2022 at 17:22):
perf works great for me. Here is an example:
perf record --call-graph dwarf build/release/stage1/bin/lean TBA/While/ConstantPropagation.lean
hotspot
Mario Carneiro (Feb 07 2022 at 17:23):
do you have to build lean with something other than the default -O3 -DNDEBUG
?
Sebastian Ullrich (Feb 07 2022 at 17:23):
No, if your default linker is not lld
Mario Carneiro (Feb 07 2022 at 17:24):
:thinking: what's my default linker?
Sebastian Ullrich (Feb 07 2022 at 17:24):
Most probably not lld :) . ld -v
.
Mario Carneiro (Feb 07 2022 at 17:25):
GNU ld (GNU Binutils for Ubuntu) 2.37
Sebastian Ullrich (Feb 07 2022 at 17:29):
And the output from above is with a self-compiled Lean?
Mario Carneiro (Feb 07 2022 at 17:30):
no, I'm on nightly
Mario Carneiro (Feb 07 2022 at 17:30):
I haven't tried your suggestion of recompiling lean yet
Gabriel Ebner (Feb 07 2022 at 17:31):
But then you may as well download the Lean 4 "Linux" (not "Linux release") artifact
--call-graph dwarf
works for me with the nightly build (though ld is used for linking mathport since I'm on nixos).
Mario Carneiro (Feb 07 2022 at 17:32):
gabriel's suggestion seems to produce some very large files, I get these Check IO/CPU overload!
warnings and some self test stuff seems to fail
Gabriel Ebner (Feb 07 2022 at 17:32):
Yeah, the files are larger because it captures more stack information. The overload warnings are harmless in my experience.
Gabriel Ebner (Feb 07 2022 at 17:33):
(@Sebastian Ullrich BTW, thanks for the perf report | less
tip. The dumb output is really much better than the TUI which gets shown by default.)
Mario Carneiro (Feb 07 2022 at 17:41):
hm, --call-graph dwarf,65528
didn't help
Gabriel Ebner (Feb 07 2022 at 17:44):
Do you get unknown in perf report | less
as well? (I don't get any here.)
Gabriel Ebner (Feb 07 2022 at 17:45):
IIRC you can switch the linker by doing export LEAN_CC=gcc
.
Mario Carneiro (Feb 07 2022 at 17:45):
Here's perf report (trimmed in kernel calls):
10.31% 10.27% mathport mathport [.] lean_dec_ref_cold
|
---lean_dec_ref_cold
6.45% 6.44% mathport mathport [.] lean::replace_rec_fn::apply
|
---lean::replace_rec_fn::apply
6.11% 0.01% mathport [kernel.kallsyms] [k] asm_exc_page_fault
|
--6.10%--asm_exc_page_fault
|
--5.66%--exc_page_fault
5.66% 0.01% mathport [kernel.kallsyms] [k] exc_page_fault
|
--5.65%--exc_page_fault
|
--5.61%--do_user_addr_fault
5.61% 0.02% mathport [kernel.kallsyms] [k] do_user_addr_fault
|
--5.59%--do_user_addr_fault
|
--5.36%--handle_mm_fault
5.36% 0.05% mathport [kernel.kallsyms] [k] handle_mm_fault
|
--5.31%--handle_mm_fault
5.28% 0.08% mathport [kernel.kallsyms] [k] __handle_mm_fault
|
--5.20%--__handle_mm_fault
5.24% 5.23% mathport mathport [.] l_Lean_mkApp
|
---l_Lean_mkApp
5.18% 0.02% mathport [kernel.kallsyms] [k] handle_pte_fault
|
--5.16%--handle_pte_fault
5.00% 4.98% mathport mathport [.] lean_free_small
|
---lean_free_small
4.77% 4.73% mathport mathport [.] lean_alloc_small
|
---lean_alloc_small
4.68% 0.01% mathport [kernel.kallsyms] [k] do_fault
|
--4.68%--do_fault
4.68% 0.00% mathport [kernel.kallsyms] [k] do_read_fault
|
--4.67%--do_read_fault
4.14% 0.06% mathport [kernel.kallsyms] [k] page_cache_ra_unbounded
|
--4.08%--page_cache_ra_unbounded
3.91% 0.00% mathport [kernel.kallsyms] [k] __do_fault
|
--3.91%--__do_fault
ext4_filemap_fault
3.91% 0.02% mathport [kernel.kallsyms] [k] ext4_filemap_fault
|
--3.89%--ext4_filemap_fault
3.88% 0.04% mathport [kernel.kallsyms] [k] filemap_fault
|
--3.84%--filemap_fault
3.18% 0.00% mathport [kernel.kallsyms] [k] do_page_cache_ra
|
--3.18%--do_page_cache_ra
3.01% 2.78% mathport mathport [.] lean_name_eq
|
--3.00%--lean_name_eq
2.72% 2.72% mathport mathport [.] lean::replace_cache::insert
|
---lean::replace_cache::insert
Gabriel Ebner (Feb 07 2022 at 17:46):
89.44% 0.72% mathport mathport [.] lean_apply_1
|
--88.72%--lean_apply_1
|
--87.41%--lean::lean_io_as_task_fn
lean_apply_1
l_EIO_toBaseIO___rarg
lean_apply_1
|
--87.41%--l_Mathport_mathport1
l_Mathport_mathport1___lambda__3
l_Lean_withImportModules___rarg
|
|--60.33%--lean_import_modules
| l_Lean_profileitIOUnsafe___rarg
| lean_profileit
| lean_apply_1
| l_Lean_profileitIOUnsafe___rarg___lambda__1___boxed
| l_Lean_profileitIOUnsafe___rarg___lambda__1
| lean_apply_1
| l_Lean_withImporting___rarg
| lean_apply_1
| |
| |--56.98%--lean_apply_2
| | l_Lean_importModules___lambda__3
| | |
| | |--43.81%--l___private_Lean_Environment_0__Lean_finalizePersistentExtensions_loop
| | | lean_apply_3
| | | |
| | | |--39.13%--l_Lean_registerSimplePersistentEnvExtension___rarg___lambda__1___boxed
| | | | lean_apply_1
| | | | |
| | | | |--31.26%--l_Lean_mkStateFromImportedEntries___at_Mathport_initFn____x40_Mathport_Bridge_Rename___hyg_61____spec__2
| | | | | l_Array_foldlMUnsafe_fold___at_Mathport_initFn____x40_Mathport_Bridge_Rename___hyg_61____spec__4
| | | | | l_Array_foldlMUnsafe_fold___at_Mathport_initFn____x40_Mathport_Bridge_Rename___hyg_61____spec__3
Mario Carneiro (Feb 07 2022 at 17:46):
yeah, definitely not getting that
Mario Carneiro (Feb 07 2022 at 17:47):
I did get something close to that when profiling in gdb though
Mario Carneiro (Feb 07 2022 at 17:48):
there is a call to a profiling command lean_profileit
in there? Did you do something on the lean side?
Gabriel Ebner (Feb 07 2022 at 17:49):
That's interesting. I didn't fiddle with the compiler options at all. This is completely vanilla mathport master. I've only set the LEAN_CC=cc
environment variable. And the --call-graph dwarf,65528
option.
Gabriel Ebner (Feb 07 2022 at 17:50):
Mario Carneiro said:
there is a call to a profiling command
lean_profileit
in there? Did you do something on the lean side?
Mario Carneiro (Feb 07 2022 at 17:50):
I'm trying that now
Mario Carneiro (Feb 07 2022 at 17:52):
oh, the linker failed
error: /usr/bin/ld: cannot find -lc++
/usr/bin/ld: cannot find -lc++abi
collect2: error: ld returned 1 exit status
Gabriel Ebner (Feb 07 2022 at 17:54):
Oh, the nixpkgs wrapper (shell script around leanc) also adds an -L
flag. Not sure if there's an easy way to simulate that elsewhere (without replacing leanc
by a shell script).
#! /nix/store/a54wrar1jym1d8yvlijq0l2gghmy8szz-bash-5.1-p12/bin/bash
LEAN_CC="${LEAN_CC:-/nix/store/xiq6j4jsyj351p8q3yw9cg1hdqp9m685-gcc-wrapper-10.3.0/bin/cc}" exec -a "$0" /home/gebner/.elan/toolchains/leanprover--lean4---nightly-2022-01-27/bin/leanc.orig "$@" -L /home/gebner/.elan/toolchains/leanprover--lean4---nightly-2022-01-27/lib # use bundled libraries, but not bundled compiler that doesn't know about NIX_LDFLAGS
Mario Carneiro (Feb 07 2022 at 18:00):
:tada: I added "-L/home/mario/.elan/toolchains/leanprover--lean4---nightly-2022-02-06/lib"
to lake's moreLinkArgs
and it worked and produced sensible perf output
Sebastian Ullrich (Feb 07 2022 at 18:03):
I'd really love to know how lld manages to screw this up... and why nobody else seems to mind
Sebastian Ullrich (Feb 07 2022 at 18:04):
-fuse-ld=ld
in the link args should also work btw. Maybe it needs a full path.
Wojciech Nawrocki (Nov 02 2022 at 16:55):
I am unable to get any combination of flags working to profile a Lean-built executable using nightly-2022-10-31
. My lakefile.lean
is:
@[default_target]
lean_exe checker {
root := `Main
moreLeancArgs := #["-DNDEBUG", "-O3"]
moreLinkArgs := #[
"-fuse-ld=ld",
"-L/home/nawrocki/.elan/toolchains/leanprover--lean4---nightly-2022-10-31/lib"
]
and I am using perf record --call-graph dwarf ./build/bin/checker
. The output I get is "flat", i.e. has a bunch of [unknown]
s like in Mario's original post.
Wojciech Nawrocki (Nov 02 2022 at 17:01):
I should say I also tried with moreLeancArgs := #["-UNDEBUG", "-Og", "-ggdb", "-g3", "-fno-omit-frame-pointer"]
but it did not help.
Gabriel Ebner (Nov 02 2022 at 17:23):
Screenshot_20221102_102251.png
Gabriel Ebner (Nov 02 2022 at 17:33):
You didn't post the code you're benchmarking, but if you have deep recursions you can try to increase the size of the captured stack trace with --call-graph dwarf,60000
.
Wojciech Nawrocki (Nov 02 2022 at 18:05):
The code is here (note the lakefile doesn't have the leanc flags I only use those locally). However I was able to get it working on another machine; it seems that something is wrong with the setup on the other one, so leanc
is not to blame.
Last updated: Dec 20 2023 at 11:08 UTC