Zulip Chat Archive

Stream: lean4

Topic: Stack trace profiling


Mario Carneiro (Feb 07 2022 at 17:16):

Has anyone had success using perf to get a meaningful profile of a lean 4 application? I've been trying to run mathport in perf, but the stack trace seems to break down very often, with only one or two functions before hitting some [unknown] ([unknown]), and it's rare to see any functions at all defined in the Mathport package in the call stack.

Steps:

  • add moreLeancArgs := #["-UNDEBUG", "-Og", "-ggdb", "-g3", "-fno-omit-frame-pointer"] to lakefile.lean
  • sudo sh -c "echo 0 > /proc/sys/kernel/kptr_restrict"
  • sudo sh -c "echo -1 > /proc/sys/kernel/perf_event_paranoid"
  • perf record -g --call-graph fp -e cycles -F 999 -- cmd args...
  • perf script -F +pid > test.perf
  • View in profiler.firefox.com

The perf file looks like this:

mathport 2429065/2429071 1263112.249188:    4429237 cycles:
             1b5f5c7 lean_expr_loose_bvar_range+0x77 (/home/mario/Documents/lean/mathport/build/bin/mathport)

mathport 2429065/2429071 1263112.250253:    4413503 cycles:
             1b61131 l_Lean_mkLambda+0x111 (/home/mario/Documents/lean/mathport/build/bin/mathport)

mathport 2429065/2429071 1263112.251281:    4380424 cycles:
             19d775d lean::hash+0xd (/home/mario/Documents/lean/mathport/build/bin/mathport)
    d000007fdb84006b [unknown] ([unknown])

mathport 2429065/2429071 1263112.252260:    4365599 cycles:
             1b5ac94 l_Lean_Expr_mkData+0x134 (/home/mario/Documents/lean/mathport/build/bin/mathport)

mathport 2429065/2429071 1263112.253293:    4377971 cycles:
             19dc2fc lean::replace_rec_fn::apply+0xfc (/home/mario/Documents/lean/mathport/build/bin/mathport)

mathport 2429065/2429071 1263112.254344:    4350249 cycles:
             19d7d3e lean::mk_lambda+0xe (/home/mario/Documents/lean/mathport/build/bin/mathport)

mathport 2429065/2429071 1263112.255319:    4324505 cycles:
             399d527 lean_apply_2+0xc7 (/home/mario/Documents/lean/mathport/build/bin/mathport)
             24af839 l_Lean_Meta_mkLambdaFVars+0x279 (/home/mario/Documents/lean/mathport/build/bin/mathport)
                   1 [unknown] ([unknown])

Gabriel Ebner (Feb 07 2022 at 17:21):

I usually do --call-graph dwarf,65528 which records more of the stack.

Sebastian Ullrich (Feb 07 2022 at 17:21):

-fno-omit-fp probably doesn't change much unless you also recompile Lean with it. But then you may as well download the Lean 4 "Linux" (not "Linux release") artifact, which is not linked using lld and thus should work with perf --call-graph dwarf: https://github.com/llvm/llvm-project/issues/53156

Leonardo de Moura (Feb 07 2022 at 17:22):

perf works great for me. Here is an example:

perf record --call-graph dwarf build/release/stage1/bin/lean TBA/While/ConstantPropagation.lean

hotspot

Mario Carneiro (Feb 07 2022 at 17:23):

do you have to build lean with something other than the default -O3 -DNDEBUG?

Sebastian Ullrich (Feb 07 2022 at 17:23):

No, if your default linker is not lld

Mario Carneiro (Feb 07 2022 at 17:24):

:thinking: what's my default linker?

Sebastian Ullrich (Feb 07 2022 at 17:24):

Most probably not lld :) . ld -v.

Mario Carneiro (Feb 07 2022 at 17:25):

GNU ld (GNU Binutils for Ubuntu) 2.37

Sebastian Ullrich (Feb 07 2022 at 17:29):

And the output from above is with a self-compiled Lean?

Mario Carneiro (Feb 07 2022 at 17:30):

no, I'm on nightly

Mario Carneiro (Feb 07 2022 at 17:30):

I haven't tried your suggestion of recompiling lean yet

Gabriel Ebner (Feb 07 2022 at 17:31):

But then you may as well download the Lean 4 "Linux" (not "Linux release") artifact

--call-graph dwarf works for me with the nightly build (though ld is used for linking mathport since I'm on nixos).

Mario Carneiro (Feb 07 2022 at 17:32):

gabriel's suggestion seems to produce some very large files, I get these Check IO/CPU overload! warnings and some self test stuff seems to fail

Gabriel Ebner (Feb 07 2022 at 17:32):

Yeah, the files are larger because it captures more stack information. The overload warnings are harmless in my experience.

Gabriel Ebner (Feb 07 2022 at 17:33):

(@Sebastian Ullrich BTW, thanks for the perf report | less tip. The dumb output is really much better than the TUI which gets shown by default.)

Mario Carneiro (Feb 07 2022 at 17:41):

hm, --call-graph dwarf,65528 didn't help

Gabriel Ebner (Feb 07 2022 at 17:44):

Do you get unknown in perf report | less as well? (I don't get any here.)

Gabriel Ebner (Feb 07 2022 at 17:45):

IIRC you can switch the linker by doing export LEAN_CC=gcc.

Mario Carneiro (Feb 07 2022 at 17:45):

Here's perf report (trimmed in kernel calls):

    10.31%    10.27%  mathport  mathport              [.] lean_dec_ref_cold
            |
            ---lean_dec_ref_cold

     6.45%     6.44%  mathport  mathport              [.] lean::replace_rec_fn::apply
            |
            ---lean::replace_rec_fn::apply

     6.11%     0.01%  mathport  [kernel.kallsyms]     [k] asm_exc_page_fault
            |
             --6.10%--asm_exc_page_fault
                       |
                        --5.66%--exc_page_fault

     5.66%     0.01%  mathport  [kernel.kallsyms]     [k] exc_page_fault
            |
             --5.65%--exc_page_fault
                       |
                        --5.61%--do_user_addr_fault

     5.61%     0.02%  mathport  [kernel.kallsyms]     [k] do_user_addr_fault
            |
             --5.59%--do_user_addr_fault
                       |
                        --5.36%--handle_mm_fault

     5.36%     0.05%  mathport  [kernel.kallsyms]     [k] handle_mm_fault
            |
             --5.31%--handle_mm_fault

     5.28%     0.08%  mathport  [kernel.kallsyms]     [k] __handle_mm_fault
            |
             --5.20%--__handle_mm_fault

     5.24%     5.23%  mathport  mathport              [.] l_Lean_mkApp
            |
            ---l_Lean_mkApp

     5.18%     0.02%  mathport  [kernel.kallsyms]     [k] handle_pte_fault
            |
             --5.16%--handle_pte_fault

     5.00%     4.98%  mathport  mathport              [.] lean_free_small
            |
            ---lean_free_small

     4.77%     4.73%  mathport  mathport              [.] lean_alloc_small
            |
            ---lean_alloc_small

     4.68%     0.01%  mathport  [kernel.kallsyms]     [k] do_fault
            |
             --4.68%--do_fault

     4.68%     0.00%  mathport  [kernel.kallsyms]     [k] do_read_fault
            |
             --4.67%--do_read_fault

     4.14%     0.06%  mathport  [kernel.kallsyms]     [k] page_cache_ra_unbounded
            |
             --4.08%--page_cache_ra_unbounded

     3.91%     0.00%  mathport  [kernel.kallsyms]     [k] __do_fault
            |
             --3.91%--__do_fault
                       ext4_filemap_fault

     3.91%     0.02%  mathport  [kernel.kallsyms]     [k] ext4_filemap_fault
            |
             --3.89%--ext4_filemap_fault

     3.88%     0.04%  mathport  [kernel.kallsyms]     [k] filemap_fault
            |
             --3.84%--filemap_fault

     3.18%     0.00%  mathport  [kernel.kallsyms]     [k] do_page_cache_ra
            |
             --3.18%--do_page_cache_ra

     3.01%     2.78%  mathport  mathport              [.] lean_name_eq
            |
             --3.00%--lean_name_eq

     2.72%     2.72%  mathport  mathport              [.] lean::replace_cache::insert
            |
            ---lean::replace_cache::insert

Gabriel Ebner (Feb 07 2022 at 17:46):

    89.44%     0.72%  mathport  mathport            [.] lean_apply_1
            |
             --88.72%--lean_apply_1
                       |
                        --87.41%--lean::lean_io_as_task_fn
                                  lean_apply_1
                                  l_EIO_toBaseIO___rarg
                                  lean_apply_1
                                  |
                                   --87.41%--l_Mathport_mathport1
                                             l_Mathport_mathport1___lambda__3
                                             l_Lean_withImportModules___rarg
                                             |
                                             |--60.33%--lean_import_modules
                                             |          l_Lean_profileitIOUnsafe___rarg
                                             |          lean_profileit
                                             |          lean_apply_1
                                             |          l_Lean_profileitIOUnsafe___rarg___lambda__1___boxed
                                             |          l_Lean_profileitIOUnsafe___rarg___lambda__1
                                             |          lean_apply_1
                                             |          l_Lean_withImporting___rarg
                                             |          lean_apply_1
                                             |          |
                                             |          |--56.98%--lean_apply_2
                                             |          |          l_Lean_importModules___lambda__3
                                             |          |          |
                                             |          |          |--43.81%--l___private_Lean_Environment_0__Lean_finalizePersistentExtensions_loop
                                             |          |          |          lean_apply_3
                                             |          |          |          |
                                             |          |          |          |--39.13%--l_Lean_registerSimplePersistentEnvExtension___rarg___lambda__1___boxed
                                             |          |          |          |          lean_apply_1
                                             |          |          |          |          |
                                             |          |          |          |          |--31.26%--l_Lean_mkStateFromImportedEntries___at_Mathport_initFn____x40_Mathport_Bridge_Rename___hyg_61____spec__2
                                             |          |          |          |          |          l_Array_foldlMUnsafe_fold___at_Mathport_initFn____x40_Mathport_Bridge_Rename___hyg_61____spec__4
                                             |          |          |          |          |          l_Array_foldlMUnsafe_fold___at_Mathport_initFn____x40_Mathport_Bridge_Rename___hyg_61____spec__3

Mario Carneiro (Feb 07 2022 at 17:46):

yeah, definitely not getting that

Mario Carneiro (Feb 07 2022 at 17:47):

I did get something close to that when profiling in gdb though

Mario Carneiro (Feb 07 2022 at 17:48):

there is a call to a profiling command lean_profileit in there? Did you do something on the lean side?

Gabriel Ebner (Feb 07 2022 at 17:49):

That's interesting. I didn't fiddle with the compiler options at all. This is completely vanilla mathport master. I've only set the LEAN_CC=cc environment variable. And the --call-graph dwarf,65528 option.

Gabriel Ebner (Feb 07 2022 at 17:50):

Mario Carneiro said:

there is a call to a profiling command lean_profileit in there? Did you do something on the lean side?

https://github.com/leanprover/lean4/blob/95aec2cf93762530a9c69dfc98c165a8b54238e2/src/Lean/Environment.lean#L615

Mario Carneiro (Feb 07 2022 at 17:50):

I'm trying that now

Mario Carneiro (Feb 07 2022 at 17:52):

oh, the linker failed

error: /usr/bin/ld: cannot find -lc++
/usr/bin/ld: cannot find -lc++abi
collect2: error: ld returned 1 exit status

Gabriel Ebner (Feb 07 2022 at 17:54):

Oh, the nixpkgs wrapper (shell script around leanc) also adds an -L flag. Not sure if there's an easy way to simulate that elsewhere (without replacing leanc by a shell script).

#! /nix/store/a54wrar1jym1d8yvlijq0l2gghmy8szz-bash-5.1-p12/bin/bash
LEAN_CC="${LEAN_CC:-/nix/store/xiq6j4jsyj351p8q3yw9cg1hdqp9m685-gcc-wrapper-10.3.0/bin/cc}" exec -a "$0" /home/gebner/.elan/toolchains/leanprover--lean4---nightly-2022-01-27/bin/leanc.orig "$@" -L /home/gebner/.elan/toolchains/leanprover--lean4---nightly-2022-01-27/lib  # use bundled libraries, but not bundled compiler that doesn't know about NIX_LDFLAGS

Mario Carneiro (Feb 07 2022 at 18:00):

:tada: I added "-L/home/mario/.elan/toolchains/leanprover--lean4---nightly-2022-02-06/lib" to lake's moreLinkArgs and it worked and produced sensible perf output

Sebastian Ullrich (Feb 07 2022 at 18:03):

I'd really love to know how lld manages to screw this up... and why nobody else seems to mind

Sebastian Ullrich (Feb 07 2022 at 18:04):

-fuse-ld=ld in the link args should also work btw. Maybe it needs a full path.

Wojciech Nawrocki (Nov 02 2022 at 16:55):

I am unable to get any combination of flags working to profile a Lean-built executable using nightly-2022-10-31. My lakefile.lean is:

@[default_target]
lean_exe checker {
  root := `Main
  moreLeancArgs := #["-DNDEBUG", "-O3"]
  moreLinkArgs := #[
    "-fuse-ld=ld",
    "-L/home/nawrocki/.elan/toolchains/leanprover--lean4---nightly-2022-10-31/lib"
  ]

and I am using perf record --call-graph dwarf ./build/bin/checker. The output I get is "flat", i.e. has a bunch of [unknown]s like in Mario's original post.

Wojciech Nawrocki (Nov 02 2022 at 17:01):

I should say I also tried with moreLeancArgs := #["-UNDEBUG", "-Og", "-ggdb", "-g3", "-fno-omit-frame-pointer"] but it did not help.

Gabriel Ebner (Nov 02 2022 at 17:23):

Screenshot_20221102_102251.png

Gabriel Ebner (Nov 02 2022 at 17:33):

You didn't post the code you're benchmarking, but if you have deep recursions you can try to increase the size of the captured stack trace with --call-graph dwarf,60000.

Wojciech Nawrocki (Nov 02 2022 at 18:05):

The code is here (note the lakefile doesn't have the leanc flags I only use those locally). However I was able to get it working on another machine; it seems that something is wrong with the setup on the other one, so leanc is not to blame.


Last updated: Dec 20 2023 at 11:08 UTC