Zulip Chat Archive

Stream: new members

Topic: trace.profiler doesn't complete when debugging slow proofs

pandaman (Jan 15 2025 at 15:19):

I have a file with many theorems each taking a lot of time to type check. To learn which part is slow, I tried to enable profiler by adding these lines at the top of the file:

set_option trace.profiler.output "~/Regex/correctness/epsilon_closure_lemmas.json"
set_option trace.profiler.threshold 0
set_option trace.profiler.useHeartbeats true
set_option trace.profiler true
-- set_option maxHeartbeats 1000000

However, when I enabled trace.profiler, Lean started to conplain that the heartbeat is exhausted. And if I increase the heartbeat, it just ran forever. What's the best way to optimize the type checking time in this case? Any ideas are appreciated!

Sebastian Ullrich (Jan 15 2025 at 15:23):

threshold 0 likely creates way too much data for a nontrivial proof

pandaman (Jan 16 2025 at 14:19):

Thank you. I put the following to lakefile.toml, and lake build completed within a reasonable time with threshold set to 5000.

[[lean_lib]]
name = "RegexCorrectness"
moreLeanArgs = [
    "-Dtrace.profiler.output=./profile-5000.json",
    "-Dtrace.profiler.useHeartbeats=true",
    "-Dtrace.profiler.threshold=5000",
    "-Dtrace.profiler=true",
    "-DmaxHeartbeats=0"
]

However, the emitted profile file contains no info (link). Maybe the top-level file has overwritten the generated profiling data... Do you have any guidance on running the profiler for the whole package?

Sebastian Ullrich (Jan 17 2025 at 07:29):

Instead of adding the arguments to the lakefile, after compiling the project once, you can call lake lean on each file in turn and specify a file-specific output path. There is a script in the lean4 repo for combining profile files but it loses some information so it may not be that helpful to you.

pandaman (Jan 18 2025 at 08:47):

Thank you for the suggestion! I managed to get a trace by adding trace.profiler true to one of the slowest theorem. Unfortunately, it displays that Elab.command: Lean.Parser.Command.theorem takes 84% of the time, but it's not captured in any of its sub-traces. Since the theorem uses functional induction principle, I'll look at how it's implemented and maybe I can learn something to make it faster (and less memory-hungry).

Here is the trace: https://share.firefox.dev/4fYy9dH

Eric Wieser (Jan 18 2025 at 10:28):

What threshold did you use?

Eric Wieser (Jan 18 2025 at 10:29):

The 5000 from your message above is 5s, which seems very coarse. You'll do much better with 50 or even 5.

pandaman (Jan 18 2025 at 11:07):

When I used threshold=5000, I also set trace.profiler.useHeartbeats=true, so I guess it's 5000 steps?

The issue is that Lean takes significantly more time as I decrease the threshold. For example, Lean didn't complete checking proofs after >1h when I used threshold=1000 for this file. (With threshold=5000, it took around a few minutes)
The pasted trace was taken by enabling profiler only for a single theorem.

Sebastian Ullrich (Jan 18 2025 at 14:21):

Does not using heartbeats change anything about the coverage of the profile?

pandaman (Jan 19 2025 at 01:58):

I was able to get more traces with a time-based 5ms threshold. Since the slowness comes from the declaration of εClosure'.induct' and its usage, let me report them together here. (the trace in the previous message was from the declaration only)

Declaration of `εClosure'.induct'`

I took a file-level trace and I got https://share.firefox.dev/4hiWOKW

Command

$ lake build && time lake env lean -Dtrace.profiler.output='./epsilon_closure_basic_threshold_5.json' -Dtrace.profiler.threshold=5 -Dtrace.profiler=true RegexCorrectness/VM/EpsilonClosure/Basic.lean
Build completed successfully.

real    0m15.507s
user    0m14.922s
sys     0m0.760s

The 60% of the processing time is still spent on unattributed Elab.command: Lean.Parser.Command.theorem stack, so there is no change in the coverage.

Usages of `εClosure'.induct'`

For this, I got more insightful traces: https://share.firefox.dev/42lEywm

Command

$ lake build && time lake env lean -Dtrace.profiler.output='./epsilon_closure_lemmas_threshold_5.json' -Dtrace.profiler.threshold=5 -Dtrace.profiler=true -DmaxHeartbeats=0 RegexCorrectness/VM/EpsilonClosure/Lemmas.lean
Build completed successfully.

real    0m38.007s
user    0m38.212s
sys     0m0.235s

Lean is spending 71% of time for Meta.isDefEq! I can see many Meta.isDefEq.delta inside the stack. I guess Lean is unfolding some definitions? I'll take a look at which defeq is causing the trouble.

Thank you for many suggestions!

pandaman (Jan 19 2025 at 02:40):

I was able to narrowed down the problematic Meta.isDefEq to a pattern like exact SparseSet.mem_of_mem_of_subset mem (εClosure.subset h). They are defined as this (links: SparseSet.mem_of_mem_of_subset, εClosure.subset)

theorem SparseSet.mem_of_mem_of_subset {s₁ s₂ : SparseSet n} {i : Fin n} (mem : i ∈ s₁) (sub : s₁ ⊆ s₂) : i ∈ s₂ :=
  sub i mem

theorem εClosure.subset (h : εClosure' nfa wf it matched next stack = (matched', next')) :
  next.states ⊆ next'.states := by
  <induction proof here>

When I pass εClosure.subset h as an immediate argument to SparseSet.mem_of_mem_of_subset, Lean seems to check the defeq of s₁ ⊆ s₂, unfolding the definitions of the subset operator of the SparseSet data structure and consuming ~5s each. Since I had several invocations of this pattern, they added up and consumed around 40s in total.

I was able to work around this issue by lifting εClosure.subset h to a local variable like this:

have := εClosure.subset h
exact SparseSet.mem_of_mem_of_subset mem this

Now I don't see Lean unfold s₁ ⊆ s₂, and the file only takes 5-6s to type-check. The profiling shows a more spread attributions between simp and other stuff, so I think it's reasonable.

pandaman (Jan 19 2025 at 03:51):

FYI this is the trace where Lean checks defeq when processing exact SparseSet.mem_of_mem_of_subset mem (εClosure.subset h). I'm not really sure why Lean tries to check it though.
スクリーンショット 2025-01-19 123722_annotated.png

Eric Wieser (Jan 19 2025 at 14:16):

I would guess that exact SparseSet.mem_of_mem_of_subset mem (εClosure.subset h :) also works in place of the have

pandaman (Jan 20 2025 at 09:01):

Oh year. Sometimes a smile is effective to make Lean faster :)

Last updated: May 02 2025 at 03:31 UTC