Zulip Chat Archive

Stream: lean4

Topic: Slowdown in v4.22.0-rc3


pandaman (Jul 12 2025 at 11:27):

I observed that the lean-regex library slowed down by 10-30% for regexes with alternations after upgrading to v4.22.0-rc3. I wonder if the new compiler introduced a regression. I'd appreciate any comments!

According to perf, this SparseSet.insert function seems to spend more time in v4.22.0-rc3. Upon inspecing the generated C code, v4.22.0-rc3 emits more lean_dec in the exclusive branch (one per SparseSet field), which might have caused the regression.

Traces

I collected perf samples for the two versions in Pop!_OS 22.04 (Ubuntu-like distribution). You can see the stack traces below:

How to reproduce

Checkout v4.21.0-benchmark and v4.22.0-rc3-benchmark branches from the lean-regex repository. In the regex directory, execute lake exe Bench -e 'def|have|push|wf|nfa' -n 1000 Regex/Backtracker/Basic.lean to see how long each regex execution takes. In my environment, v4.22.0-rc3 takes roughly 20% more time.

Markus Himmel (Jul 12 2025 at 11:33):

Yes, it's very likely that this is due to the new compiler. We're also experiencing some regressions of a similar order of magnitude in the standard library. Thanks for preparing these benchmarks and for investigating! cc @Cameron Zwarich

Cameron Zwarich (Jul 12 2025 at 18:47):

Thanks for reporting this! From your description that this specifically affects alt patterns and specifically involves ref-counting operations, this seems very likely to be due to a known regression with the new compiler involving the increased use of join points (essentially non-escaping local functions called in tail position from two distinct places).

The new compiler generates more of these, which is generally speaking more optimal, but the ref-counting passes at the back end of the compiler (which did not change) are a bit pessimistic in the face of join points.

Apologies for the temporary regression. I will probably use lean-regex to generate test cases when I fix this issue.

pandaman (Jul 13 2025 at 02:08):

Thanks. I hope the codegen improves in the newer versions!

I tried to isolate the affected component, but it shows only a few percentage of slowdown and has a higher variance. It also depends on a bit long proofs, so it might be only useful as a starting point.

(note: this reproduction is quadratic in -n. I used -n 30000, but you might need to find a good -n that works in your environment)

Reproducer


Last updated: Dec 20 2025 at 21:32 UTC