Zulip Chat Archive

Stream: lean4

Topic: Suboptimal codegen when comparing against a constant

pandaman (Aug 05 2025 at 13:23):

I noticed that replacing an equality comparison against a constant variant in a hot path with a specialized, inline function results in a noticeable speedup, leading to a few percent higher throughput across regex benchmarks.

It appears that the derived decEqNode function wasn't inlined, which may have prevented constant folding and other optimizations. Having specialization for constant arguments might also help in such cases.

Here's a small example demonstrating the difference. I hope feedback like this is welcome. Thanks!

inductive Node where
  | done
  | epsilon (next : Nat)
  | split (next₁ : Nat) (next₂ : Nat)
deriving DecidableEq

@[inline, always_inline]
def Node.isDone (n : Node) : Bool :=
  match n with
  | .done => true
  | _ => false

set_option trace.Compiler.result true

def f (n : Node) : Nat :=
  -- This comparison was not inlined
  if n = .done then
    42
  else
    1

def f' (n : Node) : Nat :=
  -- This version seems faster in my actual example
  if n.isDone then
    42
  else
    1

Robin Arnez (Aug 05 2025 at 13:39):

Yeah this slightly unfortunate but there is another option which works just as well: n matches .done which is shorthand for what you do in Node.isDone.

Robin Arnez (Aug 05 2025 at 13:41):

Ideally this would be compiled efficiently regardless of what you do though

Markus Himmel (Aug 08 2025 at 07:21):

Thank you for reporting this. I have created lean4#9795. @pandaman, as described in the issue, I wasn't able to reproduce a slowdown from this in compiled code using your code example. If you happen to know how to make the difference visible in compiled code, feel free to add a comment to the issue.

pandaman (Aug 08 2025 at 11:07):

Hi thank you for looking into this! Yeah, this example is for illustration purpose and I didn't check if this one reproduces the slowness. I was able to modify the benchmark to reproduce the slowdown:

https://github.com/leanprover/lean4/issues/9795#issuecomment-3167497020

It seems that a variant with Char and testing with non-matching cases (which is closer to my original benchmarking) are crucial.

Markus Himmel (Aug 08 2025 at 11:09):

Great, thank you! I'm sure that this will help tracking down the issue.

Markus Himmel (Aug 08 2025 at 11:19):

Oh, wow, looking at the generated IR, it's reusing the storage from the Node for the IO result type, which falsifies the numbers. However, after fixing this, f' is still measurably slower. I have updated the issue with a version of the benchmarking code which I hope produces meaningful numbers.

Last updated: Feb 28 2026 at 14:05 UTC