Zulip Chat Archive

Stream: lean4

Topic: What is the elaborator doing in this example

Chris Hughes (Oct 04 2023 at 13:49):

In the following code snippet the elaborator spends three seconds, even though it looks like I gave it a fully elaborated term. What is it spending its time doing? Interestingly, if I use List.eraseDups (and instBEqNat) instead of List.dedup then elaboration is instant. The kernel typechecks the declaration in an instant as well.

import Mathlib.Data.List.Basic

set_option trace.Kernel true
set_option profiler true

example : @Eq.{1} (@List.{0} @Nat) (@List.dedup.{0} @Nat @Nat.decEq
    (@List.replicate.{0} @Nat (Nat.zero.succ.succ.succ.succ.succ.succ.succ.succ.succ) @Nat.zero))
    (@List.cons.{0} @Nat @Nat.zero (@List.nil.{0} @Nat)) :=
  @Eq.refl.{1} (@List.{0} @Nat) (@List.cons.{0} @Nat @Nat.zero (@List.nil.{0} @Nat))

Henrik Böving (Oct 04 2023 at 14:00):

You can use the profiler option and in addition my Flame tool to profile it and hopefully figure out what's up

Mauricio Collares (Oct 04 2023 at 14:20):

Yet another example where by rfl is way faster than rfl.

Mauricio Collares (Oct 04 2023 at 14:24):

It's also an example of the kind initially discovered by Kevin in https://leanprover.zulipchat.com/#narrow/stream/270676-lean4/topic/rfl.20repeating.20itself, where the times from the trace profiler are roughly of the form X, X/2, X/4, X/8, ..., suggesting something is being checked/elaborated twice in a way that leads to exponential blowup.

Mauricio Collares (Oct 04 2023 at 14:26):

Does it get better with lean4#2612?

Mauricio Collares (Oct 04 2023 at 14:37):

Here it got 2.16x faster with the leanprover/lean4-pr-releases:pr-release-2612 toolchain compared to leanprover/lean4:v4.2.0-rc1, but I haven't tested the parent commit to be sure the speedup comes just from the mentioned PR. Also don't know what the standard deviation is for these measurements, it would be good to have a tool to make "proper" local benchmarking simple.

Chris Hughes (Oct 04 2023 at 15:20):

Henrik Böving said:

You can use the profiler option and in addition my Flame tool to profile it and hopefully figure out what's up

How do I use your Flame tool?

Patrick Massot (Oct 04 2023 at 15:36):

Chris, there is some info at https://leanprover.zulipchat.com/#narrow/stream/270676-lean4/topic/Introducing.20FlameTC, but I'm not sure this is sufficient to get started.

Kevin Buzzard (Oct 04 2023 at 15:41):

I followed those instructions and they worked flawlessly. The fact that I didn't understand the output was the issue for me, but I think that for the obscure slowdown I was debugging nobody understood the output :-)

Patrick Massot (Oct 04 2023 at 15:51):

Yes, understanding the output is the tricky part.

Patrick Massot (Oct 04 2023 at 15:53):

There was some discussion where Sebastian and Henrik explained this to me but I can' t find it now.

Chris Hughes (Oct 04 2023 at 16:07):

out.txt
I've attached out.txt, from Flame, but it's hard to understand.

Kevin Buzzard (Oct 04 2023 at 16:13):

wait, this output is only 1 byte?

Chris Hughes (Oct 04 2023 at 16:17):

Sorry. It's now fixed

Kevin Buzzard (Oct 04 2023 at 16:31):

out.svg

Henrik Böving (Oct 04 2023 at 18:34):

Kevin Buzzard said:

out.svg

Hm, there is a lot of info missing from the profile due to insufficient logging from the compiler I guess but it does seem to think very hard about this casesOn =?= 0 unification problem and also twice but in swapped form? That seems a bit unnecessary if it is truly the same term

Chris Hughes (Oct 05 2023 at 09:21):

Elaboration time seems to be exponential in the number of succs. Approximately 3 times slower for each extra succ.

Sebastian Ullrich (Oct 05 2023 at 09:23):

With lean4#2612 as well?

Chris Hughes (Oct 05 2023 at 09:25):

I don't know how to test that. How do I make mathlib depend on that Lean commit?

Eric Wieser (Oct 05 2023 at 09:29):

Mauricio Collares said:

Here it got 2.16x faster with the leanprover/lean4-pr-releases:pr-release-2612 toolchain compared to leanprover/lean4:v4.2.0-rc1, but I haven't tested the parent commit to be sure the speedup comes just from the mentioned PR. Also don't know what the standard deviation is for these measurements, it would be good to have a tool to make "proper" local benchmarking simple.

Perhaps @Mauricio Collares already pushed a branch that does that?

Sebastian Ullrich (Oct 05 2023 at 09:30):

There's an automatic one, lean-pr-testing-2612

Chris Hughes (Oct 05 2023 at 09:30):

The answer to my question is type leanprover/lean4-pr-releases:pr-release-2612 in the lean-toolchain file for anyone interested.

Chris Hughes (Oct 05 2023 at 09:53):

Sebastian Ullrich said:

With lean4#2612 as well?

Yes. It's about 2.5x faster. But still about 3 times slower for each extra succ.

Mauricio Collares (Oct 05 2023 at 10:44):

@Sebastian Ullrich This can be pasted after the test in lean4#2612 for a minimised example:

def pwFilter {α} (R : α → α → Prop) [DecidableRel R] (l : List α) : List α :=
  l.foldr (fun x IH => if ∀ y, y ∈ IH → R x y then x :: IH else IH) []

def dedup {α} [DecidableEq α] : List α → List α :=
  pwFilter (· ≠ ·)

-- times out
example : dedup (List.replicate 11 0) = [0] := rfl

-- times out
example : @Eq.{1} (@List.{0} @Nat) (@dedup @Nat @Nat.decEq
    (@List.replicate.{0} @Nat (Nat.zero.succ.succ.succ.succ.succ.succ.succ.succ.succ) @Nat.zero))
    (@List.cons.{0} @Nat @Nat.zero (@List.nil.{0} @Nat)) := rfl

Sebastian Ullrich (Oct 05 2023 at 10:48):

Thanks!

Sebastian Ullrich (Oct 05 2023 at 10:58):

Ah, you already mentioned that by rfl is fast, so it is not WHNF but the first-order approximation of defeq that's likely at fault here. I think the first question we should ask here is whether we simply want to ensure that term rfl benefits from all the optimizations tactic rfl has.

-- for demonstration purposes only, might not be quite correct yet regarding hygiene
macro_rules
  | `(rfl) => `(by rfl)  -- not as recursive as it looks

Ruben Van de Velde (Oct 05 2023 at 11:04):

Does dsimp apply by rfl theorems?

Sebastian Ullrich (Oct 05 2023 at 11:15):

It should, the resulting Expr is the same

Mauricio Collares (Oct 05 2023 at 11:30):

Huh, actually the minimised example I posted is really fast with lean4#2612 (sorry for not checking earlier!), even with rfl.

Mauricio Collares (Oct 05 2023 at 11:41):

That's super weird. With decidableBall implemented in terms of decidableBex, as in the test, everything is fast using the toolchain from lean4#2612. But if I instead use

instance decidableBall {α} (p : α → Prop) [DecidablePred p] :
    ∀ l : List α, Decidable (∀ x, x ∈ l → p x)
  | [] => isTrue sorry
  | x :: xs =>
    match ‹DecidablePred p› x with
    | isTrue h₁ =>
      match decidableBall p xs with
      | isTrue h₂ => isTrue sorry
      | isFalse h₂ => isFalse sorry
    | isFalse h₁ => isFalse sorry

as the decidableBall implementation, then

def pwFilter {α} (R : α → α → Prop) [DecidableRel R] (l : List α) : List α :=
  l.foldr (fun x IH => if ∀ y, y ∈ IH → R x y then x :: IH else IH) []

def dedup {α} [DecidableEq α] : List α → List α :=
  pwFilter (· ≠ ·)

example : dedup (List.replicate 11 0) = [0] := rfl

times out again with (deterministic) timeout at 'whnf', maximum number of heartbeats (200000) has been reached, even using lean4#2612. This implementation of decidableBall is closer to what is used in Std/mathlib.

Mauricio Collares (Oct 05 2023 at 12:18):

Filed as lean4#2627 just in case.

Mauricio Collares (Oct 05 2023 at 13:20):

Hmm, the following simplification also times out pre lean4#2612, but works with by rfl:

def dedup {α} [DecidableEq α] : List α → List α :=
List.foldr (fun x IH => if IH.all (· ≠ x) then x :: IH else IH) []

example : dedup (List.replicate 11 0) = [0] := rfl

After lean4#2612 it runs fast.

Sebastian Ullrich (Oct 05 2023 at 13:26):

If it's fast with 2612 but wasn't fast before minimization, I think you've simplified too far :)

Mauricio Collares (Oct 05 2023 at 13:31):

Yes, I tend to do that :/ Which is why I filed lean4#2627 as soon as I had something mathlib-free which wasn't fixed by 2612. I then couldn't resist continuing, of course.

Mauricio Collares (Oct 05 2023 at 14:33):

flame.svg for the example in lean4#2627 with lean4#2559 and lean4#2612

Sebastian Ullrich (Oct 05 2023 at 14:48):

I think I have a clue of what's going on

Chris Hughes (Oct 06 2023 at 15:18):

I think the purpose of this question for me was just to understand why the kernel can be super fast to type check a declaration, but IsDefEq is extremely slow? What is so different about the elaborator and the kernel. @Kevin Buzzard showed me a more complicated example of a rfl proof which type checked in the kernel in like 0.02s but took a long time in the elaborator.

Sebastian Ullrich (Oct 06 2023 at 16:45):

There isn't any fundamental reason; more complex code is simply more complex to implement caching for

Last updated: May 02 2025 at 03:31 UTC