Zulip Chat Archive
Stream: lean4
Topic: What is the elaborator doing in this example
Chris Hughes (Oct 04 2023 at 13:49):
In the following code snippet the elaborator spends three seconds, even though it looks like I gave it a fully elaborated term. What is it spending its time doing? Interestingly, if I use List.eraseDups
(and instBEqNat
) instead of List.dedup
then elaboration is instant. The kernel typechecks the declaration in an instant as well.
import Mathlib.Data.List.Basic
set_option trace.Kernel true
set_option profiler true
example : @Eq.{1} (@List.{0} @Nat) (@List.dedup.{0} @Nat @Nat.decEq
(@List.replicate.{0} @Nat (Nat.zero.succ.succ.succ.succ.succ.succ.succ.succ.succ) @Nat.zero))
(@List.cons.{0} @Nat @Nat.zero (@List.nil.{0} @Nat)) :=
@Eq.refl.{1} (@List.{0} @Nat) (@List.cons.{0} @Nat @Nat.zero (@List.nil.{0} @Nat))
Henrik Böving (Oct 04 2023 at 14:00):
You can use the profiler option and in addition my Flame tool to profile it and hopefully figure out what's up
Mauricio Collares (Oct 04 2023 at 14:20):
Yet another example where by rfl
is way faster than rfl
.
Mauricio Collares (Oct 04 2023 at 14:24):
It's also an example of the kind initially discovered by Kevin in https://leanprover.zulipchat.com/#narrow/stream/270676-lean4/topic/rfl.20repeating.20itself, where the times from the trace profiler are roughly of the form X, X/2, X/4, X/8, ..., suggesting something is being checked/elaborated twice in a way that leads to exponential blowup.
Mauricio Collares (Oct 04 2023 at 14:26):
Does it get better with lean4#2612?
Mauricio Collares (Oct 04 2023 at 14:37):
Here it got 2.16x faster with the leanprover/lean4-pr-releases:pr-release-2612
toolchain compared to leanprover/lean4:v4.2.0-rc1
, but I haven't tested the parent commit to be sure the speedup comes just from the mentioned PR. Also don't know what the standard deviation is for these measurements, it would be good to have a tool to make "proper" local benchmarking simple.
Chris Hughes (Oct 04 2023 at 15:20):
Henrik Böving said:
You can use the profiler option and in addition my Flame tool to profile it and hopefully figure out what's up
How do I use your Flame tool?
Patrick Massot (Oct 04 2023 at 15:36):
Chris, there is some info at https://leanprover.zulipchat.com/#narrow/stream/270676-lean4/topic/Introducing.20FlameTC, but I'm not sure this is sufficient to get started.
Kevin Buzzard (Oct 04 2023 at 15:41):
I followed those instructions and they worked flawlessly. The fact that I didn't understand the output was the issue for me, but I think that for the obscure slowdown I was debugging nobody understood the output :-)
Patrick Massot (Oct 04 2023 at 15:51):
Yes, understanding the output is the tricky part.
Patrick Massot (Oct 04 2023 at 15:53):
There was some discussion where Sebastian and Henrik explained this to me but I can' t find it now.
Chris Hughes (Oct 04 2023 at 16:07):
out.txt
I've attached out.txt
, from Flame, but it's hard to understand.
Kevin Buzzard (Oct 04 2023 at 16:13):
wait, this output is only 1 byte?
Chris Hughes (Oct 04 2023 at 16:17):
Sorry. It's now fixed
Kevin Buzzard (Oct 04 2023 at 16:31):
Henrik Böving (Oct 04 2023 at 18:34):
Kevin Buzzard said:
Hm, there is a lot of info missing from the profile due to insufficient logging from the compiler I guess but it does seem to think very hard about this casesOn =?= 0 unification problem and also twice but in swapped form? That seems a bit unnecessary if it is truly the same term
Chris Hughes (Oct 05 2023 at 09:21):
Elaboration time seems to be exponential in the number of succ
s. Approximately 3 times slower for each extra succ
.
Sebastian Ullrich (Oct 05 2023 at 09:23):
With lean4#2612 as well?
Chris Hughes (Oct 05 2023 at 09:25):
I don't know how to test that. How do I make mathlib depend on that Lean commit?
Eric Wieser (Oct 05 2023 at 09:29):
Mauricio Collares said:
Here it got 2.16x faster with the
leanprover/lean4-pr-releases:pr-release-2612
toolchain compared toleanprover/lean4:v4.2.0-rc1
, but I haven't tested the parent commit to be sure the speedup comes just from the mentioned PR. Also don't know what the standard deviation is for these measurements, it would be good to have a tool to make "proper" local benchmarking simple.
Perhaps @Mauricio Collares already pushed a branch that does that?
Sebastian Ullrich (Oct 05 2023 at 09:30):
There's an automatic one, lean-pr-testing-2612
Chris Hughes (Oct 05 2023 at 09:30):
The answer to my question is type leanprover/lean4-pr-releases:pr-release-2612
in the lean-toolchain
file for anyone interested.
Chris Hughes (Oct 05 2023 at 09:53):
Sebastian Ullrich said:
With lean4#2612 as well?
Yes. It's about 2.5x faster. But still about 3 times slower for each extra succ
.
Mauricio Collares (Oct 05 2023 at 10:44):
@Sebastian Ullrich This can be pasted after the test in lean4#2612 for a minimised example:
def pwFilter {α} (R : α → α → Prop) [DecidableRel R] (l : List α) : List α :=
l.foldr (fun x IH => if ∀ y, y ∈ IH → R x y then x :: IH else IH) []
def dedup {α} [DecidableEq α] : List α → List α :=
pwFilter (· ≠ ·)
-- times out
example : dedup (List.replicate 11 0) = [0] := rfl
-- times out
example : @Eq.{1} (@List.{0} @Nat) (@dedup @Nat @Nat.decEq
(@List.replicate.{0} @Nat (Nat.zero.succ.succ.succ.succ.succ.succ.succ.succ.succ) @Nat.zero))
(@List.cons.{0} @Nat @Nat.zero (@List.nil.{0} @Nat)) := rfl
Sebastian Ullrich (Oct 05 2023 at 10:48):
Thanks!
Sebastian Ullrich (Oct 05 2023 at 10:58):
Ah, you already mentioned that by rfl
is fast, so it is not WHNF but the first-order approximation of defeq that's likely at fault here. I think the first question we should ask here is whether we simply want to ensure that term rfl
benefits from all the optimizations tactic rfl
has.
-- for demonstration purposes only, might not be quite correct yet regarding hygiene
macro_rules
| `(rfl) => `(by rfl) -- not as recursive as it looks
Ruben Van de Velde (Oct 05 2023 at 11:04):
Does dsimp
apply by rfl
theorems?
Sebastian Ullrich (Oct 05 2023 at 11:15):
It should, the resulting Expr
is the same
Mauricio Collares (Oct 05 2023 at 11:30):
Huh, actually the minimised example I posted is really fast with lean4#2612 (sorry for not checking earlier!), even with rfl
.
Mauricio Collares (Oct 05 2023 at 11:41):
That's super weird. With decidableBall
implemented in terms of decidableBex
, as in the test, everything is fast using the toolchain from lean4#2612. But if I instead use
instance decidableBall {α} (p : α → Prop) [DecidablePred p] :
∀ l : List α, Decidable (∀ x, x ∈ l → p x)
| [] => isTrue sorry
| x :: xs =>
match ‹DecidablePred p› x with
| isTrue h₁ =>
match decidableBall p xs with
| isTrue h₂ => isTrue sorry
| isFalse h₂ => isFalse sorry
| isFalse h₁ => isFalse sorry
as the decidableBall implementation, then
def pwFilter {α} (R : α → α → Prop) [DecidableRel R] (l : List α) : List α :=
l.foldr (fun x IH => if ∀ y, y ∈ IH → R x y then x :: IH else IH) []
def dedup {α} [DecidableEq α] : List α → List α :=
pwFilter (· ≠ ·)
example : dedup (List.replicate 11 0) = [0] := rfl
times out again with (deterministic) timeout at 'whnf', maximum number of heartbeats (200000) has been reached
, even using lean4#2612. This implementation of decidableBall
is closer to what is used in Std/mathlib.
Mauricio Collares (Oct 05 2023 at 12:18):
Filed as lean4#2627 just in case.
Mauricio Collares (Oct 05 2023 at 13:20):
Hmm, the following simplification also times out pre lean4#2612, but works with by rfl
:
def dedup {α} [DecidableEq α] : List α → List α :=
List.foldr (fun x IH => if IH.all (· ≠ x) then x :: IH else IH) []
example : dedup (List.replicate 11 0) = [0] := rfl
After lean4#2612 it runs fast.
Sebastian Ullrich (Oct 05 2023 at 13:26):
If it's fast with 2612 but wasn't fast before minimization, I think you've simplified too far :)
Mauricio Collares (Oct 05 2023 at 13:31):
Yes, I tend to do that :/ Which is why I filed lean4#2627 as soon as I had something mathlib-free which wasn't fixed by 2612. I then couldn't resist continuing, of course.
Mauricio Collares (Oct 05 2023 at 14:33):
flame.svg for the example in lean4#2627 with lean4#2559 and lean4#2612
Sebastian Ullrich (Oct 05 2023 at 14:48):
I think I have a clue of what's going on
Chris Hughes (Oct 06 2023 at 15:18):
I think the purpose of this question for me was just to understand why the kernel can be super fast to type check a declaration, but IsDefEq
is extremely slow? What is so different about the elaborator and the kernel. @Kevin Buzzard showed me a more complicated example of a rfl
proof which type checked in the kernel in like 0.02s but took a long time in the elaborator.
Sebastian Ullrich (Oct 06 2023 at 16:45):
There isn't any fundamental reason; more complex code is simply more complex to implement caching for
Last updated: Dec 20 2023 at 11:08 UTC