Zulip Chat Archive

Stream: mathlib4

Topic: performance when importing Mathlib


Floris van Doorn (Dec 06 2024 at 10:41):

Consider this example, either with a minimal import, or when importing all of Mathlib:

-- import Mathlib.Algebra.Polynomial.Eval.Coeff
import Mathlib

open Polynomial
set_option profiler true
set_option diagnostics true

example : (X ^ 3 + (2 : )  X - 1 : Polynomial ).eval 2 = 11 := by
  simp

The performance of this depends massively on the imports (by a factor >100):

-- importing Mathlib
/-
simp took 6.81s

[reduction] unfolded declarations (max: 14466, num: 23): ▶
[reduction] unfolded instances (max: 145800, num: 31): ▶
[reduction] unfolded reducible declarations (max: 9000, num: 11): ▶
[type_class] used instances (max: 49, num: 2): ▶
[type_class] max synth pending failures (maxSynthPendingDepth: 1), use `set_option maxSynthPendingDepth <limit>` ▶
[def_eq] heuristic for solving `f a =?= f b` (max: 1807, num: 10): ▶
use `set_option diagnostics.threshold <num>` to control threshold for reporting counters
-/

-- importing Mathlib.Algebra.Polynomial.Eval.Coeff
/-
simp took 53.5ms

[reduction] unfolded declarations (max: 30, num: 3): ▶
[reduction] unfolded instances (max: 264, num: 15): ▶
[reduction] unfolded reducible declarations (max: 106, num: 2): ▶
[type_class] used instances (max: 49, num: 2): ▶
[def_eq] heuristic for solving `f a =?= f b` (max: 35, num: 1): ▶
use `set_option diagnostics.threshold <num>` to control threshold for reporting counters
-/

Floris van Doorn (Dec 06 2024 at 10:42):

I expect there are some simp-lemmas/instances that are problematic that we might want to get rid off. Anyone wants to try to figure out the problematic one(s)?

Floris van Doorn (Dec 06 2024 at 10:43):

(after squeezing the simp when importing all of Mathlib, it is quick: 34.5ms)

Henrik Böving (Dec 06 2024 at 10:57):

It's spending a lot of time solving this defeq instance:

instHSMul =?= instHSMul

where the left side is:

@instHSMul  [X] SMulZeroClass.toSMul : HSMul  [X] [X]

and the right side is:

@instHSMul  [X] instZSMul : HSMul  [X] [X]

Daniel Weber (Dec 06 2024 at 11:15):

The output of trace.Meta.Tactic.simp is almost identical except for

[Meta.Tactic.simp.unify] Algebra.id.smul_eq_mul:1000, failed to unify
      ?x  ?y
    with
      2  X
[Meta.Tactic.simp.unify] smul_X:1000, failed to unify
      ?m  X
    with
      2  X

so either the problem is there or that's not the problem

Daniel Weber (Dec 06 2024 at 11:19):

trace.Meta.synthInstance shows it fails to synthesize MulSemiringAction ℤ ℤ (docs#MulSemiringAction) many times (although that's cached)

Daniel Weber (Dec 06 2024 at 11:22):

Squeezing and adding smul_X it still takes a lot of time, so that must be part of the problem

Henrik Böving (Dec 06 2024 at 11:23):

You can run set_option trace.profiler true to see that almost all of the time is spent in the defeq problem I reportd above.

Eric Wieser (Dec 06 2024 at 11:23):

Floris, are your comments backwards above?

Eric Wieser (Dec 06 2024 at 11:24):

Or does importing more things really make it faster, not slower?

Daniel Weber (Dec 06 2024 at 11:27):

Daniel Weber said:

Squeezing and adding smul_X it still takes a lot of time, so that must be part of the problem

import Mathlib

open Polynomial

example : (X ^ 3 + (2 : )  X - 1 : Polynomial ).eval 2 = 11 := by
  rw [smul_X]

takes approximately the same time to fail, and

import Mathlib

open Polynomial
set_option trace.profiler true

example : (2 : )  X = (X : [X]) := by
  apply smul_X

causes a timeout

Johan Commelin (Dec 06 2024 at 13:14):

Eric Wieser said:

Floris, are your comments backwards above?

They seem fine to me.

Daniel Weber (Dec 07 2024 at 06:20):

If I understand correct there is an unresolved @MulSemiringAction.toDistribMulAction ℤ ℤ (?m.11062 : Monoid ℤ) Int.instSemiring (?m.11065 : MulSemiringAction ℤ ℤ) in the rhs of the defeq, and after failing to synthesize MulSemiringAction ℤ ℤ it attempts to fill it by unification.

Daniel Weber (Dec 07 2024 at 06:32):

Henrik Böving said:

It's spending a lot of time solving this defeq instance:

instHSMul =?= instHSMul

where the left side is:

@instHSMul  [X] SMulZeroClass.toSMul : HSMul  [X] [X]

and the right side is:

@instHSMul  [X] instZSMul : HSMul  [X] [X]

I think there is also exponential blow-up due to the heuristic

3- If t and s can be unfolded and they have the same head symbol, then
a) First try to solve unification by unifying arguments.
b) If it fails, unfold both and continue.
Implemented by unfoldBothDefEq

in defeq checking, with e.g. @SMul.smul ℤ ℤ[X] instZSMul =?= @SMul.smul ℤ ℤ[X] SMulZeroClass.toSMul, as first it tries to compare the arguments (instZSMul =?= SMulZeroClass.toSMul), which after unfolding stuff means comparing { smul := foo } =?= { smul := bar }, so it compares foo =?= bar, and fails.
Then it tries to first unfold SMul.smul and continue, so it gets instZSMul.1 =?= SMulZeroClass.toSMul.1, which eventually unfolds to foo =?= bar, and it tries to solve that again. I'm not sure why the cache doesn't trigger here, though

Daniel Weber (Dec 07 2024 at 06:38):

Daniel Weber said:

If I understand correct there is an unresolved @MulSemiringAction.toDistribMulAction ℤ ℤ (?m.11062 : Monoid ℤ) Int.instSemiring (?m.11065 : MulSemiringAction ℤ ℤ) in the rhs of the defeq, and after failing to synthesize MulSemiringAction ℤ ℤ it attempts to fill it by unification.

Perhaps the metavariable for the Monoid ℤ (which I think gets assigned eventually), causes the cache to not trigger here?

Kevin Buzzard (Dec 07 2024 at 23:41):

The offending simp lemma is docs#Polynomial.smul_X (edit: as I now notice was pointed out above by Daniel, although not emphasized). As we know, this is slow:

set_option trace.Meta.Tactic.simp true in
set_option trace.profiler true in
example : (X ^ 3 + (2 : )  X - 1 : Polynomial ).eval 2 = 11 := by
  simp

and the trace says

  [Meta.isDefEq] [5.211678] ❌️ ?m  X =?= 2  X 
  [Meta.Tactic.simp.unify] smul_X:1000, failed to unify
      ?m  X
    with
      2  X

I have even less understanding than others above about why we're getting that ❌️; the typeclass problem quickly turns into @instHSMul ℤ ℤ[X] SMulZeroClass.toSMul =?= @instHSMul ℤ ℤ[X] instZSMul and, as has been already pointed out, the proof is rfl. Is it a reducibility issue? But now we see in the trace why Lean is even trying to solve this, and indeed

import Mathlib

open Polynomial

attribute [-simp] Polynomial.smul_X

example : (X ^ 3 + (2 : )  X - 1 : Polynomial ).eval 2 = 11 := by
  simp

is fast again.

Eric Wieser (Dec 08 2024 at 03:33):

Kevin Buzzard said:

as has been already pointed out, the proof is rfl.

This wasn't my reading at all; the LHS has a metavariable that is mathematically (or at the very least, canonically) false, so the unification is impossible

Daniel Weber (Dec 08 2024 at 05:21):

Daniel Weber said:

I think there is also exponential blow-up due to the heuristic

3- If t and s can be unfolded and they have the same head symbol, then
a) First try to solve unification by unifying arguments.
b) If it fails, unfold both and continue.
Implemented by unfoldBothDefEq

in defeq checking, with e.g. @SMul.smul ℤ ℤ[X] instZSMul =?= @SMul.smul ℤ ℤ[X] SMulZeroClass.toSMul, as first it tries to compare the arguments (instZSMul =?= SMulZeroClass.toSMul), which after unfolding stuff means comparing { smul := foo } =?= { smul := bar }, so it compares foo =?= bar, and fails.
Then it tries to first unfold SMul.smul and continue, so it gets instZSMul.1 =?= SMulZeroClass.toSMul.1, which eventually unfolds to foo =?= bar, and it tries to solve that again. I'm not sure why the cache doesn't trigger here, though

In lean4#6335 I attempted to tweak the heuristic to avoid this. How can this be benchmarked with Mathlib? I attempted to branch off nightly-with-mathlib, but I don't think it worked correctly

Kim Morrison (Dec 08 2024 at 08:59):

@Daniel Weber, to get a benchmark, you would need to tweak Mathlib so it actually compiles under your change. You can do this by pushing changes to lean-pr-testing-6335 (if it is even possible, I didn't look at all at the change in lean#6335).

Daniel Weber (Dec 08 2024 at 12:06):

How do I update my local toolchain after pushing changes to the Lean PR?

Kim Morrison (Dec 09 2024 at 13:02):

@Daniel Weber:

elan toolchain uninstall $(<lean-toolchain)
lake build

Last updated: May 02 2025 at 03:31 UTC