Zulip Chat Archive

Stream: mathlib4

Topic: very slow instance synthesis

David Loeffler (Apr 06 2024 at 15:25):

The following takes something like five whole seconds to run:

import Mathlib

noncomputable example : FiniteDimensional ℝ (ℂ × ℂ →L[ℂ] ℂ) := by
  infer_instance

Any thoughts on what is making this so slow?

Eric Wieser (Apr 06 2024 at 15:43):

For me, just over half of the time spent is in the statement

Eric Wieser (Apr 06 2024 at 15:45):

It takes most of the actual TC time working on ContinuousLinearMap.module =?= Module.complexToReal (ℂ × ℂ →L[ℂ] ℂ)

David Loeffler (Apr 06 2024 at 16:18):

It also seems to become slower if you import too much of Mathlib. With a minimal set of imports, namely

import Mathlib.Analysis.Complex.Basic
import Mathlib.Topology.Algebra.Module.FiniteDimension

it takes about 1.9sec for me; but with import Mathlib, it is more than twice that.

Running with all of mathlib imported, the output of #minimize_imports has a highly suspect entry, import Mathlib.RingTheory.HopfAlgebra. It looks like whatever rabbit hole instance-synthesis goes down involves visiting the instance CommSemiring.toHopfAlgebra ("every commutative semiring is a Hopf algebra over itself") somewhere along the way. But disabling this instance doesn't make the example significantly faster.

Kevin Buzzard (Apr 06 2024 at 23:11):

On my fast machine, with all of mathlib imported. I get 0.61 seconds for FiniteDimensional ℝ (ℂ × ℂ →L[ℂ] ℂ) to be interpreted as a Prop (i.e. checking the statement makes sense), and then 0.50 seconds for infer_instance to work (the proof). Let's switch on set_option trace.profiler true and set_option trace.Meta.synthInstance true to see what's going on.

Most of the pain in the FiniteDimensional issue is this. The question is this:

      [] new goal Module ℝ (ℂ × ℂ →L[ℂ] ℂ) ▼

Lean can see actions of ℝ on ℂ and of ℂ on ℂ, and to make the algebra work it needs to check that they commute.

[] new goal SMulCommClass ℂ ℝ ℂ ▼
          [instances] #[@IsScalarTower.to_smulCommClass, @IsScalarTower.to_smulCommClass', @ModuleCat.sMulCommClass_mk, @Algebra.to_smulCommClass, smulCommClass_self, @Complex.instSMulCommClassComplexInstSMulRealComplexInstSMulRealComplex]

Unfortunately, Lean is going to work through that list backwards (the right answer is @IsScalarTower.to_smulCommClass', the second one). The philosophy is that it doesn't really matter what order we work through things because hopefully stuff that fails is going to fail quickly. However this is not the case here:

[] [0.400800s] ❌ apply @Complex.instSMulCommClassComplexInstSMulRealComplexInstSMulRealComplex to SMulCommClass ℂ ℝ ℂ

This instance says "if R and S act on the reals and the actions commute, then they can be viewed as acting on the complexes and the actions also commute (because the complexes is $\R^2$ )". So now we are led to this:

[synthInstance] [0.398565s] ❌ SMul ℂ ℝ

The trace for this failing instance synthesis is huge: nearly 2000 applications of typeclass instances to go on a wild goose chase. If you turn off instance tracing then it looks like this (although I'm a bit confused about why the numbers don't add up):

[synthInstance] [0.339037s] ❌ SMul ℂ ℝ ▼
                                                                    [] [0.030130s] ❌ apply @DirectSum.GradeZero.semiring to Semiring (Submodule ℂ ℂ) ▶
                                                                    [] [0.016200s] ✅ apply @Submodule.idemSemiring to IdemSemiring (Submodule ℂ ℂ) ▶
                                                                    [] [0.015283s] ✅ apply @Submodule.instIdemCommSemiringSubmoduleToSemiringToAddCommMonoidToNonUnitalNonAssocSemiringToNonAssocSemiringToSemiringToModule to IdemCommSemiring
                                                                          (Submodule ℂ ℂ) ▶
                                                                    [] [0.028142s] ❌ apply @DirectSum.GradeZero.commSemiring to CommSemiring (Submodule ℂ ℂ) ▶
                                                                    [] [0.022209s] ❌ apply @DirectSum.GradeZero.semiring to Semiring (Submodule ℝ ℝ) ▶
                                                                    [] [0.020841s] ❌ apply @DirectSum.GradeZero.commSemiring to CommSemiring (Submodule ℝ ℝ) ▶

but I do know that typeclass inference is going nuts behind the scenes.

So the tl;dr summary for the slowness in the statement is:

set_option trace.profiler true in -- [0.426561s] on a fast machine
#synth SMul ℂ ℝ -- fails

and a dirty hack which speeds things up (by convincing Lean not to try to prove this at all) is

attribute [instance 1] Complex.instSMulCommClassComplexInstSMulRealComplexInstSMulRealComplex
  Complex.instIsScalarTowerComplexInstSMulRealComplexInstSMulRealComplex

which ensures that Lean doesn't go down that route.

Eric Wieser (Apr 06 2024 at 23:23):

(docs#Complex.instSMulCommClassComplexInstSMulRealComplexInstSMulRealComplex)

Eric Wieser (Apr 06 2024 at 23:27):

This looks like the case I was afraid of; the GradeZero instances for direct sums are bad, and we should probably scope them

Eric Wieser (Apr 06 2024 at 23:27):

(They're supposed to match elements A 0 of some family, but this doesn't make sense because every type could potentially be the constant family )

Eric Wieser (Apr 06 2024 at 23:29):

What seems to happen is that A remains a metavariable forever, and so Lean fruitlessly unfolds everything it can reach to try and assign it

Kevin Buzzard (Apr 06 2024 at 23:30):

As for the proof, it seems to be a related problem. This time it's

set_option trace.profiler true in -- [0.503354s]
#synth SMulCommClass ℂ ℂ ℂ

which boils down to

  [Meta.synthInstance] [0.455496s] ✅ SMul ℂ ℂ ▼
    [] new goal SMul ℂ ℂ ▼
      [instances] #[Mul.toSMul, @MulAction.toSMul, @SMulZeroClass.toSMul, @Algebra.toSMul, @GradedMonoid.GradeZero.smul, @Complex.instSMulRealComplex]

and again the first choice is a disaster: Complex.instSMulRealComplex says "if a ring A acts on the reals then it acts on the complexes" so Lean is back looking for SMul ℂ ℝ again, and again the trace is 2000 lines long. A not particularly unreasonable hack to speed this up would be to make a shortcut instance:

instance : SMulCommClass ℂ ℂ ℂ := smulCommClass_self ℂ ℂ

So this is now over twice as fast, but still perhaps annoyingly slow:

import Mathlib

attribute [instance 1] Complex.instSMulCommClassComplexInstSMulRealComplexInstSMulRealComplex
  Complex.instIsScalarTowerComplexInstSMulRealComplexInstSMulRealComplex

instance : SMulCommClass ℂ ℂ ℂ := smulCommClass_self ℂ ℂ

set_option trace.profiler true -- now only [0.502854s]
noncomputable example : FiniteDimensional ℝ (ℂ × ℂ →L[ℂ] ℂ) := by
  infer_instance

Eric Wieser (Apr 06 2024 at 23:37):

Eric Wieser said:

This looks like the case I was afraid of; the GradeZero instances for direct sums are bad, and we should probably scope them

Though actually

attribute [-instance] DirectSum.GradeZero.semiring DirectSum.GradeZero.commSemiring DirectSum.GradeZero.ring DirectSum.GradeZero.commRing

doesn't help much; it just removes all the information from the trace (which is now 2 lines long, but only 25 faster)

Sébastien Gouëzel (Apr 07 2024 at 05:32):

When you see things like this, it's a good idea to open a PR lowering a little bit the priority of the bad instances, or increasing a little bit the priority of the good instances, and bench it to see if it has nasty effects elsewhere. I've done it in #11980 for the case at hand.

Sébastien Gouëzel (Apr 07 2024 at 08:20):

This looks useful (three files get a good speedup, no bad consequence), so I'm setting this as awaiting-review.

David Loeffler (Apr 07 2024 at 09:13):

That's wonderful, thanks Sebastien!

Last updated: May 02 2025 at 03:31 UTC