Zulip Chat Archive

Stream: mathlib4

Topic: performance example with type-class inference

Floris van Doorn (Feb 19 2025 at 15:15):

I have two very similar examples that have very different type-class inference timings. The example depends on Mathlib (revision ebbd72bd24186186016706e3716518fbc7aba085 from earlier this month, but I expect that doesn't matter much), but I think there might be general type-class inference lessons, so I'm posting it here.

import Mathlib.Data.Real.Basic

set_option profiler true
set_option profiler.threshold 10
-- set_option trace.Meta.synthInstance true
-- set_option trace.profiler true

-- 40-80ms (high variance)
example {a b c : ℝ} {n : ℤ} (hc : 0 < c) (h : a ≤ b) : a * c ^ n ≤ b * c ^ n := by
  let _ : MulPosReflectLE ℝ := inferInstance -- typeclass inference of MulPosReflectLE took 15.7ms
  rw [mul_le_mul_right (zpow_pos hc _)]
  exact h

-- 376ms
example {a b c : ℝ} {n : ℤ} (hc : 0 < c) (h : a ≤ b) : a * c ^ n ≤ b * c ^ n := by
  rw [mul_le_mul_right (zpow_pos hc _)] -- typeclass inference of MulPosReflectLE took 266ms
  exact h

I did some investigation: The slow instances seem to be docs#LinearOrderedCommGroupWithZero.toMulPosReflectLE and docs#LinearOrderedCommGroupWithZero.toPosMulReflectLE. The difference between the two examples seems to be that in the first example, the arguments of these instances are tried to be synthesized first, which immediately results in a (cached) failure. In the second example, unification is attempted first, which takes significant time to fail.

Is there a lesson here about improving the type-class inference algorithm? (probably not, because it's so complicated) (but cc @Sebastian Ullrich anyway)

I think there are a few Mathlib lessons here:

The fact that classes like docs#PosMulReflectLE and docs#MulPosReflectLE are abbreviations is insane, and they should be separate classes. It takes Lean >100ms to figure out (after unfolding them) that the lemma LinearOrderedCommGroupWithZero.toPosMulReflectLE doesn't apply to the goal, which is about a different class!
We should probably make these two instances (that only apply for types like NNReal) have low priority compared to lemmas that apply to linearly ordered rings.

Floris van Doorn (Feb 19 2025 at 15:17):

I guess #13124 is related

Floris van Doorn (Feb 19 2025 at 15:18):

@Yuyang Zhao what is the status of that PR?

Floris van Doorn (Feb 19 2025 at 15:41):

I think there are some Lean lessons as well, but I understand that any change to elaboration/type-class inference is very hard to do.

In the second example, type-class inference tries the two mentioned instances, and first tries to unify the conclusion of the instance with the class to be synthesized. The problem is that the instance contains an instance argument (of type LinearOrderedCommGroupWithZero ℝ), and this argument is still a metavariable. This makes unification really hard, and causes many nested type-class inference searches (according to the diagnostics LinearOrderedCommGroupWithZero.toLinearOrderedCommMonoidWithZero is unfolded 10286 times, so I guess roughly that many times).
Ideally, during type-class inference of class A, while trying instance an instance I that has other type-class argument occurring in its conclusion B, we try to synthesize the arguments of I before an expensive unification of A with B (although I guess you have to do some unification to figure out the implicit arguments of I, so this sounds pretty hard).

Floris van Doorn (Feb 19 2025 at 15:43):

Another thing I noticed in the trace is that certain unification problems seem to be tried twice:

                              [] [0.168624] ❌️ 0 < x✝ =?= 0 < x✝ ▼
                                [] [0.084345] ❌️ 0 =?= 0 ▶
                                [synthInstance] [0.000002] ❌️ LinearOrderedCommGroupWithZero ℝ ▶
                                [] [0.084215] ❌️ 0 < x✝ =?= 0 < x✝ ▼
                                  [] [0.084064] ❌️ 0 =?= 0 ▶

where, as far as I can tell, lines 1&4 and lines 2&5 have the same implicit arguments and metavariable numbers. But this might have a genuine reason.

Floris van Doorn (Feb 19 2025 at 16:01):

Attempting a small change in #22086

Floris van Doorn (Feb 19 2025 at 18:14):

#22090 gives (small) performance improvements (not sure if the decrease in total build time is statistically significant).

Floris van Doorn (Feb 19 2025 at 18:16):

The -0.851 % in type class inference time is probably significant. It's a very good speedup, considering I'm only touching a tiny fraction of the classes.

Michael Stoll (Feb 19 2025 at 18:18):

Yes; this is definitely a meaningful gain (and nothing got slower).

Jovan Gerbscheid (Feb 19 2025 at 22:29):

The seemingly duplicate unification problem can be seen to not be duplicate if you hover over the typeclass argument. The first one uses Real.instPreorder.toLT, and this is unfolded into the same unification problem, but directly using Real.instLT.

Jovan Gerbscheid (Feb 19 2025 at 22:37):

But the 0 =?= 0 unification problems are genuinely the same, and this isn't caught by the caching in isDefEq :(

Last updated: May 02 2025 at 03:31 UTC