Zulip Chat Archive
Stream: lean4
Topic: Instance synth cache does not use reducible defeqs
Anne Baanen (Jan 22 2026 at 13:44):
Consider a setup like the following:
```lean
class A (α : Type) where
  x : α -- To make sure we have nontrivial equalities

class B (α : Type) extends A α

class C (α : Type) [A α] where
  prop : False

class D (α : Type) [A α] extends C α

instance (α : Type) [B α] [C α] : D α where
```
In other words, B is stronger than A and D is stronger than C, except C is equivalent to D when B holds. In Mathlib we have for example A = Monoid, B = Group, C = IsOrderedMonoid and D = IsOrderedCancelMonoid.
Now, take a type α for which B α holds but on which C does not hold. In the Mathlib example, α = Fin n. Searching for an instance of C α checks if there is a D α, and since we have B α, it loops back to C α. Now tabled instance synthesis correctly realizes that C α is already being searched for, and does not loop forever. This is all expected so far:
```lean
section
-- Defining a single instance
instance : B (Fin 1) where
  x := 0

set_option trace.Meta.synthInstance true
#synth C (Fin 1)
/-
[Meta.synthInstance] ❌️ C (Fin 1) ▼
  [] new goal C (Fin 1) ▶
  [] ✅️ apply @D.toC to C (Fin 1) ▶
  [] ✅️ apply instDOfC to D (Fin 1) ▶
  [] result <not-available>
-/
end
```
But now we have a complication: the instance of B α is built in two steps. First we show A α and then B α. Suddenly the search loops once:
```lean
section
-- Defining two separate instances
instance : A (Fin 2) where
  x := 0

instance : B (Fin 2) where
  x := 0

-- Make sure the two instances are equal, modulo instance transparency.
example : B.toA = instAFinOfNatNat := by
  fail_if_success with_reducible rfl
  with_reducible_and_instances rfl

set_option trace.Meta.synthInstance true
#synth C (Fin 2)
/-
[Meta.synthInstance] ❌️ C (Fin 2) ▼
  [] new goal C (Fin 2) ▶
  [] ✅️ apply @D.toC to C (Fin 2) ▶
  [] ✅️ apply instDOfC to D (Fin 2) ▶
  [] ✅️ apply @D.toC to C (Fin 2) ▶
  [] ✅️ apply instDOfC to D (Fin 2) ▶
  [] result <not-available>
-/
end
```
It turns out the first goal C (Fin 2) is more precisely @C (Fin 2) instAFinOfNatNat : Prop, while the second goal is @C (Fin 2) instBFinOfNatNat_1.toA : Prop. These are apparently considered different by the tabled algorithm, which explains why we loop once.
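One way to observe the differing instance arguments yourself (a small sketch: pp.explicit makes the trace print the goals with their implicit arguments):

```lean
-- Printing the goals with explicit arguments shows the two distinct
-- instance terms that make the table keys differ.
set_option pp.explicit true in
set_option trace.Meta.synthInstance true in
#synth C (Fin 2)
```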
The same happens if we are careful to make our instances reducibly defeq:
```lean
section
-- Defining the instance and a shortcut (as an abbrev)
instance : B (Fin 3) where
  x := 0

@[instance] abbrev instAFinThree : A (Fin 3) := inferInstance

-- Now they are really reducibly defeq!
example : B.toA = instAFinThree := by
  with_reducible rfl

set_option trace.Meta.synthInstance true
#synth C (Fin 3)
/-
[Meta.synthInstance] ❌️ C (Fin 3) ▼
  [] new goal C (Fin 3) ▶
  [] ✅️ apply @D.toC to C (Fin 3) ▶
  [] ✅️ apply instDOfC to D (Fin 3) ▶
  [] ✅️ apply @D.toC to C (Fin 3) ▶
  [] ✅️ apply instDOfC to D (Fin 3) ▶
  [] result <not-available>
-/
end
```
Oops, we still loop!
This turns out to cause actual performance regressions in Mathlib, for example in #32828, where grind and lia on a goal involving Fin take 50% longer: the PR adds a new instance that makes Lean traverse the AddCancelMonoid hierarchy three times instead of twice(!).
Anne Baanen (Jan 22 2026 at 13:46):
Two questions:
- Is it intentional that tabled typeclass resolution does not look at definitional equalities to prevent loops?
- Does this behaviour mean "shortcut" instances (like @[instance] abbrev instAFinThree : A (Fin 3) := inferInstance in the example) are actually an anti-pattern?
Jovan Gerbscheid (Jan 22 2026 at 14:18):
The table in tabled type class resolution is a Std.HashMap, which means that looking up in it cannot do any unification. It would be a lot less efficient to use pairwise isDefEq as a lookup mechanism.
I could imagine a mechanism where the parts of the key that are instances are removed, so that we can match C (Fin 3) with C (Fin 3) even when the argument instances are not syntactically the same; then an isDefEq check would be needed to verify that the instances are indeed defeq. But I don't know how badly this would affect performance with the extra isDefEq.
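For illustration, such key erasure might look roughly like the following meta code (a hypothetical sketch: eraseInstArgs and the True.intro sentinel are invented here; the actual tabling code in core uses its own key representation):

```lean
import Lean
open Lean Meta

/-- Hypothetical sketch of the suggestion above: build a table key from a
goal by replacing every instance-implicit argument with a fixed sentinel,
so that `@C (Fin 3) i₁` and `@C (Fin 3) i₂` hash to the same key. A lookup
hit would then still need an `isDefEq` check on the erased arguments. -/
def eraseInstArgs (goal : Expr) : MetaM Expr := do
  let fn := goal.getAppFn
  let args := goal.getAppArgs
  -- `getFunInfo` tells us which arguments are instance-implicit
  let info ← getFunInfo fn
  let mut args' := args
  for i in [0:args.size] do
    if h : i < info.paramInfo.size then
      if info.paramInfo[i].binderInfo.isInstImplicit then
        -- sentinel standing in for "any instance here"
        args' := args'.set! i (mkConst ``True.intro)
  return mkAppN fn args'
```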
Sébastien Gouëzel (Jan 22 2026 at 14:30):
Wouldn't it make sense to not even do the isDefEq check, since the standing assumption is that instances are unique -- and if in the end it builds an instance which is not correct, then error out?
Jovan Gerbscheid (Jan 22 2026 at 14:58):
It just so happens, for other reasons, that Meta.check is always run on the output of type class search, so with that approach it would be easy to catch type-incorrect terms.
Junyan Xu (Jan 22 2026 at 22:43):
I am also seeing some isDefEq checks on #31603 that make things slower compared to master:
image.png
and adding a shortcut instance doesn't help.
On master, ✅️ Algebra K[X] (RatFunc K) takes 49392 rather than 266803.
Jovan Gerbscheid (Jan 22 2026 at 22:52):
@Junyan Xu, I think that's not directly related to the problem at hand (and it is to be expected that type class search and/or unification can get slower if you make things reducible).
Jovan Gerbscheid (Jan 22 2026 at 22:53):
I remember a long time ago noticing that Lean's type class search can loop forever even though it has this loop protection. Here is an example that I came up with now:
```lean
class A (α : Type) : Prop where
class A' (α : Type) : Prop where
class B (α : Type) [A α] : Prop where
class B' (α : Type) [A' α] : Prop where

variable {α : Type}

instance [A α] : A' α where
instance [A' α] : A α where
instance [A' α] [B α] : B' α where
instance [A α] [B' α] : B α where

variable [A α]

#synth B α -- (deterministic) timeout
```
Kevin Buzzard (Jan 22 2026 at 23:04):
Junyan Xu said:
On master ✅️ Algebra K[X] (RatFunc K) takes 49392 rather than 266803.
(I get [231930.000000] #synth Algebra K[X] (RatFunc K) on master but we should take this elsewhere)
Junyan Xu (Jan 22 2026 at 23:16):
266803-231930 is the slight speedup from adding priority := high to all these shortcut instances.
Junyan Xu (Jan 22 2026 at 23:18):
I'm just echoing Sébastien's question whether we can delay these defeq checks until the kernel checks the fully elaborated term; they seem to be the main reason for the slowdown.
Anne Baanen (Jan 28 2026 at 11:32):
I tried out Jovan's suggestion of erasing instances from the keys, with mixed results: https://radar.lean-lang.org/repos/mathlib4-nightly-testing/commits/5927bc4a620e2fb2a216c0cc39b0798ce7b1b587?reference=580d15da0f736f4db6eb7d7b6266f02e524d99f7&s=^build/module.*//instructions
We get about 0.1% slowdown over all of Mathlib, with some files noticeably slower and other files noticeably faster. Typeclass inference overall slows down by 1.1%, but on the other hand, grind typeclass inference speeds up by 3.6%. I'll try and think about different ways to modify the keys that may be more performant. Otherwise the fix would be redesigning some parts of the hierarchy in Mathlib.
Last updated: Feb 28 2026 at 14:05 UTC