Zulip Chat Archive
Stream: lean4
Topic: checking for reducible rfl
Kevin Buzzard (Mar 17 2024 at 15:41):
I changed the definition of an instance in mathlib to something which seems to be defeq in #11413 , but a few random files speed up (speedcenter). My understanding is that typeclass inference is super-fast for unifying "reducibly rfl" instances but can explode nowadays even with defeq instances (it starts unfolding and unfolding, and we have some pretty big structures; this is arguably a regression in the typeclass inference system from Lean 3, if my understanding is correct). In trying to understand this phenomenon better, I remembered some Lean code which someone had posted which could check whether two terms of the same typeclass were "obviously equal" for the typeclass inference algorithm but I can't find it now. Can someone show me how to check this?
The un-XY: I find my graduate students complaining to me that things are slow (integer rings are slow, tensor products are slow, Hopf algebras are slow etc etc) and I look at their traces and I see stuff like this:
[tryResolve] [0.284857s] ✅ Coalgebra R ((A ⊗[R] B) ⊗[R] C) ≟ Coalgebra R ((A ⊗[R] B) ⊗[R] C) ▼
[isDefEq] [0.146107s] ✅ Coalgebra R ((A ⊗[R] B) ⊗[R] C) =?= Coalgebra ?m.46814 (?m.46815 ⊗[?m.46814] ?m.46816) ▼
[] [0.011000s] ✅ NonUnitalNonAssocSemiring.toAddCommMonoid =?= TensorProduct.addCommMonoid ▶
[] [0.125267s] ✅ Algebra.toModule =?= TensorProduct.instModuleTensorProductToSemiringAddCommMonoid ▼
[] [0.125243s] ✅ Module.mk _ _ =?= TensorProduct.instModuleTensorProductToSemiringAddCommMonoid ▼
[] [0.125235s] ✅ Module.mk _ _ =?= TensorProduct.leftModule ▼
[] [0.125223s] ✅ Module.mk _ _ =?= Module.mk _ _ ▼
[] [0.123805s] ✅ DistribMulAction.mk _ _ =?= TensorProduct.leftDistribMulAction ▶
[isDefEq] [0.138644s] ✅ ?m.46809 =?= Coalgebra.instTensorProduct ▶
What sin have I committed here, that typeclass inference spends an age breaking up everything into small pieces interminably before saying "oh yeah it was rfl after all"?
Eric Wieser (Mar 17 2024 at 16:32):
by with_reducible rfl is the way to check if two things are reducibly equal
Kevin Buzzard (Mar 18 2024 at 16:58):
So I am still confused. The change Eric made in #11413 is changing an instance from something to something else which is reducibly defeq to the original instance...
import Mathlib.Algebra.Group.InjSurj
variable {M₁ M₂ : Type*} [Mul M₂] [One M₂] [Pow M₂ ℕ]
open Function
-- variants of `Function.Surjective.commMonoid`, pushing the monoid axioms along a surjection
abbrev commMonoid1 [CommMonoid M₁] (f : M₁ → M₂) (hf : Surjective f) (one : f 1 = 1)
    (mul : ∀ x y, f (x * y) = f x * f y) (npow : ∀ (x) (n : ℕ), f (x ^ n) = f x ^ n) :
    CommMonoid M₂ :=
  { hf.commSemigroup f mul, hf.monoid f one mul npow with }

abbrev commMonoid2 [CommMonoid M₁] (f : M₁ → M₂) (hf : Surjective f) (one : f 1 = 1)
    (mul : ∀ x y, f (x * y) = f x * f y) (npow : ∀ (x) (n : ℕ), f (x ^ n) = f x ^ n) :
    CommMonoid M₂ :=
  { hf.monoid f one mul npow, hf.commSemigroup f mul with }
example : @commMonoid1 = @commMonoid2 := by
  with_reducible rfl -- works
...except that in the PR we use def, not abbrev, and then somehow this choice (which makes things less reducibly equal) causes a 10% speedup in some random files?
My question is: why does the change from commMonoid1 to commMonoid2 ever make anything any better? What is the fundamental change we're making, which changes how typeclass inference behaves and explains the speedcenter link in the original post above?
Ruben Van de Velde (Mar 18 2024 at 17:00):
I think maybe it helps if you put them in the same order as the extends:
class CommMonoid (M : Type u) extends Monoid M, CommSemigroup M
Yaël Dillies (Mar 18 2024 at 17:02):
Yes, Ruben is right
Ruben Van de Velde (Mar 18 2024 at 17:04):
Don't ask me why, though
Jireh Loreaux (Mar 18 2024 at 17:07):
The point is that it's doing less unpacking and repacking of all the structures involved (since these structures have overlapping fields and there is only one "true" parent).
Jireh Loreaux (Mar 18 2024 at 17:09):
I'm fairly certain that if the structures in the extends clause were disjoint then it wouldn't make any difference.
Eric Wieser (Mar 18 2024 at 17:38):
The PR uses @[reducible] def, which is basically the same as abbrev.
Matthew Ballard (Mar 18 2024 at 17:50):
I left this on the closed #11368 but probably no one will ever look there (lightly edited for sane reading)
Currently the structure instance elaborator tries to populate all the fields one at a time, eta expanding any failed structure fields, and trying again.
The problem with with was that it could never match onto A directly if you had an A in the sources itself or via some projection of the sources. The reason for this was very silly: A doesn't have a self field and we search through all fields in the structure for matching.
This left the toA unfilled and resulted in the eta expansion in the next pass.
The problem with the eta expansion is that unification defaults to eta expansion when two terms differ by a constructor application. So now where there might have been one quick check you are forced to check all the fields directly. If any of this contains data, then you are much worse off.
So generally we still want our terms to be as eta-reduced as possible. Unification takes them apart and looks at the pieces if they aren't.
So we come to the issue here: Lean 4's preferred parent implementation means that if you can supply the instance of the preferred parent you want to do so to avoid eta expansion in their flavor of inheritance.
Another change that landed with the with fix was that the structure instance elaborator should now always fill the field with the first occurrence in the sources reading left to right. If the preferred parent comes second, then you are in trouble because it's very likely Lean will not use it whole and the end product will be eta expanded much more than necessary.
Here our preferred parent is Monoid. It should go first in the list of supplied sources. When it doesn't, Lean sees it has a mul from the CommSemigroup instance. This blocks elaboration from using the Monoid source since mul occurs in Monoid also. So it uses that mul to build the CommMonoid instead of using Monoid. The result is that data, the multiplication, is exposed to unification each time this function is used to construct a CommMonoid.
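A toy sketch of this (made-up classes, mirroring class CommMonoid extends Monoid M, CommSemigroup M):
class A₀ (α : Type) where
  x : α

-- `A` will be `C`'s preferred parent; `B` shares the data field `x` with it.
class A (α : Type) extends A₀ α where
  hA : x = x

class B (α : Type) extends A₀ α where
  hB : x = x

class C (α : Type) extends A α, B α

def mkA {α} (x : α) : A α := { x := x, hA := rfl }
def mkB {α} (x : α) : B α := { x := x, hB := rfl }

-- Preferred parent first: the elaborator can use `mkA x` whole for `toA`.
def good {α} (x : α) : C α := { mkA x, mkB x with }

-- Preferred parent second: the `x` inside `mkB x` is found first and blocks
-- reuse of `mkA x`, so `toA` gets rebuilt field by field (eta expanded).
def bad {α} (x : α) : C α := { mkB x, mkA x with }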
Kevin Buzzard (Mar 18 2024 at 21:41):
So let me get this straight. Typeclass inference can explode when trying to unify two terms which are reducibly defeq? And preferred parents in mathlib's algebra hierarchy are now actually a thing that people have agreed on as opposed to a theory?
Yaël Dillies (Mar 18 2024 at 21:42):
There are two things at play here: typeclass inference exploding and spread notation exploding. The latter causes the former.
Yaël Dillies (Mar 18 2024 at 21:43):
Preferred parents are a thing for non-flat structures. We haven't really given much thought to which parents should be preferred in mathlib
Mario Carneiro (Mar 18 2024 at 22:03):
we've given thought, not sure we've put that thought into action
Matthew Ballard (Mar 19 2024 at 10:35):
Kevin Buzzard said:
So let me get this straight. Typeclass inference can explode when trying to unify two terms which are reducibly defeq? And preferred parents in mathlib's algebra hierarchy are now actually a thing that people have agreed on as opposed to a theory?
This happens independently of typeclass search. (TC synthesis just calls unification a lot.) To my understanding, the whole point of transparency settings is to address exactly this problem. We don't want to completely unfold everything and normalize to check defeq because the terms are terribly large.
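For instance, a minimal sketch of the transparency distinction:
def two : Nat := 2          -- a plain `def` is not reducible
abbrev two' : Nat := 2      -- an `abbrev` is marked `@[reducible]`

example : two = 2 := by rfl                  -- default transparency unfolds `two`
example : two' = 2 := by with_reducible rfl  -- reducible transparency unfolds `two'`
-- but `example : two = 2 := by with_reducible rfl` fails: at reducible
-- transparency Lean refuses to unfold `two`, which is what keeps the check cheap.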
I think there has been a fair bit of thought about the choice of preferred parents in mathlib but I think we have more experience since those design decisions were made.
I know people have proposed more radical redesigns of the hierarchies motivated by this experience.
Amelia Livingston (Mar 19 2024 at 14:07):
Would this reasoning also apply here?
instance addCommMonoid : AddCommMonoid (M ⊗[R] N) :=
  { TensorProduct.addCommSemigroup _ _,
    TensorProduct.addZeroClass _ _ with
    toAddSemigroup := TensorProduct.addSemigroup _ _
    toZero := (TensorProduct.addZeroClass _ _).toZero
    nsmul := fun n v => n • v
    nsmul_zero := by simp [TensorProduct.zero_smul]
    nsmul_succ := by simp only [TensorProduct.one_smul, TensorProduct.add_smul, add_comm,
      forall_const] }
Or does the fact that toAddSemigroup and toZero are provided explicitly mean the order of the arguments before with doesn't matter?
Yaël Dillies (Mar 19 2024 at 14:08):
Providing to arguments explicitly indeed stops them from being filled in piecewise by a with clause.
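For example, with toy structures (made up for illustration):
structure P where
  x : Nat

structure Q extends P where
  y : Nat

-- `toP` is given explicitly, so the source `p` is not consulted for it:
example (p : P) : Q := { p with toP := ⟨5⟩, y := 0 }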
Matthew Ballard (Mar 19 2024 at 15:07):
Mental model for filling order:
1. anything given explicitly with :=
2. find it in the sources before with, going left to right
3. try to synthesize it
I think that default values go between 2 and 3 but would need to check the code to be sure
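For example, with a toy structure:
structure Point where
  x : Nat
  y : Nat := 37   -- default value

def p : Point := { x := 1, y := 2 }

-- `x` comes from the explicit `:=` (step 1); `y` is found in the source `p`
-- (step 2) before the default value 37 is considered:
example : ({ p with x := 5 } : Point) = { x := 5, y := 2 } := rfl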
Jireh Loreaux (Mar 19 2024 at 15:13):
where exactly does the spread stuff fit in this model?
Yaël Dillies (Mar 19 2024 at 15:14):
It is the same as a with clause
Matthew Ballard (Mar 19 2024 at 15:14):
Having never actually looked at the spread notation, I would wildly speculate that __ := are sources and the named ones are :=
Jireh Loreaux (Mar 19 2024 at 15:16):
Yaël Dillies said:
It is the same as a with clause
I would swear it's not exactly the same, because I've had things break by switching between the two. But it's always possible I made some other mistake.
Matthew Ballard (Mar 19 2024 at 15:18):
While the ordering of the sources here is basically irrelevant since only proof fields will be used, note that the toZero := ... will make the terms more expanded than, say, building an AddMonoid and then layering on the add_comm proof. Does this matter? Probably not that much here, as long as Lean is fine with all the AddMonoid instances it is seeing being "clearly" the same. Just don't inline the data if it can be helped.
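A hedged sketch of that shape, with illustrative names (not the actual Mathlib declarations):
import Mathlib.Algebra.Group.Defs

variable {M : Type}

-- Name the data-carrying instance...
def myAddMonoid [inst : AddMonoid M] : AddMonoid M := inst

-- ...then layer only the proof on top, so unification can compare occurrences
-- of `myAddMonoid` by name instead of unfolding its data:
def myAddCommMonoid [AddMonoid M] (h : ∀ a b : M, a + b = b + a) :
    AddCommMonoid M :=
  { myAddMonoid with add_comm := h }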
Yaël Dillies (Mar 19 2024 at 15:20):
Jireh Loreaux said:
I would swear it's not exactly the same
The one difference I can think of is that with clauses always come last while spread clauses might be able to come before other clauses (I'm not quite sure)
Matthew Ballard (Mar 19 2024 at 15:22):
E.g., here you don't want Lean to try to compute n • v every time it is unifying two AddCommMonoid instances on the tensor product. Since this is wrapped in a call to a notation typeclass, my experience makes me think it won't if it can just unify the instance names without unfolding the SMul. I could be quite wrong here though.
Amelia Livingston (Mar 19 2024 at 17:23):
Matthew Ballard said:
While the ordering of the sources here is basically irrelevant since only proof fields will be used, note that the toZero := ... will make the terms more expanded than, say, building an AddMonoid and then layering on the add_comm proof. Does this matter? Probably not that much here, as long as Lean is fine with all the AddMonoid instances it is seeing being "clearly" the same. Just don't inline the data if it can be helped.
This does significantly speed up Mathlib.Algebra.Category.AlgebraCat.Monoidal, so is #11505 worth existing?
Matthew Ballard (Mar 19 2024 at 17:27):
For a .25% overall improvement, I think so!
Kevin Buzzard (Mar 19 2024 at 19:13):
Oh nice find!
Amelia Livingston (Mar 19 2024 at 20:23):
I thought TensorProduct.addMonoid might speed things up because I'd seen a trace where stuff like TensorProduct.addCommMonoid =?= AddCommGroup.toAddCommMonoid was slowish and the preferred parent of AddCommGroup is AddGroup
Amelia Livingston (Mar 19 2024 at 20:25):
On the other hand, here in
@[reducible]
def Module.compHom [Semiring S] (f : S →+* R) : Module S M :=
  { MulActionWithZero.compHom M f.toMonoidWithZeroHom, DistribMulAction.compHom M (f : S →* R) with
    add_smul := fun r s x => show f (r + s) • x = f r • x + f s • x by simp [add_smul] }
the arguments before the with are also in the non-preferred order; similarly in DistribMulAction.compHom. I tried switching them in both declarations but bench said there were no significant changes. Is that expected? For the record, I want to speed up Mathlib.Algebra.Category.ModuleCat.ChangeOfRings, where Module.compHom is used heavily, but I'll try looking at more traces in that file.
Yaël Dillies (Mar 19 2024 at 20:28):
It highly depends on what kind of things you are checking defeq with. If they are also of the form Module.compHom, there shouldn't be much speedup.
Amelia Livingston (Mar 19 2024 at 20:32):
I see, that makes sense
Kyle Miller (Mar 19 2024 at 21:09):
Jireh Loreaux said:
Yaël Dillies said:
It is the same as a with clause
I would swear it's not exactly the same, because I've had things break by switching between the two. But it's always possible I made some other mistake.
The way spread notation works is that every __ := is collected up and placed in a with clause in order. If there was already a with clause, they're inserted after the existing sources. (See Mathlib.Tactic.Spread for the macro.)
There's currently a hack I put into place to make it so every value from a __ := can be used as a local instance. This has a very uncertain future, and it was just to help smooth over some changes during a recent Lean version bump.
Minus this local instance feature of __ :=, it should be exactly the same as using a with clause.
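Schematically, with toy structures (made up for illustration):
import Mathlib.Tactic.Spread

structure P where
  x : Nat

structure Q extends P where
  y : Nat

-- The spread macro rewrites the first form into the second:
example (p : P) : Q := { __ := p, y := 0 }
example (p : P) : Q := { p with y := 0 }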
Kevin Buzzard (Mar 20 2024 at 19:51):
I want to speed up Mathlib.Algebra.Category.ModuleCat.ChangeOfRings, where Module.compHom is used heavily, but I'll try looking at more traces in that file
I'm looking at some slow declarations now and I'm not sure that it's typeclass inference any more; there are at least two non-terminal dsimps which are taking forever. For example, changing the dsimp in ModuleCat.ExtendRestrictScalarsAdj.HomEquiv.evalAt (line 600 or so) to dsimp only changes the number of heartbeats from 87801 (2.6 seconds) to 13266 (0.3 seconds) on my machine.
Kevin Buzzard (Mar 20 2024 at 19:54):
Similarly, ModuleCat.restrictScalarsIsEquivalenceOfRingEquiv is not slow because of typeclass inference; it's slow because an autoparam, aesop_cat, is taking over 2 seconds to solve a goal.
Amelia Livingston (Mar 20 2024 at 19:58):
Yeah, sorry, I should've edited this post because as soon as I started trying to make an MWE of my slow use-case (extension of scalars as a lax monoidal functor) by moving it into its own file on current mathlib, it went from taking 40 seconds to compile to about 4 :upside_down: so I am no longer bothered about the ChangeOfRings file :upside_down:
Matthew Ballard (Mar 20 2024 at 20:02):
Yes, inserting quick clean-up dsimp calls can make a big difference, and the autoparams for CategoryTheory structures can be awfully expensive.
Kevin Buzzard (Mar 20 2024 at 20:03):
The current definition of restrictScalarsIsEquivalenceOfRingEquiv is the same as this:
instance restrictScalarsIsEquivalenceOfRingEquiv {R S} [Ring R] [Ring S] (e : R ≃+* S) :
    IsEquivalence (ModuleCat.restrictScalars e.toRingHom) where
  inverse := ModuleCat.restrictScalars e.symm
  unitIso := NatIso.ofComponents (fun M ↦ LinearEquiv.toModuleIso'
    { __ := AddEquiv.refl M
      map_smul' := fun s m ↦ congr_arg (· • m) (e.right_inv s).symm }) (by aesop_cat)
  counitIso := NatIso.ofComponents (fun M ↦ LinearEquiv.toModuleIso'
    { __ := AddEquiv.refl M
      map_smul' := fun r m ↦ congr_arg (· • (_ : M)) (e.left_inv r) }) (by aesop_cat)
  functor_unitIso_comp := (by aesop_cat)
where I've added in the three by aesop_cats which are happening by autoparams. It's 107K heartbeats; if you sorry them all instead it's 20K heartbeats.
Matthew Ballard (Mar 20 2024 at 20:07):
I once replaced something like this with rfl and it was glorious. Probably doesn't apply here.
Kevin Buzzard (Mar 20 2024 at 20:20):
they're all intros; rfl in fact:
count_heartbeats in
instance restrictScalarsIsEquivalenceOfRingEquiv {R S} [Ring R] [Ring S] (e : R ≃+* S) :
    IsEquivalence (ModuleCat.restrictScalars e.toRingHom) where
  inverse := ModuleCat.restrictScalars e.symm
  unitIso := NatIso.ofComponents (fun M ↦ LinearEquiv.toModuleIso'
    { __ := AddEquiv.refl M
      map_smul' := fun s m ↦ congr_arg (· • m) (e.right_inv s).symm }) (by intros; rfl)
  counitIso := NatIso.ofComponents (fun M ↦ LinearEquiv.toModuleIso'
    { __ := AddEquiv.refl M
      map_smul' := fun r m ↦ congr_arg (· • (_ : M)) (e.left_inv r) }) (by intros; rfl)
  functor_unitIso_comp := (by intros; rfl)
is 21K down from 107K.
Matthew Ballard (Mar 20 2024 at 20:23):
Glorious
Matthew Ballard (Mar 20 2024 at 20:25):
I think we need an issue for aesop performance vs fun _ => rfl if this is the second incident.
Matthew Ballard (Mar 20 2024 at 20:30):
We could also adjust the auto-param to by try (intros; rfl); aesop_cat
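A sketch of what that could look like (the macro name is made up; this uses first so the fallback only runs when the cheap tactic fails, and assumes aesop_cat is in scope from Mathlib's category theory library):
import Mathlib.CategoryTheory.Category.Basic

-- Try the cheap `intros; rfl` first; fall back to `aesop_cat` only on failure.
macro "cheap_aesop_cat" : tactic =>
  `(tactic| first | (intros; rfl) | aesop_cat)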
Kevin Buzzard (Mar 20 2024 at 20:58):
I'm making a speed-up PR. Here's another reason this file is slow: we keep having to do this:
[Meta.synthInstance] [0.061262s] ✅ HSMul S ↑((ModuleCat.restrictScalars f).obj Y) ↑((ModuleCat.restrictScalars f).obj Y) ▶
[Meta.synthInstance] [0.038096s] ✅ HSMul S ↑((ModuleCat.restrictScalars f).obj Y) ↑((ModuleCat.restrictScalars f).obj Y) ▶
[Meta.synthInstance] [0.037499s] ✅ HSMul S ↑((ModuleCat.restrictScalars f).obj Y) ↑((ModuleCat.restrictScalars f).obj Y) ▶
We just did one computation three times and it took 0.12 seconds; we do this computation a lot of times in the file.
Kevin Buzzard (Mar 26 2024 at 12:11):
I'm sorry I'm so slow to understand all of this.
I'm currently looking at a trace which contains this (on a fast machine):
[Meta.synthInstance] [0.220873s] ✅ OfNat (↥(𝓞 K) ⧸ Ideal.span {IsPrimitiveRoot.toInteger hζ - 1}) 0
So that says "for something which is obviously a ring (being a quotient of a ring by an ideal), Lean takes over 0.2 seconds to check that it has a zero". So there's presumably something wrong (and a bunch of professional mathematicians trying to do something completely mathematically innocuous at a conference in Luminy are struggling as a result). Digging a little further I get
[] [0.141052s] ✅ Module ↥(𝓞 K) ↥(𝓞 K) =?= Module ↥(𝓞 K) ↥(𝓞 K)
so most of the time is spent checking that two module structures are defeq. The module structures aren't syntactically equal; one is
@Module (↥(𝓞 K)) (↥(𝓞 K)) Ring.toSemiring AddCommGroup.toAddCommMonoid : Type u_1
and the other is
@Module (↥(𝓞 K)) (↥(𝓞 K)) (Subalgebra.toSemiring (𝓞 K)) NonUnitalNonAssocSemiring.toAddCommMonoid : Type u_1
The next line of the trace is
[] [0.134164s] ✅ AddCommGroup.toAddCommMonoid =?= NonUnitalNonAssocSemiring.toAddCommMonoid
and then typeclass inference goes haywire,
[] ✅ apply @DenselyNormedField.toNontriviallyNormedField to NontriviallyNormedField ↥(𝓞 K)
etc etc (over 500 lines of nonsense scanning many irrelevant instances).
So I'm guessing the problem is that typeclass inference is refusing to believe that the two module structures are defeq. However
example : @Module (↥(𝓞 K)) (↥(𝓞 K)) Ring.toSemiring AddCommGroup.toAddCommMonoid =
    @Module (↥(𝓞 K)) (↥(𝓞 K)) (Subalgebra.toSemiring (𝓞 K)) NonUnitalNonAssocSemiring.toAddCommMonoid := by
  with_reducible rfl
works fine.
So is with_reducible rfl just a red herring?
Kevin Buzzard (Mar 26 2024 at 12:25):
Here's another one:
-- [] [0.132849s] ✅ 1 =?= 1
-- [] [0.132785s] ✅ One.toOfNat1 =?= One.toOfNat1
example :
    (@One.toOfNat1 (↥(𝓞 K) ⧸ Ideal.span {λ}) (Ideal.Quotient.one (Ideal.span {λ})) : OfNat (↥(𝓞 K) ⧸ Ideal.span {λ}) 1) =
    (@One.toOfNat1 (↥(𝓞 K) ⧸ Ideal.span {λ}) AddMonoidWithOne.toOne : OfNat (↥(𝓞 K) ⧸ Ideal.span {λ}) 1) := by
  with_reducible rfl -- works fine
This is all coming from the far-too-slow-for-my-liking https://github.com/riccardobrasca/flt3
Kevin Buzzard (Mar 26 2024 at 12:36):
After more experiments, I think this all boils down to the issue (which I can't currently find) that typeclass inference checks the newest instances first. For example
[] new goal Neg (↥(𝓞 K) ⧸ Ideal.span {IsPrimitiveRoot.toInteger hζ - 1}) ▼
[instances] #[@InvolutiveNeg.toNeg, @SubNegMonoid.toNeg, @NegZeroClass.toNeg, @AddGroupWithOne.toNeg, @Ring.toNeg, @LinearOrderedAddCommGroupWithTop.toNeg]
This Neg typeclass is just a simple thing coming from ring theory, but Lean happened to last import some exotic thing from order theory which applies, so Lean now just goes down some rabbit hole looking at linearly ordered stuff first, before finding its way to the obvious instance :-(
Matthew Ballard (Mar 26 2024 at 12:41):
Reducible rfl is no guarantee of performance, though making things more opaque is a valid way to handle performance issues.
Matthew Ballard (Mar 26 2024 at 12:42):
“Lean don’t look there”
Matthew Ballard (Mar 26 2024 at 12:45):
Performance is about how much data Lean has to check is the same. Ideally that would be none. But of course we can’t guarantee that.
A hygienic rule is to make sure all data is separated from other data in a term so Lean can at least ask itself if it needs to unfold that or is fine without it.
Matthew Ballard (Mar 26 2024 at 12:50):
I think here you have death by 1000 cuts where there are many instances to check against. In each one Lean has to check defeq of some data (like Neg), which takes a small but nontrivial amount of time. Then it adds up.
I would ask: is Neg exposed wherever it is declared?
Kevin Buzzard (Mar 26 2024 at 13:04):
I'm just increasing the priority of all the instances relevant to the project in order to work around this right now.
Matthew Ballard (Mar 26 2024 at 13:06):
Among the things that were right under my nose but that I only just realized: projections from class parents are NOT reducible. This is why we have with_reducible_and_instances. So breaking out parent classes is more opaque than parent structures, which have reducible projections.
Matthew Ballard (Mar 26 2024 at 13:12):
Why not demote the bad ones to make everyone's life better? I don't think most users are making a LinearOrderedAddCommGroupWithTop because they need that Neg.
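A toy sketch of how the demotion works (made-up class and instances):
class Foo (α : Type) where
  val : α

-- Demote the exotic instance so it is tried last...
instance (priority := 100) exotic : Foo Nat := ⟨37⟩

-- ...while the everyday instance keeps the default priority (1000) and wins:
instance everyday : Foo Nat := ⟨0⟩

example : (Foo.val : Nat) = 0 := rfl
For an instance that already exists, the priority can be retuned after the fact with attribute [instance 100] <name>, if I remember the syntax right.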
Kevin Buzzard (Mar 26 2024 at 13:14):
yes demoting the bad instances is a much better choice!
Matthew Ballard (Mar 26 2024 at 13:57):
I made a mathlib issue, #11692, so you can use this tag for each change. It should be linked to the existing Lean issue. If things change, we can quickly find all the changes by grepping.
Last updated: May 02 2025 at 03:31 UTC