Zulip Chat Archive
Stream: lean4
Topic: checking for reducible rfl
Kevin Buzzard (Mar 17 2024 at 15:41):
I changed the definition of an instance in mathlib to something which seems to be defeq in #11413 , but a few random files speed up (speedcenter). My understanding is that typeclass inference is super-fast for unifying "reducibly rfl" instances but can explode nowadays even with defeq instances (it starts unfolding and unfolding, and we have some pretty big structures; this is arguably a regression in the typeclass inference system from Lean 3, if my understanding is correct). In trying to understand this phenomenon better, I remembered some Lean code which someone had posted which could check whether two terms of the same typeclass were "obviously equal" for the typeclass inference algorithm but I can't find it now. Can someone show me how to check this?
The un-XY: I find my graduate students complaining to me that things are slow (integer rings are slow, tensor products are slow, Hopf algebras are slow etc etc) and I look at their traces and I see stuff like this:
[tryResolve] [0.284857s] ✅ Coalgebra R ((A ⊗[R] B) ⊗[R] C) ≟ Coalgebra R ((A ⊗[R] B) ⊗[R] C) ▼
[isDefEq] [0.146107s] ✅ Coalgebra R ((A ⊗[R] B) ⊗[R] C) =?= Coalgebra ?m.46814 (?m.46815 ⊗[?m.46814] ?m.46816) ▼
[] [0.011000s] ✅ NonUnitalNonAssocSemiring.toAddCommMonoid =?= TensorProduct.addCommMonoid ▶
[] [0.125267s] ✅ Algebra.toModule =?= TensorProduct.instModuleTensorProductToSemiringAddCommMonoid ▼
[] [0.125243s] ✅ Module.mk _ _ =?= TensorProduct.instModuleTensorProductToSemiringAddCommMonoid ▼
[] [0.125235s] ✅ Module.mk _ _ =?= TensorProduct.leftModule ▼
[] [0.125223s] ✅ Module.mk _ _ =?= Module.mk _ _ ▼
[] [0.123805s] ✅ DistribMulAction.mk _ _ =?= TensorProduct.leftDistribMulAction ▶
[isDefEq] [0.138644s] ✅ ?m.46809 =?= Coalgebra.instTensorProduct ▶
What sin have I committed here, that typeclass inference spends an age breaking up everything into small pieces interminably before saying "oh yeah it was rfl after all"?
Eric Wieser (Mar 17 2024 at 16:32):
by with_reducible rfl is the way to check if two things are reducibly equal
Kevin Buzzard (Mar 18 2024 at 16:58):
So I am still confused. The change Eric made in #11413 is changing an instance from something to something else which is reducibly defeq to the original instance...
import Mathlib.Algebra.Group.InjSurj
variable {M₁ M₂ : Type*} [Mul M₂] [One M₂] [Pow M₂ ℕ]
open Function
-- variants of `Function.Surjective.commMonoid`, pushing the monoid axioms along a surjection
abbrev commMonoid1 [CommMonoid M₁] (f : M₁ → M₂) (hf : Surjective f) (one : f 1 = 1)
    (mul : ∀ x y, f (x * y) = f x * f y) (npow : ∀ (x) (n : ℕ), f (x ^ n) = f x ^ n) :
    CommMonoid M₂ :=
  { hf.commSemigroup f mul, hf.monoid f one mul npow with }

abbrev commMonoid2 [CommMonoid M₁] (f : M₁ → M₂) (hf : Surjective f) (one : f 1 = 1)
    (mul : ∀ x y, f (x * y) = f x * f y) (npow : ∀ (x) (n : ℕ), f (x ^ n) = f x ^ n) :
    CommMonoid M₂ :=
  { hf.monoid f one mul npow, hf.commSemigroup f mul with }
example : @commMonoid1 = @commMonoid2 := by
  with_reducible rfl -- works
...except that in the PR we use def, not abbrev, and then somehow this choice (which makes things less reducibly equal) causes a 10% speedup in some random files?
My question is: why does the change from commMonoid1 to commMonoid2 ever make anything any better? What is the fundamental change we're making, which changes how typeclass inference behaves and explains the speedcenter link in the original post above?
Ruben Van de Velde (Mar 18 2024 at 17:00):
I think maybe it helps if you put them in the same order as the extends:
class CommMonoid (M : Type u) extends Monoid M, CommSemigroup M
Yaël Dillies (Mar 18 2024 at 17:02):
Yes, Ruben is right
Ruben Van de Velde (Mar 18 2024 at 17:04):
Don't ask me why, though
Jireh Loreaux (Mar 18 2024 at 17:07):
The point is that it's doing less unpacking and repacking of all the structures involved (since these structures have overlapping fields and there is only one "true" parent).
Jireh Loreaux (Mar 18 2024 at 17:09):
I'm fairly certain that if the structures in the extends clause were disjoint then it wouldn't make any difference.
Eric Wieser (Mar 18 2024 at 17:38):
The PR uses @[reducible] def, which is basically the same as abbrev.
Matthew Ballard (Mar 18 2024 at 17:50):
I left this on the closed #11368 but probably no one will ever look there (lightly edited for sane reading)
Currently the structure instance elaborator tries to populate all the fields one at a time, eta expanding any failed structure fields, and trying again.
The problem with with was that it could never match onto A directly if you had an A in the sources itself or via some projection of the sources. The reason for this was very silly: A doesn't have a self field and we search through all fields in the structure for matching.
This left the toA unfilled and resulted in the eta expansion in the next pass.
The problem with the eta expansion is that unification defaults to eta expansion when two terms differ by a constructor application. So now where there might have been one quick check you are forced to check all the fields directly. If any of this contains data, then you are much worse off.
So generally we still want our terms to be as eta-reduced as possible. Unification takes them apart and looks at the pieces if they aren't.
So we come to the issue here: Lean 4's preferred parent implementation means that if you can supply the instance of the preferred parent you want to do so to avoid eta expansion in their flavor of inheritance.
Another change that landed with the with fix was that the structure instance elaborator should now always fill the field with the first occurrence in the sources reading left to right. If the preferred parent comes second, then you are in trouble because it's very likely Lean will not use it whole and the end product will be eta expanded much more than necessary.
Here our preferred parent is Monoid. It should go first in the list of supplied sources. When it doesn't, Lean sees it has a mul from the CommSemigroup instance. This blocks elaboration from using the Monoid source since mul occurs in Monoid also. So it uses that mul to build the CommMonoid instead of using Monoid. The result is that data, the multiplication, is exposed to unification each time this function is used to construct a CommMonoid.
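A toy sketch of this (made-up classes, mirroring class CommMonoid extends Monoid M, CommSemigroup M):
class A₀ (α : Type) where
  x : α

-- `A` will be `C`'s preferred parent; `B` shares the data field `x` with it.
class A (α : Type) extends A₀ α where
  hA : x = x

class B (α : Type) extends A₀ α where
  hB : x = x

class C (α : Type) extends A α, B α

def mkA {α} (x : α) : A α := { x := x, hA := rfl }
def mkB {α} (x : α) : B α := { x := x, hB := rfl }

-- Preferred parent first: the elaborator can use `mkA x` whole for `toA`.
def good {α} (x : α) : C α := { mkA x, mkB x with }

-- Preferred parent second: the `x` inside `mkB x` is found first and blocks
-- reuse of `mkA x`, so `toA` gets rebuilt field by field (eta expanded).
def bad {α} (x : α) : C α := { mkB x, mkA x with }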
Kevin Buzzard (Mar 18 2024 at 21:41):
So let me get this straight. Typeclass inference can explode when trying to unify two terms which are reducibly defeq? And preferred parents in mathlib's algebra hierarchy are now actually a thing that people have agreed on as opposed to a theory?
Yaël Dillies (Mar 18 2024 at 21:42):
There are two things at play here: typeclass inference exploding and spread notation exploding. The latter causes the former.
Yaël Dillies (Mar 18 2024 at 21:43):
Preferred parents are a thing for non-flat structures. We haven't really given much thought to which parents should be preferred in mathlib
Mario Carneiro (Mar 18 2024 at 22:03):
we've given thought, not sure we've put that thought into action
Matthew Ballard (Mar 19 2024 at 10:35):
Kevin Buzzard said:
So let me get this straight. Typeclass inference can explode when trying to unify two terms which are reducibly defeq? And preferred parents in mathlib's algebra hierarchy are now actually a thing that people have agreed on as opposed to a theory?
This happens independently of typeclass search. (TC synthesis just calls unification a lot.) To my understanding, the whole point of transparency settings is to address exactly this problem. We don't want to completely unfold everything and normalize to check defeq because the terms are terribly large.
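For instance, a minimal sketch of the transparency distinction:
def two : Nat := 2          -- a plain `def` is not reducible
abbrev two' : Nat := 2      -- an `abbrev` is marked `@[reducible]`

example : two = 2 := by rfl                  -- default transparency unfolds `two`
example : two' = 2 := by with_reducible rfl  -- reducible transparency unfolds `two'`
-- but `example : two = 2 := by with_reducible rfl` fails: at reducible
-- transparency Lean refuses to unfold `two`, which is what keeps the check cheap.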
I think there has been a fair bit of thought about the choice of preferred parents in mathlib but I think we have more experience since those design decisions were made.
I know people have proposed more radical redesigns of the hierarchies motivated by this experience.
Amelia Livingston (Mar 19 2024 at 14:07):
Would this reasoning also apply here?
instance addCommMonoid : AddCommMonoid (M ⊗[R] N) :=
  { TensorProduct.addCommSemigroup _ _,
    TensorProduct.addZeroClass _ _ with
    toAddSemigroup := TensorProduct.addSemigroup _ _
    toZero := (TensorProduct.addZeroClass _ _).toZero
    nsmul := fun n v => n • v
    nsmul_zero := by simp [TensorProduct.zero_smul]
    nsmul_succ := by simp only [TensorProduct.one_smul, TensorProduct.add_smul, add_comm,
      forall_const] }
Or does the fact that toAddSemigroup and toZero are provided explicitly mean the order of the arguments before with doesn't matter?
Yaël Dillies (Mar 19 2024 at 14:08):
Providing to arguments explicitly indeed stops them from being filled in piecewise by a with clause.
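For example, with toy structures (made up for illustration):
structure P where
  x : Nat

structure Q extends P where
  y : Nat

-- `toP` is given explicitly, so the source `p` is not consulted for it:
example (p : P) : Q := { p with toP := ⟨5⟩, y := 0 }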
Matthew Ballard (Mar 19 2024 at 15:07):
Mental model for filling order:
1. anything given explicitly with :=
2. find it in the sources before with, going left to right
3. try to synthesize it
I think that default values go between 2 and 3 but would need to check the code to be sure
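For example, with a toy structure:
structure Point where
  x : Nat
  y : Nat := 37   -- default value

def p : Point := { x := 1, y := 2 }

-- `x` comes from the explicit `:=` (step 1); `y` is found in the source `p`
-- (step 2) before the default value 37 is considered:
example : ({ p with x := 5 } : Point) = { x := 5, y := 2 } := rfl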
Jireh Loreaux (Mar 19 2024 at 15:13):
where exactly does the spread stuff fit in this model?
Yaël Dillies (Mar 19 2024 at 15:14):
It is the same as a with clause
Matthew Ballard (Mar 19 2024 at 15:14):
Having never actually looked at the spread notation, I would wildly speculate that __ := are sources and the named ones are :=
Jireh Loreaux (Mar 19 2024 at 15:16):
Yaël Dillies said:
It is the same as a with clause
I would swear it's not exactly the same, because I've had things break by switching between the two. But it's always possible I made some other mistake.
Matthew Ballard (Mar 19 2024 at 15:18):
While the ordering of the sources here is basically irrelevant since only proof fields will be used, note that the toZero := ... will make the terms more expanded than, say, building an AddMonoid and then layering on the add_comm proof. Does this matter? Probably not that much here, as long as Lean is fine with all the AddMonoid instances it is seeing being "clearly" the same. Just don't inline the data if it can be helped.
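A hedged sketch of that shape, with illustrative names (not the actual Mathlib declarations):
import Mathlib.Algebra.Group.Defs

variable {M : Type}

-- Name the data-carrying instance...
def myAddMonoid [inst : AddMonoid M] : AddMonoid M := inst

-- ...then layer only the proof on top, so unification can compare occurrences
-- of `myAddMonoid` by name instead of unfolding its data:
def myAddCommMonoid [AddMonoid M] (h : ∀ a b : M, a + b = b + a) :
    AddCommMonoid M :=
  { myAddMonoid with add_comm := h }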
Yaël Dillies (Mar 19 2024 at 15:20):
Jireh Loreaux said:
I would swear it's not exactly the same
The one difference I can think of is that with clauses always come last while spread clauses might be able to come before other clauses (I'm not quite sure)
Matthew Ballard (Mar 19 2024 at 15:22):
E.g., here you don't want Lean to try to compute n • v every time it is unifying two AddCommMonoid instances on the tensor product. Since this is wrapped in a call to a notation typeclass, my experience makes me think it won't if it can just unify the instance names without unfolding the SMul. I could be quite wrong here though.
Amelia Livingston (Mar 19 2024 at 17:23):
Matthew Ballard said:
While the ordering of the sources here is basically irrelevant since only proof fields will be used, note that the toZero := ... will make the terms more expanded than, say, building an AddMonoid and then layering on the add_comm proof. Does this matter? Probably not that much here, as long as Lean is fine with all the AddMonoid instances it is seeing being "clearly" the same. Just don't inline the data if it can be helped.
This does significantly speed up Mathlib.Algebra.Category.AlgebraCat.Monoidal, so is #11505 worth existing?
Matthew Ballard (Mar 19 2024 at 17:27):
For a .25% overall improvement, I think so!
Kevin Buzzard (Mar 19 2024 at 19:13):
Oh nice find!
Amelia Livingston (Mar 19 2024 at 20:23):
I thought TensorProduct.addMonoid might speed things up because I'd seen a trace where stuff like TensorProduct.addCommMonoid =?= AddCommGroup.toAddCommMonoid was slowish and the preferred parent of AddCommGroup is AddGroup
Amelia Livingston (Mar 19 2024 at 20:25):
On the other hand, here in
@[reducible]
def Module.compHom [Semiring S] (f : S →+* R) : Module S M :=
  { MulActionWithZero.compHom M f.toMonoidWithZeroHom, DistribMulAction.compHom M (f : S →* R) with
    add_smul := fun r s x => show f (r + s) • x = f r • x + f s • x by simp [add_smul] }
the arguments before the with are also in the non-preferred order; similarly in DistribMulAction.compHom. I tried switching them in both declarations but bench said there were no significant changes. Is that expected? For the record, I want to speed up Mathlib.Algebra.Category.ModuleCat.ChangeOfRings, where Module.compHom is used heavily, but I'll try looking at more traces in that file.
Yaël Dillies (Mar 19 2024 at 20:28):
It highly depends on what kind of things you are checking defeq with. If they are also of the form Module.compHom, there shouldn't be much speedup.
Amelia Livingston (Mar 19 2024 at 20:32):
I see, that makes sense
Kyle Miller (Mar 19 2024 at 21:09):
Jireh Loreaux said:
Yaël Dillies said:
It is the same as a with clause
I would swear it's not exactly the same, because I've had things break by switching between the two. But it's always possible I made some other mistake.
The way spread notation works is that every __ := is collected up and placed in a with clause in order. If there was already a with clause, they're inserted after the existing sources. (See Mathlib.Tactic.Spread for the macro.)
There's currently a hack I put into place to make it so every value from a __ := can be used as a local instance. This has a very uncertain future, and it was just to help smooth over some changes during a recent Lean version bump.
Minus this local instance feature of __ :=, it should be exactly the same as using a with clause.
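Schematically, with toy structures (made up for illustration):
import Mathlib.Tactic.Spread

structure P where
  x : Nat

structure Q extends P where
  y : Nat

-- The spread macro rewrites the first form into the second:
example (p : P) : Q := { __ := p, y := 0 }
example (p : P) : Q := { p with y := 0 }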
Kevin Buzzard (Mar 20 2024 at 19:51):
I want to speed up Mathlib.Algebra.Category.ModuleCat.ChangeOfRings, where Module.compHom is used heavily, but I'll try looking at more traces in that file
I'm looking at some slow declarations now and I'm not sure that it's typeclass inference any more; there are at least two non-terminal dsimps which are taking forever. For example, changing the dsimp in ModuleCat.ExtendRestrictScalarsAdj.HomEquiv.evalAt (line 600 or so) to dsimp only changes the number of heartbeats from 87801 (2.6 seconds) to 13266 (0.3 seconds) on my machine.
Kevin Buzzard (Mar 20 2024 at 19:54):
Similarly, ModuleCat.restrictScalarsIsEquivalenceOfRingEquiv is not slow because of typeclass inference; it's slow because an autoparam, aesop_cat, is taking over 2 seconds to solve a goal.
Amelia Livingston (Mar 20 2024 at 19:58):
Yeah, sorry, I should've edited this post because as soon as I started trying to make an MWE of my slow use-case (extension of scalars as a lax monoidal functor) by moving it into its own file on current mathlib, it went from taking 40 seconds to compile to about 4 :upside_down: so I am no longer bothered about the ChangeOfRings file :upside_down:
Matthew Ballard (Mar 20 2024 at 20:02):
Yes, inserting quick clean-up dsimp calls can make a big difference, and the autoparams for CategoryTheory structures can be awfully expensive.
Kevin Buzzard (Mar 20 2024 at 20:03):
The current definition of restrictScalarsIsEquivalenceOfRingEquiv is the same as this:
instance restrictScalarsIsEquivalenceOfRingEquiv {R S} [Ring R] [Ring S] (e : R ≃+* S) :
    IsEquivalence (ModuleCat.restrictScalars e.toRingHom) where
  inverse := ModuleCat.restrictScalars e.symm
  unitIso := NatIso.ofComponents (fun M ↦ LinearEquiv.toModuleIso'
    { __ := AddEquiv.refl M
      map_smul' := fun s m ↦ congr_arg (· • m) (e.right_inv s).symm }) (by aesop_cat)
  counitIso := NatIso.ofComponents (fun M ↦ LinearEquiv.toModuleIso'
    { __ := AddEquiv.refl M
      map_smul' := fun r m ↦ congr_arg (· • (_ : M)) (e.left_inv r) }) (by aesop_cat)
  functor_unitIso_comp := (by aesop_cat)
where I've added in the three by aesop_cats which are happening by autoparams. It's 107K heartbeats; if you sorry them all instead it's 20K heartbeats.
Matthew Ballard (Mar 20 2024 at 20:07):
I once replaced something like this with rfl and it was glorious. Probably doesn't apply here.
Kevin Buzzard (Mar 20 2024 at 20:20):
they're all intros; rfl in fact:
count_heartbeats in
instance restrictScalarsIsEquivalenceOfRingEquiv {R S} [Ring R] [Ring S] (e : R ≃+* S) :
    IsEquivalence (ModuleCat.restrictScalars e.toRingHom) where
  inverse := ModuleCat.restrictScalars e.symm
  unitIso := NatIso.ofComponents (fun M ↦ LinearEquiv.toModuleIso'
    { __ := AddEquiv.refl M
      map_smul' := fun s m ↦ congr_arg (· • m) (e.right_inv s).symm }) (by intros; rfl)
  counitIso := NatIso.ofComponents (fun M ↦ LinearEquiv.toModuleIso'
    { __ := AddEquiv.refl M
      map_smul' := fun r m ↦ congr_arg (· • (_ : M)) (e.left_inv r) }) (by intros; rfl)
  functor_unitIso_comp := (by intros; rfl)
is 21K down from 107K.
Matthew Ballard (Mar 20 2024 at 20:23):
Glorious
Matthew Ballard (Mar 20 2024 at 20:25):
I think we need an issue for aesop performance vs fun _ => rfl if this is the second incident.
Matthew Ballard (Mar 20 2024 at 20:30):
We could also adjust the auto-param to by try (intros; rfl); aesop_cat
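A sketch of what that could look like (the macro name is made up; this uses first so the fallback only runs when the cheap tactic fails, and assumes aesop_cat is in scope from Mathlib's category theory library):
import Mathlib.CategoryTheory.Category.Basic

-- Try the cheap `intros; rfl` first; fall back to `aesop_cat` only on failure.
macro "cheap_aesop_cat" : tactic =>
  `(tactic| first | (intros; rfl) | aesop_cat)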
Kevin Buzzard (Mar 20 2024 at 20:58):
I'm making a speed-up PR. Here's another reason this file is slow: we keep having to do this:
[Meta.synthInstance] [0.061262s] ✅ HSMul S ↑((ModuleCat.restrictScalars f).obj Y) ↑((ModuleCat.restrictScalars f).obj Y) ▶
[Meta.synthInstance] [0.038096s] ✅ HSMul S ↑((ModuleCat.restrictScalars f).obj Y) ↑((ModuleCat.restrictScalars f).obj Y) ▶
[Meta.synthInstance] [0.037499s] ✅ HSMul S ↑((ModuleCat.restrictScalars f).obj Y) ↑((ModuleCat.restrictScalars f).obj Y) ▶
We just did one computation three times and it took 0.12 seconds; we do this computation a lot of times in the file.
Kevin Buzzard (Mar 26 2024 at 12:11):
I'm sorry I'm so slow to understand all of this.
I'm currently looking at a trace which contains this (on a fast machine):
[Meta.synthInstance] [0.220873s] ✅ OfNat (↥(𝓞 K) ⧸ Ideal.span {IsPrimitiveRoot.toInteger hζ - 1}) 0
So that says "for something which is obviously a ring (being a quotient of a ring by an ideal), Lean takes over 0.2 seconds to check that it has a zero". So there's presumably something wrong (and a bunch of professional mathematicians trying to do something completely mathematically innocuous at a conference in Luminy are struggling as a result). Digging a little further I get
[] [0.141052s] ✅ Module ↥(𝓞 K) ↥(𝓞 K) =?= Module ↥(𝓞 K) ↥(𝓞 K)
so most of the time is spent checking that two module structures are defeq. The module structures aren't syntactically equal; one is
@Module (↥(𝓞 K)) (↥(𝓞 K)) Ring.toSemiring AddCommGroup.toAddCommMonoid : Type u_1
and the other is
@Module (↥(𝓞 K)) (↥(𝓞 K)) (Subalgebra.toSemiring (𝓞 K)) NonUnitalNonAssocSemiring.toAddCommMonoid : Type u_1
The next line of the trace is
[] [0.134164s] ✅ AddCommGroup.toAddCommMonoid =?= NonUnitalNonAssocSemiring.toAddCommMonoid
and then typeclass inference goes haywire,
[] ✅ apply @DenselyNormedField.toNontriviallyNormedField to NontriviallyNormedField ↥(𝓞 K)
etc etc (over 500 lines of nonsense scanning many irrelevant instances).
So I'm guessing the problem is that typeclass inference is refusing to believe that the two module structures are defeq. However
example : @Module (↥(𝓞 K)) (↥(𝓞 K)) Ring.toSemiring AddCommGroup.toAddCommMonoid =
    @Module (↥(𝓞 K)) (↥(𝓞 K)) (Subalgebra.toSemiring (𝓞 K)) NonUnitalNonAssocSemiring.toAddCommMonoid := by
  with_reducible rfl
works fine.
So is with_reducible rfl just a red herring?
Kevin Buzzard (Mar 26 2024 at 12:25):
Here's another one:
-- [] [0.132849s] ✅ 1 =?= 1
-- [] [0.132785s] ✅ One.toOfNat1 =?= One.toOfNat1
example :
    (@One.toOfNat1 (↥(𝓞 K) ⧸ Ideal.span {λ}) (Ideal.Quotient.one (Ideal.span {λ})) : OfNat (↥(𝓞 K) ⧸ Ideal.span {λ}) 1) =
    (@One.toOfNat1 (↥(𝓞 K) ⧸ Ideal.span {λ}) AddMonoidWithOne.toOne : OfNat (↥(𝓞 K) ⧸ Ideal.span {λ}) 1) := by
  with_reducible rfl -- works fine
This is all coming from the far-too-slow-for-my-liking https://github.com/riccardobrasca/flt3
Kevin Buzzard (Mar 26 2024 at 12:36):
After more experiments, I think this all boils down to the issue (which I can't currently find) that typeclass inference checks the newest instances first. For example
[] new goal Neg (↥(𝓞 K) ⧸ Ideal.span {IsPrimitiveRoot.toInteger hζ - 1}) ▼
[instances] #[@InvolutiveNeg.toNeg, @SubNegMonoid.toNeg, @NegZeroClass.toNeg, @AddGroupWithOne.toNeg, @Ring.toNeg, @LinearOrderedAddCommGroupWithTop.toNeg]
This Neg typeclass is just a simple thing coming from ring theory, but Lean happened to last import some exotic thing from order theory which applies, so Lean now just goes down some rabbit hole looking at linearly ordered stuff first, before finding its way to the obvious instance :-(
Matthew Ballard (Mar 26 2024 at 12:41):
Reducible rfl is no guarantee of performance, though making things more opaque is a valid way to handle performance issues.
Matthew Ballard (Mar 26 2024 at 12:42):
“Lean don’t look there”
Matthew Ballard (Mar 26 2024 at 12:45):
Performance is about how much data Lean has to check is the same. Ideally that would be none. But of course we can’t guarantee that.
A hygienic rule is to make sure all data is separated from other data in a term so Lean can at least ask itself if it needs to unfold that or is fine without it.
Matthew Ballard (Mar 26 2024 at 12:50):
I think here you have death by 1000 cuts where there are many instances to check against. In each one Lean has to check defeq of some data (like Neg), which takes a small but nontrivial amount of time. Then it adds up.
I would ask: is Neg exposed wherever it is declared?
Kevin Buzzard (Mar 26 2024 at 13:04):
I'm just increasing the priority of all the instances relevant to the project in order to work around this right now.
Matthew Ballard (Mar 26 2024 at 13:06):
Among the things that were right under my nose but that I only just realized: projections from class parents are NOT reducible. This is why we have with_reducible_and_instances. So breaking out parent classes is more opaque than parent structures, which have reducible projections.
Matthew Ballard (Mar 26 2024 at 13:12):
Why not demote the bad ones to make everyone's life better? I don't think most users are making a LinearOrderedAddCommGroupWithTop because they need that Neg.
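A toy sketch of how the demotion works (made-up class and instances):
class Foo (α : Type) where
  val : α

-- Demote the exotic instance so it is tried last...
instance (priority := 100) exotic : Foo Nat := ⟨37⟩

-- ...while the everyday instance keeps the default priority (1000) and wins:
instance everyday : Foo Nat := ⟨0⟩

example : (Foo.val : Nat) = 0 := rfl
For an instance that already exists, the priority can be retuned after the fact with attribute [instance 100] <name>, if I remember the syntax right.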
Kevin Buzzard (Mar 26 2024 at 13:14):
yes demoting the bad instances is a much better choice!
Matthew Ballard (Mar 26 2024 at 13:57):
I made a mathlib issue, #11692, so you can use this tag for each change. It should be linked to the existing Lean issue. If things change, we can quickly find all the changes by grepping.
Last updated: May 02 2025 at 03:31 UTC