Zulip Chat Archive

Stream: lean4

Topic: slowness in ring theory file


Kevin Buzzard (Jul 25 2023 at 12:13):

I've been thinking about slowness in mathlib4 algebra for the last couple of days; I didn't solve anything but at least I understand a lot more about it than I did, so I thought I would write some kind of overview because I need to go on to other things right now, at least for the next couple of days.

Overview

The changes in the way typeclass inference works have meant that some mathlib4 files (as directly ported from mathlib3) are much slower than the corresponding mathlib3 versions. On the other hand the community is learning more about these changes and modifying mathlib4 code so that it compiles more quickly; e.g. some algebra files involving adjoining roots and splitting fields are faster than they were.

Currently a poster child for mathlib4 slowness is Mathlib.RingTheory.Kaehler. Sixteen times in this 700 line file the default maxHeartbeats or synthInstance.maxHeartbeats needs to be raised. One of the first times this happens is in the definition docs#KaehlerDifferential.D , where this rewrite takes over six seconds. The culprit is LinearMap.map_smul_of_tower. Indeed simply elaborating LinearMap.map_smul_of_tower takes over three seconds on my machine. Here's how you can see this at home:

import Mathlib.RingTheory.Kaehler

open scoped TensorProduct

universe u v

variable (R : Type u) (S : Type v) [CommRing R] [CommRing S] [Algebra R S] (a : S)

open KaehlerDifferential

set_option synthInstance.maxHeartbeats 40000 in
set_option trace.profiler true in
example :=
LinearMap.map_smul_of_tower (Ideal.toCotangent (ideal R S)) a
-- [Elab.command] [3.303770s] example := ...

Adding the expected type just makes it slower:

set_option synthInstance.maxHeartbeats 40000 in
set_option trace.profiler true in
example : ∀ (x : ideal R S), (Ideal.toCotangent (ideal R S)) (a • x) = a • (Ideal.toCotangent (ideal R S)) x :=
LinearMap.map_smul_of_tower (Ideal.toCotangent (ideal R S)) a
-- [Elab.command] [4.228826s] example ...

I should say that I'm using Ubuntu on a relatively new machine (12th Gen Intel(R) Core(TM) i7-12700).

Debugging

If you pipe the output of the trace to a file you can get a feeling for what is going on. This shows up 72 times:

 [] [0.037281s]  CommSemiring.toSemiring =?= Algebra.TensorProduct.instSemiring

(Well, that's what it looks like for me because I'm working on this branch: right now on master the RHS is called Algebra.TensorProduct.instSemiringTensorProductToAddCommMonoidToNonUnitalNonAssocSemiringToNonAssocSemiringToAddCommMonoidToNonUnitalNonAssocSemiringToNonAssocSemiringToModuleToModule because of lean4#2343 ).

These terms are both terms of type Semiring (S ⊗[R] S) and are syntactically unequal, and Lean is spending 0.03 to 0.04 seconds, 72 times, showing that they're defeq (which they are). These computations explain the vast majority of the 3.3 seconds which Lean is spending elaborating this term. The proofs are everywhere. For example, the initial elaboration of the 3-second example above very quickly breaks up into ten pieces

    [] [0.079087s] expected type: { x // x ∈ KaehlerDifferential.ideal R S } →ₗ[S ⊗[R] S]
          Ideal.Cotangent (KaehlerDifferential.ideal R S), term
        (Ideal.toCotangent (ideal R S)) 
    [Meta.synthInstance] [0.230184s]  AddCommMonoid { x // x ∈ KaehlerDifferential.ideal R S }
    [Meta.isDefEq] [0.074836s]  Submodule.addCommMonoid
          (KaehlerDifferential.ideal R S) =?= Submodule.addCommMonoid (KaehlerDifferential.ideal R S) 
    [Meta.synthInstance] [0.108451s]  AddCommMonoid (Ideal.Cotangent (KaehlerDifferential.ideal R S)) 
    [Meta.synthInstance] [0.284854s]  SMul S { x // x ∈ KaehlerDifferential.ideal R S }
    [Meta.synthInstance] [0.539433s]  Module (S ⊗[R] S) { x // x ∈ KaehlerDifferential.ideal R S }
    [Meta.isDefEq] [0.075252s]  Submodule.module (KaehlerDifferential.ideal R S) =?= Submodule.module (KaehlerDifferential.ideal R S) 
    [Meta.synthInstance] [0.313167s]  SMul S (Ideal.Cotangent (KaehlerDifferential.ideal R S)) 
    [Meta.synthInstance] [0.131442s]  Module (S ⊗[R] S) (Ideal.Cotangent (KaehlerDifferential.ideal R S))
    [Meta.synthInstance] [1.382991s]  LinearMap.CompatibleSMul { x // x ∈ KaehlerDifferential.ideal R S }
          (Ideal.Cotangent (KaehlerDifferential.ideal R S)) S (S ⊗[R] S)

and in each of these pieces the calculation is performed several times; indeed for the quicker calculations you can guess the number of times it is performed by looking at the largest multiple of 0.033 which goes into the total time taken. For example the calculation is performed three times in [Meta.synthInstance] [0.108451s] ✅ AddCommMonoid (Ideal.Cotangent (KaehlerDifferential.ideal R S)) ▶, and four times in [Meta.synthInstance] [0.131442s] ✅ Module (S ⊗[R] S) (Ideal.Cotangent (KaehlerDifferential.ideal R S)) ▶. If I counted correctly it's performed 31 times in [Meta.synthInstance] [1.382991s] ✅ LinearMap.CompatibleSMul .... Here is an example of the trace for one of the SMul instances, unfolded until we run into the six instances of CommSemiring.toSemiring =?= Algebra.TensorProduct.instSemiring which I didn't unfold any further.

    [Meta.synthInstance] [0.329439s]  SMul S { x // x ∈ KaehlerDifferential.ideal R S }
      [] [0.264434s]  apply @Submodule.smul to SMul S { x // x ∈ KaehlerDifferential.ideal R S }
        [tryResolve] [0.264337s]  SMul S { x // x ∈ KaehlerDifferential.ideal R S }  SMul S { x // x ∈ KaehlerDifferential.ideal R S }
          [isDefEq] [0.264010s]  SMul S { x // x ∈ KaehlerDifferential.ideal R S } =?= SMul ?m.6952 { x // x ∈ ?m.6958 }
            [] [0.263974s]  { x // x ∈ KaehlerDifferential.ideal R S } =?= { x // x ∈ ?m.6958 }
              [] [0.263941s]  fun x => x ∈ KaehlerDifferential.ideal R S =?= fun x => x ∈ ?m.6958
                [] [0.263934s]  x ∈ KaehlerDifferential.ideal R S =?= x ∈ ?m.6958
                  [] [0.087328s]  Ideal (S ⊗[R] S) =?= Submodule (S ⊗[R] S) (S ⊗[R] S)
                    [] [0.087319s]  Submodule (S ⊗[R] S) (S ⊗[R] S) =?= Submodule (S ⊗[R] S) (S ⊗[R] S)
                      [] [0.041345s]  CommSemiring.toSemiring =?= Algebra.TensorProduct.instSemiring 
                      [] [0.041224s]  Semiring.toModule =?= Semiring.toModule 
                        [delta] [0.041213s]  Semiring.toModule =?= Semiring.toModule 
                          [] [0.041178s]  CommSemiring.toSemiring =?= Algebra.TensorProduct.instSemiring 
                  [] [0.176046s]  SetLike.instMembership =?= SetLike.instMembership 
                    [delta] [0.176027s]  SetLike.instMembership =?= SetLike.instMembership 
                      [] [0.087993s]  Ideal (S ⊗[R] S) =?= Submodule (S ⊗[R] S) (S ⊗[R] S)
                        [] [0.087984s]  Submodule (S ⊗[R] S) (S ⊗[R] S) =?= Submodule (S ⊗[R] S) (S ⊗[R] S)
                          [] [0.041296s]  CommSemiring.toSemiring =?= Algebra.TensorProduct.instSemiring 
                          [] [0.041809s]  Semiring.toModule =?= Semiring.toModule 
                            [delta] [0.041796s]  Semiring.toModule =?= Semiring.toModule 
                              [] [0.041760s]  CommSemiring.toSemiring =?= Algebra.TensorProduct.instSemiring 
                      [] [0.087950s]  Submodule.setLike =?= Submodule.setLike 
                        [delta] [0.087940s]  Submodule.setLike =?= Submodule.setLike 
                          [] [0.041624s]  CommSemiring.toSemiring =?= Algebra.TensorProduct.instSemiring 
                          [] [0.041409s]  Semiring.toModule =?= Semiring.toModule 
                            [delta] [0.041397s]  Semiring.toModule =?= Semiring.toModule 
                              [] [0.041359s]  CommSemiring.toSemiring =?= Algebra.TensorProduct.instSemiring 
      [] [0.024370s]  apply @IsScalarTower.right to IsScalarTower S (S ⊗[R] S) (S ⊗[R] S)
      [check] [0.012651s]  Submodule.smul (KaehlerDifferential.ideal R S)

Details of the instance check.

Here's some code which just isolates the instance check (always modulo #6123)

import Mathlib.RingTheory.Kaehler

open scoped TensorProduct

universe u v

variable (R : Type u) (S : Type v) [CommRing R] [CommRing S] [Algebra R S]
open KaehlerDifferential

set_option trace.profiler true in
set_option pp.proofs.withType false in
example : @CommSemiring.toSemiring (S ⊗[R] S) (@CommRing.toCommSemiring (S ⊗[R] S) Algebra.TensorProduct.instCommRing) =
          Algebra.TensorProduct.instSemiring := by
  rfl

(on master it's

example : @CommSemiring.toSemiring (S ⊗[R] S) (@CommRing.toCommSemiring (S ⊗[R] S) Algebra.TensorProduct.instCommRingTensorProductToCommSemiringToAddCommMonoidToNonUnitalNonAssocSemiringToNonUnitalNonAssocRingToNonAssocRingToRingToAddCommMonoidToNonUnitalNonAssocSemiringToNonUnitalNonAssocRingToNonAssocRingToRingToModuleToSemiringToCommSemiringToModuleToSemiringToCommSemiring) =
          Algebra.TensorProduct.instSemiringTensorProductToAddCommMonoidToNonUnitalNonAssocSemiringToNonAssocSemiringToAddCommMonoidToNonUnitalNonAssocSemiringToNonAssocSemiringToModuleToModule := by
  rfl

).

The relevant instances are this:

instance instCommRing : CommRing (A ⊗[R] B) :=
  { toRing := inferInstance
    mul_comm := fun x y => by
      (proof omitted)
instance instRing : Ring (A ⊗[R] B) :=
  { toSemiring := inferInstance
    add_left_neg := add_left_neg }

and

instance instSemiring : Semiring (A ⊗[R] B) :=
  { (by infer_instance : AddMonoidWithOne (A ⊗[R] B)),
    (by infer_instance : AddCommMonoid (A ⊗[R] B)) with
    zero := 0
    add := (· + ·)
    one := 1
    mul := fun a b => mul a b
    one_mul := one_mul
    (12 more proofs omitted)

Kevin Buzzard (Jul 25 2023 at 12:13):

It's not about paths in the instance graph!

There are two routes from CommRing to Semiring, the "canonical" one via Ring and the "to be discouraged" one via CommSemiring. Note that CommRing extends Ring first, and Ring extends Semiring first. For a while I had assumed that the problem was that Lean was getting from instCommRing to instSemiring via the non-canonical route (this is what things like #5959 are about -- trying to change things so that Lean takes the canonical route as often as possible). But I was wrong! Let's change the route to the canonical one: it is still not instant!

set_option trace.profiler true in
set_option pp.proofs.withType false in
example : @Ring.toSemiring (S ⊗[R] S) (@CommRing.toRing (S ⊗[R] S) Algebra.TensorProduct.instCommRing) =
          Algebra.TensorProduct.instSemiring := rfl
-- [Elab.command] [0.088639s]

This for me is very surprising, and might mean that I've missed something. This proof is 0.02 seconds of boilerplate and then two instances of the dreaded 0.03-second check Algebra.TensorProduct.instSemiring =?= Ring.toSemiring.

    [step] [0.035774s] expected type: Ring.toSemiring = Algebra.TensorProduct.instSemiring, term
        rfl 
      [Meta.isDefEq] [0.035685s]  Ring.toSemiring = Algebra.TensorProduct.instSemiring =?= ?m.2025 = ?m.2025 
        [] [0.035634s]  Algebra.TensorProduct.instSemiring =?= Ring.toSemiring 
          [] [0.035610s]  Algebra.TensorProduct.instSemiring =?= Ring.toSemiring 
    [Meta.isDefEq] [0.031654s]  Ring.toSemiring = Ring.toSemiring =?= Ring.toSemiring = Algebra.TensorProduct.instSemiring 
      [] [0.031642s]  Ring.toSemiring =?= Algebra.TensorProduct.instSemiring 

So why is this so slow? Because performing this check so many times seems inevitable.

The innards of the instance check.

It's taking ages because Lean unfolds Algebra.TensorProduct.instSemiring (which is big) instead of unfolding Ring.toSemiring first. Let's see what happens if we guide Lean in the right direction:

set_option trace.profiler true in
example : @Ring.toSemiring (S ⊗[R] S) (@CommRing.toRing (S ⊗[R] S) Algebra.TensorProduct.instCommRing) =
          Algebra.TensorProduct.instSemiring := by
  delta Ring.toSemiring
  delta CommRing.toRing
  -- ⊢ Algebra.TensorProduct.instCommRing.1.1 = Algebra.TensorProduct.instSemiring
  rfl
-- [Elab.command] [0.023775s]

The 0.02 seconds is just setting stuff up, as far as I can see; the difference between this and the 0.08 seconds is the 0.03 seconds it takes to prove Ring.toSemiring =?= Algebra.TensorProduct.instSemiring by unfolding Algebra.TensorProduct.instSemiring, done twice. I know these are all very small numbers, but the point is that this is happening 150 times in that rw I linked to, and 150 * 0.03 is over 4 seconds.

So let's stop guiding Lean and see what it actually does when proving @Ring.toSemiring (S ⊗[R] S) (@CommRing.toRing (S ⊗[R] S) Algebra.TensorProduct.instCommRing) = Algebra.TensorProduct.instSemiring. Here's the trace up to the first bifurcation. As you can see, Lean first unfolds the Algebra.TensorProduct.instSemiring which it finds on the RHS, and a few lines later unfolds the one which it finds on the LHS!

    [Meta.isDefEq] [0.033624s]  Ring.toSemiring = Ring.toSemiring =?= Ring.toSemiring = Algebra.TensorProduct.instSemiring 
      [] [0.033609s]  Ring.toSemiring =?= Algebra.TensorProduct.instSemiring 
        [] [0.033523s]  Ring.toSemiring =?= let src := inferInstance;
            let src_1 := inferInstance;
            Semiring.mk _ _ npowRec 
          [] [0.033516s]  Ring.toSemiring =?= Semiring.mk _ _ npowRec 
            [] [0.033511s]  CommRing.toRing.1 =?= Semiring.mk _ _ npowRec 
              [] [0.033415s]  inferInstance =?= Semiring.mk _ _ npowRec 
                [] [0.033408s]  Algebra.TensorProduct.instSemiring =?= Semiring.mk _ _ npowRec 
                  [] [0.033375s]  let src := inferInstance;
                      let src_1 := inferInstance;
                      Semiring.mk _ _ npowRec =?= Semiring.mk _ _ npowRec 
                    [] [0.033368s]  Semiring.mk _ _ npowRec =?= Semiring.mk _ _ npowRec 
                      [] [0.014229s]  npowRec =?= npowRec 
                      [] [0.016571s]  NonUnitalSemiring.mk _ =?= NonUnitalSemiring.mk _ 

Once both instSemiring terms are unfolded (which takes under 1ms, it's super-quick), Lean ends up ultimately having to prove [] [0.033368s] ✅ Semiring.mk _ _ npowRec =?= Semiring.mk _ _ npowRec. Now as far as I can see these terms are now syntactically equal. So I am quite confused about why this takes 33ms, because I thought that it was easy to check that syntactically equal terms were equal by rfl.

But I think it must be because the terms are, erm, large. With pp.all on, the last reasonable term in the trace looks like this (the =?= is two lines from the bottom); this is the line [] [0.033609s] ✅ Ring.toSemiring =?= Algebra.TensorProduct.instSemiring ▼ in the above trace, from a different run, so the timings are slightly different. This is just before Lean unfolds the instSemiring on the RHS.

      [] [0.032140s]  @Ring.toSemiring.{v}
            (@TensorProduct.{u, v, v} R (@CommRing.toCommSemiring.{u} R inst✝²) S S
              (@NonUnitalNonAssocSemiring.toAddCommMonoid.{v} S
                (@NonUnitalNonAssocRing.toNonUnitalNonAssocSemiring.{v} S
                  (@NonAssocRing.toNonUnitalNonAssocRing.{v} S
                    (@Ring.toNonAssocRing.{v} S (@CommRing.toRing.{v} S inst✝¹)))))
              (@NonUnitalNonAssocSemiring.toAddCommMonoid.{v} S
                (@NonUnitalNonAssocRing.toNonUnitalNonAssocSemiring.{v} S
                  (@NonAssocRing.toNonUnitalNonAssocRing.{v} S
                    (@Ring.toNonAssocRing.{v} S (@CommRing.toRing.{v} S inst✝¹)))))
              (@Algebra.toModule.{u, v} R S (@CommRing.toCommSemiring.{u} R inst✝²)
                (@CommSemiring.toSemiring.{v} S (@CommRing.toCommSemiring.{v} S inst✝¹)) inst)
              (@Algebra.toModule.{u, v} R S (@CommRing.toCommSemiring.{u} R inst✝²)
                (@CommSemiring.toSemiring.{v} S (@CommRing.toCommSemiring.{v} S inst✝¹)) inst))
            (@CommRing.toRing.{v}
              (@TensorProduct.{u, v, v} R (@CommRing.toCommSemiring.{u} R inst✝²) S S
                (@NonUnitalNonAssocSemiring.toAddCommMonoid.{v} S
                  (@NonUnitalNonAssocRing.toNonUnitalNonAssocSemiring.{v} S
                    (@NonAssocRing.toNonUnitalNonAssocRing.{v} S
                      (@Ring.toNonAssocRing.{v} S (@CommRing.toRing.{v} S inst✝¹)))))
                (@NonUnitalNonAssocSemiring.toAddCommMonoid.{v} S
                  (@NonUnitalNonAssocRing.toNonUnitalNonAssocSemiring.{v} S
                    (@NonAssocRing.toNonUnitalNonAssocRing.{v} S
                      (@Ring.toNonAssocRing.{v} S (@CommRing.toRing.{v} S inst✝¹)))))
                (@Algebra.toModule.{u, v} R S (@CommRing.toCommSemiring.{u} R inst✝²)
                  (@CommSemiring.toSemiring.{v} S (@CommRing.toCommSemiring.{v} S inst✝¹)) inst)
                (@Algebra.toModule.{u, v} R S (@CommRing.toCommSemiring.{u} R inst✝²)
                  (@CommSemiring.toSemiring.{v} S (@CommRing.toCommSemiring.{v} S inst✝¹)) inst))
              (@Algebra.TensorProduct.instCommRing.{u, v, v} R inst✝² S inst✝¹ inst S inst✝¹
                inst)) =?= @Algebra.TensorProduct.instSemiring.{u, v, v} R (@CommRing.toCommSemiring.{u} R inst✝²) S
            (@CommSemiring.toSemiring.{v} S (@CommRing.toCommSemiring.{v} S inst✝¹)) inst S
            (@CommSemiring.toSemiring.{v} S (@CommRing.toCommSemiring.{v} S inst✝¹)) inst

The next line in the un-pp.all trace is [] [0.033523s] ✅ Ring.toSemiring =?= let src := inferInstance; let src_1 := inferInstance; Semiring.mk _ _ npowRec (the disastrous choice of unfolding the RHS rather than the LHS). You can see the pp.all output of this next line here; the =?= is still 30 or so lines down but we now have 2500 lines of pp.all unfolded let src := inferInstance; let src_1 := inferInstance; Semiring.mk _ _ npowRec in all their glory.

I would be interested in getting a bit further down the trace with pp.all on so I can see whether by the time Lean has unfolded both instSemirings the resulting [] [0.033368s] ✅ Semiring.mk _ _ npowRec =?= Semiring.mk _ _ npowRec ▼ really is a syntactic equality. Unfortunately VS Code hangs before I can get to it, and piping the pp.all output to a file also fails; Lean uses all 32 gigs of ram on my machine and then gives up.

So that's a summary of my understanding of why we need to bump up the maxHeartbeats here and I suspect that it will be a similar story for the rest of the file, because mathematically this is just the statement that Ideal.toCotangent (ideal R S) is S-linear, which is a basic fact and which will show up all over the place in the file.

Matthew Ballard (Jul 25 2023 at 12:39):

The current record holder for maxHeartbeats is AlgebraicGeometry.Morphisms.RingHomProperties, with this line responsible. I piped the term generated by this line with pp.explicit to a text file. It was 1/2 GB.

Matthew Ballard (Jul 25 2023 at 12:43):

To even get here, I had to turn off instance synthesis and use unification or Lean would run away and seemingly never return.

Matthew Ballard (Jul 25 2023 at 12:44):

Right now I’m in the “terms too big” camp but I don’t know how the corresponding code in Lean 3 compares.

Matthew Ballard (Jul 25 2023 at 12:46):

Are the terms the same size and handled more nimbly? Or has there been a blow up?

Kevin Buzzard (Jul 25 2023 at 12:48):

https://gist.github.com/kbuzzard/ffb9977dfd9e81b25b71f29a8617479e (starting after the =?= on line 30) is unfolded Algebra.TensorProduct.instSemiring with pp.all on (2500 lines). This is the Lean code which generates that term.
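
For anyone wanting to reproduce such a dump, something along these lines should work (a sketch; it assumes the short instSemiring name from the branch above rather than the long autogenerated name on master):

import Mathlib.RingTheory.Kaehler

-- print the stored value of the instance with all implicit arguments,
-- universes and instance arguments spelled out
set_option pp.all true in
#print Algebra.TensorProduct.instSemiring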

Kevin Buzzard (Jul 25 2023 at 12:50):

But what is frustrating is that Lean doesn't need to unfold that term at all -- it occurs on both the LHS and RHS of a defeq check. Unfortunately if you make it irreducible a bunch of proofs break.

Matthew Ballard (Jul 25 2023 at 12:51):

The line I cite was an erw in Lean 3 to begin with. It wasn’t snappy but I could work with it on my RPi

Matthew Ballard (Jul 25 2023 at 12:53):

Can you trim fields in that instance?

Kevin Buzzard (Jul 25 2023 at 12:58):

I am sure that one can make that instance smaller. Is this the right approach? Here is a minimised example of rfl making a bad life decision (on #6123, which is now being dealt with by bors):

import Mathlib.RingTheory.Kaehler

open TensorProduct

variable (R : Type) (S : Type) [CommRing R] [CommRing S] [Algebra R S]

set_option trace.profiler true in
example : @Ring.toSemiring (S ⊗[R] S) Algebra.TensorProduct.instRing =
          Algebra.TensorProduct.instSemiring := rfl -- 0.08 seconds

set_option trace.profiler true in
example : @Ring.toSemiring (S ⊗[R] S) Algebra.TensorProduct.instRing =
          Algebra.TensorProduct.instSemiring := by
  delta Ring.toSemiring
  rfl -- 0.02 seconds

The 0.06 seconds of difference comes from the fact that Algebra.TensorProduct.instSemiring = Algebra.TensorProduct.instSemiring is instant as it stands, but if you unfold both sides it takes 0.03 seconds to check that they're syntactically equal, and Lean does that twice (once while solving for a metavariable and once again after it has been assigned).

Kevin Buzzard (Jul 25 2023 at 13:00):

Is 2500 lines reasonable for a pp.all instance? If it isn't then this is an argument for minimising that instSemiring term regardless.

Matthew Ballard (Jul 25 2023 at 13:23):

Why am I telling Lean about add, zero, and all their properties if that should be coming from the two instances? This confused me in Lean 3. When porting lines like this, they often caused problems.

It looks like zero, add, add_assoc, zero_add, add_zero are all extraneous.

Matthew Ballard (Jul 25 2023 at 13:25):

I get a cryptic error if I comment out add_comm

Matthew Ballard (Jul 25 2023 at 13:26):

Can remove nsmul_succ also but the same form of error for natCast_succ

Matthew Ballard (Jul 25 2023 at 13:29):

The one field can also be removed

Matthew Ballard (Jul 25 2023 at 13:33):

Hmm, I broke things further down in the file

Eric Wieser (Jul 25 2023 at 13:54):

Does replacing (by infer_instance : AddCommMonoid (A ⊗[R] B)) with an explicit toSomething := infer_instance field help at all here?

Kevin Buzzard (Jul 25 2023 at 17:22):

Well

class Semiring (α : Type u) extends NonUnitalSemiring α, NonAssocSemiring α, MonoidWithZero α
#align semiring Semiring

so

instance instSemiring : Semiring (A ⊗[R] B) :=
  { (by infer_instance : AddMonoidWithOne (A ⊗[R] B)),
    (by infer_instance : AddCommMonoid (A ⊗[R] B)) with
    zero := 0
    add := (· + ·)
    one := 1
    mul := fun a b => mul a b
    one_mul := one_mul
    mul_one := mul_one
    mul_assoc := mul_assoc
    add_assoc := add_assoc
    zero_add := zero_add
    add_zero := add_zero
    add_comm := add_comm
    nsmul_succ := AddMonoid.nsmul_succ
    natCast_succ := AddMonoidWithOne.natCast_succ
    zero_mul := fun a => show mul 0 a = 0 by rw [map_zero, LinearMap.zero_apply]
    mul_zero := fun a => show mul a 0 = 0 by rw [map_zero]
    -- port note : `left_distrib` and `right_distrib` are proved by `simp` in mathlib3
    left_distrib := fun a b c => show mul a (b + c) = mul a b + mul a c by rw [map_add]
    right_distrib := fun a b c => show mul (a + b) c = mul a c + mul b c
      by rw [map_add, LinearMap.add_apply] }

is in some sense quite chaotic; it's not extending either AddMonoidWithOne or AddCommMonoid.

Eric Wieser (Jul 25 2023 at 17:24):

Yes, you'd need to write toNonUnitalSemiring := { toAddCommMonoid := inferInstance } or worse

Kevin Buzzard (Jul 25 2023 at 17:29):

I am about to embark on this but I've realised that I don't understand Lean 4. How does that even compile? A bunch of fields are missing.

Eric Wieser (Jul 25 2023 at 17:34):

Only one field is missing, smul, right?

Eric Wieser (Jul 25 2023 at 17:34):

Or rather toSMul, which is automatically found via typeclass inference
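
A toy, self-contained illustration of a missing parent field being filled in by typeclass inference (the class names Marker and AddMarker are made up for this sketch, not mathlib's):

class Marker (α : Type _) where
  mark : α → Nat

class AddMarker (α : Type _) [Add α] extends Marker α where
  add_comm' : ∀ x y : α, x + y = y + x

instance : Marker Nat := ⟨fun _ => 0⟩

-- `toMarker` is not written below: since `Marker Nat` is a class with an
-- instance in scope, the missing parent field is filled in by typeclass inference
instance : AddMarker Nat where
  add_comm' := Nat.add_comm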

Matthew Ballard (Jul 25 2023 at 17:35):

I am trying to remove zero and add now. Fixing the resulting errors is simple (delete stuff for the most part) but I should be able to close the following quickly

example {M : Type _} [AddCommMonoid M] (a b c d e f g h : M) (h1 : a + c = e + g) (h2 : b + d = f + h) : a + b + c + d = e + f + g + h := sorry

Kevin Buzzard (Jul 25 2023 at 17:35):

(module question: oh I see -- toDistribMulAction is found by typeclass inference)

Eric Wieser (Jul 25 2023 at 17:36):

@Matthew Ballard, that's add_add_add_comm twice on congr_arg2 HAdd.hAdd h1 h2
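
Spelled out, one way to close that goal looks like this (a sketch; it uses abel for the associativity/commutativity shuffling rather than chaining add_add_add_comm by hand):

import Mathlib.Tactic.Abel

example {M : Type _} [AddCommMonoid M] (a b c d e f g h : M)
    (h1 : a + c = e + g) (h2 : b + d = f + h) :
    a + b + c + d = e + f + g + h := by
  -- regroup, apply the two hypotheses, then regroup again
  calc a + b + c + d = a + c + (b + d) := by abel
    _ = e + g + (f + h) := by rw [h1, h2]
    _ = e + f + g + h := by abel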

Eric Wieser (Jul 25 2023 at 17:37):

How's that relevant here though?

Matthew Ballard (Jul 25 2023 at 17:37):

hx : f ((x * ab₁) ⊗ₜ[R] (c₁ * c₂)) + f ((x * ab₂) ⊗ₜ[R] (c₁ * c₂)) =
  f (x ⊗ₜ[R] c₁) * f (ab₁ ⊗ₜ[R] c₂) + f (x ⊗ₜ[R] c₁) * f (ab₂ ⊗ₜ[R] c₂)
hy : f ((y * ab₁) ⊗ₜ[R] (c₁ * c₂)) + f ((y * ab₂) ⊗ₜ[R] (c₁ * c₂)) =
  f (y ⊗ₜ[R] c₁) * f (ab₁ ⊗ₜ[R] c₂) + f (y ⊗ₜ[R] c₁) * f (ab₂ ⊗ₜ[R] c₂)
⊢ f ((x * ab₁) ⊗ₜ[R] (c₁ * c₂)) + f ((y * ab₁) ⊗ₜ[R] (c₁ * c₂)) +
    (f ((x * ab₂) ⊗ₜ[R] (c₁ * c₂)) + f ((y * ab₂) ⊗ₜ[R] (c₁ * c₂))) =
  f (x ⊗ₜ[R] c₁) * f (ab₁ ⊗ₜ[R] c₂) + f (y ⊗ₜ[R] c₁) * f (ab₁ ⊗ₜ[R] c₂) +
    (f (x ⊗ₜ[R] c₁) * f (ab₂ ⊗ₜ[R] c₂) + f (y ⊗ₜ[R] c₁) * f (ab₂ ⊗ₜ[R] c₂))

Matthew Ballard (Jul 25 2023 at 17:38):

Comes from the ported proof of algEquivOfLinearEquivTripleTensorProduct when I remove add and zero fields from the Semiring instance

Matthew Ballard (Jul 25 2023 at 17:41):

It builds but I suspect it won't help too much.

Kevin Buzzard (Jul 25 2023 at 17:49):

What's the proof? I've got as far as rw [add_add_add_comm, hx, hy]

Matthew Ballard (Jul 25 2023 at 17:51):

ac_rfl closed it for me

Kevin Buzzard (Jul 25 2023 at 17:51):

I tried some more drastic changes and have broken several other things e.g.

@[simp]
theorem lmul'_apply_tmul (a b : S) : lmul' R (a ⊗ₜ[R] b) = a * b :=
  rfl

now gives

function expected at
  lmul' R
term has type
  ?m.1695651 ⊗[R] ?m.1695651 →ₐ[R] ?m.1695651

Matthew Ballard (Jul 25 2023 at 17:51):

Though things seem to have gotten slightly slower (only deleting all the zero and add fields) in Kaehler

Matthew Ballard (Jul 25 2023 at 17:51):

Yeah, I got a bit of that

Matthew Ballard (Jul 25 2023 at 17:51):

Had to specify some implicit arguments to elaborate

Matthew Ballard (Jul 25 2023 at 17:54):

Kevin Buzzard said:

I tried some more drastic changes

Following the happy path as Eric suggested? That is the most sensible thing at this point

Eric Wieser (Jul 25 2023 at 17:57):

@Anatole Dedecker's suggestion of writing everything explicitly might surprisingly also help; that is, add, nsmul := nsmul, npow := npow etc. too

Matthew Ballard (Jul 25 2023 at 17:58):

Worth a shot

Kevin Buzzard (Jul 25 2023 at 17:59):

These changes are good in the sense that I'm deleting porting notes which I remember writing 2 months ago :-)

Kevin Buzzard (Jul 25 2023 at 18:10):

these changes are now also good in that I'm deleting maxHeartbeats in Kaehler :D

Matthew Ballard (Jul 25 2023 at 18:14):

Running tally of the largest one deleted?

Kevin Buzzard (Jul 25 2023 at 18:15):

Small ones get deleted, big ones get smaller. Some definitions are still slow.

Matthew Ballard (Jul 25 2023 at 18:16):

4400000 -> ?

Kevin Buzzard (Jul 25 2023 at 18:17):

3200000 -> 400000 but I can't get rid of it completely.

Matthew Ballard (Jul 25 2023 at 18:18):

Almost an order of magnitude

Kevin Buzzard (Jul 25 2023 at 18:21):

#6134 , the diff looks nice but there still might be more work to be done. Thanks all for the help!

Kevin Buzzard (Jul 25 2023 at 18:30):

It's really difficult to understand the effects of my PR because the trace for the stuff I gave in my verbose initial posts seems to be completely different now. The semiring stuff isn't being unfolded at all, it seems. I am unclear about which of my changes caused this.

Matthew Ballard (Jul 25 2023 at 18:32):

Does it just not have to go as deep into the projections when they are better structured?

Matthew Ballard (Jul 25 2023 at 18:32):

Eliminating maxHeartbeats 2400000 is very satisfying. Nice work!

Kevin Buzzard (Jul 25 2023 at 18:34):

All the [Meta.isDefEq] checks have now disappeared and instead of the ten pieces which the original code broke into, now this happens:

    [] [0.017991s] expected type: { x // x ∈ KaehlerDifferential.ideal R S } →ₗ[S ⊗[R] S]
          Ideal.Cotangent (KaehlerDifferential.ideal R S), term
        (Ideal.toCotangent (ideal R S))
    [Meta.synthInstance] [0.017766s]  AddCommMonoid { x // x ∈ KaehlerDifferential.ideal R S }
    [Meta.synthInstance] [0.012941s]  AddCommMonoid (Ideal.Cotangent (KaehlerDifferential.ideal R S))
    [Meta.synthInstance] [0.071670s]  SMul S { x // x ∈ KaehlerDifferential.ideal R S }
    [Meta.synthInstance] [0.038738s]  Module (S ⊗[R] S) { x // x ∈ KaehlerDifferential.ideal R S }
    [Meta.synthInstance] [0.039233s]  SMul S (Ideal.Cotangent (KaehlerDifferential.ideal R S))
    [Meta.synthInstance] [0.012067s]  Module (S ⊗[R] S) (Ideal.Cotangent (KaehlerDifferential.ideal R S))
    [Meta.synthInstance] [0.507008s]  LinearMap.CompatibleSMul { x // x ∈ KaehlerDifferential.ideal R S }
          (Ideal.Cotangent (KaehlerDifferential.ideal R S)) S (S ⊗[R] S)

Everything is now about 4 times faster and the instance traces terminate far earlier than they did before. I don't know exactly which change caused this.

Kevin Buzzard (Jul 25 2023 at 18:35):

I should perhaps say that the key change is the definition of instSemiring: the definition I was complaining about, which shouldn't be unfolded because chaos ensued when it was. I have no understanding of why changing the definition stops it from being unfolded.

Kevin Buzzard (Jul 25 2023 at 18:38):

The check that the two instances (which are defeq) are defeq is now 0.02 seconds rather than 0.08 seconds (i.e. the two 0.03-second "heavy" rfls are gone) and the trace is this:

[Elab.command] [0.026918s] set_option pp.proofs.withType false in
    example :
        @CommSemiring.toSemiring (S ⊗[R] S) (@CommRing.toCommSemiring (S ⊗[R] S) Algebra.TensorProduct.instCommRing) =
          Algebra.TensorProduct.instSemiring :=
      by rfl 
  [] [0.026866s] example :
          @CommSemiring.toSemiring (S ⊗[R] S) (@CommRing.toCommSemiring (S ⊗[R] S) Algebra.TensorProduct.instCommRing) =
            Algebra.TensorProduct.instSemiring :=
        by rfl 
    [step] [0.016592s] expected type: Prop, term
        @CommSemiring.toSemiring (S ⊗[R] S) (@CommRing.toCommSemiring (S ⊗[R] S) Algebra.TensorProduct.instCommRing) =
          Algebra.TensorProduct.instSemiring 
      [] [0.016578s] expected type: Prop, term
          binrel% Eq
            (@CommSemiring.toSemiring (S ⊗[R] S)
              (@CommRing.toCommSemiring (S ⊗[R] S) Algebra.TensorProduct.instCommRing))
            Algebra.TensorProduct.instSemiring 
        [] [0.012519s] expected type: <not-available>, term
            @CommSemiring.toSemiring (S ⊗[R] S) (@CommRing.toCommSemiring (S ⊗[R] S) Algebra.TensorProduct.instCommRing)

What is binrel%? It definitely wasn't there before.

Mario Carneiro (Jul 25 2023 at 18:41):

it's something to help with getting the elaboration order and coercions on binops right

Kevin Buzzard (Jul 25 2023 at 18:41):

What's a binop?

Mario Carneiro (Jul 25 2023 at 18:41):

it should not be too relevant to this

Mario Carneiro (Jul 25 2023 at 18:41):

x = y

Mario Carneiro (Jul 25 2023 at 18:41):

is a binary operation

Matthew Ballard (Jul 25 2023 at 18:41):

docs#Lean.Elab.Term.Op.elabBinRelCore for reference but yeah probably not pertinent

Kevin Buzzard (Jul 25 2023 at 18:44):

The new definition of instSemiring is this:

instance instMul : Mul (A ⊗[R] B) := ⟨fun a b => mul a b⟩

instance : NonUnitalNonAssocSemiring (A ⊗[R] B) := {
  -- porting note : `left_distrib` and `right_distrib` are proved by `simp` in mathlib3
  -- See https://github.com/leanprover-community/mathlib4/issues/5026
  -- Probably because `mul` is defined to be bi-R-linear and then coerced to function?
  left_distrib := fun a b c => show mul a (b + c) = mul a b + mul a c by simp
  right_distrib := fun a b c => show mul (a + b) c = mul a c + mul b c by simp
  zero_mul := fun a => show mul 0 a = 0 by simp
  mul_zero := fun a => show mul a 0 = 0 by simp
}

instance instNonUnitalSemiring : NonUnitalSemiring (A ⊗[R] B) := {
  mul_assoc := mul_assoc
}

instance instSemiring : Semiring (A ⊗[R] B) :=
  { one_mul := one_mul
    mul_one := mul_one
  }

You can see the old one further up in the thread; it was just the sort of thing we did in mathlib3 all the time -- "yeah it's an AddCommMonoid, yeah it's some other things, here's some other data, here are some proofs, I don't even care what order I am doing things in". I have replaced it with a definition which literally follows the .mk constructor exactly: I define all the earlier instances so typeclass inference can find them and then I prove all the theorems in the right order. For some reason this has made a huge difference.

Kevin Buzzard (Jul 25 2023 at 18:44):

Note that now if someone refactors the definition of Semiring everything might start timing out again.

Kevin Buzzard (Jul 25 2023 at 18:49):

Are we sure that those are the things we want to extend, and in that exact order?

Matthew Ballard (Jul 25 2023 at 18:50):

Is there a difference in the size of the term for instSemiring if you print it to file?

Kevin Buzzard (Jul 25 2023 at 18:52):

With pp.all on? Interesting idea.

This breakage is a bit annoying. In RingTheory.PolynomialAlgebra I've just had to change

    includeLeft a * (1 : A) ⊗ₜ[R] (X : R[X]) ^ n :=

to

    includeLeft (R := R) (A := A) (B := R[X]) a * (1 : A) ⊗ₜ[R] (X : R[X]) ^ n :=

(all three are needed). Why is this happening? docs#Algebra.TensorProduct.includeLeft

Matthew Ballard (Jul 25 2023 at 18:53):

That also happened if you just removed add and zero as explicit fields

Matthew Ballard (Jul 25 2023 at 18:53):

(plus all their related proofs)

Kevin Buzzard (Jul 25 2023 at 19:01):

So this was the original definition:

instance : AddCommMonoid (A ⊗[R] B) := by infer_instance

instance instSemiring : Semiring (A ⊗[R] B) :=
  { (by infer_instance : AddMonoidWithOne (A ⊗[R] B)),
    (by infer_instance : AddCommMonoid (A ⊗[R] B)) with
    zero := 0
    add := (· + ·)
    one := 1
    mul := fun a b => mul a b
    one_mul := one_mul
    mul_one := mul_one
    mul_assoc := mul_assoc
    add_assoc := add_assoc
    zero_add := zero_add
    add_zero := add_zero
    add_comm := add_comm
    nsmul_succ := AddMonoid.nsmul_succ
    natCast_succ := AddMonoidWithOne.natCast_succ
    zero_mul := fun a => show mul 0 a = 0 by rw [map_zero, LinearMap.zero_apply]
    mul_zero := fun a => show mul a 0 = 0 by rw [map_zero]
    -- port note : `left_distrib` and `right_distrib` are proved by `simp` in mathlib3
    left_distrib := fun a b c => show mul a (b + c) = mul a b + mul a c by rw [map_add]
    right_distrib := fun a b c => show mul (a + b) c = mul a c + mul b c
      by rw [map_add, LinearMap.add_apply] }

and recall that our current, possibly arbitrary definition of Semiring is

class Semiring (α : Type u) extends NonUnitalSemiring α, NonAssocSemiring α, MonoidWithZero α

although I fear that this could possibly be changed at some point because of lean4#2325 (unless someone like @Sebastien Gouezel thought really hard about these ringy definitions)

Kevin Buzzard (Jul 25 2023 at 19:04):

So the question is: why does deleting zero and add from that definition (because they can be read off from e.g. AddCommMonoid) cause elaboration to become worse?

Jireh Loreaux (Jul 25 2023 at 19:11):

Kevin, that includeLeft problem seems eerily similar to the one we encountered involving the breakage from docs#Module.Dual.instFunLikeDual. However, as I mentioned then, that instance isn't the only one causing breakage.

Yury G. Kudryashov (Jul 25 2023 at 19:15):

Should we try to change definitions so that projections like Ring.toSemiring, AddGroup.toAddMonoid are just .1?

Jireh Loreaux (Jul 25 2023 at 19:17):

Why would that help?

Matthew Ballard (Jul 25 2023 at 19:22):

Kevin Buzzard said:

With pp.all on? Interesting idea.

Or pp.explicit. How else can one measure the complexity of a term?
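
One crude way (a sketch): count the nodes of the constant's stored value with a small metaprogram. exprSize here is a made-up helper, it ignores sharing, and the name instSemiring assumes the branch where the instance has its short name:

import Mathlib.RingTheory.Kaehler

open Lean Meta

-- made-up helper: count the nodes of an `Expr`, ignoring sharing
partial def exprSize : Expr → Nat
  | .app f a         => exprSize f + exprSize a + 1
  | .lam _ t b _     => exprSize t + exprSize b + 1
  | .forallE _ t b _ => exprSize t + exprSize b + 1
  | .letE _ t v b _  => exprSize t + exprSize v + exprSize b + 1
  | .mdata _ e       => exprSize e + 1
  | .proj _ _ e      => exprSize e + 1
  | _                => 1

#eval show MetaM Unit from do
  let some info := (← getEnv).find? ``Algebra.TensorProduct.instSemiring | throwError "not found"
  let some value := info.value? | throwError "no value stored"
  logInfo m!"instSemiring: {exprSize value} Expr nodes"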

Kevin Buzzard (Jul 25 2023 at 19:26):

Matthew Ballard said:

Is there a difference in the size of the term for instSemiring if you print it to file?

There's a huge difference because I also made Algebra.TensorProduct.instNonUnitalSemiring which is now bearing some of the load. There is also Algebra.TensorProduct.instSemiring.proof_x for x=1,2,3,4, whereas before it went up to x=16.

Matthew Ballard (Jul 25 2023 at 19:28):

If you inline everything does the difference persist?

Kevin Buzzard (Jul 25 2023 at 19:29):

I don't know what that means :-(

Matthew Ballard (Jul 25 2023 at 19:29):

Oh sorry. Just stick everything back into instSemiring but using the more structured presentation

Kevin Buzzard (Jul 25 2023 at 19:30):

#6134 is compiling locally for me. The 4400000 can't come down to 2200000 so there's still some mysteries to be solved.

Matthew Ballard (Jul 25 2023 at 19:31):

Eg

{ toNonUnitalSemiring := { ..
 }
..
}

Kevin Buzzard (Jul 25 2023 at 19:34):

Is this good enough or do you want me to get the Mul in there too?

instance : AddCommMonoid (A ⊗[R] B) := by infer_instance

instance instMul : Mul (A ⊗[R] B) := ⟨fun a b => mul a b⟩

instance instSemiring : Semiring (A ⊗[R] B) :=
  { toNonUnitalSemiring :=
    { toNonUnitalNonAssocSemiring :=
      { left_distrib := fun a b c => show mul a (b + c) = mul a b + mul a c by simp
        right_distrib := fun a b c => show mul (a + b) c = mul a c + mul b c by simp
        zero_mul := fun a => show mul 0 a = 0 by simp
        mul_zero := fun a => show mul a 0 = 0 by simp }
      mul_assoc := mul_assoc }
    one_mul := one_mul
    mul_one := mul_one }

I am scared of the Mul because #print NonUnitalNonAssocSemiring gives me

inductive NonUnitalNonAssocSemiring.{u} : Type u → Type u
number of parameters: 1
constructors:
NonUnitalNonAssocSemiring.mk : {α : Type u} →
  [toAddCommMonoid : AddCommMonoid α] →
    [toMul : Mul α] →
      (∀ (a b c : α), a * (b + c) = a * b + a * c) →
        (∀ (a b c : α), (a + b) * c = a * c + b * c) →
          (∀ (a : α), 0 * a = 0) → (∀ (a : α), a * 0 = 0) → NonUnitalNonAssocSemiring α

so that's two inputs which are supposed to be inferred by typeclass inference and I don't know how to inline that.

Matthew Ballard (Jul 25 2023 at 19:35):

Is that just two fields toAddCommMonoid and toMul?

Kevin Buzzard (Jul 25 2023 at 19:35):

got it:

instance : AddCommMonoid (A ⊗[R] B) := by infer_instance

instance instSemiring : Semiring (A ⊗[R] B) :=
  { toNonUnitalSemiring :=
    { toNonUnitalNonAssocSemiring :=
      { toMul := ⟨fun a b => mul a b⟩
        left_distrib := fun a b c => show mul a (b + c) = mul a b + mul a c by simp
        right_distrib := fun a b c => show mul (a + b) c = mul a c + mul b c by simp
        zero_mul := fun a => show mul 0 a = 0 by simp
        mul_zero := fun a => show mul a 0 = 0 by simp }
      mul_assoc := mul_assoc }
    one_mul := one_mul
    mul_one := mul_one }

Are you expecting inlining to make things better or worse?

Yury G. Kudryashov (Jul 25 2023 at 19:36):

Jireh Loreaux said:

Why would that help?

I think (not sure) that this would help if Lean needs to unify a theorem about [Semiring R] [AddCommMonoid M] [Module R M] with [Ring R] [AddCommGroup M] [Module R M].
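
For example (a sketch of the kind of application being described): add_smul is stated for [Semiring R] [AddCommMonoid M] [Module R M], so using it in a Ring/AddCommGroup context makes Lean go through the Ring → Semiring and AddCommGroup → AddCommMonoid projections:

import Mathlib.Algebra.Module.Basic

-- `add_smul` wants `[Semiring R] [AddCommMonoid M] [Module R M]`; here those
-- instances have to be extracted from the `Ring`/`AddCommGroup` ones
example (R M : Type _) [Ring R] [AddCommGroup M] [Module R M] (r s : R) (x : M) :
    (r + s) • x = r • x + s • x :=
  add_smul r s x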

Matthew Ballard (Jul 25 2023 at 19:36):

I am expecting little performance change but better comparison to the original term

Eric Wieser (Jul 25 2023 at 19:39):

That looks wrong to me, where did the natCast field go?

Eric Wieser (Jul 25 2023 at 19:40):

I don't think it's really manageable to play these silly nested structure games while also playing the "don't forget to explicitly provide the fields that prevent diamonds so they don't get filled with garbage" game

Matthew Ballard (Jul 25 2023 at 19:47):

Can you expand on what those two games are?

Eric Wieser (Jul 25 2023 at 20:00):

The silly nested game is the one this message is playing

Eric Wieser (Jul 25 2023 at 20:01):

The other game is knowing whether you need to write natCast := Nat.cast to avoid Lean using natCast := natCastRec or whatever the default is called

Eric Wieser (Jul 25 2023 at 20:02):

I have no idea what natCast is being set to in the message above, and it's rather worrying that reading the source code can't tell you whether it does what it needs to

Matthew Ballard (Jul 25 2023 at 20:04):

Oh ok. The nesting was only (possibly) useful for comparing term size with the previous one. I don't think there is any reason to pursue beyond this particular experiment.

Kevin Buzzard (Jul 25 2023 at 20:10):

Inlining everything on the branch causes breakage:

./././Mathlib/RingTheory/Kaehler.lean:369:63: error: (deterministic) timeout at 'whnf', maximum number of heartbeats (200000) has been reached (use 'set_option maxHeartbeats <num>' to set the limit)
./././Mathlib/RingTheory/Kaehler.lean:369:55: error: (deterministic) timeout at 'whnf', maximum number of heartbeats (200000) has been reached (use 'set_option maxHeartbeats <num>' to set the limit)
./././Mathlib/RingTheory/Kaehler.lean:369:47: error: (deterministic) timeout at 'whnf', maximum number of heartbeats (200000) has been reached (use 'set_option maxHeartbeats <num>' to set the limit)
./././Mathlib/RingTheory/Kaehler.lean:369:47: error: (deterministic) timeout at 'whnf', maximum number of heartbeats (200000) has been reached (use 'set_option maxHeartbeats <num>' to set the limit)
./././Mathlib/RingTheory/Kaehler.lean:372:89: error: (deterministic) timeout at 'elaborator', maximum number of heartbeats (200000) has been reached (use 'set_option maxHeartbeats <num>' to set the limit)
./././Mathlib/RingTheory/Kaehler.lean:412:2: error: (deterministic) timeout at 'whnf', maximum number of heartbeats (200000) has been reached (use 'set_option maxHeartbeats <num>' to set the limit)
./././Mathlib/RingTheory/Kaehler.lean:407:4: error: (deterministic) timeout at 'whnf', maximum number of heartbeats (200000) has been reached (use 'set_option maxHeartbeats <num>' to set the limit)

so my guess is that it's still faster, but it's less fast than before because the terms suddenly got much bigger.

Matthew Ballard (Jul 25 2023 at 20:30):

If you want to stare at two large terms, https://gist.github.com/mattrobball/4547c55d6a23e079e8c5ca5598ded641

Eric Wieser (Jul 26 2023 at 01:06):

#6141 tests whether uninlining * alone is enough

Matthew Ballard (Jul 26 2023 at 01:11):

Adding the following instances before KaehlerDifferential.endEquivDerivation' makes it instantaneous

instance instS : Module S { x // x ∈ Ideal.cotangentIdeal (KaehlerDifferential.ideal R S) } :=
  @Submodule.module' _ _ _ _ _ _ (Ideal.cotangentIdeal (KaehlerDifferential.ideal R S)) _ _ _ IsScalarTower.right

instance instR : Module R { x // x ∈ Ideal.cotangentIdeal (KaehlerDifferential.ideal R S) } :=
  @Submodule.module' _ _ _ _ _ _ (Ideal.cotangentIdeal (KaehlerDifferential.ideal R S)) _ _ _ IsScalarTower.right

instance instSS : Module (S ⊗[R] S) { x // x ∈ Ideal.cotangentIdeal (KaehlerDifferential.ideal R S) } :=
  @Submodule.module' _ _ _ _ _ _ (Ideal.cotangentIdeal (KaehlerDifferential.ideal R S)) _ _ _ IsScalarTower.right

instance : @IsScalarTower R S { x // x ∈ Ideal.cotangentIdeal (KaehlerDifferential.ideal R S) } _ (instS R S).toSMul (instR R S).toSMul :=
  @Submodule.isScalarTower' R _ _ _ _ _ (Ideal.cotangentIdeal (KaehlerDifferential.ideal R S)) _ _ _ _ _ _ IsScalarTower.right _ IsScalarTower.right

instance : @IsScalarTower S (S ⊗[R] S) { x // x ∈ Ideal.cotangentIdeal (KaehlerDifferential.ideal R S) } _ (instSS R S).toSMul (instS R S).toSMul :=
  @Submodule.isScalarTower' S _ _ _ _ _ (Ideal.cotangentIdeal (KaehlerDifferential.ideal R S)) _ _ _ _ _ _ IsScalarTower.right _ IsScalarTower.right

instance : @LinearMap.CompatibleSMul (Ω[S⁄R]) { x // x ∈ Ideal.cotangentIdeal (KaehlerDifferential.ideal R S) } _ _ S (S ⊗[R] S) _ _ _ (instS R S).toSMul _ :=
  @LinearMap.IsScalarTower.compatibleSMul (Ω[S⁄R]) _ _ _ S (S ⊗[R] S) _ _ _ _ _ (instS R S).toSMul _ _

Matthew Ballard (Jul 26 2023 at 01:43):

KaehlerDifferential.endEquiv is also fine without heartbeat bumps given these instances

Matthew Ballard (Jul 26 2023 at 02:17):

The other two remaining bumps can be eliminated more pleasantly. A heartbeat-bump-free version is here. (I would push to a PR but I'm not sure who will win.)

Eric Wieser (Jul 26 2023 at 07:57):

Eric Wieser said:

#6141 tests whether uninlining * alone is enough

This is slightly faster!

Kevin Buzzard (Jul 26 2023 at 07:57):

I'm happy to go with that one

Eric Wieser (Jul 26 2023 at 07:58):

I wonder if we can get away with changing even less

Eric Wieser (Jul 26 2023 at 07:59):

Was it previously trying to unify a whole semiring instance when checking that the *s were compatible?

Kevin Buzzard (Jul 26 2023 at 08:02):

It was previously trying to unify a whole semiring instance when checking things of the form ideal (S \otimes[R] S) = ideal (S \otimes[R] S)

Kevin Buzzard (Jul 26 2023 at 08:18):

Matthew Ballard said:

Adding the following instances before KaehlerDifferential.endEquivDerivation' makes it instantaneous

For me it still takes 45 seconds to compile endEquivDerivation' (down from 180 seconds so still a huge improvement)

Eric Wieser (Jul 26 2023 at 08:23):

Kevin Buzzard said:

I'm happy to go with that one

I've closed your PR and pushed some tiny cleanups to my one

Kevin Buzzard (Jul 26 2023 at 08:27):

Something fishy is going on:

set_option maxHeartbeats 400000 in
set_option synthInstance.maxHeartbeats 320000 in
instance i : Module S { x // x ∈ Ideal.cotangentIdeal (KaehlerDifferential.ideal R S) } := inferInstance

instance instS : Module S { x // x ∈ Ideal.cotangentIdeal (KaehlerDifferential.ideal R S) } :=
  @Submodule.module' _ _ _ _ _ _ (Ideal.cotangentIdeal (KaehlerDifferential.ideal R S)) _ _ _ IsScalarTower.right

set_option maxHeartbeats 400000 in
example : i = instS := rfl

Riccardo Brasca (Jul 26 2023 at 08:27):

Can you summarize the changes you made? This can be very useful also in other files I guess.

Kevin Buzzard (Jul 26 2023 at 08:28):

If you look at the definition of docs#Algebra.TensorProduct.instSemiring on master it creates the instance in a way which is normal for Lean 3: just throw all the fields together in any way you like. We can't do this sort of thing any more.

Eric Wieser (Jul 26 2023 at 08:30):

I think the answer might actually just be "always define the data fields in tiny instances first"

Kevin Buzzard (Jul 26 2023 at 08:30):

Now we have to look at the actual definition of Semiring (it's no longer just a huge list of fields, it's something which extends NonUnitalSemiring first and then extends some other things) and if possible you have to make instances which conform to the definition somehow (in a way which Eric understands better than me)

Eric Wieser (Jul 26 2023 at 08:31):

It's not clear to me that the changes inside the Semiring instance mattered; extracting the Mul instance seems like it might have been the important bit

Eric Wieser (Jul 26 2023 at 08:31):

I do wonder if we should try the FlatHack branch again...

Kevin Buzzard (Jul 26 2023 at 08:34):

For all this module stuff which Matt was messing with last night, do you think there's a similar issue with SMul?

Riccardo Brasca (Jul 26 2023 at 08:36):

So in practice it's again the same issue as in #5941?

Kevin Buzzard (Jul 26 2023 at 08:36):

No I don't think so.

Kevin Buzzard (Jul 26 2023 at 08:36):

See "It's not about paths in the instance graph!" at the top of my second post.

Eric Wieser (Jul 26 2023 at 08:38):

My speculation: #5941 is about trying to encourage Lean to follow first parents in typeclasses. This only does any good if the instances actually pay special attention to how they construct first parents

Kevin Buzzard (Jul 26 2023 at 08:39):

#5941 is all about making Lean find the "canonical" path from CommRing to Semiring. But this is a losing battle. If we arbitrarily decree that CommRing -> Ring -> Semiring is canonical and CommRing -> CommSemiring -> Semiring is not, then no matter how we change priorities this won't help a situation where Lean needs to apply a lemma which is true for CommSemirings so necessarily needs the CommSemiring instance.

Eric Wieser (Jul 26 2023 at 08:39):

But paying attention to first parents in instances is a maintainability nightmare if you ever want to add a new base class or re-order.

Kevin Buzzard (Jul 26 2023 at 08:40):

This is why I opened this thread in lean4 and not mathlib4 -- it seems like this is another consequence of the typeclass inference redesign which we're only just discovering. In Lean 3 we didn't have to worry about any of this, however there were just beginning to be hints that flat structures were causing other problems (although nothing was ever debugged and analysed correctly AFAIK because we knew we were switching to Lean 4 at that time)

Eric Wieser (Jul 26 2023 at 08:41):

then no matter how we change priorities this won't help

It's still beneficial if we can ensure in this example that the embedded toAddCommMonoid fields are trivially defeq (which also isn't always possible, but the point is that there is a sliding scale of badness which we can still attempt to minimize even if we can't set it to zero).

Kevin Buzzard (Jul 26 2023 at 08:42):

Eric Wieser said:

But paying attention to first parents in instances is a maintainability nightmare if you ever want to add a new base class or re-order.

In particular Gouezel claimed that our library was already non-coherent here in the sense that some definitions in algebra of the form class X extends Y, Z should be class X extends Z, Y. None of this seemed to matter in the Lean 3 days but it seems to be super-important now.

Kevin Buzzard (Jul 26 2023 at 08:43):

If already our definition of CommMonoid needs changing then how many other definitions in algebra need changing? What if we decide to change Semiring? Then we have to start again with Kaehler differentials because everything might start timing out again.

Eric Wieser (Jul 26 2023 at 08:44):

If we make everything flat then the order stops mattering again

Kevin Buzzard (Jul 26 2023 at 08:44):

and then don't we run into the other problems which Leo warned against when he made the change to non-flat structures?

Eric Wieser (Jul 26 2023 at 08:45):

We don't know if those are as bad as the current problems

Eric Wieser (Jul 26 2023 at 08:46):

I also think those warnings predate structure eta

Eric Wieser (Jul 26 2023 at 08:46):

I imagine that flat structures make the worst case better and the best case worse

Mario Carneiro (Jul 26 2023 at 09:20):

I don't see why the worst case of nested structures would be much different from the worst case of flat structures

Kevin Buzzard (Jul 26 2023 at 09:25):

Going back to the slow instances which Matt made, here's a conjecture: Module extends DistribMulAction first, and the slow instances use Function.Surjective.module. Eric what do you think of that definition, and in particular of #6144?

Eric Wieser (Jul 26 2023 at 09:28):

#6144 looks like a good change to me

Eric Wieser (Jul 26 2023 at 09:29):

Mario Carneiro said:

I don't see why the worst case of nested structures would be much different from the worst case of flat structures

My speculation is that the nested structure version has more places for eta expansion to create a diversion

Mario Carneiro (Jul 26 2023 at 09:29):

once everything is eta expanded it's just a flat structure though

Eric Wieser (Jul 26 2023 at 09:30):

I guess there's a rule in place that prevents eta expanding Prod.mk _ _ to Prod.mk (Prod.mk _ _).a (Prod.mk _ _).b?

Mario Carneiro (Jul 26 2023 at 09:31):

more like, it's not literal eta expansion, it is reducing a =?= b to a.x =?= b.x and a.y =?= b.y
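
A minimal example of the structure-eta fact driving this: any pair is definitionally equal to the Prod.mk of its projections, which is why the unifier can compare componentwise:

-- structure eta: `p` and `(p.1, p.2)` are definitionally equal
example (p : Nat × Nat) : p = (p.1, p.2) := rfl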

Patrick Massot (Jul 26 2023 at 09:34):

Eric Wieser said:

But paying attention to first parents in instances is a maintainability nightmare if you ever want to add a new base class or re-order.

The answer to this is well-known, right? Writing a hierarchy by hand is prehistoric formalized mathematics. Even mathcomp, which has much smaller hierarchies, switched to using a tool in order to do it in a principled and extendable way.

Mario Carneiro (Jul 26 2023 at 09:37):

we have a tool, it's called instance and structure instance constructors. We just need to figure out what the rules are on good hierarchy design and implement them in the existing tools

Patrick Massot (Jul 26 2023 at 09:38):

This is a different level. What you are talking about are low-level tools that seem to be too delicate to use by hand.

Mario Carneiro (Jul 26 2023 at 09:38):

?

Eric Wieser (Jul 26 2023 at 09:38):

Do the Coq tools only help with building the hierarchy, or also with instantiating the hierarchy for concrete types? The problem we're seeing here is with the latter.

Patrick Massot (Jul 26 2023 at 09:39):

Eric, did you at least look at the name of the tool?

Eric Wieser (Jul 26 2023 at 09:40):

Yes, I've skimmed the hierarchy builder paper. Edit: I see HB.instance is what I was asking for.

Mario Carneiro (Jul 26 2023 at 09:40):

I don't see the HB interface (i.e. what the user types) as significantly different from what we already do. The main difference is that HB has custom constructors ("factories"), which do not have fancy syntax in our version

Patrick Massot (Jul 26 2023 at 09:40):

Mario, hierarchy builder does not replace the Coq commands that write the canonical structures, it writes them from a higher-level description. We could have something similar generating class and instance declarations.

Mario Carneiro (Jul 26 2023 at 09:41):

what do you mean by a "higher level description" though? It looks pretty much the same as a regular class definition to me

Mario Carneiro (Jul 26 2023 at 09:41):

and TBH I don't think that the level of description of classes or instances is all that far from what we want in the first place

Patrick Massot (Jul 26 2023 at 09:43):

No, it clearly expands to a lot more code than what is written by users.

Mario Carneiro (Jul 26 2023 at 09:43):

The things that need fixing are all in the backend, not the frontend. instance and class and extends etc generate some definitions and search for things in a certain way that is suboptimal

Mario Carneiro (Jul 26 2023 at 09:43):

but that can be fixed without changing the syntax of any of the above

Patrick Massot (Jul 26 2023 at 09:44):

That's a different topic. Fixing the backend would of course be nicer, but it isn't clear to me that this "fix" is possible.

Mario Carneiro (Jul 26 2023 at 09:44):

we don't know what the "fix" even is yet, we're still playing around with tweaks

Eric Wieser (Jul 26 2023 at 09:45):

It's not clear to me that the problems in the backend can be plastered over with a clever frontend either

Mario Carneiro (Jul 26 2023 at 09:45):

once we have a reliable way to turn a bad definition into a good one, we can implement that in the core tool and then we won't have to worry about it anymore

Patrick Massot (Jul 26 2023 at 09:45):

Mario Carneiro said:

we don't know what the "fix" even is yet, we're still playing around with tweaks

This is what I meant by "it isn't clear to me that this fix is possible".

Eric Wieser (Jul 26 2023 at 09:46):

As a simple example of where a clever frontend can't help, if we go all in on tweaking instance priorities everywhere, we're eventually going to get stuck because a new file needs to add an instance with (integer?) priority between 37 and 38, but the clever frontend that generated those priorities already ran in a previous file.

Mario Carneiro (Jul 26 2023 at 09:46):

certainly I have doubts that we can transplant the HB backend to lean and get a performance improvement

Mario Carneiro (Jul 26 2023 at 09:47):

I don't think we have ever had issues running out of integers for priorities

Patrick Massot (Jul 26 2023 at 09:47):

I'm not saying that the tool needs to be HB. I'm saying we'll need a tool. Even having to know what should be parameters and what should be extended is already mathematically irrelevant and should be handled by a tool, together with all the forgetful inheritance tricks.

Mario Carneiro (Jul 26 2023 at 09:48):

What I'm saying is that we already have a tool, but it is not yet calibrated properly

Patrick Massot (Jul 26 2023 at 09:48):

Eric, the tool could be run at any time and change those 37 and 38 to 370 and 380 without any impact on user code.

Eric Wieser (Jul 26 2023 at 09:49):

But now the tool has to run globally and not locally, which was not how I understood HB worked

Kevin Buzzard (Jul 26 2023 at 09:49):

And the calibration currently consists of me spending all of Sunday staring at 200000 line instance traces?

Mario Carneiro (Jul 26 2023 at 09:49):

yes :)

Kevin Buzzard (Jul 26 2023 at 09:49):

I'll put the kettle on

Mario Carneiro (Jul 26 2023 at 09:50):

I don't think we are yet at the point where we have "rules of thumb" that produce good results wherever they are applied

Mario Carneiro (Jul 26 2023 at 09:51):

there are some things like having a consistent order of parents, which we might implement in a linter soon, but it's not enough to solve all the defeq issues

Mario Carneiro (Jul 26 2023 at 09:52):

obviously we want to work toward not having to care about this and having tools either tell you what to do or fix up whatever you wrote to make it do the right thing

Eric Wieser (Jul 26 2023 at 09:58):

Eric Wieser said:

If we make everything flat then the order stops mattering again

I've resolved the conflicts in #3171; maybe the unification changes in core that solved lean4#2074 make it behave better than it did last time...

Eric Rodriguez (Jul 26 2023 at 10:13):

I thought some of the issue was that we needed specific instance priorities for specific paths to prefer paths; EricW has mentioned some ways to implement this (priorities should be additive over the whole path instead of just taking the last branch's priority, for example). But I could definitely see a HierarchyBuilder-esque front end helping to figure out these priorities quickly, because right now when you declare something like AddMonoidWithOne, the declaration of Field is a thousand files away and so Lean cannot take that sort of stuff into account [if it matters, if it causes a preferred path]

Eric Rodriguez (Jul 26 2023 at 10:14):

And sometimes structures will be declared in separate places that won't be able to be kept in sync; e.g. I'd expect the files declaring LinearOrderedAddCommGroup and Field to not import each other, so how could they know?

Kevin Buzzard (Jul 26 2023 at 10:26):

I thought some of the issue was that we needed specific instance priorities for specific paths to prefer paths

That is part of the problem, but it's not what caused the 44% speedup in Kaehler: that issue comes down to flat vs non-flat structures and how they're built (and in particular not wanting to take them apart, when pp.all says they're 2500 lines long once taken apart)

Eric Wieser (Jul 26 2023 at 10:28):

The Kaehler one is even more mysterious than that; the difference appears to be between

instance foo : Mul X := {mul := _}
instance : Semiring X := { toMul := foo }

and

instance : Semiring X := { toMul := {mul := _} }

Eric Wieser (Jul 26 2023 at 10:29):

(what you describe is almost certainly relevant, but in the message above the first version works and the second is too slow)

Matthew Ballard (Jul 26 2023 at 11:03):

Kevin Buzzard said:

Matthew Ballard said:

Adding the following instances before KaehlerDifferential.endEquivDerivation' makes it instantaneous

For me it still takes 45 seconds to compile endEquivDerivation' (down from 180 seconds so still a huge improvement)

I originally made this change. I saw that (instantaneous) behaviour when looking at the info view, but reverted it in the last commit to the branch after things built fine.

Kevin Buzzard (Jul 26 2023 at 11:07):

lol I have a version of that change open in the Lean file in front of me right now :-)

Matthew Ballard (Jul 26 2023 at 11:10):

Fun experience is trying to #synth SMul after declaring one of the Module instances

Kevin Buzzard (Jul 26 2023 at 11:16):

We're coming to the conclusion that making the SMul instance explicitly before making all the other instances is a good first move, right?

Matthew Ballard (Jul 26 2023 at 11:18):

Don’t unfold the data fields unless you have to?

Eric Wieser (Jul 26 2023 at 11:29):

Kevin Buzzard said:

We're coming to the conclusion that making the SMul instance explicitly before making all the other instances is a good first move, right?

That's my one-data-point hunch

Kevin Buzzard (Jul 26 2023 at 11:37):

Eric Wieser said:

The Kaehler one is even more mysterious than that; the difference appears to be between

instance foo : Mul X := {mul := _}
instance : Semiring X := { toMul := foo }

and

instance : Semiring X := { toMul := {mul := _} }

Isn't this just to do with the fact that the more you package stuff up, the smaller the terms get, so the less time rfl takes to prove syntactic equality?

Matthew Ballard (Jul 26 2023 at 11:41):

Eric Wieser said:

The Kaehler one is even more mysterious than that; the difference appears to be between

instance foo : Mul X := {mul := _}
instance : Semiring X := { toMul := foo }

and

instance : Semiring X := { toMul := {mul := _} }

Wait I thought the problematic one was where mul was declared at “top level” ?

Matthew Ballard (Jul 26 2023 at 11:45):

Another question: why was Lean 3 so good with the “spaghetti” construction of instances?

Eric Wieser (Jul 26 2023 at 11:48):

To be clear, the first example in that message is the one we're now using

Eric Wieser (Jul 26 2023 at 11:49):

I'd like to test the hypothesis that Lean 3 was so good here because of flat structures; but #3171 is now stuck on some Qq stuff I don't understand

Matthew Ballard (Jul 26 2023 at 11:53):

Was everything in Lean 3 a flat structure?

Eric Wieser (Jul 26 2023 at 11:55):

All the basic algebra was. Module and normed algebra classes weren't

Kevin Buzzard (Jul 26 2023 at 11:55):

Matthew Ballard said:

Kevin Buzzard said:

Matthew Ballard said:

Adding the following instances before KaehlerDifferential.endEquivDerivation' makes it instantaneous

For me it still takes 45 seconds to compile endEquivDerivation' (down from 180 seconds so still a huge improvement)

I originally made this change. I saw that (instantaneous) behaviour when looking at the info view, but reverted it in the last commit to the branch after things built fine.

So we have to fix StarModule.decomposeProdAdjoint and then we can make Kaehler maxHeartbeat-free?

Matthew Ballard (Jul 26 2023 at 11:57):

Do any egregious spaghetti constructions of module instances come to mind?

Eric Wieser (Jul 26 2023 at 11:58):

In lean3 you mean? I don't think it mattered so much; the key thing was that the [semiring R] argument inside the module type itself was always easy to resolve

Matthew Ballard (Jul 26 2023 at 11:58):

It should be bump-free on that branch as is. (With the silly instances)

Eric Wieser (Jul 26 2023 at 11:59):

Is that branch based off Kevin's one that just landed in master?

Matthew Ballard (Jul 26 2023 at 11:59):

Eric Wieser said:

In lean3 you mean? I don't think it mattered so much; the key thing was that the [semiring R] argument inside the module type itself was always easy to resolve

I guess the same was true with normed algebra?

Matthew Ballard (Jul 26 2023 at 11:59):

Kevin’s. I only noticed yours after the fact

Eric Wieser (Jul 26 2023 at 11:59):

NormedAlgebra actually ran into a lot of trouble in those areas, presumably because it consumed normed_field which was not flat

Eric Wieser (Jul 26 2023 at 11:59):

Matthew Ballard said:

Kevin’s. I only noticed yours after the fact

They're pretty similar so it shouldn't matter

Matthew Ballard (Jul 26 2023 at 12:00):

Ojalá

Matthew Ballard (Jul 26 2023 at 12:04):

Looking at the difference in terms in the two cases, the constructors for the data fields are exposed in the spaghetti construction but not for the outlined one. Does this mean unification bumps into them more often?

Matthew Ballard (Jul 26 2023 at 12:29):

Eric Wieser said:

Matthew Ballard said:

Kevin’s. I only noticed yours after the fact

They're pretty similar so it shouldn't matter

It breaks a fix at the end of the file, in KaehlerDifferential.quotKerTotalEquiv. More precisely, that declaration is pushed over the timeout limit again after merging master, when it wasn't before

Matthew Ballard (Jul 26 2023 at 12:32):

I created #6149. Feel free to push to it. Stepping away from the computer for a bit.

Eric Wieser (Jul 26 2023 at 12:32):

Eric Wieser said:

Is that branch based off Kevin's one that just landed in master?

Unfortunately the one that landed in master was not enough to unblock #6094

Matthew Ballard (Jul 26 2023 at 12:40):

#6149 will help with that

Matthew Ballard (Jul 26 2023 at 12:42):

Though I would prefer a fix where Lean figures these instances out itself

Matthew Ballard (Jul 26 2023 at 13:52):

Matthew Ballard said:

#6149 will help with that

The "helper" instances remove the 7000000 bump

Eric Wieser (Jul 26 2023 at 14:08):

Remove it entirely or leave it at 4400000?

Matthew Ballard (Jul 26 2023 at 14:10):

Entirely

Matthew Ballard (Jul 26 2023 at 14:12):

I didn't push because I am holding out for a better fix but perhaps you just want to go

Eric Wieser (Jul 26 2023 at 14:17):

Do you mind if I push a cleanup to #6149?

Matthew Ballard (Jul 26 2023 at 14:17):

Be my guest

Matthew Ballard (Jul 26 2023 at 14:18):

Oh I thought I did that!

Matthew Ballard (Jul 26 2023 at 14:18):

Let me push

Eric Wieser (Jul 26 2023 at 14:28):

I pushed one more cleanup

Matthew Ballard (Jul 26 2023 at 14:38):

Doubled checked that instance construction pattern from #6134 indeed fixes the timeouts in #6149 including allowing one to restore a @[simps!]

Eric Wieser (Jul 26 2023 at 14:42):

Eric Wieser said:

Unfortunately the one that landed in master was not enough to unblock #6094

bumping the heartbeat counts by 2.5% was enough to make this work, so it looks like that PR no longer makes things meaningfully worse

Matthew Ballard (Jul 26 2023 at 14:45):

What changed?

Eric Wieser (Jul 26 2023 at 14:46):

The 700000 was a relic from before the change that landed in master

Matthew Ballard (Jul 26 2023 at 14:46):

The change?

Eric Wieser (Jul 26 2023 at 14:47):

rss-bot said:

fix: redefine semiring instance on tensor product (mathlib4#6141)

By providing the Mul instance up front, this seems to make future typeclass search much easier.

This comes at the expense of causing minor elaboration problem in various tensor definitions, which now require the implicit type arguments to be provided explicitly.

This also simplifies some proofs, removing a porting note.

Co-authored-by: Eric Wieser <wieser.eric@gmail.com>

Authored-by: kbuzzard (commit)

Matthew Ballard (Jul 26 2023 at 15:56):

It doesn't look like #6144 makes much of a dent in the bumps for KaehlerDifferential.endEquivDerivation' (as in master).

Matthew Ballard (Jul 26 2023 at 16:54):

Doesn't seem to help directly but cleaned up cotangentIdeal #6156

Matthew Ballard (Jul 26 2023 at 17:58):

set_option profiler true in
set_option trace.Meta.isDefEq true in
set_option maxHeartbeats 0 in
set_option synthInstance.maxHeartbeats 0 in
set_option trace.Meta.synthInstance true in
#synth SMul S (KaehlerDifferential.ideal R S).cotangentIdeal

dumps to a 4m line file

Kevin Buzzard (Jul 26 2023 at 18:34):

I just ran the profiler on a theorem in GroupCohomology.Basic and also got a 4m line file :-)

Kevin Buzzard (Jul 26 2023 at 18:37):

But I think your problem is worse than mine.

Matthew Ballard (Jul 26 2023 at 19:04):

pp.explicit is cheating :)

Matthew Ballard (Jul 26 2023 at 20:09):

Here is one point. There seems to be lots of unifying of two different AddMonoid structures on a tensor product: one coming from Ideal.Quotient.ring and the other from Submodule.Quotient.addCommGroup

Matthew Ballard (Jul 26 2023 at 20:44):

Is this instance suspicious?

Kevin Buzzard (Jul 26 2023 at 21:23):

I would say so!

Eric Wieser (Jul 26 2023 at 21:23):

Any use of the with keyword is likely a performance issue

Eric Wieser (Jul 26 2023 at 21:32):

Though the results at #6144 are concerning

Matthew Ballard (Jul 26 2023 at 21:55):

Wall clock improved

Kevin Buzzard (Jul 26 2023 at 21:59):

I have some ModuleCat observations but maybe I'll move to #mathlib4

Eric Wieser (Jul 26 2023 at 22:00):

It probably doesn't help that this fails in mathlib4 master:

example : (instRing : Ring (A ⊗[R] B)).toAddCommGroup = addCommGroup := rfl

(fixed in #6162)

Eric Wieser (Jul 26 2023 at 23:27):

This is an example of what I meant by

Eric Wieser said:

I don't think it's really manageable to play these silly nested structure games while also playing the "don't forget to explicitly provide the fields that prevent diamonds so they don't get filled with garbage" game

In lean 3 our defense against this was "just .. every possible parent instance you can think of and it will probably be fine". It sounds like in Lean4 we're learning "don't with any parents at all if you can avoid it", so now when defining a structure you have to very carefully copy across only the fields that aren't found from parent classes.

Matthew Ballard (Jul 27 2023 at 00:28):

#6162 seems like a good speed up. Previously

#synth Module S (S ⊗[R] S ⧸ KaehlerDifferential.ideal R S ^ 2)

took ~1.3s on my branch and now it is ~0.8s which moves it from failure to success without a bump

Matthew Ballard (Jul 27 2023 at 01:01):

The version of KaehlerDifferential.endEquivDerivation' on master goes from

set_option maxHeartbeats 4400000 in
set_option synthInstance.maxHeartbeats 1500000

to

set_option maxHeartbeats 700000 in
set_option synthInstance.maxHeartbeats 200000

Matthew Ballard (Jul 27 2023 at 01:01):

So we are getting there

Matthew Ballard (Jul 27 2023 at 01:25):

For

SMul S <| S ⊗[R] S ⧸ KaehlerDifferential.ideal R S ^ 2

we have three paths

  • MulAction.toSMul
  • Submodule.Quotient.hasSmul' (KaehlerDifferential.ideal R S ^ 2)
  • SMulZeroClass.toSMul

For

AddCommMonoid <| S ⊗[R] S ⧸ KaehlerDifferential.ideal R S ^ 2

we have two paths

  • NonUnitalNonAssocSemiring.toAddCommMonoid
  • AddCommGroup.toAddCommMonoid

Lean needs to check, many times over, that these are defeq, which eats up time.
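
One way to see that cost directly is to state one of those defeqs as an explicit rfl (a minimal sketch, assuming the same setup as the MWE earlier in the thread; the rfl itself may be slow or hit the heartbeat limit, which is exactly the point):

import Mathlib.RingTheory.Kaehler

open scoped TensorProduct

universe u v

variable (R : Type u) (S : Type v) [CommRing R] [CommRing S] [Algebra R S]

-- the two AddCommMonoid paths listed above are defeq, but checking that is the work Lean
-- keeps repeating during synthesis
example :
    (NonUnitalNonAssocSemiring.toAddCommMonoid :
      AddCommMonoid (S ⊗[R] S ⧸ KaehlerDifferential.ideal R S ^ 2)) =
    AddCommGroup.toAddCommMonoid :=
  rfl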

Matthew Ballard (Jul 27 2023 at 02:38):

This is probably not in the right order

structure AlgEquiv (R : Type u) (A : Type v) (B : Type w) [CommSemiring R] [Semiring A] [Semiring B]
  [Algebra R A] [Algebra R B] extends A ≃ B, A ≃* B, A ≃+ B, A ≃+* B where

Kevin Buzzard (Jul 27 2023 at 06:39):

Yes I was looking at that declaration yesterday. Right now with non-flat structures and the algorithm for defeq we are being forced to think very hard about exactly how to present these definitions and that one probably isn't right (we generally want the more complex things first I think)

Eric Wieser (Jul 27 2023 at 08:42):

Wow, #6162 wasn't even intended to be a performance fix!

Eric Wieser (Jul 27 2023 at 08:43):

Changing the order of base structures affects the API of those types (the coe_mk lemmas have to change); so I don't think we want to start doing that for optimization reasons

Eric Wieser (Jul 27 2023 at 08:44):

(classes aren't so bad because the API is "it magically does the right thing")

Kevin Buzzard (Jul 27 2023 at 08:46):

Eric Wieser said:

Wow, #6162 wasn't even intended to be a performance fix!

How are you seeing performance changes? I can only figure out how to compare the benchmark against e.g. an old mathlib master from 5 days ago, which was before there was a lot of action coming from this thread.

Eric Wieser (Jul 27 2023 at 08:46):

Yeah, I think the benchmarking can't keep up

Kevin Buzzard (Jul 27 2023 at 08:47):

Kevin Buzzard said:

I have some ModuleCat observations but maybe I'll move to #mathlib4

I feel like we've got Kaehler into a much better state so I started on group cohomology but I got stuck here.

Eric Wieser (Jul 27 2023 at 08:58):

This comparison (against our last fix) suggests that the time for Kaehler has reduced by 60% thanks to #6162.

Eric Wieser (Jul 27 2023 at 08:58):

(the changes in other files are presumably due to unrelated changes in master)

Matthew Ballard (Jul 27 2023 at 11:24):

Eric Wieser said:

Wow, #6162 wasn't even intended to be a performance fix!

At this point there are no single show-stoppers, but slow defeq checks are layered on top of each other and repeated over and over. I am optimistic that changes like this and others will ultimately bring things into the normal range. I expect they sit deeper in the library, so they will unlock speed-ups more broadly.

Though bundled categories are a bit of an odd duck

Matthew Ballard (Jul 27 2023 at 11:29):

I’ve been collecting incidents here

Matthew Ballard (Jul 27 2023 at 11:35):

There are still plenty of porting notes that describe steering tc synthesis in there. It is reasonable to turn the wheel over to Lean for many of these. Hopefully, things will be more coherent with Lean driving

Notification Bot (Jul 27 2023 at 12:51):

15 messages were moved from this topic to #lean4 > speedcenter runs on master by Eric Wieser.

Matthew Ballard (Jul 27 2023 at 13:46):

Eric Wieser said:

This comparison (against our last fix) suggests that the time for Kaehler has reduced by 60% thanks to #6162.

Did #6141 introduce the diamond? If so, I would guess that #6162 is unleashing the full speed up of the change in #6141.

Eric Wieser (Jul 27 2023 at 13:47):

No; the diamond itself already existed in mathlib3, but the regression here was introduced during the port

Matthew Ballard (Jul 27 2023 at 13:49):

We had an explicit natCast_succ := AddMonoidWithOne.natCast_succ before but no explicit natCast_zero

Eric Wieser (Jul 27 2023 at 13:49):

The natCast and intCast diamonds have been there since mathlib3 (but I expect are irrelevant because we never used those algebras)

Eric Wieser (Jul 27 2023 at 13:51):

Here's the porting regression (specifically this commit)

Eric Wieser (Jul 27 2023 at 13:53):

So I guess @Kevin Buzzard's sunday of looking at traces is penance for having his name on the PR that (mis-)ported the file!

Matthew Ballard (Jul 27 2023 at 14:10):

I made #6178 to change other heartbeat limits to more representative numbers.

These bumps won't last, as they will go away with #6149 one way or another. In the meantime, I think there is value in having the values reflect reality.

Eric Wieser (Jul 27 2023 at 14:24):

Should we have a bump_heartbeats N in foo command that means something like "count_heartbeats, but error if the result differs from N by more than 20%"?

Eric Wieser (Jul 27 2023 at 14:25):

(it could also limit how long it tries to 3*N or something)
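
Roughly something like this, maybe (a sketch only: bump_heartbeats is hypothetical, not an existing command, and the exact heartbeat units and elaborator plumbing would need checking):

import Lean

open Lean Elab Command in
elab "bump_heartbeats " n:num " in " cmd:command : command => do
  -- note: IO.getNumHeartbeats is in raw units, which (I believe) are 1000x the units used by
  -- the maxHeartbeats option, so a real version would have to scale accordingly
  let start ← IO.getNumHeartbeats
  elabCommand cmd
  let used := (← IO.getNumHeartbeats) - start
  let expected := n.getNat
  if used * 10 > expected * 12 || used * 10 < expected * 8 then
    logWarning m!"used {used} heartbeats, expected roughly {expected} (more than 20% off)"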

Matthew Ballard (Jul 27 2023 at 14:27):

If we are making wish lists, I would like a config for count_heartbeats that lets me specify whether bumping the limit for synthesizing instances is desired.

Eric Wieser (Jul 27 2023 at 14:30):

I suspect it might be hard to count those

Matthew Ballard (Jul 27 2023 at 14:31):

Can we catch the error failed to synthesize...?

Kevin Buzzard (Jul 27 2023 at 14:38):

My impression is that failure to have enough heartbeats in a declaration can result in more than one kind of error.

Matthew Ballard (Jul 27 2023 at 14:40):

In tuning the limits, I bump synthInstance.maxHeartbeats if I see a synthesis failure. Otherwise, I bump maxHeartbeats. Seems to work so far

Matthew Ballard (Jul 27 2023 at 15:55):

It looks like

chore(linear_algebra/tensor_product): forward-port leanprover-community/mathlib#19143 (mathlib4#6094)

did slow some things down a bit. I had to bump limits after merging master.

Eric Wieser (Jul 27 2023 at 16:14):

Yeah, I kind of expected it to: it will be interesting to see the benchmark results

Eric Wieser (Jul 27 2023 at 16:15):

(though I think we will have to ask Sebastian Ullrich to queue some of the commits to actually get clear results)

Eric Wieser (Jul 27 2023 at 16:30):

This now shows both steps, although it's possible the lean bump changed things too (we'll know in another 2.5 hours)

Anne Baanen said:

Kaehler comparison graph

Kevin Buzzard (Jul 27 2023 at 17:24):

@Eric Wieser do you understand why the change in your #6141 PR

6141change.png

beats my change in the #6134 PR

6134change.png

? We both define SMul and then our approaches differ. I'm currently looking at this

instance distribMulAction [Monoid R] [AddMonoid M] [DistribMulAction R M] :
    DistribMulAction R (α →₀ M) :=
  { Finsupp.distribSMul _ _ with
    one_smul := fun x => ext fun y => one_smul R (x y)
    mul_smul := fun r s x => ext fun y => mul_smul r s (x y) }

and wondering if this with should be considered harmful because DistribMulAction extends MulAction first. But I don't know what the analogue of the "6141 change" would be here.

Matthew Ballard (Jul 27 2023 at 17:37):

Currently I have to break out parent instances à la #6134 to avoid tripping over limits later in the file.

I think both are a significant speed up. #6134 is slightly faster but more onerous on the author. How you balance that is your decision

Kevin Buzzard (Jul 27 2023 at 17:55):

Preliminary indications are that #6192 makes at least one declaration in GroupCohomology.Resolution go from 45 seconds to 15

Eric Wieser (Jul 27 2023 at 19:34):

Kevin Buzzard said:

Eric Wieser do you understand why the change in your ...

No, I just thought your change looked messy and wanted to check whether we actually needed all of it. I didn't expect it to be more performant, I was hoping for a tie

Matthew Ballard (Jul 27 2023 at 19:37):

The actual sizes of terms don't change between the original spaghetti pattern and either of these. In fact, they seem to be slightly bigger (by line size) in new patterns. I am guessing that in the new patterns less is unfolded because Lean doesn't have to wade through junk during unification to get to the pertinent info.

Kevin Buzzard (Jul 27 2023 at 20:05):

Certainly less is unfolded with the semiring change: the trace is transformed. For example you don't see any let src := ..., let src_1 := ..., Semiring.mk _ _ _ in the trace any more: it's not that this is replaced with the change, it's that the algorithm for rfl is not doing the unfolding at all any more (or maybe it's doing it but so quickly that the profiler doesn't report it).

Matthew Ballard (Jul 27 2023 at 20:10):

There are certainly some silly things still going on:

#synth SMul (S ⊗[R] S ⧸ KaehlerDifferential.ideal R S ^ 2) (S ⊗[R] S ⧸ KaehlerDifferential.ideal R S ^ 2)

times out

Matthew Ballard (Jul 27 2023 at 20:12):

As does,

#synth SMul (S ⊗[R] S ⧸ KaehlerDifferential.ideal R S) (S ⊗[R] S ⧸ KaehlerDifferential.ideal R S)

Matthew Ballard (Jul 27 2023 at 20:25):

I was pushing a cleaned up version of #6149 and I encountered a new CI error

thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: Error { kind: UnexpectedEof, message: "failed to fill whole buffer" }', /project/src/ltar.rs:54:41

:)

Matthew Ballard (Jul 27 2023 at 20:26):

More general (semi-) #mwe

import Mathlib

open TensorProduct

variable (R : Type u) (S : Type w) [CommRing R] [CommRing S] [Algebra R S] (I : Ideal <| S ⊗[R] S)

#synth SMul (S ⊗[R] S ⧸ I) (S ⊗[R] S ⧸ I)  -- failed to synthesize

Kevin Buzzard (Jul 27 2023 at 20:37):

I also have the same CI error on #6192 , which decreases some heartbeats in group cohomology files by a factor of two.

Matthew Ballard (Jul 27 2023 at 20:38):

unwrap() panics if there is an error so I guess the server and/or connection is having a timeout

Kevin Buzzard (Jul 27 2023 at 20:38):

in other news,

import Mathlib.RepresentationTheory.GroupCohomology.Resolution

open TensorProduct

variable (R : Type u) (S : Type w) [CommRing R] [CommRing S] [Algebra R S] (I : Ideal <| S ⊗[R] S)

#synth SMul (S ⊗[R] S ⧸ I) (S ⊗[R] S ⧸ I)  -- works fine on my branch

Matthew Ballard (Jul 27 2023 at 20:40):

So where is the bad instance

Matthew Ballard (Jul 27 2023 at 20:40):

Unless your branch fixes it with import Mathlib?

Kevin Buzzard (Jul 27 2023 at 20:41):

My branch doesn't have a fully compiled mathlib and CI won't give me oleans

Matthew Ballard (Jul 27 2023 at 20:41):

Should docs#Algebra.id be default?

Kevin Buzzard (Jul 27 2023 at 20:42):

oh my code works fine on master. So it's import Mathlib which is the problem. Edit: and it fails on my branch too with import Mathlib so my branch is a red herring.

Kevin Buzzard (Jul 27 2023 at 20:45):

CI looks like it's up and running again BTW

Kevin Buzzard (Jul 27 2023 at 21:09):

some bizarre progress:

import Mathlib.RingTheory.TensorProduct

-- you can have one of the below imports, but not both!

--import Mathlib.Algebra.Lie.OfAssociative
--import Mathlib.LinearAlgebra.Matrix.ToLinearEquiv

open TensorProduct

variable (R : Type u) (S : Type w) [CommRing R] [CommRing S] [Algebra R S] (I : Ideal <| S ⊗[R] S)

-- set_option pp.all true in
-- #check (S ⊗[R] S ⧸ I)

#synth SMul (S ⊗[R] S ⧸ I) (S ⊗[R] S ⧸ I)

Kevin Buzzard (Jul 27 2023 at 21:16):

import Mathlib.RingTheory.TensorProduct

-- you can have any two of the below imports, but not all three

import Mathlib.Algebra.Lie.Basic
import Mathlib.RingTheory.Polynomial.Content
import Mathlib.RingTheory.Ideal.QuotientOperations

open TensorProduct

variable (R : Type u) (S : Type w) [CommRing R] [CommRing S] [Algebra R S] (I : Ideal <| S ⊗[R] S)

-- set_option pp.all true in
-- #check (S ⊗[R] S ⧸ I)

#synth SMul (S ⊗[R] S ⧸ I) (S ⊗[R] S ⧸ I)

Kevin Buzzard (Jul 27 2023 at 21:22):

import Mathlib.RingTheory.TensorProduct

/-
Below each import is the list of files imported by it.
You can replace any of the three imports below with the
files it itself imports, and the problem goes away
-/

import Mathlib.Algebra.Lie.Basic
--import Mathlib.Algebra.Module.Equiv
--import Mathlib.Data.Bracket
--import Mathlib.LinearAlgebra.Basic
import Mathlib.RingTheory.Ideal.QuotientOperations
--import Mathlib.RingTheory.Ideal.Operations
--import Mathlib.RingTheory.Ideal.Quotient
import Mathlib.Algebra.EuclideanDomain.Basic
--import Mathlib.Algebra.EuclideanDomain.Defs
--import Mathlib.Algebra.Ring.Divisibility
--import Mathlib.Algebra.Ring.Regular
--import Mathlib.Algebra.GroupWithZero.Divisibility
--import Mathlib.Algebra.Ring.Basic


open TensorProduct

variable (R : Type u) (S : Type w) [CommRing R] [CommRing S] [Algebra R S] (I : Ideal <| S ⊗[R] S)

-- set_option pp.all true in
-- #check (S ⊗[R] S ⧸ I)

#synth SMul (S ⊗[R] S ⧸ I) (S ⊗[R] S ⧸ I)

Kevin Buzzard (Jul 27 2023 at 21:24):

Oh it's just a timeout:

import Mathlib

open TensorProduct

variable (R : Type u) (S : Type w) [CommRing R] [CommRing S] [Algebra R S] (I : Ideal <| S ⊗[R] S)

set_option synthInstance.maxHeartbeats 80000 in
#synth SMul (S ⊗[R] S ⧸ I) (S ⊗[R] S ⧸ I)

Kevin Buzzard (Jul 27 2023 at 21:26):

Doesn't bode well for future growth :-/ Actually maybe it's just a case of persuading Lean to look in the right direction more quickly...

Jireh Loreaux (Jul 27 2023 at 21:29):

Would it make sense to mark instance [Mul R] : SMul R R a default_instance? Would that make it fast? Nevermind, I think that's stupid. We already have it marked lower priority so that Lean finds other things first.

Eric Wieser (Jul 27 2023 at 21:32):

Eric Wieser said:

This now shows both steps, although it's possible the lean bump changed things too (we'll know in another 2.5 hours)

All the data points are now on the graph, including the forward-port that very slightly worsened performance

Matthew Ballard (Jul 28 2023 at 11:08):

#6149 Is cleaned up and awaiting review. The helper instances have been refined and prettied up.

Matthew Ballard (Jul 28 2023 at 16:08):

38.6% improvement in Kaehler

Matthew Ballard (Jul 28 2023 at 16:15):

Though just renaming things can improve by 30%...?

Kevin Buzzard (Jul 28 2023 at 16:56):

I'm not clear which two runs are being compared in that last link. There have been two PRs recently which have hugely affected Kaehler, and so obviously we have to be careful about comparing runs before and after the relevant commits.

Matthew Ballard (Jul 28 2023 at 16:57):

master just before the bump in nightly to 7/19 vs #6149

Matthew Ballard (Jul 28 2023 at 16:59):

master includes those two commits

Eric Wieser (Jul 28 2023 at 17:18):

@Sebastian Ullrich, is it possible to show these PR runs on the same graph as the main repo history? The arrow rendering suggests that it might be able to render git branches...

Matthew Ballard (Jul 28 2023 at 18:59):

So is the main culprit the let binding created with with or is it the redundant smul field?

I've been changing them both at the same time and I think Kevin has also.

Do we have a simple test case?

Eric Wieser (Jul 28 2023 at 19:20):

my guess is that it's the latter

Eric Wieser (Jul 28 2023 at 19:20):

And that the problem with with is that it creates a bunch of other redundant fields too; not that it creates a let binding

Sebastian Ullrich (Jul 28 2023 at 19:44):

Eric Wieser said:

Sebastian Ullrich, is it possible to show these PR runs on the same graph as the main repo history? The arrow rendering suggests that it might be able to render git branches...

Yes, it is built to be able to render branches. But I believe that would mean it would benchmark all those branches, on every push.

Matthew Ballard (Jul 29 2023 at 01:15):

class A where
  foo : String
  bar : String

class B extends A where
  extra : String

instance f : A := ⟨"hi", "there"⟩

instance one : B :=
  { f with
    foo := f.foo
    extra := "bro" }

instance two : B :=
  { toA := f
    extra := "bro" }

instance three : B :=
  { f with
    extra := "bro" }

instance four : B where
  extra := "bro"

set_option pp.all true in
#print one
-- def one : B :=
-- let src : A := f;
-- @B.mk (A.mk (@A.foo f) (@A.bar src)) "bro"

set_option pp.all true in
#print two
-- def two : B :=
-- @B.mk f "bro"

set_option pp.all true in
#print three
-- def three : B :=
-- let src : A := f;
-- @B.mk (A.mk (@A.foo src) (@A.bar src)) "bro"

set_option pp.all true in
#print four
-- def four : B :=
-- @B.mk f "bro"

So if I reduce things I get

@B.mk (A.mk (@A.foo (A.mk "hi" "there")) (@A.bar (A.mk "hi" "there"))) "bro"

vs

@B.mk (@A.mk "hi" "there") "bro"

?

Kevin Buzzard (Jul 29 2023 at 08:05):

Does this matter? For example example : one = two := rfl works fine. But #6189 says that it matters greatly.

#6189 changes just one declaration in Data.Finsupp.Basic from style one to style two, and results in this benchmark which goes to this speedcenter link. The 30 percent speedup of RepresentationTheory.GroupCohomology.Basic is not a case of something going from 4 seconds to 3 seconds; this file is 233 lines long and is full of large maxHeartbeat bumps. Before the change the largest one was set_option maxHeartbeats 6400000, one of the largest in mathlib; two of these can now come down to 3200000, as is evidenced in the PR diff.

Another way of looking at it: I've been compiling mathlib on the command line quite a bit during these experiments, and it's really noticeable that most of it gets done really quickly on my machine, and then there's some files at the end which take a super-long time and really slow down total compilation speeds. Again taking RepresentationTheory.GroupCohomology.Basic as an example; before #6189 hit if I add a newline to RepresentationTheory.GroupCohomology.Basic and recompile I get

$ time lake build Mathlib.RepresentationTheory.GroupCohomology.Basic
[1388/1389] Building Mathlib.RepresentationTheory.GroupCohomology.Basic

real    6m27.680s
user    6m28.425s
sys 0m1.252s

and after, I get

$ time lake build Mathlib.RepresentationTheory.GroupCohomology.Basic
[1385/1386] Building Mathlib.RepresentationTheory.GroupCohomology.Basic

real    4m34.496s
user    4m35.172s
sys 0m1.281s

This is evidence that, in mathlib at least, with is sometimes a very dangerous choice.

Sebastian Ullrich (Jul 29 2023 at 08:20):

Does this matter?

By eta extensionality, all those declarations are defeq. But that doesn't mean their performance is identical.

#6189 changes just one declaration

Just to check, style four would work here as well and probably be the most natural one? Is it clear in which scenarios people use/resort to the different styles?

Kevin Buzzard (Jul 29 2023 at 08:20):

I think it's clear that it's pretty random, because my perception was that in mathlib3 none of this mattered (so the community is just writing anything which compiles; we've only very recently understood that this is an issue which needs thought)

Eric Wieser (Jul 29 2023 at 09:01):

We often used style one instead of four because it ensures that we don't accidentally pick up default values when we actually want to pick up the values we used in f (this recently caused a really dumb instance diamond where the AddCommGroup implied by a ring was not defeq to the separate global instance)

Eric Wieser (Jul 29 2023 at 09:02):

Maybe that's an indication that default values are just a bad idea though

Kevin Buzzard (Jul 29 2023 at 10:04):

Sebastian Ullrich said:

Just to check, style four would work here as well and probably be the most natural one? Is it clear in which scenarios people use/resort to the different styles?

I just changed Finsupp.module on master to

instance module [Semiring R] [AddCommMonoid M] [Module R M] : Module R (α →₀ M) :=
  { zero_smul := fun _ => ext fun _ => zero_smul _ _
    add_smul := fun _ _ _ => ext fun _ => add_smul _ _ _ }

i.e. a change from two to four, and everything still compiles and our test file RepresentationTheory.GroupCohomology.Basic still takes the same amount of time, so yes, you're right, and probably this would have been more idiomatic (e.g. it doesn't refer to an explicit instance by name).

Matthew Ballard (Jul 29 2023 at 11:27):

I guess there is also

instance five : B where
  foo := "hi"
  bar := "there"
  extra := "bro"

set_option pp.all true in
#print five
-- def five : B :=
-- @B.mk (A.mk "hi" "there") "bro"

which we seem to be paying a performance cost for as well.

Matthew Ballard (Jul 29 2023 at 11:33):

Reran the benchmark on #6149 with some additional changes. Things were pretty stable except the previous decrease in performance of Ideal.Quotient was reversed. Probably due to two of the smul field removals in Module.Basic.

Matthew Ballard (Jul 29 2023 at 11:33):

One of the removals probably did nothing with the remaining with.

Matthew Ballard (Jul 29 2023 at 11:37):

There are quite a few with's remaining in Module.Basic. I think #6149 should be merged as is and further experimentation can be done in a more targeted manner.

Matthew Ballard (Jul 29 2023 at 12:02):

rg 'with$' | rg '\{' in the Mathlib folder gives 2363 instances.

Matthew Ballard (Jul 29 2023 at 18:47):

Per Eric’s suggestion, I broke up #6149 into

  • #6149 only adds the instances to RingTheory.Kaehler
  • #6240 which tests only Kevin’s original pattern for breaking up semiRing in RingTheory.TensorProduct
  • #6241 has all the smul removals plus cleans up some porting notes

Matthew Ballard (Jul 29 2023 at 18:48):

Sebastian Ullrich the first benchmark run on #6240 can be cancelled because Kaehler did build.

Sebastian Ullrich (Jul 29 2023 at 19:11):

Instructions unclear, killed two of them

Sebastian Ullrich (Jul 29 2023 at 19:50):

Please reissue !bench in #6149 if still needed. But just to make sure the current impact is clear, issuing four !bench commands will block other speedcenter progress for approximately 10 hours

Kevin Buzzard (Jul 29 2023 at 21:09):

So when does the FRO buy us a bunch more speedcenter machines? :-)

Matthew Ballard (Jul 29 2023 at 21:12):

Thanks. Current situation is most efficient. #6149 will get its turn after merging.

I’ve got a machine I don’t do much with currently…

Matthew Ballard (Jul 30 2023 at 01:04):

I made two different versions of repeatedly extending instances and checking equality using rfl.

One using the with pattern:

With

and one using the toX pattern

Without

using lake build gives

cumulative profiling times:
    attribute application 0.00508ms
    compilation new 0.0201ms
    interpretation 0.639ms
    type checking 0.445ms

on the with patterns and

cumulative profiling times:
    attribute application 0.00458ms
    compilation new 0.0193ms
    interpretation 0.63ms
    type checking 0.174ms

on the toX pattern

That difference in type checking seems significant.

Eric Wieser (Jul 31 2023 at 13:39):

Here's a new theory for why moving instance : Mul (TensorProduct _ _ _) to its own declaration helped: It means that the default value for npow field is not populated with a copy of that instance that has to duplicate the mul field
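
A toy model of the idea (MyMul / MyMonoid / powRec are made-up stand-ins, not Mathlib's actual classes, so this is only a sketch): building the parent inline versus naming it first changes what the defaulted field ends up mentioning; compare the two #prints.

class MyMul (α : Type) where
  mul : α → α → α

def powRec {α : Type} [MyMul α] : Nat → α → α
  | 0, a => a
  | n + 1, a => MyMul.mul a (powRec n a)

class MyMonoid (α : Type) extends MyMul α where
  pow : Nat → α → α := powRec   -- default field, analogous to npow := npowRec

-- parent built inline: expect the defaulted pow to be applied to a freshly built MyMul value
instance inlineMul : MyMonoid Nat := { mul := Nat.add }

-- parent named up front: expect the defaulted pow to just mention natMul
instance natMul : MyMul Nat := ⟨Nat.add⟩
instance namedMul : MyMonoid Nat := { toMyMul := natMul }

set_option pp.all true in #print inlineMul
set_option pp.all true in #print namedMul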

Kevin Buzzard (Jul 31 2023 at 13:47):

Certainly Lean got very interested in proving npowRec =?= npowRec in the traces when everything was still slow (this is mentioned in the second post at the top); it was one of the things which would show up 100 times in the traces and take 0.02 seconds each time.

Matthew Ballard (Jul 31 2023 at 13:48):

Can default arguments be more declarative?

Matthew Ballard (Jul 31 2023 at 13:50):

Also, which of the following describes the behavior? I don't supply a value for the field, so:

  1. Lean attempts to synthesize one, falling back to the default if that fails
  2. Lean uses the default value

Eric Wieser (Jul 31 2023 at 13:54):

I'm working on a PR that just removes these defaults, they're a constant source of bugs

Matthew Ballard (Jul 31 2023 at 13:55):

Opt in should be the default at least

Eric Wieser (Jul 31 2023 at 13:56):

Matthew Ballard said:

Lean attempts to synthesize one, falling back to the default if that fails

Synthesize in what sense? These fields aren't typeclasses, so TC synthesis isn't relevant here

Matthew Ballard (Jul 31 2023 at 13:56):

Via Try this or something similar

Matthew Ballard (Jul 31 2023 at 13:57):

They are fields of typeclasses right?

Matthew Ballard (Jul 31 2023 at 13:57):

To be clear, if they are removed, I won't shed a tear

Eric Wieser (Jul 31 2023 at 13:57):

Eric Wieser said:

I'm working on a PR that just removes these defaults, they're a constant source of bugs

#6262 is that PR

Matthew Ballard (Jul 31 2023 at 14:05):

Matthew Ballard said:

They are fields of typeclasses right?

And we usually have those typeclasses floating around in the context. But I understand the answer

Eric Wieser (Jul 31 2023 at 14:06):

It sounds like you're saying Lean should try filling the nsmul field with AddMonoid.nsmul (synthesizing the implicit TC argument in that declaration)?

Matthew Ballard (Jul 31 2023 at 14:07):

Yes.

Eric Wieser (Jul 31 2023 at 14:08):

We could presumably set the default to by exact AddMonoid.nsmul <|> exact nsmulRec

Matthew Ballard (Jul 31 2023 at 14:11):

It would tell me what failed right? In terms of missing fields?

Eric Wieser (Jul 31 2023 at 14:12):

If you used by exact AddMonoid.nsmul, yes
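
A toy version of that shape (HasNsmul / AddMonoidToy / nsmulRecToy are made-up names, so this is a sketch of the mechanism rather than a proposal for the real hierarchy):

class HasNsmul (M : Type) where
  nsmul : Nat → M → M

def nsmulRecToy {M : Type} [Add M] [Zero M] : Nat → M → M
  | 0, _ => 0
  | n + 1, a => a + nsmulRecToy n a

class AddMonoidToy (M : Type) extends Add M, Zero M where
  -- the default is now a tactic: try to reuse an instance in scope, else fall back to the
  -- recursive definition; if both fail, the error points at this field at the use site
  nsmul : Nat → M → M := by first | exact HasNsmul.nsmul | exact nsmulRecToy

instance : AddMonoidToy Nat :=
  { toAdd := inferInstance, toZero := inferInstance }
-- there is no HasNsmul Nat instance, so nsmul here falls back to nsmulRecToy

#eval AddMonoidToy.nsmul 3 (2 : Nat)  -- 6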

Kevin Buzzard (Jul 31 2023 at 15:45):

Eric Wieser said:

Here's a new theory for why moving instance : Mul (TensorProduct _ _ _) to its own declaration helped: It means that the default value for npow field is not populated with a copy of that instance that has to duplicate the mul field

So which of the changes that Matt made on this branch do you think will have a useful effect? I don't want to keep spamming !bench

Matthew Ballard (Jul 31 2023 at 15:46):

I think that you can infer the changes from the original run on Kaehler.

Matthew Ballard (Jul 31 2023 at 15:47):

I've begun a larger removal effort and am trying to get the current changes to compile at the moment

Eric Wieser (Jul 31 2023 at 15:48):

Every change in that branch looks good to me, though I'm curious to know whether it makes a difference

Eric Wieser (Jul 31 2023 at 15:49):

(I'd be happy to merge it anyway as long as it doesn't make performance much worse)

Matthew Ballard (Jul 31 2023 at 15:49):

If you want a fun with that is deep,

instance booleanAlgebra {α : Type _} : BooleanAlgebra (Set α) :=
  -- Pi.booleanAlgebra
  { (inferInstance : BooleanAlgebra (α → Prop)) with
    sup := (· ∪ ·),
    le := (· ≤ ·),
    lt := fun s t => s ⊆ t ∧ ¬t ⊆ s,
    inf := (· ∩ ·),
    bot := ∅,
    compl := fun s => { x | x ∉ s },
    top := univ,
    sdiff := fun s t => { x | x ∈ s ∧ x ∉ t } }

Eric Wieser (Jul 31 2023 at 15:49):

Ah, that brings up something I was just going to say

Eric Wieser (Jul 31 2023 at 15:50):

I think that instance should remain a with, because otherwise it unfolds weirdly

Eric Wieser (Jul 31 2023 at 15:50):

The smul := SMul.comp.smul f stuff from your branch is a much less bad version of this, where removing that line causes it to unfold with an extra coercion

Matthew Ballard (Jul 31 2023 at 15:51):

Yes, this seems to be the main negative from experience now

Eric Wieser (Jul 31 2023 at 15:54):

But the compHom stuff could be fixed by changing those bits of API to use MonoidHomClass

Eric Wieser (Jul 31 2023 at 15:54):

Which didn't exist at the time

Matthew Ballard (Jul 31 2023 at 15:55):

Kevin Buzzard said:

Eric Wieser said:

Here's a new theory for why moving instance : Mul (TensorProduct _ _ _) to its own declaration helped: It means that the default value for npow field is not populated with a copy of that instance that has to duplicate the mul field

So which of the changes that Matt made on this branch do you think will have a useful effect? I don't want to keep spamming !bench

#6241 splits off the field elimination changes. Everything has been benched together in #6149.

You can see the most recent run here on the latter PR. All the changes outside of Kaehler are due to #6241 I think

Eric Wieser (Jul 31 2023 at 16:02):

I've bumped the commit that you merged in from master to the top of the benchmark queue

Eric Wieser (Jul 31 2023 at 16:04):

Though I guess http://speedcenter.informatik.kit.edu/mathlib4/compare/24b2d3ab-ebc9-4b6a-b8c4-8813a378efff/to/e7b27246-a3e6-496a-b552-ff4b45c7236e?hash2=42ced2f01a2da3ff4d20caf991af6a88f406260d is probably close enough

Kevin Buzzard (Jul 31 2023 at 16:04):

I was just looking through old scratch files and I see that there's still one raised heartbeat in RingTheory.AdjoinRoot. Looking at the traces in the proof, they're again just full of things where Lean is taking 1 second to prove things which look syntactically equal (and I can't work out if they are syntactically equal because any attempt to turn on pp.all crashes VS Code). Here's an example of a trace:

                        [] [1.124474s]  MulHomClass.toFunLike =?= MulHomClass.toFunLike 
                          [] [1.124430s]  NonUnitalRingHomClass.toMulHomClass.1 =?= NonUnitalRingHomClass.toMulHomClass.1 
                            [] [1.124316s]  { coe := fun f_1 ↦ f_1.toFun,
                                  coe_injective' := _ } =?= { coe := fun f_1 ↦ f_1.toFun, coe_injective' := _ } 
                              [] [1.080956s]  fun f_1 ↦ f_1.toFun =?= fun f_1 ↦ f_1.toFun 
                                [] [1.080834s]  R[X] ⧸ Ideal.map Polynomial.C I →+*
                                      (R[X] ⧸ Ideal.map Polynomial.C I) ⧸
                                        Ideal.span
                                          {(Ideal.Quotient.mk (Ideal.map Polynomial.C I))
                                              f} =?= R[X] ⧸ Ideal.map Polynomial.C I →+*
                                      (R[X] ⧸ Ideal.map Polynomial.C I) ⧸
                                        Ideal.span {(Ideal.Quotient.mk (Ideal.map Polynomial.C I)) f} 
                                  [] [0.041883s]  (R[X] ⧸ Ideal.map Polynomial.C I) ⧸
                                        Ideal.span
                                          {(Ideal.Quotient.mk (Ideal.map Polynomial.C I))
                                              f} =?= (R[X] ⧸ Ideal.map Polynomial.C I) ⧸
                                        Ideal.span {(Ideal.Quotient.mk (Ideal.map Polynomial.C I)) f} 
                                  [] [0.033986s]  Semiring.toNonAssocSemiring =?= Semiring.toNonAssocSemiring 
                                  [] [1.004940s]  Semiring.toNonAssocSemiring =?= Semiring.toNonAssocSemiring 
                                    [] [1.004925s]  NonAssocSemiring.mk _ _ =?= NonAssocSemiring.mk _ _ 
                                      [] [0.041582s]  (R[X] ⧸ Ideal.map Polynomial.C I) ⧸
                                            Ideal.span
                                              {(Ideal.Quotient.mk (Ideal.map Polynomial.C I))
                                                  f} =?= (R[X] ⧸ Ideal.map Polynomial.C I) ⧸
                                            Ideal.span {(Ideal.Quotient.mk (Ideal.map Polynomial.C I)) f} 
                                      [] [0.724339s]  NonUnitalSemiring.toNonUnitalNonAssocSemiring =?= NonUnitalSemiring.toNonUnitalNonAssocSemiring 
                                        [] [0.724329s]  Semiring.toNonUnitalSemiring.1 =?= Semiring.toNonUnitalSemiring.1 
                                          [] [0.724139s]  NonUnitalNonAssocSemiring.mk _ _ _ _ =?= NonUnitalNonAssocSemiring.mk _ _ _ _ 
                                            [] [0.011918s]  RingCon.Quotient
                                                  (Ideal.Quotient.ringCon
                                                    (Ideal.span
                                                      {(Ideal.Quotient.mk (Ideal.map Polynomial.C I))
                                                          f})) =?= RingCon.Quotient
                                                  (Ideal.Quotient.ringCon
                                                    (Ideal.span {(Ideal.Quotient.mk (Ideal.map Polynomial.C I)) f})) 
                                            [] [0.534922s]  AddCommMonoid.mk _ =?= AddCommMonoid.mk _ 
                                            [] [0.177104s]  MulZeroClass.toMul =?= MulZeroClass.toMul 

Everything is of the form X =?= X at least according to the prettyprinter. Something triggered this nightmare (this is exactly what the Kaehler file looked like before we started tinkering) but I don't really have a good understanding of what the crucial step was which made all this stuff magically vanish. Could it be this for example?

def quotEquivOfEq {R : Type _} [CommRing R] {I J : Ideal R} (h : I = J) : R ⧸ I ≃+* R ⧸ J :=
  { Submodule.quotEquivOfEq I J h with
    map_mul' := by
      rintro x y
      rfl }

Or is it obviously not that? Which withs do I try and remove? Does anyone have a gut feeling as to which ones are causing the trouble? For example I removed one in #6144 and it seemed to make things worse, not better.

Eric Wieser (Jul 31 2023 at 16:05):

That looks a bit unbelievable to me, I think I'd prefer to see the fresh run

Kevin Buzzard (Jul 31 2023 at 16:07):

Oh lol the two files which were slower in #6144 (Mathlib.Algebra.Category.ModuleCat.Abelian and Mathlib.Algebra.Category.ModuleCat.Limits) are precisely the two files spoiling the sea of green speedups in the speedcenter link you just posted! Does that make it more believable?

Matthew Ballard (Jul 31 2023 at 16:54):

Wait... -20% wall-clock?

Matthew Ballard (Jul 31 2023 at 16:56):

I assumed this run had failed

Matthew Ballard (Jul 31 2023 at 20:26):

I have #6276 now where I removed a bunch of with's from instances in Data. It builds locally now.

It is mainly to see the results of !bench.

Eric Wieser (Jul 31 2023 at 21:06):

Eric Wieser said:

That looks a bit unbelievable to me, I think I'd prefer to see the fresh run

Here's the fresh run. Do we think this is real?

Matthew Ballard (Jul 31 2023 at 21:16):

That’s pretty crazy

Jireh Loreaux (Jul 31 2023 at 21:16):

Can someone explain exactly what I'm looking at? What's the diff between the commits we're benchmarking?

Riccardo Brasca (Jul 31 2023 at 21:19):

What do you see? I only get

Not Found
The requested URL was not found on this server.

Patrick Massot (Jul 31 2023 at 21:21):

Riccardo, you're meant to see
image.png

Eric Wieser (Jul 31 2023 at 21:21):

Works on my machine... The diff is exactly #6241 vs the version of master it last merged

Matthew Ballard (Jul 31 2023 at 21:21):

Lean is building mathlib 20% faster because of the changes in #6241 vs the master according to the speed center

Matthew Ballard (Jul 31 2023 at 21:22):

The individual file changes track because it is reproducing what happened in Kaehler which this was split from

Matthew Ballard (Jul 31 2023 at 21:23):

But 20% overall is very surprising

Eric Wieser (Jul 31 2023 at 21:23):

Any opposition to merging this as is? Sure, it makes a bad file worse, but 20% overall is huge

Jireh Loreaux (Jul 31 2023 at 21:23):

that is insane (20% that is, not the idea to merge it)

Riccardo Brasca (Jul 31 2023 at 21:24):

Wow, that's really impressive

Eric Wieser (Jul 31 2023 at 21:25):

It would be interesting to know if there was one change in that PR in particular that is responsible, but not to the point of refraining from making the other changes

Jireh Loreaux (Jul 31 2023 at 21:26):

something seems fishy though, why don't more files have significant (>20%) changes?

Matthew Ballard (Jul 31 2023 at 21:27):

#6276 is in the same spirit and will be done in a couple hours

Eric Wieser (Jul 31 2023 at 21:27):

I think it's only the ones using quotients that are affected

Jireh Loreaux (Jul 31 2023 at 21:27):

And those 6 files account for >20% of the entire compile time of mathlib?

Jireh Loreaux (Jul 31 2023 at 21:28):

Their changes were only ~10% better (as far as instructions go anyway, which I think we're using as a proxy for compile time)

Eric Wieser (Jul 31 2023 at 21:30):

Sorting by most-removed-instructions first:

image.png

Eric Wieser (Jul 31 2023 at 21:31):

It's strange that the number of instructions has decreased overall by only 0.08%

Eric Wieser (Jul 31 2023 at 21:31):

Maybe we're using much less memory somehow?

Eric Wieser (Jul 31 2023 at 21:32):

@Sebastian Ullrich, is "instructions" measuring only CPU instructions in userspace, or in kernel virtual memory handlers too?

Kevin Buzzard (Jul 31 2023 at 21:43):

I think it's worth stressing that Mathlib.Algebra.Category.ModuleCat.Limits, the file which gets slower by 15%, is about 200 lines and compiles on my machine in 8 seconds. Whereas files like Kaehler were taking 6+ minutes to compile this time last week, so 15% speedup is a visible difference.

Jireh Loreaux (Jul 31 2023 at 21:48):

Aha, maybe the point is that there were several files which take a very long time to compile, and several of these got faster, but not by more than 5%?

Eric Wieser (Jul 31 2023 at 21:51):

The per-file metric we're using is instructions, which only improved by 0.08% globally; But the global metrics also include wall time, which got substantially faster

Kevin Buzzard (Jul 31 2023 at 21:51):

Getting faster: Yes! Recently I've been watching mathlib compile (because I don't want to spam the speedcenter but have been doing a lot of testing), and it compiles the first 1800 files really quickly and you think "wow, we're half way through and it's only taken about 10 minutes" and then at the end there are some files which take 10+ minutes to compile.

Eric Wieser (Jul 31 2023 at 21:51):

(deleted)

Jireh Loreaux (Jul 31 2023 at 21:52):

Kevin, I think maybe you haven't caught on to the point I'm making (sorry if you have, it just seemed like we're talking on different wavelengths): this speedcenter run gives a 20% decrease across all of mathlib, but ... only 6 individual files had a decrease by more than 5% (of instructions). That seemed off to me.

Eric Wieser (Jul 31 2023 at 21:52):

Jireh, see my comment above: that 20% is a very different metric to the 5%, as we don't measure wall time per file.

Jireh Loreaux (Jul 31 2023 at 21:53):

Is it though? I thought that instructions were supposed to be a good proxy for build time?

Eric Wieser (Jul 31 2023 at 21:53):

My hunch is that instructions are only measured in user space

Eric Wieser (Jul 31 2023 at 21:53):

And therefore doesn't include costs due to file IO or memory page faults

Jireh Loreaux (Jul 31 2023 at 21:54):

Maybe we want to change the speedcenter to list wall time per file instead of instructions per file then?

Eric Wieser (Jul 31 2023 at 21:55):

I don't think the "machine changed between runs" explanation is plausible, because we benchmarked the commit history A <-- C <-- B (where B is the magic PR) in the order A B C, and the diff between A and C is negligible

Jireh Loreaux (Jul 31 2023 at 22:06):

I'm not necessarily arguing the run is invalid, only the speedcenter data seemed fishy under my presuppositions about what it was measuring, but your explanation I think points out that my assumptions may be invalid.

Matthew Ballard (Jul 31 2023 at 22:18):

I’m skeptical. This run has these changes +

  • extra instances in Kaehler
  • split of semiRing instance in TensorProduct into parents

Wall clock improves but only 1%. Perhaps I am confused.

Matthew Ballard (Jul 31 2023 at 22:23):

But you can see that with causes problems in simple cases like below.

If Lean can figure out that all fields have been filled, can it store the info as though it was specified by the latter pattern?

Matthew Ballard said:

I made two different versions of repeatedly extending instances and checking equality using rfl.

One using the with pattern:

With

and one using the toX pattern

Without

using lake build gives

cumulative profiling times:
    attribute application 0.00508ms
    compilation new 0.0201ms
    interpretation 0.639ms
    type checking 0.445ms

on the with patterns and

cumulative profiling times:
    attribute application 0.00458ms
    compilation new 0.0193ms
    interpretation 0.63ms
    type checking 0.174ms

on the toX pattern

That difference in type checking seems significant.

Eric Wieser (Jul 31 2023 at 23:00):

There's certainly something weird going on with the benchmark tool here; these two pages compare the same commits, but give different results:

Kevin Buzzard (Jul 31 2023 at 23:01):

Looking on the bright side, they're both a sea of green.

Matthew Ballard (Jul 31 2023 at 23:03):

Anyone built things locally and compared the times yet? :)

Yury G. Kudryashov (Aug 01 2023 at 00:27):

Eric Wieser said:

I'm working on a PR that just removes these defaults, they're a constant source of bugs

Right before porting, I started (and didn't finish) a PR that merges npow + axioms into a structure with a Unique instance.

Yury G. Kudryashov (Aug 01 2023 at 00:28):

This could help with removing defaults.

Matthew Ballard (Aug 01 2023 at 00:56):

Matthew Ballard said:

Any one build things locally and compare the times yet? :)

Didn't see much difference overall

Sebastian Ullrich (Aug 01 2023 at 07:42):

There is something weird going on where very rarely, perhaps every 1000th run, all time-based benchmarks are 20% faster. When the identical run is repeated, the speedup vanishes. I wonder if that is what happened here.

Sebastian Ullrich (Aug 01 2023 at 07:43):

In general, any time-based change that does not come with at least a similar instructions change is suspect

Sebastian Ullrich (Aug 01 2023 at 07:44):

Eric Wieser said:

Sebastian Ullrich, is "instructions" measuring only CPU instructions in userspace, or in kernel virtual memory handlers too?

It's whatever perf stat is measuring, i.e. hardware performance counters I assume

Eric Wieser (Aug 01 2023 at 08:00):

I've triggered a repeat benchmark to see if that helps

Eric Wieser (Aug 01 2023 at 08:00):

Do you have any understanding of why the diff view vs commit view above show different benchmarks?

Sebastian Ullrich (Aug 01 2023 at 08:11):

It's not the same base commit: #6008 vs #6089

Eric Wieser (Aug 01 2023 at 08:12):

Oh, that would do it

Matthew Ballard (Aug 01 2023 at 10:48):

#6276 completed. Results.

(Edited) Two easy (expected) gains: #6282 and #6283

Damiano Testa (Aug 01 2023 at 10:59):

Are those #6282 and #6283? Note the 2 instead of 8!

Matthew Ballard (Aug 01 2023 at 11:00):

Thanks

Matthew Ballard (Aug 01 2023 at 13:20):

I think 95% of the bad performance in #6276 can be traced to this change

Matthew Ballard (Aug 01 2023 at 13:40):

After the change in #6282 to instance field : Field ℚ, the following is field with pp.explicit

New

Before

Old

Matthew Ballard (Aug 01 2023 at 13:44):

I did not unfold commRing but did commGroupWithZero (src_1). If you reduce the let bindings there is an even bigger difference.

Matthew Ballard (Aug 01 2023 at 16:58):

#6282 speed results are in. Unambiguously good.

Matthew Ballard (Aug 01 2023 at 19:37):

After mucking around for a bit now, it is clear a lot more remains on the table for speeding up building.

It seems that the penalty compounds with nested with constructions so classes in the algebraic hierarchy are particularly affected. These with's are buried deep. For example, MonoidAlgebra.Basic has ~50.

One main limitation is that we often want to override a field from a parent with a defeq expression that elaborates more quickly in the given context. I don't see how to do this in an efficient manner without with or something like it.

Another is that it is very inconvenient (and invites foot-guns) to try to build these instances "by hand" even when you are not overriding existing fields.

Kevin Buzzard (Aug 01 2023 at 19:53):

Is it really the withs or is it Eric's observation here or are these two separate issues?

Matthew Ballard (Aug 01 2023 at 19:54):

I think these are two separate problems that can interact

Matthew Ballard (Aug 01 2023 at 22:01):

#6283 speed results. Again all green above 5%

Kevin Buzzard (Aug 01 2023 at 22:09):

Probably someone should open a lean 4 issue citing some of these benchmarked PRs, where one small change can have such a drastic effect. Or is this only happening because we have gigantic terms and it's all our own fault?

Scott Morrison (Aug 01 2023 at 22:34):

I really hope we can explore adding a term elaborator in these instance definitions that helps out. e.g. instance : Bar X := fix_instance% { .... }. Does someone who's been keeping up with these fixes have a suggestion about what such a fix_instance% could usefully do?

Would just dsimp% help? Have we tried it?

(I appreciate that if this works in the long run we might want to override instance, but that should not happen until after a fix_instance% can be demonstrated to work.)

Eric Wieser (Aug 01 2023 at 22:38):

dsimp would be better than nothing, but I don't think it's the full algorithm here

Scott Morrison (Aug 01 2023 at 22:40):

And is there an approximate criterion for what sort of instances would benefit from a dsimp%? I am tempted to just shove it in all over the library and benchmark.

Matthew Ballard (Aug 01 2023 at 22:44):

Reconstruct instance in preferred parent projected form?

Eric Wieser (Aug 01 2023 at 23:05):

My guess would be that it's universally harmless globally, and sometimes a waste of time locally

Matthew Ballard (Aug 01 2023 at 23:07):

It would be nice if it could generate intermediate class instances prioritizing the declarations given to it and their projections.

Eric Wieser (Aug 01 2023 at 23:25):

I don't know how much that would really help: it's pretty common that those intermediate instances actually hold more generally, and so need to be written by hand anyway

Matthew Ballard (Aug 01 2023 at 23:36):

If they are in the context, I guess just pick them up.

Matthew Ballard (Aug 01 2023 at 23:36):

Scott Morrison said:

And is there an approximate criterion for what sort of instances would benefit from a dsimp%? I am tempted to just shove it in all over the library and benchmark.

If there is a with, go for it

Matthew Ballard (Aug 01 2023 at 23:39):

Sadly the -20% wall clock disappeared

Matthew Ballard (Aug 02 2023 at 00:58):

Getting timeouts with my simple-minded dsimp% :/

Scott Morrison (Aug 02 2023 at 01:01):

It's not crazy that having to allow higher maxHeartbeats on an instance ... := dsimp% ... could then save maxHeartbeats during later syntheses, and thus be valuable overall.

Matthew Ballard (Aug 02 2023 at 01:32):

These are pretty egregious

Sebastien Gouezel (Aug 02 2023 at 07:07):

Is the structure generation algorithm documented somewhere? It might be possible to improve it by removing useless with clauses automatically, for example, but for this we would first need to understand how things are done now.

Matthew Ballard (Aug 02 2023 at 09:59):

Added more to #6276 and got another set of pretty nice speed results

Kevin Buzzard (Aug 02 2023 at 12:51):

I love the way that the speedcenter always keeps us waiting with these links. It should print out "And the winner is..." on that blank screen. Or am I the only one who has to wait 5+ seconds for the output?

Kevin Buzzard (Aug 02 2023 at 12:53):

I have this hunch that the perennial red Algebra.Category.ModuleCat.Limits should be individually fixed because there's something pretty weird going on there that makes it slower when everything else is faster. I am wondering if you are de-withing one side of an equation but not the other, but I had a look yesterday and couldn't isolate the problem; I just saw that the long declaration goes up from 150K heartbeats to 180K as a result of your PR and the trace is also much longer.

Kevin Buzzard (Aug 02 2023 at 12:54):

What seems to be added to the trace is as usual a bunch of super-unhelpful "I take 0.5 seconds proving that two things which look syntactically equal with pp.all off are defeq"

Eric Wieser (Aug 02 2023 at 12:55):

I have this hunch that the perennial red Algebra.Category.ModuleCat.Limits should be individually fixed

A reminder that I have #6265 that at least makes this file much shorter

Scott Morrison (Aug 02 2023 at 13:11):

:merge:. Note however that this file is about colimits, and Kevin is pointing to the file about limits.

Matthew Ballard (Aug 02 2023 at 13:36):

Since I was fixing things from master mindlessly, there is overlap with the other PR. #6276 is something I dump all changes into so I can bench them all at once and preserve resources.

I have more defined thoughts. I'll share them when I get to a computer. TLDR: terms are too big due to extra eta expansion, and unification handles those terms less efficiently (maybe because of size, but maybe something more intrinsic). Resulting terms need more unfolding. This compounds until we get deep and things break.

I’ve been fairly lazy about changing the with construction pattern. Not touching def’s, leaving knotty ones alone. Totally unsupported speculation: we can get that 20% wall clock speed up back with a complete overhaul. I wouldn’t be surprised if it’s 50% or more.

Matthew Ballard (Aug 02 2023 at 13:43):

The speed improvements from #6276 are also getting into concrete categories, which makes me more hopeful

John Nicol (Aug 02 2023 at 13:54):

Riccardo Brasca said:

What do you see? I only get

Not Found
The requested URL was not found on this server.

I got the same error.
I'd be interested in looking at performance as well -- is there some list you have to be on to see the speedcenter? (Can it be opened to all if so?)

Riccardo Brasca (Aug 02 2023 at 13:54):

It's an https only error, or something similar

Riccardo Brasca (Aug 02 2023 at 13:55):

It works in Chrome for me, but not in Firefox (even if I disable https only)

John Nicol (Aug 02 2023 at 13:56):

Thanks, that's it! Was able to view in Chrome

Matthew Ballard (Aug 02 2023 at 16:51):

Most of the speed regressions in #6276 are in smaller files. The exception is Analysis.NormedSpace.Star.Multiplier which I cannot seem to track to the changes. Someone who is more familiar with that file might be able to better pinpoint it.

Jireh Loreaux (Aug 02 2023 at 16:57):

I'll have a look, I wrote it.

Jireh Loreaux (Aug 02 2023 at 17:31):

Well, I'm not sure exactly what's going on yet, but docs#DoubleCentralizer.coeHom is extremely slow, and everything else seems fine (or at least passable).

Matthew Ballard (Aug 02 2023 at 17:33):

Thanks for narrowing it down!

Jireh Loreaux (Aug 02 2023 at 17:38):

I'm still working on it btw, that was just a status update.

Jireh Loreaux (Aug 02 2023 at 19:38):

@Matthew Ballard Sorry it took me a while to get around to this, but if you want to you can cherry pick #6318. This declaration was very slow on master too. I'm not sure exactly what is contributing to the slowdown in this file from #6276, and it may not be this declaration. But I didn't notice too much else that was incredibly slow. With #6318, this file now compiles in about 55 seconds on my machine (12 cores), and it seems pretty close both on top of master and on top of #6276, so I wouldn't worry about it too much (IMO).

Kevin Buzzard (Aug 02 2023 at 19:48):

I thought that number of cores was not relevant when compiling a Lean 4 file because, in contrast to Lean 3, Lean 4 can only use one core per file. Am I out of date?

Jireh Loreaux (Aug 02 2023 at 19:49):

oh, you're right, I forgot!

Jireh Loreaux (Aug 02 2023 at 19:49):

In any case, it's a pretty snappy machine.

Matthew Ballard (Aug 02 2023 at 19:59):

Great! Thanks.

Matthew Ballard (Aug 03 2023 at 01:14):

Experimenting with a more systematic banishing of with, I de-withed every structure term in a single file, MonoidAlgebra.Basic, in #6319. This file is the basis for polynomials so I thought it would give me the sense of the effects of such changes downstream.

Matthew Ballard (Aug 03 2023 at 01:14):

Results

Eric Wieser (Aug 03 2023 at 01:15):

I think a good pattern might be to mix with with toBaseClass

Eric Wieser (Aug 03 2023 at 01:16):

That way you don't forget to set any defaults, but you also get the preferred structure

Matthew Ballard (Aug 03 2023 at 01:16):

Can you give a more concrete example?

Eric Wieser (Aug 03 2023 at 01:18):

Sure, replace your

 { toNonUnitalSemiring := MonoidAlgebra.nonUnitalSemiring,
    natCast := MonoidAlgebra.nonAssocSemiring.natCast
    natCast_zero := MonoidAlgebra.nonAssocSemiring.natCast_zero
    natCast_succ := MonoidAlgebra.nonAssocSemiring.natCast_succ
    one_mul := MonoidAlgebra.nonAssocSemiring.one_mul
    mul_one := MonoidAlgebra.nonAssocSemiring.mul_one }

with

 { MonoidAlgebra.nonAssocSemiring with
    toNonUnitalSemiring := MonoidAlgebra.nonUnitalSemiring }

(untested)

Matthew Ballard (Aug 03 2023 at 01:23):

It looks like that doesn't work currently

Matthew Ballard (Aug 03 2023 at 01:24):

class A where
  val : Nat

class B where
  num : Nat

class C extends A, B where

def a : A := ⟨0⟩

def b : B := ⟨1⟩

def c : C :=
  { b with
    toA := a }

set_option pp.explicit true in
#print c
--- def c : C :=
--- let src := b;
--- @C.mk a { num := @B.num src }

Matthew Ballard (Aug 03 2023 at 01:27):

Though I wish it did. Re-writing those with patterns in that file was a bit tedious

Matthew Ballard (Aug 03 2023 at 01:33):

MonoidAlgebra.Basic itself only improved by 5%, compared to downstream improvements of up to 62% (FieldTheory.Cardinality).

Matthew Ballard (Aug 03 2023 at 01:37):

Given that changes to only one file could produce this, I feel pretty confident that systematic changes would yield significant benefit.

Sebastien Gouezel (Aug 03 2023 at 07:03):

That's a very good example. Can you open a Lean 4 issue describing it? If we could get improvements in the structure generator, this could mean that it would not be necessary to review all of mathlib to eliminate the with.

Eric Wieser (Aug 03 2023 at 08:21):

Matthew Ballard said:

It looks like that doesn't work currently

What makes you say that? The important bit is the implicit toA argument, which is correctly populated with a, not { val := a.val }

Matthew Ballard (Aug 03 2023 at 10:10):

Sorry. I took that too literally. I agree it would be better than the existing pattern. How much of the problem it eliminates, I’m not sure.

Eric Wieser (Aug 03 2023 at 10:27):

If anything it might be better, because the lets it introduce give lean a small chance to decide not to substitute them n times

Matthew Ballard (Aug 03 2023 at 10:38):

That would be nice if it worked that way. Certainly worth a look.

Matthew Ballard (Aug 03 2023 at 10:39):

I’ll rework #6319 to use this pattern and see. (In a couple hours)

Matthew Ballard (Aug 03 2023 at 10:41):

My understanding of the structure generation algorithm and how unification handles the resulting terms is still hazy.

Matthew Ballard (Aug 03 2023 at 10:46):

In terms of an end game, I think the performance hit isn't "mathlib's own fault", and it is worthwhile to try to get upstream interested in a discussion.

Matthew Ballard (Aug 03 2023 at 23:36):

Matthew Ballard said:

I’ll rework #6319 to use this pattern and see. (In a couple hours)

Indeed an improvement

Matthew Ballard (Aug 04 2023 at 21:17):

Given the last run, I think a simple improvement would be to extend docs#Lean.Elab.Term.expandParentFields so that, if the parent occurs in Source.explicit, you just slot that in instead of expanding.

Scott Morrison (Aug 05 2023 at 06:05):

Very decent. Slightly over 1% reduction for wall-clock.

Yuyang Zhao (Aug 05 2023 at 19:34):

Hello everyone. I'm also trying to speed up some ring theory files. Can I use !bench command in my PR?

Scott Morrison (Aug 05 2023 at 23:10):

@Yuyang Zhao, yes, you may!

As suggested here, it may be worthwhile to check that the commit on master which your branch diverges from has already been benchmarked, by looking here. If that's not the case, you could rebase your PR onto one that has been.

Kevin Buzzard (Aug 06 2023 at 19:12):

Sebastien Gouezel said:

That's a very good example. Can you open a Lean 4 issue describing it? If we could get improvements in the structure generator, this could mean that it would not be necessary to review all of mathlib to eliminate the with.

Did anyone ever do this? I couldn't find an issue so I opened https://github.com/leanprover/lean4/issues/2387 . @Matthew Ballard or @Eric Wieser do you have anything to add?

Matthew Ballard (Aug 06 2023 at 20:43):

I think there are simple changes that would produce most of the benefit https://leanprover.zulipchat.com/#narrow/stream/270676-lean4/topic/slowness.20in.20ring.20theory.20file/near/381977940

Currently the algorithm treats all instances before the with equally. They each get a let src := inst. Next the field projections get expanded. Then they figure out how to slot the src’s in to see what fields can have terms generated.

A simple rule would be that if inst is an instance of the parent field you would reduce the let and feed it into the parent projection. Otherwise do as you have done.

Matthew Ballard (Aug 06 2023 at 20:45):

#6398 is a good example of the efficacy of this vs the current setup

Eric Wieser (Aug 06 2023 at 22:11):

A simple rule would be that if inst is an instance of the parent field you would reduce the let and feed it into the parent projection. Otherwise do as you have done.

A slightly more complex rule is to first try populating the base field with toBase := src.toBase (for each src in turn) before falling back on toBase := { x := src.x, y := src.y }

Eric Wieser (Aug 06 2023 at 22:12):

Which would then also catch things like "there is an existing Monoid instance but no existing Group"
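
For instance, a toy reading of that situation (hypothetical classes, not the mathlib ones; just to make the shapes concrete):

universe u

class Monoid' (α : Type u) where
  one : α

class Group' (α : Type u) extends Monoid' α where
  inv : α → α

-- an existing Monoid'-level instance, but no pre-existing Group' instance
instance instMonoidNat : Monoid' Nat := ⟨1⟩

-- the hope is that the toMonoid' subobject here ends up as instMonoidNat itself
-- rather than the exploded { one := instMonoidNat.one }
instance : Group' Nat := { instMonoidNat with inv := fun n => n }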

Matthew Ballard (Aug 08 2023 at 12:10):

Good news is that the structure pattern changes and the removal of universe meta-variables seem complementary in terms of performance benefit. In #6427, I merged #6319 and #6370 and benchmarked. Results.

Kevin Buzzard (Aug 08 2023 at 16:48):

And you're sure that this is not this weird glitch we sometimes see in the speedcenter? It doesn't look like it is though, does it: you seem to be consistently producing huge speedups.

Matthew Ballard (Aug 08 2023 at 16:49):

It is basically adding together the speedups from the two PRs, so I would expect it to look good. The question was mainly about how much one change might cannibalize improvements from the other one.

Matthew Ballard (Aug 08 2023 at 16:51):

There should be another nice one incoming for #6370. I replaced all the Sort _'s with Sort*'s and want to make sure there isn't anything strange occurring to be safe.

Matthew Ballard (Aug 08 2023 at 16:53):

I am also trying this and fumbling toward a release for it.

Matthew Ballard (Aug 08 2023 at 17:15):

I am currently blocked by GitHub Pages for reasons I don't understand

Eric Wieser (Aug 08 2023 at 17:27):

You need to enable pages in the repository settings I think

Matthew Ballard (Aug 08 2023 at 17:28):

Here is a run from Scott's fork.

Matthew Ballard (Aug 08 2023 at 17:28):

It looks like one can skip that manual Publish step?

Matthew Ballard (Aug 08 2023 at 17:32):

Oh https://github.com/semorrison/lean4/blob/8de1c0786c5b2f77f94b19224f4b08a054f935d4/.github/workflows/nix-ci.yml#L106

I guess I should push to a branch?

Jireh Loreaux (Aug 08 2023 at 17:32):

For clarity, I checked the worst offenders (>100% slowdown) on Results and they were both super small and quick files, so that's basically noise.

Matthew Ballard (Aug 08 2023 at 17:44):

I just deleted that task

Matthew Ballard (Aug 08 2023 at 21:59):

Now I am stuck releasing the artifacts. https://github.com/mattrobball/lean4/actions/runs/5801517467/job/15728447672

Eric Wieser (Aug 08 2023 at 22:22):

Does deleting these lines work? https://github.com/mattrobball/lean4/actions/runs/5801517467/workflow#L283-L284

Matthew Ballard (Aug 08 2023 at 22:36):

Apparently I didn't push the fix I found online :face_palm:

Matthew Ballard (Aug 09 2023 at 12:58):

Well, I didn't accomplish what I wanted. But it did something, including -2.5% wall clock

Matthew Ballard (Aug 09 2023 at 12:59):

I understand the landscape better though.

Eric Wieser (Aug 09 2023 at 12:59):

We never tested the let_mvar% change from the porting meeting, right?

Matthew Ballard (Aug 09 2023 at 13:00):

Scott tried to get a release built but it timed out on the structInst* tests

Eric Wieser (Aug 09 2023 at 13:00):

Oh, so let_mvar% was much worse?

Matthew Ballard (Aug 09 2023 at 13:01):

I don't know if I would use 'worse'. You couldn't build a release because of timeouts on failed tests. I think that is probably a bad overall indicator

Matthew Ballard (Aug 09 2023 at 13:02):

This change here, as best I understand it now, is: instead of trying to synthesize an instance, try the with's beforehand

Matthew Ballard (Aug 09 2023 at 13:03):

This will not help in most of our situations because Lean will not fall back to tc synthesis if we provide enough data in the with's

Matthew Ballard (Aug 09 2023 at 13:04):

I had a previous version where you just stuck in the structure instance for the field if it occurred in a with but that failed one test.

Matthew Ballard (Aug 09 2023 at 13:05):

The reason it failed is the following.

Matthew Ballard (Aug 09 2023 at 13:07):

structure A where
 x : Nat
 y : Nat

structure B extends A where
 z : Nat

def a  : A := sorry

example : B :=
 { a with y := 0, z := 0}

Matthew Ballard (Aug 09 2023 at 13:07):

The expectation is that if we provide y := 0 then we are asking Lean to explicitly override a and replace a.y with 0

Matthew Ballard (Aug 09 2023 at 13:09):

I think this is reasonable and doubt it will change

Matthew Ballard (Aug 09 2023 at 13:09):

So our current pattern of

example : Module R M :=
 { someModule with smul := (· • ·) }

will need to change

Matthew Ballard (Aug 09 2023 at 13:10):

Regardless of whatever happens

Matthew Ballard (Aug 09 2023 at 13:11):

Rule: only provide the fields of a with'ed field structure instance if you actually want to change the term

Matthew Ballard (Aug 09 2023 at 13:11):

I think that is pretty reasonable

Matthew Ballard (Aug 09 2023 at 13:12):

Violating this makes Lean unfold the with'ed structure instance and rebuild a new one, even if the provided field is syntactically identical
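
A toy illustration (made-up structures; the comments describe the intended behavior under the change being discussed):

structure A where
  x : Nat
  y : Nat

structure B extends A where
  z : Nat

def a : A := { x := 0, y := 1 }

-- fine: a can be slotted in as the parent wholesale
def good : B := { a with z := 2 }

-- violates the rule: x := a.x re-supplies a field that a already provides,
-- so Lean unfolds a and rebuilds a fresh A even though nothing changed
def bad : B := { a with x := a.x, z := 2 }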

Matthew Ballard (Aug 09 2023 at 13:14):

Assuming we accept this and change our behavior in mathlib, we still have examples like #6398 where I provide the structure instance as a with instead of as a field projection directly

Matthew Ballard (Aug 09 2023 at 13:15):

I think it is reasonable to ask that there be no serious difference in behavior between these two patterns

Matthew Ballard (Aug 09 2023 at 13:17):

I think I can hack something together that does that today (but don't hold me to it!)

Matthew Ballard (Aug 09 2023 at 13:18):

Getting the release workflow kinks worked out yesterday will help

Matthew Ballard (Aug 09 2023 at 13:19):

I think unfolding the generated terms and reducing the let's, as with dsimp% or let_mvar%, will still suffer from the same timeout issues

Matthew Ballard (Aug 09 2023 at 13:22):

One needs to create an update/reduce function for the docs#Lean.Elab.Term.StructInst.Struct type which checks if a with structure instance has the correct type for a field and, if so, drops it from the type and updates the value of the field

Matthew Ballard (Aug 09 2023 at 13:24):

Even if you do this, setting field values upstream in parent structures will override this. Great care is taken to make sure this occurs correctly

Matthew Ballard (Aug 09 2023 at 13:51):

Matthew Ballard said:

Well, I didn't accomplish what I want. But it did something, including -2.5% wall clock

I should say that, with my current level of understanding, the changes I made could conceivably be a no-op. It certainly doesn't look like it though from the benchmark run.

Scott Morrison (Aug 10 2023 at 00:06):

Matthew Ballard said:

Scott tried to get a release built but it timed out on the structInst* tests

To clarify here, it wasn't timing out because it was slow, but actually getting into an infinite loop and crashing.

Matthew Ballard (Aug 11 2023 at 14:25):

structure A where
 x : Nat

structure B extends A where
 y : Nat

def a : A := sorry

def b : B := { a with y := 0}

#print b
-- def b : B :=  let src := a;  { toA := src, y := 0 }

Matthew Ballard (Aug 11 2023 at 14:55):

#6527

Matthew Ballard (Aug 11 2023 at 15:03):

So the behavior in current nightly is

structure A where
 x : Nat

structure B extends A where
 y : Nat

def a : A := sorry

def b : B := { a with y := 0}

#print b
-- def b : B :=  let src := a;  { toA := { x := src.x }, y := 0 }

which has one more structure eta expansion than seems necessary.

From staring at traces, this seems to be the dominant factor in the diminished performance compared to

def b' : B := { toA := a, y := 0 }

But we will see shortly

Matthew Ballard (Aug 11 2023 at 15:05):

Zeta reducing the expression should be straightforward but zeta reducing the syntax would be better.
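
For reference, the zeta-reduced form of the expansion above (a schematic restatement using the same kind of toy structures):

structure A where
  x : Nat

structure B extends A where
  y : Nat

def a : A := sorry

-- the current expansion  let src := a; { toA := src, y := 0 }  zeta-reduces to:
def b : B := { toA := a, y := 0 }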

Kevin Buzzard (Aug 11 2023 at 15:12):

6527 is failing CI BTW @Matthew Ballard (which might stop the bench from working)

Matthew Ballard (Aug 11 2023 at 15:12):

I need to fix some simps

Matthew Ballard (Aug 11 2023 at 15:12):

Yeah, it should be taken off until I fix them. Too eager!

Eric Wieser (Aug 11 2023 at 15:15):

How does this compile under your change?

structure A where
 x : Nat

structure B extends A where
 y : Nat

structure C extends A where
 y : Nat

def c : C := sorry

def b : B := { c with }

Matthew Ballard (Aug 11 2023 at 15:15):

Same as before I imagine.

Matthew Ballard (Aug 11 2023 at 15:15):

But that can be fixed

Eric Wieser (Aug 11 2023 at 15:16):

What we want is

#print b
-- def b : B :=  let src := c;  { toA := c.toA, y := c.y }

Matthew Ballard (Aug 11 2023 at 15:16):

Right

Matthew Ballard (Aug 11 2023 at 15:17):

There are two things

  • reduce the let
  • check parent fields of the explicit sources

Matthew Ballard (Aug 11 2023 at 15:17):

I didn't reduce the let and currently only check that an explicit source is a direct projection from a parent

Matthew Ballard (Aug 11 2023 at 15:17):

The latter is straightforward

Matthew Ballard (Aug 11 2023 at 15:18):

The former should be done on Syntax but I am scared of Syntax

Eric Wieser (Aug 11 2023 at 15:22):

Reducing the let would be equivalent to the let_mvar% approach, which already failed...

Matthew Ballard (Aug 11 2023 at 15:22):

You can't do it on the whole term or it will blow up

Eric Wieser (Aug 11 2023 at 15:22):

Can you link to where the let_mvar% approach was tested?

Matthew Ballard (Aug 11 2023 at 15:23):

You need to do it as it builds the fields

Matthew Ballard (Aug 11 2023 at 15:24):

The fields are built using Name's and Syntax

Matthew Ballard (Aug 11 2023 at 15:24):

Before being properly elaborated

Mario Carneiro (Aug 11 2023 at 15:25):

I think using letI would not fail in the same way as let_mvar%, although that would require upstreaming letI

Matthew Ballard (Aug 11 2023 at 15:25):

That's a good idea.

Eric Wieser (Aug 11 2023 at 15:25):

Mario Carneiro said:

in the same way as let_mvar%

What way was that?

Mario Carneiro (Aug 11 2023 at 15:26):

some kind of stack overflow, I did not investigate in detail

Eric Wieser (Aug 11 2023 at 15:26):

Do we have a build log / PR / commit?

Matthew Ballard (Aug 11 2023 at 15:26):

I had to drop a very basic function from Mathlib.Lean.Expr.Basic into the file already

Matthew Ballard (Aug 11 2023 at 15:27):

Eric Wieser said:

Do we have a build log / PR / commit?

Let me get the link

Mario Carneiro (Aug 11 2023 at 15:28):

https://github.com/digama0/lean4/tree/structinst_letmvar is the PR, I think Scott built it locally and reported the errors to me via PM

Matthew Ballard (Aug 11 2023 at 15:28):

https://github.com/semorrison/lean4/actions/runs/5793010682

Matthew Ballard (Aug 11 2023 at 15:29):

But reducing the let only goes so far

Matthew Ballard (Aug 11 2023 at 15:29):

There is just an extra eta expansion right now. It serves no use

Eric Wieser (Aug 11 2023 at 15:29):

Yeah, but the unifier knows to prioritize eta expansions (that's how we ultimately un-reverted the first fix to #2074)

Matthew Ballard (Aug 11 2023 at 15:30):

It still seems noticeably slower than without it

Matthew Ballard (Aug 11 2023 at 15:31):

I did use docs#zetaReduce on all the generated exprs and didn't see much benefit

Matthew Ballard (Aug 11 2023 at 15:32):

I broke something in mathlib so I know it had an effect

Matthew Ballard (Aug 11 2023 at 15:37):

Here are the changes

Matthew Ballard (Aug 11 2023 at 15:39):

But a lesson I have learned from this is that messing with the expressions later on is the wrong thing to do

Matthew Ballard (Aug 11 2023 at 16:02):

Interesting changes required

Matthew Ballard (Aug 11 2023 at 21:09):

Two sorry's stand between me and a working mathlib to benchmark.

Matthew Ballard (Aug 12 2023 at 13:28):

Bench

Matthew Ballard (Aug 12 2023 at 13:28):

Next I’ll try reducing the let’s

Matthew Ballard (Aug 12 2023 at 13:50):

My guess is this is sniffing out defeq abuse.

Matthew Ballard (Aug 14 2023 at 17:41):

Ok. I think I have the let's reduced.

Matthew Ballard (Aug 14 2023 at 17:58):

And of course the unused arguments linter is terribly unhappy now

Matthew Ballard (Aug 15 2023 at 17:02):

Kevin Buzzard said:

I have this hunch that the perennial red Algebra.Category.ModuleCat.Limits should be individually fixed because there's something pretty weird going on there that makes it slower when everything else is faster. I am wondering if you are de-withing one side of an equation but not the other, but I had a look yesterday and couldn't isolate the problem; I just saw that the long declaration goes up from 150K heartbeats to 180K as a result of your PR and the trace is also much longer.

This one is -51%. (Sorry just remembered)

Matthew Ballard (Aug 15 2023 at 17:04):

I updated the toolchain in #6539 so that the following is the expansion

structure A where

structure B extends A where

def a : A := sorry

def b : B := { a with }

#print b
-- let src := a; { toA := a }

Matthew Ballard (Aug 15 2023 at 17:05):

Currently this is only for direct projections to the parent.

Matthew Ballard (Aug 15 2023 at 17:09):

Reducing the let's is a little hacky because it is hard to access them from inside the elaboration process for a structure instance.

When elaborating,

{ a1, a2, ... with }

gets transformed into

let src1 := a1; let src2 := a2; ...; {src1, src2,....}

So by the time we are actually ready to elaborate the structure instance all we have is

{src1, src2,...}

for syntax.

Matthew Ballard (Aug 15 2023 at 17:11):

I used docs#Lean.Expr.toSyntax from std but only after going around in a big circle

Matthew Ballard (Aug 15 2023 at 17:12):

One can try to reduce all the let's, not just those where the terms come from parent structures, but that broke mathlib more than I had patience to fix at the moment.

Matthew Ballard (Aug 15 2023 at 17:15):

Once the benchmark runner is back up, I will put up the changes for testing

Matthew Ballard (Aug 15 2023 at 18:52):

Results of gtime -v lake build.

Status quo

Reduce structure etas on parent projections

Reduce lets and structure etas on parent projections

Matthew Ballard (Aug 15 2023 at 18:57):

AlgebraicGeometry and Analysis.Calculus stand out as being very unhappy with the loss of an extra eta expansion. They are almost uniformly red.

I would guess they are even more unhappy about the reduced let bindings. Their unhappiness is outweighing the gains elsewhere probably.

Matthew Ballard (Aug 15 2023 at 19:00):

As to why they are unhappy (and I guess Analysis.InnerProductSpace also), any insight would be appreciated.

Matthew Ballard (Aug 15 2023 at 19:01):

I imagine that they continually need to unfold things to unify, and this creates additional work for them.

Matthew Ballard (Aug 15 2023 at 19:04):

It could also be that there is some overhead to how lets are reduced, given the roundabout manner of implementation.

Matthew Ballard (Aug 15 2023 at 20:56):

Looking at the example from #mathlib4 > Instance priorities for quotients of MvPolynomial that Eric gave

import Mathlib.Algebra.RingQuot
import Mathlib.RingTheory.Ideal.QuotientOperations
import Mathlib.Data.MvPolynomial.CommRing

universe v u

noncomputable section

open Finset MvPolynomial Ideal.Quotient Ideal RingQuot

variable (R : Type u) [CommSemiring R] (M : Type v)

-- A trivial relation is enough to show problems
inductive r : (MvPolynomial M R) → (MvPolynomial M R) → Prop

abbrev Quot_r := RingQuot (r R M)

set_option profiler true in
set_option trace.Meta.synthInstance true in
set_option trace.Meta.isDefEq true in
set_option maxHeartbeats 0 in
set_option synthInstance.maxHeartbeats 0 in
example (R M : Type _) [CommRing R] (I : Ideal (Quot_r R M)) :
    Algebra R ((Quot_r R M) ⧸ I) :=
  inferInstance

it looks like the main cause of the timeout is that the with instance pattern creates terms of the form B.mk (A.mk ...) ... due to the eta expansion (as opposed to B.mk instA ...) which need to be unfolded to check that A.mk =?= A.mk.

But the with pattern is layered over and over in the hierarchy so Lean has to go a few layers deep to make sure things are def eq before continuing. One pass in itself is manageable but Lean needs to do this multiple times to get Algebra R ((Quot_r R M) ⧸ I) which leads to the timeouts.
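
Schematically (toy structures, not the actual RingQuot hierarchy), the two term shapes being compared are:

structure A where
  x : Nat

structure B extends A where
  y : Nat

def instA : A := { x := 0 }

-- the kind of eta-expanded shape the with pattern produces: B.mk (A.mk instA.x) 0
def b₁ : B := { toA := { x := instA.x }, y := 0 }

-- the compact shape: B.mk instA 0
def b₂ : B := { toA := instA, y := 0 }

-- b₂.toA reduces directly to instA; b₁.toA reduces to A.mk instA.x, which only
-- matches instA after structure eta / further unfolding
example : b₂.toA = instA := rfl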

Matthew Ballard (Aug 16 2023 at 01:37):

Here is a mathlib-free (in the sense I copy-pasta'ed a large chunk of the hierarchy) example based on the example above. It can probably be whittled down further

slower

To make it 10x faster change instRing and instCommSemiring to

instance instRing {R : Type u} [Ring R] (r : R → R → Prop) : Ring (RingQuot r) :=
  { toSemiring := instSemiring r
    neg := sorry
    add_left_neg := sorry
    sub := sorry
    sub_eq_add_neg := sorry
    zsmul := sorry
    zsmul_zero' := sorry
    zsmul_succ' := sorry
    zsmul_neg' := sorry
    intCast := sorry
    intCast_ofNat := sorry
    intCast_negSucc := sorry}

instance instCommSemiring {R : Type u} [CommSemiring R] (r : R → R → Prop) :
  CommSemiring (RingQuot r) :=
  { toSemiring := instSemiring r
    mul_comm := sorry }

Things are still fast overall of course because checking sorry =?= sorry is much less expensive than it is with actual data. But when you get into a "real-world" situation like the above, where we need to check this equality between two Semiring structures over and over, the costs add up.

Matthew Ballard (Aug 16 2023 at 02:11):

The extra eta expansion doesn't occur if we supply a structure instance sharing the common parent

structure A where
  x : Bool

structure B extends A where

structure C extends A where

def a : A := ⟨true⟩

def c : C := { toA := ⟨true⟩ }

def b : B := {a with}

def b' : B := {c with}

#print b
-- extra eta
-- let src := a;
-- { toA := { x := src.x } }

#print b'
-- no extra eta
-- let src := c;
-- { toA := src.toA }

except in the degenerate case where C = A.

Matthew Ballard (Aug 16 2023 at 02:15):

So I think the changes in the first toolchain of #6539 are probably optimal here. (Ignoring all the cases where mathlib forces Lean to unfold and chop up the provided structure instance by providing a redundant value for the parent structure. Those will need to be changed on their own.)

Matthew Ballard (Aug 16 2023 at 14:08):

More stripped down example but still hits the main point

set_option autoImplicit true

structure ACM (α : Type u) where

structure MWZ (α : Type u) where

structure NUNASR (α : Type u) extends ACM α

structure NUSR (α : Type u) extends NUNASR α

structure NASR (α : Type u) extends NUNASR α

structure NUNAR (α : Type u) extends NUNASR α

structure SR (α : Type u) extends NUSR α, NASR α, MWZ α

structure NUCSR (α : Type u) extends NUSR α

structure CSR (R : Type u) extends SR R

def instACM (α : Type u) : ACM α := sorry

def instMWZ (α : Type u) : MWZ α := sorry

def instSR (α : Type u) : SR α := {instACM α, instMWZ α with }

def instCSR (α : Type u) : CSR α := {instSR α with}

set_option profiler true in
set_option trace.Meta.isDefEq true in
example (α : Type u) : CSR.toSR (instCSR α) = instSR α := rfl

def instCSR' (α : Type u) : CSR α := let src := instSR α; {toSR := src}

set_option profiler true in
set_option trace.Meta.isDefEq true in
example (α : Type u) : CSR.toSR (instCSR' α) = instSR α := rfl

The important part of the trace on the first rfl is

[Meta.isDefEq] [0.000116s]  (instCSR α).toSR = (instCSR α).toSR =?= (instCSR α).toSR = instSR α 
  [] [0.000000s]  (instCSR α).toSR =?= (instCSR α).toSR
  [] [0.000102s]  (instCSR α).toSR =?= instSR α 
    [] [0.000091s]  (instCSR α).toSR =?= let src := instACM α;
        let src := instMWZ α;
        { toNUSR := { toNUNASR := { toACM := { } } }, toMWZ := { } } 
      [] [0.000086s]  (instCSR α).toSR =?= { toNUSR := { toNUNASR := { toACM := { } } }, toMWZ := { } } 
        [] [0.000079s]  (instCSR α).1 =?= { toNUSR := { toNUNASR := { toACM := { } } }, toMWZ := { } } 
          [] [0.000059s]  { toNUSR := (instSR α).toNUSR,
                toMWZ := (instSR α).toMWZ } =?= { toNUSR := { toNUNASR := { toACM := { } } }, toMWZ := { } } 
            [] [0.000024s]  (instSR α).toNUSR =?= { toNUNASR := { toACM := { } } } 
              [] [0.000015s]  (instSR α).1 =?= { toNUNASR := { toACM := { } } } 
                [] [0.000000s]  { toNUNASR := { toACM := { } } } =?= { toNUNASR := { toACM := { } } }

where Lean has to go all the way down to check that the structure instance provided for the parent class is what you get when you project back to that parent class.

Compare this to the second rfl

[Meta.isDefEq] [0.000058s]  (instCSR' α).toSR = (instCSR' α).toSR =?= (instCSR' α).toSR = instSR α 
  [] [0.000000s]  (instCSR' α).toSR =?= (instCSR' α).toSR
  [] [0.000044s]  (instCSR' α).toSR =?= instSR α 
    [] [0.000035s]  (instCSR' α).toSR =?= let src := instACM α;
        let src := instMWZ α;
        { toNUSR := { toNUNASR := { toACM := { } } }, toMWZ := { } } 
      [] [0.000030s]  (instCSR' α).toSR =?= { toNUSR := { toNUNASR := { toACM := { } } }, toMWZ := { } } 
        [] [0.000025s]  (instCSR' α).1 =?= { toNUSR := { toNUNASR := { toACM := { } } }, toMWZ := { } } 
          [] [0.000007s]  instSR α =?= { toNUSR := { toNUNASR := { toACM := { } } }, toMWZ := { } } 
            [] [0.000000s]  let src := instACM α;
                let src := instMWZ α;
                { toNUSR := { toNUNASR := { toACM := { } } },
                  toMWZ := { } } =?= { toNUSR := { toNUNASR := { toACM := { } } }, toMWZ := { } }

where Lean has to do some reductions but never opens up the SR instance.

Matthew Ballard (Aug 16 2023 at 14:09):

The extra eta expansion coming from the with pattern in instCSR forces Lean to open it up during unification.

Matthew Ballard (Aug 16 2023 at 14:12):

With no data, this is lower cost. But it is still twice as expensive as without the extra eta expansion.

With data, it gets worse.

Then when you have structures built on top of these and have to perform this unification multiple times you start to time out.

Matthew Ballard (Aug 16 2023 at 20:50):

Hmm. It looks like I didn't catch all the extra eta expansions

structure A where
  x : Nat
  y : Nat

structure B extends A where

def foo (b : B) : A :=
  { B with }

def bar (b : B) : A :=
  { b with }

#print foo
-- let src := b;
--  { x := src.toA.x, y := src.toA.y }

#print bar
-- { x := b.toA.x, y := b.toA.y }

Matthew Ballard (Aug 16 2023 at 20:53):

For example, we currently have

instance (priority := 200) : Semiring α :=
  { ‹Ring α› with }

which is not just Ring.toSemiring

Matthew Ballard (Aug 17 2023 at 16:51):

#6621 changing to Ring.toSemiring doesn't seem to help much, either on its own or in addition to reducing the extra eta expansions.

But the new run does tell me that ring is incredibly unhappy with the loss of eta expansions. That may be feeding the majority of the regressions.

Matthew Ballard (Aug 20 2023 at 02:55):

Calculus has turned green now along with all but some sporadic files and most of AG

Jireh Loreaux (Aug 20 2023 at 12:14):

What did you do to manage that?

Matthew Ballard (Aug 20 2023 at 12:28):

#mathlib4 > Field.toSemifield, plus clicking on drop-downs in traces in LinearAlgebra.FreeModule.Norm before and after the change until things diverged

Matthew Ballard (Aug 20 2023 at 12:30):

ring improved a good deal but is still suffering so optimistically I just need to do this a few more times :)

Matthew Ballard (Aug 20 2023 at 18:09):

The pattern

structure A where
  field : T

structure B extends A where

structure C extends B where

def b : B := sorry

def c : C := { b with field := sorry}

defeats the changes here to an extent since specifying a field of a provided structure forces Lean to unfold it to insert that field into the constructor (with all the other fields coming from b).

This is fine if you need to change the subobject field but we often have add := add which is just re-inferring the add we have from AddMonoid, for example. These expanded terms slow things down in the same way the extra eta expansions do.

I went back to docs#MonoidAlgebra.Basic and eliminated this pattern. Here is the diff from the speedcenter from the last run above.

Even less red. Most of the remaining red got materially better, except for non elliptic curve AG.

Matthew Ballard (Aug 20 2023 at 18:10):

Change from current master

Matthew Ballard (Aug 23 2023 at 19:23):

After looking through the regressions, I feel fairly confident there will be across the board improvement in Mathlib with these changes.

The simplest and most impactful change is a two-line addition to core (not counting doc strings).

I have opened an RFC lean4#2451 to discuss it and would appreciate any and all feedback!

Riccardo Brasca (Sep 06 2023 at 15:23):

I surely missed the last developments about all these problems, what is the current situation? It would be nice to write down somewhere what has been done (for example, are we supposed to avoid with now? Is this documented?).

Matthew Ballard (Sep 07 2023 at 01:03):

TLDR: currently structure instance elaboration produces terms that are needlessly unfolded causing unification to work harder than it should.

Two of the main ways this happens are:

  • in
structure A where
  x : Nat

structure B extends A where

def a : A := { x := 0 }

def b : B := {a with }

b is not elaborated as { toA := a } but as { toA := A.mk 0 }. This also seems like a bug since

structure C extends A

def c : C := sorry

def b : B := {c with}

will elaborate to { toA := c.toA }.

  • continuing with the above,
 def b' : B := { a with x := 0 }

results in the same expanded term as before.

Unification of b and b' would be quicker if they were both { toA := a }, since a = a. With the two issues above, the inputs to A.mk all have to be checked. As terms get bigger and the behavior gets nested, slowdowns accumulate.

Matthew Ballard (Sep 07 2023 at 01:04):

To address the first issue, there is lean4#2478.

Matthew Ballard (Sep 07 2023 at 01:04):

To (partially) address the second, there is #6965

Matthew Ballard (Sep 07 2023 at 01:06):

Combined they give the following improvements including a 4.6% drop in total CPU instructions.

Anecdotally, they seem to dramatically help speed up instance synthesis for algebraic type classes. Some examples: one, two, three, four

Matthew Ballard (Sep 07 2023 at 01:07):

The dominant term in these improvements is the first the change

Matthew Ballard (Sep 07 2023 at 01:09):

There is one more way to get an expanded term

structure A where

structure B extends A where

def b : B := sorry

def a : A := {b with}

a does not elaborate as b.toA but as A.mk b.toA._

Matthew Ballard (Sep 07 2023 at 01:10):

This last one is a fairly common pattern in mathlib. I am not sure if it is one that should be supported.

Matthew Ballard (Sep 07 2023 at 01:12):

To make it work, it would require a change to core. But perhaps mathlib should just use b.toA directly instead.
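
Continuing the snippet above, the direct-projection version is just:

-- elaborates to the projection itself, with no wrapper constructor
def a' : A := b.toA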

Riccardo Brasca (Sep 07 2023 at 06:56):

Thanks!

Kevin Buzzard (Sep 07 2023 at 06:57):

What do you think of the with in docs#RingHom.comp ?

Eric Wieser (Sep 07 2023 at 07:16):

@Kevin Buzzard, I have a PR in the works (#6791) that makes all morphisms flat structures, which would make the with irrelevant

Eric Wieser (Sep 07 2023 at 07:17):

We already fully disintegrate and reassemble morphisms when casting from ring_hom to monoid_hom (because we do so via MonoidHomClass), so there's no benefit at all to the nested structures there

Eric Wieser (Sep 07 2023 at 07:20):

Unfortunately this is now ugly in a different way because we have to fight lack of core support, but I think @Mario Carneiro had some ideas about changing core to fix this

Matthew Ballard (Sep 07 2023 at 15:18):

Kevin Buzzard said:

What do you think of the with in docs#RingHom.comp ?

I’m guessing you could remove that toFun. But not sure if you want to

I didn’t touch fields outside algebraic ones in #6965 for the most part.

Mario Carneiro (Sep 08 2023 at 18:31):

Eric Wieser said:

Unfortunately this is now ugly in a different way because we have to fight lack of core support, but I think Mario Carneiro had some ideas about changing core to fix this

Are you referring to something other than Matthew's core PR?

Matthew Ballard (Sep 08 2023 at 18:32):

Yes. Eric has a separate mathlib PR for flat structures on morphism classes

Eric Wieser (Sep 08 2023 at 20:11):

Mario, I'm referring to your class Foo extends @[flat] Bar proposal from some time ago

Matthew Ballard (Sep 21 2023 at 16:10):

Update: After succumbing to the pressures of life for a bit, this is back on track. #7257 has a working copy of mathlib with the change in the toolchain (in case you want to throw your problem at it).

Comments:

  • AlgebraicGeometry is very unhappy that it has to work harder to unfold its giant terms to unify. Very unhappy. But the current design is not sustainable and is only the result of a sprint to the finish in the port.
  • Pi.smul_apply enters into the rw vs simp realm with the change
  • OperatorNorm compiles things even more #7299
  • the original motivation of this thread, RingTheory.Kaehler, does not like the change either, so I've resuscitated #6149, which almost knocks it out of the top 15 in CPU instructions

Matthew Ballard (Sep 21 2023 at 16:13):

Here is the most recent speed run

Scott Morrison (Sep 22 2023 at 01:43):

How about fix this up (title, remove draft status) as a proper PR? I'm very keen to get to the point we can merge this. There's still a bunch of work to move every possible change that can be done without lean4#2478 into separate PRs, so that it becomes possible to get lean4#2478 merged.

Kevin Buzzard (Sep 22 2023 at 07:10):

@Matthew Ballard yes great work on #6149 ! Is there an easy way to see a list of files with the most CPU instructions?

Matthew Ballard (Sep 22 2023 at 10:24):

Quickest way: go to the latest bench run and sort the values for each metric by decreasing size.


Last updated: Dec 20 2023 at 11:08 UTC