Zulip Chat Archive

Stream: mathlib4

Topic: the number of instances is too dang high


Matthew Ballard (Jun 27 2024 at 14:51):

As a experiment, I tried to made assert_not_exists OrderedSemiring succeed in RingTheory.Kaehler.Basic in branch#mrb/working. (Note: it is far from building all of mathlib and there are many sorry's)

The intent was to spot check the impact on performance from not having the projections from ordered algebra classes on typeclass synthesis.

Here its profile output

cumulative profiling times:
    attribute application 12.2ms
    compilation 59.6ms
    dsimp 122ms
    elaboration 859ms
    import 380ms
    initialization 16ms
    interpretation 2.8s
    linting 100ms
    parsing 23.2ms
    simp 2.41s
    tactic execution 2.32s
    type checking 2.15s
    typeclass inference 15s
    Command being timed: "lake env lean --profile Mathlib/RingTheory/Kaehler/Basic.lean"
    User time (seconds): 25.95
    System time (seconds): 0.43
    Percent of CPU this job got: 99%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:26.45
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 1104592
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 110
    Minor (reclaiming a frame) page faults: 221275
    Voluntary context switches: 25
    Involuntary context switches: 2144
    Swaps: 0
    File system inputs: 0
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 16384
    Exit status: 0

vs current master

cumulative profiling times:
    attribute application 12.9ms
    compilation 60.3ms
    dsimp 128ms
    elaboration 888ms
    import 498ms
    initialization 16ms
    interpretation 3.14s
    linting 105ms
    norm_num 0.653ms
    parsing 23.4ms
    simp 2.59s
    tactic execution 2.39s
    type checking 2.23s
    typeclass inference 20s
    Command being timed: "lake env lean --profile Mathlib/RingTheory/Kaehler/Basic.lean"
    User time (seconds): 31.47
    System time (seconds): 0.60
    Percent of CPU this job got: 99%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:32.26
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 1348592
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 602
    Minor (reclaiming a frame) page faults: 247703
    Voluntary context switches: 131
    Involuntary context switches: 3941
    Swaps: 0
    File system inputs: 0
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 16384
    Exit status: 0

The main difference is 15s in typeclass synthesis vs 20s depending on whether you have ordered algebra floating around.

Compared other files that had to be rid of ordered algebra to reach here, this on the low end. LinearAlgebra.TensorProduct.RightExactness dropped by about 40%.

Matthew Ballard (Jun 27 2024 at 14:54):

If you've stared a winding trace of instances synthesis, then you can viscerally appreciate the problem when there are too many projections to, say, Ring, in the context and the file itself doesn't deal with orders or analysis.

Matthew Ballard (Jun 27 2024 at 14:59):

This problem will only get worse as more complicated classes are built which intertwine algebra with other hierarchies so I want to get a sense of what people think should be done:

  1. Nothing -- not a great option but often the choice
  2. Core fix -- I don't know what avoids this without seriously touching instance synthesis which is a no-go currently.
  3. Carefully refactor to use the import structure to cut down on the instances in scope.
  4. Manually scope the instances in some way, eg write an attribute that erases the instance and scopes it on declaring.
  5. Unbundle algebra from everything else.

Are there other options?

Michael Stoll (Jun 27 2024 at 15:01):

I think @Yaël Dillies is planning on doing a version of 5 at some point.

Matthew Ballard (Jun 27 2024 at 15:01):

For ordered algebra, yes

Matthew Ballard (Jun 27 2024 at 15:03):

The disadvantage of 5 is that you will need a [Ring R] every time you want an OrderedRing.

Matthew Ballard (Jun 27 2024 at 15:04):

But nothing is hidden from the user as with 4.

Michael Stoll (Jun 27 2024 at 15:05):

I tried a version of 4 (for algebra vs. order) in #12778.

Matthew Ballard (Jun 27 2024 at 15:09):

Hmm. I would expect more benefits than that

Matthew Ballard (Jun 27 2024 at 15:12):

Yeah, it looks like some things leak through. For example,

variable {α : Type*} [OrderedRing α] in
#synth Ring α  -- OrderedRing.toRing

succeeds in the Kaehler.Basic

Matthew Ballard (Jun 27 2024 at 16:08):

Ok, I started on 5. It is something I can nibble at in between other commitments

Josha Dekker (Jun 27 2024 at 16:16):

Matthew Ballard said:

Ok, I started on 5. It is something I can nibble at in between other commitments

I'm sure you're already aware of @Yaël Dillies thoughts and work on this, but let me an issue that I believe is relevant just in case: https://github.com/leanprover-community/mathlib4/issues/11757

Floris van Doorn (Jun 27 2024 at 18:07):

I have gotten to similar conclusion around this message
It would be great if we could decouple parts of the hierarchy more. It would be great if we can investigate option 4 more. I expect that if we scope all instances used <10 times in Mathlib, that will already result in a noticeable gain.

Matthew Ballard (Jun 27 2024 at 20:05):

If anyone wants something to do branch#mrb/unbundle_algebra_from_orders

Matthew Ballard (Jun 27 2024 at 20:05):

Not sure when I will touch it again today

Floris van Doorn (Jun 27 2024 at 21:24):

I tested my example from 2 months ago again, and it fails 30x faster now. So the combination of lean4#3996, removing some instances from Mathlib and whatever else we did in the meantime did have a big effect on this.

Yaël Dillies (Jun 27 2024 at 22:19):

Yes, I have a pretty clear picture of what I want to try. Nothing concrete quite yet, and I won't have time to work on this before another two weeks

Matthew Ballard (Jun 28 2024 at 10:10):

Matthew Ballard said:

If anyone wants something to do branch#mrb/unbundle_algebra_from_orders

And I now realize I ran off too quickly and didn't push this at post time... :man_facepalming:

Matthew Ballard (Jun 28 2024 at 16:37):

This has reached submodules now.

Matthew Ballard (Jul 01 2024 at 17:03):

Results

Headline: only ~2% decrease in total instructions . Observations:

  • it is fairly straightforward except for the few cases where we built algebra data using ordered info, eg Nonneg types.
  • writing the algebraic instances always is not terrible is pretty annoying (looking at Valuation.Basic)
  • the simp lemmas proving algebraic statements using ordered info are still in scope and drag down performance
  • some files really win, eg Algebra.Star.NonUnitalSubalgebra
  • some namespaces are kinda chaotic, eg EllipticCurve

Floris van Doorn (Jul 01 2024 at 17:06):

-7.632 % on typeclass inference is very good, IMO.

Matthew Ballard (Jul 01 2024 at 17:08):

I can make it better with lean#2905 but overall we regress

Matthew Ballard (Jul 01 2024 at 17:28):

I think the best solution is to comb the import ball of yarn. It is probably the most involved though.

Matthew Ballard (Jul 01 2024 at 17:48):

In this direction, #14121 is awaiting-review now

Matthew Ballard (Jul 01 2024 at 17:58):

Also #14355

Matthew Ballard (Jul 01 2024 at 18:00):

Here is the graph from Algebra.Order.Group.Defs to Algebra.Star.NonUnitalSubalgebra.

Matthew Ballard (Jul 01 2024 at 18:14):

#14336 is another easy one avoiding importing Algebra.Order.BigOperators.Ring.Finset in BigOperators.Finsupp to talk about positive natural numbers.

Matthew Ballard (Jul 01 2024 at 18:26):

#14338 just deletes a redundant import. Waiting to see if it is needed downstream.

Matthew Ballard (Jul 01 2024 at 18:59):

Matthew Ballard said:

#14338 just deletes a redundant import. Waiting to see if it is needed downstream.

Nope. Builds just fine. Clearly easy now

Matthew Ballard (Jul 01 2024 at 20:16):

#14341 avoids ordered algebra in Data.Rat.Lemmas.

Matthew Ballard (Jul 01 2024 at 22:02):

#14129 avoids ordered algebra in Set.Pointwise.Basic by moving one declaration to a new file Set.Pointwise.BoundedMul

Kevin Buzzard (Jul 01 2024 at 22:24):

Matthew Ballard said:

Matthew Ballard said:

#14338 just deletes a redundant import. Waiting to see if it is needed downstream.

Nope. Builds just fine. Clearly easy now

Why isn't shake going nuts about this one?

Matthew Ballard (Jul 01 2024 at 22:26):

Not sure but rerunning locally it did. Then I had to do a shake hokey-pokey

Matthew Ballard (Jul 01 2024 at 22:26):

Oh wait. That was another one.

Matthew Ballard (Jul 01 2024 at 22:26):

I assume pollution of noshake

Matthew Ballard (Jul 01 2024 at 22:27):

The motivation for #mathlib4 > shake hangs with no config file

Mario Carneiro (Jul 01 2024 at 22:28):

what do you mean by pollution?

Matthew Ballard (Jul 01 2024 at 22:28):

People just run --update reflexively

Matthew Ballard (Jul 01 2024 at 22:30):

Here is a question: how many import Mathlib.Logic.Function.Iterate can be straight deleted?

Mario Carneiro (Jul 01 2024 at 22:58):

Is there something about Mathlib.Logic.Function.Iterate that foils shake? Some spot checks suggest that it can indeed be deleted in some cases but it's not noshaken

Matthew Ballard (Jul 01 2024 at 22:58):

Oh, I assumed it was noshaken

Mario Carneiro (Jul 01 2024 at 23:00):

most likely it's transitively imported by sibling imports in many files

Matthew Ballard (Jul 01 2024 at 23:04):

#14122 moves ordered algebra instances on products to a separate file

Kevin Buzzard (Jul 01 2024 at 23:04):

I know it's been said before but that declarations diff thing is an absolute godsend.

Matthew Ballard (Jul 01 2024 at 23:04):

Yes, that is very nice

Kim Morrison (Jul 01 2024 at 23:41):

Matthew Ballard said:

Also #14355

@Matthew Ballard I think this was a typo, I can't find it.

Matthew Ballard (Jul 01 2024 at 23:47):

#14335 ?

Matthew Ballard (Jul 02 2024 at 10:46):

#14351 avoids bundled ordered algebra in unbundled ordered algebra

Matthew Ballard (Jul 02 2024 at 22:49):

#14371 splits Data.Nat.Cast.Order into files depending on unbundled ordered algebra and bundled ordered algebra.

Following up from that #14370 splits Algebra.Order.Nonneg.Ring similarly.

The next step is NNRat.Defs.

Ralf Stephan (Jul 03 2024 at 08:04):

Could you also have a look at #14357?

Matthew Ballard (Jul 03 2024 at 12:35):

#14380 splits Algebra.Order.Group.Defs moving the unbundled results to Algebra.Order.Group.Unbundled.Basic

Matthew Ballard (Jul 10 2024 at 14:04):

Some results coming from minimizing imports.

The goal was to cleave Algebra.Star.NonUnitalSubalgebra from OrderedCommMonoid. The benefits are the same as unbundling the algebra from bundled ordered algebra hierarchy.

Matthew Ballard (Jul 10 2024 at 14:15):

Behold the graph from bundled ordered rings to Module.PID

Kim Morrison (Jul 10 2024 at 16:02):

We need sober spaces to talk about modules over a PID? :-)

Matthew Ballard (Jul 10 2024 at 17:21):

Exactly how I was taught it in Algebra I

Yaël Dillies (Jul 10 2024 at 19:05):

Algebraic geometry in my PIDs? :face_with_raised_eyebrow:

Matthew Ballard (Jul 10 2024 at 19:56):

One thing you don't see compared to the unbundling is the increase in linting though. We are fairly flat for instructions.

Matthew Ballard (Jul 10 2024 at 20:00):

Two main things are cause for the tangles:

  1. At the bottom of the ordered algebra stuff, it is quite common to have a single file with unbundled classes/results and bundled classes/results together. This means when you want something basic to glue together mul and <, for example, then the full bundled machinery comes for a ride.
  2. There is tendency to put very small sections in Basic files where you prove your thing is LinearOrderedField under assumptions that make you import things you don't otherwise need for Basic.

Matthew Ballard (Jul 10 2024 at 21:45):

If you don't want modules over a PID to require Mathlib.Topology.Basic #14631

Matthew Ballard (Aug 16 2024 at 12:33):

I am still on this kick with PRs that should be pretty uncontroversial: #15071, #15072, and #15152


Last updated: May 02 2025 at 03:31 UTC