Zulip Chat Archive
Stream: lean4
Topic: kernel universe normalization issue
Kevin Buzzard (Jan 28 2025 at 14:56):
Currently the slowest file in mathlib is Mathlib.RingTheory.Kaehler.JacobiZariski. In attempting to try and figure out why, we've stumbled upon the following phenomenon.
A summary: the proof Foo.{max u w w'} = Foo.{max (max u w) w'} := by rfl is elaborated very quickly but typechecking in the kernel takes several seconds. The slow mathlib file is full of stuff like that (implicitly). A possible reason: this is a gigantic term and the kernel is unfolding it before noticing that the universes can be unified. Here's some code demonstrating the problem (not mathlib-free of course, but if someone has a slick way to make a gigantic term involving several universes which would be costly to unfold then it might be possible to give a simple mathlib-free example):
import Mathlib.RingTheory.Kaehler.JacobiZariski
namespace Algebra.Generators
open KaehlerDifferential TensorProduct MvPolynomial
universe w' w u v uT
variable {R : Type u} {S : Type v} [CommRing R] [CommRing S] [Algebra R S]
{T : Type uT} [CommRing T] [Algebra R T] [Algebra S T] [IsScalarTower R S T]
(Q : Generators.{w} S T) (P : Generators.{w'} R S)
set_option trace.profiler true
-- quick
-- [Elab.command] [0.041051] example
example :
Extension.CotangentSpace.{max u w w'} (Q.comp P).toExtension =
Extension.CotangentSpace.{max u w w'} (Q.comp P).toExtension := rfl
-- slow
-- [Elab.command] [2.038793] example :
-- ...
-- [Kernel] [1.874395] typechecking declaration
example :
Extension.CotangentSpace.{max u w w'} (Q.comp P).toExtension =
Extension.CotangentSpace.{max (max u w) w'} (Q.comp P).toExtension := rfl
Currently the slow (500 line) mathlib file starts universe u₁ u₂ u₃ u₄ w' w u v uT. For an experiment I tried replacing this line with universe u and using one universe throughout the file: the result was a saving of nearly half a trillion instructions (see here ) and a 20% reduction in olean file size.
I would be happy to do further experiments if necessary, but the naive path to a mathlib-free minimisation will just involve building a full algebra hierarchy from scratch (which I've done before...). Are there people out there who understand how the kernel handles universes who can shed some light on what is actually happening here?
Eric Wieser (Jan 28 2025 at 16:55):
The other note here is that lean4#5695 makes the second "slow" example likely to occur, right?
Christian Merten (Jan 28 2025 at 19:54):
I minimized a bit, still imports RingTheory.TensorProduct.Basic, but I need to stop now. If it helps, here it is:
foo
Kevin Buzzard (Jan 28 2025 at 22:22):
One could no doubt persevere down this line but if there is some slick way to perhaps autogenerate some term which takes forever to unfold and eats three universes then this might do just as well.
Just to stress that the phenomenon here is not artificial -- here is an explanation of an example of it actually happening in an algebraic geometry file in mathlib. Algebraic geometry is famously slow in mathlib and I wonder to what extent this is being caused by this funny universe issue.
Sebastian Ullrich (Jan 30 2025 at 22:38):
So one issue here seems to be that Extension.CotangentSpace is a relatively complex definition, but it is marked as abbrev. This essentially tells the kernel "don't bother comparing my (level/expr) arguments modulo defeq, just unfold me". Which may lead to the same thing happening with some other abbrev inside its definition, I haven't checked that far. Making it def makes it fast.
Christian Merten (Jan 30 2025 at 23:18):
Did you try this on the minimized version I posted above or in the actual file? Because I can confirm that it makes it fast in the minized version, but when I tried it on master, the performance did not change / got worse (I had sorried out some proofs that broke, but that should make it faster).
Sebastian Ullrich (Jan 31 2025 at 07:06):
I tried it in Mathlib directly below the actual definition, no other changes
Johan Commelin (Feb 03 2025 at 05:22):
@Christian Merten Would you mind trying again, to double check?
Christian Merten (Feb 03 2025 at 10:59):
Johan Commelin said:
Christian Merten Would you mind trying again, to double check?
I tried it again at #21364 sorrying out all proofs that break, which unfortunately the speed center does not like. But local lake env perf stat lean Mathlib/RingTheory/Kaehler/JacobiZariski.lean is still at 736 * 10^9 instructions which is very close to the 727 * 10^9 instructions on master.
Matthew Ballard (Feb 03 2025 at 13:52):
Things I investigated
- Implementing Eric's suggestion to normalize
maxassociativity the other way - Reversing the accumulator in the kernel to match the convention of the elaborator
- Trying to sprinkle docs#Lean.Level.normalize in places to test during elaboration
I was successful with the first two but nothing really improved. The last one was not as successful as I would have liked. In particular, I could not get the universes in normalized. I couldn't get them to change at all.
I will break off my confusion on that into a separate post shortly ( :fingers_crossed:)
Matthew Ballard (Feb 03 2025 at 14:03):
But, I will say that it seems that the level metavariable introduced in such patterns (free universe parameters in fields) doesn't get instantiated until really, really late in elaboration.
Sebastian Ullrich (Feb 03 2025 at 14:05):
Christian Merten said:
Johan Commelin said:
Christian Merten Would you mind trying again, to double check?
I tried it again at #21364 sorrying out all proofs that break, which unfortunately the speed center does not like. But local
lake env perf stat lean Mathlib/RingTheory/Kaehler/JacobiZariski.leanis still at 736 * 10^9 instructions which is very close to the 727 * 10^9 instructions on master.
Let's focus on the slow example given above
import Mathlib.RingTheory.Kaehler.Polynomial
import Mathlib.RingTheory.Generators
open KaehlerDifferential TensorProduct MvPolynomial
namespace Algebra
universe w u v
variable {R : Type u} {S : Type v} [CommRing R] [CommRing S] [Algebra R S]
namespace Extension
variable (P : Extension.{w} R S)
/--
The cotangent space on `P = R[X]`.
This is isomorphic to `Sⁿ` with `n` being the number of variables of `P`.
-/
abbrev CotangentSpace : Type _ := S ⊗[P.Ring] Ω[P.Ring⁄R]
def CotangentSpace' : Type _ := S ⊗[P.Ring] Ω[P.Ring⁄R]
universe uT w'
variable {R : Type u} {S : Type v} [CommRing R] [CommRing S] [Algebra R S]
{T : Type uT} [CommRing T] [Algebra R T] [Algebra S T] [IsScalarTower R S T]
(Q : Generators.{w} S T) (P : Generators.{w'} R S)
set_option trace.profiler true
example :
Extension.CotangentSpace.{max u w w'} (Q.comp P).toExtension =
Extension.CotangentSpace.{max (max u w) w'} (Q.comp P).toExtension := rfl
example :
Extension.CotangentSpace'.{max u w w'} (Q.comp P).toExtension =
Extension.CotangentSpace'.{max (max u w) w'} (Q.comp P).toExtension := rfl
[Elab.command] [1.655671]
[Elab.command] [0.035150]
Matthew Ballard (Feb 03 2025 at 14:22):
NB: No provision of my statements shall be interpreted or enforced in a manner that alters, diminishes, or otherwise impacts my support for making that a def or structure.
Matthew Ballard (Feb 03 2025 at 15:11):
#lean4 > instantiation of level metavariables in structure instances is my precise question(s)
Kevin Buzzard (Jan 18 2026 at 13:26):
So this issue has come up again in #34088. See also . Let me summarise what's going on.
In the following code snippet (the first delaration of Mathlib/RingTheory/Kaehler/Polynomial.lean but twice) (which ironically just took down my PC making me lose the original version of this message)...
two almost identical declarations
we have exactly the same code twice, modulo a universe having a different name (alphabetically before or after u). These two declarations have very different profiling. Elaboration of the fast declaration is 4 times quicker:
[Elab.command] [23198470.000000] noncomputable def mvPolynomialEquiv_fast (σ : Type t) :
vs
[Elab.command] [107480532.000000] noncomputable def mvPolynomialEquiv_slow (σ : Type w) :
but, more importantly, typechecking of the fast declaration is 200 times quicker (not a typo), due to
[Elab.async] [222081.000000] Lean.addDecl ▼
[Kernel] [222033.000000] ✅️ typechecking declarations [KaehlerDifferential.mvPolynomialEquiv_fast._proof_2]
[Elab.async] [317580.000000] Lean.addDecl ▼
[Kernel] [317532.000000] ✅️ typechecking declarations [KaehlerDifferential.mvPolynomialEquiv_fast._proof_1]
vs
[Elab.async] [54494621.000000] Lean.addDecl ▼
[Kernel] [54494566.000000] ✅️ typechecking declarations [KaehlerDifferential.mvPolynomialEquiv_slow._proof_2]
[Elab.async] [57765580.000000] Lean.addDecl ▼
[Kernel] [57765519.000000] ✅️ typechecking declarations [KaehlerDifferential.mvPolynomialEquiv_slow._proof_1]
You can see the 200x speedup if you just watch the file compile: the orange bars for the fast declaration disappear and that's it, whereas the orange bars for the second declaration disappear four times more slowly but, crucially, after that, one orange bar remains on the def line for several seconds (this is the typechecking).
Taking everything apart, the issue seems to be the following. In the slow trace, with pp.all on, we have this:
a slow defeq
and these terms are essentially identical: there is one small change somewhere in the middle, but the two choices are defeq under with_reducible_and_instances, and then there is one other difference which is that one side has max v u five times and the other side has max u v instead. This is for some reason completely derailing both elaboration and typechecking: the large number 815425 of heartbeats here in elaboration is because Lean unfolds everything and the term is complex.
So that's the problem (and I'm running out of characters).
Kevin Buzzard (Jan 18 2026 at 13:43):
So what is the solution? The reason I am bringing this up again is that in the discussion last year there was a lot of talk about associativity of max, but here the issue seems to be commutativity of max. First let me say that mathematically it is important for us to have two different universes here. Of course the "hack" solution in mathlib is just to change the universe name for to something before u (it is currently u_1 which causes slowness; #34088 changes it to t and radar says this saves 117.3G instructions) and then all the problems in that declaration are solved, but people have objected that this is not ideal (and I agree) because who knows where else this is happening. So let me talk about other approaches.
In the discussion last year, Sebastian pointed out that making a declaration irreducible would solve this problem. Perhaps there is a solution along these lines here but I don't know it. For computer scientists who are baulking at the size of the pp.all terms above, this is nothing compared to some of of the stuff we have in this corner of mathlib.
Another solution is that core just changes their algorithms so that they're fast on the term above. Of course this is probably ridiculous (I have no understanding of core).
Another solution is that we somehow figure out why we have ended up with both max u v and max v u, and somehow try to ensure that this never actually happens. I don't really know how to start with this but if it just entails setting pp.universes true and going through everything carefully, I might be tempted to do this.
I'm aware that the code I've posted is not a #mwe as far as this channel is concerned because it's mathlib-dependent; the reason it's very convenient to be mathlib-dependent is that the slowdown is, I believe, caused by Lean unfolding everything, and a crucial part of the slowness is that there is an absolute ton of stuff which needs to be unfolded. Although now I've written this, I'm realising that actually my claim that "lean is unfolding everything" actually means "I have looked at elaboration traces and note that Lean is unfolding everything" -- I have no idea what is happening during typechecking.
Kevin Buzzard (Jan 18 2026 at 13:47):
Here is an explicit comparison of the defeq terms the kernel is wrestling with, firstly with max u v and max v u, and secondly with the max v u's all changed to max u v. With max u v and max v u the profiling is
[Elab.command] [1243934.000000]
[Kernel] [3600984.000000] ✅️ typechecking declarations [_example]
and when it's just max u v on both sides the profiling is
[Elab.command] [185100.000000]
[Kernel] [5141.000000] ✅️ typechecking declarations [_example]
so here we have a 700x speedup for typechecking when the kernel does not have to wrestle with the two different ways to express max u v.
two defeq checks
Kevin Buzzard (Jan 18 2026 at 15:39):
Wooah. Filling in an easy-to-fill-in-by-unification hole changes a max v u to a max u v and suddenly the slow declaration typechecks super-quickly. In the code above, in the slow declaration, change
__ := (MvPolynomial.mkDerivation _ (Finsupp.single · 1)).liftKaehlerDifferential
to
__ := (MvPolynomial.mkDerivation R (Finsupp.single · 1)).liftKaehlerDifferential
and boom typechecking is much much faster.
I've been trying to figure out the "cause" of how we end up with max u v and max v u in the same term and it seems that this plays a role. Note the difference in terms:
right_inv with R
⊢ Function.RightInverse.{(max u v) + 1, (max u v) + 1}
(⇑(Finsupp.linearCombination.{v, max v u, max u v} (MvPolynomial.{v, u} σ R) fun x =>
(D.{u, max v u} R (MvPolynomial.{v, u} σ R)) (MvPolynomial.X.{u, v} x)))
(AddHom.toFun.{max u v, max u v}
(LinearMap.toAddHom.{max u v, max u v, max u v, max u v}
(Derivation.liftKaehlerDifferential.{u, max u v, max u v}
(MvPolynomial.mkDerivation.{v, u, max u v} R fun x => Finsupp.single.{v, max u v} x 1))))
right_inv with _
⊢ Function.RightInverse.{(max u v) + 1, (max u v) + 1}
(⇑(Finsupp.linearCombination.{v, max v u, max u v} (MvPolynomial.{v, u} σ R) fun x =>
(D.{u, max v u} R (MvPolynomial.{v, u} σ R)) (MvPolynomial.X.{u, v} x)))
(AddHom.toFun.{max v u, max u v}
(LinearMap.toAddHom.{max v u, max v u, max v u, max u v}
(Derivation.liftKaehlerDifferential.{u, max v u, max u v}
(MvPolynomial.mkDerivation.{v, u, max u v} R fun x => Finsupp.single.{v, max u v} x 1))))
The AddHom.toFun universes are {max u v, max u v} when R is supplied explicitly, and {max v u, max u v} when it's inferred. In particular, it looks like it's going to be hard to work around this issue by making certain universes explicit (as I had initially hoped), because who knows if users are going to supply terms explicitly or not when they can easily be inferred by unification. All in all this makes for a confusing user experience (at least, I'm pretty confused by all of this). Furthermore, mathlib maintainers (reasonably) seem resistant to "hacks" which fix individual instances of the problem by abusing the fact that one can use alphabetical order of universe names as a workaround.
Kevin Buzzard (Jan 18 2026 at 16:04):
Supplying explicit arguments / universe alphabetical order phenomena. Maybe this is possible to minimise to be mathlib-free. However it's not entirely clear that this would help. In some sense it just shows that it's going to be hard for users to control universe orders and perhaps we need another approach (or we need to learn to live with these typechecking difficulties caused by the fact that max u v and max v u are both currently allowed).
The order of the universes in the first max is determined by whether a hole is filled in by the user or by unification. The order of the universes in the second max is determined by alphabetical order.
import Mathlib.RingTheory.Kaehler.Basic
import Mathlib.Algebra.MvPolynomial.PDeriv
import Mathlib.Algebra.Polynomial.Derivation
universe t u v
variable (R : Type u) [CommRing R]
set_option pp.universes true
variable (σ : Type v) in
#check ((MvPolynomial.mkDerivation _ (Finsupp.single · 1)).liftKaehlerDifferential.toFun :
Ω[MvPolynomial σ R⁄R] → σ →₀ MvPolynomial σ R)
-- AddHom.toFun.{max v u, max u v}
variable (σ : Type v) in
#check ((MvPolynomial.mkDerivation R (Finsupp.single · 1)).liftKaehlerDifferential.toFun :
Ω[MvPolynomial σ R⁄R] → σ →₀ MvPolynomial σ R)
-- AddHom.toFun.{max u v, max u v}
variable (σ : Type t) in
#check ((MvPolynomial.mkDerivation _ (Finsupp.single · 1)).liftKaehlerDifferential.toFun :
Ω[MvPolynomial σ R⁄R] → σ →₀ MvPolynomial σ R)
-- AddHom.toFun.{max t u, max t u}
variable (σ : Type t) in
#check ((MvPolynomial.mkDerivation R (Finsupp.single · 1)).liftKaehlerDifferential.toFun :
Ω[MvPolynomial σ R⁄R] → σ →₀ MvPolynomial σ R)
-- AddHom.toFun.{max u t, max t u}
Michael Stoll (Jan 18 2026 at 16:15):
(Disclaimer: I've been following this from the sidelines, so what I say should not count for much.)
It looks like there are two independent problems here:
- the ordering of the universe variables in a
max; - the slowness when the orderings differ in two terms that are being compared.
I would think that it is more important to get to the bottom of the second point, because that seems what causes the unexpected slow-downs in parts of Mathlib.
Kevin Buzzard (Jan 18 2026 at 16:20):
Well, I guess another approach would be to literally not allow one of max u v and max v u (and similarly to make a decision about associativity normalisation and only allow one normalisation).
Mario Carneiro (Jan 20 2026 at 01:04):
Here's a proper minimization @Kevin Buzzard :
import Lean
universe u v
structure X : Type u where x : Nat
opaque a (_ : Nat) : X.{u} := ⟨0⟩
opaque b (_ _ : X.{u}) : X.{u} := ⟨0⟩
elab "big%" : term =>
let rec foo : Nat → Nat → Lean.Expr
| 0, i => Lean.mkApp (Lean.mkConst ``a [.param `u]) (Lean.mkNatLit i)
| n+1, i => Lean.mkApp2 (Lean.mkConst ``b [.param `u]) (foo n (2*i)) (foo n (2*i+1))
pure (foo 12 0)
structure A where
opaque foo (_ : X.{u}) : A := ⟨⟩
structure B extends A
set_option maxRecDepth 100000 in
def bar : B := ⟨foo big%⟩
set_option trace.profiler.useHeartbeats true
set_option trace.profiler true
example : B.toA bar.{max u v} = B.toA bar.{max u v} := rfl
example : B.toA bar.{max v u} = B.toA bar.{max u v} := rfl
Mario Carneiro (Jan 20 2026 at 01:05):
I think the diagnosis is that universe equality should be tested (or more aggressively normalized) before unfolding projections and instances
Mario Carneiro (Jan 20 2026 at 01:08):
and here's a minimization of the original mathlib example:
import Mathlib.RingTheory.Kaehler.Basic
universe u v
variable (R : Type u) [inst : CommRing R] (M : Type (max u v)) [C : CommRing M] [A : Algebra R M]
set_option trace.profiler.useHeartbeats true
set_option trace.profiler true
example :
(@AddCommGroup.toAddCommMonoid.{max v u} (@KaehlerDifferential.{u, max v u} R M inst C A)
(@instAddCommGroupKaehlerDifferential.{max v u, u} R M inst C A)) =
(@AddCommGroup.toAddCommMonoid.{max v u} (@KaehlerDifferential.{u, max v u} R M inst C A)
(@instAddCommGroupKaehlerDifferential.{max u v, u} R M inst C A))
:= rfl
Kevin Buzzard (Jan 20 2026 at 09:06):
Mario Carneiro said:
Here's a proper minimization @Kevin Buzzard :
Thanks! Again both elaboration and typechecking struggle here. The bad elaboration trace also highlights something I've seen before: Lean struggles to do a calculation and then does essentially exactly the same calculation again (it's already unfolded everything, but does it a second time for good measure):
[definition.value] [3673736.000000] _example ▼
[step] [1845532.000000] expected type: bar.toA = bar.toA, term
rfl ▶
[Meta.isDefEq] [1827993.000000] ✅️ bar.toA = bar.toA =?= bar.toA = bar.toA ▶
Kevin Buzzard (Jan 22 2026 at 12:03):
Kim Morrison (Jan 27 2026 at 12:32):
Unfortunately Mario's minimization above doesn't fully reflect what's going on here, in the sense that there exist narrow fixes for that minimization that don't help with the real problem.
Kim Morrison (Jan 27 2026 at 12:32):
We can drastically speed up the minimization:
| Metric | Master | #12175 | #12186 |
|---|---|---|---|
max u v elab |
5,570 | 5,775 | 6,015 |
max v u elab |
3,677,625 | 3,710,609 | 7,373 |
max u v kernel |
97 | 97 | 98 |
max v u kernel |
114,901 | 129 | 114,902 |
Kim Morrison (Jan 27 2026 at 12:32):
But these PRs don't help Kahler/Polynomial.lean or JacobiZariski.lean (with or without #26008):
Measured CPU instructions on chonk (Linux) using perf stat.
Polynomial.lean (no workaround in place)
| Toolchain | Instructions |
|---|---|
| nightly-2026-01-27 | 159.78B |
| PR #12175 | 160.13B |
| PR #12186 | 159.79B |
| PR #12188 (kernel proj) | 158.95B |
| PR #12189 (elab proj) | 159.75B |
JacobiZariski.lean
| Workaround | Toolchain | Instructions |
|---|---|---|
| with #26008 | nightly | 212.93B |
| with #26008 | PR #12175 | 213.96B |
| with #26008 | PR #12186 | 212.99B |
| with #26008 | PR #12188 (kernel proj) | 212.23B |
| with #26008 | PR #12189 (elab proj) | 212.93B |
| reverted | nightly | 742.23B |
| reverted | PR #12175 | 743.40B |
| reverted | PR #12186 | 742.02B |
| reverted | PR #12188 (kernel proj) | 732.57B |
| reverted | PR #12189 (elab proj) | 741.97B |
Mathlib Conclusions
- Neither PR #12175 nor PR #12186 improves real mathlib file performance
- PR #12188 (kernel projection fix) shows a small improvement (~1.3% on reverted JacobiZariski: 732.6B vs 742B)
- PR #12189 (elaborator projection fix) shows no improvement
- The workaround in #26008 provides a 3.5x speedup (742B → 213B instructions)
- The slowdown pattern in JacobiZariski appears different from the synthetic #12102 test case
- Synthetic test: #12175 gives 890x kernel speedup, #12186 gives 499x elab speedup
Kevin Buzzard (Jan 27 2026 at 21:23):
Thanks so much for thinking about this!
How good is the minimisation tool now? Concrete question: are we yet at the point where we can have a mathlib-free docs#KaehlerDifferential.mvPolynomialEquiv ? If so, how long is the file? If not, how far can one get? For example can one get CommRing R -> CommRing (MvPolynomial σ R) in one run, Finsupp in another run, tensor products in a third, and then try and muddle along manually after that?
Kim Morrison (Jan 27 2026 at 21:50):
I don't think it makes sense to try to patch together different minimizations. (Or at least, I've never had success with this.)
Kim Morrison (Jan 27 2026 at 21:53):
If there is time and attention available, I think the most useful workflow is:
- Copy Kahler/Polynomial.lean into a clone of mathlib-minimizer as e.g. MathlibMinimizer/KahlerPolynomial.lean
- Add #guard_msg statements that ensure it is taking at least as many heartbeats as expected (how many is that?)
- Run
lake exe minimize MathlibMinimizer/KahlerPolynomial.lean - This will take a long time (possibly days) to run by itself.
- It is fine to interrupt it, and resume with
lake exe minimize MathlibMinimizer/KahlerPolynomial.lean --resume - In fact, if you interrupt it intentionally and have some time, you can help it along if you see further opportunities to minimize by hand. The same
--resumecommand will work! This will probably give better/faster results, but loses reproducibility.
Kim Morrison (Jan 27 2026 at 21:57):
Or rather, take the still-with-mathlib minimization Mario posted above at
Kim Morrison (Jan 27 2026 at 22:00):
In particular this is the human input required for the minimizer tool:
import Mathlib.RingTheory.Kaehler.Basic
universe u v
variable (R : Type u) [inst : CommRing R] (M : Type (max u v)) [C : CommRing M] [A : Algebra R M]
#guard_msgs in
set_option maxHeartbeats 50 in
example :
(@AddCommGroup.toAddCommMonoid.{max v u} (@KaehlerDifferential.{u, max v u} R M inst C A)
(@instAddCommGroupKaehlerDifferential.{max u v, u} R M inst C A)) =
(@AddCommGroup.toAddCommMonoid.{max v u} (@KaehlerDifferential.{u, max v u} R M inst C A)
(@instAddCommGroupKaehlerDifferential.{max u v, u} R M inst C A))
:= rfl
/--
error: (deterministic) timeout at `whnf`, maximum number of heartbeats (500) has been reached
Note: Use `set_option maxHeartbeats <num>` to set the limit.
Hint: Additional diagnostic information may be available using the `set_option diagnostics true` command.
-/
#guard_msgs in
set_option maxHeartbeats 500 in
example :
(@AddCommGroup.toAddCommMonoid.{max v u} (@KaehlerDifferential.{u, max v u} R M inst C A)
(@instAddCommGroupKaehlerDifferential.{max v u, u} R M inst C A)) =
(@AddCommGroup.toAddCommMonoid.{max v u} (@KaehlerDifferential.{u, max v u} R M inst C A)
(@instAddCommGroupKaehlerDifferential.{max u v, u} R M inst C A))
:= rfl
Kim Morrison (Jan 27 2026 at 22:03):
Here we:
- Put
#guard_msgson the "this should succeed" example: the minimizer leaves invariant everything appearing after the first#guard_msgsin the file. - There's then a "this should fail" example, with #guard_msgs enforcing the timeout.
- I've picked the maxHeartbeat numbers with a bit of slack: in fact the failing example fails with 800 heartbeats, but it's okay if it gets marginally faster as we reduce.
- But still the
#guard_msgsenforce that there is a 10x speedup by changing the universe variables.
Kim Morrison (Jan 27 2026 at 22:07):
I've started this now as lake exe minimize MathlibMinimizer/Kahler.lean.
Kim Morrison (Jan 27 2026 at 22:37):
(And just recording that none of lean#12175, lean#12186, lean#12188, lean#12189 help on the in-mathlib example.)
Jovan Gerbscheid (Jan 29 2026 at 16:50):
Here's what I think is the underlying issue. There are two different ways in which elaboration can get to the universe max u v/max v u:
- At some point, the types are partially known, so the level gets set to something like
max v ?u.4687. At a later point?u.4687gets set tou, which is then simply inlined directly. This order does not depend on the names of the levels - The level is fully determined at once, so the result is
max u v, whereuandvare sorted. This order does depend on the names.
These two ways fundamentally sort the levels differently. This also explains why specifying a seemingly trivial extra argument instead of a _ can change the ordering, because it might switch from (1) to (2).
I propose that the solution should be to normalize universe levels at the moment of metavariable instantiation, i.e. at the moment that max v ?u.4687 gets turned into max v u.
I have implemented a stupid/inefficient version of this in lean#12225. I changed instantiateMVars to normalize levels after instantiation (i.e. go through the whole twice, once for instantiating metavariables, and once for normalizing universes).
The result of this test is an overal increase by 3% in build instructions (because of the inefficient implementation), but two files became more than twice as fast:
Mathlib.AlgebraicGeometry.AffineSpace -154.5G -61.5%
Mathlib.RingTheory.Kaehler.Polynomial -120.2G -77.7%
Kevin Buzzard (Jan 29 2026 at 22:37):
Many thanks! So this is evidence that this is at least one of the issues slowing down algebraic geometry/commutative algebra.
In fact it's only these two files which had substantial gains.
Kim Morrison (Jan 29 2026 at 23:15):
@Jovan Gerbscheid, from this understanding do you think you can come up with a Mathlib-independent benchmark, which would be fixed by lean#12225 (but not fixed by my lean#12175 and #12186 above)?
This benchmark would be the sine qua non of landing a solution, whether lean#12225 or something else.
Kevin Buzzard (Jan 29 2026 at 23:35):
I've not had a chance to think about this today but it looks like the only major gains in mathlib with this experiment are in two files quite deep into mathlib. Hopefully tomorrow I'll have time to experiment with the minimiser.
Jovan Gerbscheid (Jan 30 2026 at 00:24):
I can try to make a mathlib independent benchmark, but I don't directly see how to do it.
The adaptation branch ran into a couple of linter errors that fired thanks to the improved normalization of universes. I have now back ported these in #34591
Yury G. Kudryashov (Jan 30 2026 at 00:33):
Some of the lemmas you're removing were meant to have different LHS and RHS.
Kevin Buzzard (Feb 01 2026 at 00:32):
Here's something I discovered when making what I thought were innocuous changes to the PR #34088.
variations on a mathlib definition
All three declarations look syntactically identical. The only differences are the universe of (explicit before v in the alphabet, explicit after v, and unspecified). However the time taken to elaborate each declaration in mHeartbeats goes from a 7 digit number (version 2) to an 8 digit number (version 3) to a 9 digit number (version 4).
The difference between version 2 and version 3 is https://github.com/leanprover/lean4/issues/12102 , the topic of this thread.
The new thing I'm pointing out is version 4, where a variable (σ) is given without giving any type at all. This is apparently a disaster for this particular declaration; even though universe alphabetical order is the "good" order like in version 2 (u_1 is before v), typechecking is as bad as version 3 and furthermore elaboration goes completely through the roof; it's the worst of the three. This was totally bewildering to me until I spotted the difference:
#print KaehlerDifferential.mvPolynomialBasis2 -- KaehlerDifferential.mvPolynomialBasis2.{u, v}
#print KaehlerDifferential.mvPolynomialBasis3 -- KaehlerDifferential.mvPolynomialBasis3.{v, w}
#print KaehlerDifferential.mvPolynomialBasis4 -- KaehlerDifferential.mvPolynomialBasis4.{v, u_1}
My guess is that the reason v4 is such a disaster is that when the universe ordering algorithm runs, the universe of isn't known so the universe of v4 is {v, ?} and then later on ? ends up being a universe created by a universe-creating algorithm which makes something of the form u_number so now the universes are out of alphabetical order which for some reason makes things even worse.
Mathlib master currently has v4 (but I'm working to change this). To make things even more confusing, "fixing" the declaration by supplying a universe name for and taking us to v2 (which I did in one version of #34088) makes the declaraction compile much much faster but makes another file in mathlib Mathlib.RingTheory.Extension.Cotangent.Basic compile much much slower because somehow the speediness of that file is relying on the universes being the costly way around in this declaration. See the "large changes" in https://github.com/leanprover-community/mathlib4/pull/34088#issuecomment-3829219909 , all caused by changing (σ) to (σ : Type u).
The tl;dr is that genuinely drastic speed changes (50x speedup for elaboration, 175x speedup for typechecking) in this corner of mathlib are caused by universe orderings which the end user has very little control over. The universe ordering algorithm sometimes runs on universes with metavariables such as max u_2 ?; whenever this happens this is a recipe for universes ending up not in alphabetical order which then triggers unnecessary unfolding of terms when applying defeq for structure eta in elaboration, and also triggers something which I cannot see but which is also taking a very long time in typechecking. Furthermore fixing things is nigh-on impossible because everything is really brittle.
Kevin Buzzard (Feb 02 2026 at 22:53):
OK I give up trying to hack my way around this. See for example https://github.com/leanprover-community/mathlib4/pull/34088#issuecomment-3830005136 where I make the declaration above compile as quickly as possible by doing an alphabetical hack in one file; this causes a huge slowdown in a second file which I attempt to fix with a second hack, changing a universe from v to t in this second file, and then I get random slowdowns across commutative algebra which total 1.7T-instructions (that's tera-instructions). Just for a 1-character change. Note in particular that right now it is the case that changing a Type v to a Type t in Mathlib/RingTheory/Extension/Cotangent/Basic.lean causes this in everyone's favourite file:
🟥 build/module/Mathlib.RingTheory.Kaehler.JacobiZariski//instructions: +634.5G (+270.79%)
There could easily be a 1-character change in mathlib which gives a 1.7T speed-up! But the best next step would I guess to be to find a mathlib-free minimisation.
tl;dr: I believe that one key reason Mathlib.RingTheory.Kaehler.JacobiZariski and related files are so slow is lean4#12102 .
Jovan Gerbscheid (Feb 03 2026 at 01:32):
I think I have found a minimization. In the following minimal example, the names of the universes determine whether rfl is immediate, or has to traverse the whole term.
module
abbrev N := 1000
class A (α : Type u) where
a : α → α
class B (α : Type u) extends A α
class C (α : Type u) extends A α
class X (α : Type u) (n : Nat) where
a : α → α
variable {α : Type u}
instance [X α N] : B α where a := X.a N
instance [X α N] : C α where a := X.a N
instance {n : Nat} [X α n] : X α (n + 1) where a := X.a n
instance : X α 0 where a x := x
set_option synthInstance.maxHeartbeats 1000000
set_option synthInstance.maxSize 10000
set_option maxRecDepth 10000
set_option trace.profiler.threshold 0
-- replace `Type t` with `Type v` to make the `rfl` proof 50x faster
example (α : Type u) (β : Type v) :
let inst' : B (α × (_ : Type _)) := inferInstance;
let inst : C (α × β) := inferInstance;
inst.a = inst'.a :=
set_option trace.profiler true in
rfl
Kim Morrison (Feb 04 2026 at 00:24):
@Jovan Gerbscheid, could you write something using #guard_msgs rather than the profiler?
Jovan Gerbscheid (Feb 05 2026 at 13:33):
Here is a version that throws an error (maximum recursion depth) or not depending on the universe level names. So it can serve as a test to check that both versions (Type t and Type v) succeed.
module
abbrev N := 1000
class A (α : Type u) where
a : α → α
class B (α : Type u) extends A α
class C (α : Type u) extends A α
class X (α : Type u) (n : Nat) where
a : α → α
variable {α : Type u}
instance [X α N] : B α where a := X.a N
instance [X α N] : C α where a := X.a N
instance {n : Nat} [X α n] : X α (n + 1) where a := X.a n
instance : X α 0 where a x := x
set_option synthInstance.maxHeartbeats 1000000
set_option synthInstance.maxSize 10000
example (α : Type u) (β : Type t) :
let inst' : B (α × (_ : Type _)) := inferInstance;
let inst : C (α × β) := inferInstance;
inst.a = inst'.a :=
rfl
Sébastien Gouëzel (Feb 05 2026 at 13:34):
Can you write it with #guard_msgs?
Jovan Gerbscheid (Feb 05 2026 at 13:35):
Here you go:
module
abbrev N := 1000
class A (α : Type u) where
a : α → α
class B (α : Type u) extends A α
class C (α : Type u) extends A α
class X (α : Type u) (n : Nat) where
a : α → α
variable {α : Type u}
instance [X α N] : B α where a := X.a N
instance [X α N] : C α where a := X.a N
instance {n : Nat} [X α n] : X α (n + 1) where a := X.a n
instance : X α 0 where a x := x
set_option synthInstance.maxHeartbeats 1000000
set_option synthInstance.maxSize 10000
/--
error: maximum recursion depth has been reached
use `set_option maxRecDepth <num>` to increase limit
use `set_option diagnostics true` to get diagnostic information
-/
#guard_msgs in
example (α : Type u) (β : Type t) :
let inst' : B (α × (_ : Type _)) := inferInstance;
let inst : C (α × β) := inferInstance;
inst.a = inst'.a :=
rfl
Kim Morrison (Feb 06 2026 at 00:40):
Best to use #guard_msgs for both the postive and negative cases to clearly show current behaviour:
module
abbrev N := 1000
class A (α : Type u) where
a : α → α
class B (α : Type u) extends A α
class C (α : Type u) extends A α
class X (α : Type u) (n : Nat) where
a : α → α
variable {α : Type u}
instance [X α N] : B α where a := X.a N
instance [X α N] : C α where a := X.a N
instance {n : Nat} [X α n] : X α (n + 1) where a := X.a n
instance : X α 0 where a x := x
set_option synthInstance.maxHeartbeats 1000000
set_option synthInstance.maxSize 10000
#guard_msgs in
example (α : Type u) (β : Type v) :
let inst' : B (α × (_ : Type _)) := inferInstance;
let inst : C (α × β) := inferInstance;
inst.a = inst'.a :=
rfl
/--
error: maximum recursion depth has been reached
use `set_option maxRecDepth <num>` to increase limit
use `set_option diagnostics true` to get diagnostic information
-/
#guard_msgs in
example (α : Type u) (β : Type t) :
let inst' : B (α × (_ : Type _)) := inferInstance;
let inst : C (α × β) := inferInstance;
inst.a = inst'.a :=
rfl
Last updated: Feb 28 2026 at 14:05 UTC