Zulip Chat Archive
Stream: mathlib4
Topic: Benchmark results
Kevin Buzzard (Nov 17 2025 at 10:36):
Tucked away in #rss is a topic #rss > Benchmark results for mathlib4 . As far as I can see it is mostly obsessives such as myself who are reading that topic. It used to be very noisy (there was a lot of false postivesi) but more recently it seems to have stabilised. It is still noisy; many of the posts there are things of the form "this PR that added a bunch of results in file X making it 20% larger seems to have made the file X take 20% longer to compile".
However there are posts which are not of this form. For example this post was noticed by me obsessively reading the topic, resulting in this question which led to #30734 which led to a nontrivial speedup of mathlib.
I just caught up on reading this topic today, and here are a couple of posts which caught my eye:
1) this one reports that #30130 (adding results about free groups) sped up Mathlib.RingTheory.Kaehler.Polynomial (a problematic file) by 7% and I have no coherent explanation for this. Is it real?
2) And this one reports that #27270 (turning WithLp into a structure, a change which was never benchmarked) had a catastophic effect on various number theory files and in fact had a gigantic effect on mathlib overall, making some files 15% faster and others 100% slower. Here is the speedcenter output.
My impression is that nobody is really looking at these benchmark results but that they're telling a very interesting story. Should the community somehow be paying more attention to them?
Damiano Testa (Nov 17 2025 at 10:53):
I had stopped looking at the benchmark results, since I found them hard to interpret and would not trust them. I realise that they may be more reliable than I think, but I still find it hard to sift out the fluff from the interesting parts.
With respect to #30130, the only change that may reasonably have non local effects seems to be the weakening of some typeclass assumptions from MulAction to SMul in the definition of orbit. I do not know if/how this ripples through to Mathlib.RingTheory.Kaehler.Polynomial. The PR initially had to add some hints to avoid a slowdown (see https://github.com/leanprover-community/mathlib4/pull/30130#discussion_r2450501580). However, in the course of reviewing/modifying the file, this initial slowdown disappeared. I wonder if this has something to do with what you are observing.
Jovan Gerbscheid (Nov 17 2025 at 11:16):
The way you can investigate such a diff, is by having the original and modified mathlib running side-by-side. Then use set_option trace.profiler true and set_option trace.profiler.useHeartbeats true. Then you should find a declaration where the heartbeats differ. If you click through this trace on both versions side-by-side, you can eventually pinpoint which exact operation is slower on one version compared to the other version.
Kevin Buzzard (Nov 17 2025 at 11:22):
Damiano Testa said:
I had stopped looking at the benchmark results, since I found them hard to interpret and would not trust them.
The crazy oscillation phenomenon in the thread (random PR made Mathlib.Logic.Defs faster by 6%, and the next PR then made it slower by 6%) is now fixed, and the results are far more meaningful.
Joscha Mennicken (Nov 17 2025 at 18:37):
Hopefully the radar significance detection is a bit less noisy than the speedcenter one. If you notice any commits that should/shouldn't be marked as significant but aren't/are (or anything else wrong for that matter), please let me know. See also for more information on radar.
Kevin Buzzard (Nov 17 2025 at 19:00):
Oh so is that what's powering the thread?
Joscha Mennicken (Nov 17 2025 at 19:08):
#rss > Benchmark results for mathlib4 is run by a bot that uses the current speedcenter, not radar. Radar will get its own thread for now.
Jireh Loreaux (Nov 17 2025 at 20:19):
In hindsight I should not have merged #27270 without benchmarking, but I didn't because it honestly didn't occur to me that improving the abstraction boundary would make things worse. The changes to the Lp files themselves isn't too surprising, as there was some material added there, but I don't have a good explanation for the slowdowns in NumberField or CStarMatrix
Matthew Ballard (Nov 17 2025 at 20:24):
I think any slowdown when more clearly delineating abstraction boundaries is more likely to be a symptom of some other problem.
There needs to be better tooling for extracting information on unification and instance synthesis changes between two commits. It is way too manual currently.
Kevin Buzzard (Nov 17 2025 at 21:27):
I think you're probably right that "it's not WithLp's fault". I have an hour to kill, I'll have a look.
Kevin Buzzard (Nov 17 2025 at 22:08):
The traces are almost entirely the same! Here's a striking difference though:
Before:
[Meta.synthInstance] [401489.000000] ✅️ NormedSpace ℝ (mixedSpace K) ▶
After:
[Meta.synthInstance] [9177522.000000] ✅️ NormedSpace ℝ (mixedSpace K) ▶
Kevin Buzzard (Nov 17 2025 at 22:10):
with the culprit being
[] [4379715.000000] ❌️ apply PiLp.normedSpaceSeminormedAddCommGroupToPi to NormedSpace ℝ ({ w // w.IsReal } → ℝ) ▶
Jireh Loreaux (Nov 17 2025 at 22:11):
I think these should be abbrevs.
Kevin Buzzard (Nov 17 2025 at 22:12):
mixedSpace is already an abbrev:
abbrev mixedSpace :=
({w : InfinitePlace K // IsReal w} → ℝ) × ({w : InfinitePlace K // IsComplex w} → ℂ)
Jireh Loreaux (Nov 17 2025 at 22:12):
No, I mean the Lp stuff is wrong
Jireh Loreaux (Nov 17 2025 at 22:13):
docs#PiLp.normedSpaceSeminormedAddCommGroupToPi is an instance and it should be an abbrev.
Jireh Loreaux (Nov 17 2025 at 22:14):
I think I didn't catch it during review because the underlying norm structure docs#seminormedAddCommGroupToPi is an abbrev already.
Jireh Loreaux (Nov 17 2025 at 22:14):
So this should be a relatively easy fix. I'll make a PR when I get home.
Jireh Loreaux (Nov 17 2025 at 22:14):
Thanks for sleuthing!
Kevin Buzzard (Nov 17 2025 at 22:24):
No problem -- I like to help out! Here's a self-contained repro:
import Mathlib.NumberTheory.NumberField.CanonicalEmbedding.ConvexBody
variable (K : Type*) [Field K] [NumberField K]
set_option trace.profiler true in
set_option trace.profiler.useHeartbeats true in
open scoped Classical in
#synth NormedSpace ℝ (NumberField.mixedEmbedding.mixedSpace K)
--- 222842 before #27270, 9007374 after.
Kevin Buzzard (Nov 17 2025 at 22:27):
The point of this thread wasn't really supposed to be "oops did we just break something", it was rather supposed to be "can we think of a way to ensure breakage is spotted earlier". I was very surprised about this, not least because WithLp was not even mentioned in the number theory files which were suddenly slower. The docs#NumberField.mixedEmbedding.mixedSpace construction is key to understanding the arithmetic of (or more precisely the integers and units of) a number field, but it's just a product of finitely many copies of and .
Jireh Loreaux (Nov 17 2025 at 22:29):
aha! And actually some of the lemmas in these are actually secretly about the wrong norm too because there was no attribute [-instance] around for the existing norm. So the WithLp PR really did break stuff. We just didn't find it yet because those new declarations aren't used widely enough at this point.
Jireh Loreaux (Nov 17 2025 at 22:31):
I think the solution will be to use the new radar benchmarking tool (to Kevin's question). With better data analysis, we should be able to spot random spikes more easily.
Yaël Dillies (Nov 17 2025 at 22:38):
Was the error actually introduced by the WithLp refactor? (on phone, can't check myself)
Riccardo Brasca (Nov 17 2025 at 22:45):
Jireh Loreaux said:
aha! And actually some of the lemmas in these are actually secretly about the wrong norm too because there was no
attribute [-instance]around for the existing norm. So theWithLpPR really did break stuff. We just didn't find it yet because those new declarations aren't used widely enough at this point.
Really? I see that in the Lp stuff there is Fact (1 ≤ p): how this has been found in the number theory file?
Kevin Buzzard (Nov 17 2025 at 22:51):
As for the other example (where a random file randomly got quicker to the extent that it was reported), @Jovan Gerbscheid 's suggestion (trace profiler with heartbeats) gives this enigmatic result:
[Elab.async] [50222322.000000] Lean.addDecl ▼
[Kernel] [50222250.000000] ✅️ typechecking declarations [KaehlerDifferential.mvPolynomialEquiv._proof_7]
[Elab.async] [40944804.000000] Lean.addDecl ▼
[Kernel] [40944732.000000] ✅️ typechecking declarations [KaehlerDifferential.mvPolynomialEquiv._proof_8]
[Elab.async] [3087686.000000] Lean.addDecl ▼
[Kernel] [3087588.000000] ✅️ typechecking declarations [KaehlerDifferential.mvPolynomialEquiv]
became
[Elab.async] [41912857.000000] Lean.addDecl ▼
[Kernel] [41912785.000000] ✅️ typechecking declarations [KaehlerDifferential.mvPolynomialEquiv._proof_7]
[Elab.async] [34854634.000000] Lean.addDecl ▼
[Kernel] [34854562.000000] ✅️ typechecking declarations [KaehlerDifferential.mvPolynomialEquiv._proof_8]
[Elab.async] [2802898.000000] Lean.addDecl ▼
[Kernel] [2802800.000000] ✅️ typechecking declarations [KaehlerDifferential.mvPolynomialEquiv]
so typechecking some random things that I'm not even sure what they are became 10-20% faster. This is the first declaration docs#KaehlerDifferential.mvPolynomialEquiv in the file Mathlib/RingTheory/Kaehler/Polynomial.lean. This declaration takes a long time to compile (before and after the change). Note that you do not even see this with set_option trace.profiler true in in front of the definition, because it seems that the Lean.addDecl times for these _proof_7 things are not reported if you try that. It was only because I used set_option trace.profiler true that I could see them. The time taken to elaborate KaehlerDifferential.mvPolynomialEquiv itself is roughly unchanged (in fact after the PR it is consistently slightly slower, at 133448340 vs 133304053 beforehand).
As far as I can see, those are the only differences in the files.
Jireh Loreaux (Nov 18 2025 at 00:14):
Yaël Dillies said:
Was the error actually introduced by the
WithLprefactor? (on phone, can't check myself)
Yes, it's in the new declarations which allow for transferring the pi/prod normed structure to a type synonym.
Jireh Loreaux (Nov 18 2025 at 00:16):
Riccardo Brasca said:
Really? I see that in the Lp stuff there is
Fact (1 ≤ p): how this has been found in the number theory file?
Some declarations were marked as instance when they shouldn't have been causing Lean to perform much more unification in some number theory files, which caused the slowdown noticed by Kevin.
Joscha Mennicken (Nov 18 2025 at 17:17):
Joscha Mennicken said:
#rss > Benchmark results for mathlib4 is run by a bot that uses the current speedcenter, not radar. Radar will get its own thread for now.
The first results are in: (the formatting is a bit verbose though)
Jireh Loreaux (Nov 19 2025 at 18:56):
Jireh Loreaux said:
So this should be a relatively easy fix. I'll make a PR when I get home.
when I get home = now. #31819. There may be some breakage I have to fix from the situations that used these synonyms. We'll see.
Moritz Doll (Nov 20 2025 at 05:17):
#31819 looks pretty good :racecar:
Jireh Loreaux (Nov 20 2025 at 17:44):
yes, unfortunately I think it's only undoing previous damage, not improving the situation from before the WithLp refactor.
Last updated: Dec 20 2025 at 21:32 UTC