Zulip Chat Archive

Stream: mathlib4

Topic: mathlib4 speedcenter


Sebastian Ullrich (Mar 15 2023 at 13:12):

Mario Carneiro said:

ooh, this is really useful data. Can the port benchmark bot collect it?

Some of you may already know http://speedcenter.informatik.kit.edu/velcom, which we use to track performance regressions in the lean4 repo. I've now set up a mathlib4 version at http://speedcenter.informatik.kit.edu/mathlib4. It compiles every new commit on master once; we'll see how long that will scale. For now I've retroactively benchmarked a few old commits to get a sense of the evolution of build times. Unsurprisingly, lines go up, though you can also see the positive impact of lean4#2003 for which I specifically selected the adjacent commits. If there are any other interesting commits for which you want to see the specific deltas, I can add those to the queue as well. image.png

http://speedcenter.informatik.kit.edu/mathlib4/repo-detail/e7b27246-a3e6-496a-b552-ff4b45c7236e?zoomXStart=1667257200000&zoomXEnd=1678921200000&dimensions=build%3Aelaboration%3Awall-clock%3Ainterpretation%3A.olean%20serialization%3Acompilation%3Atypeclass%20inference%3Adsimp%3Ainitialization%3Aimport%3AC%20code%20generation%3Acompilation%20new%3Anorm_num%3Aring%3Aparsing%3Alinting%3Atask-clock%3Asimp&dayEquidistant=true

Sebastian Ullrich (Mar 15 2023 at 13:13):

Finally, as each commit is benchmarked exactly once, a bit of noise is to be expected

Eric Wieser (Mar 15 2023 at 13:17):

It might be interesting to have another version which scales the y axis to account for number of files / lines

Eric Wieser (Mar 15 2023 at 13:19):

Adding every commit that does a lean bump and the parent before it (as determined by the lake file) might be a natural set of old commits to add, if that's not already what you did.

Sebastian Ullrich (Mar 15 2023 at 13:20):

Eric Wieser said:

It might be interesting to have another version which scales the y axis to account for number of files / lines

Yes, but there is no such functionality right now; the whole software is held together more or less by duct tape ever since the student project it originated from finished. We could also add either of these as a separate metric, but since there is only one y axis, only metrics of the same order of magnitude are sensibly displayable in the same graph

Sebastian Ullrich (Mar 15 2023 at 13:23):

Eric Wieser said:

Adding every commit that does a lean bump and the parent before it (as determined by the lake file) might be a natural set of old commits to add, if that's not already what you did.

Ah right, I searched and added a few old "bump [...]" commits to the queue, but I should add the preceding commits as well

Gabriel Ebner (Mar 15 2023 at 17:42):

Is there any chance to get per-file measurements? Could we just add every file as a metric?

Sebastian Ullrich (Mar 15 2023 at 18:04):

I could see the software struggling with that many metrics even when not displaying all of them in the graph, it definitely isn't built for that. What kind of reports are you thinking of for this? Commit-pair diffs sorted by relative delta are the one that comes to mind that should already be supported.

Sebastian Ullrich (Mar 15 2023 at 18:06):

I should probably mention that the server software is written in Java. I'm really trying not to touch it.

Gabriel Ebner (Mar 15 2023 at 18:18):

What kind of reports are you thinking of for this?

What I'm always afraid of with big changes is failing to notice local regressions. Like you make a change to mathlib that improves performance by 10%, but the same change makes one file take twice as long. (And it doesn't show up in the aggregate because it's just one file.)

So ideally the report I'd like would be: this commit made (unrelated) file XYZ 30% slower, you probably did something stupid.

Sebastian Ullrich (Mar 15 2023 at 20:28):

I suppose the low tech solution would be to name the metrics z $module, then they would be listed below the current metrics in the standard view, but you could sort by "Change %" to get the desired result

Sebastian Ullrich (Mar 15 2023 at 20:37):

Then the remaining questions I see are:

  • How exactly do we measure file times while reusing the current lake build -v invocation, as building twice would be too expensive? Perhaps we can inject a lean wrapper script into Lake's PATH that creates the appropriate output by wrapping the real lean in time (a sketch follows after this list). That would also obviate the need for injecting --profile into the lakefile like I do right now. The current setup for this is all in https://github.com/kha/mathlib4-bench btw; it might make sense to move some of this into the mathlib4 repo.
  • Will the UI really cope? I guess we'll just have to see what happens, though maybe after a database backup.
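
A minimal sketch of such a wrapper, assuming GNU time is available and that elan manages the toolchain (so the real binary can be resolved without recursing into the wrapper itself):

    #!/usr/bin/env bash
    # Hypothetical `lean` wrapper, placed in a directory that precedes the
    # real lean on Lake's PATH. It forwards all arguments unchanged and
    # appends one line per invocation (command, wall-clock seconds, maxrss
    # in KB) to a log file.
    REAL_LEAN="$(elan which lean)"   # resolves via the toolchain, not via PATH
    exec \time -f '%C %es %MKB' -o /tmp/lean-times.log -a "$REAL_LEAN" "$@"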

Sebastian Ullrich (Mar 15 2023 at 20:37):

By the way, we can also add other kinds of benchmarks such as the library_search test file (/cc @Scott Morrison). Preferably ones measured in seconds, not tens of minutes.

Sebastian Ullrich (Mar 16 2023 at 17:44):

Against all better judgment, I spent the day with frontend engineering and database wrangling. At least all these changes will be equally valuable to the lean4 instance.

  • New metrics "size .lean [lines]" and "size .olean [bytes]". The lines metric has been backfilled for all previously benchmarked commits.
  • The graph display will now use a second y-axis on the right-hand side for any metric not measured in seconds, which makes it possible to compare build times with lines, instructions, and maxrss. The latter is particularly interesting: apparently memory usage jumped from 2GB to 4GB at the beginning of February, which must be due to a single dominating file image.png. Lines, on the other hand, scale pretty much 1:1 with CPU time so far. image.png
  • New button for normalized charts. Adding a selector for the metric to use for normalization is too complicated for me, so instead the metric with the highest value is used for each individual commit, which works well with e.g. normalizing by lines as long as we, hopefully, never exceed 1 second per line image.png.

Gabriel Ebner (Mar 16 2023 at 17:57):

metric with the highest value is used for each individual commit

I think maxrss is the biggest metric.

Sebastian Ullrich (Mar 16 2023 at 18:00):

Oh yes, it's the highest selected metric!

Gabriel Ebner (Mar 16 2023 at 18:01):

Yes I just found that out when plotting the normalized memory usage. :smile:

Sebastian Ullrich (Mar 16 2023 at 21:16):

I might give the per-file timings a try tomorrow. @Gabriel Ebner What metrics do you think we should include per file? I'm considering measuring only instructions in order to keep the list short and to make sure we're not drowned in regression reports purely from noise

Gabriel Ebner (Mar 16 2023 at 21:19):

That sounds good!

Gabriel Ebner (Mar 16 2023 at 21:21):

Another thing that would be cool to have is !bench on mathlib PRs.

Sebastian Ullrich (Mar 16 2023 at 21:23):

Like this? :) https://github.com/leanprover-community/mathlib4/pull/2906#issuecomment-1470105871

Sebastian Ullrich (Mar 16 2023 at 22:02):

One caveat is that the delta is computed relative to the PR base commit, but the PR benchmark run is placed at the head of the queue. So one should either base the PR on an already-benchmarked commit or wait to run !bench until the base commit is at least being benchmarked. Otherwise no delta is posted in the comment.

Sebastian Ullrich (Mar 17 2023 at 15:34):

Sebastian Ullrich said:

apparently memory usage jumped from 2GB to 4GB at the beginning of February, which must be due to a single dominating file

I bisected this down to !4#1744 in the speedcenter, local Lean agrees

$ lake env \time -v lean Mathlib/Data/Sign.lean
    Command being timed: "lean Mathlib/Data/Sign.lean"
    User time (seconds): 32.12
    System time (seconds): 0.77
    Percent of CPU this job got: 99%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:32.90
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 4131452  <- !!
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 864188
    Voluntary context switches: 5
    Involuntary context switches: 169
    Swaps: 0
    File system inputs: 0
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0

Mario Carneiro (Mar 17 2023 at 15:48):

more precisely, it seems to be the instance declaration:

set_option maxHeartbeats 0
-- Porting note: This takes too long, likely fixed by lean4#2003
/- We can define a `Field` instance on `SignType`, but it's not mathematically sensible,
so we only define the `CommGroupWithZero`. -/
instance : CommGroupWithZero SignType where
  zero := 0
  one := 1
  mul := (· * ·)
  inv := id
  mul_zero a := by cases a <;> rfl
  zero_mul a := by cases a <;> rfl
  mul_one a := by cases a <;> rfl
  one_mul a := by cases a <;> rfl
  mul_inv_cancel a ha := by cases a <;> trivial
  mul_comm a b := by cases a <;> cases b <;> rfl
  mul_assoc a b c := by cases a <;> cases b <;> cases c <;> rfl
  exists_pair_ne := ⟨0, 1, by rintro ⟨_⟩⟩
  inv_zero := rfl

(there are two other instances which are also slow but not as bad as this one)

Mario Carneiro (Mar 17 2023 at 15:49):

the set_option maxHeartbeats 0 is very smelly

Mario Carneiro (Mar 17 2023 at 15:49):

and the porting note is over-optimistic

Sebastian Ullrich (Mar 17 2023 at 15:52):

Yes, please use --profile before blaming things on random Lean issues :) (though it's of course possible that the bottleneck shifted after that issue was resolved)

compilation of SignType.instCommGroupWithZeroSignType took 27.3s

This should be fixed with the new compiler, do people mark such declarations as noncomputable in the meantime?

    compilation 27.4s
    compilation new 42.9ms

Mario Carneiro (Mar 17 2023 at 16:15):

There is no need for that, further bisection reveals the issue is caused by the big case splits in mul_comm and mul_assoc, which should not be relevant for the compiler in the first place since they are proofs. !4#2957

Sebastian Ullrich (Mar 17 2023 at 16:17):

Great, basically what the new compiler does automatically!

Kyle Miller (Mar 17 2023 at 16:21):

This is decidably faster:

instance : CommGroupWithZero SignType where
  zero := 0
  one := 1
  mul := (· * ·)
  inv := id
  mul_zero := by decide
  zero_mul := by decide
  mul_one := by decide
  one_mul := by decide
  mul_inv_cancel := by decide
  mul_comm := by decide
  mul_assoc := by decide
  exists_pair_ne := ⟨0, 1, by rintro ⟨_⟩⟩
  inv_zero := rfl

Kyle Miller (Mar 17 2023 at 16:26):

This version: tactic execution took 199ms, typeclass inference took 167ms

Previous version: compilation of SignType.instCommGroupWithZeroSignType took 34.7s

(I used set_option profiler true. I don't really comprehend what it's reporting, but I see <1s and >30s)

Mario Carneiro (Mar 17 2023 at 16:33):

I tried decide but it didn't work, I think the instance wasn't in scope?

Mario Carneiro (Mar 17 2023 at 16:34):

I guess it's just for the <= theorems that decide fails

Sebastian Ullrich (Mar 17 2023 at 17:47):

Sebastian Ullrich said:

I might give the per-file timings a try tomorrow

First results are in: http://speedcenter.informatik.kit.edu/mathlib4/run-detail/e7b27246-a3e6-496a-b552-ff4b45c7236e/50ba914cce63cd43429baf8cf1459ae5c2a03efa. Looks good, the only detected significant change is in a file the PR extended. And so far the software seems to cope with the onslaught of data.

Sebastian Ullrich (Mar 17 2023 at 17:48):

Interestingly it does not agree everywhere on whether "~" should come before or after the other metrics image.png

Kevin Buzzard (Mar 17 2023 at 19:16):

Sebastian thanks so much for putting in the effort to make this. This really seems to be a game changer in places.

Sebastian Ullrich (Mar 19 2023 at 10:08):

I re-benchmarked the last two lean4 bumps to get per-file deltas:
http://speedcenter.informatik.kit.edu/mathlib4/run-detail/e7b27246-a3e6-496a-b552-ff4b45c7236e/5bbbc25c541921a19dea0844c9370c4026d03141
http://speedcenter.informatik.kit.edu/mathlib4/run-detail/e7b27246-a3e6-496a-b552-ff4b45c7236e/9cbf797aa91c0a342c8ea232557937ac02adc2b5
Apparently both were almost regression-free!

Sebastian Ullrich (Mar 19 2023 at 10:09):

The "significant difference chips" at the top do kind of lose their meaning when a majority of measures has significant changes...

Gabriel Ebner (Mar 22 2023 at 22:09):

@Sebastian Ullrich Do you have any idea why this commit has such a huge impact? http://speedcenter.informatik.kit.edu/mathlib4/run-detail/2d483167-10ef-423a-9a05-76d18847a65e

Sebastian Ullrich (Mar 22 2023 at 22:12):

@Gabriel Ebner I ran it again (see very end), the changes vanished. I have no idea why this could happen, maybe the CPU was feeling cold?

Mauricio Collares (Mar 23 2023 at 04:40):

What's the base commit for the first comparison? The PR is "old", maybe it got benchmarked against a pre lean4#2151 base build for some reason.

Sebastian Ullrich (Mar 23 2023 at 08:25):

It's not a PR run, the base is the previous commit on master

Sebastian Ullrich (Apr 01 2023 at 08:34):

Another likely compiler edge case in !4#2977 http://speedcenter.informatik.kit.edu/mathlib4/run-detail/e7b27246-a3e6-496a-b552-ff4b45c7236e/fa99b8007c4294b295a0dbcf22859ca5a145e95b

Patrick Massot (Apr 01 2023 at 08:45):

This speedcenter seems really really useful (assuming it is at least roughly reliable).

Ruben Van de Velde (Apr 01 2023 at 08:51):

I think I looked at that PR and hit timeouts I couldn't make progress on, so this seems plausible

Eric Wieser (Apr 01 2023 at 08:52):

Getting that PR over the finish line was a nightmare, so that doesn't surprise me at all

Sebastian Ullrich (Apr 01 2023 at 11:59):

Patrick Massot said:

This speedcenter seems really really useful (assuming it is at least roughly reliable).

Here is the delta for a trivial commit, giving an example of the involved noise: http://speedcenter.informatik.kit.edu/mathlib4/run-detail/95b40a5f-b1c0-4458-b97f-4acf81384714. Time measurements may fluctuate by one or two percent, or more if they are only a small part of the total time (which fortunately means that we are not particularly interested in them). Instructions is the most robust time-like measurement, which is why it's the only metric we use for individual files.

Sebastian Ullrich (Apr 03 2023 at 20:19):

In the porting meeting, @Mario Carneiro asked about where the sources for the speedcenter software and setup are

  • https://github.com/IPDSnelting/velcom is the server software as well as the runner that takes care of accepting benchmark requests, checking out the commit, and running the benchmark script
  • https://github.com/Kha/mathlib4-bench is the current benchmark script, which is where the actual metrics are encoded. I mentioned that I intend to move parts of this to mathlib4, as is already the case for the lean4 setup
  • https://github.com/parttimenerd/temci is used by the script to do the actual benchmarking, mostly because I'm not aware of other benchmarking tools that take a sufficiently expressive benchmarking suite config

If I mentioned duct tape above, here's more

Sebastian Ullrich (Apr 03 2023 at 20:49):

Also, both the server and the runner are KIT machines, which will be a bit suboptimal when I leave there in a little over a month

Sebastian Ullrich (May 16 2023 at 14:33):

Wall-clock build time vs lines of code
image.png
http://speedcenter.informatik.kit.edu/mathlib4/repo-detail/e7b27246-a3e6-496a-b552-ff4b45c7236e?zoomXStart=1667257200000&zoomXEnd=1684274400000&dimensions=size%3A.lean%3A%3Abuild%3Awall-clock&dayEquidistant=true

Kevin Buzzard (May 16 2023 at 17:31):

Wait, we already have over 800000 lines of mathlib4 code? I think Lean 4 must somehow be more verbose than Lean 3. We only have 2/3 of mathlib, which is something like 1.1M LOC right now.

Heather Macbeth (May 16 2023 at 17:39):

I think a lot of it is the #aligns

Yaël Dillies (May 16 2023 at 17:46):

A bunch of

lemma trivial_lemma1 : short_statement1 := easy_proof1
lemma trivial_lemma2 : short_statement2 := easy_proof2

sadly got turned into

theorem trivial_lemma1 :
    short_statement1 :=
  easy_proof1
#align trivial_lemma1 trivial_lemma1

theorem trivial_lemma2 :
    short_statement2 :=
  easy_proof2
#align trivial_lemma2 trivial_lemma2

which is five times more verbose!

Jeremy Tan (May 17 2023 at 00:14):

Is it just me or did mathlib4 become slower than before re-enabling global eta after we removed all the -instance lines and such?

Scott Morrison (May 17 2023 at 12:23):

Jeremy Tan said:

Is it just me or did mathlib4 become slower than before re-enabling global eta after we removed all the -instance lines and such?

You can look in the speedcenter! We can also watch the next post in port benchmark in a few hours.

Scott Morrison (May 17 2023 at 12:23):

It's certainly not impossible.

Matthew Ballard (May 17 2023 at 13:55):

It looks like the clean up PRs have landed in the speed center and it doesn't show a serious bump. My repeated rebuilds have been noticeably snappier :)

Scott Morrison (May 17 2023 at 22:11):

The benchmark bot disagrees, unfortunately: https://leanprover.zulipchat.com/#narrow/stream/287929-mathlib4/topic/port.20benchmark/near/359033805

Sebastian Ullrich (May 25 2023 at 09:44):

The benchmarking queue is now at 99 commits :tear: . Given that the running time has now exceeded one hour, I don't see it being emptied any time soon

Sebastian Ullrich (May 25 2023 at 09:46):

But since new commits are benchmarked first, we do still have trend data even if results and especially deltas for specific commits may be missing
image.png

Sebastian Ullrich (Jun 05 2023 at 14:49):

Typeclass inference is the only major component whose share of the build time has been slowly increasing, it is now at almost 50%
image.png

Mario Carneiro (Jun 05 2023 at 14:52):

Can we say yet whether tabled typeclass inference is significantly more effective than the lean 3 way?

Kevin Buzzard (Jun 05 2023 at 15:00):

It is weirdly ironic, because typeclass inference is really a part of the system which is completely trivial to the mathematician. Of course a normed field is a nonassoc-semiring and of course this is true in precisely one way. Part of me is still dumbfounded that there is any content here on the CS side needed to make it work, even though I've now read several papers which assure me that this is the case...

Gabriel Ebner (Jun 05 2023 at 17:32):

It would be interesting to know what percentage is spent on TC inference in Lean 3. I would expect it to be similarly high.

Kevin Buzzard (Jun 05 2023 at 17:46):

Is this percentage easy to compute though?

Jireh Loreaux (Jun 05 2023 at 18:40):

@Kevin Buzzard I think it's not so obvious in the presence of diamonds (and if we only had linear type class graphs, I think TC synthesis would be blazingly fast :laughing:). When you say things like "Of course a normed field is a nonassoc-semiring and of course this is true in precisely one way", I think what you are (or at least, I am) implicitly doing is something like "well, all that matters is the data fields and they're the same no matter what because there is no reason to change them as you add structure", the first part of which is essentially flat structures and proof irrelevance. The second part is of course borked if you choose to change one of the data fields to something propeq but not defeq, but even if not, Lean still has to check it, whereas a mathematician wouldn't because they generally only care about propeq anyway.

However, when you have diamonds, especially with things that aren't just forgetful functors, I think it's less obvious, even for mathematicians, especially when what you really care about is defeq instead of propeq. As evidence, I'll point to several of the non-defeq diamonds we've accidentally inserted over the years and then later had to remove (e.g., smul), or the time a few months ago when I wanted to add an instance that every -algebra is an -algebra, but I did it in a naive way and then ended up with non-defeq diamonds, whereas I thought everything was fine. In this case, I actually contemplated briefly the possibility of diamonds ahead of time, but thought I would be okay, and I wasn't (this situation never appeared in mathlib, only locally).

At the same time, it would be great if we could find a nice way to make this as easy for Lean as it seems to be for mathematicians. :smiley:

Sebastian Ullrich (Jun 05 2023 at 19:24):

Kevin Buzzard said:

Is this percentage easy to compute though?

Someone would need to record a build of mathlib3 with --profile and pass the output through https://github.com/leanprover/lean4/blob/master/tests/bench/accumulate_profile.py
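
Roughly, and untested (the mathlib3 invocation is illustrative, and the script is assumed to read the profiler output on stdin):

    $ lean --make --profile src/ 2>&1 | tee mathlib3-profile.log
    $ python3 accumulate_profile.py < mathlib3-profile.log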

Kevin Buzzard (Jun 05 2023 at 21:07):

The fact that propeq diamonds break the system is clearly a fault of the system and can't be used as an argument to justify that the problem is harder than I'm claiming it is. Yes I only care about propeq, of course, defeq is an artificial thing introduced by the computer scientists because they have issues with equality. Propeq is the only thing that matters in our work as mathematicians and the fact that we can't insert proofs into the typeclass inference system other than by artificial means such as bundling the topology into the definition of a metric space is something which is actually very weird but we've just forgotten how weird it is because we're now used to it.

Alex J. Best (Jun 16 2023 at 13:13):

Do we have a way of getting benchmark results on a file by file basis? I would like to see which files got slower after the port (and stayed slower across a few recent runs, normalized against the fact that mathlib as a whole got faster).

Alex J. Best (Jun 16 2023 at 13:15):

I guess the data is there in speedcenter, I just don't know which buttons to click to show me, e.g. the 10 worst slowdown files (alternatively if I could download all the data and process it myself that could be fine)

Sebastian Ullrich (Jun 16 2023 at 20:24):

The speedcenter does not have any Lean 3 performance data. You should be able to adjust Scott's port benchmark script to your needs https://github.com/leanprover-community/mathlib4/blob/master/scripts/benchmark.sh

Alex J. Best (Jun 16 2023 at 20:36):

Is it possible to download the speedcenter data at least?

Alex J. Best (Jun 16 2023 at 20:37):

Maybe this could be combined with @Jannis Limperg's bot data for mathlib 3 (does that still run?)

Sebastian Ullrich (Jun 16 2023 at 20:53):

If you look at the server responses while loading the graph display, you should be able to retrieve some usable data from that

Jannis Limperg (Jun 17 2023 at 06:26):

My bot doesn't run at the moment because we were running out of money for the Azure VMs. Could be restarted if needed.

Sebastian Ullrich (Jun 22 2023 at 15:54):

And we're back http://speedcenter.informatik.kit.edu/mathlib4/repo-detail/e7b27246-a3e6-496a-b552-ff4b45c7236e?zoomXStart=1667260800000&dimensions=size%3A.lean%3A%3Abuild%3Awall-clock&dayEquidistant=true

Sebastian Ullrich (Jun 22 2023 at 15:54):

If you click on "Normalize", you can see that at least we didn't get any slower per line in the meantime

Matthew Ballard (Jul 31 2023 at 23:12):

I asked this in the meetings once known as porting meetings, but I think I need more info than is on the GitHub repo. If I want to set up a copy of the speed center on a local machine to track mathlib (as close to the existing setup as possible), what do I need to know specifically?

Sebastian Ullrich (Aug 01 2023 at 07:50):

I suppose I will soon need to set up the speedcenter on new machines at which point I can make notes

Matthew Ballard (Aug 01 2023 at 10:59):

I would love to have the capacity for this to be part of the standard CI package.

Sebastian Ullrich (Aug 01 2023 at 12:03):

We'll need to work on making it less costly then, ideally we only measure files that actually have to be recompiled and then simulate overall timings from the aggregate

Matthew Ballard (Aug 07 2023 at 17:23):

I am getting the No worker registered :( error from the speed center. Is it just me?

Sebastian Ullrich (Aug 07 2023 at 17:56):

The machine seems to be misbehaving, my bad

Matthew Ballard (Aug 07 2023 at 17:57):

:robot:'s bad

Sebastian Ullrich (Aug 07 2023 at 17:57):

Unfortunately there's no-one around to force-reboot it at this time of the day

Matthew Ballard (Aug 07 2023 at 18:00):

I had an issue with my office machine and it wasn't addressed for months :)

Matthew Ballard (Aug 15 2023 at 13:10):

Is there a way to get the speedcenter to run on mathlib with a local toolchain, eg one from elan override?

Sebastian Ullrich (Aug 15 2023 at 13:13):

How would the toolchain get to the speedcenter server?

Matthew Ballard (Aug 15 2023 at 13:13):

Hope?

Eric Wieser (Aug 16 2023 at 09:20):

!bench seems to be doing nothing on https://github.com/leanprover-community/mathlib4/pull/6562; any idea why?

Matthew Ballard (Aug 16 2023 at 09:43):

Is there still no runner?

Sebastian Ullrich (Aug 16 2023 at 09:47):

Working on it!

Sebastian Ullrich (Aug 16 2023 at 16:05):

As just announced, the speedcenter is now running on a shiny new machine, with quite a remarkable improvement in speed and corresponding throughput: mathlib runs are now down from about 150 minutes to 30 minutes. I suggest rebasing any PRs before invoking !bench to make sure they are compared to a base commit run on the same machine.

Sebastian Ullrich (Aug 16 2023 at 16:10):

The old data is still there, so expect a significant jump in time-based graphs. It would be possible to down-scale the old values but as the relation is not perfectly linear across all time metrics, it is not clear to me whether that would be a good idea.
image.png

Mario Carneiro (Aug 16 2023 at 16:34):

what is the reason for the improvement? Is it more cores? I would assume most other metrics can't vary too much (clock speed can vary by 50% but it's mostly stable across the board these days, and I don't think we are RAM limited except insofar as it prevents additional parallel workers)

Sebastian Ullrich (Aug 16 2023 at 16:49):

Here is a comparison across the two machines (with some actual mathlib changes involved as well as seen in the file-level differences): http://speed.lean-fro.org/mathlib4/compare/e7b27246-a3e6-496a-b552-ff4b45c7236e/to/1946ae42-9a2d-41f2-98d6-225a79727430?hash1=3d6112b5c7d095d3088b359c611a5a2704c5dbdc
It's not just wall-clock (-83%) but also task-clock (-48%). At the very end the two CPUs are listed, they basically play in different leagues. But yes, RAM is basically irrelevant.

Sebastian Ullrich (Aug 16 2023 at 16:52):

The old machine also was optimized for consistency, not throughput: boosting and hyperthreading were disabled and one core was reserved for the benchmark program itself. This did not turn out to help much on the new machine.

Kevin Buzzard (Aug 16 2023 at 17:06):

Thank you so much for this! The speedcenter was really essential for me recently when trying to work out which changes in a PR were actually important

Matthew Ballard (Sep 05 2023 at 13:41):

It looks like the CI has been failing since #6490

Mauricio Collares (Sep 05 2023 at 13:46):

Disk space problems too? It fails with a Nix error trying to fetch a flake input (failed to extract archive (truncated gzip input))

Sebastian Ullrich (Sep 05 2023 at 16:25):

Disk is fine, I assumed it was the GitHub outage. But that is supposed to be over and it's still failing.

Matthew Ballard (Sep 05 2023 at 16:47):

I re-benched an old(er) PR and it failed with the analogous error message

Sebastian Ullrich (Sep 06 2023 at 07:36):

And we're back. I had to clear a cache busted from the outage.

Matthew Ballard (Sep 15 2023 at 15:19):

I assume this is not correct? Because otherwise it is wonderful (and I haven’t been paying attention).

Eric Wieser (Sep 15 2023 at 15:36):

That's consistent with github, see https://github.com/leanprover-community/mathlib4/actions/runs/6198949800/job/16830464511#step:18:24

Eric Wieser (Sep 15 2023 at 15:36):

@Scott Morrison, are we sure that the latest release hasn't broken the linter in some way? 6s seems way too fast to be actually running the lint

Matthew Ballard (Sep 15 2023 at 15:38):

I get the same behavior locally

Matthew Ballard (Sep 15 2023 at 15:39):

There is no lake exe runLinter -v ?

Matthew Ballard (Sep 15 2023 at 15:48):

Hmm. Very limited testing and I can’t get simpNF to fire with #lint. Can others confirm or refute? I have to teach momentarily

Eric Wieser (Sep 15 2023 at 16:06):

Do we have an example that is supposed to fire simpNF?

Eric Wieser (Sep 15 2023 at 16:23):

#lint works for me, but it seems runLinter doesn't

Eric Wieser (Sep 15 2023 at 16:25):

I think this is a lake issue: runLinter now refers to ./lake-packages/std/build/bin/runLinter not the runLinter in mathlib4

Matthew Ballard (Sep 15 2023 at 16:58):

I just copied a theorem with the simp attribute, changed the name, and added the simp attribute to the copy. I assumed it would be caught

Matthew Ballard (Sep 15 2023 at 18:13):

I guess it’s working again :)

Eric Wieser (Sep 19 2023 at 22:26):

http://speed.lean-fro.org/mathlib4/run-detail/e7b27246-a3e6-496a-b552-ff4b45c7236e/d9e7e897cb8ac01e1d7ac3edf2a7308039370fec just resulted in an 18% increase in olean serialization for what seems like a fairly innocent file

Matthew Ballard (Sep 19 2023 at 22:45):

#7103 ?

Eric Wieser (Sep 20 2023 at 11:34):

Would it be possible to have the speedcenter show the compilation metric on a per-file basis, @Sebastian Ullrich? The above somewhat suggests that we could have one or two more files that represent a majority of the memory usage

Sebastien Gouezel (Sep 20 2023 at 11:38):

Is there something like a file-level noncomputable! to make sure that Lean does not try to compile any definition in the file? I think it would make sense to have this in many files (maybe even most files, or all files outside the Data folder)

Eric Wieser (Sep 20 2023 at 11:40):

I assume we're running into a bug here (that hopefully will eventually be diagnosed and fixed), so adding language features to work around it feels like overkill

Sebastian Ullrich (Sep 20 2023 at 11:40):

Eric Wieser said:

Would it be possible to have the speedcenter show the compilation metric on a per-file basis, Sebastian Ullrich? The above somewhat suggests that we could have one or two more files that represent a majority of the memory usage

We are already stressing the server software to its limits with the current number of metrics, measuring anything else for each file might break it for good. It would probably be better for someone to measure that locally as a once-off/monthly/... run

Eric Wieser (Sep 20 2023 at 11:41):

Alternatively, do you think the compilation time for a declaration could be stored in the info tree or similar, and then we could print a leaderboard in the lint step of CI?

Sebastian Ullrich (Sep 20 2023 at 11:42):

Doesn't --profile plus some bash mangling give you that?
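
For instance, something along these lines (just a sketch; it assumes the profiler line format shown earlier, e.g. "compilation of ... took 27.3s", and only keeps entries measured in whole seconds):

    $ lake env lean --profile Mathlib/Data/Sign.lean 2>&1 |
        grep -E ' took [0-9.]+s$' |
        awk '{print $(NF), $0}' | sort -rn | cut -d' ' -f2- | head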

Eric Wieser (Sep 20 2023 at 11:42):

Is that an argument to lake or just lean?

Jireh Loreaux (Sep 20 2023 at 11:43):

Eric, wouldn't that invalidate cache traces?

Sebastian Ullrich (Sep 20 2023 at 11:43):

Lake knows about some lean options that don't affect the trace but I forgot the specifics

Jireh Loreaux (Sep 20 2023 at 11:43):

(I'm not sure, I'm asking)

Eric Wieser (Sep 20 2023 at 11:44):

Yes, there's a weakLeanArgs flag to pass arguments that claim to not affect the oleans

Sebastian Ullrich (Sep 20 2023 at 11:44):

By the way, something about the maxrss metric I recently learned that I may have misrepresented before: it is not the sum of each lean process' maximum resident set running in parallel but the maximum of maxima. I.e. there is (at least) one Lean file somewhere in mathlib that uses the shown amount of memory by itself.

Matthew Ballard (Sep 20 2023 at 11:45):

That makes much more sense!

Sebastian Ullrich (Sep 20 2023 at 11:46):

Does it :) ? It's not the metric I hoped for but the one Linux provides

Eric Wieser (Sep 20 2023 at 11:46):

Is it possible to find out which file achieves that maximum?

Sebastian Ullrich (Sep 20 2023 at 11:46):

Not without measuring yourself
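
A brute-force sketch for measuring it yourself (assumes GNU time and a fully built tree so that imports load from oleans; every file is re-elaborated, so this takes a long time):

    $ for f in $(find Mathlib -name '*.lean'); do
        echo "$(lake env \time -f '%M' lean "$f" 2>&1 >/dev/null | tail -1) $f"
      done | sort -rn | head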

Matthew Ballard (Sep 20 2023 at 11:47):

Things weren’t tracking together before. Compilation would jump in a file but max rss wouldn’t budge.

Matthew Ballard (Sep 20 2023 at 11:48):

Or profiling an individual file would show a huge max rss jump but the speed center would report nothing

Matthew Ballard (Oct 05 2023 at 12:56):

Is this from bumping aesop?

Matthew Ballard (Oct 05 2023 at 12:58):

@Jannis Limperg

Kevin Buzzard (Oct 05 2023 at 13:36):

Lol @Patrick Massot I didn't know you'd also jumped on the "make mathlib faster" bandwagon

Patrick Massot (Oct 05 2023 at 14:02):

I didn't know either, but I'm proud of this achievement. I didn't intend to bump aesop in this PR, but I bumped proofwidgets four times and I guess one of them was running lake update instead of lake update proofwidgets.

Jannis Limperg (Oct 05 2023 at 16:28):

Not sure whether this is Aesop-related. I didn't do any intentional optimisation. But I'll be happy to claim it (jointly with Patrick of course).

Sebastian Ullrich (Oct 05 2023 at 16:35):

The -54% on the aesop metric seems to suggest it is Aesop related :)

Matthew Ballard (Oct 05 2023 at 16:36):

The libraries affected are ones that make heavy use of it too, e.g. CategoryTheory

Jannis Limperg (Oct 05 2023 at 19:55):

I'm really confused. There was absolutely nothing in the bump that I can imagine being responsible for a 50% speedup. But the numbers haven't gone up again afterwards, so I guess it wasn't a fluke.

Matthew Ballard (Oct 05 2023 at 20:10):

Current trend

Kevin Buzzard (Oct 05 2023 at 20:11):

It kind of seems a shame that a big speedup has happened and nobody can put their finger on what caused it. Understanding it better might make an even bigger speedup possible.

Matthew Ballard (Oct 05 2023 at 20:12):

Link

Matthew Ballard (Oct 05 2023 at 20:12):

All progress starts with confusion

Matthew Ballard (Oct 05 2023 at 20:14):

Between suppress_compilation, cleaning up instances, and lean4#2478, it will be going down a good deal more

Chris Hughes (Oct 06 2023 at 07:37):

Was it perhaps the previous commit that changed something to do with CI, but it only showed up on the following commit or something? It seems a little unlikely given what the PR was, but it's a possibility.

Mauricio Collares (Oct 06 2023 at 09:03):

Jannis Limperg said:

I'm really confused. There was absolutely nothing in the bump that I can imagine being responsible for a 50% speedup. But the numbers haven't gone up again afterwards, so I guess it wasn't a fluke.

Could https://github.com/JLimperg/aesop/commit/f4fa2b5a0b76de43ecf490ff0916f8a196ed5ab2 be the source of this improvement?

Jannis Limperg (Oct 06 2023 at 09:11):

Unlikely, because that function is only called when erasing rules, which the Aesop calls in mathlib rarely do.

Jannis Limperg (Oct 06 2023 at 09:17):

If I wanted to invest some time into investigating which commit was responsible for the speedup, how would I do that? I guess I could create Aesop branches with some of the relevant commits reverted, and then a mathlib branch that tries out the different Aesop variants. Then make a PR and !bench that.

Joachim Breitner (Oct 06 2023 at 09:53):

Do you know about git bisect? An almost magical tool for finding which commit causes something.
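
A minimal recipe (illustrative; the revisions and the check.sh test script are placeholders):

    $ git bisect start
    $ git bisect bad HEAD            # a commit that exhibits the behaviour
    $ git bisect good <old-commit>   # a commit that does not
    $ git bisect run ./check.sh      # exit 0 = good, most non-zero = bad, 125 = skip
    $ git bisect reset               # return to where you started

Here check.sh would be any script that builds the file of interest and exits non-zero when it is too slow, e.g. by wrapping the build in timeout.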

Mauricio Collares (Oct 06 2023 at 09:56):

I have never tried running https://github.com/leanprover-community/mathlib4/blob/master/scripts/bench/run locally, but maybe it is possible to get instruction counts with it.
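
For a one-off local measurement, Linux perf can report instruction counts directly (a generic sketch, not necessarily what the bench script does; requires perf and suitable kernel permissions):

    $ lake env perf stat -e instructions lean Mathlib/Data/Sign.lean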

Jannis Limperg (Oct 06 2023 at 10:03):

Joachim Breitner said:

Do you know about git bisect? An almost magical tool for finding which commit causes something.

Yes, though I never remember how exactly to use it. Perhaps it's time for another try.

Sebastian Ullrich (Oct 11 2023 at 15:04):

I deployed a quick hack that formats mathlib per-file instruction differences as absolute numbers with sensible suffixes, and uses 1 billion instructions as a cut-off for significance (edit: moved up to 10B because of noise)
image.png

Sebastian Ullrich (Oct 11 2023 at 15:05):

Very rough guideline: 10 billion instructions = ~1s

Jireh Loreaux (Oct 11 2023 at 15:55):

So that means any file with a ~0.1 second difference in compile time will appear in the list of significant changes?

Sebastian Ullrich (Oct 11 2023 at 16:00):

Yes. Does it sound reasonable?

Jireh Loreaux (Oct 11 2023 at 16:03):

My guess is that it's far too sensitive. Even simple files can take > 2 seconds, so this is well below the previous 5% cutoff. And the awful files which take several minutes to compile will be marked significant ~ every time.

Jireh Loreaux (Oct 11 2023 at 16:03):

I think what would make more sense is to still use the 5% cutoff, but display absolute instructions.

Sebastian Ullrich (Oct 11 2023 at 16:05):

The percentage cutoff was awkward for very small files. It sounds like we might want the conjunction of both.

Sebastian Ullrich (Oct 11 2023 at 16:06):

Though "marked significant ~ every time" is clearly untrue if you look at the current page

Jireh Loreaux (Oct 11 2023 at 16:06):

Well, yes, it was awkward for very small files, but by displaying the absolute instructions, it becomes clear that the file must have been small.

Jireh Loreaux (Oct 11 2023 at 16:07):

(I stand corrected about my claim.)

Jireh Loreaux (Oct 11 2023 at 16:08):

Perhaps my feeling about sensitivity is just completely wrong then and should be ignored.

Sebastian Ullrich (Oct 11 2023 at 16:08):

I'd say let's give the current version a try first

Sebastian Ullrich (Oct 12 2023 at 07:23):

I've bumped the cutoff to 10B because we have some files with surprisingly high deviation. There's always the full table for details.

Mario Carneiro (Oct 12 2023 at 08:06):

Hey I just had an idea for anyone who likes to do data analysis: If we treat the time-series of runtimes of file i across all commits as an n-dimensional vector (one dim per commit), we may be able to find clusters in the resulting data, corresponding to files that stress different parts of the system. We probably don't need all commits, it is probably sufficient to capture some representative noise spread as well as any major perf changes in lean. If we can find good clusters, that might be one way to significantly reduce the number of benchmark variables, which would help make it more informative compared to the current wall of numbers with unclear significance

Matthew Ballard (Oct 12 2023 at 13:06):

Something is up because http://speed.lean-fro.org/mathlib4/compare/1367ad5c-c959-46b8-8475-15d3d1210f0e/to/27b72717-b7f2-4755-b932-9fdc34a159d4 doesn't report the improvements in individual files

Personally, I don't think knowing that CategoryTheory.Abelian.Projective changed 0.4% is useful information. I care if big files (measured via CPU instructions) move a reasonable amount and smaller files have dramatic changes. And percentages are easier to consume than raw instructions. My two :coin:

Sebastian Ullrich (Oct 12 2023 at 13:07):

You may need to Ctrl+F5 if you don't see individual green files

Matthew Ballard (Oct 12 2023 at 13:29):

The GitHub comment suffered from this issue also

Matthew Ballard (Oct 16 2023 at 13:41):

@Scott Morrison promised me 15%. I want my money back. :lol:

But seriously, this is excellent!

Yuyang Zhao (Nov 17 2023 at 10:26):

It doesn't work after bumping to v4.3.0-rc2.

Mauricio Collares (Nov 17 2023 at 10:30):

That's because build/ is now .lake/build/

Mauricio Collares (Nov 17 2023 at 10:32):

Here, I think: https://github.com/leanprover-community/mathlib4/blob/master/scripts/bench/temci-config.run.yml#L39

Sebastian Ullrich (Nov 17 2023 at 10:41):

Yes, please PR!

Mauricio Collares (Nov 17 2023 at 10:44):

Untested because I don't know how: #8467

Mauricio Collares (Nov 17 2023 at 10:44):

I guess I can just run !bench

Mauricio Collares (Nov 17 2023 at 11:13):

By the way, this is the bump comparison (-2% build instructions, -3.7% build wall-clock): http://speed.lean-fro.org/mathlib4/compare/b02f639c-050f-43ce-88b1-019aa70ab371/to/263f3a9d-06a8-40ab-9b11-123857b983d9

Sebastian Ullrich (Nov 17 2023 at 12:26):

Could we expedite this PR? @Scott Morrison

Mauricio Collares (Nov 17 2023 at 16:18):

The runner is unavailable again :(

Sebastian Ullrich (Nov 17 2023 at 16:58):

That one was deliberate, it's back now

Sebastian Ullrich (Nov 17 2023 at 17:00):

I've backdated the fix so it shows the correct per-commit timings: http://speed.lean-fro.org/mathlib4/run-detail/5ae134c7-b878-4420-9bd9-359862465a32. The other commits between bump and fix are requeued now.

Matthew Ballard (Dec 14 2023 at 15:10):

open Mathlib -- wall-clock seems to be a lot of noise in the Recent Significant Runs

Sebastian Ullrich (Dec 14 2023 at 15:16):

Not after #8891

Matthew Ballard (Dec 14 2023 at 15:22):

Great.


Last updated: Dec 20 2023 at 11:08 UTC