Zulip Chat Archive

Stream: mathlib4

Topic: !bench failed?

David Loeffler (Oct 30 2024 at 19:50):

I tried to run the !bench speed test on #18385 and it failed with a weird error that seems totally unrelated to the PR at hand:

Mathlib.lean:1:0: error: unknown module prefix 'Batteries'

Did I do something wrong, or is the speed test system broken somehow?

The benchmark log is here.

Floris van Doorn (Oct 30 2024 at 19:54):

@Sebastian Ullrich might have an idea what could cause that.

Michael Stoll (Oct 30 2024 at 21:29):

I just did a !bench run on #12778, and it worked fine.

David Loeffler (Oct 30 2024 at 21:34):

Looks like my PR branch was quite old, I'll merge master and see if that fixes it

Floris van Doorn (Oct 30 2024 at 22:45):

I think that is relevant, yes.

Floris van Doorn (Oct 30 2024 at 22:46):

And bench has indeed succeeded on the second try.

Floris van Doorn (Oct 30 2024 at 22:48):

I think you can ignore the first 2 "build" improvements as noise, but the other 2 seem to be very nice (non-noise) improvements!

David Loeffler (Oct 31 2024 at 06:41):

Floris van Doorn said:

I think you can ignore the first 2 "build" improvements as noise, but the other 2 seem to be very nice (non-noise) improvements!

Indeed, I'm startled by the extent of the speedup; but it is tinged with regret, because I suspect most of that 71.6% decrease in NumberTheory.ModularForms.SlashActions comes from the replacement of a single extremely slow field_simp step with a longer and more explicit manual argument. It would be nice if we could get speedups by improving the automation, rather than removing it, but that's not a task I'm competent to take on.

Kim Morrison (Oct 31 2024 at 07:13):

field_simp is known to be pretty borked at the moment.

David Loeffler (Oct 31 2024 at 08:19):

Kim Morrison said:

field_simp is known to be pretty borked at the moment.

I'm amused to see you in particular posting this, Kim; since I asked about the same issue on zulip before, 6 months ago, here and at the time, you wrote

No one should be removing uses of field_simp because they are slow. We need to fix field_simp!

Is there any realistic prospect of field_simp getting un-borked again?

Kim Morrison (Oct 31 2024 at 08:37):

I guess that was my failed attempt to get someone to fix it...

Kim Morrison (Oct 31 2024 at 08:38):

Having a clear spec of what it is meant to do might help. The current implementation is "try stuff, and keep trying if it's not going well".

Michael Rothgang (Oct 31 2024 at 09:41):

My medium-term list of "tactic projects to try" includes looking at field_simp. Having a clear description what it should do, could do and currently does (not) do helps. This thread was my attempt to collect such information. Please add missing bits there!

Yury G. Kudryashov (Nov 05 2024 at 06:31):

In #18508, I left a !bench comment, it got a :rocket: emoji but no results were posted.

Sebastian Ullrich (Nov 05 2024 at 14:59):

Unfortunately the growth of Mathlib has brought the speedcenter software to its limits. I restarted it once more but it's not clear who could implement a more robust fix.

Johan Commelin (Nov 05 2024 at 15:16):

I'm not sure if I should be happy or sad about your message.

Mario Carneiro (Nov 05 2024 at 15:45):

There are a few aspects of our use of the speedcenter that seem a bit abusive and which we could maybe scale back. The ones that come to mind are: running it on every single commit (so it is often fighting a long backlog) and having one benchmark metric per file (resulting in quadratic growth of metric values on the interface over time). The per file benchmark in particular has long been something which makes it difficult for me to read benchmark results, I would much rather this was more selective and only tracked files that are "interesting" in some sense (largest file, longest compile, most typeclasses), maybe with a few random samples sprinkled in

Yaël Dillies (Nov 05 2024 at 16:10):

I think the filewise metrics are useful when benching PRs? Maybe not so useful when benching master though

Sebastian Ullrich (Nov 05 2024 at 16:14):

I don't think that amount of data should necessarily make the server run out of memory. But if we can't fix the software, reducing the data certainly would be an alternative. It might be interesting to look at a histogram of the per-file times to see whether there is some cut-off with many files below it that are fast enough not to be that interesting for the diff unless they regress above the cut-off.

Mario Carneiro (Nov 05 2024 at 19:17):

what exactly went wrong? From your description I guess some kind of implementation limit or resource was exhausted, but not more than that

Mario Carneiro (Nov 05 2024 at 19:18):

Is it possible to test the speedcenter locally? Is the code hosted publicly?

Marc Huisinga (Nov 05 2024 at 21:02):

Mario Carneiro said:

Is it possible to test the speedcenter locally? Is the code hosted publicly?

https://github.com/IPDSnelting/velcom

Mario Carneiro (Nov 05 2024 at 21:03):

I thought that lean was running a fork of this repo

Sebastian Ullrich (Nov 06 2024 at 09:18):

It isn't, see my Mathlib-specific fixes on master

Sebastian Ullrich (Nov 06 2024 at 09:23):

Mario Carneiro said:

what exactly went wrong? From your description I guess some kind of implementation limit or resource was exhausted, but not more than that

Java ran out of memory. It looks like it defaults to 1/4 of physical RAM so doubling that setting may or may not give us some years of breathing room. But it seems problematic that it balloons to 16GB in the first place. The worst part is that only some threads abort and not the whole process, otherwise we could have systemd restart it.

Sebastian Ullrich (Nov 06 2024 at 09:25):

Mario Carneiro said:

Is it possible to test the speedcenter locally?

Yes, you can even point your local build at the live instance's API endpoint for read-only operations. See .env.development-remote.

Last updated: May 02 2025 at 03:31 UTC