Zulip Chat Archive

Stream: general

Topic: speed-up project returns


Kenny Lau (May 13 2020 at 15:05):

the current mathlib compile time is 4 hours, so it's time for the speed-up project to return. current statistics:

645.0 src/analysis/special_functions/trigonometric.lean
472.0 src/data/polynomial.lean
382.0 src/data/multiset.lean
367.0 src/analysis/special_functions/pow.lean
300.0 src/data/mv_polynomial.lean
290.0 src/category_theory/limits/shapes/binary_products.lean
284.0 src/analysis/normed_space/multilinear.lean
270.0 src/topology/metric_space/hausdorff_distance.lean
268.0 src/linear_algebra/basic.lean
254.0 src/data/dfinsupp.lean
240.0 src/analysis/calculus/deriv.lean
234.0 src/geometry/manifold/mfderiv.lean
224.0 src/category_theory/comma.lean
220.0 src/data/complex/exponential.lean
216.0 src/data/finset.lean
212.0 src/linear_algebra/basis.lean
211.0 src/data/padics/padic_numbers.lean
211.0 src/analysis/analytic/basic.lean
208.0 src/ring_theory/power_series.lean
205.0 src/analysis/calculus/times_cont_diff.lean

(raw data, sorted)

Bryan Gin-ge Chen (May 13 2020 at 15:07):

Somewhat related: #2276

Rob Lewis (May 13 2020 at 15:08):

I'm certainly in favor of a speed-up project! But just to be sure: you're not actually compiling mathlib for 4 hours, right? There's no reason to take longer than CI.

If you want to build: commit and push to a random branch of mathlib. In under 2 hours, leanproject get-cache should offer you a bunch of fresh oleans.

Rob Lewis (May 13 2020 at 15:08):

And it uses almost none of your own cpu. :)

Kenny Lau (May 13 2020 at 15:19):

I actually compiled mathlib for 4 hours

Kenny Lau (May 13 2020 at 15:19):

which agrees with the time here

Kenny Lau (May 13 2020 at 15:19):

I guess CI uses multiple threads

Rob Lewis (May 13 2020 at 15:24):

I have no idea what the build time bot is, I don't read that stream. But the oleans for that commit were available 1:48 after it landed on the staging branch: https://github.com/leanprover-community/mathlib/runs/669462271

Kenny Lau (May 13 2020 at 15:25):

so CI must use 2 threads!

Scott Morrison (May 13 2020 at 15:26):

I was surprised how long the build-time-bot is taking now. (It was offline for a while in February/March, and got much worse when I restarted it.)

Kenny Lau (May 13 2020 at 15:26):

@Scott Morrison oh and how would you feel if I replaced all your by tidy with manual labour?

Scott Morrison (May 13 2020 at 15:27):

I'm not going to stand in the way of this happening.

Scott Morrison (May 13 2020 at 15:27):

But it would make me very sad. :-)

Scott Morrison (May 13 2020 at 15:27):

Theorem provers' future usefulness depends on their being able to carry their own weight.

Scott Morrison (May 13 2020 at 15:28):

i.e. to handle for themselves a great deal of the tedious plumbing in between the mathematical ideas and the underlying logic.

Scott Morrison (May 13 2020 at 15:28):

(I said "future usefulness" to emphasise the complete use-less-ness of all current theorem provers!)

Scott Morrison (May 13 2020 at 15:29):

I'm not claiming at all that tidy is the answer. It is obviously lame and dumb and slow.

Scott Morrison (May 13 2020 at 15:29):

But it makes me sad that we're thinking "how do we avoid using automation", rather than "what do we need to do before this sort of automation is viable"?

Scott Morrison (May 13 2020 at 15:31):

I'd much prefer that we put effort into making #2300 viable than writing "boilerplate".

Scott Morrison (May 13 2020 at 15:33):

Also --- just making tidy faster is probably low-hanging fruit.

Scott Morrison (May 13 2020 at 15:34):

I suspect that, amongst the subsidiary tactics tidy calls, auto_cases and simp are the killers. But it would be good to know where it spends its time.
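One way to start gathering that data (my own sketch, not something from this thread) is Lean 3's built-in profiler, which prints a per-tactic time breakdown for each declaration elaborated while it is on; the example goal below is arbitrary:

```lean
import tactic

-- With the profiler enabled, elaborating the declaration below prints
-- a breakdown of where `tidy` (and its subsidiary tactics) spend time.
set_option profiler true

example : ∀ n : ℕ, n = n := by tidy
```

The same output is available non-interactively via `lean --profile`, which is how the per-file numbers earlier in this topic were gathered.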

Kenny Lau (May 13 2020 at 15:40):

per byte data:

0.04807339449541284 src/topology/category/Top/adjunctions.lean
0.020389546351084812 src/category_theory/elements.lean
0.015910462500685795 src/category_theory/limits/shapes/binary_products.lean
0.015718349928876246 src/analysis/normed_space/hahn_banach.lean
0.014868532998419965 src/analysis/special_functions/pow.lean
0.014746689923095219 src/category_theory/limits/over.lean
0.013920285544318858 src/analysis/complex/basic.lean
0.013886572143452876 src/category_theory/limits/shapes/constructions/limits_of_products_and_equalizers.lean
0.013393418570530287 src/measure_theory/probability_mass_function.lean
0.013363333985521423 src/algebra/homology/homology.lean
0.012804698972099854 src/category_theory/graded_object.lean
0.012802926383173296 src/topology/sheaves/presheaf.lean
0.012098366541015976 src/algebra/category/Module/monoidal.lean
0.011772905842098594 src/category_theory/limits/shapes/wide_pullbacks.lean
0.011354838709677418 src/category_theory/adjunction/limits.lean
0.011281224818694601 src/category_theory/pempty.lean
0.010924593658406609 src/topology/metric_space/cau_seq_filter.lean
0.010811648079306072 src/category_theory/currying.lean
0.010690172543135784 src/data/padics/padic_integers.lean
0.010124164278892072 src/algebraic_geometry/stalks.lean

(full data)

Kenny Lau (May 13 2020 at 15:41):

the ideal count would be per "lemma/def", right?

Kevin Buzzard (May 13 2020 at 15:42):

I don't understand why it matters to @Kenny Lau that it takes hours to compile.

Kenny Lau (May 13 2020 at 15:43):

Kevin Buzzard said:

I don't understand why it matters to Kenny Lau that it takes hours to compile.

I'm not the only one:
Scott Morrison said:

The links are working again. I have no idea why the build times are so much worse than they used to be. Mathlib grew?

Kenny Lau (May 13 2020 at 16:19):

per declarations:

38.0 src/number_theory/sum_four_squares.lean
27.8 src/topology/metric_space/completion.lean
26.2 src/topology/category/Top/adjunctions.lean
26.15 src/algebra/category/Group/biproducts.lean
23.0 src/category_theory/products/basic.lean
22.666666666666668 src/data/padics/hensel.lean
22.2 src/category_theory/limits/shapes/constructions/limits_of_products_and_equalizers.lean
22.1 src/analysis/normed_space/hahn_banach.lean
16.7 src/topology/category/TopCommRing.lean
16.54 src/category_theory/elements.lean
15.7 src/measure_theory/simple_func_dense.lean
14.95 src/category_theory/limits/functor_category.lean
14.9 src/category_theory/monad/algebra.lean
14.75 src/data/zsqrtd/basic.lean
12.971428571428572 src/data/matrix/notation.lean
12.725 src/analysis/calculus/extend_deriv.lean
12.4 src/category_theory/limits/over.lean
12.363636363636363 src/geometry/manifold/basic_smooth_bundle.lean
11.95 src/analysis/complex/polynomial.lean
11.7 src/analysis/complex/basic.lean

(full data)

Ryan Lahfa (May 13 2020 at 16:33):

Kenny Lau said:

I actually compiled mathlib for 4 hours

What kind of CPU did you use?
Has anyone tried to compile mathlib using an AMD-based ThreadRipper?

Mario Carneiro (May 13 2020 at 16:35):

Kenny has a machine that is very slow for unknown reasons

Mario Carneiro (May 13 2020 at 16:36):

he consistently measures times that are about double everyone else's

Mario Carneiro (May 13 2020 at 16:37):

but I can see why that might incentivize him to speed up things more. I actually think that the move to all-CI and downloaded oleans has decoupled us from the pressures of improving compile times a bit too much. If we aren't careful this will be 6 hours in a few months

Kenny Lau (May 13 2020 at 16:38):

Mario Carneiro said:

he consistently measures times that are about double everyone else's

so does build time bot

Mario Carneiro (May 13 2020 at 16:39):

I see nothing in that stream?

Mario Carneiro (May 13 2020 at 16:40):

I thought the build time bot died

Mario Carneiro (May 13 2020 at 16:40):

we are using github actions now for builds, so the timing would have to adapt to that

Bryan Gin-ge Chen (May 13 2020 at 16:42):

You might have to click "more topics" if you muted some of the spammier topics there.

Kenny Lau (May 13 2020 at 16:43):

build time bot said:

Building master at https://github.com/leanprover-community/mathlib/commit/51e2b4ccef20e49bc24ef86a6afe6e48196abbcf takes 246m47.805s

--profile output at https://tqft.net/lean/mathlib/51e2b4ccef20e49bc24ef86a6afe6e48196abbcf.log

Ryan Lahfa (May 13 2020 at 16:44):

Mario Carneiro said:

but I can see why that might incentivize him to speed up things more. I actually think that the move to all-CI and downloaded oleans has decoupled us from the pressures of improving compile times a bit too much. If we aren't careful this will be 6 hours in a few months

Is the CI running default/common hardware or custom powerful hardware?

Bryan Gin-ge Chen (May 13 2020 at 16:45):

https://help.github.com/en/actions/reference/virtual-environments-for-github-hosted-runners#cloud-hosts-for-github-hosted-runners

Ryan Lahfa (May 13 2020 at 16:45):

Okay so that's the default GH Actions hardware

Ryan Lahfa (May 13 2020 at 16:46):

I have to try to compile mathlib on this https://www.packet.com/cloud/servers/c3-medium/ just for reference

Mario Carneiro (May 13 2020 at 16:47):

CI machines tend to be slower than regular machines because they are time sharing with other CI projects

Mario Carneiro (May 13 2020 at 16:48):

but they also happen in parallel so it's not that relevant how long it takes unless you want to decrease latency

Ryan Lahfa (May 13 2020 at 16:48):

Yes, they're most likely virtualized under some baremetal stuff and some CI projects must steal CPU

Ryan Lahfa (May 13 2020 at 16:48):

Mario Carneiro said:

but they also happen in parallel so it's not that relevant how long it takes unless you want to decrease latency

But like, more cores would decrease the total amount of time, right?

Mario Carneiro (May 13 2020 at 16:49):

sure

Ryan Lahfa (May 13 2020 at 16:49):

I don't know how much mathlib compilation would benefit from parallelism

Mario Carneiro (May 13 2020 at 16:49):

it benefits greatly

Mario Carneiro (May 13 2020 at 16:49):

lean is fully parallel and will use all your cores

Ryan Lahfa (May 13 2020 at 16:50):

So a ThreadRipper 3990X with 64 cores / 128 threads would make Lean pretty much instant, I guess

Ryan Lahfa (May 13 2020 at 16:50):

In exchange for a mere 4K EUR…

Mario Carneiro (May 13 2020 at 16:50):

Does anyone own one of those?

Ryan Lahfa (May 13 2020 at 16:51):

I personally don't, but I know sysadmin people who have one, and I'm looking to get one for some general work / projects

Mario Carneiro (May 13 2020 at 16:51):

I know that scott has access to some high quality hardware that he has been using for the build time bot, but apparently something has recently got a lot worse and it's not clear if it's mathlib, the build system, or scott's machine

Ryan Lahfa (May 13 2020 at 16:52):

Maybe that's something a university can afford, I guess, or you'd find it in HPC clusters, but I don't know how hard it would be to compile mathlib on those

Rob Lewis (May 13 2020 at 16:52):

There was some not-so-serious talk at Lean Together about buying one of these things with grant money and setting up a fast compile server.

Rob Lewis (May 13 2020 at 16:52):

But that takes someone with sysadmin experience to keep running correctly.

Ryan Lahfa (May 13 2020 at 16:54):

I know that in the cloud/DevOps landscape, the CNCF gives out access to a $25K/month account of resources on Packet.com: https://www.cncf.io/community/infrastructure-lab/ which anyone can ask for if the request is sound

Ryan Lahfa (May 13 2020 at 16:55):

Also, I don't know how interested Microsoft would be in giving infra to mathlib for free (?). I know that it's a Lean Prover community project, but as it's a big one… (?)

Reid Barton (May 13 2020 at 17:16):

Ryan Lahfa said:

I have to try to compile mathlib on this https://www.packet.com/cloud/servers/c3-medium/ just for reference

My prediction is somewhere in the vicinity of 30 minutes

Johan Commelin (May 13 2020 at 17:17):

My ancient hardware manages to compile in about 50 minutes. I have 16 threads.

Johan Commelin (May 13 2020 at 17:17):

So I hope modern hardware would be able to get < 20 minutes

Jannis Limperg (May 13 2020 at 17:20):

25min on my Ryzen 3700X (8 cores, 16 threads). I'm doing some timings with 1 and 8 cores right now to measure the speedup. Still, the usual caveat applies: what scales up to 16 threads does not necessarily scale up to 128 threads. There is no substitute for benchmarks.

Bryan Gin-ge Chen (May 13 2020 at 17:23):

It'd be good if someone with more C++ knowledge than me could take a look at lean#58 and other timing output issues so that we could get more accurate per-file / per-declaration timings. Has the profiling situation improved in Lean 4?

Reid Barton (May 13 2020 at 17:23):

Right, I don't think there is any doubt that there is plenty more than 8-way parallelism available for most or all of the build.

Mario Carneiro (May 13 2020 at 17:24):

One interesting test would be to hack lean to skip all proofs, and see how much faster it gets. That should be an upper bound on the parallelism

Gabriel Ebner (May 13 2020 at 17:25):

It doesn't help nearly as much as you might assume. The definitions take a lot of time as well, and parsing the proofs is surprisingly expensive.

Reid Barton (May 13 2020 at 17:25):

There's still module-parallelism, though, which is controlled by the width of the import structure

Reid Barton (May 13 2020 at 17:27):

category theory is almost entirely definitions, at least at top level--I'm not sure how much parallelism is gained by floating out proofs

Mario Carneiro (May 13 2020 at 17:28):

Don't Prop parts of definitions get floated out as proofs?

Mario Carneiro (May 13 2020 at 17:28):

maybe that's not until the tactic is finished running though

Reid Barton (May 13 2020 at 17:28):

I guess I'm not sure at what point that happens :this:

Mario Carneiro (May 13 2020 at 17:29):

It should be possible for tidy to make sure this happens

Reid Barton (May 13 2020 at 17:29):

Like if I have def X : Thing := { str := blah blah, prop := by tidy } how much work gets floated out

Reid Barton (May 13 2020 at 17:29):

Is it possible that the elaboration of str even depends on what happens inside tidy?

Reid Barton (May 13 2020 at 17:29):

My impression at one point was that it is possible

Gabriel Ebner (May 13 2020 at 17:29):

You could wrap tidy in prove_goal_async.

Reid Barton (May 13 2020 at 17:29):

interesting!

Reid Barton (May 13 2020 at 17:30):

that is a string that I believe has not appeared on Zulip before

Reid Barton (May 13 2020 at 17:31):

For example, this works:

import tactic

structure thing : Type :=
(x : ℕ)
(p : x = 3)

def my_thing : thing :=
{ x := _,
  p := by tidy }

Reid Barton (May 13 2020 at 17:31):

I think it's mostly unfortunate that it works, though

Mario Carneiro (May 13 2020 at 17:33):

You can't float this goal because it is ?m = 3

Mario Carneiro (May 13 2020 at 17:33):

how would you generalize it?

Reid Barton (May 13 2020 at 17:33):

I don't want to float it, I want to get an error.

Mario Carneiro (May 13 2020 at 17:34):

but that will break things

Mario Carneiro (May 13 2020 at 17:34):

because that feature is used for sure

Mario Carneiro (May 13 2020 at 17:35):

unless you mean that you want tidy specifically to complain when given goals with metavariables

Reid Barton (May 13 2020 at 17:35):

I think those things should be broken.

Mario Carneiro (May 13 2020 at 17:35):

there are other tactics like refine_struct where this is kind of the point

Reid Barton (May 13 2020 at 17:35):

It actually makes it quite difficult to use interactively when Lean is trying to prove things about your incomplete definition, I found

Johan Commelin (May 13 2020 at 17:36):

I guess we want both options (-;

Reid Barton (May 13 2020 at 17:36):

for example, Lean won't just tell you it can't infer something because what if it somehow magically gets fixed by a proof obligation involving tidy

Reid Barton (May 13 2020 at 17:37):

I can't remember exactly how this comes up, but I remember being very frustrated by those . obviously annotations

Reid Barton (May 13 2020 at 17:37):

to the point where it was easier to just add the missing fields in by hand to get Lean to stop doing annoying things

Reid Barton (May 13 2020 at 17:38):

I wonder how much stuff would actually break, though.

Mario Carneiro (May 13 2020 at 17:38):

I can definitely see the issue; if you write { x := _ } and there is an autoparam on p := by tidy you won't get any message saying what x should be

Johan Commelin (May 13 2020 at 17:39):

@Reid Barton You don't want to change the behaviour of structures in general, right? Only . obviously?

Johan Commelin (May 13 2020 at 17:39):

So maybe that should wrap in prove_goal_async?

Reid Barton (May 13 2020 at 17:40):

I confirm that prove_goal_async works as advertised regarding goals with metavariables (which is also apparent from the source):

import tactic

meta def tactic.interactive.tidy' : tactic unit := tactic.prove_goal_async `[tidy]

structure thing : Type :=
(x : ℕ)
(p : x = 3)

def my_thing : thing :=
{ x := _, -- don't know how to synthesize placeholder
  p := by tidy' }

Reid Barton (May 13 2020 at 17:41):

then an added bonus is you can replace prove_goal_async by def prove_goal_async' : tactic unit -> tactic unit := pure () :check:

Reid Barton (May 13 2020 at 17:42):

Er maybe use sorry or something I meant

Jalex Stark (May 13 2020 at 19:12):

Rob Lewis said:

There was some not-so-serious talk at Lean Together about buying one of these things with grant money and setting up a fast compile server.

So is this funding-constrained or sysadmin-constrained or both?

Mario Carneiro (May 13 2020 at 19:13):

I think sysadmin-constrained

Jannis Limperg (May 13 2020 at 20:41):

Results of my parallel compilation 'experiment':

#threads | time    | speedup
1        | 2:32:15 | 1
8        |   31:54 | 4.77
16       |   25:57 | 5.87

Commands used to produce these numbers:

$ cd <mathlib dir>
$ rm **.olean
$ time leanpkg build -- --threads=<N> --memory=16000

Processor: Ryzen 7 3700X (8 cores, 16 threads)

There may be some noise because I was using the computer during those runs (and also I didn't do multiple runs or anything, so major grain of salt).
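A back-of-envelope fit of these numbers to Amdahl's law (my own sketch, assuming the idealized serial/parallel split applies to a mathlib build at all):

```latex
S(n) = \frac{1}{s + \frac{1-s}{n}}, \qquad
S(8) = 4.77 \;\Longrightarrow\; s \approx 0.097, \qquad
S(128) \approx 9.6, \qquad S(\infty) = \tfrac{1}{s} \approx 10.3.
```

The predicted S(16) ≈ 6.5 already overshoots the measured 5.87, which supports the earlier caveat that scaling observed at 16 threads need not extend to 128.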

Rob Lewis (May 13 2020 at 22:21):

Jalex Stark said:

So is this funding-constrained or sysadmin-constrained or both?

It's clearly sysadmin-constrained, so we didn't even look into the funding details.

Jalex Stark (May 13 2020 at 22:25):

seems like the sort of thing where you might be able to turn funding into a sysadmin

Rob Lewis (May 13 2020 at 22:28):

One fast processor could be within a grant budget. It's one trip to one conference, no big deal. Hiring personnel is another story.

Rob Lewis (May 13 2020 at 22:28):

If you know a sysadmin who will work for pennies, then maybe we could talk!

Rob Lewis (May 13 2020 at 22:28):

But tomorrow, it's bedtime.

Jalex Stark (May 13 2020 at 22:30):

(I have in fact pinged a few underemployed sysadmin folks, I'll come back here if I get any bites)

Jannis Limperg (May 13 2020 at 23:11):

Do we really need a professional sysadmin to set up a Github Actions runner? I just looked into it and it doesn't seem very complicated at all. If we rent one or two virtual servers (~50Eur/month for 16 virtual cores), I can probably set up the OS and Github runner application myself.

The only snag is that Github doesn't recommend using self-hosted runners with public repos. Apparently you can execute arbitrary code by creating a PR, which is not so great.

Mario Carneiro (May 13 2020 at 23:12):

it's like any maintenance job. You are not needed at all for several months, and then oh god everything is broken and it needs to be fixed now and no one knows how anything works

Jalex Stark (May 13 2020 at 23:17):

if you're volunteering to be the sysadmin that works for pennies, I'm sure people here appreciate the work

Jalex Stark (May 13 2020 at 23:17):

I don't understand half of the things you said, so I certainly couldn't do it

Jalex Stark (May 13 2020 at 23:18):

I don't know where on the spectrum of skill you draw the line of "professional sysadmin"

Jalex Stark (May 13 2020 at 23:18):

many (most?) of us here are "professional mathematicians" who choose to do a bunch of mathematical work for free

Jannis Limperg (May 14 2020 at 00:55):

Okay, if someone gets me some money to rent a suitable virtual server, I can set up a Github runner on it. I can also maintain it for the time being; I still have 3.5 years left on my PhD. I've run my own private virtual server for a couple of years (with a way more complex configuration) and it's really not much work for pretty decent uptime. Besides, should the server ever be down for long, Github should just fall back to its own runners.

Jannis Limperg (May 14 2020 at 01:14):

However, see my message above for security concerns. These apply to any third-party Github runner, regardless of whether that runner is a virtual server or a physical box. One solution would be to restrict PR creation to people who have commit-to-non-master-branch rights (if Github allows this). This would at least prevent complete strangers from hijacking the runner.

Scott Morrison (May 14 2020 at 01:14):

For reference, the machine running the build-time-bot stream is an 18 core 2.3GHz Xeon W. The build-time-bot runs with -j1 for reproducibility.

Scott Morrison (May 14 2020 at 01:19):

Compiling mathlib on this machine never maxes out the processors. The highest I see watching top is about 1500%. (Which sounds close to maxing out, but with hyperthreading we need to get to 3600% to count as maxing out?)

Scott Morrison (May 14 2020 at 01:19):

As far as I can see, nothing changed in how build-time-bot works during the gap that it wasn't running.

Scott Morrison (May 14 2020 at 01:21):

During that gap, the build times appear to have jumped from 86m21.607s to 234m12.034s, which seems pretty worrying.

Reid Barton (May 14 2020 at 01:21):

How long a gap was that?

Scott Morrison (May 14 2020 at 01:22):

January 23, commit 96ee2a6979b5bd20bee9de497d77fcd7f562ed4f

Scott Morrison (May 14 2020 at 01:22):

to

Reid Barton (May 14 2020 at 01:22):

also, is this data recorded somewhere other than in a zulip stream, possibly with graphs?

Scott Morrison (May 14 2020 at 01:22):

April 8, commit 732f7109c5cb2ece35481c200faa38fbbb4dc995

Scott Morrison (May 14 2020 at 01:22):

Sorry!

Scott Morrison (May 14 2020 at 01:23):

My script ends with zulip-send ...

Scott Morrison (May 14 2020 at 01:24):

(I stopped working on this shortly after getting the bare minimum working, because it seems that day-to-day variation was random enough to not be that helpful identifying bad commits.)

Reid Barton (May 14 2020 at 01:24):

I see

Scott Morrison (May 14 2020 at 01:26):

ok, I'm now keeping a local copy

Jannis Limperg (May 14 2020 at 01:28):

Scott Morrison said:

Compiling mathlib on this machine never maxes out the processors. The highest I see watching top is about 1500%. (Which sounds close to maxing out, but with hyperthreading we need to get to 3600% to count as maxing out?)

Afaiu, these numbers are pretty meaningless in the first place because you can 'max out' a system by busy-waiting on a lock or queue. For my 8- and 16-core load, I get perfect 8x and 16x system load, but not nearly the same speedup.

Bryan Gin-ge Chen (May 14 2020 at 01:28):

The log files are all here, but they're not terribly useful due to the bugs in Lean's profiling output. I was going to suggest pulling the numbers from the HTML of the Zulip archive of the build-time-bot thread, but we're not archiving the travis stream.

Scott Morrison (May 14 2020 at 01:30):

Interesting, @Jannis Limperg. I definitely don't get full load. For quite a while at the beginning of building mathlib it sits at about 600%, and then in the later half of building it slowly decays as well.

Scott Morrison (May 14 2020 at 01:30):

This had made me think that the numbers were pretty good, and were actually reflecting the width of the import graph.

Reid Barton (May 14 2020 at 01:32):

it's okay, I can write a zulip bot to consume the output of Scott's zulip bot

Reid Barton (May 14 2020 at 01:40):

Scott Morrison said:

Interesting, Jannis Limperg. I definitely don't get full load. For quite a while at the beginning of building mathlib it sits at about 600%, and then in the later half of building it slowly decays as well.

That seems broadly consistent with the numbers here.

Yury G. Kudryashov (May 14 2020 at 03:10):

@Scott Morrison Can you retrospectively bisect the jump? Was it a gradual increase, or some commit multiplied the build time by a factor of 2?

Scott Morrison (May 14 2020 at 03:11):

I was thinking about that... I only used git bisect a few times, and it scares me attempting to run it with a multi-hour test!

Scott Morrison (May 14 2020 at 03:11):

But I can try by hand.

Yury G. Kudryashov (May 14 2020 at 03:12):

If you don't want to run git bisect, you can just emulate weekly builds.

Yury G. Kudryashov (May 14 2020 at 03:12):

Or daily

Yury G. Kudryashov (May 14 2020 at 03:13):

Using git checkout master@{2020-03-30}

Yury G. Kudryashov (May 14 2020 at 03:24):

If you can add unix users to this server, you can add a user for me, and I'll run the script.

Yury G. Kudryashov (May 14 2020 at 03:26):

I don't have a good VPS (i.e., with reasonably good CPU&RAM) at hand right now because UoT doesn't issue unix accounts to post-docs.

Scott Morrison (May 14 2020 at 03:29):

Unfortunately that machine is not easily internet accessible via ssh.

Scott Morrison (May 14 2020 at 03:39):

Hmm, the master@{2020-03-30} trick doesn't work with a fresh clone, making it hard to use in my script.

Yury G. Kudryashov (May 14 2020 at 03:40):

What is the command line & the error message?

Scott Morrison (May 14 2020 at 03:42):

warning: Log for 'master' only goes back to Thu, 14 May 2020 13:38:49 +1000.

Yury G. Kudryashov (May 14 2020 at 03:43):

How do you git clone?

Yury G. Kudryashov (May 14 2020 at 03:43):

Probably you have something like --depth.

Scott Morrison (May 14 2020 at 03:44):

Just git clone https://github.com/leanprover-community/mathlib.git

Yury G. Kudryashov (May 14 2020 at 03:44):

Try git fetch --unshallow

Yury G. Kudryashov (May 14 2020 at 03:45):

Probably something in git config makes git clone create shallow clones by default.

Scott Morrison (May 14 2020 at 03:46):

$ git fetch --unshallow
fatal: --unshallow on a complete repository does not make sense

Bryan Gin-ge Chen (May 14 2020 at 03:46):

I don't think that's the issue. See https://stackoverflow.com/questions/34799426/git-log-for-master-only-goes-back-to-date-bug

Yury G. Kudryashov (May 14 2020 at 03:47):

Ah, I'm sorry.

Yury G. Kudryashov (May 14 2020 at 03:48):

Another answer suggests git checkout `git rev-list -1 --before="Jan 17 2014" master`

Yury G. Kudryashov (May 14 2020 at 03:49):

$ git checkout `git rev-list -1 --before="2020-01-30" master`

works for me

Scott Morrison (May 14 2020 at 03:52):

Ok, I think I've got something working. We should start seeing some messages on the build-time-bot stream about commits in that missing period.

Yury G. Kudryashov (May 14 2020 at 04:29):

I can't find this stream.

Bryan Gin-ge Chen (May 14 2020 at 04:31):

Does this link work?

Yury G. Kudryashov (May 14 2020 at 04:36):

Yes, I was looking for the stream name. It's a bit weird to have the stream travis holding valuable information when we don't use Travis anymore.

Johan Commelin (May 14 2020 at 04:45):

@Mario Carneiro Could you please rename the stream to "Continuous integration", or simply "CI"?

Yury G. Kudryashov (May 14 2020 at 04:46):

Will build-bot still work after rename?

Johan Commelin (May 14 2020 at 04:46):

Oooh, good question

Sebastien Gouezel (May 14 2020 at 07:54):

Scott Morrison said:

Ok, I think I've got something working. We should start seeing some messages on the build-time-bot stream about commits in that missing period.

It's maybe worth checking in particular the commits where we bump Lean's version, since these are the most likely to change something in the whole building process.

Kenny Lau (May 14 2020 at 10:16):

Kenny Lau said:

per declarations:

38.0 src/number_theory/sum_four_squares.lean
27.8 src/topology/metric_space/completion.lean
26.2 src/topology/category/Top/adjunctions.lean
26.15 src/algebra/category/Group/biproducts.lean
23.0 src/category_theory/products/basic.lean
22.666666666666668 src/data/padics/hensel.lean
22.2 src/category_theory/limits/shapes/constructions/limits_of_products_and_equalizers.lean
22.1 src/analysis/normed_space/hahn_banach.lean
16.7 src/topology/category/TopCommRing.lean
16.54 src/category_theory/elements.lean
15.7 src/measure_theory/simple_func_dense.lean
14.95 src/category_theory/limits/functor_category.lean
14.9 src/category_theory/monad/algebra.lean
14.75 src/data/zsqrtd/basic.lean
12.971428571428572 src/data/matrix/notation.lean
12.725 src/analysis/calculus/extend_deriv.lean
12.4 src/category_theory/limits/over.lean
12.363636363636363 src/geometry/manifold/basic_smooth_bundle.lean
11.95 src/analysis/complex/polynomial.lean
11.7 src/analysis/complex/basic.lean

(full data)

elaboration: tactic execution took 107s
num. allocated objects:  31
num. allocated closures: 39
107093ms   100.0%   tactic.step
107093ms   100.0%   scope_trace
107093ms   100.0%   _interaction
107093ms   100.0%   tactic.istep._lambda_1
107093ms   100.0%   tactic.istep
107093ms   100.0%   tactic.seq
107005ms    99.9%   tactic.to_expr
106830ms    99.8%   tactic.interactive.exact
106830ms    99.8%   _private.4130977991.all_goals_core._main._lambda_2
106830ms    99.8%   interaction_monad.monad._lambda_9
106830ms    99.8%   all_goals_core
106830ms    99.8%   tactic.all_goals
  263ms     0.2%   tactic.interactive.haveI
  263ms     0.2%   _interaction._lambda_2
  263ms     0.2%   tactic.interactive.have._lambda_1
   88ms     0.1%   tactic.exact
elaboration of prime_sum_four_squares took 130s

Kenny Lau (May 14 2020 at 10:25):

prime_sum_four_squares used ring 8 times...

Gabriel Ebner (May 14 2020 at 10:27):

According to your profiling output, no time is spent inside ring.

Kenny Lau (May 14 2020 at 10:28):

that's the result of lean --profile

Kenny Lau (May 14 2020 at 10:28):

I'm afraid that if I add even one byte to the file I'll get a deterministic timeout

Kenny Lau (May 14 2020 at 10:28):

the point is, I don't know which profile result corresponds to which one

Gabriel Ebner (May 14 2020 at 10:28):

I'm just saying, it looks like all of the time is spent on term mode elaboration, not on ring.

Kenny Lau (May 14 2020 at 10:29):

I'm saying, this can't be the corresponding profiling result, because there is no ring there

Gabriel Ebner (May 14 2020 at 10:33):

How to hide tactic runtime for beginners: use by exact (by ring) instead of by ring. Then it shows up as to_expr instead of ring.
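A minimal illustration of Gabriel's point (my own sketch; the particular goal is arbitrary):

```lean
import tactic

set_option profiler true

-- here the profiler attributes the elaboration time to `ring`
example (x y : ℤ) : (x + y)^2 = x^2 + 2*x*y + y^2 := by ring

-- same proof, but the time shows up under `tactic.to_expr`, because the
-- outer `exact` elaborates a term that merely contains a nested `by ring`
example (x y : ℤ) : (x + y)^2 = x^2 + 2*x*y + y^2 := by exact (by ring)
```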

Kenny Lau (May 14 2020 at 10:34):

ok so it might still be because of the ring

Kenny Lau (May 14 2020 at 10:34):

this line:

by haveI hm0 : _root_.fact (0 < m) := (nat.find_spec hm).snd.1; exact

hid everything

Johan Commelin (May 14 2020 at 10:42):

You guys should stop writing such obfuscated term mode proofs... why not simply write begin haveI ... end?

Kevin Buzzard (May 14 2020 at 12:22):

Then your tools will work better :-)

Ryan Lahfa (May 14 2020 at 13:37):

Rob Lewis said:

If you know a sysadmin who will work for pennies, then maybe we could talk!

Well, here I am :-)

Ryan Lahfa (May 14 2020 at 13:40):

Jalex Stark said:

I don't know where on the spectrum of skill you draw the line of "professional sysadmin"

mathlib's Lean project is thankfully not a usual software-company project. Also, we could benefit from help from other communities that do "reproducibility stuff", like Nix/NixOS and, by extension, the NLnet/NGI Zero foundations: https://www.ngi.eu/ngi-projects/ngi-zero/

Jalex Stark (May 14 2020 at 13:41):

cool! Basically all of our communities' software problems should already be solved in other contexts; learning from such solutions is great

Scott Morrison (May 14 2020 at 14:06):

[Quoting…]
That's a good idea. Maybe tomorrow I will look up these commits, but if someone wants to provide me a list that would be great.

Johan Commelin (May 14 2020 at 14:13):

@Scott Morrison Voila:

git log --oneline -- leanpkg.toml
81f97bdf chore(*): move to lean-3.11.0 (#2632)
a223bbb6 chore(*): switch to lean 3.10.0 (#2587)
99245b33 chore(*): switch to lean 3.9.0 (#2449)
597704ac chore(*): switch to lean 3.8.0 (#2361)
bc84a205 chore(leanpkg.toml): Lean 3.7.2c (#2203)
e719f8ee chore(*): switch to lean 3.7.1c (#2106)
78ffbae0 chore(*): switch to lean 3.6.1 (#2064)
6845aaa6 chore(*): bump Lean version to 3.5.1c (#1958)
c1e594bc feat(meta, logic, tactic): lean 3.4.2: migrate coinductive_predicates, transfer, relator (#610)
78f19497 refactor(*): move everything into `src` (#583)
bcec475a chore(leanpkg.toml): update version to 3.4.1
78d28c5c fix(*): update to lean
03d5bd97 fix(*): update to lean
22e671c5 fix(travis.yml): fix travis setup for new nightlies
81264ec7 fix(leanpkg.toml): remove lean_version
c87f1e6e fix(*): finish lean update
5717986f fix(*): update to lean
4320c417 chore(*): rename stdlib to mathlib
deb16814 refactor(*): import content from lean/library/data and library_dev

Johan Commelin (May 15 2020 at 08:01):

May I please draw your attention to:
https://github.com/leanprover-community/mathlib/pull/2685#issuecomment-629088455
cc: @Kenny Lau @Jannis Limperg @Ryan Lahfa @Mario Carneiro @Gabriel Ebner @Chris Hughes

Johan Commelin (May 15 2020 at 08:02):

Kudos to Mario! :tada: :top_hat: :octopus: :cake:

Kenny Lau (May 15 2020 at 08:03):

I don't understand what your link has to do with this topic and what Mario has to do with your link

Johan Commelin (May 15 2020 at 08:06):

@Kenny Lau Oops, fixed. Sorry.

Kenny Lau (May 15 2020 at 08:08):

indeed there are ways to solve this issue without removing by tidy from the category theory library

Kenny Lau (May 15 2020 at 08:08):

my solutions only dealt with the surface of the problem

Reid Barton (May 15 2020 at 10:05):

Scott Morrison said:

During that gap, the build times appear to have jumped from 86m21.607s to 234m12.034s, which seems pretty worrying.

Did we ever confirm whether the build time really tripled over those 3 months?

Reid Barton (May 15 2020 at 10:05):

I can try building on a decent machine too.

Johan Commelin (May 15 2020 at 10:05):

Maybe you can build both commits 3 times?

Reid Barton (May 15 2020 at 10:06):

Well I was going to start with once, but okay

Johan Commelin (May 15 2020 at 10:06):

With -j6 if you have 8 cores

Reid Barton (May 15 2020 at 10:06):

Why 6?

Johan Commelin (May 15 2020 at 10:06):

So that your other processes move to the other threads, and hopefully don't influence the timing too much?

Reid Barton (May 15 2020 at 10:06):

I just wanted to figure out first whether the change was real or caused by some other undiagnosed change

Reid Barton (May 15 2020 at 10:06):

Oh, well, I was going to do it on a cloud VM

Reid Barton (May 15 2020 at 10:07):

But it should be reliable enough to test for a 3x difference

Scott Morrison (May 15 2020 at 11:12):

It seems the build times did triple over those three months.

Scott Morrison (May 15 2020 at 11:12):

I've been building a random sample of intermediate commits, and so far it looks like a fairly steady increase. I haven't pinned down any big jumps.

Chris Hughes (May 15 2020 at 12:53):

If I type import analysis.complex.polynomial and wait for the yellow bars to disappear, then it takes 3 seconds on lean 3.4.2, and 22 seconds on lean 3.11.0 with a full set of oleans. Lean seems to have got slower; I doubt it's mathlib getting slower that fast.

Johan Commelin (May 15 2020 at 12:56):

How much did the file change?

Chris Hughes (May 15 2020 at 12:56):

Hard to tell. The file has a lot of dependencies, which may have changed.

Sebastien Gouezel (May 15 2020 at 12:57):

Before, Lean used to check the file dates to see if it needed to recompile something. Now, it has to compute the md5 of all the files. Could that make up for the difference?

Chris Hughes (May 15 2020 at 12:58):

What Lean version was that change?

Ryan Lahfa (May 15 2020 at 12:58):

Sebastien Gouezel said:

Before, Lean used to check the file dates to see if it needed to recompile something. Now, it has to compute the md5 of all the files. Could that make up for the difference?

Then this kind of difference should disappear on very fast disks I suppose (SSD/NVMe), right?

Reid Barton (May 15 2020 at 12:59):

Here are my timings, btw

$ cat jan23/build.log apr8/build.log
configuring mathlib 0.1
> lean --make src
25449.26user 63.32system 24:14.20elapsed 1754%CPU (0avgtext+0avgdata 8817328maxresident)k
3240inputs+140592outputs (8major+8338993minor)pagefaults 0swaps
configuring mathlib 0.1
> lean --make src
33719.78user 55.97system 43:36.39elapsed 1290%CPU (0avgtext+0avgdata 10324836maxresident)k
0inputs+160088outputs (0major+3256318minor)pagefaults 0swaps

Chris Hughes (May 15 2020 at 12:59):

lean-3.5.1 is fifteen seconds, so the jump seems to have been from 3.4 to 3.5

Reid Barton (May 15 2020 at 13:00):

This is on a VM that claims to have 32 "processors"

Reid Barton (May 15 2020 at 13:01):

and running without any -j flag. The total work did not increase that much: certainly not by 3x...

Ryan Lahfa (May 15 2020 at 13:01):

Reid Barton said:

This is on a VM that claims to have 32 "processors"

cat /proc/cpuinfo ?

Reid Barton (May 15 2020 at 13:01):

Yes, (do you really want the whole thing?)

Reid Barton (May 15 2020 at 13:01):

It's 32 approximate copies of

processor       : 31
vendor_id       : AuthenticAMD
cpu family      : 23
model           : 1
model name      : AMD EPYC 7401P 24-Core Processor
stepping        : 2
microcode       : 0x1000065
cpu MHz         : 1996.248
cache size      : 512 KB
physical id     : 31
siblings        : 1
core id         : 0
cpu cores       : 1
apicid          : 31
initial apicid  : 31
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm rep_good nopl cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext perfctr_core ssbd ibpb vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 virt_ssbd arat npt nrip_save arch_capabilities
bugs            : fxsave_leak sysret_ss_attrs null_seg spectre_v1 spectre_v2 spec_store_bypass
bogomips        : 3992.49
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

Ryan Lahfa (May 15 2020 at 13:01):

Just the non-repetitive part

Reid Barton (May 15 2020 at 13:02):

but, obviously an "AMD EPYC 7401P 24-Core Processor" has more than one core per cpu, so some of the numbers must be lies

Ryan Lahfa (May 15 2020 at 13:02):

And this is a virtualized machine?

Reid Barton (May 15 2020 at 13:02):

Yep

Ryan Lahfa (May 15 2020 at 13:03):

So, how probable is it that you actually get some CPU shares stolen from noisy neighbors?

Reid Barton (May 15 2020 at 13:03):

I could try building the same commits again to estimate the variance.

Reid Barton (May 15 2020 at 13:03):

No idea

Ryan Lahfa (May 15 2020 at 13:04):

Also, kind of surprising to have 32 × 24 exposed in a VM :'D, I'm not sure the OS is able to use them properly, except if it's well configured (NUMA stuff, etc. I suppose)

Reid Barton (May 15 2020 at 13:08):

No, it's not 32 cpus.

Ryan Lahfa (May 15 2020 at 13:08):

It's 32 cores?

Ryan Lahfa (May 15 2020 at 13:08):

Weird

Reid Barton (May 15 2020 at 13:08):

I assume I get some subset of 32 cores from some physical machine that has a total of 24*N cores for some N.

Johan Commelin (May 15 2020 at 13:14):

@Reid Barton I have no idea how much a compile costs... but it might be interesting to rebuild those two commits, to get an idea of the variance.

Johan Commelin (May 15 2020 at 13:14):

Are we talking about $0.05 or $5.00?

Reid Barton (May 15 2020 at 13:16):

Around $0.20 I guess, I don't mind doing more builds.

Reid Barton (May 15 2020 at 13:16):

If I had that 24×36-core server, though... :upside_down:

Ryan Lahfa (May 15 2020 at 13:17):

https://www.packet.com/cloud/servers/c2-medium-epyc/
The EPYC is there, with spot pricing at $0.50 per hour, so… it depends on mathlib's build times!

Johan Commelin (May 15 2020 at 13:17):

I'm surprised... it seems that the Apr8 build takes about as long in the cloud as on my ancient box

Johan Commelin (May 15 2020 at 13:17):

But I don't know anything about hardware

Andrew Ashworth (May 15 2020 at 13:17):

if you feel ambitious: http://stanford.edu/~sadjad/gg-paper.pdf

Johan Commelin (May 15 2020 at 13:17):

How does

processor       : 15
vendor_id       : GenuineIntel
cpu family      : 6
model           : 26
model name      : Intel(R) Xeon(R) CPU           E5540  @ 2.53GHz
stepping        : 5
microcode       : 0x1d
cpu MHz         : 2043.367
cache size      : 8192 KB
physical id     : 1
siblings        : 8
core id         : 3
cpu cores       : 4
apicid          : 23
initial apicid  : 23
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid dtherm ida flush_l1d
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit
bogomips        : 5066.73
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

look?

Johan Commelin (May 15 2020 at 13:17):

Is that reasonable stuff, or as ancient as I think it is?

Ryan Lahfa (May 15 2020 at 13:19):

Andrew Ashworth said:

if you feel ambitious: http://stanford.edu/~sadjad/gg-paper.pdf

Wow, I didn't know it was actually feasible to get really cheap stuff on AWS Lambda; my customers get insane bills for stupid workloads, so I advise them to move back to 8/16-core servers

Ryan Lahfa (May 15 2020 at 13:20):

Johan Commelin said:

Is that reasonable stuff, or as ancient as I think it is?

It's 2009 stuff I believe, it's not that bad, it's just old compared to the EPYC 7401P, which is from 2017

Johan Commelin (May 15 2020 at 13:21):

But I still build mathlib in 50 minutes: w00t!

Ryan Lahfa (May 15 2020 at 13:21):

But it could be that we're not making good use of the newest hardware, too :-)

Ryan Lahfa (May 15 2020 at 13:21):

Or that we won't benefit from newer hardware, which could also be a conclusion

Ryan Lahfa (May 15 2020 at 13:21):

Johan Commelin said:

But I still build mathlib in 50 minutes: w00t!

Used server hardware is awesome :-)

Reid Barton (May 15 2020 at 13:26):

Andrew Ashworth said:

if you feel ambitious: http://stanford.edu/~sadjad/gg-paper.pdf

Wow, this looks super neat (and knowing Keith Winstein, I assume it really is super neat) and probably fairly well-suited to theorem proving, too.

Reid Barton (May 15 2020 at 13:34):

here's the latest master, for comparison

$ /usr/bin/time leanpkg build 2>&1 | tee build.log
configuring mathlib 0.1
> lean --make src
41010.11user 69.07system 42:17.51elapsed 1618%CPU (0avgtext+0avgdata 12982540maxresident)k
29432inputs+169064outputs (0major+4857077minor)pagefaults 0swaps

Reid Barton (May 15 2020 at 13:35):

I'll start another round of builds now

Andrew Ashworth (May 15 2020 at 13:41):

Yeah, implementing that paper for another project I contribute to has been on my TODO list for quite a while. It takes ~5 hours to build and run all the integration tests, and I'm in love with the idea that maybe I could run it in 5 minutes by paying a few dollars. Of course, I never got around to it, because adding a new build system to a project big enough to take 5 hours to build and test... one does not simply walk into Mordor.

Johan Commelin (May 15 2020 at 13:50):

On a positive note... the witt vector file is now really responsive. And this used to crash and timeout like crazy with lean-3.old

Jannis Limperg (May 15 2020 at 14:01):

@Reid Barton Could you run my benchmark script with a few thread counts (e.g. 1 2 4 8 16 32)? That would give us some more comparable data for this particular (virtualised) processor.

Reid Barton (May 15 2020 at 14:02):

Hmm, it looks like the 1-core build would take 10+ hours and I usually don't leave this machine on overnight.

Jannis Limperg (May 15 2020 at 14:13):

Fair enough. Any data you can get is good data. I'm particularly interested in 4/8/12/16/20 cores. My own experiments seem to indicate that anything above 8 is a waste of time; it would be very helpful if you could test this hypothesis.

Reid Barton (May 15 2020 at 14:17):

Yes, that sounds like a good thing to test.

Gabriel Ebner (May 15 2020 at 15:38):

Regarding the tidy hate on this thread: I just looked at Scott's latest profiling results. The total mathlib build time is 322 minutes. Tidy only takes 14 minutes out of that (4.4%).

Reid Barton (May 15 2020 at 15:43):

Here are the results from both sets of my runs. The numbers appear pretty stable, actually.

jan23/build.log:configuring mathlib 0.1
jan23/build.log:> lean --make src
jan23/build.log:25449.26user 63.32system 24:14.20elapsed 1754%CPU (0avgtext+0avgdata 8817328maxresident)k
jan23/build.log:3240inputs+140592outputs (8major+8338993minor)pagefaults 0swaps
jan23/build2.log:configuring mathlib 0.1
jan23/build2.log:> lean --make src
jan23/build2.log:25641.07user 55.08system 24:26.19elapsed 1752%CPU (0avgtext+0avgdata 8397940maxresident)k
jan23/build2.log:0inputs+140592outputs (0major+7927068minor)pagefaults 0swaps
apr8/build.log:configuring mathlib 0.1
apr8/build.log:> lean --make src
apr8/build.log:33719.78user 55.97system 43:36.39elapsed 1290%CPU (0avgtext+0avgdata 10324836maxresident)k
apr8/build.log:0inputs+160088outputs (0major+3256318minor)pagefaults 0swaps
apr8/build2.log:configuring mathlib 0.1
apr8/build2.log:> lean --make src
apr8/build2.log:33780.71user 51.32system 43:47.52elapsed 1287%CPU (0avgtext+0avgdata 10402992maxresident)k
apr8/build2.log:0inputs+160064outputs (0major+3603219minor)pagefaults 0swaps
master/build.log:configuring mathlib 0.1
master/build.log:> lean --make src
master/build.log:41010.11user 69.07system 42:17.51elapsed 1618%CPU (0avgtext+0avgdata 12982540maxresident)k
master/build.log:29432inputs+169064outputs (0major+4857077minor)pagefaults 0swaps
master/build2.log:configuring mathlib 0.1
master/build2.log:> lean --make src
master/build2.log:40334.56user 62.32system 42:33.54elapsed 1581%CPU (0avgtext+0avgdata 13151128maxresident)k
master/build2.log:0inputs+169032outputs (0major+4809347minor)pagefaults 0swaps

Reid Barton (May 15 2020 at 16:00):

(All the builds preceded all the build2s, so probably there wasn't some adversarial tenant on the same hardware who did work during just the apr8 builds.)

Reid Barton (May 15 2020 at 16:05):

Chris Hughes said:

If I type import analysis.complex.polynomial and wait for the yellow bars to disappear, then it takes 3 seconds on lean 3.4.2, and 22 seconds on lean 3.11.0 with a full set of oleans. Lean seems to have got slower; I doubt it's mathlib getting slower that fast.

I think I can reproduce this (I'll let you know when 3.11.0 finishes importing the file)

Reid Barton (May 15 2020 at 16:06):

okay, the difference is not that drastic for me over the versions I have, but still substantial: ~4.5s / ~8s / ~14.5s for the three versions above

Reid Barton (May 15 2020 at 16:10):

I tried using lean --export on a file with just this import as a proxy for the size of the part of mathlib reachable from that module, and it increased by less than 20% over that range.

Reid Barton (May 15 2020 at 16:12):

What work does Lean have to do when importing a module? Also, is that work saved when importing the same module twice during the same lean --make session?

Reid Barton (May 15 2020 at 16:12):

Mostly typechecking the contents of the olean files, right? I assume that deserializing them and updating the environment is fast?

Rob Lewis (May 15 2020 at 16:15):

Is it possible more dependencies were added to analysis.complex.polynomial between these versions? A rough proxy for this is the number of declarations in the environment. run_cmd do e ← tactic.get_env, tactic.trace $ e.fold 0 $ λ _, nat.succ

Reid Barton (May 15 2020 at 16:18):

Reid Barton said:

Also, is that work saved when importing the same module twice during the same lean --make session?

Apparently not! My setup:
tmp.lean:

import analysis.complex.polynomial

tmp2.lean:

import .tmp

lean --make tmp.lean takes ~15s. Subsequent lean --make tmp2.lean takes ~15s. Removing oleans and a direct lean --make tmp2.lean takes ~30s!

Reid Barton (May 15 2020 at 16:21):

In Lean 3.4.2/mathlib jan23 each individual file takes ~4.75s while together they take ~6.75s; not sure what to make of that.

Reid Barton (May 15 2020 at 16:22):

Lean 3.7.2/mathlib apr8 is similar: each file takes ~8s with both taking ~12s

Reid Barton (May 15 2020 at 16:26):

Rob Lewis said:

run_cmd do e ← tactic.get_env, tactic.trace $ e.fold 0 $ λ _, nat.succ

49084 / 52933 / 55112. This is also similar to what I was trying to measure with lean --export (though that one also takes into account the sizes of expressions).

Johan Commelin (May 15 2020 at 16:28):

Reid Barton said:

lean --make tmp.lean takes ~15s. Subsequent lean --make tmp2.lean takes ~15s. Removing oleans and a direct lean --make tmp2.lean takes ~30s!

This sounds like something that can be fixed :grinning_face_with_smiling_eyes: And it might save us some seconds (-;

Chris Hughes (May 15 2020 at 16:28):

Rob Lewis said:

Is it possible more dependencies were added to analysis.complex.polynomial between these versions? A rough proxy for this is the number of declarations in the environment. run_cmd do e ← tactic.get_env, tactic.trace $ e.fold 0 $ λ _, nat.succ

I did the same test for computability.halting and got 1 second for 3.4.2, 5 seconds for 3.5.1 and 6 seconds for 3.11.0. I don't think this part of the library has changed that much. It doesn't import much apart from the rest of the computability code.

Reid Barton (May 15 2020 at 16:37):

Based on the perf report it really seems to be building the environment that takes most of the time

Reid Barton (May 15 2020 at 16:52):

Is there no convenient way to do some basic accounting of calls to malloc?

Gabriel Ebner (May 15 2020 at 17:04):

Reid Barton said:

What work does Lean have to do when importing a module? Also, is that work saved when importing the same module twice during the same lean --make session?

Two main things: deserializing the olean files into in-memory data structures (this results in a list of "modifications"), and then applying these modifications in order to produce the environment. No type checking happens here. But just all the inserting takes quite a bit of time.

Reid Barton (May 15 2020 at 17:11):

Gabriel Ebner said:

But just all the inserting takes quite a bit of time.

Yes, I'm seeing that. Do you have a better idea for understanding what changed than modifying these three different versions of Lean to add some profiling output?

Gabriel Ebner (May 15 2020 at 17:15):

Is there an easy way to figure out the dependencies? It would be interesting to see if the olean file size changed.

Reid Barton (May 15 2020 at 17:16):

The line count of lean --export went up by about 20%

Reid Barton (May 15 2020 at 17:17):

which as I understand it is essentially the total size of the environment (okay, not including some stuff like attributes maybe?)

Gabriel Ebner (May 15 2020 at 17:18):

Did I read your numbers correctly? This is the time taken by a file containing just import analysis.complex.polynomial:

| 3.4.2  | 3.7.2 | 3.11.0 |
|--------|-------|--------|
| Jan 23 | Apr 8 | Today  |
| 4.75s  | 8s    | 15s    |

Reid Barton (May 15 2020 at 17:19):

| 3.4.2 | 3.7.2 | 3.11.0 |

Reid Barton (May 15 2020 at 17:22):

Ooh pretty

Gabriel Ebner (May 15 2020 at 17:22):

And it took me just 5 minutes to get the markdown right...

Gabriel Ebner (May 15 2020 at 17:23):

not including some stuff like attributes maybe

Attributes are a possible explanation, that's why I'm curious.

Reid Barton (May 15 2020 at 17:30):

TIL about openat

Reid Barton (May 15 2020 at 17:32):

$ tail -n 1 {jan23,apr8,master}/wc.log
==> jan23/wc.log <==
  156411   602930 43606276 total

==> apr8/wc.log <==
  196482   723773 48700194 total

==> master/wc.log <==
  191842   780908 48442515 total

produced by strace -o strace.log -e trace=openat ~/.elan/toolchains/leanprover-community-lean-3.11.0/bin/lean tmp.lean (or open for previous versions) followed by
wc $(grep ^open strace.log | grep -v ENOENT | cut -d '"' -f 2 | grep '.olean$' | sort | uniq) | tee wc.log

Gabriel Ebner (May 15 2020 at 17:35):

The part between 3.7.2 and 3.11.0 is worrying. That's just one month. I see just one thing that could have an effect on import times: namely string literals in the VM (3.9.0). This should actually reduce olean size and improve import speed, but who knows.

Reid Barton (May 15 2020 at 17:36):

I guess I could try bisecting but I'm worried I might just get some numbers between 8 and 15

Gabriel Ebner (May 15 2020 at 17:36):

So the olean size stayed the same between 3.7.2 and 3.11.0, but importing now takes twice as long?

Reid Barton (May 15 2020 at 17:36):

So it seems!

Reid Barton (May 15 2020 at 17:37):

Do you want to see some perf report output?

Reid Barton (May 15 2020 at 17:37):

It's probably not that illuminating beyond what you already know...

Gabriel Ebner (May 15 2020 at 17:37):

Please.

Gabriel Ebner (May 15 2020 at 17:38):

I want to pass it to nvim -d.

Reid Barton (May 15 2020 at 17:40):

This is super low tech, but https://gist.github.com/rwbarton/bebc2b4c48f03feda09664167fc48b0d

Reid Barton (May 15 2020 at 17:41):

Hmm, the appearance of memcpy is intriguing

Gabriel Ebner (May 15 2020 at 17:46):

module_ext::~module_ext is also new. Did we get new docstrings in the last month?

Jannis Limperg (May 15 2020 at 17:49):

I added the module init.meta.case_tag with a docstring.

Reid Barton (May 15 2020 at 17:49):

Surely there are some, e.g., https://github.com/leanprover-community/mathlib/commit/9f33b7dd6d4beee9f89c815b21e31cafacd3b8c3#diff-7060ca01fd0d49ccfb27ecb48f8fe0f6L9

Reid Barton (May 15 2020 at 17:49):

Are docstrings slow??

Johan Commelin (May 15 2020 at 17:50):

That's one thing that has vastly increased in the last couple of months...

Gabriel Ebner (May 15 2020 at 17:55):

Reid Barton said:

Are docstrings slow??

A docstring is about as slow as a declaration.

Gabriel Ebner (May 15 2020 at 17:57):

But module doc strings might be much slower. Did we get new module doc strings?

Reid Barton (May 15 2020 at 17:57):

Yes, I linked one above for example

Reid Barton (May 15 2020 at 17:57):

I don't know whether we got many of them though

Gabriel Ebner (May 15 2020 at 17:58):

I have a theory.

Reid Barton (May 15 2020 at 17:59):

Also, I see about a 1.5x difference between apr8 and master with Chris's second suggestion computability.halting, which relies on a lot less of mathlib, so it would be a lot easier to bisect with that one

Gabriel Ebner (May 15 2020 at 18:00):

Do you see what is wrong with the following definition:

struct module_ext : public environment_extension {
    std::vector<module_name> m_direct_imports;
    list<std::shared_ptr<modification const>> m_modifications;
    list<name>        m_module_univs;
    list<name>        m_module_decls;
    name_set          m_module_defs;
    name_set          m_imported;
    std::unordered_map<std::string, std::vector<std::pair<pos_info, std::string>>> m_module_docs;
    name_map<std::string>     m_decl2olean;
    name_map<pos_info>        m_decl2pos_info;
};

Reid Barton (May 15 2020 at 18:01):

It's in C++?

Gabriel Ebner (May 15 2020 at 18:02):

What happens when you copy this structure?

Reid Barton (May 15 2020 at 18:02):

No idea (see above)

Mario Carneiro (May 15 2020 at 18:02):

modifications are shared?

Gabriel Ebner (May 15 2020 at 18:02):

The field m_module_docs is copied as well, including all module docs.

Gabriel Ebner (May 15 2020 at 18:02):

@Mario Carneiro No that's good. That's what we should do with the module docs as well.

Mario Carneiro (May 15 2020 at 18:02):

when does this get copied?

Mario Carneiro (May 15 2020 at 18:03):

what is the string key of the module doc map?

Mario Carneiro (May 15 2020 at 18:03):

why isn't it a name key?

Gabriel Ebner (May 15 2020 at 18:04):

The key is the module file name, I removed the comment that explains this. Lean doesn't know about data.nat when it imports something, it only knows the filename.

Gabriel Ebner (May 15 2020 at 18:05):

It gets copied every time you insert a module doc string, or update any of the other fields. Every declaration updates the m_decl2pos_info field...

Mario Carneiro (May 15 2020 at 18:06):

is that because this is the environment object that is functionally updated?

Gabriel Ebner (May 15 2020 at 18:06):

:+1:

Mario Carneiro (May 15 2020 at 18:06):

I thought there would be a lot more sharing than this

Mario Carneiro (May 15 2020 at 18:06):

this looks more appropriate for a "baked" environment object, once the file is complete

Mario Carneiro (May 15 2020 at 18:07):

but maybe there isn't any such concept in the C++

Reid Barton (May 15 2020 at 18:07):

vector is a reference, not sure about any of the others

Reid Barton (May 15 2020 at 18:07):

wait is it? I don't even know

Mario Carneiro (May 15 2020 at 18:07):

I think it's just owned data

Mario Carneiro (May 15 2020 at 18:08):

I don't know if the name_sets can be functionally shared

Gabriel Ebner (May 15 2020 at 18:11):

std::vector and std::unordered_map are owned. The name_set is fine.

Mario Carneiro (May 15 2020 at 18:14):

How about

persistent_map<std::string, list<std::pair<pos_info, std::string>>> m_module_docs;

then? Not sure what the C++-ism for persistent_map is

Mario Carneiro (May 15 2020 at 18:15):

It would be better if the map contained vector for all files other than the current one and a list for the current file

Gabriel Ebner (May 15 2020 at 18:15):

We typically use rb_map. I'll try to code something up in the next few minutes.

Reid Barton (May 15 2020 at 18:20):

Is there a specific Lean version bump that it would be useful to benchmark both sides of?

Gabriel Ebner (May 15 2020 at 18:25):

Give me a few seconds, then you can benchmark a PR.

Gabriel Ebner (May 15 2020 at 18:31):

@Reid Barton lean#241

Reid Barton (May 15 2020 at 18:31):

It's on 3.12, will mathlib already build with that? I ignored that discussion

Reid Barton (May 15 2020 at 18:33):

alternatively, should it apply to 3.11?

Gabriel Ebner (May 15 2020 at 18:33):

#2681

Gabriel Ebner (May 15 2020 at 18:33):

It should also apply to 3.11

Reid Barton (May 15 2020 at 18:33):

maybe for benchmarking purposes it is better for me to build it on 3.11 then

Reid Barton (May 15 2020 at 18:40):

Apparently I'm building with some version of g++ that produces a heck of a lot of warnings on the Lean code base

Reid Barton (May 15 2020 at 19:05):

oh, I guess I didn't have to go and build all of mathlib, but that's what I'm doing apparently.

Reid Barton (May 15 2020 at 19:23):

39417.19user 56.41system 35:53.47elapsed 1833%CPU (0avgtext+0avgdata 12875252maxresident)k
8inputs+169048outputs (0major+3965789minor)pagefaults 0swaps

Reid Barton (May 15 2020 at 19:24):

down from 42:17.51elapsed ... :tada:

Johan Commelin (May 15 2020 at 19:24):

That's a 1/6 improvement!

Reid Barton (May 15 2020 at 19:25):

import analysis.complex.polynomial is down to 6 seconds, which is more or less in line with mathlib growth

Johan Commelin (May 15 2020 at 19:27):

Does this also include the simp improvements? Or is it only the module-doc fix?

Gabriel Ebner (May 15 2020 at 19:27):

Since you've already compiled mathlib, could you check something for me? Does module_doc_strings still return the doc strings in the same order. (Go to some file with multiple module doc strings like data.set.intervals.basic and call the function at the end of the file.)

Reid Barton (May 15 2020 at 19:28):

that is just lean#241 cherry-picked to 3.11.0

Gabriel Ebner (May 15 2020 at 19:28):

So that's maybe another 25%? 3.13 is going to be a nice release.

Reid Barton (May 15 2020 at 19:29):

Gabriel Ebner said:

Since you've already compiled mathlib, could you check something for me? Does module_doc_strings still return the doc strings in the same order. (Go to some file with multiple module doc strings like data.set.intervals.basic and call the function at the end of the file.)

How do I call it?

Gabriel Ebner (May 15 2020 at 19:29):

#eval tactic.module_doc_strings >>= tactic.trace

Reid Barton (May 15 2020 at 19:31):

Yes, same output

Gabriel Ebner (May 15 2020 at 19:32):

Great!

Reid Barton (May 15 2020 at 20:49):

I guess the thing to do now is to try to build lean#241 with #2681

Reid Barton (May 15 2020 at 20:49):

or vice versa I suppose

Reid Barton (May 15 2020 at 21:03):

oh, the other speedups are in 3.13 but the recent pr is only for 3.12, I see

Ryan Lahfa (May 15 2020 at 21:04):

Wow, 3.13 is really going to be very nice

Reid Barton (May 15 2020 at 22:04):

Yeah, no major change in speed for the 3.12 PR.

Scott Morrison (May 16 2020 at 02:19):

Well done, @Gabriel Ebner and @Reid Barton! It's fantastic we've caught and fixed this.

Scott Morrison (May 16 2020 at 02:20):

I didn't follow the analysis of what had gone wrong in the C++, but it sounds like we were doing something accidentally quadratic (in the number of docstrings)?

Johan Commelin (May 16 2020 at 04:28):

When I build current lean master I get the following warning:

                 from /path/lean/src/util/sexpr/option_declarations.h:11,
                 from /path/lean/src/library/equations_compiler/elim_match.cpp:9:
/path/lean/src/util/name.h: In member function ‘lean::list<lean::elim_match_fn::lemma> lean::elim_match_fn::process(const lean::elim_match_fn::problem&)’:
/path/lean/src/util/name.h:87:40: warning: ‘*((void*)(& fn_name)+8).lean::name::m_ptr’ may be used uninitialized in this function [-Wmaybe-uninitialized]
   87 |     ~name() { if (m_ptr) m_ptr->dec_ref(); }
      |                          ~~~~~~~~~~~~~~^~
/path/lean/src/library/equations_compiler/elim_match.cpp:520:24: note: ‘*((void*)(& fn_name)+8).lean::name::m_ptr’ was declared here
  520 |         optional<name> fn_name;
      |                        ^~~~~~~

Is this anything I should worry about?

Ryan Lahfa (May 16 2020 at 05:12):

Johan Commelin said:

When I build current lean master I get the following warning:

                 from /path/lean/src/util/sexpr/option_declarations.h:11,
                 from /path/lean/src/library/equations_compiler/elim_match.cpp:9:
/path/lean/src/util/name.h: In member function ‘lean::list<lean::elim_match_fn::lemma> lean::elim_match_fn::process(const lean::elim_match_fn::problem&)’:
/path/lean/src/util/name.h:87:40: warning: ‘*((void*)(& fn_name)+8).lean::name::m_ptr’ may be used uninitialized in this function [-Wmaybe-uninitialized]
   87 |     ~name() { if (m_ptr) m_ptr->dec_ref(); }
      |                          ~~~~~~~~~~~~~~^~
/path/lean/src/library/equations_compiler/elim_match.cpp:520:24: note: ‘*((void*)(& fn_name)+8).lean::name::m_ptr’ was declared here
  520 |         optional<name> fn_name;
      |                        ^~~~~~~

Is this anything I should worry about?

If m_ptr has a garbage value, you might get an invalid pointer dereference at destruction time, which could segfault during the lifetime of the program; but maybe the compiler is drunk, I didn't read the code

Rob Lewis (May 17 2020 at 18:36):

So, uh, I just built master and #2697 on my laptop. Very non-scientific comparison: 50% realtime speedup. That's so big I almost don't believe this was a clean test.

Bryan Gin-ge Chen (May 17 2020 at 18:40):

Hooray! Thanks @Gabriel Ebner and @Johan Commelin!

Rob Lewis (May 17 2020 at 18:58):

I guess it was clean: here's a 56 min Actions build! https://github.com/leanprover-community/mathlib/runs/683136036

Bryan Gin-ge Chen (May 17 2020 at 19:00):

Now the linting takes almost as long as the build...

Sebastien Gouezel (May 17 2020 at 19:24):

And these 56 minutes are with 3000 lines more added by #2697, i.e., all the algebra that was originally in core!


Last updated: Dec 20 2023 at 11:08 UTC