Zulip Chat Archive

Stream: mathlib4

Topic: profiling `ring`


Scott Morrison (Aug 18 2021 at 09:27):

With Mario's fix, I can profile ring in mathlib3 vs mathlib4. Unfortunately there's some work to do before we catch up.

It's currently both slow, and getting slower faster, relative to mathlib3's ring.

import Mathlib.Tactic.Ring
import Mathlib.Util.Time

set_option maxHeartbeats 1000000

-- Takes about 11s vs 13s with mathlib3
#time lemma ring_timing_5 (x1 x2 x3 x4 x5 x6 x7 x8 : ) :
  (x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8)^5 = (x8 + x7 + x6 + x5 + x4 + x3 + x2 + x1)^5 :=
by ring

-- Takes about 34s vs 22s with mathlib3
#time lemma ring_timing_6 (x1 x2 x3 x4 x5 x6 x7 x8 : ) :
  (x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8)^6 = (x8 + x7 + x6 + x5 + x4 + x3 + x2 + x1)^6 :=
by ring

-- Takes about 84s vs 37s with mathlib3
#time lemma ring_timing_7 (x1 x2 x3 x4 x5 x6 x7 x8 : ) :
  (x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8)^7 = (x8 + x7 + x6 + x5 + x4 + x3 + x2 + x1)^7 :=
by ring

For comparison, on the same machine Coq manages the ^15 version in about 5s.

Scott Morrison (Aug 18 2021 at 09:29):

Even without thinking about optimising ring in Mathlib4, I would love to see how the performance changes if we compile the ring tactic!

Gabriel Ebner (Aug 18 2021 at 09:30):

I get the error 'ring_timing' has already been declared with your example.

Scott Morrison (Aug 18 2021 at 09:31):

Fixed, sorry.

Gabriel Ebner (Aug 18 2021 at 09:32):

I also get very different timings (4s, 13s, 32s), but that just seems to be constant factor of 10x.

Scott Morrison (Aug 18 2021 at 09:32):

I've found tests/plugin/ in the lean4 repo, and successfully run it, but it's not quite enough information for me to work out how to use a compiled version of Mathlib4.

Scott Morrison (Aug 18 2021 at 09:32):

(My back is sore and I'm working on an ancient laptop on the floor this evening.... I guess it really is ancient. :-)

Scott Morrison (Aug 18 2021 at 09:37):

Ah, crap. I had a copy of mathlib building in another window. Those numbers are not right. :-(

Scott Morrison (Aug 18 2021 at 09:42):

I've re-run the numbers. They now start off at n=5 the same as mathlib3's ring, and then gradually fall behind for higher n.

Scott Morrison (Aug 18 2021 at 09:43):

So it's not quite as worrying, but still not great.

Sebastian Ullrich (Aug 18 2021 at 11:13):

Would be interesting to check (e.g. with perf) if most time is spent in the interpreter (which we found to be faster than Lean 3's on simple programs) or in some other part that behaves differently from Lean 3

Gabriel Ebner (Aug 18 2021 at 12:03):

What Aurélien didn't port yet is the norm_num instance cache we had in Lean 3. I thought we might get away without the custom cache in Lean 4, but according to perf a large amount of the time is spent in mkAppM.

Sebastian Ullrich (Aug 18 2021 at 12:13):

While synthesizing instances? Might be worth investigating why the built-in SynthInstanceCache doesn't trigger here.

Gabriel Ebner (Aug 18 2021 at 12:25):

I'm not sure if I read the results correctly, but a big part of the runtime was type inference, in particular the hash computation for the type inference cache.

Daniel Selsam (Aug 18 2021 at 14:04):

Tip: whenever something like this is much faster in Lean3, especially if it is an artificial example with a lot of structure/symmetries, try commenting out https://github.com/leanprover-community/lean/blob/master/src/kernel/expr.cpp#L222-L223 and seeing if lean3 becomes slow.

Mario Carneiro (Aug 18 2021 at 19:24):

I started working on a rewrite / improvement of normNum yesterday. Using the OfNat dependent numerals leads to a lot of unnecessary work constructing exprs with strange shapes. Additionally I noticed that we're calling synthInstance in every numeral, which wasn't there before, as well as mkAppM as mentioned already. Those all look like low hanging fruits

Scott Morrison (Aug 20 2021 at 02:10):

I just re-ran my profiling of ring, after @Aurélien Saue's optimisations landed yesterday. Things look better now, and the results from

#time lemma ring_timing_7 (x1 x2 x3 x4 x5 x6 x7 x8 : ) :
  (x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8)^7 = (x8 + x7 + x6 + x5 + x4 + x3 + x2 + x1)^7 :=
by ring

for various values of 7 are now essentially indistinguishable from the numbers from mathlib3's ring. In particular the growth rates seem to match again.


Last updated: Dec 20 2023 at 11:08 UTC