Stream: maths

Topic: ring is slow with polynomials

Kevin Buzzard (Apr 05 2020 at 21:22):

Is this just a fact of life for some reason:

import data.polynomial

set_option profiler true

open polynomial

example (R : Type) [comm_ring R] (x : R) :
(1 + x^2 + x^4 + x^6) * (1 + x) = 1+x+x^2+x^3+x^4+x^5+x^6+x^7 :=
by ring -- 5 seconds

example (R : Type) [comm_ring R] :
(1 + X^2 + X^4 + X^6 : polynomial R) * (1 + X) = 1+X+X^2+X^3+X^4+X^5+X^6+X^7 :=
by ring -- 18 seconds


Mario Carneiro (Apr 05 2020 at 21:30):

I suspect that the majority of the time is coming from the few times I "save some time" by calling lean builtins instead of hand rolling exprs using the constructor

Kevin Buzzard (Apr 05 2020 at 21:31):

My workaround is to prove the top one and deduce the bottom one with a rw.

Mario Carneiro (Apr 05 2020 at 21:33):

 8612ms    37.1%   tactic.mk_mapp
...
3405ms    14.7%   tactic.mk_app
...
5ms     0.0%   expr.app
3ms     0.0%   tactic.ring.cache.cs_app


the ones on the top are the lean builtins, the ones on the bottom are the hand rolled ones

Mario Carneiro (Apr 05 2020 at 21:33):

there is something seriously wrong with the lean internals here

Mario Carneiro (Apr 05 2020 at 21:34):

it's almost certainly calling typeclass inference, and that's responsible for the time spent, but I don't understand why the typeclass cache isn't working

Mario Carneiro (Apr 05 2020 at 21:36):

Actually, it might not be a bad idea to do something like that inside ring, that is, generalize the type of the expression before starting the proof proper

Mario Carneiro (Apr 05 2020 at 21:36):

that way those monstrous typeclass arguments are not copied everywhere, but rather are passed in via a have

orlando (Apr 05 2020 at 21:47):

Hey Kevin, with my computer it's faster with the polynomial ? It's normal ? 15 for the first and 11 for the second :upside_down:

Ryan Lahfa (Apr 05 2020 at 22:09):

FWIW, on a Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz, got 7 seconds for the first and 28 seconds for the second.

Profile for the 2nd one:

elaboration: tactic execution took 28.9s
num. allocated objects:  7769
num. allocated closures: 10240
28887ms   100.0%   _interaction._lambda_2
28887ms   100.0%   scope_trace
28887ms   100.0%   tactic.interactive.ring1._lambda_3
28887ms   100.0%   tactic.interactive.ring1
28887ms   100.0%   tactic.step
28887ms   100.0%   tactic.istep
28887ms   100.0%   tactic.interactive.ring
28887ms   100.0%   _interaction
28887ms   100.0%   tactic.istep._lambda_1
28887ms   100.0%   tactic.interactive.ring._lambda_1
28886ms   100.0%   tactic.ring.ring_m.run
28872ms    99.9%   tactic.ring.lift
13373ms    46.3%   tactic.norm_num
10911ms    37.8%   tactic.mk_mapp
7687ms    26.6%   tactic.ring.eval_pow._main._lambda_2
7154ms    24.8%   norm_num.eval_pow
5977ms    20.7%   norm_num.derive
5977ms    20.7%   tactic.ext_simplify_core
5861ms    20.3%   norm_num.derive._main._lambda_3
4157ms    14.4%   tactic.mk_app
1846ms     6.4%   expr.of_nat._lambda_1


Profile for the 1st:

elaboration: tactic execution took 7.45s
num. allocated objects:  7771
num. allocated closures: 10240
7445ms   100.0%   tactic.interactive.ring._lambda_1
7445ms   100.0%   tactic.istep._lambda_1
7445ms   100.0%   _interaction
7445ms   100.0%   tactic.interactive.ring
7445ms   100.0%   tactic.istep
7445ms   100.0%   _interaction._lambda_2
7445ms   100.0%   tactic.step
7445ms   100.0%   scope_trace
7445ms   100.0%   tactic.interactive.ring1
7445ms   100.0%   tactic.interactive.ring1._lambda_3
7444ms   100.0%   tactic.ring.ring_m.run
7434ms    99.9%   tactic.ring.lift
3567ms    47.9%   tactic.norm_num
2147ms    28.8%   tactic.ring.eval_pow._main._lambda_2
2120ms    28.5%   tactic.mk_mapp
2050ms    27.5%   norm_num.eval_pow
1698ms    22.8%   tactic.ext_simplify_core
1698ms    22.8%   norm_num.derive
1593ms    21.4%   norm_num.derive._main._lambda_3
1440ms    19.3%   tactic.mk_app


Anne Baanen (Apr 06 2020 at 07:55):

I don't want to sound too repetitive,, but you should use ring_exp here too :slight_smile:

Anne Baanen (Apr 06 2020 at 08:01):

I get approximately the same times as Kevin for ring, but 150ms and 500ms for ring_exp. I'm actually surprised that the difference is so large

Kevin Buzzard (Apr 06 2020 at 08:11):

Being repetitive is very helpful for me -- I have to hear something several times before it sinks in. Does ring_exp not use ring then??

Kevin Buzzard (Apr 06 2020 at 08:15):

@Anne Baanen you just shaved 10 seconds off a Codewars Kata I'm working on, which might move it from "too slow to work on the site" to "killer question" :-)

Anne Baanen (Apr 06 2020 at 08:20):

Kevin Buzzard said:

Being repetitive is very helpful for me -- I have to hear something several times before it sinks in. Does ring_exp not use ring then??

It doesn't! Basically ring only knows how to represent + and *, but can do so efficiently (using the Horner form), and ring_exp trades in the efficiency on simpler problems to also deal with ^

Yury G. Kudryashov (Apr 06 2020 at 08:28):

I like "trades in the efficiency" together with "runs much faster".

Sebastien Gouezel (Apr 06 2020 at 08:31):

If in practice ring_exp is most often faster on reasonable problems, shouldn't we rename ring to ring_horner and ring_exp to ring? @Mario Carneiro

Patrick Massot (Apr 06 2020 at 08:31):

Reading assignment for student Kevin: https://lean-forward.github.io/ring_exp/paper.pdf

Kevin Buzzard (Apr 06 2020 at 08:34):

This looks like some implementation issue to me

Kevin Buzzard (Apr 06 2020 at 08:34):

ring = ring_exp, right? By ext.

Kevin Buzzard (Apr 06 2020 at 08:37):

So should Grégoire-Mahboubi be called "proving equalities in a comm ring done wrong in coq"? :P (sorry Assia, I love your paper!)

Johan Commelin (Apr 06 2020 at 08:44):

Kevin Buzzard said:

ring = ring_exp, right? By ext.

No, I don't think so. ring can't deal with (a + b)^2, whereas ring_exp can.

Anne Baanen (Apr 06 2020 at 08:52):

I wouldn't say that ring_exp is always faster in practice: linarith uses ring internally, and linear expressions are exactly the simpler kind that ring is better at. The Horner form is optimal in theory (if you use the correct definition of "optimal")

Rob Lewis (Apr 06 2020 at 08:53):

IIRC, there are some situations where ring is faster than ring_exp. Anne and I did some investigation when they were working on ring_exp and there was a performance regression when we just replaced ring entirely.

(Yep, what Anne said!)

Rob Lewis (Apr 06 2020 at 08:54):

Of course implementation details in both can shift the performance either way for specific problems.

Rob Lewis (Apr 06 2020 at 08:54):

It's probably possible to backport some of the ring_exp optimizations to ring.

Anne Baanen (Apr 06 2020 at 08:55):

Johan Commelin said:

ring can't deal with (a + b)^2, whereas ring_exp can.

It can deal with that specific case, because it first rewrites (a + b)^2 to (a + b)*(a + b). You can replace the 2 with n : ℕ to get a real counterexample

Rob Lewis (Apr 06 2020 at 08:56):

But what I'm hearing from all these questions about ring and polynomials is that there are still type class issues with polynomials. The inefficiency in ring is probably that it's repeating unnecessary searches, but these searches are slow, and that's the root issue.

Kevin Buzzard (Apr 06 2020 at 08:58):

Patrick Massot said:

Reading assignment for student Kevin: https://lean-forward.github.io/ring_exp/paper.pdf

@Anne Baanen typo in table of p3: "$a^{b+c}=a^b+a^c$"

Kevin Buzzard (Apr 06 2020 at 09:01):

section 4 you seem to say "we tried random identities and ring_exp was slower" -- but I'm trying identities that come up in practice and ring_exp is a lot quicker :-)

Anne Baanen (Apr 06 2020 at 09:05):

Kevin Buzzard said:

Anne Baanen typo in table of p3: "$a^{b+c}=a^b+a^c$"

Well spotted! I already fixed it in the updated version of the paper, which should be available soon.

Anne Baanen (Apr 06 2020 at 09:11):

Indeed, it's not the most informative comparison, but it's objective and can be explained in a few sentences when there is a 5 page limit, so I think I will have to live with it appearing a bit slower :slight_smile:

Mario Carneiro (Apr 06 2020 at 11:04):

To repeat: the reason ring is slow here has little to do with ring, it has to do with the lean functions that ring calls that are slow when they should not be. Yes, it is possible to make ring faster by avoiding these functions, but this essentially means that you aren't allowed to use the elaborator at all, you have to build a fully formed expr by hand if you want to get performance and that's ridiculous. I'll do it, ring is already 99% implemented like this and the remaning 1% of lean is causing the problem, but this really should be fixed at a lower level

Jasmin Blanchette (Apr 06 2020 at 11:30):

But Anne, you will have 6 pages for the final version, right? Maybe you can also show some examples from this thread.

Anne Baanen (Apr 06 2020 at 11:45):

Yeah, I want to add a few more positive examples. But space is still pretty tight, since there are some places that the reviewers pointed out that they want more explanation. And also Mario's objection that the same optimizations can be applied to ring.

Mario Carneiro (Apr 06 2020 at 14:26):

Okay, I did a bunch of optimization (which is to say, mk_app considered harmful) and managed to get it down to 11 s. There are a lot of places that implicitly assume that calling mk_app is not that bad, so it's not easy going and the result looks horrible, but that's what you get for trying to circumvent lean internals. I don't think this is likely to get much better until I rewrite tactic.norm_num.

elaboration: tactic compilation took 5.97ms
elaboration: tactic compilation took 4.2ms
elaboration: tactic execution took 3.59s
num. allocated objects:  7909
num. allocated closures: 11047
3594ms   100.0%   tactic.interactive.ring1
3594ms   100.0%   scope_trace
3594ms   100.0%   tactic.interactive.ring1._lambda_3
3594ms   100.0%   tactic.step
3594ms   100.0%   _interaction._lambda_2
3594ms   100.0%   tactic.istep
3594ms   100.0%   tactic.interactive.ring
3594ms   100.0%   _interaction
3594ms   100.0%   tactic.istep._lambda_1
3594ms   100.0%   tactic.interactive.ring._lambda_1
3593ms   100.0%   tactic.ring.ring_m.run
3590ms    99.9%   tactic.ring.ring_m.run._lambda_1
3590ms    99.9%   tactic.using_new_ref
2770ms    77.1%   tactic.norm_num
1519ms    42.3%   tactic.ring.eval_pow._main._lambda_2
1517ms    42.2%   norm_num.eval_pow
1240ms    34.5%   tactic.ext_simplify_core
1240ms    34.5%   norm_num.derive
1166ms    32.4%   norm_num.derive._main._lambda_3
603ms    16.8%   tactic.mk_app
141ms     3.9%   expr.of_nat._lambda_1
85ms     2.4%   tactic.mk_mapp
78ms     2.2%   tactic.ring.eval_pow._main._lambda_19
42ms     1.2%   tactic.ring.eval_mul._main._lambda_2
28ms     0.8%   tactic.ring.eval_atom._lambda_2
15ms     0.4%   tactic.ring.eval._main._lambda_42
12ms     0.3%   char.of_nat
6ms     0.2%   tactic.failed
4ms     0.1%   norm_num.derive._main._lambda_2
3ms     0.1%   tactic.mk_instance
3ms     0.1%   expr.app
2ms     0.1%   tactic.ring.ring_m.mk_app._lambda_2
2ms     0.1%   and.decidable
2ms     0.1%   tactic.ring.eval._main._lambda_43
2ms     0.1%   norm_num.eval_div_ext
2ms     0.1%   expr.mk_app
1ms     0.0%   tactic.ring.eval_atom._lambda_3
1ms     0.0%   tactic.ring.eval
1ms     0.0%   string.empty
1ms     0.0%   expr.to_pos_nat
1ms     0.0%   tactic.exact
1ms     0.0%   or.decidable
1ms     0.0%   tactic.ring.eval._main._lambda_29
1ms     0.0%   expr.to_nat
elaboration: tactic execution took 11.5s
num. allocated objects:  7909
num. allocated closures: 11047
11461ms   100.0%   tactic.interactive.ring._lambda_1
11461ms   100.0%   tactic.istep._lambda_1
11461ms   100.0%   _interaction
11461ms   100.0%   tactic.interactive.ring
11461ms   100.0%   tactic.istep
11461ms   100.0%   _interaction._lambda_2
11461ms   100.0%   tactic.step
11461ms   100.0%   tactic.interactive.ring1
11461ms   100.0%   scope_trace
11461ms   100.0%   tactic.interactive.ring1._lambda_3
11453ms    99.9%   tactic.ring.ring_m.run._lambda_1
11453ms    99.9%   tactic.using_new_ref
11453ms    99.9%   tactic.ring.ring_m.run
9524ms    83.1%   tactic.norm_num
5253ms    45.8%   tactic.ring.eval_pow._main._lambda_2
5251ms    45.8%   norm_num.eval_pow
4390ms    38.3%   tactic.ext_simplify_core
4390ms    38.3%   norm_num.derive
4308ms    37.6%   norm_num.derive._main._lambda_3
1710ms    14.9%   tactic.mk_app
180ms     1.6%   tactic.ring.eval_mul._main._lambda_2
115ms     1.0%   expr.of_nat._lambda_1
66ms     0.6%   tactic.ring.eval_pow._main._lambda_19
62ms     0.5%   tactic.mk_mapp
21ms     0.2%   tactic.ring.eval_atom._lambda_2
17ms     0.1%   tactic.ring.eval._main._lambda_42
8ms     0.1%   char.of_nat
8ms     0.1%   tactic.exact
5ms     0.0%   tactic.failed
5ms     0.0%   expr.app
3ms     0.0%   norm_num.eval_div_ext
2ms     0.0%   expr.mk_app
2ms     0.0%   tactic.ring.ring_m.mk_app._lambda_2
1ms     0.0%   nat.binary_rec
1ms     0.0%   tactic.ring.eval_pow._main._lambda_20
1ms     0.0%   nat.bodd
1ms     0.0%   string.push
1ms     0.0%   norm_num.eval_ineq
1ms     0.0%   tactic.ring.eval_pow._main._lambda_23
1ms     0.0%   tactic.ring.eval_pow._main._lambda_22
1ms     0.0%   tactic.ring.eval_pow
1ms     0.0%   norm_num.derive._main._lambda_2
1ms     0.0%   nat.decidable_lt


Mario Carneiro (Apr 06 2020 at 14:30):

(as always, I am frustrated because I know that this job can be done in <1ms if you consider what literally has to happen to produce the term; everything beyond that is lean/vm bloat)

Patrick Massot (Apr 06 2020 at 14:31):

Why don't you hack^W develop Lean, instead? Would that be much harder?

Mario Carneiro (Apr 06 2020 at 14:34):

I don't want to touch typeclass inference with a ten foot pole

Mario Carneiro (Apr 06 2020 at 14:34):

maybe the issue in mk_app has a simple fix, maybe not

Last updated: May 10 2021 at 08:14 UTC