Zulip Chat Archive
Stream: maths
Topic: ring is slow with polynomials
Kevin Buzzard (Apr 05 2020 at 21:22):
Is this just a fact of life for some reason:
import data.polynomial set_option profiler true open polynomial example (R : Type) [comm_ring R] (x : R) : (1 + x^2 + x^4 + x^6) * (1 + x) = 1+x+x^2+x^3+x^4+x^5+x^6+x^7 := by ring -- 5 seconds example (R : Type) [comm_ring R] : (1 + X^2 + X^4 + X^6 : polynomial R) * (1 + X) = 1+X+X^2+X^3+X^4+X^5+X^6+X^7 := by ring -- 18 seconds
Mario Carneiro (Apr 05 2020 at 21:30):
I suspect that the majority of the time is coming from the few times I "save some time" by calling lean builtins instead of hand rolling exprs using the constructor
Kevin Buzzard (Apr 05 2020 at 21:31):
My workaround is to prove the top one and deduce the bottom one with a rw.
Mario Carneiro (Apr 05 2020 at 21:33):
8612ms 37.1% tactic.mk_mapp ... 3405ms 14.7% tactic.mk_app ... 5ms 0.0% expr.app 3ms 0.0% tactic.ring.cache.cs_app
the ones on the top are the lean builtins, the ones on the bottom are the hand rolled ones
Mario Carneiro (Apr 05 2020 at 21:33):
there is something seriously wrong with the lean internals here
Mario Carneiro (Apr 05 2020 at 21:34):
it's almost certainly calling typeclass inference, and that's responsible for the time spent, but I don't understand why the typeclass cache isn't working
Mario Carneiro (Apr 05 2020 at 21:36):
Actually, it might not be a bad idea to do something like that inside ring, that is, generalize the type of the expression before starting the proof proper
Mario Carneiro (Apr 05 2020 at 21:36):
that way those monstrous typeclass arguments are not copied everywhere, but rather are passed in via a have
orlando (Apr 05 2020 at 21:47):
Hey Kevin, with my computer it's faster with the polynomial ? It's normal ? 15 for the first and 11 for the second :upside_down:
Ryan Lahfa (Apr 05 2020 at 22:09):
FWIW, on a Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz, got 7 seconds for the first and 28 seconds for the second.
Profile for the 2nd one:
elaboration: tactic execution took 28.9s num. allocated objects: 7769 num. allocated closures: 10240 28887ms 100.0% _interaction._lambda_2 28887ms 100.0% scope_trace 28887ms 100.0% tactic.interactive.ring1._lambda_3 28887ms 100.0% tactic.interactive.ring1 28887ms 100.0% tactic.step 28887ms 100.0% tactic.istep 28887ms 100.0% tactic.interactive.ring 28887ms 100.0% _interaction 28887ms 100.0% interaction_monad_orelse 28887ms 100.0% tactic.istep._lambda_1 28887ms 100.0% tactic.interactive.ring._lambda_1 28886ms 100.0% tactic.ring.ring_m.run 28886ms 100.0% interaction_monad.monad._lambda_9 28872ms 99.9% tactic.ring.lift 13373ms 46.3% tactic.norm_num 10911ms 37.8% tactic.mk_mapp 9164ms 31.7% tactic.ring.eval_add._main._lambda_2 7687ms 26.6% tactic.ring.eval_pow._main._lambda_2 7154ms 24.8% norm_num.eval_pow 5977ms 20.7% norm_num.derive 5977ms 20.7% tactic.ext_simplify_core 5861ms 20.3% norm_num.derive._main._lambda_3 4157ms 14.4% tactic.mk_app 1846ms 6.4% expr.of_nat._lambda_1
Profile for the 1st:
elaboration: tactic execution took 7.45s num. allocated objects: 7771 num. allocated closures: 10240 7445ms 100.0% interaction_monad_orelse 7445ms 100.0% tactic.interactive.ring._lambda_1 7445ms 100.0% tactic.istep._lambda_1 7445ms 100.0% _interaction 7445ms 100.0% tactic.interactive.ring 7445ms 100.0% tactic.istep 7445ms 100.0% _interaction._lambda_2 7445ms 100.0% tactic.step 7445ms 100.0% scope_trace 7445ms 100.0% tactic.interactive.ring1 7445ms 100.0% tactic.interactive.ring1._lambda_3 7444ms 100.0% interaction_monad.monad._lambda_9 7444ms 100.0% tactic.ring.ring_m.run 7434ms 99.9% tactic.ring.lift 3567ms 47.9% tactic.norm_num 2348ms 31.5% tactic.ring.eval_add._main._lambda_2 2147ms 28.8% tactic.ring.eval_pow._main._lambda_2 2120ms 28.5% tactic.mk_mapp 2050ms 27.5% norm_num.eval_pow 1698ms 22.8% tactic.ext_simplify_core 1698ms 22.8% norm_num.derive 1593ms 21.4% norm_num.derive._main._lambda_3 1440ms 19.3% tactic.mk_app
Anne Baanen (Apr 06 2020 at 07:55):
I don't want to sound too repetitive,, but you should use ring_exp
here too :slight_smile:
Anne Baanen (Apr 06 2020 at 08:01):
I get approximately the same times as Kevin for ring
, but 150ms and 500ms for ring_exp
. I'm actually surprised that the difference is so large
Kevin Buzzard (Apr 06 2020 at 08:11):
Being repetitive is very helpful for me -- I have to hear something several times before it sinks in. Does ring_exp
not use ring
then??
Kevin Buzzard (Apr 06 2020 at 08:15):
@Anne Baanen you just shaved 10 seconds off a Codewars Kata I'm working on, which might move it from "too slow to work on the site" to "killer question" :-)
Anne Baanen (Apr 06 2020 at 08:20):
Kevin Buzzard said:
Being repetitive is very helpful for me -- I have to hear something several times before it sinks in. Does
ring_exp
not usering
then??
It doesn't! Basically ring
only knows how to represent +
and *
, but can do so efficiently (using the Horner form), and ring_exp
trades in the efficiency on simpler problems to also deal with ^
Yury G. Kudryashov (Apr 06 2020 at 08:28):
I like "trades in the efficiency" together with "runs much faster".
Sebastien Gouezel (Apr 06 2020 at 08:31):
If in practice ring_exp
is most often faster on reasonable problems, shouldn't we rename ring
to ring_horner
and ring_exp
to ring
? @Mario Carneiro
Patrick Massot (Apr 06 2020 at 08:31):
Reading assignment for student Kevin: https://lean-forward.github.io/ring_exp/paper.pdf
Kevin Buzzard (Apr 06 2020 at 08:34):
This looks like some implementation issue to me
Kevin Buzzard (Apr 06 2020 at 08:34):
ring = ring_exp, right? By ext.
Kevin Buzzard (Apr 06 2020 at 08:37):
So should Grégoire-Mahboubi be called "proving equalities in a comm ring done wrong in coq"? :P (sorry Assia, I love your paper!)
Johan Commelin (Apr 06 2020 at 08:44):
Kevin Buzzard said:
ring = ring_exp, right? By ext.
No, I don't think so. ring
can't deal with (a + b)^2
, whereas ring_exp
can.
Anne Baanen (Apr 06 2020 at 08:52):
I wouldn't say that ring_exp
is always faster in practice: linarith
uses ring
internally, and linear expressions are exactly the simpler kind that ring
is better at. The Horner form is optimal in theory (if you use the correct definition of "optimal")
Rob Lewis (Apr 06 2020 at 08:53):
IIRC, there are some situations where ring
is faster than ring_exp
. Anne and I did some investigation when they were working on ring_exp
and there was a performance regression when we just replaced ring
entirely.
(Yep, what Anne said!)
Rob Lewis (Apr 06 2020 at 08:54):
Of course implementation details in both can shift the performance either way for specific problems.
Rob Lewis (Apr 06 2020 at 08:54):
It's probably possible to backport some of the ring_exp
optimizations to ring
.
Anne Baanen (Apr 06 2020 at 08:55):
Johan Commelin said:
ring
can't deal with(a + b)^2
, whereasring_exp
can.
It can deal with that specific case, because it first rewrites (a + b)^2
to (a + b)*(a + b)
. You can replace the 2
with n : ℕ
to get a real counterexample
Rob Lewis (Apr 06 2020 at 08:56):
But what I'm hearing from all these questions about ring
and polynomials is that there are still type class issues with polynomials. The inefficiency in ring
is probably that it's repeating unnecessary searches, but these searches are slow, and that's the root issue.
Kevin Buzzard (Apr 06 2020 at 08:58):
Patrick Massot said:
Reading assignment for student Kevin: https://lean-forward.github.io/ring_exp/paper.pdf
@Anne Baanen typo in table of p3: ""
Kevin Buzzard (Apr 06 2020 at 09:01):
section 4 you seem to say "we tried random identities and ring_exp
was slower" -- but I'm trying identities that come up in practice and ring_exp
is a lot quicker :-)
Anne Baanen (Apr 06 2020 at 09:05):
Kevin Buzzard said:
Anne Baanen typo in table of p3: ""
Well spotted! I already fixed it in the updated version of the paper, which should be available soon.
Anne Baanen (Apr 06 2020 at 09:11):
Indeed, it's not the most informative comparison, but it's objective and can be explained in a few sentences when there is a 5 page limit, so I think I will have to live with it appearing a bit slower :slight_smile:
Mario Carneiro (Apr 06 2020 at 11:04):
To repeat: the reason ring
is slow here has little to do with ring
, it has to do with the lean functions that ring
calls that are slow when they should not be. Yes, it is possible to make ring
faster by avoiding these functions, but this essentially means that you aren't allowed to use the elaborator at all, you have to build a fully formed expr by hand if you want to get performance and that's ridiculous. I'll do it, ring is already 99% implemented like this and the remaning 1% of lean is causing the problem, but this really should be fixed at a lower level
Jasmin Blanchette (Apr 06 2020 at 11:30):
But Anne, you will have 6 pages for the final version, right? Maybe you can also show some examples from this thread.
Kevin Buzzard (Apr 06 2020 at 11:31):
and this thread
Anne Baanen (Apr 06 2020 at 11:45):
Yeah, I want to add a few more positive examples. But space is still pretty tight, since there are some places that the reviewers pointed out that they want more explanation. And also Mario's objection that the same optimizations can be applied to ring
.
Mario Carneiro (Apr 06 2020 at 14:26):
Okay, I did a bunch of optimization (which is to say, mk_app
considered harmful) and managed to get it down to 11 s. There are a lot of places that implicitly assume that calling mk_app
is not that bad, so it's not easy going and the result looks horrible, but that's what you get for trying to circumvent lean internals. I don't think this is likely to get much better until I rewrite tactic.norm_num
.
elaboration: tactic compilation took 5.97ms elaboration: tactic compilation took 4.2ms elaboration: tactic execution took 3.59s num. allocated objects: 7909 num. allocated closures: 11047 3594ms 100.0% tactic.interactive.ring1 3594ms 100.0% scope_trace 3594ms 100.0% tactic.interactive.ring1._lambda_3 3594ms 100.0% tactic.step 3594ms 100.0% _interaction._lambda_2 3594ms 100.0% tactic.istep 3594ms 100.0% tactic.interactive.ring 3594ms 100.0% _interaction 3594ms 100.0% interaction_monad_orelse 3594ms 100.0% tactic.istep._lambda_1 3594ms 100.0% tactic.interactive.ring._lambda_1 3593ms 100.0% tactic.ring.ring_m.run 3590ms 99.9% tactic.ring.ring_m.run._lambda_1 3590ms 99.9% interaction_monad.monad._lambda_9 3590ms 99.9% tactic.using_new_ref 2770ms 77.1% tactic.norm_num 1559ms 43.4% tactic.ring.eval_add._main._lambda_2 1519ms 42.3% tactic.ring.eval_pow._main._lambda_2 1517ms 42.2% norm_num.eval_pow 1240ms 34.5% tactic.ext_simplify_core 1240ms 34.5% norm_num.derive 1166ms 32.4% norm_num.derive._main._lambda_3 603ms 16.8% tactic.mk_app 202ms 5.6% tactic.ring.eval_add._main._lambda_21 141ms 3.9% expr.of_nat._lambda_1 107ms 3.0% tactic.ring.eval_add._main._lambda_20 85ms 2.4% tactic.mk_mapp 78ms 2.2% tactic.ring.eval_pow._main._lambda_19 42ms 1.2% tactic.ring.eval_mul._main._lambda_2 28ms 0.8% tactic.ring.eval_atom._lambda_2 18ms 0.5% tactic.ring.eval_add._main._lambda_14 16ms 0.4% state_t.monad._lambda_19 15ms 0.4% reader_t.monad._lambda_19 15ms 0.4% tactic.ring.eval._main._lambda_42 12ms 0.3% char.of_nat 7ms 0.2% interaction_monad.failed 6ms 0.2% tactic.failed 6ms 0.2% tactic.ring.eval_add._main._lambda_13 4ms 0.1% norm_num.derive._main._lambda_2 4ms 0.1% tactic.ring.eval_add 3ms 0.1% tactic.ring.horner_expr.xadd' 3ms 0.1% tactic.ring.eval_add._main._lambda_23 3ms 0.1% tactic.mk_instance 3ms 0.1% expr.app 2ms 0.1% tactic.ring.ring_m.mk_app._lambda_2 2ms 0.1% tactic.ring.eval_add._main._lambda_22 2ms 0.1% and.decidable 2ms 0.1% tactic.ring.eval._main._lambda_43 2ms 0.1% tactic.ring.eval_add._main._lambda_4 2ms 0.1% norm_num.eval_div_ext 2ms 0.1% expr.mk_app 1ms 0.0% tactic.ring.eval_atom._lambda_3 1ms 0.0% tactic.ring.eval_add._main._lambda_6 1ms 0.0% interaction_monad.fail 1ms 0.0% tactic.ring.eval 1ms 0.0% tactic.ring.eval_add._main._lambda_15 1ms 0.0% state_t.monad._lambda_20 1ms 0.0% string.empty 1ms 0.0% expr.to_pos_nat 1ms 0.0% tactic.exact 1ms 0.0% or.decidable 1ms 0.0% tactic.ring.eval._main._lambda_29 1ms 0.0% expr.to_nat elaboration: tactic execution took 11.5s num. allocated objects: 7909 num. allocated closures: 11047 11461ms 100.0% tactic.interactive.ring._lambda_1 11461ms 100.0% tactic.istep._lambda_1 11461ms 100.0% interaction_monad_orelse 11461ms 100.0% _interaction 11461ms 100.0% tactic.interactive.ring 11461ms 100.0% tactic.istep 11461ms 100.0% _interaction._lambda_2 11461ms 100.0% tactic.step 11461ms 100.0% tactic.interactive.ring1 11461ms 100.0% scope_trace 11461ms 100.0% tactic.interactive.ring1._lambda_3 11453ms 99.9% tactic.ring.ring_m.run._lambda_1 11453ms 99.9% interaction_monad.monad._lambda_9 11453ms 99.9% tactic.using_new_ref 11453ms 99.9% tactic.ring.ring_m.run 9524ms 83.1% tactic.norm_num 5637ms 49.2% tactic.ring.eval_add._main._lambda_2 5253ms 45.8% tactic.ring.eval_pow._main._lambda_2 5251ms 45.8% norm_num.eval_pow 4390ms 38.3% tactic.ext_simplify_core 4390ms 38.3% norm_num.derive 4308ms 37.6% norm_num.derive._main._lambda_3 1710ms 14.9% tactic.mk_app 180ms 1.6% tactic.ring.eval_mul._main._lambda_2 156ms 1.4% tactic.ring.eval_add._main._lambda_21 115ms 1.0% expr.of_nat._lambda_1 90ms 0.8% tactic.ring.eval_add._main._lambda_20 66ms 0.6% tactic.ring.eval_pow._main._lambda_19 62ms 0.5% tactic.mk_mapp 21ms 0.2% tactic.ring.eval_atom._lambda_2 18ms 0.2% tactic.ring.eval_add._main._lambda_14 17ms 0.1% tactic.ring.eval._main._lambda_42 8ms 0.1% char.of_nat 8ms 0.1% tactic.exact 8ms 0.1% state_t.monad._lambda_19 7ms 0.1% reader_t.monad._lambda_19 7ms 0.1% interaction_monad.failed 5ms 0.0% tactic.failed 5ms 0.0% tactic.ring.eval_add._main._lambda_13 5ms 0.0% expr.app 3ms 0.0% norm_num.eval_div_ext 2ms 0.0% expr.mk_app 2ms 0.0% tactic.ring.horner_expr.xadd' 2ms 0.0% tactic.ring.ring_m.mk_app._lambda_2 1ms 0.0% nat.binary_rec 1ms 0.0% tactic.ring.eval_pow._main._lambda_20 1ms 0.0% interaction_monad.monad._lambda_4 1ms 0.0% tactic.ring.eval_add._main._lambda_22 1ms 0.0% nat.bodd 1ms 0.0% string.push 1ms 0.0% norm_num.eval_ineq 1ms 0.0% tactic.ring.eval_pow._main._lambda_23 1ms 0.0% tactic.ring.eval_pow._main._lambda_22 1ms 0.0% tactic.ring.eval_pow 1ms 0.0% state_t.monad._lambda_20 1ms 0.0% norm_num.derive._main._lambda_2 1ms 0.0% nat.decidable_lt 1ms 0.0% reader_t.monad._lambda_20
Mario Carneiro (Apr 06 2020 at 14:30):
(as always, I am frustrated because I know that this job can be done in <1ms if you consider what literally has to happen to produce the term; everything beyond that is lean/vm bloat)
Patrick Massot (Apr 06 2020 at 14:31):
Why don't you hack^W develop Lean, instead? Would that be much harder?
Mario Carneiro (Apr 06 2020 at 14:34):
I don't want to touch typeclass inference with a ten foot pole
Mario Carneiro (Apr 06 2020 at 14:34):
maybe the issue in mk_app
has a simple fix, maybe not
Last updated: Dec 20 2023 at 11:08 UTC