Zulip Chat Archive

Stream: general

Topic: performance improvements to ring

Mario Carneiro (Sep 02 2018 at 20:43):

@Rob Lewis The latest commit replaces some of the mk_app applications used in ring, and there was a huge performance gain. Now ring will solve (x + y)^n = (x+y)^n up to n = 60 before hitting the timeout, compared to n = 11 before

Mario Carneiro (Sep 02 2018 at 20:44):

example (ε : ℚ) : ε / 3 + ε / 3 + ε / 3 = ε := by ring

takes a fraction of a second now

Patrick Massot (Sep 02 2018 at 20:45):

Is this something we could learn from?

Mario Carneiro (Sep 02 2018 at 20:46):

mk_app sucks

Mario Carneiro (Sep 02 2018 at 20:47):

There are still some mk_app uses in ring, but the problems seem to be when some of the arguments are (large) proof terms, even if they aren't used in type inference

Mario Carneiro (Sep 02 2018 at 20:47):

I think it is typechecking the terms, which is a really bad idea since it gets done so often

Kevin Buzzard (Sep 02 2018 at 21:01):

good catch Mario. I have had a lot of success with ring in the past but I usually cleared denominators myself and then applied the tactic when there were no divisions left.

Mario Carneiro (Sep 02 2018 at 21:13):

the problem wasn't denominators per se. The major slowdown was in structural parts of the proof that are done in just about every proof. For example normalizing (x+y)^nuses no division

Simon Hudon (Sep 02 2018 at 21:13):

I assume you're talking about tactic.mk_app. What do you replace it with? I was told it was preferable to use it to to_expr.

Mario Carneiro (Sep 02 2018 at 21:17):

mk_app will build an application while performing typeclass inference and typechecking (or at least inferring the types) of all passed arguments. The alternative is just to build the expr yourself, using expr.app or expr.mk_app, which does no typechecking or inference. I thought the inference might be slow, which is why ring uses an instance_cache, but I think that this was not the problem. Now I just do the typeclass inference directly and then put the app together manually.

Mario Carneiro (Sep 02 2018 at 21:18):

meta def cache.mk_app (c : cache) (n inst : name) (l : list expr) : tactic expr :=
do m ← mk_instance ((expr.const inst [c.univ] : expr) c.α),
   return $ (@expr.const tt n [c.univ] c.α m).mk_app l


c.mk_app ``norm_num.subst_into_prod ``has_mul [l, e, tl, e, t, hl, hr, p₂],

Rob Lewis (Sep 02 2018 at 21:24):

That's awesome! I spent a little while profiling linarith earlier today and was also seeing performance problems with mk_app. It's not a straightforward story when to use it over to_expr, and yeah, making expressions by hand is probably the way to go for critical applications.

Simon Hudon (Sep 02 2018 at 21:27):

It would be nice to have a more systematic approach to finding bottle necks. One way I could think of is to use Travis to publish metrics for every build.

Rob Lewis (Sep 02 2018 at 22:22):

@Mario Carneiro I'm about to go to bed, so I'm going to leave this here for you to think about if you want to, otherwise I'll investigate tomorrow. :slight_smile: Compare the two profiles here:

set_option profiler true

constants (T : Type) (h : discrete_linear_ordered_field T)
attribute [instance] h
example (ε : T) : (41/42) * ε - (ε / 2 + ε / 3 + ε / 7) = 0 :=
by ring

example (ε : ) : (41/42) * ε - (ε / 2 + ε / 3 + ε / 7) = 0 :=
by ring

The second one takes twice as long and spends almost all its time in the final exact. It's evaluating rational arithmetic in the kernel. The first one spends 20x longer in norm_num but that's still better than kernel evaluation.

Mario Carneiro (Sep 02 2018 at 22:23):

My guess is that this is tactic.norm_num's fault (the one built into core)

Rob Lewis (Sep 03 2018 at 11:50):

After chasing down a few false leads: it isn't tactic.norm_num, although tactic.norm_num is surprisingly sensitive to typeclass short circuits. These commands take roughly the same amount of time if you add enough short circuits for T, otherwise the second takes 7x longer.

run_cmd prod.fst <$> norm_num `((5/60) + (1/70) : ) >>= trace
run_cmd prod.fst <$> norm_num `((5/60) + (1/70) : T) >>= trace

I find this a little surprising since norm_num caches and pre-applies the instances, so it shouldn't be searching for any instance more than once. But this isn't what's causing the issue above.

Rob Lewis (Sep 03 2018 at 11:51):

When ring solves (41/42)*ε - (ε / 2 + ε / 3 + ε / 7) = 0, it actually proves (41/42)*ε + -(ε / 2 + ε / 3 + ε / 7) = 0. Unifying these over takes way longer than over T for some reason.

Rob Lewis (Sep 03 2018 at 11:52):

There's an easy workaround: change the last lines of ring1 to

  ntp  infer_type p,
  tactic.change ntp ff,
  tactic.exact p

Rob Lewis (Sep 03 2018 at 11:54):

The kernel is much faster at checking that the proof from ring has the expected type than exact.

Mario Carneiro (Sep 03 2018 at 21:00):

Argh, this is so annoying. I inserted explicit change proof terms in the proof. I think it is fixed but keep me appraised if you find anything else.

Rob Lewis (Sep 04 2018 at 07:14):

Thanks! Will do.

Last updated: Aug 03 2023 at 10:10 UTC