Zulip Chat Archive

Stream: general

Topic: Why is `int.cast_abs` so slow?


Anne Baanen (Jan 05 2022 at 14:13):

On the latest mathlib master, I was surprised that the combination of simp lemmas [max_eq_left, neg_le_self_iff, inv_nonneg] are very slow:

import data.int.cast
set_option profiler true

-- Original code from `src/data/int/cast.lean`
@[simp, norm_cast] theorem cast_abs {α : Type*} [linear_ordered_ring α] {q : } :
  ((|q| : ) : α) = |q| :=
by simp [abs_eq_max_neg] -- tactic execution took 1.3s
-- `squeeze_simp`sed:
example {α : Type*} [linear_ordered_ring α] {q : } :
  ((|q| : ) : α) = |q| :=
by simp only [abs_eq_max_neg, int.cast_max, int.cast_neg] -- tactic execution took 49ms
-- With the offending lemmas back in:
example {α : Type*} [linear_ordered_ring α] {q : } :
  ((|q| : ) : α) = |q| :=
by simp only [abs_eq_max_neg, int.cast_max, int.cast_neg, neg_le_self_iff, inv_nonneg, max_eq_left] -- tactic execution took 1.48s

Anne Baanen (Jan 05 2022 at 14:14):

More specific test cases:

import data.int.cast
set_option profiler true

example {q : } : max (-q) q = q := by simp; sorry -- tactic execution took 1.64s
example {q : } : max (-q) q = q := by simp only [neg_le_self_iff, inv_nonneg, max_eq_right]; sorry -- tactic execution took 1.17s

example {q : } : -q  q := by simp; sorry -- tactic execution took 527s
example {q : } : -q  q := by simp only [neg_le_self_iff, inv_nonneg]; sorry -- tactic execution took 508s

example {q : } : -q  q := by simp only [neg_le_self_iff]; sorry -- tactic execution took 14ms
example {q : } : 0  q := by simp only [inv_nonneg]; sorry -- tactic execution took 13ms

Anne Baanen (Jan 05 2022 at 14:16):

I noticed this lemma was being elaborated slowly when trying out lean#659 on mathlib, but it turns out the slowness was there already.

Anne Baanen (Jan 05 2022 at 14:30):

As expected, it seems to come from an extremely slow defeq check 0 ≤ q =?= 0 ≤ ?m_1⁻¹. This is then triggering many pages of cached failure for linear_ordered_field ℤ.

Anne Baanen (Jan 05 2022 at 14:58):

The check is specifically for the zero of linear_ordered_field to equal the zero of linear_ordered_ring, by checking that linear_ordered_field.to_linear_ordered_comm_ring.to_linear_ordered_ring.to_ordered_ring.to_ring.to_non_unital_non_assoc_ring.to_non_unital_non_assoc_semiring.to_mul_zero_class.to_has_zero.zero coincides with linear_ordered_ring.to_ordered_ring.to_ring.to_non_unital_non_assoc_ring.to_non_unital_non_assoc_semiring.to_mul_zero_class.to_has_zero.zero, and it looks like it's exponentially slow in doing so:

  1. for each to_subclass, they match so check that their arguments match, by considering the to_subclass of the parent.
  2. these are both constructor applications, so check their fields are the same. well, the add field is just linear_ordered_field.to_subclass...to_subclass.add =?= int.linear_ordered_comm_ring.to_subclass.to_subclass....to_subclass.add, which looks the same so call 1 on the parent class.
  3. eventually we get stuck on linear_ordered_field.add ℤ ?m_1 =?= int.add, fail the comparison, unfold the to_subclass of the parent class and go back to 1.

for each time that we go through 3, we add as many iterations of 2 as we have already done, together with an extra unfold, so this is exponentially slow in the length of the inheritance chain.

Gabriel Ebner (Jan 05 2022 at 19:29):

Did someone fix this already, or did you forget an 'm' before the 's'? I'm getting 879ms, 598ms, 457ms, 181ms, 6ms, 6ms on your second example. It's not great, but not the ten minutes I was expecting either.

Anne Baanen (Jan 07 2022 at 12:12):

(Sorry, missed your reply last time!) This really is what the profiler reports on my machine. Perhaps it depends on the mathlib version, since the hierarchy got restructured a bit recently with the non_unital_non_assoc_ring.

Gabriel Ebner (Jan 07 2022 at 12:14):

So really 8.5 minutes, wow. If I understand your analysis correctly, then this should also be fixed in Lean 4 since step 3 should succeed due to η. But I didn't verify this in mathlib3port yet.

Anne Baanen (Jan 07 2022 at 12:19):

Ah oops, debug traces are expensive! Perhaps I should run the actual code I posted instead... Now I have timings similar to yours: 1.53s, 1.2s, 779ms, 585ms, 22ms, 13ms.


Last updated: Dec 20 2023 at 11:08 UTC