Zulip Chat Archive

Stream: mathlib4

Topic: how much failing is too much for a simp theorem

Matthew Ballard (May 07 2025 at 13:36):

#24558 removes docs#Set.insert_eq_of_mem from the simp theorems. From the benchmark, we see that ~1/1000 = 0.1% of the total instructions in mathlib are from Lean attempting unification with insert_eq_of_mem whenever the keys @insert _ (Set _) _ _ _ are found in a subterm. We will see this n times for ({a₁, a₂, ..., aₙ} : Set α).

What should be done? I don't think that 1/1000 for a single simp theorem is not sustainable. We can remove it from the simp theorems but it can maybe also be improved if {} used an Insert.insert with a signature of the form ℕ → α → γ → γ

Sébastien Gouëzel (May 07 2025 at 13:45):

Is it possible to see if the failures come from a long time spent trying to solve the side assumption (h : a ∈ s) in complicated cases, or if it's the mere application of the lemma which is costly?

Matthew Ballard (May 07 2025 at 13:46):

Poking at it now

Matthew Ballard (May 07 2025 at 13:48):

Actually it appears that it cannot discharge the side condition with it in the local context but only with it being manually fed into simp.

Matthew Ballard (May 07 2025 at 13:50):

import Mathlib.Data.Set.Insert

example {a : α} {s : Set α} (h : a ∈ s) : insert a s = s := by
  simp [h] -- succeeds

example {a : α} {s : Set α} (h : a ∈ s) : insert a s = s := by
  simp  -- fails

Yury G. Kudryashov (May 07 2025 at 13:59):

simp doesn't use context to discharge side conditions by default, unless you use simp [h] or simp [*].

Yury G. Kudryashov (May 07 2025 at 14:01):

Shoult it be replaced by a simp_proc that only works for {a, b, c}?

Eric Wieser (May 07 2025 at 14:04):

I think a better way to test for Sebastien's theory is to test how simp behaves on {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1}

Eric Wieser (May 07 2025 at 14:08):

I think it's unavoidable quadratic, but maybe we're doing worse than that, or our constant term is bad

Sébastien Gouëzel (May 07 2025 at 14:12):

a simproc just working to discharge a ∈ univ, and a ∈ s if it's in the context, and for sets with 1, 2 or 3 elements, would already cover most reasonable applications of the lemma. If s is very complicated and simp works hard to check whether a ∈ s holds, I'm not surprised this can be costly quite quickly.

Eric Wieser (May 07 2025 at 14:14):

some profiling suggests it is indeed asymptotically as expected:

import Mathlib

set_option trace.profiler true

-- 0.257972s
#simp ({1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
        1} : Set Nat)
-- 0.825771s
#simp ({1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
        1} : Set Nat)
-- 3.288878s
#simp ({1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
        21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
        1} : Set Nat)
set_option maxHeartbeats 400000
-- 14.628687s
#simp ({1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
        21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
        41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60,
        61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80,
        1} : Set Nat)

Yury G. Kudryashov (May 07 2025 at 14:16):

I think that it will be slow even without 1 at the end.

Eric Wieser (May 07 2025 at 14:17):

It's about 30% faster if you reduce the import to import Mathlib.Data.Set.Insert, so perhaps something else is making things slow here

Yury G. Kudryashov (May 07 2025 at 14:18):

I guess, it has more theorems to match a ∈ _ with.

Matthew Ballard (May 07 2025 at 14:22):

Yeah, inserting 100 ∈ in front of each #simp barely moves the time

Eric Wieser (May 07 2025 at 14:23):

Presumably that should be a higher priority simp lemma than Set.insert_eq_of_mem?

Eric Wieser (May 07 2025 at 14:23):

There's no point running O(n^2) dedupication to conclude an O(n) membership

Yury G. Kudryashov (May 07 2025 at 14:25):

What does simp do to f (g x) if it can apply a low prio lemma to g x or a high prio lemma to f (g x)?

Eric Wieser (May 07 2025 at 14:26):

attribute [simp↓] Set.mem_insert_iff makes the above faster for the 100 ∈ case

Eric Wieser (May 07 2025 at 14:27):

Indeed, priority is the wrong tool, it still applies the inner low priority lemma first

Bhavik Mehta (May 07 2025 at 19:53):

What's the performance cost of removing @[simp] on docs#Finset.insert_eq_of_mem too? (I'm on board with both changes)

Peter Nelson (May 08 2025 at 10:58):

I would be sad to see this particular lemma desimped. I use it quite frequently with simp being fed a membership hypothesis.

Yury G. Kudryashov (May 09 2025 at 00:57):

Can we make it faster but less powerful by using a cheaper tactic for the side goal?

Matthew Ballard (May 13 2025 at 13:48):

Why isn't the simp normal form insert a s = {a} ∪ s?

Yury G. Kudryashov (May 13 2025 at 14:52):

I guess, because {a, b, c} looks better than {a} \cup {b} \cup {c}.

Kevin Buzzard (May 13 2025 at 16:02):

Can it be simp normal form with an appropriate delaborator then?

Yury G. Kudryashov (May 13 2025 at 16:10):

What will it change for the purpose of this discussion?

Yury G. Kudryashov (May 13 2025 at 16:11):

I mean, if we change the simp normal form to use unions, then we'll get to "should a \in s -> {a} \cup s = s be a simp lemma?"

Last updated: Feb 28 2026 at 14:05 UTC