Zulip Chat Archive

Stream: lean4

Topic: Exponential blowup in unification with metavariables


Jovan Gerbscheid (May 14 2025 at 10:17):

I had seen this problem before, but it popped up twice again recently, in #mathlib4 > Coercion triggers timeout and #mathlib4 > simp timeout at `whnf`, so I decided to investigate further.

#mwe of exponential blowup in type class search:

class A (n : Nat) where
  x : Nat

instance [A n] : A (n+1) where
  x := A.x n

theorem test [A 0] : A.x 15 = sorry := sorry

set_option trace.profiler true in
example [A 1] : A.x 15 = sorry := by
  rw [@test]

The problem arises for unification goals of the form f a₁ .. aₙ =?= f b₁ .. bₙ that contain at least one metavariable. There are two main approaches to solving such a goal:

  1. Unify the arguments pairwise: a₁ =?= b₁, .. aₙ =?= bₙ.
  2. Unfold f on both sides.

It is important to start with (1), e.g. for goals of the form f a =?= f ?m, so that we can immediately assign ?m := a.

When applying (2), the unfolded expressions often still contain many of the original arguments a₁ .. aₙ and b₁ .. bₙ, so applying (1) to them produces many of the same unification goals a₁ =?= b₁, .. aₙ =?= bₙ again. The problem is that the transient cache (that is, the cache used for unification problems involving metavariables) is not available, so the same unification is attempted multiple times. If this pattern repeats through nested unfoldings, it leads to an exponential blowup.
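For a rough sense of the asymptotics (a sketch under the assumption that each unfolding step produces two structurally identical sub-goals whose results are not shared, not a measured profile), the work at nesting depth n satisfies

T(n) = 2·T(n−1) + c,  so  T(n) = O(2ⁿ),

whereas if the duplicated sub-goals hit the transient cache, each one is solved only once and the total cost stays polynomial in n (in the MWE above, n roughly corresponds to the numeral 15).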

I'm attempting to make a fix for this at lean#8294, and one test showed a 1% speedup in Mathlib.

In the PR I am running into some mysterious build problems with segfaults; does anyone understand that? The segfaults seem to show up with some probability, but since there are multiple builds, usually at least one is affected.

Jovan Gerbscheid (May 14 2025 at 10:40):

My proposed solution is to change where the transient cache is reset. Instead of resetting it at each checkpointDefEq, it should be reset at the start of isDefEq, and stay in place as long as no metavariables are assigned. To make sure the cache stays valid, we keep track of the number of metavariable assignments in the metavariable context. If this number has increased relative to the one recorded in the transient cache, then the cache is no longer valid. Whenever we revert the metavariable context (which we do in checkpointDefEq), if the number of metavariable assignments decreases, we also reset the transient cache.
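Here is a minimal sketch of that bookkeeping in Lean, just to illustrate the idea; the names TransientCache, entries and validate are made up here and are not the actual definitions in Lean's Meta code:

import Lean
open Lean

structure TransientCache where
  -- cached results of the e₁ =?= e₂ goals encountered during the current isDefEq call
  entries : Std.HashMap (Expr × Expr) Bool := ∅
  -- number of metavariable assignments in the metavariable context when this cache was built
  numAssignments : Nat := 0

-- Reuse the cache only while the assignment count is unchanged; on any change
-- (new assignments, or a revert of the metavariable context) start from an empty
-- cache recorded at the current count.
def TransientCache.validate (cache : TransientCache) (currentAssignments : Nat) : TransientCache :=
  if currentAssignments == cache.numAssignments then
    cache
  else
    { numAssignments := currentAssignments }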

Michael Stoll (May 14 2025 at 11:25):

I assume that this fix will also improve performance in the situation described in #lean4 > Coercion and unification @ 💬 (I noticed quite a bit of repetition there), but it will not prevent the fruitless unification attempt altogether. Would it be possible to also test my proposal (that the term ↑ ... gets re-elaborated [if this is the correct term] as soon as its type metavariable gets assigned)? This may involve additional bookkeeping, so it may not be entirely trivial. (Unfortunately, I know essentially nothing about metaprogramming or how the unification algorithm works...)

Jovan Gerbscheid (May 14 2025 at 11:57):

Yes, I tried your mathlib-free example on the Lean version from my PR, and there it is quadratically slow instead of exponentially slow. With p := 50, it's still under a second.

Jovan Gerbscheid (May 14 2025 at 12:29):

Any change specific to coercions would be a completely separate fix. Presumably that'll require a change on the elaboration side of things, instead of unification.

Michael Stoll (May 14 2025 at 13:14):

It would likely fix lean4#2831 as well, though.

Jovan Gerbscheid (May 14 2025 at 15:03):

Indeed, that seems to require an optimization to how coercions are elaborated.

Jovan Gerbscheid (Jun 03 2025 at 16:31):

I got back to lean#8294, with an improved approach, and I fixed some confusing mathlib failures. There is now a speedup of

| build | -5127.321⋅10⁹ | -3.25% |
| lint  | -524.117⋅10⁹  | -7.11% |

In my new version, I fully reset the transient cache at the start of an isDefEq call as usual. During the call, the cache is then handled as follows (a rough code sketch follows the list):

  • The cache also stores the total number of metavariable assignments at the time the cache is made.
  • When this number increases, the old cache is invalidated.
  • When this number decreases (which can happen while reverting the metavariable context if a sub goal failed to unify) we revert the cache to the previous version.
  • When this number stays the same, we keep using the same cache.
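A self-contained sketch of this refined bookkeeping, again with made-up names (TransientCache, updateCache); the real implementation lives in Lean's Meta code, and the reset at the start of isDefEq is not shown:

import Lean
open Lean

structure TransientCache where
  entries : Std.HashMap (Expr × Expr) Bool := ∅
  numAssignments : Nat := 0

-- `current` is the cache in use, `saved` is the snapshot taken at the last checkpoint
-- (the one that checkpointDefEq restores together with the metavariable context).
def updateCache (current saved : TransientCache) (newCount : Nat) : TransientCache :=
  if newCount > current.numAssignments then
    -- new metavariable assignments: the cached answers may be stale, so start fresh
    { numAssignments := newCount }
  else if newCount < current.numAssignments then
    -- the metavariable context was reverted because a sub-goal failed: restore the snapshot
    saved
  else
    -- nothing changed: keep using the same cache
    current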

Jovan Gerbscheid (Jun 21 2025 at 11:00):

I made some more improvements, and tested it on a newer mathlib version (in particular one including the recent simp improvements), and the speedup is now

| build | -5685.571⋅10⁹ | -3.77% |
| lint  | -593.369⋅10⁹  | -8.44% |

It's now at the new PR lean#8883, and I've marked it as ready for review. Hopefully the core team will find time to review this at some point.

Edit: the speedup can be seen here


Last updated: Dec 20 2025 at 21:32 UTC