Zulip Chat Archive

Stream: lean4

Topic: Coercion and unification

Michael Stoll (May 11 2025 at 18:02):

(See #mathlib4 > Coercion triggers timeout @ 💬 and the other thread linked there for context. I'm moving the discussion here since in the end it is about Lean's unification algorithm.)

Consider the following example (which uses Mathlib; this is how I came across it while trying to formalize some innoccuous proof, but the underlying problem is not Mathlib-specific):
[EDIT: fixed a typo]

import Mathlib.Data.Complex.Norm

variable (x : ℝ) (z : ℂ)

#count_heartbeats in -- Used 391 heartbeats
-- Try this: exact norm_add_le z ↑x
example : ‖z + x‖ ≤ ‖z‖ + ‖(x : ℂ)‖ := by exact?

#count_heartbeats in -- Used 326941 heartbeats
-- Try this: exact norm_add_le (↑x) z
example : ‖x + z‖ ≤ ‖(x : ℂ)‖ + ‖z‖ := by exact?

#count_heartbeats in -- Used 245 heartbeats
#check (norm_add_le z (↑x) : ‖z + x‖ ≤ ‖z‖ + ‖(x : ℂ)‖)

#count_heartbeats in -- Used 1149716 heartbeats
#check (norm_add_le (↑x) z : ‖x + z‖ ≤ ‖(x : ℂ)‖ + ‖z‖)

What is going on here?

In my (admittedly limited) understanding gained from staring at traces of similar examples and the comments by @Jovan Gerbscheid in the other thread, the situation is this.

In the first #check, when norm_add_le z (↑x) is elaborated (with expected type as given) and we get to ↑x, the expected type of that is Complex because the type of the arguments in norm_add_le has already been fixed by the first argument z. This causes Lean to look for a coercion from Real to Complex, which it duly finds, and from there, everything proceeds smoothly.
In the second #check we look at ↑x before we know its expected type (which is soon after fixed by looking at z). This has the effect that when Lean tries to unify the type of norm_add_le (↑x) z with the given type, ↑x has been replaced by an unassignable metavariable ?m.abc. At some point, we try to unify ‖↑x‖, which internally is ‖?m.abc‖, with ‖Complex.ofReal x‖. Since ?m.abc is unassignable, this is bound to fail, but before it does so, Lean tries very hard to succeed by successively unfolding the complex norm (which involves docs#Real.sqrt, which in turn is defined in terms of docs#NNReal.sqrt, which involves docs#NNReal.powOrderIso etc. etc.) until the bitter end, which is reached after a few 10⁵ heartbeats. And then it does the same thing again with switched sides. And then it tries to find a coercion of the type of norm_add_le ... to the given type, which involves going down the same rabbit holes another time. Only after all of this does it go back and elaborate ↑x again with its expected type Complex, which finally leads to success.

So my question is, Can this inefficiency be avoided?

A naïve suggestion: when a unification problem involves an unassignable metavariable that stands for a coerced term and has known expected type, first elaborate the original coerced term with this expected type to fix the coercion and thus get rid of the metavariable. Or perhaps even do the latter as soon as the expected type becomes known. (So in the second #check above, this would occur after we have processed z in norm_add_le (↑x) z, because at that point the metavariable that stands for the type of ↑x has been assigned to Complex, and then it would work essentially like in the first #check.)

Michael Stoll (May 11 2025 at 19:59):

That unfolding the norm is related to the huge timing differences can be seen by sealing docs#Real.sqrt in the example:

import Mathlib.Data.Complex.Norm

variable (x : ℝ) (z : ℂ)

seal Real.sqrt

#count_heartbeats in -- Used 246 heartbeats
-- Try this: exact norm_add_le z ↑x
example : ‖z + x‖ ≤ ‖z‖ + ‖(x : ℂ)‖ := by exact?

#count_heartbeats in -- Used 689 heartbeats
-- Try this: exact norm_add_le (↑x) z
example : ‖x + z‖ ≤ ‖(x : ℂ)‖ + ‖z‖ := by exact?

#count_heartbeats in -- Used 144 heartbeats
#check (norm_add_le z (↑x) : ‖z + x‖ ≤ ‖z‖ + ‖(x : ℂ)‖)

#count_heartbeats in -- Used 1338 heartbeats
#check (norm_add_le (↑x) z : ‖x + z‖ ≤ ‖(x : ℂ)‖ + ‖z‖)

(The first heartbeats count is higher the first time the text is processed in the web editor; this is probably because the cache for exact?/apply? needs to be initialized.)

Michael Stoll (May 12 2025 at 10:03):

Here is a Mathlib-free (except for importing #count_heartbeats) example:

import Mathlib.Util.CountHeartbeats -- just for counting heartbeats

@[irreducible] def A : Type := Unit

@[irreducible] def B : Type := Unit

unseal B in
@[coe] def AtoB (_a : A) : B := ()

instance : Coe A B where coe := AtoB

def h {α : Type} (a b : α) : Nat → α
| 0 => a
| n + 1 => h b a n

def f {α : Type} (a b : α) : Nat → Prop
| 0 => a = b
| n + 1 => f (h a b n) (h b a n) n ∧ f (h a a n) (h b b n) n

def p : Nat := 6 -- last `#check` takes time exponential in `p`

theorem foo {α : Type} (a b : α) : f a b p := sorry

variable (x : A) (y : B)

#count_heartbeats in -- Used 36 heartbeats
#check (foo (x : B) y : f (AtoB x) y p)

#count_heartbeats in -- Used 31 heartbeats
#check foo (↑x) y

#count_heartbeats in -- Used 35 heartbeats
#check (foo y (↑x) : f y (AtoB x) p)

#count_heartbeats in -- Used 143388 heartbeats
#check (foo (↑x) y : f (AtoB x) y p)

This simulates a complicated-to-unfold definition (with quick whnf); this is essentially the same situation as with the norm on the complex numbers in the original example. (The last number of heartbeats varies slightly between the web editor and my own machine, possibly because of slightly different lean versions.)

Michael Stoll (May 14 2025 at 11:34):

There is a similar phenomenon in lean4#2831, but there it shows up with a * (↑b).

Michael Stoll (May 14 2025 at 11:34):

Note also

#count_heartbeats in -- Used 35 heartbeats
#check ((foo (↑x) y :) : f (AtoB x) y p)

(in the same context as above).

Jovan Gerbscheid (May 14 2025 at 12:02):

The problem in lean#2831 isn't solved by my fix for unification slowness. Here, the exponentially many unfication sub-problems are really all different, so this can't be fixed with better caching.

Michael Stoll (May 14 2025 at 13:09):

It would be interesting to try this with the "real life" Mathlib example involving complex norms. (I should also be able to modify the Mathlib-free example to genuinely require exponentially many sub-problems.)

Jovan Gerbscheid (May 14 2025 at 14:37):

Here are the heartbeat counts I get on the original example with Mathlib

import Mathlib.Data.Complex.Norm

variable (x : ℝ) (z : ℂ)

#count_heartbeats in -- Used 394 heartbeats
-- Try this: exact norm_add_le z ↑x
example : ‖z + x‖ ≤ ‖z‖ + ‖(x : ℂ)‖ := by exact?

#count_heartbeats in -- Used 598 heartbeats
-- Try this: exact norm_add_le (↑x) z
example : ‖x + z‖ ≤ ‖(x : ℂ)‖ + ‖z‖ := by exact?

#count_heartbeats in -- Used 246 heartbeats
#check (norm_add_le z (↑x) : ‖z + x‖ ≤ ‖z‖ + ‖(x : ℂ)‖)

#count_heartbeats in -- Used 662 heartbeats
#check (norm_add_le (↑x) z : ‖x + z‖ ≤ ‖(x : ℂ)‖ + ‖z‖)

Michael Stoll (May 14 2025 at 14:38):

OK, thanks!

This may indicate that your proposed fix to the caching may go a long way to mitigate the problem in practice.

Michael Stoll (May 14 2025 at 15:21):

@Mario Carneiro @Kyle Miller Should I extend lean4#2831 to include this problem or make a new issue that links to lean4#2831?

Michael Stoll (May 14 2025 at 15:35):

Here is an example that builds on the one in lean4#2831 (Mathlib-free except for #count_heartbeats ). This will not significantly sped up by @Jovan Gerbscheid's suggested fix to caching in isDefEq, I assume.

import Mathlib.Util.CountHeartbeats -- just for counting heartbeats

def dontUnfoldMe {α : Type} (x y : α) : Nat → List Bool → Nat
  | 0, _ => 0
  | n+1, l => dontUnfoldMe x y n (true::l) + dontUnfoldMe x y n (false::l)

theorem dontUnfoldMe_eq {α : Type} (x y : α) : ∀ n l, dontUnfoldMe x y n l = 0
  | 0, _ => rfl
  | n+1, l => by rw [dontUnfoldMe, dontUnfoldMe_eq x y n, dontUnfoldMe_eq x y n]

#count_heartbeats in -- Used 40 heartbeats
example (a : Int) (n : Nat) : dontUnfoldMe a n 14 [] = 0 :=
  dontUnfoldMe_eq a n 14 [] -- fast

#count_heartbeats in -- 12 --> 20776, 13 --> 41567, 14 --> 83212 heartbeats
example (a : Nat) (n : Int) : dontUnfoldMe (↑a) n 12 [] = 0 :=
  dontUnfoldMe_eq (↑a) n 12 [] -- slow (~ 2^m)

#count_heartbeats in -- 40
example (a : Nat) (n : Int) : dontUnfoldMe (↑a) n 14 [] = 0 :=
  (dontUnfoldMe_eq (↑a) n 14 [] :) -- fast

#count_heartbeats in -- Used 28 heartbeats
example (a : Nat) (n : Int) : dontUnfoldMe (↑a) n 14 [] = 0 :=
  dontUnfoldMe_eq .. -- fast

#count_heartbeats in -- 5 --> 617, 6 --> 1660, 10 --> 126204
example (a : Nat) (n : Int) : dontUnfoldMe (↑a) n 10 [] = 0 :=
  dontUnfoldMe_eq (↑a) n .. -- slower (~ 3^m)

#count_heartbeats in -- 29
example (a : Nat) (n : Int) : dontUnfoldMe (↑a) n 14 [] = 0 :=
  dontUnfoldMe_eq _ n .. -- fast

#count_heartbeats in -- 38
example (a : Nat) (n : Int) : dontUnfoldMe (↑a) n 5 [] = 0 :=
  dontUnfoldMe_eq (α := Int) a n .. -- fast

(Heartbeat counts in the web editor vary slightly.)

Michael Stoll (May 15 2025 at 15:06):

@Jovan Gerbscheid Did you check that there is no (significant) speed-up?

In the trace (example slightly varied by using types A and B in place of Nat and Int), I see

      [Meta.isDefEq] [54943086.000000] ✅️ dontUnfoldMe (↑a) n 14 [] = 0 =?= dontUnfoldMe ?m.750 n 14 [] = 0 ▶
    [Meta.isDefEq] [52116086.000000] ✅️ dontUnfoldMe ?m.750 n 14 [] = 0 =?= dontUnfoldMe (↑a) n 14 [] = 0 ▶

If (as I naïvely would assume) x =?= y and y =?= x are essentially the same problem, then the second unification problem above should be caught by the cache.

Michael Stoll (May 15 2025 at 19:04):

I have now created an issue: lean4#8364 .

Michael Stoll (May 15 2025 at 19:13):

Here is a related weirdness:

@[irreducible] def A : Type := Unit

@[irreducible] def B : Type := Unit

unseal B in
@[coe] def AtoB (_a : A) : B := ()

instance : Coe A B where coe := AtoB

def foo {α : Type} (a b : α) : Prop := sorry

theorem bar {α : Type} (a b : α) : foo a b := sorry

variable (a : A) (b : B)

#check (bar (↑a) b : foo (↑a) b) -- bar (↑a) b : foo (↑a) b

#check (bar (↑a) _ : foo (↑a) b) -- bar a ?m.242 : foo a ?m.242

#check (bar .. : foo (↑a) b) -- bar (↑a) b : foo (↑a) b

Mario Carneiro (May 15 2025 at 19:14):

(could you replace the defs with opaque?)

Mario Carneiro (May 15 2025 at 19:15):

I assume it's not unfolding but it's simpler without the extra moving part

Michael Stoll (May 15 2025 at 19:15):

You mean, opaque foo ...?

Michael Stoll (May 15 2025 at 19:16):

Then the middle one gives an error:

type mismatch
  bar a ?m.259
has type
  foo a ?m.259 : Prop
but is expected to have type
  foo (↑a) b : Prop

Mario Carneiro (May 15 2025 at 19:16):

ah, so the fact that foo is a constant function is relevant

Michael Stoll (May 15 2025 at 19:18):

Replacing the definition sorry by a = b gives the same kind of error (with def).

Michael Stoll (May 15 2025 at 19:19):

Still, I'm surprised by the non-monotonicity: giving no arguments is OK, specifying one is bad, but specifying both is OK again?

Mario Carneiro (May 15 2025 at 19:19):

what about bar _ _?

Michael Stoll (May 15 2025 at 19:19):

(Of course, this comes down to what I consider a bug in how terms ↑a are handled while elaborating...)

Michael Stoll (May 15 2025 at 19:20):

bar _ _ is (I think) by definition the same as bar .. and gives the same result.

Mario Carneiro (May 15 2025 at 19:21):

(it's not guaranteed to always act the same, e.g. apply has a similar relation to refine but it still knows less about the expected type and this can change elaboration order)

Jovan Gerbscheid (May 15 2025 at 21:33):

Michael Stoll said:

@Jovan Gerbscheid Did you check that there is no (significant) speed-up?

Yes, no speedup with my cache fix.

Michael Stoll (May 16 2025 at 09:29):

If the result of x =?= y is in the cache, does it also apply to y =?= x?
I've noticed that in the examples, Lean first tries to unify the types in one direction and then checks that what results matches by unifying in the other direction. So if the cache were "symmetric" in the sense above, your fix would give a speed-up by a factor of 2 (or 3 in cases where unification fails and another attempt is made to coerce one type into the other).

Jovan Gerbscheid (May 16 2025 at 09:43):

The defEq cache entries are sorted, so x =?= y and y =?= x are the same in the cache.

Michael Stoll (May 16 2025 at 09:44):

Then I wonder why you don't see the 2x speed-up...

Jovan Gerbscheid (May 16 2025 at 09:48):

The two are completely separate isDefEq calls. Thus, transient cache (with metavariables) can't be shared between them. But in fact even the normal cache currently isn't shared between different isDefEq calls (which I guess may be fixed some day).

Last updated: Feb 28 2026 at 14:05 UTC