Zulip Chat Archive
Stream: lean4
Topic: Coercion and unification
Michael Stoll (May 11 2025 at 18:02):
(See and the other thread linked there for context. I'm moving the discussion here since in the end it is about Lean's unification algorithm.)
Consider the following example (which uses Mathlib; this is how I came across it while trying to formalize some innoccuous proof, but the underlying problem is not Mathlib-specific):
[EDIT: fixed a typo]
import Mathlib.Data.Complex.Norm
variable (x : β) (z : β)
#count_heartbeats in -- Used 391 heartbeats
-- Try this: exact norm_add_le z βx
example : βz + xβ β€ βzβ + β(x : β)β := by exact?
#count_heartbeats in -- Used 326941 heartbeats
-- Try this: exact norm_add_le (βx) z
example : βx + zβ β€ β(x : β)β + βzβ := by exact?
#count_heartbeats in -- Used 245 heartbeats
#check (norm_add_le z (βx) : βz + xβ β€ βzβ + β(x : β)β)
#count_heartbeats in -- Used 1149716 heartbeats
#check (norm_add_le (βx) z : βx + zβ β€ β(x : β)β + βzβ)
What is going on here?
In my (admittedly limited) understanding gained from staring at traces of similar examples and the comments by @Jovan Gerbscheid in the other thread, the situation is this.
- In the first
#check, whennorm_add_le z (βx)is elaborated (with expected type as given) and we get toβx, the expected type of that isComplexbecause the type of the arguments innorm_add_lehas already been fixed by the first argumentz. This causes Lean to look for a coercion fromRealtoComplex, which it duly finds, and from there, everything proceeds smoothly. - In the second
#checkwe look atβxbefore we know its expected type (which is soon after fixed by looking atz). This has the effect that when Lean tries to unify the type ofnorm_add_le (βx) zwith the given type,βxhas been replaced by an unassignable metavariable?m.abc. At some point, we try to unifyββxβ, which internally isβ?m.abcβ, withβComplex.ofReal xβ. Since?m.abcis unassignable, this is bound to fail, but before it does so, Lean tries very hard to succeed by successively unfolding the complex norm (which involves docs#Real.sqrt, which in turn is defined in terms of docs#NNReal.sqrt, which involves docs#NNReal.powOrderIso etc. etc.) until the bitter end, which is reached after a few 10β΅ heartbeats. And then it does the same thing again with switched sides. And then it tries to find a coercion of the type ofnorm_add_le ...to the given type, which involves going down the same rabbit holes another time. Only after all of this does it go back and elaborateβxagain with its expected typeComplex, which finally leads to success.
So my question is, Can this inefficiency be avoided?
A naΓ―ve suggestion: when a unification problem involves an unassignable metavariable that stands for a coerced term and has known expected type, first elaborate the original coerced term with this expected type to fix the coercion and thus get rid of the metavariable. Or perhaps even do the latter as soon as the expected type becomes known. (So in the second #check above, this would occur after we have processed z in norm_add_le (βx) z, because at that point the metavariable that stands for the type of βx has been assigned to Complex, and then it would work essentially like in the first #check.)
Michael Stoll (May 11 2025 at 19:59):
That unfolding the norm is related to the huge timing differences can be seen by sealing docs#Real.sqrt in the example:
import Mathlib.Data.Complex.Norm
variable (x : β) (z : β)
seal Real.sqrt
#count_heartbeats in -- Used 246 heartbeats
-- Try this: exact norm_add_le z βx
example : βz + xβ β€ βzβ + β(x : β)β := by exact?
#count_heartbeats in -- Used 689 heartbeats
-- Try this: exact norm_add_le (βx) z
example : βx + zβ β€ β(x : β)β + βzβ := by exact?
#count_heartbeats in -- Used 144 heartbeats
#check (norm_add_le z (βx) : βz + xβ β€ βzβ + β(x : β)β)
#count_heartbeats in -- Used 1338 heartbeats
#check (norm_add_le (βx) z : βx + zβ β€ β(x : β)β + βzβ)
(The first heartbeats count is higher the first time the text is processed in the web editor; this is probably because the cache for exact?/apply? needs to be initialized.)
Michael Stoll (May 12 2025 at 10:03):
Here is a Mathlib-free (except for importing #count_heartbeats) example:
import Mathlib.Util.CountHeartbeats -- just for counting heartbeats
@[irreducible] def A : Type := Unit
@[irreducible] def B : Type := Unit
unseal B in
@[coe] def AtoB (_a : A) : B := ()
instance : Coe A B where coe := AtoB
def h {Ξ± : Type} (a b : Ξ±) : Nat β Ξ±
| 0 => a
| n + 1 => h b a n
def f {Ξ± : Type} (a b : Ξ±) : Nat β Prop
| 0 => a = b
| n + 1 => f (h a b n) (h b a n) n β§ f (h a a n) (h b b n) n
def p : Nat := 6 -- last `#check` takes time exponential in `p`
theorem foo {Ξ± : Type} (a b : Ξ±) : f a b p := sorry
variable (x : A) (y : B)
#count_heartbeats in -- Used 36 heartbeats
#check (foo (x : B) y : f (AtoB x) y p)
#count_heartbeats in -- Used 31 heartbeats
#check foo (βx) y
#count_heartbeats in -- Used 35 heartbeats
#check (foo y (βx) : f y (AtoB x) p)
#count_heartbeats in -- Used 143388 heartbeats
#check (foo (βx) y : f (AtoB x) y p)
This simulates a complicated-to-unfold definition (with quick whnf); this is essentially the same situation as with the norm on the complex numbers in the original example. (The last number of heartbeats varies slightly between the web editor and my own machine, possibly because of slightly different lean versions.)
Michael Stoll (May 14 2025 at 11:34):
There is a similar phenomenon in lean4#2831, but there it shows up with a * (βb).
Michael Stoll (May 14 2025 at 11:34):
Note also
#count_heartbeats in -- Used 35 heartbeats
#check ((foo (βx) y :) : f (AtoB x) y p)
(in the same context as above).
Jovan Gerbscheid (May 14 2025 at 12:02):
The problem in lean#2831 isn't solved by my fix for unification slowness. Here, the exponentially many unfication sub-problems are really all different, so this can't be fixed with better caching.
Michael Stoll (May 14 2025 at 13:09):
It would be interesting to try this with the "real life" Mathlib example involving complex norms. (I should also be able to modify the Mathlib-free example to genuinely require exponentially many sub-problems.)
Jovan Gerbscheid (May 14 2025 at 14:37):
Here are the heartbeat counts I get on the original example with Mathlib
import Mathlib.Data.Complex.Norm
variable (x : β) (z : β)
#count_heartbeats in -- Used 394 heartbeats
-- Try this: exact norm_add_le z βx
example : βz + xβ β€ βzβ + β(x : β)β := by exact?
#count_heartbeats in -- Used 598 heartbeats
-- Try this: exact norm_add_le (βx) z
example : βx + zβ β€ β(x : β)β + βzβ := by exact?
#count_heartbeats in -- Used 246 heartbeats
#check (norm_add_le z (βx) : βz + xβ β€ βzβ + β(x : β)β)
#count_heartbeats in -- Used 662 heartbeats
#check (norm_add_le (βx) z : βx + zβ β€ β(x : β)β + βzβ)
Michael Stoll (May 14 2025 at 14:38):
OK, thanks!
This may indicate that your proposed fix to the caching may go a long way to mitigate the problem in practice.
Michael Stoll (May 14 2025 at 15:21):
@Mario Carneiro @Kyle Miller Should I extend lean4#2831 to include this problem or make a new issue that links to lean4#2831?
Michael Stoll (May 14 2025 at 15:35):
Here is an example that builds on the one in lean4#2831 (Mathlib-free except for #count_heartbeats ). This will not significantly sped up by @Jovan Gerbscheid's suggested fix to caching in isDefEq, I assume.
import Mathlib.Util.CountHeartbeats -- just for counting heartbeats
def dontUnfoldMe {Ξ± : Type} (x y : Ξ±) : Nat β List Bool β Nat
| 0, _ => 0
| n+1, l => dontUnfoldMe x y n (true::l) + dontUnfoldMe x y n (false::l)
theorem dontUnfoldMe_eq {Ξ± : Type} (x y : Ξ±) : β n l, dontUnfoldMe x y n l = 0
| 0, _ => rfl
| n+1, l => by rw [dontUnfoldMe, dontUnfoldMe_eq x y n, dontUnfoldMe_eq x y n]
#count_heartbeats in -- Used 40 heartbeats
example (a : Int) (n : Nat) : dontUnfoldMe a n 14 [] = 0 :=
dontUnfoldMe_eq a n 14 [] -- fast
#count_heartbeats in -- 12 --> 20776, 13 --> 41567, 14 --> 83212 heartbeats
example (a : Nat) (n : Int) : dontUnfoldMe (βa) n 12 [] = 0 :=
dontUnfoldMe_eq (βa) n 12 [] -- slow (~ 2^m)
#count_heartbeats in -- 40
example (a : Nat) (n : Int) : dontUnfoldMe (βa) n 14 [] = 0 :=
(dontUnfoldMe_eq (βa) n 14 [] :) -- fast
#count_heartbeats in -- Used 28 heartbeats
example (a : Nat) (n : Int) : dontUnfoldMe (βa) n 14 [] = 0 :=
dontUnfoldMe_eq .. -- fast
#count_heartbeats in -- 5 --> 617, 6 --> 1660, 10 --> 126204
example (a : Nat) (n : Int) : dontUnfoldMe (βa) n 10 [] = 0 :=
dontUnfoldMe_eq (βa) n .. -- slower (~ 3^m)
#count_heartbeats in -- 29
example (a : Nat) (n : Int) : dontUnfoldMe (βa) n 14 [] = 0 :=
dontUnfoldMe_eq _ n .. -- fast
#count_heartbeats in -- 38
example (a : Nat) (n : Int) : dontUnfoldMe (βa) n 5 [] = 0 :=
dontUnfoldMe_eq (Ξ± := Int) a n .. -- fast
(Heartbeat counts in the web editor vary slightly.)
Michael Stoll (May 15 2025 at 15:06):
@Jovan Gerbscheid Did you check that there is no (significant) speed-up?
In the trace (example slightly varied by using types A and B in place of Nat and Int), I see
[Meta.isDefEq] [54943086.000000] β
οΈ dontUnfoldMe (βa) n 14 [] = 0 =?= dontUnfoldMe ?m.750 n 14 [] = 0 βΆ
[Meta.isDefEq] [52116086.000000] β
οΈ dontUnfoldMe ?m.750 n 14 [] = 0 =?= dontUnfoldMe (βa) n 14 [] = 0 βΆ
If (as I naΓ―vely would assume) x =?= y and y =?= x are essentially the same problem, then the second unification problem above should be caught by the cache.
Michael Stoll (May 15 2025 at 19:04):
I have now created an issue: lean4#8364 .
Michael Stoll (May 15 2025 at 19:13):
Here is a related weirdness:
@[irreducible] def A : Type := Unit
@[irreducible] def B : Type := Unit
unseal B in
@[coe] def AtoB (_a : A) : B := ()
instance : Coe A B where coe := AtoB
def foo {Ξ± : Type} (a b : Ξ±) : Prop := sorry
theorem bar {Ξ± : Type} (a b : Ξ±) : foo a b := sorry
variable (a : A) (b : B)
#check (bar (βa) b : foo (βa) b) -- bar (βa) b : foo (βa) b
#check (bar (βa) _ : foo (βa) b) -- bar a ?m.242 : foo a ?m.242
#check (bar .. : foo (βa) b) -- bar (βa) b : foo (βa) b
Mario Carneiro (May 15 2025 at 19:14):
(could you replace the defs with opaque?)
Mario Carneiro (May 15 2025 at 19:15):
I assume it's not unfolding but it's simpler without the extra moving part
Michael Stoll (May 15 2025 at 19:15):
You mean, opaque foo ...?
Michael Stoll (May 15 2025 at 19:16):
Then the middle one gives an error:
type mismatch
bar a ?m.259
has type
foo a ?m.259 : Prop
but is expected to have type
foo (βa) b : Prop
Mario Carneiro (May 15 2025 at 19:16):
ah, so the fact that foo is a constant function is relevant
Michael Stoll (May 15 2025 at 19:18):
Replacing the definition sorry by a = b gives the same kind of error (with def).
Michael Stoll (May 15 2025 at 19:19):
Still, I'm surprised by the non-monotonicity: giving no arguments is OK, specifying one is bad, but specifying both is OK again?
Mario Carneiro (May 15 2025 at 19:19):
what about bar _ _?
Michael Stoll (May 15 2025 at 19:19):
(Of course, this comes down to what I consider a bug in how terms βa are handled while elaborating...)
Michael Stoll (May 15 2025 at 19:20):
bar _ _ is (I think) by definition the same as bar .. and gives the same result.
Mario Carneiro (May 15 2025 at 19:21):
(it's not guaranteed to always act the same, e.g. apply has a similar relation to refine but it still knows less about the expected type and this can change elaboration order)
Jovan Gerbscheid (May 15 2025 at 21:33):
Michael Stoll said:
@Jovan Gerbscheid Did you check that there is no (significant) speed-up?
Yes, no speedup with my cache fix.
Michael Stoll (May 16 2025 at 09:29):
If the result of x =?= y is in the cache, does it also apply to y =?= x?
I've noticed that in the examples, Lean first tries to unify the types in one direction and then checks that what results matches by unifying in the other direction. So if the cache were "symmetric" in the sense above, your fix would give a speed-up by a factor of 2 (or 3 in cases where unification fails and another attempt is made to coerce one type into the other).
Jovan Gerbscheid (May 16 2025 at 09:43):
The defEq cache entries are sorted, so x =?= y and y =?= x are the same in the cache.
Michael Stoll (May 16 2025 at 09:44):
Then I wonder why you don't see the 2x speed-up...
Jovan Gerbscheid (May 16 2025 at 09:48):
The two are completely separate isDefEq calls. Thus, transient cache (with metavariables) can't be shared between them. But in fact even the normal cache currently isn't shared between different isDefEq calls (which I guess may be fixed some day).
Last updated: Dec 20 2025 at 21:32 UTC