Zulip Chat Archive
Stream: mathlib4
Topic: eval matrix notation for the end entries is really slow
Bhavik Mehta (Feb 13 2026 at 14:45):
Here's my example:
import Mathlib
def myVec : Fin 30 → Nat :=
  ![0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
    10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
    20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
set_option trace.profiler true
#eval myVec 24
On the playground, 24 took 10ish seconds for me. I did the same thing for the other entries, and here's a plot of the results.
[plot: time of #eval myVec i for each index i, growing roughly exponentially]
That's nearly 4 minutes to evaluate myVec 29. Note that reducing these is relatively fast. Now this notation presumably wasn't designed for good evaluation, but this seems exponentially bad.
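For reference, the contrast between reduction and compiled evaluation can be reproduced like this (a sketch, to be run in the same file as the myVec definition above):

```lean
set_option trace.profiler true

-- Kernel reduction: relatively fast, as noted above.
#reduce myVec 29

-- Compiled evaluation: the slow path this thread is about.
#eval myVec 29
```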
Eric Wieser (Feb 13 2026 at 15:59):
This seems surprising, since ultimately this should compile to a linear chain of ifs
Eric Wieser (Feb 13 2026 at 16:19):
Does putting inline or specialize on src#Matrix.vecCons help?
Eric Wieser (Feb 13 2026 at 16:25):
There is a -- FIXME: Performance review in core for Fin.induction
Henrik Böving (Feb 13 2026 at 16:57):
After a bit of simplification we arrive at:
let _x.32 := @Matrix.vecEmpty _;
let _x.33 := @Matrix.vecCons _ _x.3 _x.2 _x.32;
let _x.34 := @Matrix.vecCons _ _x.5 _x.4 _x.33;
let _x.35 := @Matrix.vecCons _ _x.7 _x.6 _x.34;
let _x.36 := @Matrix.vecCons _ _x.9 _x.8 _x.35;
let _x.37 := @Matrix.vecCons _ _x.11 _x.10 _x.36;
let _x.38 := @Matrix.vecCons _ _x.13 _x.12 _x.37;
let _x.39 := @Matrix.vecCons _ _x.15 _x.14 _x.38;
let _x.40 := @Matrix.vecCons _ _x.17 _x.16 _x.39;
let _x.41 := @Matrix.vecCons _ _x.19 _x.18 _x.40;
let _x.42 := @Matrix.vecCons _ _x.21 _x.20 _x.41;
let _x.43 := @Matrix.vecCons _ _x.23 _x.22 _x.42;
let _x.44 := @Matrix.vecCons _ _x.25 _x.24 _x.43;
let _x.45 := @Matrix.vecCons _ _x.27 _x.26 _x.44;
let _x.46 := @Matrix.vecCons _ _x.29 _x.28 _x.45;
let _x.47 := @Matrix.vecCons _ _x.31 _x.30 _x.46;
let _x.48 := @Matrix.vecCons _ _x.30 _x.31 _x.47;
let _x.49 := @Matrix.vecCons _ _x.28 _x.29 _x.48;
let _x.50 := @Matrix.vecCons _ _x.26 _x.27 _x.49;
let _x.51 := @Matrix.vecCons _ _x.24 _x.25 _x.50;
let _x.52 := @Matrix.vecCons _ _x.22 _x.23 _x.51;
let _x.53 := @Matrix.vecCons _ _x.20 _x.21 _x.52;
let _x.54 := @Matrix.vecCons _ _x.18 _x.19 _x.53;
let _x.55 := @Matrix.vecCons _ _x.16 _x.17 _x.54;
let _x.56 := @Matrix.vecCons _ _x.14 _x.15 _x.55;
let _x.57 := @Matrix.vecCons _ _x.12 _x.13 _x.56;
let _x.58 := @Matrix.vecCons _ _x.10 _x.11 _x.57;
let _x.59 := @Matrix.vecCons _ _x.8 _x.9 _x.58;
let _x.60 := @Matrix.vecCons _ _x.6 _x.7 _x.59;
let _x.61 := @Matrix.vecCons _ _x.4 _x.5 _x.60;
let _x.62 := @Fin.cases _x.2 _ _x.3 _x.61 a.1;
And under the hood each of these Matrix.vecCons calls is going to allocate a closure that calls Fin.cases as well. Now Fin.cases is implemented in terms of Fin.induction, which is implemented like an ordinary recursor. This means Fin.cases will traverse down the entire successor chain every time it is called, despite only needing to know whether the value is a succ or a zero.
What happens now is that, due to this funny closuring etc., each of these steps in turn spawns a separate computation that also traverses down the remaining successor chain in the same fashion, so we get an exponential computation tree.
If Fin.cases were defined as just a pattern match on the Fin instead of via Fin.induction, this would basically just work.
TLDR: Don't compile your recursors or you end up in situations like this :upside_down:
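A match-based Fin.cases along these lines might look like the following sketch (casesFast is a hypothetical name; the zero branch uses a Fin.mk_zero rewrite to fix up the ⟨0, _⟩ literal, as Fin.induction does internally):

```lean
-- Sketch: a single pattern match on the underlying Nat, so choosing the
-- zero or succ case is constant-time instead of a recursion down the chain.
def casesFast {n : Nat} {motive : Fin (n + 1) → Sort _}
    (zero : motive 0) (succ : ∀ i : Fin n, motive i.succ) :
    ∀ i : Fin (n + 1), motive i
  | ⟨0, _⟩ => by rwa [Fin.mk_zero]  -- fix up `⟨0, _⟩` vs the `0` literal
  | ⟨i + 1, hi⟩ => succ ⟨i, Nat.lt_of_succ_lt_succ hi⟩
```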
Eric Wieser (Feb 13 2026 at 17:00):
But docs#Fin.induction is just defined as a pattern match?
Eric Wieser (Feb 13 2026 at 17:03):
I tried making cases and vecCons always_inline, but I wasn't sure if that was better than macro_inline or specialize
Henrik Böving (Feb 13 2026 at 17:04):
Yes, it is defined as the pattern match you would get when compiling the recursor for Fin using the mathlib infrastructure. And that pattern match is at fault.
Fin.cases is
induction zero fun i _ => succ i
and the author here expected that the _ argument, which provides the result of calling the induction recursively on i, would not be computed. But due to the fact that induction is defined as:
@[elab_as_elim, expose] def induction {motive : Fin (n + 1) → Sort _} (zero : motive 0)
(succ : ∀ i : Fin n, motive (castSucc i) → motive i.succ) :
∀ i : Fin (n + 1), motive i
| ⟨i, hi⟩ => go i hi
where
-- Use a curried function so that this is structurally recursive
go : ∀ (i : Nat) (hi : i < n + 1), motive ⟨i, hi⟩
| 0, hi => by rwa [Fin.mk_zero]
| i+1, hi => succ ⟨i, Nat.lt_of_succ_lt_succ hi⟩ (go i (Nat.lt_of_succ_lt hi))
where the result of go is always used, so the recursion cannot be optimized away unless you specialize go for the specific succ function. But that is just pointless code blowup, really. Instead, what should be done is to define cases as just a pattern match, and it will work.
Eric Wieser (Feb 13 2026 at 17:07):
unless you specialize go for the specific succ function. But that is just pointless code blowup really
Is it? Is there a situation in which we don't want to specialize this recursor?
Eric Wieser (Feb 13 2026 at 17:07):
I agree that this is overkill vs a pattern match, but independently it seems like a reasonable idea.
Eric Wieser (Feb 13 2026 at 17:08):
As an aside, I suspect it would be better not to redefine cases (which would break defeqs), and instead set up a csimp lemma that expands it to the version defined with a match
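The csimp route might be sketched as follows (hypothetical names, and the proof is only a sketch that may need adjustment; the shape is: keep Fin.cases untouched so defeqs survive, and swap only its compiled implementation):

```lean
-- Match-based implementation, intended only for the compiler.
private def casesImpl {n : Nat} {motive : Fin (n + 1) → Sort _}
    (zero : motive 0) (succ : ∀ i : Fin n, motive i.succ) :
    ∀ i : Fin (n + 1), motive i
  | ⟨0, _⟩ => by rwa [Fin.mk_zero]
  | ⟨i + 1, hi⟩ => succ ⟨i, Nat.lt_of_succ_lt_succ hi⟩

-- `csimp` tells the compiler to replace `Fin.cases` by `casesImpl`
-- without changing the definition the elaborator sees.
@[csimp] theorem cases_eq_casesImpl : @Fin.cases = @casesImpl := by
  funext n motive zero succ i
  rcases i with ⟨_ | i, hi⟩ <;> rfl
```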
Henrik Böving (Feb 13 2026 at 17:14):
Eric Wieser said:
unless you specialize go for the specific succ function. But that is just pointless code blowup really
Is it? Is there a situation in which we don't want to specialize this recursor?
I think if you want to do it, induction and cases should be marked inline and go as specialize; then it would generate somewhat reasonable code.
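Spelled out on the definition quoted earlier, the suggested annotations would look like this (a sketch; whether this combination actually yields good code would need benchmarking):

```lean
-- Inline the wrappers; specialize the recursive worker on its `succ` argument.
@[elab_as_elim, expose, inline] def induction {motive : Fin (n + 1) → Sort _}
    (zero : motive 0)
    (succ : ∀ i : Fin n, motive (castSucc i) → motive i.succ) :
    ∀ i : Fin (n + 1), motive i
  | ⟨i, hi⟩ => go i hi
where
  -- Use a curried function so that this is structurally recursive
  @[specialize] go : ∀ (i : Nat) (hi : i < n + 1), motive ⟨i, hi⟩
    | 0, hi => by rwa [Fin.mk_zero]
    | i+1, hi => succ ⟨i, Nat.lt_of_succ_lt_succ hi⟩ (go i (Nat.lt_of_succ_lt hi))
```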
Henrik Böving (Feb 13 2026 at 17:23):
Note that this does of course rely on the optimizer discovering that the recursive argument is unused. So while it's going to work out pretty well for Fin.cases, specialization won't fix all explosive situations for Fin.induction
Miyahara Kō (Feb 13 2026 at 17:41):
Drawing from what I learned through my experience with #20519, #20538 & #12610, let me say that function types (e.g. Fin n → α, Matrix, Equiv.Perm), unlike inductive types (e.g. List), are fundamentally not suited for computations beyond a certain level of complexity.
The reason is that inductive types get "reduced" during computation, whereas function types do not.
Miyahara Kō (Feb 13 2026 at 17:44):
To give a concrete example, once the n-th value of l : List has been computed, no further computation is required to retrieve the n-th element again.
Miyahara Kō (Feb 13 2026 at 17:46):
However, with v : Fin m → α, even if the n-th value has been computed once, whenever that n-th value is needed again, the same computation generally has to be performed all over again.
Miyahara Kō (Feb 13 2026 at 17:55):
This phenomenon leads to an exponential increase in computational cost.
For example, suppose each component of v₁ : Fin n → α is computed based on all the components of v₀ : Fin n → α. Then every time a component of v₁ is called, the components of v₀ are called n times.
Next, if each component of v₂ : Fin n → α is similarly computed based on all the components of v₁ : Fin n → α, then every time a component of v₂ is called, v₁ is called n times, and v₀ is called n² times. The rest, as you can imagine, follows accordingly. (cont.)
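The blowup described here can be reproduced in a few lines (a toy sketch; base and layer are made-up names):

```lean
import Mathlib

-- Each layer reads every component of the previous one, so evaluating one
-- component at depth d re-runs `base` 8^d times: nothing is memoized.
def base : Fin 8 → Nat := fun i => i.val

def layer (v : Fin 8 → Nat) : Fin 8 → Nat :=
  fun _ => (List.finRange 8).foldl (fun acc i => acc + v i) 0

-- Every extra `layer` multiplies the evaluation work by 8.
#eval layer (layer (layer base)) 0
```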
Miyahara Kō (Feb 13 2026 at 18:08):
For what it's worth, this phenomenon can be resolved with the following workarounds.
- After computing v₁ from v₀, save the result to a List or similar structure by setting v₁' := (Vector.ofFn v₁).get
- Devise the algorithm so that when computing the components of v₄, instead of directly calling the components of v₃, v₂, v₁, it calls the components of v₀ directly (which is what I did in #20519, #20538 & #12610)
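The first workaround can be packaged as a small helper, following the (Vector.ofFn v₁).get recipe above (a sketch; memoize is a hypothetical name):

```lean
-- Evaluate every component once, store the results, and read from the copy.
-- Note: bind the result (`let v' := memoize v`) and reuse `v'`; calling
-- `memoize v i` repeatedly would rebuild the Vector each time.
def memoize {n : Nat} {α : Type*} (v : Fin n → α) : Fin n → α :=
  (Vector.ofFn v).get
```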
Miyahara Kō (Feb 13 2026 at 18:21):
However, dealing with this problem will undoubtedly introduce extra considerations during computation and make definitions more complex. This also does not align with the spirit of Mathlib, which determines the appropriateness of definitions based on how mathematically tractable they are, without considering computational efficiency. Therefore, I would generally recommend against using function types for data types that require moderately complex computations involving concrete values.
Incidentally, I personally refer to this problem as the "function closure issue".
Miyahara Kō (Feb 13 2026 at 18:23):
(These replies were translated from Japanese by Claude. Please bear with us if there are any awkward translations.)
Yury G. Kudryashov (Feb 14 2026 at 00:12):
If we want to be computationally efficient, then we may want to add a FinMatrix storing its values in a Vector of Vectors or in a single Vector of size m * n.
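Such a FinMatrix might be sketched like this (hypothetical; getD with a default value sidesteps the index-bounds proof that a real version would carry):

```lean
-- Entries stored contiguously, row-major, in a flat Array of size m * n.
structure FinMatrix (m n : Nat) (α : Type) where
  data : Array α
  size_eq : data.size = m * n

-- Constant-time lookup; a real version would prove `i * n + j < data.size`
-- instead of falling back to `default`.
def FinMatrix.get {m n : Nat} {α : Type} [Inhabited α]
    (A : FinMatrix m n α) (i : Fin m) (j : Fin n) : α :=
  A.data.getD (i.val * n + j.val) default
```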
Yury G. Kudryashov (Feb 14 2026 at 00:13):
(or replace Matrix with it, if we don't care about other index types - I rarely use this part of the library, so I don't know)
Julian Berman (Feb 14 2026 at 00:18):
At NYC Lean I think we were pleasantly surprised when it finally dawned on us why Matrix allowed for other index types: it made our proof of Graham-Pollak easier to have matrices indexed by our arbitrary vertex set V.
Yury G. Kudryashov (Feb 14 2026 at 00:19):
:man_facepalming: Of course, we need arbitrary index types for the incidence matrix of a graph.
Yury G. Kudryashov (Feb 14 2026 at 00:21):
Another option is to make ![a, b] a notation for Vector.get #[a, b], though this may lead to a different kind of natural API.
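Approximating the idea with a plain Array (a sketch; a real implementation would presumably use a size-indexed vector so that no fallback value is needed):

```lean
-- Lookup is a constant-time array access instead of a chain of Fin.cases.
def myVec' : Fin 3 → Nat := fun i => #[10, 20, 30].getD i.val 0
```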
Last updated: Feb 28 2026 at 14:05 UTC