Zulip Chat Archive

Stream: lean4

Topic: What is the point of `Vector`?

Jakub Nowak (Jan 15 2026 at 12:53):

Can someone explain to me, or show examples of, what is the benefit of using docs#Vector over docs#Array?

Instead of using

def foo : Vector α n → Vector α n := sorry

one could use

def foo' : Array α → Array α := sorry

@[simp, grind =]
theorem foo'_size : (foo xs).size = xs.size := by sorry

So I guess the first version is shorter and more concise. On the other hand if the proof of foo'_size is non-trivial, then this proof would have to be inlined inside of foo. IMO the second version, where the proof is separated, would make code more readable. Kinda benefit of Vector is that there's no risk of forgetting some missing verification API in the likeness of foo'_size.

If there's a need for it, one can use { xs : Array α // xs.size = n } instead of Vector α n. In fact, we could probably even replace whole Vector API with API around { xs : Array α // xs.size = n }. Afaik, at least in mathlib, we would prefer unbundled/curried version instead. It seems like the only benefit of Vector is that the name is shorter than subtype. Is Vector useful enough to justify its existence? (why is it useful at all remains a mystery to me)
As an example we use subtype here List.MergeSort.Internal.splitInTwo or here SimpleGraph.fintypeSubtypeWalkLength, without defining List.Vector or some kind of SimpleGraph.WalkWithLength.

I though that maybe Vector has different runtime semantics than Array (e.g. non-dynamic array with linear-time Vector.push), but I checked that it is not the case. Unless there's a plan to potentially change that?

I could imagine the situation where Vector might be hurtful. Someone starts using Vector in their code and it works well for some time. Until they realize they need to use .filter. Now, they either have to enter DTT hell, or refactor existing code to use Array instead of Vector.

One argument for Vector is that it makes easier proving some things about length in monadic context. I am unsure about how big currently is this "easier" factor, I think it's pretty big? But this arguments becomes obsolete if/when better support for proofs in monadic context lands in core (like mvcgen).

Maybe the main benefit of Vector is when Lean is used as a programming language and not theorem prover (which would explain why Vector is in core, but List.Vector isn't)? I still can't exactly imagine why, possibly because I never did much programming with dependant types.

Eric Wieser (Jan 15 2026 at 13:25):

Jakub Nowak said:

If there's a need for it, one can use { xs : Array α // xs.size = n } instead of Vector α n.

So is your question why we are not just using the following?

abbrev Vector (α n) := { xs : Array α // xs.size = n }

Eric Wieser (Jan 15 2026 at 13:26):

Because the answer is probably that a designated Structure means we get docs#Vector.toArray instead of docs#Subtype.val in theorem statements

Jakub Nowak (Jan 15 2026 at 13:39):

This does explain why we prefer structure over subtype, but doesn't explain why is it useful in the first place. As I mentioned, we usually prefer unbundled/curried version, when a better definition is not available.

Your argument is only about naming? This sounds pretty minor to me. I would say that ability to use dot-notation would be better argument.

My question is more about, why someone would want to use bundled version in the first place? And whether it's worthwhile enough to justify maintaining this API.

Eric Wieser (Jan 15 2026 at 13:43):

You often need a bundled version when working with m (Vector α n) for some monad m

Eric Wieser (Jan 15 2026 at 13:45):

mvcgen won't help you if you need the proof to pass into the next step

Théophile (Jan 15 2026 at 13:58):

It seems that the question belongs to the broader debate of doing proofs in the intrinsic style with dependent types (e.g. as in foo) or do proofs in the extrinsic style (e.g. with the pair foo' / foo'_size). I think folklore says that doing proofs in the intrinsic style is risky (as you mentioned it may quickly lead to DTT hell, but note that this is not just an issue with Vector, but with Subtypes in general), with for example with the classic issue of Vector a (n+m) vs Vector a (m+n).

In your example, the implementation of the function foo doesn't need to know that the array has length n, here using Vector is just a convenience to prove foo preserves the length at the same time we define it. In this case I think using foo' / foo'_size is more advisable. However, there are cases where we do need in the implementation of foo to know the length of the array, in this case using Vector (or Subtype or an additional parameter that contains proof of the length) is useful.

Théophile (Jan 15 2026 at 14:02):

Maybe a way to think about it is that Vector is useful in the same scenarios Fin is useful: note that Fin has all the same issues you mentioned with Vector

Jakub Nowak (Jan 15 2026 at 14:12):

Eric Wieser said:

mvcgen won't help you if you need the proof to pass into the next step

I would imagine that a better support for proofs and proof automation in monadic context would let you write something like this (I completely made-up syntax here):

variable {m : Type → Type} [Monad m] [LawfulMonad m]

monadic m
def foo : Array α → Array α := sorry

@[simp]
monadic m
theorem foo_size (xs : Array α) : (← foo xs).size = xs.size := by sorry

monadic m
def next_step_requiring_proof (xs : Array α) (h : xs.size = 42) := sorry

example (xs : Array α) : m Unit := do
  have : xs.size = 42 := by sorry
  let xs' ← foo xs
  next_step_requiring_proof xs' (by simp [xs'])

example (xs : Array α) : m Unit := do
  have : xs.size = 42 := by sorry
  next_step_requiring_proof (← foo xs) (by simp)

Jakub Nowak (Jan 15 2026 at 14:14):

But, I'm not denying, that with the current state of things, Vector can be useful in monadic context.

Jakub Nowak (Jan 15 2026 at 14:17):

Théophile said:

Maybe a way to think about it is that Vector is useful in the same scenarios Fin is useful: note that Fin has all the same issues you mentioned with Vector

In fact, even with Fin we prefer curried version: https://github.com/leanprover/lean4/commit/35bbb48916f75ee4abd1ce2d9af899fb11989f34

Théophile said:

However, there are cases where we do need in the implementation of foo to know the length of the array

Can you provide an example? Or rather, why cannot you use curried version in this case?

Théophile (Jan 15 2026 at 14:28):

Yes, of course you can always curry it. But like with any code, if your functions keep getting the same set of arguments, it is good practice to factorize and bundle them into a structure, therefore if you always have an hypothesis on the length of your array, it is also good practice to bundle it into a structure, which is Vector

Théophile (Jan 15 2026 at 14:30):

In the code I'm writing right now I have something that looks like this (it's a bit more complex but this is a simplified version):

class TupleLike (a b: Type) where
  n: Nat
  f: a → Vector b n
  g: Vector b n → a
  f_g_inv: ∀ x, f (g x) = x
  g_f_inv: ∀ x, g (f x) = x

Here, I believe Vector is useful

Théophile (Jan 15 2026 at 14:32):

Of course I could write it like this:

class TupleLike (a b: Type) where
  n: Nat
  f: a → { x: Array b // x.size = n }
  g: { x: Array b // x.size = n } → a
  f_g_inv: ∀ x, f (g x) = x
  g_f_inv: ∀ x, g (f x) = x

but is this an improvement?

Jakub Nowak (Jan 15 2026 at 14:39):

Why not:

class TupleLike (a b : Type) where
  f: a → Array b
  g: Array b → a
  f_g_inv: ∀ x, f (g x) = x
  g_f_inv: ∀ x, g (f x) = x

Théophile (Jan 15 2026 at 14:40):

Now I have trouble defining g for the instance TupleLike (a × a × a) a

Théophile (Jan 15 2026 at 14:41):

For example, what if the input array is empty and a = Empty? Such a function does not exists

Jakub Nowak (Jan 15 2026 at 14:48):

I see, the unbudled version indeed looks less readable

class TupleLike (a b : Type) where
  n : Nat
  f: a → Array b
  hf : ∀ x, (f x).size = n
  g: (xs : Array b) → (_ : xs.size = n) → a
  f_g_inv: ∀ xs, (h : xs.size = n) → f (g xs h) = xs
  g_f_inv: ∀ x, g (f x) (hf x) = x

I like this example.

I would still ask for some real-life code sample that uses Vector to get more reference though.

pandaman (Jan 15 2026 at 14:57):

Usually I like using Array, but I found that Vector is nicer when writing algorithms that involve only size-preserving primitives.
A typical example is sorting algorithms. Many sorting algorithms use only swapping, and since it returns Vector α n for the same n, I could reuse in-bounds proofs without reasoning about the size after swapping.
Another example is a data structure that uses fixed size arrays and only getElem/set.
https://github.com/pandaman64/lean-regex/blob/main/regex/Regex/Data/SparseSet/Basic.lean

Théophile (Jan 15 2026 at 15:06):

(note that this example is real-world, it's stemming from the code I'm writing this week)

another time I had a use for Vector was when I designed a framework for verified binary format parsers / serializers (link to the paper, see message_format_for in section 3.1)

In here, to construct MessageFormat (Array a), we do as follows (this is slightly simplified though):

construct MessageFormat (Vector a 0) from MessageFormat Unit
construct MessageFormat a → MessageFormat (Vector a n) → MessageFormat (Vector a (n+1))
by induction with the two above, construct MessageFormat a → (n: Nat) → MessageFormat (Vector a n)
then construct MessageFormat a → MessageFormat Nat → ((n: Nat) → MessageFormat (Vector a n)) → MessageFormat (Array a)
obtain MessageFormat a → MessageFormat (Array a) this way (we assume MessageFormat Nat exists somewhere)

This relies on the fact that Array a is isomorphic to the dependent pair n: Nat × Vector a n

It is of course possible to write it using Subtype (here currying doesn't work because we use it inside MessageFormat), but it would be much more ugly.

Jakub Nowak (Jan 15 2026 at 15:22):

Thanks, I'm convinced that it's not always possible/preferred to use curried version. I realized that I also had problems with curry/uncurry when programming in Haskell. So basically the reason to have Vector is that it's easier to use and shorter than Subtype, and this specific Subtype is somewhat common?

Jakub Nowak (Jan 15 2026 at 15:27):

Hm, it would be nice if Lean had some kind of syntax and special support, that would let us write Vector α _ to mean something like (n : Nat) × Vector α n. And if we additionally could make this second argument default to _, then we would use Vector α instead of Array α which would result in unified API.

Last updated: Feb 28 2026 at 14:05 UTC