Zulip Chat Archive
Stream: new members
Topic: The Novice vs Indexed Types in Lean: A Short Story
Middle Adjunction (Oct 22 2024 at 19:54):
Vectors, lists indexed by their size, are the "Hello World" of dependently-typed functional programming. Defining vectors of natural numbers as an indexed inductive type was straightforward for the novice:
inductive NatVec : Nat → Type where
  | nil : NatVec 0
  | cons : Nat → NatVec n → NatVec (1 + n)

open NatVec
One of the basic operations on (indexed) lists is appending. The novice's first attempt was to define append like this:
def append (xs : NatVec n) (ys : NatVec m) : NatVec (n + m) :=
  match xs with
  | nil => by
    simp [ Nat.zero_add ]
    exact ys
  | cons x xs' => by
    simp [ Nat.add_assoc ]
    exact cons x (append xs' ys)
However, the novice was soon informed that tactics should not be used to define executable code like append. So, the novice tried again:
def append' (xs : NatVec n) (ys : NatVec m) : NatVec (n + m) :=
  match xs with
  | nil =>
    have p : m = 0 + m := by simp [ Nat.zero_add ]
    p ▸ ys
  | cons (n := n') x xs' =>
    have p : 1 + (n' + m) = (1 + n') + m := by simp [ Nat.add_assoc ]
    p ▸ (cons x (append xs' ys))
While the novice was pleased with this accomplishment, the novice's friends shared their Agda equivalent:
_++_ : Vec A m → Vec A n → Vec A (m + n)
[] ++ ys = ys
(x ∷ xs) ++ ys = x ∷ (xs ++ ys)
The novice reminded the friends that Lean's verbosity was the price of its fine control over unfolding and stepwise reasoning. The novice also pointed out that with a better encoding of the datatype, like swapping 1 + n to n + 1 in cons, the issue goes away, and that the issue is not particularly unique to Lean. However, after writing a few more definitions involving vectors, the novice's code began to feel burdensome:
- tactics required explicit type annotations,
- these type annotations required naming the implicit index variables, and
- worst of all, the proofs were trivial facts about natural numbers, unrelated to the "theory of vectors".
So, the novice tried tagging every relevant definition with @[inline], but failed (e.g., @[inline] is not allowed on recursive definitions); attempted using coercions, and failed; and of course failed to use quotients to work "up to" certain equivalences of natural numbers.
Dear experienced Lean users,
all the novice wanted was to write code where certain equalities definitionally hold. What do you suggest?
Regards,
The Novice
Edward van de Meent (Oct 22 2024 at 20:05):
i'm not sure it matters a lot, but it might help down the line if you use n+1 rather than 1+n, because i believe lean recognises this as a common way to express n.succ.
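(Indeed, with the built-in addition, n + 1 reduces to Nat.succ n definitionally:)

example (n : Nat) : n + 1 = n.succ := rfl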
Kyle Miller (Oct 22 2024 at 20:07):
This is entirely due to Agda recursing on a different argument in the definition of Nat.add. This works, for example:
inductive NatVec : Nat → Type where
  | nil : NatVec 0
  | cons : Nat → NatVec n → NatVec (n + 1)

open NatVec

def Nat.add' : Nat → Nat → Nat
  | 0, n => n
  | m + 1, n => Nat.add' m n + 1

def append' (xs : NatVec n) (ys : NatVec m) : NatVec (Nat.add' n m) :=
  match xs with
  | nil => ys
  | cons (n := n') x xs' => cons x (append' xs' ys)
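(A small check of why this needs no rewriting: the defining equations of Nat.add' hold definitionally, which is exactly what the two branches of append' ask for:)

example (n : Nat) : Nat.add' 0 n = n := rfl
example (m n : Nat) : Nat.add' (m + 1) n = Nat.add' m n + 1 := rfl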
Kyle Miller (Oct 22 2024 at 20:09):
Actually I'm confused, https://agda.github.io/agda-stdlib/v2.1.1/Data.Nat.Base.html#5411 suggests that the argument-swapped addition has special syntax.
Kyle Miller (Oct 22 2024 at 20:09):
In any case, this works, assuming the 1 + n to n + 1 fix:
def append' (xs : NatVec n) (ys : NatVec m) : NatVec (m + n) :=
  match xs with
  | nil => ys
  | cons (n := n') x xs' => cons x (append' xs' ys)
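(This one goes through because, with recursion on the second argument of the built-in +, both index equalities needed by the branches already hold by rfl:)

example (m : Nat) : m + 0 = m := rfl
example (m n : Nat) : m + (n + 1) = (m + n) + 1 := rfl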
Edward van de Meent (Oct 22 2024 at 20:11):
ah, the induction for (. + .) is on the right argument, while for append, it's on the left argument...
Kyle Miller (Oct 22 2024 at 20:12):
Yeah, Agda has exactly the same issues as Lean when it comes to definitional equalities and addition. It's just about either being clever about what you induct/recurse on, or deciding to be OK with doing rewrites
Middle Adjunction (Oct 22 2024 at 20:16):
The key issue, the novice wants to stress, is that in Agda, unlike Lean, definitions are unfolded by default. Note that recursive functions cannot even be tagged with @[inline] in Lean. Moreover, there, {-# REWRITE ... #-} can save the day. Of course, in simpler cases 1 + n vs n + 1 helps.
Kyle Miller (Oct 22 2024 at 20:18):
This isn't the issue, the definitions are in fact unfolded. It's just not defeq.
Kyle Miller (Oct 22 2024 at 20:19):
Unless REWRITE can automatically insert that 1 + n = n + 1? That's not something Lean can do.
Kyle Miller (Oct 22 2024 at 20:20):
(@[inline] isn't what marks things as more unfoldable; that's @[reducible]. The @[inline] attribute marks things for the compiler to inline as an optimization. Still, the default for definitions is semireducible, which unfolds automatically in defeq checks.)
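(For instance, a plain definition with no attribute at all still unfolds during a defeq check — a tiny illustration with a made-up name:)

def three : Nat := 3
example : three = 2 + 1 := rfl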
Edward van de Meent (Oct 22 2024 at 20:20):
looking at the way this is done in mathlib, i guess hijacking List for Vector a n := {l : List a // l.length = n} does simplify all this, because in lean proofs of the same thing are equal...
Edward van de Meent (Oct 22 2024 at 20:20):
it just requires you have the api for List.length
Edward van de Meent (Oct 22 2024 at 20:21):
for reference, Mathlib.Vector
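(A minimal sketch of that encoding, with illustrative names rather than the Mathlib definitions — the point being that the length constraint lives in a Prop field, so proof irrelevance takes care of it:)

def ListVec (n : Nat) := {l : List Nat // l.length = n}

def ListVec.append (xs : ListVec n) (ys : ListVec m) : ListVec (n + m) :=
  ⟨xs.val ++ ys.val, by simp [List.length_append, xs.property, ys.property]⟩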
Kyle Miller (Oct 22 2024 at 20:23):
Here's the Agda page on REWRITE: https://agda.readthedocs.io/en/latest/language/rewriting.html
Presumably this is why the Agda _++_ goes through?
Notice @middle adjunction that I wrote append' without any rewriting and it worked — I'm just pointing this out again to stress that Lean does unfold definitions. It just fails if in the end the indices aren't defeq.
Kyle Miller (Oct 22 2024 at 20:24):
It would be neat if Lean had this feature, but I'm not sure if or when that would happen...
Middle Adjunction (Oct 22 2024 at 20:25):
Kyle Miller said:
(@[inline] isn't what marks things as more unfoldable; that's @[reducible]. The @[inline] attribute marks things for the compiler to inline as an optimization. Still, the default for definitions is semireducible, which unfolds automatically in defeq checks.)
aha, if recursive functions can be labelled as @[reducible], then some of the needed definitional equalities are restored.
I don't know what "defeq" is btw.
Kyle Miller (Oct 22 2024 at 20:25):
"defeq" is short for "definitional equality"
Kyle Miller (Oct 22 2024 at 20:26):
as in, "X and Y are defeq" if "Lean can find a sequence of definitional equalities between X and Y"
Kyle Miller (Oct 22 2024 at 20:26):
You shouldn't need to label anything as @[reducible] to make defeq succeed here. Do you have an example?
Kyle Miller (Oct 22 2024 at 20:27):
middle adjunction said:
and of course failed to use quotients to work "up to" certain equivalences of natural numbers.
By the way, quotients do not add definitional equalities to a type. It just makes it possible to propositionally prove that more things are equal.
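(That is, Quot.sound yields a propositional equality between related elements, not a new definitional one:)

example {α : Type} (r : α → α → Prop) (a b : α) (h : r a b) :
    Quot.mk r a = Quot.mk r b := Quot.sound h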
Middle Adjunction (Oct 22 2024 at 20:28):
Kyle Miller said:
Do you have an example?
I'm sure I posted a few examples on the Discord server. I will try to come up with some good examples.
Kyle Miller (Oct 22 2024 at 20:30):
To stress this more, since you said that Lean has "fine control over unfolding and stepwise reasoning", Lean is perfectly happy to unfold definitions when checking definitional equality:
def Nat.myAdd (a b : Nat) : Nat :=
  match a, b with
  | 0, b => b
  | succ a', b => succ (myAdd a' b)

example : Nat.myAdd 10 20 = 30 := rfl
It's like Agda here.
Middle Adjunction (Oct 22 2024 at 20:31):
Kyle Miller said:
By the way, quotients do not add definitional equalities to a type. It just makes it possible to propositionally prove that more things are equal
Yep, it was a desperate attempt to simulate working "up to" certain equivalence
Kyle Miller (Oct 22 2024 at 20:32):
Here we go, the definition of _+_ in Agda: it recurses on the first argument, not the second like in Lean.
Agda: https://agda.github.io/agda-stdlib/v2.1.1/Agda.Builtin.Nat.html#336
Lean: https://github.com/leanprover/lean4/blob/eddbdd77b85c44728e0a749fd224d3ce31770976/src/Init/Prelude.lean#L1538-L1540
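(Concretely, because Lean's Nat.add recurses on its second argument, one unit law holds definitionally and the other needs a proof:)

example (n : Nat) : n + 0 = n := rfl             -- definitional
example (n : Nat) : 0 + n = n := Nat.zero_add n  -- proved by induction, not rfl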
Kyle Miller (Oct 22 2024 at 20:33):
It's not doing anything fancy (no REWRITE pragmas), it's just using a + whose definition helps you with the "hello world" of dependently typed functional programming. (In practice, we find the type that Edward mentioned to be better to work with.)
Middle Adjunction (Oct 22 2024 at 20:58):
Just to summarise:
- no, Agda does not need or use REWRITE pragmas for this
- such REWRITEs are considered unsafe (but I did use them before to utilise equalities like 0 + n = n as rewrites)
- after what you mentioned (e.g., semireducible), I am not actually sure how to compare Agda's unfolding with Lean's
- now I understand at least @[reducible] is the right annotation (not @[inline]), so I can play with it
- Agda-style unfolding or not, n + 1 or 1 + n, the core issue remains in both Agda (minus rewrite pragmas) and Lean, no? In more complex cases (examples pending) where the equality on indices is not as simple (e.g., not resolvable with the 1 + n vs n + 1 swap), the code quickly gets burdensome, so much that I actually prefer append over append'.
Kyle Miller (Oct 22 2024 at 21:02):
One of the design patterns you can use is writing an index casting function
def NatVec.cast (h : n = m) (xs : NatVec n) : NatVec m :=
  h ▸ xs

-- Basic API:
@[simp] theorem NatVec.cast_rfl (xs : NatVec n) : xs.cast rfl = xs := rfl

@[simp] theorem NatVec.cast_trans (xs : NatVec n) (h : n = n') (h' : n' = n'') :
    (xs.cast h).cast h' = xs.cast (h.trans h') := by
  cases h; cases h'; rfl
This is better than using the triangle, since you can make algebraic rules for cast.
In both Lean and Agda, you'd need something like this to express that append is associative.
Kyle Miller (Oct 22 2024 at 21:06):
More or less, what reducible is for is controlling what tactics and typeclasses will unfold when doing pattern matching. It shouldn't affect what's defeq to what in basic cases.
Kyle Miller (Oct 22 2024 at 21:10):
Example:
def append (xs : NatVec n) (ys : NatVec m) : NatVec (n + m) :=
  match xs with
  | nil => ys.cast (by omega)
  | cons x xs' => cons x (append xs' ys) |>.cast (by omega)
Middle Adjunction (Oct 22 2024 at 21:10):
Ok, this setup seems pretty promising! Especially since it appears that:
- there is one cast only per branch,
- there is no need to provide explicit type annotations, and
- there is no need for bringing into scope implicit variables just for type annotations.
Kyle Miller (Oct 22 2024 at 21:25):
Yeah, this is one of our "Charons" for navigating dependently typed hell. A cost is making cast-normalization simp lemmas, but it's not so bad to work with. Here's append_append:
def NatVec.append (xs : NatVec n) (ys : NatVec m) : NatVec (n + m) :=
  match xs with
  | nil => ys.cast (by omega)
  | cons x xs' => cons x (append xs' ys) |>.cast (by omega)

@[simp] theorem NatVec.cons_cast (xs : NatVec n) (h : n = n') :
    NatVec.cons x (xs.cast h) = (NatVec.cons x xs).cast (by subst h; rfl) := by
  subst_vars; rfl

@[simp] theorem NatVec.cast_append (xs : NatVec n) (h : n = n') (ys : NatVec m) :
    (xs.cast h).append ys = (xs.append ys).cast (by subst h; rfl) := by
  subst_vars; rfl

@[simp] theorem NatVec.append_cast (xs : NatVec n) (ys : NatVec m) (h : m = m') :
    xs.append (ys.cast h) = (xs.append ys).cast (by subst h; rfl) := by
  subst_vars; rfl

theorem append_append (xs : NatVec a) (ys : NatVec b) (zs : NatVec c) :
    (xs.append ys).append zs = (xs.append (ys.append zs)).cast (by omega) := by
  induction xs with
  | nil => simp [NatVec.append]
  | cons x xs' ih => simp [NatVec.append, ih]
Kyle Miller (Oct 22 2024 at 21:30):
I'm sure someone could figure out how to make automation for these. The formula is "given a function that returns NatVec, for each argument of type NatVec, make a lemma that pushes a cast on that argument out to the output".
Kyle Miller (Oct 22 2024 at 21:31):
(Why push outward? In my experience, that causes them to merge using cast_trans and, hopefully, become eliminated entirely.)
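(For instance, with the cast_rfl/cast_trans simp lemmas from above, two stacked casts collapse into one:)

example (xs : NatVec n) (h : n = n') (h' : n' = n'') :
    (xs.cast h).cast h' = xs.cast (h.trans h') := by simp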
Middle Adjunction (Oct 22 2024 at 21:37):
I am playing with your suggestion...
def append'' (xs : NatVec n) (ys : NatVec m) : NatVec (n + m) :=
  match xs with
  | nil => ys |>.cast (by omega)
  | cons x xs' => cons x (append xs' ys) |>.cast (by omega)
This is clean, and pedagogically as expected: the part related to lists comes before |> and the part related to indices after. It is also stable under swapping between n + 1 and 1 + n, so there is no need to rely on a "trick" (e.g., looking at the definition of _+_ instead of taking it as abstract).
Middle Adjunction (Oct 22 2024 at 21:39):
I don't even need to come up with an equational theory to give me a rewriting system, as I have the full power of omega.
Middle Adjunction (Oct 23 2024 at 05:00):
@Kyle Miller Is this pattern documented somewhere?
Kyle Miller (Oct 23 2024 at 05:03):
I'm not sure. I haven't looked into the history of it, but I've used it in mathlib, and I point it out whenever I notice someone trying to work with a Vector-like type on this Zulip.
Middle Adjunction (Oct 23 2024 at 05:54):
It seems like the boilerplate code is reimplementing a variant of the simp tactic: M |>.cast (by omega) looks like the by simp [...]; exact M of the first variant of append. It makes me wonder where the line is between bad uses of tactics for defining executable code and these morally correct uses. After all, if I know that my uses of tactics are only applying propositions to rewrite the types, the run-time behaviour of the code should be unaffected. When is that not the case in this context?
Kyle Miller (Oct 23 2024 at 06:02):
It's different because cast is constraining the proof to be inside a function that you can reason about. Otherwise, you get unrestricted Eq.recs that you have no hope of getting simp to do anything about.
Kyle Miller (Oct 23 2024 at 06:05):
It's still a rewrite, but it's a constrained rewrite that has algebraic rules you can exploit.
Sometimes there are types that let you sink the rewrite into a constructor and get some nice defeq properties. Fin.cast comes to mind. If you define your vector type as a Subtype, like in Edward's suggestion, you could do this too.
Middle Adjunction (Oct 23 2024 at 06:14):
ok let me try to redescribe: what would be the issue with writing a tactic M |>Nat_Theory where
- M is executable
- _ |>Nat_Theory has access to the goal type and the type of M
- _ |>Nat_Theory generates code to the effect of (have p : type_of_goal = type_of_M := by simp [Nat_Theory]); p ▸ M

This tactic/macro generates casts only but, unlike |>.cast, works on arbitrary types (as long as simp supports them).
Does having access to the goal and term type in this macro/tactic somehow affect the execution behaviour?
Kyle Miller (Oct 23 2024 at 06:15):
I don't think there's anything that would affect the execution behavior. It's more of a matter of whether you can prove anything about your functions.
Kyle Miller (Oct 23 2024 at 06:17):
It could affect whether defeqs are preserved, though the NatVec.cast function doesn't have any good defeqs except when you're rewriting using rfl.
Kyle Miller (Oct 23 2024 at 06:19):
Here are some examples of defeqs you can get with the Subtype version:
def NatVec (n : Nat) := {xs : List Nat // xs.length = n}

def NatVec.cast (h : n = n') (xs : NatVec n) : NatVec n' :=
  ⟨xs.val, h ▸ xs.property⟩

example (h : n = n') (xs : NatVec n) : xs.val = (xs.cast h).val := rfl

def NatVec.cons (x : Nat) (xs : NatVec n) : NatVec (n + 1) := ⟨x :: xs.val, congrArg (·+1) xs.2⟩

example (h : n = n') (x : Nat) (xs : NatVec n) :
    NatVec.cons x (xs.cast h) = (NatVec.cons x xs).cast (congrArg (·+1) h) := rfl

Things tend to commute with cast in this case.
Middle Adjunction (Oct 23 2024 at 06:20):
Kyle Miller said:
I don't think there's anything that would affect the execution behavior. It's more of a matter of whether you can prove anything about your functions.
Can you explain what you mean?
All the _ |>Nat_Theory macro is doing is to replace the handwritten bits like
| nil =>
  have p : m = 0 + m := by simp [ Nat.zero_add ]
  p ▸ ys
with
| nil => ys |>Nat_Theory
No boilerplate code needed.
Kyle Miller (Oct 23 2024 at 06:21):
Think about the theorems you'd write about this function.
Middle Adjunction (Oct 23 2024 at 06:21):
append nil xs = xs
Kyle Miller (Oct 23 2024 at 06:22):
The point is that in the proofs you have to work with whatever the tactic proof in the definition of append happened to create.
With cast functions, you will see very constrained terms that you can reason about algebraically.
Kyle Miller (Oct 23 2024 at 06:24):
Unrestricted Eq.rec (which is what you're suggesting using) is "dependently typed hell". It's possible to navigate it, but it's better to take some care and make some preparations.
Kyle Miller (Oct 23 2024 at 06:25):
An exercise is to write the append_append theorem from before.
Kyle Miller (Oct 23 2024 at 06:26):
If you don't care about proving things, then there's probably no harm in using simple tactics in your definitions.
Middle Adjunction (Oct 23 2024 at 06:27):
Sorry, I am a bit confused: you are saying even append' from the original post causes issues with reasoning (e.g., append_append)?
Kyle Miller (Oct 23 2024 at 06:28):
Yeah, I've found that sort of thing to cause a lot of problems. It's worth trying to prove append_append
Middle Adjunction (Oct 23 2024 at 06:29):
I thought we were comparing append vs append', not append' vs append''.
Kyle Miller (Oct 23 2024 at 06:30):
You'll have to remind me which ones these all are. My main argument here is that "it's better to define append with a cast function if you want to do reasoning, if you have to do rewrites at all".
Middle Adjunction (Oct 23 2024 at 06:32):
Here are the variants:
def append (xs : NatVec n) (ys : NatVec m) : NatVec (n + m) :=
  match xs with
  | nil => by
    simp [ Nat.zero_add ]
    exact ys
  | cons x xs' => by
    simp [ Nat.add_assoc ]
    exact cons x (append xs' ys)

def append' (xs : NatVec n) (ys : NatVec m) : NatVec (n + m) :=
  match xs with
  | nil =>
    have p : m = 0 + m := by simp [ Nat.zero_add ]
    p ▸ ys
  | cons (n := n') x xs' =>
    have p : 1 + (n' + m) = (1 + n') + m := by simp [ Nat.add_assoc ]
    p ▸ (cons x (append xs' ys))

-- definitions related to cast and pushing it around types
def append'' (xs : NatVec n) (ys : NatVec m) : NatVec (n + m) :=
  match xs with
  | nil => ys |>.cast (by omega)
  | cons x xs' => cons x (append xs' ys) |>.cast (by omega)

-- definitions related to hypothetical macro _ |>Nat_Theory
def append''' (xs : NatVec n) (ys : NatVec m) : NatVec (n + m) :=
  match xs with
  | nil => ys |>Nat_Theory
  | cons x xs' => cons x (append xs' ys) |>Nat_Theory
Kyle Miller (Oct 23 2024 at 06:35):
There's not too much of a difference between the first two, and the third is easier to prove things about. The second is probably somewhat easier to prove things about than the first, since ▸ elaborates to a single Eq.rec.
Kyle Miller (Oct 23 2024 at 06:36):
The key is that cast constrains the rewrite to the index. In the first two, the rewrite is for the whole NatVec type, and it's hard to do anything with type equalities.
Kyle Miller (Oct 23 2024 at 06:37):
For example, from NatVec n = NatVec m you can't prove that n = m. But with a cast function, you have n = m immediately available.
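(For instance, assuming the NatVec.cast from earlier, the explicit index equality can simply be cased on to eliminate the casts:)

example (xs : NatVec n) (h : n = m) : (xs.cast h).cast h.symm = xs := by
  cases h; rfl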
Middle Adjunction (Oct 23 2024 at 07:43):
I see. I believe there are three separate dimensions to this discussion (based on the code):
- whether there should be casts:
  - append has no explicit casts, but append', append'' and append''' have casts
  - we agree there should be casts
- whether casts should be restricted to the indices or apply to the entire type:
  - in append' casts are unrestricted, but
  - in append'' (and the potential append''') casts are restricted to indices
  - we agree restricted casts are better
- syntactic burden (and boilerplate):
  - append'' is easier to write compared to append and append', except for the generic boilerplate (e.g., NatVec.cast_rfl) needed for pushing casts around
  - the proposed macro/tactic _ |>Nat_Theory in append''' attempts to simulate the append'' style without the boilerplate

Finally, to summarise what _ |>Nat_Theory means: for M : NatVec n and goal type NatVec m, the macro M |>Nat_Theory generates
  have p : m = n := by simp [ Nat_theory ]
  M.cast p
Do you foresee any issues with the macro _ |>Nat_Theory?
Kyle Miller (Oct 23 2024 at 16:05):
I don't think there are any issues with that, and it would be nice automation to have.
It would also be nice if there were reasoning automation that could automatically push casts around.
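(One untested sketch of such automation, using a default by omega argument on a casting helper rather than new syntax — the names castω and appendω are made up, and this assumes the inductive NatVec from the start of the thread:)

def NatVec.castω (xs : NatVec n) (h : n = m := by omega) : NatVec m :=
  h ▸ xs

def appendω (xs : NatVec n) (ys : NatVec m) : NatVec (n + m) :=
  match xs with
  | nil => ys.castω
  | cons x xs' => (cons x (appendω xs' ys)).castω

At each call site the target index is read off from the expected type and the equation is discharged by omega, much like the explicit |>.cast (by omega) versions above.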
Middle Adjunction (Oct 23 2024 at 16:15):
Thank you for your replies @Kyle Miller!
Encouraged by this, I am currently reading Metaprogramming in Lean 4, and am deep in Lean's metaprogramming libraries.
I am not sure this thread is the right venue to delve into details of developing such automation, so I will open a new thread.
After a few days, I plan to come back and add a sequel to this story summarising the conclusions.
Dean Young (Oct 23 2024 at 18:40):
Stay tuned...
Last updated: May 02 2025 at 03:31 UTC