Zulip Chat Archive

Stream: CSLib

Topic: Longest Common Subsequence

Gordon Hsu (Feb 22 2026 at 14:21):

Hi, I am new to CSLib. I have been formalizing the longest common subsequence problem. Currently, I provide 3 implementations:

recursion without dynamic programming,
functional dynamic programming, and
imperative dynamic programming.

The correctness of the first 2 implementations are proved whereas I am still experimenting with mvcgen for the imperative implementation. Could I contribute this in any form to CSLib?

https://gist.github.com/kuotsanhsu/e21ec4cfb78f3b43e898ed19d55d3625

Chris Henson (Feb 22 2026 at 15:07):

Hi! I don't think this exists anywhere (someone correct me if I'm wrong), so this is probably suitable with some cleaning. For 3) we've actually not discussed mvcgen proofs in CSLib yet, but I think you could start with just the first two.

Have you considered also verifying the time complexity of the quadratic approach? There's some ongoing discussion of generalizing it, but at the moment we have TimeM for expressing this.

Shreyas Srinivas (Feb 22 2026 at 15:18):

You could also make the PR to cslib#275 ‘s prog approach.

Gordon Hsu (Feb 22 2026 at 15:22):

Thanks for the pointers. I'll attempt TimeM for implementation 2 following cslib#275.

Chris Henson (Feb 22 2026 at 15:27):

Please do not feel that you have to wait if you'd like to try out the existing TimeM in the meantime. If it is superseded by cslib#275, we can handle porting over any proofs.

Gordon Hsu (Feb 22 2026 at 15:37):

Ok, I'll try out TimeM directly, but I would need some time to get familiar with it.

Shreyas Srinivas (Feb 22 2026 at 15:38):

If you are targeting cslib#275, I think you want two constructors in your query type. One reads a list (the input list or dp table) at an index. The other writes a value at an index to a list (which will be the dp table). ~~your choice of query type becomes easier if you declare a “DataStructure” type that contains the input array and the dp array.~~ actually let’s just use queries that take lists or arrays as input and outputs as required. Just having an RwCollection query type for a given ‘Collection’ would be nicer

Shreyas Srinivas (Feb 22 2026 at 15:39):

And you can reuse your existing proofs for correctness by proving that the Prog version evaluates to your version. On all inputs.

Shreyas Srinivas (Feb 22 2026 at 15:40):

Glad to assist as much as I can (since I have a bit of travel coming up)

Gordon Hsu (Feb 22 2026 at 15:43):

Thank you so much for the instructions. I'll try to make a PR even if I hit a wall so that I can call for help.

Sorrachai Yingchareonthawornchai (Feb 25 2026 at 11:29):

Hi Gordon, if you want to use TimeM, the simplest approach is to write an algorithm in functional code, and then you can use TimeM almost immediately without change.

Shreyas Srinivas (Feb 25 2026 at 12:01):

This also goes for Prog. There is a little bit more overhead, because you actually have to commit to a model and costs of each query in advance, but essentially after that the Prog will look identical to the functional version.

You just have to replace List.get and set with their query versions. Also the query you define will be usable for many array based algorithms. Future algorithms authors will just use the model you define instead of writing their own.

Shreyas Srinivas (Feb 25 2026 at 12:08):

Fwiw you might get the TimeM version for free from the Prog version. For timing Prog compiles to TimeM.

Eric Wieser (Feb 25 2026 at 17:25):

At any rate, since the approaches are similar it is fine to use TimeM now, and then switch to Prog later

Gordon Hsu (Feb 25 2026 at 19:37):

I have proven time complexity with TimeM, but TimeM messed up my other two lemmas which relied on the definitional equality of List.foldr. Reviewers can start giving feedback while I keep working on the lemmas.

https://github.com/leanprover/cslib/pull/370

Shreyas Srinivas (Feb 26 2026 at 05:55):

I think this PR could do with a lot simplification. You could directly create a 2D table DP table by DP : a -> a -> b instead of doing things in a roundabout fashion with DP and DPS.

Shreyas Srinivas (Feb 26 2026 at 06:09):

Secondly that long proof in the middle could be drastically shorter with fun_induction, simp and/or grind.

Gordon Hsu (Feb 26 2026 at 10:33):

Shreyas Srinivas said:

I think this PR could do with a lot simplification. You could directly create a 2D table DP table by DP : a -> a -> b instead of doing things in a roundabout fashion with DP and DPS.

Don't worry, your original wording is spot on. I share the same sentiment. I doubted my sanity when my first version looks like this:

def lcs (xs ys : List α) : List α := Id.run do
  if ys.isEmpty then
    return []
  let mut dps : Vector (DP α) ys.length := default
  for x in xs.reverse do
    let mut i := 0
    let mut yy : DP α := default
    let mut xy : DP α := default
    for y in ys.reverse do
      let xx := dps[i]!
      yy := dp x y xx yy xy
      dps := dps.set! i yy
      xy := xx
      i := i + 1
  return dps.back!.1
where
  dp (x y : α) (xx yy xy : DP α) : DP α :=
    if x = y then
      ⟨x :: xy.lcs, xy.length + 1, xy.length_eq.rec rfl⟩
    else
      if xx.length ≥ yy.length then xx else yy

Gordon Hsu (Feb 26 2026 at 10:40):

Shreyas Srinivas said:

Secondly that long proof in the middle could be drastically shorter with fun_induction, simp and/or grind.

It could be shorter if I don't annotate the types of the local variables/proofs and hide the structure of the proof behind tactics. However, wouldn't you find it more readable this way: the context is directly visible in the code instead of needing to hook up the infoView panel in VSCode. Anyway, I think the point is mute, people like tactics. I can rewrite it.

Gordon Hsu (Feb 26 2026 at 10:52):

Could you elaborate on the 2D table approach? Immutability leads to construction of lists. The visiting order is the key part of the proof but with foldr it's free. How do you index the table? How do you zip (as in visiting two list simultaneously) the DP table and one of the input lists without incurring the cloning of both for each iteration? How do you index both the DP table and one of the input lists without tricky proofs on the index (foldr is easier to reason)?

2D table was my first thought, as it is much closer to my original imperative implementation. However, when I tried to attempt an mvcgen proof, I realized how tricky it is with all the indices needing to be in bound, and the prefix of the DP array needing to satisfy certain lengthy property. The proof would be DPS plus index consistency.

Nevertheless, I would like to see a 2D table approach. I just couldn't think of one that is less convoluted than with foldr and DPS.

Shreyas Srinivas (Feb 26 2026 at 10:57):

So let’s start with the most important thing. Apologies for the wording.

Gordon Hsu (Feb 26 2026 at 10:57):

I will fix the stylistic issues brought up by Eric Wieser, move the unit tests to CSLibTest as Shreyas Srinivas, and close the sorry in the two lemmas.

Shreyas Srinivas (Feb 26 2026 at 10:57):

Secondly I actually find this version you posted above better

Gordon Hsu (Feb 26 2026 at 10:57):

Shreyas Srinivas said:

So let’s start with the most important thing. Apologies for the wording.

No need to apologize, I feel the same way.

Shreyas Srinivas (Feb 26 2026 at 10:58):

Thirdly I would probably start by getting rid of DPS

Shreyas Srinivas (Feb 26 2026 at 10:58):

You don’t actually need the list length to be carried along inside a structure.

Gordon Hsu (Feb 26 2026 at 10:59):

Shreyas Srinivas said:

Secondly I actually find this version you posted above better

I find it harder to prove than with foldr and DPS. Please see my original attempt without TimeM: https://gist.github.com/kuotsanhsu/e21ec4cfb78f3b43e898ed19d55d3625

Shreyas Srinivas (Feb 26 2026 at 11:00):

Thirdly. I understand you find your current proof readable.

But this is a library. It needs to have compact proofs.
Even from a readability perspective a long proof is unreadable

Gordon Hsu (Feb 26 2026 at 11:00):

Shreyas Srinivas said:

Thirdly I would probably start by getting rid of DPS

Isn't the caching of LCS length the key to DP? Lengths have to be cached in the supposedly 2D table, or 1D array with 2 more trackers in the mutable case.

Gordon Hsu (Feb 26 2026 at 11:00):

Shreyas Srinivas said:

Thirdly. I understand you find your current proof readable.

But this is a library. It needs to be produce compact proofs.

Even from a readability perspective a long proof is unreadable

Ok, I'll refactor with tactics.

Shreyas Srinivas (Feb 26 2026 at 11:01):

About that. I had this funny idea that in functional languages, a top down memoizing dp is easier to write. But more importantly I think with a clever bit of metaprogramming we can get automatic memoization for any divide and conquer recursion.

Gordon Hsu (Feb 26 2026 at 11:02):

Shreyas Srinivas said:

About that. I had this funny idea that in functional languages, a top down memoizing dp is easier to write. But more importantly I think with a clever bit of metaprogramming we can get automatic memoization for any divide and conquer recursion.

Indeed, but we currently don't have that. It would be a good case for future refactoring if such a feature comes about.

Shreyas Srinivas (Feb 26 2026 at 11:03):

You could actually repurpose your DP structure to hold a 2D Finvec(Matrix in mathlib)/list/array

Shreyas Srinivas (Feb 26 2026 at 11:03):

And hold the current indices upto which they are filled.

Shreyas Srinivas (Feb 26 2026 at 11:03):

Instead of list length

Shreyas Srinivas (Feb 26 2026 at 11:04):

And then every update takes this state and the next index from the inputs to produce a new state.

Gordon Hsu (Feb 26 2026 at 11:04):

What about immutability?

Shreyas Srinivas (Feb 26 2026 at 11:04):

I am not worrying myself about mvcgen for now

Gordon Hsu (Feb 26 2026 at 11:05):

Shreyas Srinivas said:

About that. I had this funny idea that in functional languages, a top down memoizing dp is easier to write. But more importantly I think with a clever bit of metaprogramming we can get automatic memoization for any divide and conquer recursion.

If we have good memoization facility, lcsBrute would work directly by adding an attribute.

Shreyas Srinivas (Feb 26 2026 at 11:05):

I am trying to focus on the question: how can we get a clean functional programming version of lcs

Shreyas Srinivas (Feb 26 2026 at 11:05):

Gordon Hsu said:

Shreyas Srinivas said:

About that. I had this funny idea that in functional languages, a top down memoizing dp is easier to write. But more importantly I think with a clever bit of metaprogramming we can get automatic memoization for any divide and conquer recursion.

If we have good memoization facility, lcsBrute would work directly by adding an attribute.

There is something called MonadCache

Shreyas Srinivas (Feb 26 2026 at 11:05):

But let’s leave it to one side for now.

Shreyas Srinivas (Feb 26 2026 at 11:06):

Let’s first make the lcs functional version simple and familiar

Gordon Hsu (Feb 26 2026 at 11:06):

I would still like to keep DP. The only concern is DPS.

Shreyas Srinivas (Feb 26 2026 at 11:09):

I am not a machine so I can’t give you good arguments against it. So I will skip to my second point.

Gordon Hsu (Feb 26 2026 at 11:09):

But if you think about it, DPS serves the same role as the 1d vector with 2 extra trackers.

Shreyas Srinivas (Feb 26 2026 at 11:10):

Take the internal auxiliary lemmas and bring them out. Give them proper names

Shreyas Srinivas (Feb 26 2026 at 11:11):

Prove the lcs invariant and prove correct initialisation

Gordon Hsu (Feb 26 2026 at 11:11):

Could you sketch with some pseudocode?

Gordon Hsu (Feb 26 2026 at 11:16):

Say if we are not concerned with cloning, zipping ys and List (DP α) would be cleaner.

Shreyas Srinivas (Feb 26 2026 at 11:16):

Just right now I can’t. I’m on a train in the middle of nowhere

Gordon Hsu (Feb 26 2026 at 11:17):

No problem. I'll make a new attempt.

Gordon Hsu (Feb 26 2026 at 11:23):

Gordon Hsu said:

But if you think about it, DPS serves the same role as the 1d vector with 2 extra trackers.

Correction, DPS is just to avoid passing ys and cloning for each zip which reduces clarity.

Shreyas Srinivas (Feb 26 2026 at 11:43):

As for “2D tables” check mathlib matrices

Gordon Hsu (Feb 26 2026 at 12:03):

I've proved things using matrices before, but how do they perform with regard to indexing and updating? Would updating every element of a matrix lead to m*n deep Function.update? Why not just list of lists by foldr?

Shreyas Srinivas (Feb 26 2026 at 12:09):

Let’s ignore actual performance for now please.

Shreyas Srinivas (Feb 26 2026 at 12:11):

Let’s simplify first, prove stuff, and then optimise step by step.

Gordon Hsu (Feb 26 2026 at 12:12):

Ok, I'll do the following:

Use tactics to shorten proofs.
Replace DPS with zipping, but keep list of list.
Third implementation using matrix.

Shreyas Srinivas (Feb 26 2026 at 12:13):

Matrix is just a function on two index variables. This is the structure you want to use

Gordon Hsu (Feb 26 2026 at 12:14):

I've tried proving stuff with indexing. I doubt it will be cleaner, but I'll nevertheless see for myself.

Shreyas Srinivas (Feb 26 2026 at 12:59):

I will be away for a few days. I hope someone can step in and help you if I can’t.

Gordon Hsu (Feb 26 2026 at 17:08):

Shreyas Srinivas said:

I will be away for a few days. I hope someone can step in and help you if I can’t.

Thanks for your input. Have a good time. I think a suitable recursion could serve as building a matrix without the indexing nor foldr:

/-- List with cached length. -/
structure DP α where
  lcs : List α
  length : Nat
  length_eq : lcs.length = length

scoped instance : Inhabited (DP α) where default := ⟨[], 0, rfl⟩

/-- Core decision of LCS dynamic programming that is considered *one tick*. -/
@[macro_inline]
def core (x y : α) (xx yy xy : DP α) : DP α :=
  if x = y then
    ⟨x :: xy.lcs, xy.length + 1, xy.length_eq.rec rfl⟩
  else
    if xx.length ≥ yy.length then xx else yy

/-- Dynamic programming taking quadratic time and quadratic space via table-like recursion. -/
def lcsDP (xs ys : List α) : TimeM ℕ (List α) :=
  return match ← outerLoop xs ys with
  | [] => []
  | {lcs, ..} :: _ => lcs
where
  /-- Iterate through all rows of the DP table, keeping just the last row. -/
  outerLoop (xs ys : List α) : TimeM ℕ (List (DP α)) :=
    match xs with
    | [] => return replicate ys.length default
    | x :: xs => do
      let dps ← outerLoop xs ys
      let (dps, _, _) ← innerLoop x (zip ys dps)
      return dps
  /-- Grow the row of the DP table for `x` by one `y` at a time. -/
  innerLoop (x : α) : List (α × DP α) → TimeM ℕ (List (DP α) × DP α × DP α)
    | [] => return ([], default, default)
    | (y, xx) :: rest => do
      let (dps, yy, xy) ← innerLoop x rest
      ✓ let dp := core x y xx yy xy
      return (dp :: dps, dp, xx)

Last updated: Feb 28 2026 at 14:05 UTC