Zulip Chat Archive

Stream: lean4

Topic: Termination of a "group-by-key" List function


Yakov Pechersky (Oct 27 2024 at 12:11):

Consider the following definition. What is the right way to show termination for it? I am not sure how to utilize termination_by, since the partitioning depends on the first element.

namespace List

variable {X Y : Type _}

def groupByKey [DecidableEq Y] (f : X  Y) : List X  List (List X) := fun
  | [] => []
  | l@(x :: _) =>
    have : (l.partition (fun a => f x = f a)).2.length < l.length := sorry
    (l.partition (fun a => f x = f a)).1 :: groupByKey f (l.partition (fun a => f x = f a)).2

end List

Yakov Pechersky (Oct 27 2024 at 12:12):

My overall goal is to have a function similar to docs#List.groupBy, but one that does not require the groups to be contiguous. I am not trying to have a performant/optimized function, but rather, one that is easy to prove things about -- and then lift it into Multiset and Finset.

Mario Carneiro (Oct 27 2024 at 18:35):

Why not just save a step of computation and put x in the first bucket already since you know it has to be there?

Mario Carneiro (Oct 27 2024 at 18:38):

Here's how you can make the definition go through as written:

def groupByKey [DecidableEq Y] (f : X  Y) : List X  List (List X) := fun
  | [] => []
  | l@(x :: _) =>
    have : (l.partition (fun a => f x = f a)).2.length < l.length := sorry
    (l.partition (fun a => f x = f a)).1 :: groupByKey f (l.partition (fun a => f x = f a)).2
termination_by l => l.length
decreasing_by subst l; assumption

Mario Carneiro (Oct 27 2024 at 18:42):

Here's a version that avoids calculating partition multiple times:

def groupByKey [DecidableEq Y] (f : X  Y) : List X  List (List X) := fun
  | [] => []
  | (x :: l) =>
    match e : l.partition (fun a => f x = f a) with
    | (left, right) =>
      have : right.length < (x::l).length := by
        simp at e; simp [ e, Nat.lt_succ, List.length_filter_le]
      (x :: left) :: groupByKey f right
termination_by l => l.length

Yakov Pechersky (Oct 27 2024 at 20:45):

Yes, I knew about the optimization of not recalculating the bucket given the existing x. Still, I was stymied about how to provide the termination details. Thanks for giving both examples!


Last updated: May 02 2025 at 03:31 UTC