Zulip Chat Archive

Stream: Is there code for X?

Topic: O(nlogn) dedup with provable properties


Jiatong Yang (May 14 2025 at 09:17):

I want an O(n log n) implementation of Multiset.dedup on Multiset Nat with provable properties (provably equal to Multiset.dedup), using first sort and then dedup. Is there code for that in Mathlib?

Markus Himmel (May 14 2025 at 09:58):

Not exactly what you're asking for, but here is one way to get a fast (expected linear time) Multiset.fastDedup using a hash set for deduplication. The proofs can be optimized considerably.

import Std.Data.HashSet.Lemmas
import Mathlib

open Std

namespace List

def fastDedup [DecidableEq α] [Hashable α] (l : List α) : List α :=
  impl l |>.1
where
  impl (l : List α) : List α × Std.HashSet α :=
    l.foldr (β := List α × Std.HashSet α) (fun x IH => if ¬x ∈ IH.2 then (x :: IH.1, IH.2.insert x) else IH) ([], ∅)
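
-- Like Mathlib's List.dedup, this keeps the last occurrence of each
-- element (an illustrative check):
#eval [1, 2, 2, 3, 1].fastDedup  -- [2, 3, 1]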

theorem fastDedup.mem_snd_impl [DecidableEq α] [Hashable α] (l : List α) (x : α) :
    x ∈ (fastDedup.impl l).2 ↔ x ∈ l.dedup := by
  induction l with
  | nil => simp [impl]
  | cons y t ih =>
    simp [impl, ite_not] at ih
    simp only [impl, ite_not, foldr_cons]
    split <;> rename_i h
    · simp [ih]
      rintro rfl
      simp_all
    · simp [ih, @Eq.comm _ x y]

@[simp]
theorem fastDedup_eq_dedup [DecidableEq α] [Hashable α]
    (l : List α) : l.fastDedup = l.dedup := by
  rw [Eq.comm]
  induction l with
  | nil => simp [fastDedup, fastDedup.impl]
  | cons h t ih =>
    simp [List.pwFilter, dedup, fastDedup, fastDedup.impl] at ih ⊢
    split <;> rename_i h'
    · rw [if_neg]
      · simp [ih]
      · have := fastDedup.mem_snd_impl t h
        simp only [fastDedup, fastDedup.impl, ite_not] at this
        simp [this, -mem_dedup, dedup, pwFilter]
        intro h''
        simpa using h' _ h''
    · rw [if_pos]
      · simp [ih]
      · have := fastDedup.mem_snd_impl t h
        simp only [fastDedup, fastDedup.impl, ite_not] at this
        simp [this, -mem_dedup, dedup, pwFilter]
        simpa using h'

end List

namespace Multiset

def fastDedup [DecidableEq α] [Hashable α] (s : Multiset α) : Multiset α :=
  Quot.liftOn s (fun l => (l.fastDedup : Multiset α)) fun _ _ p => Quot.sound (by simpa using p.dedup)

theorem fastDedup_eq_dedup [DecidableEq α] [Hashable α] (s : Multiset α) : s.fastDedup = s.dedup := by
  simp [dedup, fastDedup]

end Multiset

Floris van Doorn (May 14 2025 at 10:02):

There is also docs#Array.sortDedup that you could try to port to List and then Multiset. I'm quite sure that what you ask doesn't exist in Mathlib yet.

Jiatong Yang (May 14 2025 at 10:02):

Thank you very much!

Andrew Yang (May 14 2025 at 10:09):

A slight caveat of the approach above is that

example : fastDedup [1, 2, 3] = [1, 2, 3] := rfl

fails, so it cannot actually provide proof terms of someList.dedup = someOtherList where the two lists are actual list literals.

Andrew Yang (May 14 2025 at 10:15):

You can achieve an actual decidable one by doing something like this, but it will be a lot more work to actually prove it is the right thing.

import Mathlib.Data.List.Defs

def List.mergeTR' (l₁ l₂ : List α) (le : α → α → Bool) : List α :=
  go (l₁.length + l₂.length) l₁ l₂ (by simp) []
where go : (fuel : Nat) → (l₁ l₂ : List α) → (l₁.length + l₂.length = fuel) → List α → List α
  | 0, _, _, _, _ => []
  | fuel + 1, [], l₂, h, acc => reverseAux acc l₂
  | fuel + 1, l₁, [], h, acc => reverseAux acc l₁
  | fuel + 1, x :: xs, y :: ys, h, acc =>
    if le x y then
      go fuel xs (y :: ys) (by simp_all; omega) (x :: acc)
    else
      go fuel (x :: xs) ys (by simp_all; omega) (y :: acc)

def List.mergeSortTR' (l : List α) (le : α → α → Bool := by exact fun a b => a ≤ b) : List α :=
  run (l.length + 1) l (by simp)
where run : (fuel : Nat) → (l : List α) → l.length < fuel → List α
  | 0, _, h => by contradiction
  | fuel + 1, [], _ => []
  | fuel + 1, [a], _ => [a]
  | fuel + 1, x₁::x₂::xs, h =>
    let lr := splitAt (((x₁::x₂::xs).length+1)/2) (x₁::x₂::xs)
    mergeTR' (run fuel lr.1 (by simp [lr] at h ⊢; omega))
      (run fuel lr.2 (by simp [lr] at h ⊢; omega)) le

def List.dedup' (l : List Nat) : List Nat := l.mergeSortTR'.destutter (· ≠ ·)

example : [1, 4, 6, 2, 2, 3, 1].dedup' = [1, 2, 3, 4, 6] := rfl

Markus Himmel (May 14 2025 at 10:20):

Andrew Yang said:

A slight caveat of the approach above is that

example : fastDedup [1, 2, 3] = [1, 2, 3] := rfl

fails, so it cannot actually provide proof terms of someList.dedup = someOtherList where the two lists are actual list literals.

simp solves this example.
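That is, using the @[simp] lemma fastDedup_eq_dedup from above:

example : fastDedup [1, 2, 3] = [1, 2, 3] := by simp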

Andrew Yang (May 14 2025 at 10:24):

It can only do small lists though, which is a concern if one needs O(n log n) dedup (assuming that they actually care about proof terms for concrete lists).

example : fastDedup (eval% List.range 100) = (eval% List.range 100) := by
  simp [fastDedup, fastDedup.impl] -- fails

Markus Himmel (May 14 2025 at 10:27):

Sure. In general, it should be assumed that there is no relationship between what is fast for the kernel to evaluate and what is fast at runtime. Something like O(n log n) doesn't really make sense for kernel evaluation, as the kernel is not a virtual machine in any traditional sense.

Andrew Yang (May 14 2025 at 10:34):

Hmmm, I assume the time for the kernel to check a proof term is directly related to how big the proof term is? (Perhaps quadratic in depth and linear in size for fixed depth, etc.)
I actually need something like this (which is why I had that code snippet lying around), but it can only go up to ~5000. I wonder if you have any suggestions? Perhaps it is inherently limited by lists being recursive and having depth O(n), and I need to switch to a better data structure (e.g. a balanced tree)?

Markus Himmel (May 14 2025 at 10:38):

The kernel is a complex machine full of heuristics and opportunities for things to go wrong. Often it is better to have something like a tactic or a simproc which produces something that is easy for the kernel to check. Example:

example : fastDedup (eval% List.range 100) = (eval% List.range 100) := by
  rw [fastDedup_eq_dedup]
  repeat' rw [dedup_cons_of_not_mem (by decide), List.cons_inj_right]
  rw [dedup_nil]

(though 5000 is too much for this example. The approach is still quadratic after all.)

Markus Himmel (May 14 2025 at 10:40):

May I ask what your use case for kernel computations involving giant lists is?

Andrew Yang (May 14 2025 at 10:44):

An example is checking that x is a primitive root mod p: you just generate the list [x^y | y < p] and see if there are duplicates. (Of course there are mathematical solutions here that drastically reduce the amount of calculation needed, but this is just a simplified example.)
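A minimal sketch of that check (illustrative names, not the actual code; uses Mathlib's List.dedup):

import Mathlib

-- Detect duplicates by comparing lengths before and after dedup. For a
-- prime p, the powers x^1, …, x^(p-1) mod p are pairwise distinct exactly
-- when x is a primitive root mod p. (A real version would use modular
-- exponentiation rather than computing x ^ e in full.)
def hasDup (l : List Nat) : Bool :=
  l.dedup.length != l.length

def primRootCheck (x p : Nat) : Bool :=
  !hasDup ((List.range (p - 1)).map fun e => x ^ (e + 1) % p)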

Andrew Yang (May 14 2025 at 10:45):

From my limited experience, decide + kernel is almost always faster than my (or Bhavik Mehta's) hand-rolled tactics trying to construct a proof term, though.

Markus Himmel (May 14 2025 at 10:53):

Thanks. It would certainly be nice to have more data structures specifically optimized for proofs (whether helped along by a tactic or not). We have docs#Lean.RArray which is a first step in that direction.

Mario Carneiro (May 14 2025 at 14:48):

Note, it is algorithmically easier to check that a list has no duplicates than to compute what the result of dedup is. So if the only thing you need is Nodup, you should ask for that and not dedup.

Mario Carneiro (May 14 2025 at 14:51):

You can for example prove that a list has no duplicates by sorting and then checking that it is strictly monotone. This can actually be done in O(n) proof length
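A sketch of the logical shape of that reduction (using Mathlib's List.Perm.nodup_iff and List.Pairwise.imp; the O(n)-length construction itself is not reproduced here):

import Mathlib

-- A permutation certificate l ~ l' plus strict monotonicity of l'
-- transfers Nodup back to l.
example (l l' : List Nat) (hp : l ~ l') (hs : l'.Pairwise (· < ·)) :
    l.Nodup :=
  hp.nodup_iff.mpr (hs.imp ne_of_lt)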

Andrew Yang (May 14 2025 at 14:52):

I'm not sure how to provide an O(n)-length proof that L' is the sorted version of L.

Andrew Yang (May 14 2025 at 14:53):

(and if you can do so, then you can linearly remove duplicates so dedup would also be O(n))

Aaron Liu (May 14 2025 at 14:58):

Andrew Yang said:

(and if you can do so, then you can linearly remove duplicates so dedup would also be O(n))

No, the proof is O(n), but the process of finding the proof is O(n log n) since you have to sort.

Andrew Yang (May 14 2025 at 15:01):

How is the proof O(n)?

Andrew Yang (May 14 2025 at 15:01):

I think we are all talking about proof sizes here? The algorithm Mario gave clearly has run time O(n log n) as well.

Aaron Liu (May 14 2025 at 15:07):

I guess: provide an e : Fin n ≃ Fin n, show that ∀ i, L[i] = L'[e i], and also show that L' is sorted

Aaron Liu (May 14 2025 at 15:08):

This all seems to be O(n)

Andrew Yang (May 14 2025 at 15:08):

Lists probably don't support O(1) random access. You need some binary tree or finger tree, which can only give you O(log n) random access. Arrays don't work here because in the kernel, arrays are just lists.

Mario Carneiro (May 14 2025 at 15:12):

I double checked my notes on this problem and it's O(n log n), not O(n).

Aaron Liu (May 14 2025 at 15:14):

If you're counting kernel checking time then nothing is O(n) because you have to at least access every element of the list in a random order

Mario Carneiro (May 14 2025 at 15:14):

That's O(n)?

Mario Carneiro (May 14 2025 at 15:16):

Permutation is O(n log n) in this model because you have only finitely many choices at each stage and n! possible places to get to

Mario Carneiro (May 14 2025 at 15:16):

so basically the same kind of argument as why sorting is O(n log n) even though there is no comparison needed

Andrew Yang (May 14 2025 at 15:18):

In an extreme example, you traverse the list once and put everything into a balanced binary tree, and then everything is fine; this will be O(n log² n) at most. This is just an example of why it is not necessarily quadratic. The usual sorting algorithms should work in O(n log n) by the standard arguments.

Mario Carneiro (May 14 2025 at 15:19):

Sorting is not quite optimal in this setting though because it does more comparisons than necessary

Mario Carneiro (May 14 2025 at 15:19):

You can do it in O(n log n) time/proof length and O(n) comparisons

Andrew Yang (May 14 2025 at 15:20):

But I suspect that the kernel check time is quadratic in the depth of the proof term (hence the ~5000 and not something higher), and traversing lists will necessarily have O(n) depth. Can someone confirm this suspicion?

Mario Carneiro (May 14 2025 at 15:20):

No? Why would that be the case

Andrew Yang (May 14 2025 at 15:23):

Because my near-linear thing (see the snippet above) runs quadratically on my inputs. As for why, I am guessing (purely guessing) that checking a node takes O(size of context) or something.

Mario Carneiro (May 14 2025 at 15:24):

You aren't generating proof terms though, whnf is an entirely different beast

Andrew Yang (May 14 2025 at 15:26):

Oh you mean it's decide being quadratic? I'll see if I can manage to produce an actual proof term of size O(n log n)

Mario Carneiro (May 14 2025 at 15:26):

I find it extremely challenging to write functions that actually have good whnf performance

Mario Carneiro (May 14 2025 at 15:27):

keep in mind that if you use the default equation compiler it will generate a ton of gunk that has to be waded through in every step of the iteration

Mario Carneiro (May 14 2025 at 15:28):

Your best bet is to use T.rec directly
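For instance (a hypothetical illustration, not code from the thread):

-- A length function written with List.rec directly, so whnf steps through
-- the recursor instead of equation-compiler auxiliaries.
def len' {α : Type} (l : List α) : Nat :=
  List.rec (motive := fun _ => Nat) 0 (fun _ _ ih => ih + 1) l

example : len' [1, 2, 3] = 3 := rfl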

Bhavik Mehta (May 14 2025 at 15:28):

How does the kernel check time scale with the size and depth of the proof term? I have a memory of you, Mario, telling me that it's worse than linear in the depth, but I might well be misremembering.

Mario Carneiro (May 14 2025 at 15:28):

It's ~quadratic in the size of the context, so you should avoid binders if possible

Bhavik Mehta (May 14 2025 at 15:29):

Does that include let_fun blocks?

Mario Carneiro (May 14 2025 at 15:30):

yes, if the data being manipulated has fvars in it

Mario Carneiro (May 14 2025 at 15:31):

You can use letI to avoid introducing a binder in this case
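For instance (an illustrative sketch):

-- Unlike `let`, `letI` is zeta-reduced during elaboration, so the
-- resulting term contains no binder (and no fvar) for x.
example : Nat :=
  letI x : Nat := 2 + 2
  x * x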

Mario Carneiro (May 14 2025 at 15:32):

The issue is that doing a beta reduction when the function being reduced contains your data will have Lean walk the term to substitute the variable in, which gets more expensive as your data grows, unless it's a closed term, in which case it can be short-circuited

Mario Carneiro (May 14 2025 at 15:35):

The two basic optimizations one can take advantage of during whnf are:

  • Subterms are cached, so if the term has internal sharing or is duplicative, then it is not expensive the second time
  • There is a flag in each term saying "I have no bvars/fvars" which makes the lift and subst operations short-circuit

Other than this, beta reduction is linear


Last updated: Dec 20 2025 at 21:32 UTC