Zulip Chat Archive

Stream: Is there code for X?

Topic: Computation Complexity of Implementations in Lean


Amitayush Thakur (Feb 16 2025 at 06:50):

I was wondering if there is a way to argue about the computational complexity of implementations of certain algorithms in Lean.

For example, consider the two implementations of search:

def findIndex : List Int → Int → Int
| [], _ => -1
| x::xs, y => if x == y then 0
              else
                let i := findIndex xs y;
                if i == -1 then
                  -1
                else
                  i + 1

and

def binarySearchLowHi (xs : List Int) (y : Int) (low : Nat) (hi : Nat) : Int :=
  if low > hi then -1
  else
    let sum := low + hi;
    let mid := if sum % 2 == 0
               then sum / 2
               else (sum - 1) / 2;
    have h_i_lt_n : mid < xs.length := by
      sorry
    -- ^ NOTE:
    -- This proof is required so that
    -- `List.get` can be called with the index `mid`
    -- without any out-of-bounds errors.
    let mid_val := xs.get ⟨mid, h_i_lt_n⟩;
    if mid_val == y then mid
    else if mid_val < y then
      binarySearchLowHi xs y (mid + 1) hi
    else
      binarySearchLowHi xs y low (mid - 1)
termination_by hi - low
decreasing_by
  simp
  rename_i h_hi_leq_low
  -- We only have to prove termination for the recursive case when
  -- `low <= hi`. This is because the base case is already covered by the `if low > hi then -1` line.
  -- We rename the hypothesis to `h_hi_leq_low` to make it more readable.
  simp at h_hi_leq_low
  have h_sum_mod_2_eq_0_or_1 : sum % 2 = 0 ∨ sum % 2 = 1 := by
    sorry
  cases h_sum_mod_2_eq_0_or_1
  rename_i h_sum_mod_2_eq_0
  rw [h_sum_mod_2_eq_0] at *
  simp -- at this point, prove that hi - ((low + hi) / 2 + 1) < hi - low
  sorry
  rename_i h_sum_mod_2_eq_1
  rw [h_sum_mod_2_eq_1] at *
  simp -- at this point, prove that hi - ((low + hi - 1) / 2 + 1) < hi - low
  sorry
  sorry

How can I state that binary search is faster than the simple linear search, given that the list is sorted?

Daniel Weber (Feb 16 2025 at 07:13):

This isn't true, because a docs#List is a linked list: each List.get at index i walks from the head in O(i) steps, and binary search performs O(log n) such accesses, so it's actually less efficient and can be O(n log n)

Daniel Weber (Feb 16 2025 at 07:13):

However, in any case, due to function extensionality (docs#funext) you can't state this directly about the function: two pointwise-equal functions are equal as terms, so any complexity measure defined on the function itself would have to give the same answer for every implementation with the same input-output behavior
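
For instance (a hypothetical illustration, not from any library): a unary "slow identity" is pointwise equal to id, so funext makes the two functions equal, and no predicate on the function itself can tell their running times apart:

def slowId : Nat → Nat
  | 0 => 0
  | k + 1 => slowId k + 1

-- `slowId n` takes n recursive steps to compute what `id` computes in one,
-- yet the two functions are provably equal:
theorem slowId_eq_id : slowId = id := by
  funext n
  induction n with
  | zero => rfl
  | succ k ih => simp [slowId, ih]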

Amitayush Thakur (Feb 16 2025 at 07:22):

Okay, agreed: my implementation doesn't use Array, which would have let me write it more efficiently.

However, my question is more general: can we argue about the time complexity of functions in general? Based on https://leanprover-community.github.io/mathlib4_docs/Init/Core.html#funext, it seems we cannot state that one function is faster than another. However, I was wondering if we could define some form of measure to describe the number of computations needed to execute a function.

Luigi Massacci (Feb 16 2025 at 10:17):

Yes you can. I'm on my phone so it's hard to search, but there were past discussions here on Zulip (and I think some still open PRs) with several approaches. One fairly straightforward option is to replace your function $f : \alpha \to \beta$ with $\bar{f} : \alpha \to \beta \times \mathbb{N}$, where the extra natural number you carry around is your complexity measure (e.g. number of comparisons, arithmetic ops, etc.). Then you prove all your usual complexity results about $\bar{f}.2$ and use $\bar{f}.1$ for correctness and applications
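
A minimal sketch of that idea applied to the findIndex from the original question (illustrative names, counting only the equality comparisons):

def findIndexC : List Int → Int → Int × Nat
  | [], _ => (-1, 0)
  | x :: xs, y =>
    if x == y then (0, 1)
    else
      let (i, c) := findIndexC xs y
      (if i == -1 then -1 else i + 1, c + 1)

-- Complexity is now an ordinary statement about the second component
-- (at most one comparison per list element), separate from correctness,
-- which is a statement about the first:
theorem findIndexC_comparisons_le (xs : List Int) (y : Int) :
    (findIndexC xs y).2 ≤ xs.length := by
  sorry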

Eric Wieser (Feb 16 2025 at 12:02):

There is an open PR with a more principled approach to this

Daniel Weber (Feb 16 2025 at 12:23):

Do you know its number? I'd be interested in reviewing that

Eric Wieser (Feb 16 2025 at 13:45):

#20924

GasStationManager (Feb 17 2025 at 07:55):

Interesting discussions! I have also been thinking about this question. Ideally, I want to be able to state a spec of a function that talks about both its return value and its complexity. E.g. in the format of the challenges at Code with Proofs: Arena. And then given a function implementation and proof that it satisfies the spec, I know that the function is both correct and efficient.

I have been experimenting with the monadic approach: use a monad to count the operations we want to keep track of. E.g. to count the number of addition operations in an implementation of Fibonacci, a spec would be:

import Mathlib.Data.Nat.Defs

structure State (s α : Type) where
  run : s → (α × s)

instance : Monad (State s) where
  pure x := ⟨fun s => (x, s)⟩
  bind ma f := ⟨fun s =>
    let (a, s') := ma.run s
    (f a).run s'⟩


def increment : State Nat Unit :=
  ⟨fun s => ((), s + 1)⟩

def countingOp (a b : Nat) : State Nat Nat := do
  increment
  pure (a + b)

-- Original recursive implementation
def f [Monad m] (op : Nat → Nat → m Nat) (n : Nat) : m Nat :=
  match n with
  | 0 => pure 1
  | 1 => pure 1
  | n + 2 =>
    do
      let x ← f op (n + 1)
      let y ← f op n
      op x y

-- Linear-time implementation of f
def fLinear [Monad m] (op : Nat → Nat → m Nat) (n : Nat) : m Nat := sorry

theorem fLinear_correct (op : Nat → Nat → Id Nat) (n : Nat) :
  (fLinear op n) = f op n := by sorry

theorem fLinear_op_calls (n : Nat) :
    (cnt : Nat) →
    ((fLinear countingOp n).run cnt |>.2) = match n with
    | 0 => cnt
    | 1 => cnt
    | n' + 2 => n' + 1 + cnt
  := by sorry

Here one can't avoid the counting by simply using +, since op could be something other than addition.

A linear time implementation would be

-- Linear-time implementation of f
def fLinear [Monad m] (op : Nat → Nat → m Nat) (n : Nat) : m Nat :=
    let rec loop : Nat → Nat → Nat → m Nat
      | 0, a, _ => pure a
      | 1, _, b => pure b
      | i + 2, a, b => do
        let c ← op b a
        loop (i + 1) b c
    loop n 1 1

(proof exercise for the reader)
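
(A possible route for the exercise: generalize over the accumulators and the starting counter, then prove an auxiliary lemma about loop by induction on its first argument, along the lines of the hypothetical statement below, where fLinear.loop is the lifted name of the inner let rec.)

theorem loop_count (cnt a b : Nat) : (i : Nat) →
    ((fLinear.loop countingOp i a b).run cnt |>.2) = match i with
    | 0 => cnt
    | 1 => cnt
    | i' + 2 => i' + 1 + cnt
  := by sorry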

One nice thing is that once I am satisfied that fLinear has the right properties, I can turn off the counting (e.g. by instantiating the monad with Id, as in fLinear_correct) and just use fLinear with minimal overhead.
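
For instance, as a quick sanity check (hypothetical usage, assuming the definitions above are in one file, with the full fLinear in place of the sorried stub):

-- Run in the counting state monad, starting the counter at 0:
#eval (f countingOp 10).run 0        -- (89, 88): the naive `f` performs 88 additions
#eval (fLinear countingOp 10).run 0  -- (89, 9):  `fLinear` performs only 9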

GasStationManager (Feb 17 2025 at 08:09):

I kind of suspect that this is not perfectly secure, and that someone could potentially cheat by somehow "hacking" the monad. The PR linked above seems more secure, and seems great for proving lower bounds, but it is perhaps not so easy to express general algorithms in that formalism. Is there something in between? Maybe using metaprogramming to make a secure version?


Last updated: May 02 2025 at 03:31 UTC