Zulip Chat Archive

Stream: Is there code for X?

Topic: Computation Complexity of Implementations in Lean


Amitayush Thakur (Feb 16 2025 at 06:50):

I was wondering if there is a way to argue about the computational complexity of implementations of certain algorithms in Lean.

For example, consider the two implementations of search:

def findIndex : List Int  Int  Int
| [], _ => -1
| x::xs, y => if x == y then 0
              else
                let i := findIndex xs y;
                if i == -1 then
                  -1
                else
                  i + 1

and

def binarySearchLowHi (xs: List Int) (y: Int) (low: Nat) (hi: Nat): Int :=
  if low > hi then -1
  else
    let sum := low + hi;
    let mid := if sum % 2 == 0
               then sum / 2
               else (sum - 1) / 2;
    have h_i_lt_n : mid < xs.length := by
      sorry
    -- ^ NOTE:
    -- This proof is required to ensure that
    -- list.get can be called with the index `i`
    -- without any runtime errors.
    let mid_val := xs.get mid, h_i_lt_n⟩;
    if mid_val == y then mid
    else if mid_val < y then
      binarySearchLowHi xs y (mid + 1) hi
    else
      binarySearchLowHi xs y low (mid - 1)
termination_by hi - low
decreasing_by
  simp
  rename_i h_hi_leq_low
  -- We only have to prove termination for the recursive case when
  -- `low <= hi`. This is because the base case is already covered by the `if low > hi then -1` line.
  -- We rename the hypothesis to `h_hi_leq_low` to make it more readable.
  simp at h_hi_leq_low
  have h_sum_mod_2_eq_0_or_1 : sum % 2 = 0  sum % 2 = 1 := by
    sorry
  cases h_sum_mod_2_eq_0_or_1
  rename_i h_sum_mod_2_eq_0
  rw [h_sum_mod_2_eq_0] at *
  simp -- at this point, prove that hi - ((low + hi) / 2 + 1) < hi - low
  sorry
  rename_i h_sum_mod_2_eq_1
  rw [h_sum_mod_2_eq_1] at *
  simp -- at this point, prove that hi - ((low + hi - 1) / 2 + 1) < hi - low
  sorry
  sorry

How can I state that binary search is faster than the simple search given the list is sorted

Daniel Weber (Feb 16 2025 at 07:13):

This isn't true, because a docs#List is a linked list, so it's actually less efficient and can be O(n log(n))

Daniel Weber (Feb 16 2025 at 07:13):

However, in any case, due to function extensionality (docs#funext) you can't state this directly about the function

Amitayush Thakur (Feb 16 2025 at 07:22):

Okay agreed, my implementation may not use Array where I would have been able to write it more efficiently.

However, my question is more general. Can we argue about the time complexity of functions in general? But It seems that based on https://leanprover-community.github.io/mathlib4_docs/Init/Core.html#funext, we cannot state that one function is faster than another. However, I was wondering if we could define some form of measure to describe the number of computations needed to execute the function.

Luigi Massacci (Feb 16 2025 at 10:17):

Yes you can. I’m from my phone so it’s hard to search, but there were past discussions here on Zulip (and I think some still open PRs) with several approaches. One fairly straightforward option is to replace your function f:αβf : \alpha \to \beta to fˉ:αβ×N\bar{f} : \alpha \to \beta \times \N, where the extra natural number you carry around is your complexity measure (eg: number of comparisons, arithmetical ops, etc). Then you prove all your usual complexity results about fˉ.2\bar{f}.2 and use fˉ.1\bar{f}.1 for correctness and applications

Eric Wieser (Feb 16 2025 at 12:02):

There is an open PR with a more principled approach to this

Daniel Weber (Feb 16 2025 at 12:23):

Do you know its number? I'd be interested in reviewing that

Eric Wieser (Feb 16 2025 at 13:45):

#20924

GasStationManager (Feb 17 2025 at 07:55):

Interesting discussions! I have also been thinking about this question. Ideally, I want to be able to state a spec of a function that talks about both its return value and its complexity. E.g. in the format of the challenges at Code with Proofs: Arena. And then given a function implementation and proof that it satisfies the spec, I know that the function is both correct and efficient.

I have been experimenting with the monadic approach: use a monad to count the operation we want to keep track of. E.g. to count the number of the addition operations of an implementation of Fibonacci, a spec would be:

import Mathlib.Data.Nat.Defs

structure State (s α : Type) where
  run : s  (α × s)

instance : Monad (State s) where
  pure x := fun s => (x, s)
  bind ma f := fun s =>
    let (a, s') := ma.run s
    (f a).run s'


def increment : State Nat Unit :=
  fun s => ((), s + 1)

def countingOp (a b : Nat) : State Nat Nat := do
  increment
  pure (a + b)

-- Original recursive implementation
def f [Monad m] (op : Nat  Nat  m Nat) (n : Nat) : m Nat :=
  match n with
  | 0 => pure 1
  | 1 => pure 1
  | n + 2 =>
    do
      let x  f op (n + 1)
      let y  f op n
      op x y

-- Linear-time implementation of f
def fLinear [Monad m] (op : Nat  Nat  m Nat) (n : Nat) : m Nat :=sorry

theorem fLinear_correct (op : Nat  Nat  Id Nat) (n : Nat)  :
  (fLinear op n) =f op n := by sorry

theorem fLinear_op_calls (n : Nat):
 (cnt:Nat)->
  ((fLinear countingOp n).run cnt |>.2) = match n with
  | 0 => cnt
  | 1 => cnt
  | n' + 2 => n'+1+cnt
:= by sorry

Here one can't avoid the counting by simply using +, since op could be something other than addition.

A linear time implementation would be

-- Linear-time implementation of f
def fLinear [Monad m] (op : Nat  Nat  m Nat) (n : Nat) : m Nat :=
    let rec loop : Nat  Nat  Nat  m Nat
      | 0, a, _ => pure a
      | 1, _, b => pure b
      | i + 2, a, b => do
        let c  op b a
        loop (i + 1) b c
    loop n 1 1

(proof exercise for the reader)

One nice thing is that once I am satisfied that fLinear has the right properties, I can turn off the counting and just use fLinear with minimum overhead.

GasStationManager (Feb 17 2025 at 08:09):

I kind of suspect that this is not perfectly secure, that someone could potentially cheat by somehow "hacking" the monad. The PR linked above seems more secure, and seems great for proving lower bounds, but perhaps not so easy to express general algorithms in that formalism. Is there something in between? Maybe using metaprogramming to make a secure version?

Jakub Nowak (Sep 17 2025 at 18:04):

I'm pondering formalizing PolyTime in Lean. It looks like funext makes it impossible to refer to a specific implementation of a function. E.g, given:

import Mathlib

def fib :   
| 0 => 0
| 1 => 1
| n + 2 => fib n + fib (n + 1)

def fib_impl (n : ) :  :=
  go n 0 1
where
  go (n a b : ) :=
    match n with
    | 0 => a
    | n + 1 => go n b (a + b)

one can prove h : fib_impl = fib. So regardless of what definition of PolyTime one could think up, given t : PolyTime fib_impl we get h ▸ t : PolyTime fib.

In other words Lean's value of a function only refers to its semantics. To define PolyTime we have to somehow refer to its implementation. Maybe this could somehow be possible by extracting elaborated Lean.Expr of a function and defining PolyTime on it?

One more though is that maybe we could define PolyTime to mean "there exists a polynomial implementation of this function". This definition would not contradict funext.

Violeta Hernández (Sep 17 2025 at 18:29):

You'd have to formalize an algorithm in some separate way, using Turing Machines or Lambda Calculus or whatever your favorite model of computation is

Aaron Liu (Sep 17 2025 at 18:29):

I think we want polytime to be independent of what the compiler is doing

Violeta Hernández (Sep 17 2025 at 18:30):

Then you can talk about the number of operations employed in an algorithm, and then you can formalize computational complexity as some equivalence class of this function

Eric Wieser (Sep 17 2025 at 18:36):

You could use the free monad from CSLib, which you can instantiate with an effect for an "addition oracle"

Jakub Nowak (Sep 17 2025 at 22:52):

Violeta Hernández said:

You'd have to formalize an algorithm in some separate way, using Turing Machines or Lambda Calculus or whatever your favorite model of computation is

The point is, that elaborated Lean.Expr is already Lambda Calculus term.

Aaron Liu (Sep 17 2025 at 22:53):

it's actually a Lean Expr term

Aaron Liu (Sep 17 2025 at 22:53):

depending on which variant of lambda calculus you use it may not be a lambda calculus term

Jakub Nowak (Sep 17 2025 at 23:00):

I agree, it's some Lambda Calculus. I think it's similar, but not exactly, to the usual System F.
It still should be possible to define PTIME in terms of number of beta-reductions of Lean.Expr.

Aaron Liu (Sep 17 2025 at 23:02):

Remember the natural number n is represented as n applications of Nat.succ to Nat.zero

Kenny Lau (Sep 17 2025 at 23:04):

well...

Jakub Nowak (Sep 17 2025 at 23:04):

It could also be Lean.Expr.lit? I don't know when it's used and how the Nat optimization works.

Aaron Liu (Sep 17 2025 at 23:09):

Jakub Nowak said:

It could also be Lean.Expr.lit? I don't know when it's used and how the Nat optimization works.

there's a whole list of optimizations for reducing applications of certain function on nat literals

Aaron Liu (Sep 17 2025 at 23:09):

docs#Lean.Meta.reduceNat?

Jakub Nowak (Sep 17 2025 at 23:11):

The distinction between binary and unary representation of numbers is important and completely changes what is PTIME and what isn't. So this distinction will also have to be somehow made in Lean.

Niels Voss (Sep 17 2025 at 23:12):

My understanding is that Lean has two different types of optimizations: Compiler-level optimizations, which only affect the behavior of code at runtime and during native_decide, and kernel-level optimizations, which affect the typechecking of code.

Most optimizations are compiler-level (you'll see them @[implemented_by] or @[extern] annotations, and maybe also @[csimp]) but there is a kernel-level optimization for Nat which makes arithmetic in Nat use the GNU MultiPrecision library (GMP), which makes reducing integers close to linear time (maybe logarithmic time?) in the number of bits in the number, rather than exponential time in the number of bits in the number.

Aaron Liu (Sep 17 2025 at 23:13):

there's also the thing about reducing well-foundedness proofs being slow

Niels Voss (Sep 17 2025 at 23:14):

You might want to check out https://proofassistants.stackexchange.com/q/2018/6046 and https://proofassistants.stackexchange.com/q/1976/6046

Kenny Lau (Sep 17 2025 at 23:14):

it seems a bit dangerous to be inspecting expressions of "real-world" objects and then outputting things to the "real world"

Kenny Lau (Sep 17 2025 at 23:14):

(where "real world" means the world that Lean thinks it is in)

Aaron Liu (Sep 17 2025 at 23:16):

Kenny Lau said:

(where "real world" means the world that Lean thinks it is in)

what's "real world" I don't get it

Kenny Lau (Sep 17 2025 at 23:16):

well it depends on whether you believe there is anything beyond the Expressions

Aaron Liu (Sep 17 2025 at 23:16):

I still don't get it

Kenny Lau (Sep 17 2025 at 23:17):

like, only countably many real numbers are actually expressable, but there are uncountably many real numbers

Aaron Liu (Sep 17 2025 at 23:17):

we have well over 10 different layers of abstraction from the hardware to the Lean program, most of which could be called a "real world"

Aaron Liu (Sep 17 2025 at 23:17):

and I still don't know if you're referring to any of them

Kenny Lau (Sep 17 2025 at 23:18):

probably not

Niels Voss (Sep 17 2025 at 23:18):

I guess ultimately, you have to define a model of computation, and then prove stuff about that. Proving properties about your model of computation is pure math, and can be done in Lean. Translating your lean functions to meta-versions of those functions in your model of computation can be done, but maybe not in a verifiable way (not sure about this). Claiming that your model of computation matches the model of computation in reality is outside the domain of pure math and cannot be formalized in Lean.

Jakub Nowak (Sep 17 2025 at 23:20):

Exactly! Thank you for saying that clearly!

Jakub Nowak (Sep 17 2025 at 23:25):

So my idea is, to define model of computation directly on Lean.Expr. You already have to trust in Lean compiler to do elaboration from Lean language to Lean.Expr that preserves semantics. Lean language is high-level and easier to write in. While Lean.Expr is much simpler and it's easier to define "computation" and "complexity" on it. This approach makes it possible to prove things about complexity of programs written in high-level Lean language.

I think that this approach is what is being proposed in the first answer to this question. https://proofassistants.stackexchange.com/questions/2018/what-proof-assistants-can-reason-about-the-complexity-of-its-own-programs
Although, I'm not sure whether the author meant to use Lean.Expr or TSyntax.
EDIT: I think the author meant to use TSyntax. But that would require defining the semantics of Lean's syntax, which would be pretty problematic. And would make it impossible to work with user-defined elaborated syntax.

Jakub Nowak (Sep 18 2025 at 00:17):

Kenny Lau said:

like, only countably many real numbers are actually expressable, but there are uncountably many real numbers

I think you could define real numbers to be only these real numbers there are expressable in Lean, and that wouldn't change anything from the Lean side. Every theorem would still be true and you wouldn't be able to prove more theorems (in Lean itself).

Aaron Liu (Sep 18 2025 at 00:33):

but there are uncountably many real numbers

Aaron Liu (Sep 18 2025 at 00:33):

and countably many expressible ones

Aaron Liu (Sep 18 2025 at 00:33):

you should clarify what you mean by that

Henrik Böving (Sep 18 2025 at 06:52):

Jakub Nowak said:

So my idea is, to define model of computation directly on Lean.Expr. You already have to trust in Lean compiler to do elaboration from Lean language to Lean.Expr that preserves semantics. Lean language is high-level and easier to write in. While Lean.Expr is much simpler and it's easier to define "computation" and "complexity" on it. This approach makes it possible to prove things about complexity of programs written in high-level Lean language.

I think that this approach is what is being proposed in the first answer to this question. https://proofassistants.stackexchange.com/questions/2018/what-proof-assistants-can-reason-about-the-complexity-of-its-own-programs
Although, I'm not sure whether the author meant to use Lean.Expr or TSyntax.
EDIT: I think the author meant to use TSyntax. But that would require defining the semantics of Lean's syntax, which would be pretty problematic. And would make it impossible to work with user-defined elaborated syntax.

This is still a highly underspecified approach though. You don't just get away with saying that you are in a lambda calculus in complexity. What kind of semantics do you have for that calculus? And what sort of operation are you even counting as a computation step. Lean.Expr answers you none of these questions and depending on your answer to these questions your proofs might end up looking vastly different.

Shreyas Srinivas (Sep 18 2025 at 09:01):

To add to all the above, the query/free monad approach can let you capture complexity in many models.

Jakub Nowak (Sep 18 2025 at 11:29):

Henrik Böving said:

This is still a highly underspecified approach though.

That's why I called it "idea". I'm currently exploring it. I appreciate the comments, especially pointing out the problems with that approach. IIrc for an untyped lambda calculus if you apply beta-reductions in a normal order (i.e. left to right) and count number of beta-reductions you get a definition of complexity that is equivalent to a typical definition in turing machines up to some linear term. I.e. you can simulate beta-reduction on turing machine and you can simulate turing machines in lambda calculus, both in linear time. But that's for untyped lambda calculus, no idea if the result applies for System F, or specifically for Lean.Expr in this case.

Henrik Böving (Sep 18 2025 at 12:20):

You also have to keep in mind that turing machines are not necessarily the model of complexity you want to aim for. e.g. RAM machines are much more interesting if you want to talk about "real world" algorithms.

Shreyas Srinivas (Sep 18 2025 at 12:24):

Speaking of RAM machines, you need to be really careful about what a word means in this RAM, and what operations are unit operations. That being said, the relevant conceptual content in algorithms can be expressed at a higher level of abstraction than the RAM model.

Shreyas Srinivas (Sep 18 2025 at 12:26):

In many cases it is more important to know that you can solve a problem by so many max s-t flow queries, or reachability queries, than it is to know the exact complexity.

Jakub Nowak (Sep 18 2025 at 14:00):

RAM machine model also gives exactly the same PTIME class I think?

Jakub Nowak (Sep 18 2025 at 14:00):

My use-case for formalizing complexity is in cryptography, e.g. zero-knowledge proofs.


Last updated: Dec 20 2025 at 21:32 UTC