Zulip Chat Archive

Stream: lean4

Topic: Term reduction order


Shreyas Srinivas (Jan 19 2024 at 15:19):

I had a silly question I should have asked long ago. Does lean reduce terms inside binders before applying the function? If I were to write a CPS style tail recursive function, would the continuations just pile up as a big thunk or get reduced at each recursive call?

Shreyas Srinivas (Jan 19 2024 at 15:28):

Here's a textbook example as an #mwe:

import Mathlib.Tactic

def sum_cont_aux (xs : List ℕ) (cont : ℕ → α) :=
  match xs with
  | [] => cont 0
  | y :: ys => sum_cont_aux ys (fun n ↦ cont (y + n))

def sum_and_id (xs : List ℕ) := sum_cont_aux xs (fun x => x)
def sum_and_square (xs : List ℕ) := sum_cont_aux xs (fun x => x * x)
#reduce sum_and_id [1,2,3,4] -- 10

James Gallicchio (Jan 19 2024 at 17:42):

In the runtime you just get a big pile of thunks. CPS transformation, roughly speaking, puts on the heap what would normally be on the stack :-)

Mario Carneiro (Jan 19 2024 at 18:21):

what's more, I think lean will not do TCO to cont in the generated thunk, meaning that this program will use O(n) stack and run out for large n: not on the way down (this is tail recursive), but on the way back, when you call cont 0 on your giant stack of thunks and it calls another thunk, which calls another thunk, and so on
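
Concretely, here is a rough hand-written unfolding for a small input (illustrative only, not tool output):

-- Unfolding sum_cont_aux [1, 2, 3] (fun x => x) by hand:
--   sum_cont_aux [1, 2, 3] (fun x => x)
-- = sum_cont_aux [2, 3] (fun n ↦ (fun x => x) (1 + n))
-- = sum_cont_aux [3] (fun n ↦ (fun n ↦ (fun x => x) (1 + n)) (2 + n))
-- = sum_cont_aux [] (fun n ↦ (fun n ↦ (fun n ↦ (fun x => x) (1 + n)) (2 + n)) (3 + n))
-- = the last continuation applied to 0; each closure then calls the closure it captured,
--   which is the O(n)-deep call chain on the way back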

Shreyas Srinivas (Jan 19 2024 at 18:28):

Is there a way to generate these thunks and inspect them within lean?

Shreyas Srinivas (Jan 19 2024 at 18:29):

And to get more to the point, measure the evaluation time?

Shreyas Srinivas (Jan 19 2024 at 18:30):

Similar to the Linux time command

Mario Carneiro (Jan 19 2024 at 18:32):

you should be able to use set_option profiler true and/or #time to time things
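
For example (a sketch; assuming Mathlib's #time command lives in Mathlib.Util.Time, which may need to be imported explicitly):

import Mathlib.Util.Time  -- assumed home of the #time command; may not be pulled in by Mathlib.Tactic

set_option profiler true in
#eval sum_and_id (List.range 1000)  -- profiler prints a breakdown of where the time went

#time #eval sum_and_id (List.range 1000)  -- prints the elapsed time for the whole command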

Shreyas Srinivas (Jan 19 2024 at 18:34):

Do I need additional imports? (I always have Mathlib.Tactic)

Shreyas Srinivas (Jan 19 2024 at 18:34):

It doesn't work as is

Shreyas Srinivas (Jan 19 2024 at 18:39):

Current toolchain: leanprover/lean4:v4.5.0-rc1

James Gallicchio (Jan 19 2024 at 19:11):

Mario Carneiro said:

what's more, I think lean will not do TCO to cont in the generated thunk

wait, really? :frowning: that feels like a big missed opportunity

Kyle Miller (Jan 19 2024 at 20:26):

For efficient reduction, you might want to use a strictness operator to force y to evaluate. I think this works?

import Mathlib.Tactic

set_option autoImplicit true

def Nat.strict (n : ℕ) (k : ℕ → α) : α :=
  match n with
  | 0 => k 0
  | _ + 1 => k n

def sum_cont_aux (xs : List ℕ) (cont : ℕ → α) :=
  match xs with
  | [] => cont 0
  | y :: ys => Nat.strict y fun y => sum_cont_aux ys (fun n ↦ cont (y + n))

def sum_and_id (xs : List ℕ) := sum_cont_aux xs (fun x => x)
def sum_and_square (xs : List ℕ) := sum_cont_aux xs (fun x => x * x)
#whnf sum_and_id [1,2,3,4] -- 10

Also, no need for #reduce if it's just a Nat, unless you want to partially evaluate something with free variables.
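
For instance, plain evaluation gives the same answer here:

#eval sum_and_id [1, 2, 3, 4] -- 10, computed by the interpreter/compiler rather than by kernel reduction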

Kyle Miller (Jan 19 2024 at 20:28):

There's a chance this would work too:

def Nat.seq (n : ℕ) (b : α) : α :=
  match n with
  | 0 => b
  | _ + 1 => b

def sum_cont_aux (xs : List ℕ) (cont : ℕ → α) :=
  match xs with
  | [] => cont 0
  | y :: ys => Nat.seq y <| sum_cont_aux ys (fun n ↦ cont (y + n))

but I haven't tested it carefully. It's like the haskell seq operator, and it's relying on the fact that it's adding the reduction of y to the cache. The cost of these solutions, by the way, is additional stack frames for evaluating Nat.strict/Nat.seq. I think Nat.seq might be better here, if it works.
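
A quick way to try it (just a suggested check, not something tested in this thread):

#whnf sum_cont_aux [1, 2, 3, 4] (fun x => x) -- expected: 10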

Kyle Miller (Jan 19 2024 at 20:40):

@James Gallicchio I think there are plenty of missed opportunities around due to having had no resources put towards them yet. So far, having a correct and simple kernel has been the priority. Making the kernel support TCO in an efficient way would make its implementation less simple, or at least it would take restructuring how it works so that there aren't multiple C++ stack frames in the way of performing TCO.

I know one of the missions of the FRO is to improve proof by reflection support. Maybe this is something that will have resources put into it at some point. I don't know this part of the roadmap, so this is just speculation on my part.

Shreyas Srinivas (Jan 19 2024 at 20:41):

Is the reduction order identical in the kernel and the VM? That would be strange, because the VM is just executing code and trying to be optimal, whereas the kernel needs to be faithful to the user's definition up to some syntactic level (or maybe Expr)

Kyle Miller (Jan 19 2024 at 20:42):

No, it's not identical. When doing whnf, code is effectively evaluated lazily.

James Gallicchio (Jan 19 2024 at 20:50):

Kyle Miller said:

Making the kernel support TCO in an efficient way would make its implementation less simple

I mostly mean the runtime, not the kernel. But yeah, the super-constrained resources nonetheless apply :big_smile: I'm also guessing (hoping?) the new compiler will make this easier to implement

Henrik Böving (Jan 19 2024 at 20:52):

Shreyas Srinivas said:

Is the reduction order identical in the kernel and the VM. That would be strange because the VM is just executing code and trying to be optimal, whereas the kernel needs to be faithful to the user's definition upto some syntactic level (or maybe Expr)

The VM has no sense of reduction or terms really.

Kyle Miller (Jan 19 2024 at 20:56):

@James Gallicchio I guess I don't understand what you meant by "runtime" since Shreyas was talking about #reduce. Do you mean VM evaluation (#eval) or compiled code evaluation? The compiler for those definitely has TCO.

Henrik Böving (Jan 19 2024 at 20:58):

The VM used in #eval can do TCO. The compiled code is basically at the whim of the C compiler with respect to TCO. This is one of the things that we can e.g. guarantee with the LLVM backend.
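
For contrast, here is a sketch (not from the thread) of an accumulator-passing version whose recursion is a self tail call, which the interpreter can execute and which, in practice, the C compiler turns into a loop, so no continuation closures are built up:

-- Sketch: accumulator-passing sum with a self tail call; no chain of closures is allocated.
def sum_acc (xs : List ℕ) (acc : ℕ := 0) : ℕ :=
  match xs with
  | [] => acc
  | y :: ys => sum_acc ys (acc + y)

#eval sum_acc [1, 2, 3, 4] -- 10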

Kyle Miller (Jan 19 2024 at 21:00):

You can also see from the set_option trace.compiler true output that it's building a chain of closures:

[compiler.lambda_pure] >> sum_cont_aux._rarg._lambda_1 := fun _x_1 _x_2 _x_3 ↦
  let _x_4 := Nat.add _x_1 _x_3;
  let _x_5 := _apply _x_2 _x_4;
  _x_5
[compiler.lambda_pure] >> sum_cont_aux._rarg := fun _x_1 _x_2 ↦
  List.casesOn
    (let _x_5 := _proj.0 _x_1;
    let _x_6 := _proj.1 _x_1;
    let _x_7 := _closure sum_cont_aux._rarg._lambda_1 _x_5 _x_2;
    let _x_8 := sum_cont_aux._rarg _x_6 _x_7;
    _x_8)
[compiler.lambda_pure] >> sum_cont_aux := fun _x_1 ↦
  let _x_2 := _closure sum_cont_aux._rarg;
  _x_2

I'm not sure it can do any better than this unless sum_cont_aux were to be specialized to a particular list.

But this doesn't have any bearing on #reduce, just to be clear.

Henrik Böving (Jan 19 2024 at 21:11):

Henrik Böving said:

The VM used in #eval can do TCO. The compiled code is basically at the whim of the C compiler with respect to TCO. This is one of the things that we can e.g. guarantee with the LLVM backend.

Note that this is kind of sales-pitchy; of course C compilers tend not to be stupid and do end up doing TCO, it's just about having 100% certainty

Joachim Breitner (Jan 19 2024 at 21:51):

The “VM used in #eval” is ir_interpreter.cpp, right?
I see code to do TCO for manifest self-recursion, but I am not sure I see where it would TCO a tail-call to another function, or to a closure – but maybe I am missing it?
(This would not affect just #eval, it’s evaluating anything that wasn’t compiled to native code, e.g. tactics defined in std or mathlib, right?)

Henrik Böving (Jan 19 2024 at 22:11):

The “VM used in #eval” is ir_interpreter.cpp, right?

right

I see code to do TCO for manifest self-recursion, but I am not sure I see where it would TCO a tail-call to another function, or to a closure – but maybe I am missing it?

I don't think it does that. In general, the IR interpreter is not very well optimized (which we definitely should work on!); for example, a different execution model would be friendlier to the branch predictor, and there are many other things we could try.

(This would not affect just #eval, it’s evaluating anything that wasn’t compiled to native code, e.g. tactics defined in std or mathlib, right?)

Indeed! That's why it would be good to put work into making it faster (or alternatively having a cool copy and patch JIT that runs fast and produces good enough code!)

Shreyas Srinivas (Jan 19 2024 at 22:14):

Kyle Miller said:

James Gallicchio I guess I don't understand what you meant by "runtime" since Shreyas was talking about #reduce. Do you mean VM evaluation (#eval) or compiled code evaluation? The compiler for those definitely has TCO.

Sorry, this one's my fault. Lately I have been doing some PL formalization and have gotten into the habit of using #reduce because it works well with unexpanders, as opposed to #eval, which insists on showing me the raw terms.


Last updated: May 02 2025 at 03:31 UTC