Zulip Chat Archive

Stream: lean4

Topic: Faster Nat-indexed data structure for kernel reduction

Joachim Breitner (Sep 18 2024 at 14:35):

In proofs by reflection (omega, bv_decide) one typically has a closed “problem” with Nat indices, and a separate map from these indices to the actual expression atoms appearing in the proof goal. This requires an “denotation” function that relates the problem to the goal, which needs to be evaluated by the kernel.

In bv_decide the map is a plain list, and @Henrik Böving mentioned that in (maybe artificial) test cases with many variable this becomes the bottleneck. So we briefly investigated this.

The base line is this code, which generates a large manifest list and measures indexing into it:

import Lean

open Lean

def exp : Nat := 8

def list : List Nat := by
  run_tac
    let goal ← Elab.Tactic.getMainGoal
    let l := List.range (2^exp)
    goal.assign (toExpr l)

set_option maxRecDepth 2000

set_option trace.profiler true in
def test1 : ∀ n < 2^exp, list.get? n = Option.some n := by decide

This shows 1.835058 for kernel type-checking.

Clearly, a linear list is not a great data structure to index into. So here is a binary tree that should at least theoretically gives us a logarithmic lookup:

inductive BinTree (α : Type) : Type where
  | empty
  | node : Nat → α → BinTree α → BinTree α → BinTree α
deriving Inhabited

def BinTree.toExpr [ToExpr α] : BinTree α → Expr
  | .node p x t1 t2 => mkApp5 (.const ``node []) (toTypeExpr α) (ToExpr.toExpr p) (ToExpr.toExpr x) t1.toExpr t2.toExpr
  | .empty => .app (.const ``empty []) (toTypeExpr α)

instance [ToExpr α] : ToExpr (BinTree α) where
  toTypeExpr := .app (.const ``BinTree []) (toTypeExpr α)
  toExpr := BinTree.toExpr

partial def BinTree.ofList (l : List α) (o : Nat := 0) : BinTree α := Id.run do
  if l.isEmpty then return .empty
  let s := l.length/2
  let l1 := l.take s
  let x::l2 := l.drop s | unreachable!
  .node (o+s) x (BinTree.ofList l1 o) (BinTree.ofList l2 (o + s + 1))

def BinTree.lookup? (idx : Nat) : BinTree α → Option α
  | empty => .none
  | node pivot x t1 t2 =>
    if idx < pivot then t1.lookup? idx else
    if idx = pivot then x else
    t2.lookup? idx

def tree : BinTree Nat := by
  run_tac
    let goal ← Elab.Tactic.getMainGoal
    let l := List.range (2^exp)
    let t := BinTree.ofList l
    goal.assign (toExpr t)

That's 0.325068 for kernel checking in

set_option trace.profiler true in
def test2 : ∀ n < 2^exp, tree.lookup? n = Option.some n := by decide

We can shave off some constants by realizing that structural recursion compiles to a complicated construction using brecOn etc, so if we define a “kernel-reduction friendly lookup”, like so

noncomputable def BinTree.lookup2? (idx : Nat) : BinTree α → Option α :=
  BinTree.rec
    .none
    (fun pivot x t1 t2 ih1 ih2 =>
      if idx < pivot then ih1 else
      if idx = pivot then x else
      ih2)

Then

set_option trace.profiler true in
def test3 : ∀ n < 2^exp, tree.lookup2? n = Option.some n := by decide

gives 0.282465.

Full code

More data points:

2^n	List.get?	Tree.lookup?	Tree.lookup2?
4	0.010372	0.010587	0.007030
5	0.031792	0.023548	0.018365
6	0.120239	0.054061	0.038838
7	0.482878	0.134126	0.103754
8	1.827731	0.327895	0.264725

Take-aways:

Code that we want the kernel to evaluate quickly should still be “good” functional code, and use suitable data structures.
To get another 20% of speedup, avoid recursive functions and use combinators directly.

So my question into the room is: Do people in practice have omega or bv_decide proofs with a larger number of variables where this optimization would be worth the effort? Or is this all realistic cases dwarfed by other costs (e.g. that of producing and checking the certificate)?

I was told that in particular @Alex Keizer @Siddharth Bhat or @Shilpi Goel might be interested :-)

Notification Bot (Sep 18 2024 at 14:56):

18 messages were moved from this topic to #lean4 > Reflection tactics and unwanted reduction by Joachim Breitner.

Mario Carneiro (Sep 18 2024 at 15:42):

Here's a test using a quickly hacked together proof producing version of your test, and it is indeed faster:

import Lean
import Qq

open Lean

def exp : Nat := 8

def list : List Nat := by
  run_tac
    let goal ← Elab.Tactic.getMainGoal
    let l := List.range (2^exp)
    goal.assign (toExpr l)

set_option maxRecDepth 2000

theorem list_head {l : List Nat} {a} : (a::l).get? 0 = some a := rfl
theorem list_tail {l : List Nat} {n a b} (h : l.get? n = some a) :
  (b::l).get? n.succ = some a := h ▸ rfl

open Qq

def proveList (n : Nat) (l : Q(List Nat)) : MetaM ((a : Q(Nat)) × Q(($l).get? $n = some $a)) := do
  let .app (.app _ (a : Q(Nat))) (l' : Q(List Nat)) := l | failure
  show MetaM ((b : Q(Nat)) × Q(($a :: $l').get? $n = some $b)) from do
  match n with
  | 0 => pure ⟨a, q(list_head)⟩
  | n+1 =>
    let ⟨b, p⟩ ← proveList n l'
    pure ⟨b, q(list_tail $p)⟩

theorem forall_zero {P : Nat → Prop} : ∀ a < 0, P a := by simp
theorem forall_succ {P : Nat → Prop} {n} (H1 : ∀ a < n, P a) (H2 : P n) :
    ∀ a < n.succ, P a := by
  intro a h
  cases Nat.lt_succ_iff_lt_or_eq.1 h with
  | inl h => exact H1 _ h
  | inr h => exact h ▸ H2

def proveForall (P : Q(Nat → Prop))
    (H : ∀ a : Nat, MetaM Q($P $a)) (n : Nat) : MetaM (Q(∀ n < $n, $P n)) := do
  match n with
  | 0 => pure q(forall_zero)
  | n+1 =>
    let H1 ← proveForall P H n
    let H2 ← H n
    pure q(forall_succ $H1 $H2)

elab "prove_list_tac" : tactic =>
  Elab.Tactic.liftMetaTactic1 fun g => do
  let x : Q(Prop) ← g.getType'
  let ~q(∀ x < $n, List.get? (α := Nat) $l x = some x) := x | failure
  let n ← unsafe Meta.evalExpr Nat q(Nat) n
  let p ← proveForall q(fun x => List.get? (α := Nat) $l x = some x) (fun x => (·.2) <$> proveList x l) n
  g.assign p
  pure none

set_option maxRecDepth 2000

set_option trace.profiler true in
def test1 : ∀ n < 2^exp, list.get? n = Option.some n := by decide
-- 3.301 s

set_option trace.profiler true in
def test2 : ∀ n < 2^exp, list.get? n = Option.some n := by unfold list; prove_list_tac
-- 0.928 s

Mario Carneiro (Sep 18 2024 at 15:43):

you can of course do much better if you exploit structural properties of List.range here, but this is presumably not the point of the test. This is mimicking the quadratic behavior of the original test

Daniel Weber (Sep 18 2024 at 15:44):

You can also try something like List.forall_ind from here, although I guess that's also not the point of the test

Joachim Breitner (Sep 18 2024 at 15:50):

Hmm, still not quite my point, but interesting indeed!

And this is really creating a proof terms with approx 2^exp * 2^exp/2 constructor applications, one for each step in the function reduction?

Not sure if I should be sad about the reduction performance or happy about the type checking performance :-)

Last updated: May 02 2025 at 03:31 UTC