Zulip Chat Archive

Stream: lean4

Topic: Codegen of functions taking Prop arguments

Michael MacLeod (May 06 2025 at 23:10):

I'm working on an implementation of merge sort defined on Array which uses USize for indexing, and only allocates O(n) extra memory. My goal is to write a formally verified, reasonably efficient implementation of mergesort. So far, I've proved that all the array access instructions are in-bounds (thank you to the people behind bv_decide), and that there are no infinite loops.

Ideally, the fact that the code uses proofs to show that array accesses are in bounds shouldn't slow down the code. Stated another way, if I were to replace all the proofs in the program with sorry, and remove all Prop function arguments, I would expect that this new unsafe code would perform identically to its safe counterpart. Unfortunately, after profiling both of these versions, it seems like the unsafe merge sort implementation is actually faster (e.g., consistently sorting an array of 10^8 Nat values in 3 seconds, in contrast to the safe version, which does the same in 4 seconds).

I turned on trace.compiler.ir.result, and, after inspecting the ir of both versions, I found that the only difference between the two versions is the safe version has an extra argument, which I think is the Prop structure I pass to it containing various propositions about the relationship between the different indices involved in sorting. Inspecting the generated C code seems to confirm this fact. Debugging the code with gdb and inserting a breakpoint at one of the function calls in my program prints the following:

l_mergeAdjacentChunksIntoAux_loop_loopLeft___rarg(x_1, x_2, x_3, x_4, x_7, x_5, x_6, x_8, lean_box(0));

Of particular interest here is the last argument, lean_box(0). The code generated from this function comes from the lean function of the following type, where H₄ arr aux low mid high ptr₁ ptr₂ i is a structure of sort Prop declared in the program:

mergeAdjacentChunksIntoAux.loop.loopLeft
  {α : Type}
  (arr : Array α)
  (low mid high ptr₂ : USize)
  (aux : Array α)
  (ptr₁ i : USize)
  (h₄ : H₄ arr aux low mid high ptr₁ ptr₂ i)
  : Array α

What I think is happening is that lean isn't geting rid of the Prop function argument (H₄) when generating code. Instead, it keeps the argument around and passes a dummy value, lean_box(0), which is ignored by the function in question. My hypothesis is that it is this fact which explains the performance difference I see between the safe and unsafe versions of my function. Of course, I don't have a proof of this, and it's possible that there's something else going on.

Does anyone have any advice? It would be really cool to be able to write a formally verified version of mergesort which is as fast as the unsafe, unverified version. From the perspective of compiling lean code, it seems like it should be possible to remove Prop function arguments completely, instead of just supplying dummy values at runtime---of course, it's probably a lot more complicated than I'm aware of.

Aaron Liu (May 06 2025 at 23:24):

You can sneak the proofs in through a subtype and it will eliminate the extra argument

Aaron Liu (May 06 2025 at 23:25):

For your example mergeAdjacentChunksIntoAux.loop.loopLeft, this would be changing the signature to

mergeAdjacentChunksIntoAux.loop.loopLeft
  {α : Type}
  (arr : Array α)
  (low mid high ptr₂ : USize)
  (aux : Array α)
  (ptr₁ : USize)
  (i : Subtype (H₄ arr aux low mid high ptr₁ ptr₂))
  : Array α

Michael MacLeod (May 06 2025 at 23:33):

Thanks, I wasn't aware of this possibility. It looks promising, and I'll try it out.

Michael MacLeod (May 07 2025 at 00:23):

My program is something like 900 lines of proof-related code currently, so it's going to take a little while to figure out how to do all the refactoring so it uses Subtype. I made a little demo example of how to do some of the refactoring if anyone else is interested:

structure H (a b : Nat) : Prop where
  eq : a = b

-------------------- Original version

def H.next
    {a b : Nat}
    (h : H a b)
    : H (a - 1) (b - 1) :=
  { eq := by rw [h.eq] }

def versionA (a b : Nat) (h : H a b) : Nat :=
  if a = 0 then
    b
  else
    versionA (a - 1) (b - 1) h.next

-------------------- Version with subtypes

def HB.next
    {a : Nat}
    (hb : Subtype (H a))
    : Subtype (H (a - 1)) :=
  Subtype.mk (hb.val - 1) { eq := by rw [← hb.property.eq] }

def versionB (a : Nat) (hb : Subtype (H a)) : Nat :=
  if a = 0 then
    hb.val
  else
    versionB (a - 1) (HB.next hb)

Aaron Liu (May 07 2025 at 00:59):

Here's what I got:

structure H (a b : Nat) : Prop where
  eq : a = b

-------------------- Original version

theorem H.next
    {a b : Nat}
    (h : H a b)
    : H (a - 1) (b - 1) :=
  { eq := by rw [h.eq] }

def versionA (a b : Nat) (h : H a b) : Nat :=
  if a = 0 then
    b
  else
    versionA (a - 1) (b - 1) h.next

-------------------- Version with subtypes
-- `H.next` is a theorem, so no code is generated, and you don't need to worry about efficiecy

def versionB (a : Nat) (b : Subtype (H a)) : Nat :=
  if a = 0 then
    b.val
  else
    versionB (a - 1) ⟨b.val - 1, b.property.next⟩

Aaron Liu (May 07 2025 at 00:59):

My impression is that you don't need to change a whole lot

Aaron Liu (May 07 2025 at 01:00):

My impression is that you don't need to change a whole lot

Aaron Liu (May 07 2025 at 01:00):

Did that send twice?

Aaron Liu (May 07 2025 at 01:04):

You can even make them even more similar:

def versionB' (a : Nat) (b : Subtype (H a)) : Nat :=
  let ⟨b, hb⟩ := b
  if a = 0 then
    b
  else
    versionB' (a - 1) ⟨b - 1, hb.next⟩

I'm pretty sure this generates the same code as the other versionB

Michael MacLeod (May 07 2025 at 02:05):

Your version is a lot simpler, I'm going to try using it in my program. Thanks for all the help.

Michael MacLeod (May 07 2025 at 03:31):

After refactoring, I can confirm (via gdb) that the extra argument is gone, and the performance of the safe merge sort version now seems to be the same as the unsafe version (first two results below):

Testing custom Array.mergeSort (safe) using (size := 100000) (seed := 0) (average of 3 runs on each test)
0s          4ms             4222885ns               mostlyAscending
0s          5ms             5022282ns               randomWithDuplicates
0s          6ms             6712451ns               random
0s          2ms             2500876ns               ascending
0s          2ms             2658119ns               descending
0s          3ms             3138381ns               ascendingWithRandomTail

Testing custom Array.mergeSort (unsafe) using (size := 100000) (seed := 0) (average of 3 runs on each test)
0s          4ms             4180480ns               mostlyAscending
0s          5ms             5151306ns               randomWithDuplicates
0s          6ms             6795108ns               random
0s          2ms             2946593ns               ascending
0s          2ms             2379960ns               descending
0s          3ms             3305699ns               ascendingWithRandomTail

Testing List.mergeSort using (size := 100000) (seed := 0) (average of 3 runs on each test)
0s          11ms            11271286ns              mostlyAscending
0s          18ms            18722682ns              randomWithDuplicates
0s          20ms            20342137ns              random
0s          9ms             9260695ns               ascending
0s          8ms             8935158ns               descending
0s          9ms             9365413ns               ascendingWithRandomTail

Testing Array.qsortOrd using (size := 100000) (seed := 0) (average of 3 runs on each test)
0s          266ms           266871536ns             mostlyAscending
1s          1595ms          1595832164ns            randomWithDuplicates
0s          18ms            18541981ns              random
0s          12ms            12463519ns              ascending
0s          18ms            18904547ns              descending
0s          13ms            13027594ns              ascendingWithRandomTail

Last updated: Feb 28 2026 at 14:05 UTC