Zulip Chat Archive
Stream: lean4
Topic: Codegen of functions taking Prop arguments
Michael MacLeod (May 06 2025 at 23:10):
I'm working on an implementation of merge sort defined on Array which uses USize for indexing, and only allocates O(n) extra memory. My goal is to write a formally verified, reasonably efficient implementation of mergesort. So far, I've proved that all the array access instructions are in-bounds (thank you to the people behind bv_decide), and that there are no infinite loops.
Ideally, the fact that the code uses proofs to show that array accesses are in bounds shouldn't slow down the code. Stated another way, if I were to replace all the proofs in the program with sorry, and remove all Prop function arguments, I would expect that this new unsafe code would perform identically to its safe counterpart. Unfortunately, after profiling both of these versions, it seems like the unsafe merge sort implementation is actually faster (e.g., consistently sorting an array of 10^8 Nat values in 3 seconds, in contrast to the safe version, which does the same in 4 seconds).
I turned on trace.compiler.ir.result, and, after inspecting the ir of both versions, I found that the only difference between the two versions is the safe version has an extra argument, which I think is the Prop structure I pass to it containing various propositions about the relationship between the different indices involved in sorting. Inspecting the generated C code seems to confirm this fact. Debugging the code with gdb and inserting a breakpoint at one of the function calls in my program prints the following:
l_mergeAdjacentChunksIntoAux_loop_loopLeft___rarg(x_1, x_2, x_3, x_4, x_7, x_5, x_6, x_8, lean_box(0));
Of particular interest here is the last argument, lean_box(0). The code generated from this function comes from the lean function of the following type, where H₄ arr aux low mid high ptr₁ ptr₂ i is a structure of sort Prop declared in the program:
mergeAdjacentChunksIntoAux.loop.loopLeft
{α : Type}
(arr : Array α)
(low mid high ptr₂ : USize)
(aux : Array α)
(ptr₁ i : USize)
(h₄ : H₄ arr aux low mid high ptr₁ ptr₂ i)
: Array α
What I think is happening is that lean isn't geting rid of the Prop function argument (H₄) when generating code. Instead, it keeps the argument around and passes a dummy value, lean_box(0), which is ignored by the function in question. My hypothesis is that it is this fact which explains the performance difference I see between the safe and unsafe versions of my function. Of course, I don't have a proof of this, and it's possible that there's something else going on.
Does anyone have any advice? It would be really cool to be able to write a formally verified version of mergesort which is as fast as the unsafe, unverified version. From the perspective of compiling lean code, it seems like it should be possible to remove Prop function arguments completely, instead of just supplying dummy values at runtime---of course, it's probably a lot more complicated than I'm aware of.
Aaron Liu (May 06 2025 at 23:24):
You can sneak the proofs in through a subtype and it will eliminate the extra argument
Aaron Liu (May 06 2025 at 23:25):
For your example mergeAdjacentChunksIntoAux.loop.loopLeft, this would be changing the signature to
mergeAdjacentChunksIntoAux.loop.loopLeft
{α : Type}
(arr : Array α)
(low mid high ptr₂ : USize)
(aux : Array α)
(ptr₁ : USize)
(i : Subtype (H₄ arr aux low mid high ptr₁ ptr₂))
: Array α
Michael MacLeod (May 06 2025 at 23:33):
Thanks, I wasn't aware of this possibility. It looks promising, and I'll try it out.
Michael MacLeod (May 07 2025 at 00:23):
My program is something like 900 lines of proof-related code currently, so it's going to take a little while to figure out how to do all the refactoring so it uses Subtype. I made a little demo example of how to do some of the refactoring if anyone else is interested:
structure H (a b : Nat) : Prop where
eq : a = b
-------------------- Original version
def H.next
{a b : Nat}
(h : H a b)
: H (a - 1) (b - 1) :=
{ eq := by rw [h.eq] }
def versionA (a b : Nat) (h : H a b) : Nat :=
if a = 0 then
b
else
versionA (a - 1) (b - 1) h.next
-------------------- Version with subtypes
def HB.next
{a : Nat}
(hb : Subtype (H a))
: Subtype (H (a - 1)) :=
Subtype.mk (hb.val - 1) { eq := by rw [← hb.property.eq] }
def versionB (a : Nat) (hb : Subtype (H a)) : Nat :=
if a = 0 then
hb.val
else
versionB (a - 1) (HB.next hb)
Aaron Liu (May 07 2025 at 00:59):
Here's what I got:
structure H (a b : Nat) : Prop where
eq : a = b
-------------------- Original version
theorem H.next
{a b : Nat}
(h : H a b)
: H (a - 1) (b - 1) :=
{ eq := by rw [h.eq] }
def versionA (a b : Nat) (h : H a b) : Nat :=
if a = 0 then
b
else
versionA (a - 1) (b - 1) h.next
-------------------- Version with subtypes
-- `H.next` is a theorem, so no code is generated, and you don't need to worry about efficiecy
def versionB (a : Nat) (b : Subtype (H a)) : Nat :=
if a = 0 then
b.val
else
versionB (a - 1) ⟨b.val - 1, b.property.next⟩
Aaron Liu (May 07 2025 at 00:59):
My impression is that you don't need to change a whole lot
Aaron Liu (May 07 2025 at 01:00):
My impression is that you don't need to change a whole lot
Aaron Liu (May 07 2025 at 01:00):
Did that send twice?
Aaron Liu (May 07 2025 at 01:04):
You can even make them even more similar:
def versionB' (a : Nat) (b : Subtype (H a)) : Nat :=
let ⟨b, hb⟩ := b
if a = 0 then
b
else
versionB' (a - 1) ⟨b - 1, hb.next⟩
I'm pretty sure this generates the same code as the other versionB
Michael MacLeod (May 07 2025 at 02:05):
Your version is a lot simpler, I'm going to try using it in my program. Thanks for all the help.
Michael MacLeod (May 07 2025 at 03:31):
After refactoring, I can confirm (via gdb) that the extra argument is gone, and the performance of the safe merge sort version now seems to be the same as the unsafe version (first two results below):
Testing custom Array.mergeSort (safe) using (size := 100000) (seed := 0) (average of 3 runs on each test)
0s 4ms 4222885ns mostlyAscending
0s 5ms 5022282ns randomWithDuplicates
0s 6ms 6712451ns random
0s 2ms 2500876ns ascending
0s 2ms 2658119ns descending
0s 3ms 3138381ns ascendingWithRandomTail
Testing custom Array.mergeSort (unsafe) using (size := 100000) (seed := 0) (average of 3 runs on each test)
0s 4ms 4180480ns mostlyAscending
0s 5ms 5151306ns randomWithDuplicates
0s 6ms 6795108ns random
0s 2ms 2946593ns ascending
0s 2ms 2379960ns descending
0s 3ms 3305699ns ascendingWithRandomTail
Testing List.mergeSort using (size := 100000) (seed := 0) (average of 3 runs on each test)
0s 11ms 11271286ns mostlyAscending
0s 18ms 18722682ns randomWithDuplicates
0s 20ms 20342137ns random
0s 9ms 9260695ns ascending
0s 8ms 8935158ns descending
0s 9ms 9365413ns ascendingWithRandomTail
Testing Array.qsortOrd using (size := 100000) (seed := 0) (average of 3 runs on each test)
0s 266ms 266871536ns mostlyAscending
1s 1595ms 1595832164ns randomWithDuplicates
0s 18ms 18541981ns random
0s 12ms 12463519ns ascending
0s 18ms 18904547ns descending
0s 13ms 13027594ns ascendingWithRandomTail
Last updated: Dec 20 2025 at 21:32 UTC