Zulip Chat Archive

Stream: lean4

Topic: Performance of terms after erasure


James Gallicchio (Jul 10 2022 at 17:13):

I'm trying to do something a bit weird and implement a List.fold' (that exposes proof of membership) in terms of just fold; I am using some trickery to push off the proof obligations until outside the fold (where they can actually be dispatched) but it tanks the performance of the code

structure Partial (α : Type u) where
  dom : Prop
  get : dom  α

def fold' (l : List τ) (f : β  (x : τ)  x  l  β) (acc : β) : β :=
  let temp := List.foldl
    (fun dom,acc x => x  l  dom, λ h_x, h => f (acc h) x h_x⟩)
    (⟨True, λ _ => acc : Partial β)
    l
  temp.get (by
    have : temp.dom = List.foldl (fun acc x => x  l  acc) True l := by
      sorry
    rw [this]
    sorry
    )

James Gallicchio (Jul 10 2022 at 17:15):

I was hoping that the Partial gets erased to just () → α, and that the fold closure fun ⟨dom,acc⟩ x => ⟨x ∈ l ∧ dom, λ ⟨h_x, h⟩ => f (acc h) x h_x⟩ would be simplified to fun acc x () => f (acc ()) x, does that all seem accurate? not entirely sure how to decifer the generated C code

James Gallicchio (Jul 10 2022 at 17:18):

Alternatively, does anyone have a better idea how to implement fold' in terms of foldl that at runtime just makes a single call to foldl?

Yakov Pechersky (Jul 10 2022 at 17:42):

Take a look at how lean3 does it for docs#list.pmap and docs#list.attach

Wojciech Nawrocki (Jul 10 2022 at 17:53):

Not really an answer, but you might find this thread relevant.

James Gallicchio (Jul 10 2022 at 18:07):

ah, that is relevant. hm

James Gallicchio (Jul 10 2022 at 18:30):

So terms like fun x h => f x (by blah) are not simplified to f

James Gallicchio (Jul 10 2022 at 18:31):

I will play around with this a bit more, thanks for the links :)

Wojciech Nawrocki (Jul 10 2022 at 19:21):

Are h and (by blah) supposed to prove the same proposition, or two definitionally equal ones? If so, then I guess you could simplify it to f by proof irrelevance. But if not, it wouldn't be type-correct. (Or you mean in the compiler? It does erase proofs but not always, I think, so maybe this is one of the non-erased cases.)

Sebastian Ullrich (Jul 10 2022 at 19:30):

Proofs should always be erased

Sebastian Ullrich (Jul 10 2022 at 19:43):

Half of the performance difference is due to a missing @[specialize]. After that, the issue is probably that () → α is still a closure that is composed and then invoked only after the fold.

James Gallicchio (Jul 16 2022 at 19:24):

Ah, thank you for the tip wrt specialize. But the cost of the closures was what I expected to be quite high -- I still can't figure out a way to avoid building up the list (or an equivalent stack of closures) while defining fold'. Duplicating the with-proof and without proof versions is annoying but a fine workaround

Kyle Miller (Jul 16 2022 at 19:35):

If I understand your test correctly, you're mostly comparing the compiled performance of List.fold against the VM performance of your List.fold'. Maybe try compiling your version and comparing against that? (For example, by having an actual main function with your test and building it.)

James Gallicchio (Jul 16 2022 at 19:36):

Wojciech Nawrocki said:

Are h and (by blah) supposed to prove the same proposition, or two definitionally equal ones? If so, then I guess you could simplify it to f by proof irrelevance. But if not, it wouldn't be type-correct. (Or you mean in the compiler? It does erase proofs but not always, I think, so maybe this is one of the non-erased cases.)

No, I mean in the non-type-correct sense. Which is why it'd have to be erased & simplified after typechecking. And of course it's not always safe to do that simplification, right, since fun (h : False) => False.elim h can't be simplified toFalse.elim since there is no actual code to generate for False.elim.

So I'm not really sure what I'd even envision doing to get this function to be equivalent to a single fold (aside from the optimization mentioned in the other thread w.r.t. identifying functions that are identity after proof erasure). Definitely an engineering issue not a Lean compiler issue.

James Gallicchio (Jul 16 2022 at 19:40):

I wonder if there's some way to "seal" some unsafe post-erasure simplifications in a surrounding closure to ensure that proof obligations were met before code gets executed. Like, I think it is safe to simplify

fun () => (fun (h : P) => x)

to

fun () => x

when erasing proof terms, since the exterior function's body will only be executed if its application typechecks.

James Gallicchio (Jul 16 2022 at 19:42):

Kyle Miller said:

If I understand your test correctly, you're mostly comparing the compiled performance of List.fold against the VM performance of your List.fold'. Maybe try compiling your version and comparing against that? (For example, by having an actual main function with your test and building it.)

sorry, missed your message. isn't this comparing the VM performance of both? the test is running both in an #eval, but does the List.foldl call get exported to the compiled function?

Wojciech Nawrocki (Jul 16 2022 at 20:56):

James Gallicchio said:

Kyle Miller said:

If I understand your test correctly, you're mostly comparing the compiled performance of List.fold against the VM performance of your List.fold'. Maybe try compiling your version and comparing against that? (For example, by having an actual main function with your test and building it.)

sorry, missed your message. isn't this comparing the VM performance of both? the test is running both in an #eval, but does the List.foldl call get exported to the compiled function?

Oh yes, this is important. List.foldl is part of stdlib so it gets compiled into liblean and will be executed in compiled form, whereas your fold' is interpreted. You could try writing the benchmark as a def main : IO Unit and running the compiled executable.

Wojciech Nawrocki (Jul 16 2022 at 20:58):

(Note: the calls into List.foldl from fold' will also be to the compiled code.) Nvm, because of specialization it seems like it might all be interpreted. You can introspect the code with set_option trace.compiler.ir true in def fold' ...

Wojciech Nawrocki (Jul 16 2022 at 21:04):

Oh, and it looks like the List.foldl in the timeit is also specialized, thus interpreted. So the comparison might be fair after all? Sorry about the confusion!

James Gallicchio (Jul 17 2022 at 01:05):

Wait, specialization --> #eval interprets instead of calling out to compiled code? wacky. I'll try out making it main in a sec

James Gallicchio (Jul 17 2022 at 01:14):

Okay, with the fold specialized and running it fully compiled, the timings are

fold 2.02ms
fold' 20.6ms

which, yeah, about as expected

James Gallicchio (Jul 17 2022 at 01:44):

Okay, implementing it directly gives exactly the same performance as regular fold

@[specialize]
def fold' (l : List τ) (f : β  (x : τ)  x  l  β) (acc : β) : β :=
  let rec @[specialize] go (rest : List τ) (acc : β) (h :  x, x  rest  x  l) : β :=
    match rest with
    | [] => acc
    | x::xs => go xs (f acc x (h x (List.Mem.head _ _)))
      (by intros x hxs; exact h _ (List.Mem.tail _ hxs))
  go l acc (by intros; trivial)

(And the generated IR is identical to foldl, albeit with a couple extra unused parameters that aren't being erased)

Mario Carneiro (Jul 17 2022 at 04:13):

James Gallicchio said:

Wait, specialization --> #eval interprets instead of calling out to compiled code? wacky. I'll try out making it main in a sec

cc: @Leonardo de Moura ^ this is an interesting interaction of features

Gabriel Ebner (Jul 17 2022 at 06:15):

And it's unfortunately a well-known issue: https://leanprover.zulipchat.com/#narrow/stream/270676-lean4/topic/include.20lean.2Eh/near/264881145

Sebastian Ullrich (Jul 17 2022 at 08:11):

I don't see what we could do about it. Specialization generates new code, yes. Fortunately, it is now easier to run compiled code than ever.

Sebastian Ullrich (Jul 17 2022 at 08:12):

That is, you could set precompileModules in your lakefile and then move the #evals to a different file. No need to leave the editor anymore.

Mac (Jul 17 2022 at 08:43):

Sebastian Ullrich said:

That is, you could set precompileModules in your lakefile and then move the #evals to a different file. No need to leave the editor anymore.

Note that this still won't work if it is the #eval code that is doing the specializing.

Mario Carneiro (Jul 17 2022 at 13:47):

Yeah, the problem here is specifically the fact that specialization only uses lemmas that have been generated in the current file, which means that they will move along with the #eval


Last updated: Dec 20 2023 at 11:08 UTC