Zulip Chat Archive
Stream: std4
Topic: Build lists top-down
James Gallicchio (Oct 06 2022 at 06:12):
Hey, been experimenting with a new toy: lists you construct from the top down :)
It's a relatively simple FFI external object, just stores the lean_object*
representing the head of an expression, and a pointer to a "hole" that needs to be filled to complete the expression. So we can write stuff like
#eval Hole.hole ()
|>.cons 5
|>.cons 7
|>.cons 10
|>.fill []
This snippet generates 0 closures, and the last fill []
call is O(1) rather than O(n) :D
Should be useful for writing tail-rec functions that return lists, to avoid the extra reverse
at the end where possible. I'll try to write something like append
and see if there's a measurable difference compared to the standard implementation.
Lean: https://github.com/JamesGallicchio/LeanColls/blob/main/LeanColls/Hole.lean
C: https://github.com/JamesGallicchio/LeanColls/blob/main/bindings/leancolls_hole.c
James Gallicchio (Oct 06 2022 at 06:13):
(Sharing this here because we were discussing a couple days ago whether List function impls might be faster if they use arrays instead of constructing a reversed list and reversing it at the end -- this avoids any extra cost and I think should just be faster than either solution)
James Gallicchio (Oct 06 2022 at 17:51):
The performance is much worse than doing two traversals -- I'm thinking something is wrong with how my FFI code is building cons nodes. It seems to be spending a ton of time on calls to lean_* functions that I presume I am using incorrectly:
image.png
Does anyone know better than I do how these functions are intended to be used?
James Gallicchio (Oct 06 2022 at 17:52):
In particular I'm very unsure what all the lean_align
stuff is doing
James Gallicchio (Oct 06 2022 at 17:56):
(alternatively, the overhead of external functions and more cache misses might just be very high here, but I want to think that's not the case)
Sebastian Ullrich (Oct 06 2022 at 18:08):
These stack frames shouldn't even exist I think unless you compiled your code without optimizations
James Gallicchio (Oct 06 2022 at 18:14):
You know, I'm not sure why I assumed lake would add the -O3
flag automatically... The stackframes are all gone, and now the hole version is measurably faster than the revappend version :D
James Gallicchio (Oct 06 2022 at 18:18):
Looks like spends ~36% less cycles if you do not include the reference counting, ~29% less cycles if you do include reference counting. If the input list is unshared, the revappend version might be faster because of the savings on reference counting; let me test that
Tomas Skrivan (Oct 06 2022 at 19:34):
How do you get this nice profiling plot?
James Gallicchio (Oct 06 2022 at 20:33):
Hotspot! It's a basic frontend for perf, Leo recommended it and it's been a lifesaver
James Gallicchio (Oct 07 2022 at 02:34):
Screenshot_20221006_222726.png more pretty plots :) I'm not entirely sure how to interpret this one, in particular why the leancolls version is hitting so many more lean_alloc_small_cold
calls than the lean version
James Gallicchio (Oct 07 2022 at 02:34):
but that is the subject of further investigation!
Last updated: Dec 20 2023 at 11:08 UTC