Zulip Chat Archive

Stream: std4

Topic: Build lists top-down

James Gallicchio (Oct 06 2022 at 06:12):

Hey, been experimenting with a new toy: lists you construct from the top down :)

It's a relatively simple FFI external object, just stores the lean_object* representing the head of an expression, and a pointer to a "hole" that needs to be filled to complete the expression. So we can write stuff like

#eval Hole.hole ()
      |>.cons 5
      |>.cons 7
      |>.cons 10
      |>.fill []

This snippet generates 0 closures, and the last fill [] call is O(1) rather than O(n) :D

Should be useful for writing tail-rec functions that return lists, to avoid the extra reverse at the end where possible. I'll try to write something like append and see if there's a measurable difference compared to the standard implementation.

Lean: https://github.com/JamesGallicchio/LeanColls/blob/main/LeanColls/Hole.lean
C: https://github.com/JamesGallicchio/LeanColls/blob/main/bindings/leancolls_hole.c

James Gallicchio (Oct 06 2022 at 06:13):

(Sharing this here because we were discussing a couple days ago whether List function impls might be faster if they use arrays instead of constructing a reversed list and reversing it at the end -- this avoids any extra cost and I think should just be faster than either solution)

James Gallicchio (Oct 06 2022 at 17:51):

The performance is much worse than doing two traversals -- I'm thinking something is wrong with how my FFI code is building cons nodes. It seems to be spending a ton of time on calls to lean_* functions that I presume I am using incorrectly:
image.png

Does anyone know better than I do how these functions are intended to be used?

James Gallicchio (Oct 06 2022 at 17:52):

In particular I'm very unsure what all the lean_align stuff is doing

James Gallicchio (Oct 06 2022 at 17:56):

(alternatively, the overhead of external functions and more cache misses might just be very high here, but I want to think that's not the case)

Sebastian Ullrich (Oct 06 2022 at 18:08):

These stack frames shouldn't even exist I think unless you compiled your code without optimizations

James Gallicchio (Oct 06 2022 at 18:14):

You know, I'm not sure why I assumed lake would add the -O3 flag automatically... The stackframes are all gone, and now the hole version is measurably faster than the revappend version :D

James Gallicchio (Oct 06 2022 at 18:18):

Looks like spends ~36% less cycles if you do not include the reference counting, ~29% less cycles if you do include reference counting. If the input list is unshared, the revappend version might be faster because of the savings on reference counting; let me test that

Tomas Skrivan (Oct 06 2022 at 19:34):

How do you get this nice profiling plot?

James Gallicchio (Oct 06 2022 at 20:33):

Hotspot! It's a basic frontend for perf, Leo recommended it and it's been a lifesaver

James Gallicchio (Oct 07 2022 at 02:34):

Screenshot_20221006_222726.png more pretty plots :) I'm not entirely sure how to interpret this one, in particular why the leancolls version is hitting so many more lean_alloc_small_cold calls than the lean version

James Gallicchio (Oct 07 2022 at 02:34):

but that is the subject of further investigation!

Last updated: May 02 2025 at 03:31 UTC