Zulip Chat Archive

Stream: Lean Together 2025

Topic: Joseph Rotella: Programming with Dependently Typed Tables

Jireh Loreaux (Jan 15 2025 at 14:56):

Discussion thread for this talk.

Jireh Loreaux (Jan 15 2025 at 15:01):

My takeaway from this talk: Joseph Rotella and Rob Lewis went into dependent type theory hell and came out the other side alive. Very interesting!

Rob Lewis (Jan 15 2025 at 15:39):

The battle with the DTT demons was all Joseph's :grinning:

Rob Lewis (Jan 15 2025 at 15:43):

Joseph's thesis describes the fight in more detail!

Pietro Monticone (Jan 16 2025 at 09:41):

:video_camera: Video recording: https://youtu.be/pao0UQKoKO0

Yuri (Jan 16 2025 at 18:11):

Thank you for the talk! It’s great to see dependently typed languages—and Lean in particular—applied to computer science concepts!

I’m curious about the possibility of decoupling the table body’s representation from the API, type-level definitions, and proofs. Specifically, I wonder if it would be feasible to store data using column-oriented tables as an alternative to the row-oriented approach presented here. Additionally, it would be interesting to explore support for various “backends” to enable more efficient storage of larger datasets.

Joseph Rotella (Jan 16 2025 at 20:56):

Specifically, I wonder if it would be feasible to store data using column-oriented tables as an alternative to the row-oriented approach presented here.

This is an interesting idea and one that I did consider: while you can do it, it comes with a bit more overhead than the row-wise approach. In particular, you have to disallow "ragged" tables at the type level by recording the table height—this ensures that all columns have the same number of rows. Because of this, you're effectively locked into tracking both dimensions of the table at the type level. With the row-wise approach, on the other hand, you're free not to enforce table height as a type-level invariant, but you're also free to add it (just switch out the list of rows in the table body for a vector of rows). But the column-wise representation doesn't give you the symmetric freedom not to track table width, because you're forced into implicitly doing that anyway by having to record the type of each column.

Additionally, it would be interesting to explore support for various “backends” to enable more efficient storage of larger datasets.

We didn't focus much on performance for this project, so there's a fair bit of low-hanging fruit even in Lean itself for optimizing performance, but interfacing with native code (if I'm interpreting your suggestion correctly) would also be an interesting avenue to explore!

Last updated: May 02 2025 at 03:31 UTC