Zulip Chat Archive

Stream: lean4

Topic: functional but in place paradigm


Huỳnh Trần Khanh (Jan 22 2021 at 13:59):

Does Lean 4 implement some sort of "functional but in-place" algorithm? That is, the data structures are immutable to the user but are mutated internally. Apologies if the terminology is not really correct.

Huỳnh Trần Khanh (Jan 22 2021 at 14:01):

Because, from what I've gathered, purely functional programs can experience a slowdown due to the overhead of persistent data structures.

Huỳnh Trần Khanh (Jan 22 2021 at 14:02):

thanks in advance

Marc Huisinga (Jan 22 2021 at 14:03):

Huỳnh Trần Khanh said:

Does Lean 4 implement some sort of "functional but in-place" algorithm? That is, the data structures are immutable to the user but are mutated internally. Apologies if the terminology is not really correct.

From what I can gather, the terminology "functional but in-place" comes from a paper built right on top of https://arxiv.org/pdf/1908.05647.pdf, so yes.
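Concretely, the compiler's reference counting reuses a value's memory when its reference count is 1, so something like this (a sketch of my own, not taken from the paper) can update the array in place even though it is written functionally:

```lean
-- Purely functional "increment every element" over an Array.
-- No mutation is visible to the caller, but when `xs` is the only
-- live reference at runtime, the runtime reuses its buffer instead
-- of copying it.
def incAll (xs : Array Nat) : Array Nat := Id.run do
  let mut ys := xs
  for i in [0:ys.size] do
    ys := ys.set! i (ys[i]! + 1)
  return ys

#eval incAll #[1, 2, 3] -- #[2, 3, 4]
```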

Huỳnh Trần Khanh (Jan 22 2021 at 14:41):

What does the :seedling: emoji mean in this context :joy:

Huỳnh Trần Khanh (Jan 22 2021 at 14:42):

Immutable beans?

Johan Commelin (Jan 22 2021 at 14:42):

I guess that X planted a seed for Y?

Jesse Michael Han (Jan 22 2021 at 15:50):

Huỳnh Trần Khanh said:

What does the :seedling: emoji mean in this context :joy:

the immutable beans sprouted :sprout:

Julian Berman (Jan 22 2021 at 17:46):

Huỳnh Trần Khanh said:

Because, from what I've gathered, purely functional programs can experience a slowdown due to the overhead of persistent data structures.

I don't know the answer for Lean 4, but my understanding of modern data structures is that persistent data structures largely "fixed" this performance problem in practice. I.e. naive immutable data structures are indeed very expensive to copy, but persistent ones give you bounds that look mostly like O(log verygoodthing) for all the operations, and in practice languages like Clojure, where the data structures are persistent, do just fine.
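For instance, every functional list already does the cheapest version of this via structure sharing, and HAMT-based maps generalize the same trick to wide trees:

```lean
-- Persistence via structure sharing: "updating" a list head allocates
-- one new cell and shares the rest, instead of copying the whole list.
def l1 : List Nat := [1, 2, 3]
def l2 : List Nat := 0 :: l1       -- O(1): shares all of l1
def l3 : List Nat := 9 :: l1.tail  -- "replace the head": shares l1's tail

#eval l2 -- [0, 1, 2, 3]
#eval l3 -- [9, 2, 3]
```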

Julian Berman (Jan 22 2021 at 17:47):

(In Python I'm a big proponent of using persistent data structures over the built-in ones, and anecdotally that doesn't affect performance benchmarks for most real applications I've seen.)

Julian Berman (Jan 22 2021 at 17:50):

Yeah, looks like Lean 4 has PersistentArray, PersistentHashSet, ...

Julian Berman (Jan 22 2021 at 17:50):

They're implemented in Lean itself too, interesting...
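E.g. something like this should work, assuming the in-core Lean.PersistentArray API (push/set/toArray):

```lean
import Lean.Data.PersistentArray
open Lean

-- Updates return new versions but share unchanged subtrees, so old
-- versions stay usable at O(log n)-ish cost per operation.
def a1 : PersistentArray Nat := PersistentArray.empty.push 1 |>.push 2
def a2 : PersistentArray Nat := a1.set 0 10 -- a1 itself is unchanged

#eval a1.toArray -- #[1, 2]
#eval a2.toArray -- #[10, 2]
```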

Huỳnh Trần Khanh (Jan 23 2021 at 01:47):

@Julian Berman Well, modern persistent data structures are fast when you need persistence!

in practice languages like Clojure, where the data structures are persistent, do just fine

Yeah. Most applications aren't CPU-bound; they are usually IO-bound. And persistent data structures incur a very small, I'd say negligible, overhead for those applications. Defensive copying is the root of most performance problems for these types of apps.

that look mostly like O(log verygoodthing)

Yeah, for HAMTs it's about 6 steps for every lookup/mutate operation even when there are around a billion (2^30) elements.
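Concretely, with the usual 32-way branching each trie level consumes 5 bits of the hash, so the depth grows like:

```latex
\text{depth} = \left\lceil \frac{\log_2 n}{5} \right\rceil,
\qquad n = 2^{30} \Rightarrow \text{depth} = 6,
\qquad n = 2^{64} \Rightarrow \text{depth} = 13
```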

As a (retired) competitive programmer, I can implement a persistent binary tree in 7 minutes, because I'm trained to do that. The easiest way to implement persistence is path copying with everything refcounted, and I'd imagine it'd be even easier in a functional programming language. There are more crafty approaches: I read somewhere on the internet that an array can be made persistent in a way that operations cost O(log log n) time. That's pretty close to constant time.
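Path copying looks something like this in Lean (a sketch): insert copies only the O(depth) nodes along the search path and shares everything else between the old and new trees:

```lean
-- A minimal persistent BST via path copying.
inductive Tree where
  | leaf
  | node (l : Tree) (v : Nat) (r : Tree)

def Tree.insert : Tree → Nat → Tree
  | .leaf, x => .node .leaf x .leaf
  | .node l v r, x =>
    if x < v then .node (l.insert x) v r      -- copy this node, share r
    else if v < x then .node l v (r.insert x) -- copy this node, share l
    else .node l v r                          -- already present
```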

But asymptotic time complexity is not everything. After all, you need a computer to execute an algorithm; an algorithm can't run itself. Persistence means a lot of pointer chasing, cache line fills and whatnot, and when it comes to compute-heavy programs the slowdown can be fatal. Programming competitions usually limit program runtime to something like 1 second, or even one fifth of a second. Performance is critical; every nanosecond counts. Abusing 64-bit integers can lead to TLE (time limit exceeded). If you make too many syscalls, you can get TLE too, because the sandbox checks every syscall to make sure it won't do something wonky to the judging servers. And using STL binary trees (std::map and std::set) can inflate program runtime tenfold :joy: or even two orders of magnitude if you happen to be unlucky. Avoiding cache line fills is really important.

There are circumstances where performance is really critical and mandatory persistence is pretty much inappropriate.

Anyway, it's nice to hear that Lean 4 does optimize copying into in-place mutation.


Last updated: May 07 2021 at 12:15 UTC