Zulip Chat Archive

Stream: lean4

Topic: mmap


Alexander Bentkamp (Aug 09 2023 at 14:23):

The Lean 4 web editor and the natural number game use quite a bit of memory on our server, and I would like to figure out how to reduce that so that we can handle more users at once. Currently, we are using a docker container with gvisor to isolate each user into a sandbox. As a consequence we cannot make use of Lean's mmap memory sharing. Using other tools such as bubblewrap we should be able to resolve this.

However, I just tried to run the game without using any security measures, and the processes still do not seem to share memory. Is there a way to see whether memory is shared other than just observing how much memory is used in total? Maybe we accidentally destroyed the memory sharing mechanism by modifying the Lean server for our game? How is this memory sharing implemented in the Lean server?

Mario Carneiro (Aug 09 2023 at 16:39):

The memory sharing is not implemented in lean at all. The way it works is that multiple independent processes request to mmap the same file, and the OS notices this and makes them share the same backing memory

Alexander Bentkamp (Aug 10 2023 at 08:57):

Only the oleans are shared, right? So the Environment will still take a big chunk of space for each process, won't it?

Scott Morrison (Aug 10 2023 at 10:04):

It's a good question how much is actually in an Environment that is not just pointers in data in memory mapped oleans. I don't have a good guess, but would like to know!

Sebastian Ullrich (Aug 10 2023 at 10:12):

The original mmap PR has some numbers I referenced in my thesis (§3.3):

Indeed, when activating this extension we have seen great savings on mathport , a version of Lean 3 mathlib binary-translated to the Lean 4 module format: the time for importing these, at the time, 1871 modules fell from 2.3 to 1.5 seconds [Lean still gathers information such as a map of all declaration from the imports, so it cannot be 0; Leo later optimized the import time further independently of memory mapping.] , and more importantly the total heap size of all allocations fell from 1.2GB to 55MB.

Sebastian Ullrich (Aug 10 2023 at 10:13):

It is probably more for current mathlib because of additional initialization of environment extensions

Notification Bot (Aug 10 2023 at 10:28):

A message was moved from this topic to #lean4 > Term to tactic error by Buster.

Alexander Bentkamp (Aug 10 2023 at 15:55):

and more importantly the total heap size of all allocations fell from 1.2GB to 55MB.

@Sebastian Ullrich What's a good way to measure things like this? I am still not sure if the mmap works and it's other things that eat up space or if mmap is broken on our server for some reason.

Sebastian Ullrich (Aug 10 2023 at 16:00):

I think I used valgrind's massif

Sebastian Ullrich (Aug 10 2023 at 16:00):

But I'm always disappointed that time -v doesn't do a better job there

Sebastian Ullrich (Aug 10 2023 at 16:12):

I think it was /proc/$PID/status that does do a better job - RssFile vs RssAnon

Sebastian Ullrich (Aug 11 2023 at 11:59):

@Alexander Bentkamp I remembered it's actually much easier: use lean --stats

Alexander Bentkamp (Aug 11 2023 at 12:01):

Oh, that looks much easier indeed. Still, I can only see that the oleans have been loaded into some mmap'ed part of memory, but I can't be sure that they are actually shared between processes, can I?

Sebastian Ullrich (Aug 11 2023 at 12:02):

I don't see why that would not happen as long as it's indeed the same file

Alexander Bentkamp (Aug 11 2023 at 12:25):

Okay, thanks!


Last updated: Dec 20 2023 at 11:08 UTC