Zulip Chat Archive
Stream: general
Topic: Lean consumes too much memory for theorems with many branches
pandaman (Jan 18 2025 at 13:00):
I have a theorem that performs a functional induction for a function with 9 branches (code). The intention is to fine-tune the auto-generated .induct theorem to give a meaningful name and more useful types to each case. However, Lean consumes a huge amount of memory (>20GB) and time when editing this theorem, and the Lean process crashes when I try to add two more branches to the theorem to support new features. I also found that applications of the theorem are slow, and profiling didn't reveal much info (previous thread).
Is there any way to reduce the resources used by Lean when processing the theorem? Thank you!
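For reference, a minimal sketch of the kind of setup I mean, using a hypothetical toy function in place of the real εClosure' (the real one has 9, soon 11, branches), and assuming a recent Lean toolchain that derives .induct principles on demand:

```lean
-- Hypothetical toy function standing in for εClosure'.
def countDown : Nat → Nat
  | 0 => 0
  | n + 1 => countDown n

-- Lean derives a functional induction principle on demand.
#check @countDown.induct

-- A hand-tuned variant, in the spirit of εClosure'.induct':
-- named cases with more useful hypotheses for downstream proofs.
theorem countDown.induct' (motive : Nat → Prop)
    (base : motive 0)
    (step : ∀ n, motive n → motive (n + 1)) :
    ∀ n, motive n := by
  intro n
  induction n with
  | zero => exact base
  | succ n ih => exact step n ih
```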
pandaman (Jan 18 2025 at 13:02):
Reproduction steps:
- Clone lean-regex.
- Open correctness/RegexCorrectness/VM/EpsilonClosure/Basic.lean and try editing εClosure'.induct'.
- Alternatively, you can open correctness/RegexCorrectness/VM/EpsilonClosure/Lemmas.lean to see the slowness when applying this theorem.
Kim Morrison (Jan 18 2025 at 23:37):
Just pinging @Joachim Breitner as the relevant expert.
pandaman (Jan 19 2025 at 03:54):
I also found that applications of the theorem are slow, and profiling didn't reveal much info
As for the applications of εClosure'.induct', it turned out that the slowness comes from a different issue (see the linked thread), and it's resolved now.
The definition of εClosure'.induct' itself still has the resource usage issue, so I'd welcome any help on this. Thanks!
Joachim Breitner (Jan 19 2025 at 18:58):
@pandaman, can you tell whether the generation of the .induct theorem is where Lean consumes memory, or your theorem? (Just write def foo := @bar.induct before it to trigger the generation of the theorem, to make sure it happens separately from your theorem.)
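For illustration, a minimal sketch of this pattern with a hypothetical toy function (walk is a placeholder, not the real εClosure'):

```lean
-- Hypothetical toy function standing in for the real one.
def walk : Nat → Nat
  | 0 => 0
  | n + 1 => walk n

-- Referencing `walk.induct` forces Lean to derive the functional
-- induction principle in this cheap declaration. If this line is fast,
-- the cost lies in the theorem that *uses* the principle, not in its
-- generation.
def foo := @walk.induct
```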
pandaman (Jan 20 2025 at 09:05):
@Joachim Breitner Hi! def foo := @εClosure'.induct is instant. So Lean is consuming memory when using it to prove my theorem.
Joachim Breitner (Jan 20 2025 at 09:09):
Ok, thanks. I don't see anything obviously wrong with your theorem (no pattern-matching in let or the like, which could be expensive to elaborate in a large context). It might be worth reporting as a Lean bug if you can distill it into a #mwe.
pandaman (Jan 20 2025 at 12:27):
Ok, I was able to minimize this a bit. It seems that the Lean language server is leaking memory when processing the theorem definition of εClosure'.induct'. Every time I add or remove def foo := 0 before εClosure'.induct', the Lean server process consumes an additional 1-2GB of RAM when re-processing εClosure'.induct', but it never frees memory until it crashes (or WSL crashes).
This happened even when I replaced the theorem body with sorry, so functional induction is not related. Sorry for the false alarm!
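Roughly, the shape of the setup (a hypothetical, heavily simplified stand-in for the real file; the actual statement has many more cases and a much larger context):

```lean
-- Adding or removing this line makes the server re-elaborate everything
-- below it, which is what triggered the repeated memory growth.
def foo := 0

-- Hypothetical placeholder for εClosure'.induct'. Even with the proof
-- replaced by `sorry`, elaborating the statement alone reproduced the
-- memory growth, so the functional-induction proof itself is not the cause.
theorem toyInduct' (motive : Nat → Prop)
    (base : motive 0)
    (step : ∀ n, motive n → motive (n + 1)) :
    ∀ n, motive n := by
  sorry
```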
pandaman (Jan 20 2025 at 12:29):
Probably the Lean server reprocessed εClosure'.induct' frequently while I was editing it, so it quickly exhausted all the RAM in my WSL environment.
Sebastian Ullrich (Jan 20 2025 at 12:46):
How many CPU cores do you have? Could you try running VSCode under a lower LEAN_NUM_THREADS?
pandaman (Jan 20 2025 at 12:53):
My machine has 6 cores, but a single lean process consumes 100% of CPU and several GB of RAM, so I'm not sure if multi-core is contributing to the issue. I can try it anyway.
pandaman (Jan 20 2025 at 13:08):
LEAN_NUM_THREADS=1 didn't change the behavior.
pandaman (Jan 20 2025 at 13:24):
Oh, at least with LEAN_NUM_THREADS=1, the memory usage of the most memory-hungry Lean process is capped at 91% of RAM (RES: 14.2g), so my WSL doesn't crash even after I repeatedly add and remove def foo := 0.
pandaman (Jan 20 2025 at 13:25):
Maybe there is a cache that doesn't evict entries early enough?
Sebastian Ullrich (Jan 20 2025 at 18:46):
If there is a cap using that setting, it looks like that is the memory high-water mark of elaborating the file; you have simply found a file with an unusually high high-water mark. And then when you allow more threads, the high-water mark is replicated across the thread-local heaps, which are not currently deallocated until the end of the process. It would be great to track that in an issue!
Sebastian Ullrich (Jan 20 2025 at 18:46):
Does running it on the cmdline take similar amounts of memory?
pandaman (Jan 21 2025 at 12:52):
When running lake build, the memory usage peaks at around 2GB when processing the file with εClosure'.induct', so it's less than when I edit the file in VSCode.
Minimizing this further would be tricky, since commenting out just one or two arguments of εClosure'.induct' reduces the additional memory usage significantly.
pandaman (Jan 23 2025 at 14:06):
Filed https://github.com/leanprover/lean4/issues/6753. I couldn't minimize much, but at least I made the reproduction self-contained. Thanks!
Joachim Breitner (Jan 23 2025 at 14:21):
Thanks!
Last updated: May 02 2025 at 03:31 UTC