Zulip Chat Archive

Stream: lean4

Topic: weird memory leaks from hell


JJ (Aug 24 2024 at 07:48):

i am encountering a very strange issue.

i was working on a little toy project in lean when my computer ground to a halt. i shrugged, force rebooted my computer, and continued on, assuming i accidentally had too much stuff open (my oom killer is not very good). this began to happen again but i was able to close vs code before it ground to a halt. so this was a lean issue.

the third time this happened, i started investigating. lake build was hanging on clearly invalid code and eating up memory, quickly, as it did so. i eventually whittled my code down to the following minimal reproducible example:

def next? : List α  Option (α × List α)
  | [] => none
  | a :: l => some (a, l)

structure Header where
  (width height : UInt32)
  (alpha linear : Bool)

def decode_header (data : List UInt8) : Option Header := do
  let (a, data)  next? data
  let (b, data)  next? data
  let (c, data)  next? data
  let (d, data)  next? data
  let (a, data)  next? data
  let (b, data)  next? data
  let (c, data)  next? data
  let (d, data)  next? data
  let width : UInt32 := 0
  let (a, data)  next? data
  let (b, data)  next? data     -- *all* of these are necessary for some reason
  let (c, data)  next? data
  let (d, data)  next? data
  let height : UInt32 := 0
  let (a, data)  next? data
  let (b, data)  next? data
  let alpha :=
    match a with
      | 3 => true
      | 4 => false
      | _ => return none     -- underlying problem was probably here
  let linear :=
    match b with
      | 0 => false
      | 1 => true
      | _ => return none
  some { width, height, alpha, linear }

yes, minimal reproducible example. making any changes to this code - removing parameters on the structure, touching any of the assignments - produced code that would either quickly error out and not hang lake, or still hang lake but not exhibit the rapid memory consumption that was crashing my computer. and this was reproducible. removing .lake and rebuilding from source triggered the same behavior, every time.

now that i had a (weird) reproducible example, i was going to file a report, but noticed lake wasn't actually downloading or rebuilding lean. i'm not terribly familiar with the ins and outs of the lean build system: i had in my lean-toolchain file leanprover/lean4:v4.11.0-rc2, but noticed it seemed to happen with whatever i changed it to, which was strange. i remembered elan and wondered if i had updated the toolchains via there recently. i ran elan update, it updated my nightly, and: no issue. lake build successfully failed.

so i was very confused by all of this and resolved to go to sleep and give up on the whole affair of tracking down exactly what elan update updated that made this all work again (my guess was lake). but, i went back to my original project: deleting the .lake folder, successfully getting build failures. then i opened it in vs code again.

my memory usage was fine. i changed the problematic line (above) from return none to failure, in hopes lean would treat my a variable as having a consistent type of UInt8.

my memory usage started ballooning.

so, right now, i'm a little bit stuck. there was previously some horrifyingly specific memory-leak-causing performance pitfall in either lean or lake. updating my toolchain with elan update fixed it. it's still around, but only in my tooling, and only when i change return none to failure or vice versa. (it does not occur upon first opening code and initial typechecking).

i do not know how to solve this. does vscode-lean4 manage its own lean toolchain, independent of elan? was this a known bug and actually fixed, or should i figure out exactly what version of all tooling my vscode install is using so as to get an actually-reproducible environment for figuring this out, in case unrelated upstream changes only changed the specific trigger of this extremely-pathological performance case? any advice (particularly about vscode-lean4's tooling management) would be appreciated.

also - i'm having a lot of fun with lean :smile:. it's a beautiful language and the tooling is awesome. this issue i do actually need to fix though, because it's preventing me from writing lean (by crashing my computer). i figured i'd write up the full path to this strangeness for clarity and for others' amusement.

JJ (Aug 24 2024 at 07:54):

(deleted)

Daniel Weber (Aug 24 2024 at 07:55):

What's your OS? What's the output of lake --version?

JJ (Aug 24 2024 at 08:00):

oh, right, i put this in the issue i was going to file but forgot to put it here.

  • arch linux 6.6.46
  • elan 3.1.1
  • vscode 1.92.1
  • vscode-lean4 v0.0.176

my current toolchains installed with elan are these, but these are probably not helpful because it was after i ran elan update. same for lake --version (which is 5.0.0-c375e19)

[~]$ elan show
installed toolchains
--------------------

leanprover/lean4:nightly
leanprover/lean4:stable (default)
leanprover/lean4:v4.10.0
leanprover/lean4:v4.11.0-rc1
leanprover/lean4:v4.11.0-rc2
leanprover/lean4:v4.9.0-rc1

active toolchain
----------------

leanprover/lean4:stable (default)
Lean (version 4.10.0, x86_64-unknown-linux-gnu, commit c375e19f6b65, Release)

JJ (Aug 24 2024 at 08:07):

(also i probably should not have filed this right now - it's very late where i am and i do need to sleep. i will try any suggestions anyone has and provide more info if needed in the morning...)

Eric Wieser (Aug 24 2024 at 09:40):

Does the memory leak go away if you use noncomputable def instead of def?

JJ (Aug 24 2024 at 15:05):

yes, it appears to! with some cursory testing

Eric Wieser (Aug 24 2024 at 21:50):

That suggests this is a compiler bug then

Eric Wieser (Aug 24 2024 at 21:51):

(as opposed to elaborator or kernel)

JJ (Aug 26 2024 at 03:50):

hmm, interesting. do you have any idea why it would have gone away after elan update when lean-toolchain remained the same (leanprover/lean4:v4.11.0-rc2)?

JJ (Aug 28 2024 at 01:25):

2024-08-26-112842.webm

JJ (Aug 28 2024 at 01:26):

here's a video of the behavior i'm seeing. is there any way to tell the extension to update / redownload its internally managed tooling? presumably this would fix it as lake build now successfully fails after elan update, but for now i can't do anything in vscode without sticking noncomputable on this function...

Marc Huisinga (Aug 28 2024 at 07:13):

vscode-lean4 does not manage its own toolchain (it uses Elan for this), though the logic it uses for which lean-toolchain file to use for a given Lean file is slightly different from Elan when there are nested lean-toolchain files.
What's the output of the Troubleshooting: Show Setup Information command when focusing the file that you are seeing the issue in?

JJ (Aug 28 2024 at 07:14):

Operating system: Linux (release: 6.6.46-1-lts)
CPU architecture: x64
CPU model: 8 x 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
Available RAM: 16.48 GB

VS Code version: Reasonably up-to-date (version: 1.92.1)
Lean 4 extension version: 0.0.176
Curl installed: true
Git installed: true
Elan: Reasonably up-to-date (version: 3.1.1)
Lean: Reasonably up-to-date (version: 4.11.0-rc2)
Project: Valid Lean project (path: /home/apropos/Projects/lean/qoi)


Elan toolchains:

installed toolchains
--------------------

leanprover/lean4:nightly
leanprover/lean4:stable (default)
leanprover/lean4:v4.10.0
leanprover/lean4:v4.11.0-rc1
leanprover/lean4:v4.11.0-rc2
leanprover/lean4:v4.9.0-rc1

active toolchain
----------------

leanprover/lean4:v4.11.0-rc2 (overridden by '/home/apropos/Projects/lean/qoi/lean-toolchain')

Lean (version 4.11.0-rc2, x86_64-unknown-linux-gnu, commit 0edf1bac392f, Release)

Marc Huisinga (Aug 28 2024 at 07:16):

In that case, the VS Code extension is using leanprover/lean4:v4.11.0-rc2 (as designated in /home/apropos/Projects/lean/qoi/lean-toolchain).

JJ (Aug 28 2024 at 07:17):

yeah. what is weird is presumably lake build uses that as well when i run it in the project root. yet lake build successfully fails and vscode unsuccessfully hangs on build => memory leak

JJ (Aug 28 2024 at 07:17):

[apropos@arch qoi]$ pwd
/home/apropos/Projects/lean/qoi
[apropos@arch qoi]$ elan show
installed toolchains
--------------------

leanprover/lean4:nightly
leanprover/lean4:stable (default)
leanprover/lean4:v4.10.0
leanprover/lean4:v4.11.0-rc1
leanprover/lean4:v4.11.0-rc2
leanprover/lean4:v4.9.0-rc1

active toolchain
----------------

leanprover/lean4:v4.11.0-rc2 (overridden by '/home/apropos/Projects/lean/qoi/lean-toolchain')
Lean (version 4.11.0-rc2, x86_64-unknown-linux-gnu, commit 0edf1bac392f, Release)

JJ (Aug 28 2024 at 07:18):

unless my eyes deceive me, these toolchains are identical

JJ (Aug 28 2024 at 07:21):

at this point i really just want to get this fixed rather than find the source of the problem... i've tried uninstalling & reinstalling the extension to no avail. going to try to track down and remove any cache the lean extension might be keeping tmrw (if anyone knows where this might be please lmk)

Marc Huisinga (Aug 28 2024 at 07:59):

I can reproduce this issue using the code in the first post with both the language server and lake build. Looks like it is indeed stuck in the compiler, which explains why noncomputable fixes it. Here's a profile of where lake build spends the first two minutes: https://share.firefox.dev/3AEOrt8

Could you file a bug in the Lean 4 repository?

JJ (Aug 28 2024 at 08:17):

oh sweet!! i thought for sure it was tied up in some strange computer-specific caching issue...

JJ (Aug 28 2024 at 08:18):

i'll file an issue in the morning, it's a bit late here. what versions of tooling are you running? is it the same as my elan show above?

Marc Huisinga (Aug 28 2024 at 08:18):

I used v4.10.0.

JJ (Aug 28 2024 at 19:45):

report filed. interestingly, i can reproduce this with lake build again.


Last updated: May 02 2025 at 03:31 UTC