Zulip Chat Archive

Stream: new members

Topic: Memory consumption of mathlib


Meyer Zinn (Oct 22 2020 at 18:28):

Hey y'all, new Lean user here :) I'm trying to do some simple proofs using mathlib and am encountering a memory error:

excessive memory consumption detected at 'expression traversal' (potential solution: increase memory consumption threshold)

I've already allowed Lean to use up to 12GB of memory. How much more should I allocate to compile it the first time? Is there a memory leak in mathlib?

Yakov Pechersky (Oct 22 2020 at 18:29):

If you set up your local copy of mathlib using leanproject, it should have downloaded the oleans files and you shouldn't have to recompile mathlib itself. How did you download mathlib, and how did you open it in your editor?

Meyer Zinn (Oct 22 2020 at 18:45):

I just changed the reference in the tutorials project, and re-ran the build. However, I may have found the issue--I thought I'd increased the memory limit by adding a command-line flag to the VSCode settings and did not even see the specific setting for memory limits. It seems to build fine now that I've actually increased the limit.

Johan Commelin (Oct 22 2020 at 18:48):

how long does the build take?

Johan Commelin (Oct 22 2020 at 18:48):

are you building mathlib or the tutorials project?

Yakov Pechersky (Oct 22 2020 at 18:51):

Did you mean you changed the reference to the version of mathlib in the toml file? Did you run leanproject get-mathlib-cache and then restart the Lean interpreter in VSCode?

Bryan Gin-ge Chen (Oct 22 2020 at 19:17):

The "orange bars" that show up next to lines in VS Code indicate that Lean is processing them; in almost all cases they should disappear within 10 seconds or so when you open a new file; if they stick around any longer, (especially near the "import" lines) that's a strong indication that the Lean server is recompiling mathlib, which is almost always not what you want. For one, when the Lean server launched by your editor compiles mathlib, it doesn't write olean files to disk, which means that next time you open the project the same thing might happen.

If you do run into this, the solution is to go to the command line and run either leanproject get-mathlib-cache to get the precompiled olean files from our Azure server, or leanproject build and wait an hour or so for mathlib to build locally.

Meyer Zinn (Oct 22 2020 at 19:20):

It took less than five minutes to build from scratch (on a fairly beefy computer). I'll run the get-mathlib-cache command and see what it does.

Bryan Gin-ge Chen (Oct 22 2020 at 19:21):

Yeah, an hour is the worst case (single-threaded, no previous olean files).

Bryan Gin-ge Chen (Oct 22 2020 at 19:23):

Ah, what did you mean when you say you "changed the reference" above? I'm guessing you either edited leanpkg.toml or ran leanproject up?

Meyer Zinn (Oct 22 2020 at 19:33):

I edited the TOML.

Yakov Pechersky (Oct 22 2020 at 19:35):

The TOML edit means that you're likely having older oleans but newer lean files. Which triggers the recompilation. try leanproject get-mathlib-cache, which will sync the oleans to the commit you've indicated in the toml.

Yakov Pechersky (Oct 22 2020 at 19:36):

Then a restart of the lean interpreter (via VSCode) will update the interpreter cache to use those oleans. so the oleans will now correspond to the lean files, and won't have to recompile any of them.

Patrick Massot (Oct 22 2020 at 19:38):

A more interesting question to us is: why did you feel the need for all those manipulations? Why not following the instructions (which never ever mention modifying the toml file)?

Julian Berman (Oct 22 2020 at 22:19):

I can't answer for the original person, and I'm guilty of the same sin of editing the TOML file -- but TOML is an odd choice for a non-human-editable file. It'd perhaps be a bit clearer if it were specifically a non-human-editable file-format as maybe one additional signal it's not meant to be edited. =

Mario Carneiro (Oct 22 2020 at 22:25):

it's quite human editable

Julian Berman (Oct 22 2020 at 22:25):

Well I agree but I've seen Patrick say a few times "no one should edit it directly" :)

Mario Carneiro (Oct 22 2020 at 22:25):

but leanproject also makes it easy to manage without direct editing

Mario Carneiro (Oct 22 2020 at 22:26):

I think patrick associates editing the TOML file with people who take installation "into their own hands" and end up with strange configuration messes

Mario Carneiro (Oct 22 2020 at 22:27):

which is understandably frustrating since leanproject was written specifically to solve that problem

Julian Berman (Oct 22 2020 at 22:27):

Ah, that's probably fair.

Julian Berman (Oct 22 2020 at 22:27):

YEah.

Julian Berman (Oct 22 2020 at 22:28):

I mean I'd still say if you're coming from somewhere else and you see a TOML file you'll assume you should edit it, so personally I wouldn't find it too surprising if someone did so. But I guess I see the conundrum.

Mario Carneiro (Oct 22 2020 at 22:29):

I liken it to the git configuration file

Mario Carneiro (Oct 22 2020 at 22:29):

you can write your settings in the file directly, or if you prefer use git config and it writes to the file for you

Julian Berman (Oct 22 2020 at 22:30):

Yeah :) I think it'd be an interesting usability study to know who does what more -- I essentially never use git config personally, but maybe it's a good analogy indeed since some sites will tell new users which git config command lines to paste.

Mario Carneiro (Oct 22 2020 at 22:31):

You also have to keep in mind that there is some history here; leanpkg was written to solve these problems and leanpkg.toml is its frontend, but now leanproject is a wrapper around that and it produces a leanpkg.toml file for consumption by lean and leanpkg

Julian Berman (Oct 22 2020 at 22:33):

nod

Mario Carneiro (Oct 22 2020 at 22:35):

I don't really understand all the reasoning behind why leanproject needs to be a separate program from leanpkg, and it's obviously confusing for newcomers. I think it was just that leanpkg is written in lean and patrick wanted to write python?

Mario Carneiro (Oct 22 2020 at 22:36):

it also has a bunch of mathlib-specific features while leanpkg is in core and is mostly lean-only

Julian Berman (Oct 22 2020 at 22:43):

Possibly if it wants to front leanpkg entirely for someone who doesn't want to use it it could plop leanpkg.toml into and out of existence only when it's communicating with leanpkg (if it doesn't have some other way to communicate). But maybe that was also considered

Kevin Lacker (Oct 22 2020 at 22:56):

personally I expected that the toml file was editable and ran into the same problem at first - the reason people use markup files like yaml for config is usually so that you can edit them, and in most programming ecosystems like npm in javascript or requirements.txt in python, you are supposed to edit those files and it will not cause a massive recompilation

Kevin Lacker (Oct 22 2020 at 22:56):

it seems like people run into the problem of "it is forever recompiling" often enough that maybe some level of the tooling can alert you about it automatically?

Kevin Lacker (Oct 22 2020 at 22:58):

one trick is, if it isn't supposed to be edited, you can keep it gzipped it on disk, so that it's impossible for people to edit it ;-)

Yakov Pechersky (Oct 22 2020 at 22:59):

or rename with a leading .

Kevin Lacker (Oct 22 2020 at 23:00):

well, you typically do edit those, like your .emacs

Julian Berman (Oct 22 2020 at 23:00):

Yeah it's kind of trying to be a lockfile it sounds like, except because of the hysterical raisins it's TOML and has to be named as is since that's what leanpkg is looking for

Heather Macbeth (Oct 22 2020 at 23:03):

Julian Berman said:

hysterical raisins

Can I have some? :rolling_on_the_floor_laughing:

Reid Barton (Oct 22 2020 at 23:14):

If leanproject doesn't expect the leanpkg.toml file to be edited by a human then it could add a comment to the file to this effect

Bryan Gin-ge Chen (Oct 22 2020 at 23:17):

I think it's totally fine to edit leanpkg.toml yourself if you know what you're doing. I don't think leanproject has any trouble with manually edited leanpkg.toml files. It's just that people might not know to manually run leanproject get-mathlib-cache afterwards and then they end up compiling mathlib in their editor.

Kevin Lacker (Oct 22 2020 at 23:17):

what if leanproject just got the cache for me

Bryan Gin-ge Chen (Oct 22 2020 at 23:18):

One wrinkle is that leanpkg (which leanproject runs) removes comments from leanpkg.toml.

Bryan Gin-ge Chen (Oct 22 2020 at 23:18):

How would leanproject know to get the cache if all you did was edit leanpkg.toml and open VS Code or emacs?

Bryan Gin-ge Chen (Oct 22 2020 at 23:18):

Are you suggesting that the editor should always run leanproject?

Kevin Lacker (Oct 22 2020 at 23:19):

well... sure?

Kevin Lacker (Oct 22 2020 at 23:19):

could just run leanproject --get-the-cache-if-I-should-obviously-get-the-new-cache

Bryan Gin-ge Chen (Oct 22 2020 at 23:23):

It's a good idea; I recorded it as a vscode-lean issue here.

Kevin Lacker (Oct 22 2020 at 23:25):

or, if we consider the goal to be to minimize the number of identical complaints in zulip that compiling is taking forever, with a minimal amount of work, we could just change the "compiling is taking forever" error message to say "try running leanproject get-cache"

Miroslav Olšák (Oct 23 2020 at 09:52):

So the comment in leanpkg.toml (added by leanproject) should be something like:

This file is meant to be managed by the "leanproject" script. Do not edit this file unless you know what you are doing.

Editing leanpkg.toml is simply the first idea one gets when he looks into the directory of a lean project, and he does not remember all the various commands for lean package management.

Chris B (Oct 23 2020 at 11:03):

In some languages, a newly generated manifest/config file will start with a comment along the lines of users aren't really supposed to edit this directly, learn more [here]. Maybe something like that is appropriate for a newly generated leanpkg.toml.

Chris B (Oct 23 2020 at 11:06):

IE the default cargo.lock produced by Rust's package manner starts with:

# This file is automatically @generated by Cargo.
# It is not intended for manual editing.

Patrick Massot (Oct 23 2020 at 11:23):

Let us repeat once more: there is no issue at all editing this file by hand if you know what you are doing. The message is: if you are a beginner and edit this file, don't be surprised when things don't work.

Kevin Buzzard (Oct 23 2020 at 11:46):

The message coming back to us is that it's all well and good you saying this here, but this is not at all clear before the chaos starts because there is no such warning in the toml.

Chris B (Oct 23 2020 at 12:26):

Patrick Massot said:

Let us repeat once more: there is no issue at all editing this file by hand if you know what you are doing. The message is: if you are a beginner and edit this file, don't be surprised when things don't work.

I understand, but your original question was "why did you feel the need for all those manipulations? Why not following the instructions (which never ever mention modifying the toml file)?". The point is that some languages use comments to directly address users within their configuration files, the contents are obviously variable.

This applies to your question in that "the directions don't mention editing the toml file" is not the same thing as "the toml files tells you not to edit the toml file (unless you know what you're doing)". You can put whatever version ofIf you're new to Lean, please read [link] before editing the contents of this file. in there.

Julian Berman (Oct 23 2020 at 12:47):

Was just about to throw together the trivial PR (which I'm sure we can bikeshed on), but I think said trivial PR may not be completely trivial since I think the leanpkg TOML parser needs a fix

Julian Berman (Oct 23 2020 at 12:48):

Will send the mathlibtools one anyway but I guess it can't be merged until that's investigated (and I haven't had coffee yet)

Julian Berman (Oct 23 2020 at 12:55):

https://github.com/leanprover-community/mathlib-tools/pull/85 for anyone's bikeshedding pleasure.

Patrick Massot (Oct 23 2020 at 15:07):

Great, I'll have a look during the week-end. I also have some suggestion by Floris in the leanproject backlog.

Julian Berman (Oct 23 2020 at 15:29):

Cool and thanks for being open to it :) I'm sure as usual this is a thing that you've thought about like 20 times already so it's probably not fun to revisit.

Kevin Buzzard (Oct 23 2020 at 15:47):

In fact people continually come here saying "I did random stuff not in the instructions and now it's broken" and of course this is frustrating to read, but it seems difficult to stop this happening. I think part of the problem is that if you're going to engage with lean at all then you're probably the kind of person who might fiddle with config files.

Damiano Testa (Oct 23 2020 at 15:50):

I would like to point out that I did try to follow the instructions (not in their entirety), but I could not find a description of how leanproject works and what are the commands that it executes.

Somewhat related to memory timeouts, one command that I just learned today from @Johan Commelin is

lean --make src/data/finset/lattice.lean

which, as far as I can tell, allows you to build locally only the file specified in the path. I would have liked to know about this command earlier!

Johan Commelin (Oct 23 2020 at 15:52):

I guess we need two kinds of docs... for people that want to casually experiment with mathlib, and for people who want to make (more than one) PRs

Julian Berman (Oct 23 2020 at 15:54):

I mean this is obviously a common thing for better or worse, there's people who just won't read docs, whether because they're too excited to use lean or otherwise, so to me it's basically "make things as error-proof as can be reasonably", and then yeah the docs may fill in the rest.

Julian Berman (Oct 23 2020 at 15:55):

For 10 years (quite literally) I think we've (in Python land) been answering the same questions (about someone setting up Python wrong on ubuntu, or touching their system python, or etc. etc.)

Julian Berman (Oct 23 2020 at 15:55):

All things which are documented, folks just have limited initial reading buffers :)

Kevin Lacker (Oct 23 2020 at 18:02):

I think the main issue is that the mathlib build system is unusual compared to other build systems. so people come in expecting it to work like python or node.js and it doesn't. in particular, the situation where you cannot compile the library on your own machine, but the packaging system does not distribute prebuilt object files at import time, instead you manually grab the cached olean files and that lookup is based on git hashes which normally would not be worked into a package management system.

Kevin Lacker (Oct 23 2020 at 18:04):

it doesn't really make much sense to try to improve the build system now, though, if lean 4 is going to use a totally incompatible one, and it is at least probable that mathlib ends up migrating to lean 4

Bryan Gin-ge Chen (Oct 23 2020 at 18:21):

You can compile the library on your own machine though. We just recommend that people avoid it if they can.


Last updated: Dec 20 2023 at 11:08 UTC