Zulip Chat Archive
Stream: mathlib4
Topic: import Mathlib slows down loading
Yaël Dillies (Dec 03 2024 at 08:19):
Kim Morrison said:
I have added a section at https://github.com/leanprover-community/mathlib4/wiki/Using-mathlib4-as-a-dependency
This should take insignificant time and not rebuild any mathlib files.
Unfortunately, it seems like this is false nowadays...
Yaël Dillies (Dec 03 2024 at 08:20):
@Javier Lopez-Contreras and I were pair-programming yesterday on FLT using gitpod, and everything was super-laggy until we removed all the import Mathlib
from the files upstream of where we were working
Yaël Dillies (Dec 03 2024 at 08:21):
cc @Kevin Buzzard for an extra reason not to use import Mathlib
in FLT files besides the "docs take ages to build" point
Kevin Buzzard (Dec 03 2024 at 08:40):
Hmm. I really like import Mathlib
for the following reason. You sit down for an FLT session, you make a new file, you write down the thing you want to prove, you think "I'll be a good citizen and minimise imports", and then you start working on it, and random instances don't fire, and exact? doesn't work, and you get frustrated about how stupid Lean is and/or how bad people were at writing an API for your objects, and then you go back to import Mathlib
and your problems are all solved, and then you discover that because of all of this aggressive import minimisation recently, what's happened is that your code is importing FancyThing.Defs
to make the statement compile, but not FancyThing.Basic
where all the facts/instances about it are. @Kim Morrison do you have any thoughts on this?
Kevin Buzzard (Dec 03 2024 at 08:42):
I would rather pay the "docs take ages to build" price than the frustration of not knowing what to import any more because we've got too good at modularising files in mathlib, at least when developing.
Kevin Buzzard (Dec 03 2024 at 08:46):
Is there some magic switch which I can press, which makes VS Code import all of Mathlib when I'm developing even though I have minimal imports in my files? What I worry about with "get CI to run shake" is that I minimise imports and push, and then pull and start again and then everything is degraded because I forgot to change the top line to "import Mathlib
because I'm developing again". More than once I have thought "dammit why doesn't this file import Mathlib
, that was the problem all along".
Michael Rothgang (Dec 03 2024 at 10:18):
I've also noticed that adding import Mathlib
at the top of the file adds 5-10 seconds of waiting. This is breaking my flow and definitely very annoying.
Kim Morrison (Dec 03 2024 at 11:08):
@Kevin Buzzard, it's definitely a trade-off. For the most part, I just want to insist that working on the import structure of Mathlib (in particular, untangling, separating, slimming down) is really important for the long-term.
But there are a couple of things that I think we should do:
- Marc is planning on starting some work (early) next year that will help find missing imports, at least at the level of not-yet-imported identifiers. I'm not sure that there are any plans specifically for helping with missing instances, however.
- Perhaps we could explore an
interactive import
command, which only runs in VSCode. You'd still have to get the imports right forlake build
(i.e. before you can actually commit). This feels a little unlikely to be a good solution to me, but we should think it through! -
I think Mathlib should institute a policy (automated!) of having a
.lean
file for every subdirectory, which simply imports everything in that subdirectory.- It would be forbidden to import these from Mathlib itself
- But from downstream projects, it would give a good intermediate between minimized imports and
import Mathlib
. You'd just be able toimport Mathlib.RingTheory
, etc. - An obvious objection is that our directories aren't very well organized, and the obvious fix is to organize them better! Hopefully having these import-only files would actually give us some incentive to do that organization.
- Note that Batteries and Lean both already do exactly this, so we would be in good company!
Patrick Massot (Dec 03 2024 at 11:44):
I also noticed that import Mathlib
became unusable and it makes me pretty sad. What happened?
Eric Wieser (Dec 03 2024 at 12:43):
Is the cost of import Mathlib
linear in the number of files (as opposed to total file size), and therefore made worst by splitting files?
Sebastian Ullrich (Dec 03 2024 at 14:04):
Could we please try separating "the line import Mathlib
is slow" and "the line import Mathlib
makes everything below it slow" (this thread) into two threads? I don't think they have the same cause
Sebastian Ullrich (Dec 03 2024 at 14:07):
For example I can't tell which of the two issues Patrick is referring to (both?)
Kevin Buzzard (Dec 03 2024 at 15:08):
Yes, I'm not having any trouble with import Mathlib
(which is why I'm continually doing it) but for sure I don't need any of the set theory and combinatorics, and right now I don't need the differential geometry either, so there's definitely scope for imports which are more aggressive in number theory/algebra (and thus more likely to pick up all instances I need) whilst not taking in everything.
Mario Carneiro (Dec 03 2024 at 15:22):
It would be nice if we had multiple mid-level imports like that, assuming we can find enough domain experts to curate them
Mario Carneiro (Dec 03 2024 at 15:22):
I don't agree with Kim that this should just be a folder level import
Patrick Massot (Dec 03 2024 at 17:49):
I was talking about: I start a file with import Mathlib
and then I need to wait for a very long time before I can do anything with it. But then Lean behaves normally.
Kevin Buzzard (Dec 03 2024 at 21:57):
If I ever change imports in a file at any stage then I always "restart server" or at least "restart file". It's a habit I've got into, I've basically convinced myself that it helps (I might be wrong).
Kim Morrison (Dec 03 2024 at 22:19):
Mario Carneiro said:
I don't agree with Kim that this should just be a folder level import
What is your alternative suggestion?
Marc Huisinga (Dec 04 2024 at 09:43):
Kevin Buzzard said:
If I ever change imports in a file at any stage then I always "restart server" or at least "restart file". It's a habit I've got into, I've basically convinced myself that it helps (I might be wrong).
Please stop doing that and report issues if changing the imports of a file doesn't work correctly.
Using 'Restart Server' whenever you change the imports induces a significant overhead. All files that you have open as tabs will be re-elaborated entirely and we need to reload all .ileans as well.
Using 'Restart File' should not be necessary if VS Code doesn't tell you to use it. Changing the imports of the file already restarts the file worker process, it just doesn't automatically trigger a (potentially expensive) rebuild, which you can then trigger manually with 'Restart File' if VS Code tells you that this is necessary.
Sebastian Ullrich (Dec 04 2024 at 10:15):
Patrick Massot said:
I was talking about: I start a file with
import Mathlib
and then I need to wait for a very long time before I can do anything with it. But then Lean behaves normally.
Okay, let's try distilling that issue. Not necessarily saying there will be a (simple) solution but I at least want to understand Mathlib's scalability. Could you confirm this is about the same time time lake env lean Mathlib.lean
takes on the cmdline? It's 1.7s for me but this is also a very fast machine.
Michael Stoll (Dec 04 2024 at 10:30):
FWIW, on my machine (fairly new reasonably fast laptop (Linux)):
mstoll@btm2xg:~$ cd lean4/mathlib4/
mstoll@btm2xg:~/lean4/mathlib4$ time lake env lean Mathlib.lean
real 0m24,085s
user 0m3,149s
sys 0m4,575s
mstoll@btm2xg:~/lean4/mathlib4$ time lake env lean Mathlib.lean
real 0m1,696s
user 0m1,393s
sys 0m0,304s
Maybe it had to build something the first time around...
Sebastian Ullrich (Dec 04 2024 at 10:32):
Right, Lake might do some more work in the first invocation depending on what you did before that. The same should be true when restarting the file or changing imports in the editor.
Michael Rothgang (Dec 04 2024 at 11:07):
My timings
$ time lake env lean Mathlib.lean
real 0m7,198s
user 0m3,095s
sys 0m1,153s
I haven't measured the waiting time when adding import Mathlib
, but it seems plausibly in the same ball-park.
Collected on mathlib revision d7abea93d01641d025a6fdcaa8bc0298f17cd9ee
(fresh master, unmodified). Did lake exe cache get
before. 12 GB of RAM, somewhat old CPU (i7-6600U, 4 cores). Linux (Debian stable).
Kevin Buzzard (Dec 04 2024 at 11:27):
Aah, so perhaps the phenomenon I observed (change imports, very long pause, I get frustrated, I restart file, very short pause, deduce that restarting is beneficial) is spurious and what's actually going on is that there's a one-time cost which is independent of the restarting.
Floris van Doorn (Dec 04 2024 at 13:21):
On a reasonably fast Windows laptop, the improvement is unfortunately not as great after the first run:
Floris@Dell-E MINGW64 ~/projects/mathlibf ((a38db992d08...))
$ time lake env lean Mathlib.lean
real 0m21.173s
user 0m0.000s
sys 0m0.000s
Floris@Dell-E MINGW64 ~/projects/mathlibf ((a38db992d08...))
$ time lake env lean Mathlib.lean
real 0m6.874s
user 0m0.000s
sys 0m0.000s
Floris@Dell-E MINGW64 ~/projects/mathlibf ((a38db992d08...))
$ time lake env lean Mathlib.lean
real 0m6.798s
user 0m0.000s
sys 0m0.000s
Floris van Doorn (Dec 04 2024 at 13:34):
lake build
seems to speed up more drastically (this is in the same repository after only running lake env lean Mathlib.lean
a few times):
Floris@Dell-E MINGW64 ~/projects/mathlibf ((a38db992d08...))
$ time lake build
Build completed successfully.
real 0m57.523s
user 0m0.000s
sys 0m0.000s
Floris@Dell-E MINGW64 ~/projects/mathlibf ((a38db992d08...))
$ time lake build
Build completed successfully.
real 0m1.977s
user 0m0.000s
sys 0m0.000s
Floris van Doorn (Dec 04 2024 at 13:34):
I did some tests opening a file with only import Mathlib
as its contents. This was in a repository where I got a fresh cache and then ran lake env lean Mathlib.lean
and lake build
a few times. Then I opened a file with only import Mathlib
:
The first time opening took >60 seconds, I didn't time it exactly.
Restarting the file then took 32 seconds.
After that, restarting the file or the server, or reloading the VSCode window all take roughly 8-14 seconds.
The first two loading times are very long, but after that it seems to be quite quick.
Johan Commelin (Dec 04 2024 at 14:58):
So you open VSCode the first time, and it takes > 60s. But after a few restarts, when you close VSCode completely, and open again, then it takes 8-14s?
Could this have anything to do with loading all oleans in RAM? And when opening VSCode later, the files are still sitting in RAM?
This is me doing a wild guess...
Floris van Doorn (Dec 04 2024 at 15:42):
One test I did was closing VSCode and then immediately reopening it, which indeed was quick.
But if I reopen it now, after my computer was turned off, it again takes >60s.
Junyan Xu (Dec 04 2024 at 15:48):
How come
import Mathlib
in the web editor is still fast (3-4 seconds)? I don't think it's run on super beefy machines? Some caching machanism in action, like observed by people here after the first run? Anything can be learned from their setup?
Floris van Doorn (Dec 04 2024 at 15:57):
So your guess that the files were still in RAM might be right.
Julian Berman (Dec 04 2024 at 16:46):
I don't really know anything about windows but if you want to check your theory, it looks like the page cache is called the "standby list" on Windows, and you can use https://learn.microsoft.com/en-us/sysinternals/downloads/rammap to clear it (and then check to see if you always get 60s startup).
Mario Carneiro (Dec 05 2024 at 10:37):
Kim Morrison said:
Mario Carneiro said:
It would be nice if we had multiple mid-level imports like that, assuming we can find enough domain experts to curate them
I don't agree with Kim that this should just be a folder level import
What is your alternative suggestion?
Like Mathlib.Tactic
: Mathlib.NumberTheory
(or maybe Mathlib.Bundles.NumberTheory
) would be a file maintained by domain experts to have a reasonable collection of stuff relevant for doing things in number theory. Possibly there would be multiple "profiles" depending on whether you want just the basics or something more complete. Probably Mathlib.NumberTheory
itself should be reserved for the kitchen sink import. But the details here depend on what "profiles" exist and this needs more domain expertise than I have.
Last updated: May 02 2025 at 03:31 UTC