Zulip Chat Archive

Stream: mathlib4

Topic: long import rebuilding


Vasily Ilin (Dec 21 2024 at 20:20):

When I modify Mathlib.MeasureTheory.Measure.MeasureSpace and open Mathlib.Probability.Distributions.Gaussian, Lean tells me that my imports are out of date and must be rebuilt. Then I have to wait for about 30-60 minutes (I did not time it but it's a very long time) for ~80 files to build. Once imports are rebuilt, I realize that the lemma I stated in Mathlib.MeasureTheory.Measure.MeasureSpace does not quite work, so I change it, and incur another 30-60 minutes of imports rebuilding. This process is very time-consuming. How can I speed it up? One idea is to move all new lemmas to a single file, and only move them to their respective files at the very end. But it seems pretty incovenient. Any better way?

Ruben Van de Velde (Dec 21 2024 at 21:16):

You probably want to write your lemma in the same file where you want to use it at least until you're sure you've stated it the way you want it, yeah

Kim Morrison (Dec 21 2024 at 23:13):

Something is horribly wrong if it takes 30 minutes to compile 80 files!

Eric Wieser (Dec 21 2024 at 23:33):

In the past I've seen this slowness caused (on windows) by antivirus getting very in the way of Lean

Yury G. Kudryashov (Dec 21 2024 at 23:40):

Lean 3 (community edition) had an option "don't recompile the unchanged dependencies, just load oleans and assume that my changes to data/set/basic didn't break them". AFAIK, there is no such option in Lean 4, so I suggest one of 2 ways:

  • put all your new lemmas in the new file, move them to appropriate files as the last step in PR preparation;
  • when you modify a basic file, push to github, switch to another project, then get oleans from cache when you go back to this branch.

Yury G. Kudryashov (Dec 21 2024 at 23:40):

Also, I agree that compiling 80 files shouldn't take 30-60 minutes.

Vasily Ilin (Dec 22 2024 at 05:48):

Thank you for the responses. I will try to time it to get a more precise estimate. I ssh to my school server to work on mathlib because my surface pro 7 seems unable to handle mathlib (compilations take an egregious amount of time). But even on the 60-core 512Gb-RAM server, the compilations take dozens of minutes when I work on mathlib.

Yury G. Kudryashov (Dec 22 2024 at 07:58):

Does it use a slow network filesystem?

Vasily Ilin (Dec 22 2024 at 09:13):

It's a Linux machine, not sure if it answers your question.

Eric Wieser (Dec 22 2024 at 09:36):

The question is what path your .lake directory resides in, and whether that belongs to a slow filesystem

Eric Wieser (Dec 22 2024 at 09:37):

As an experience you could checkout mathlib inside/tmp/lean_test and see if the build is faster there

Eric Wieser (Dec 22 2024 at 09:38):

Though obviously don't actually keep any work there

Yury G. Kudryashov (Dec 22 2024 at 16:09):

What does stat --file-system --format=%T . say?

Vasily Ilin (Dec 22 2024 at 18:19):

@Yury G. Kudryashov ,

vilin@doppio:~/mathlib4$ stat --file-system --format=%T .
ext2/ext3

Vasily Ilin (Dec 22 2024 at 18:51):

Here is a test I did. I modified file Mathlib.MeasureTheory.Measure.MeasureSpace with a dummy theorem theorem test : 1 = 1 := rfl. Then I opened file Mathlib.Probability.Distributions.Gaussian and timed how long it took to rebuild its imports.
After 10 minutes: [2110/2174]
After 15 minutes: [2128/2174]
After 20 minutes: [2134/2174]
After 29 minutes: Done!

So it takes ~30 minutes to rebuild imports after modifying Mathlib.MeasureTheory.Measure.MeasureSpace.

Junyan Xu (Dec 22 2024 at 18:56):

While people are diagnosing this problem maybe you want to switch to Gitpod :) especially since you're remote anyway

Eric Wieser (Dec 22 2024 at 20:12):

Could you repeat your test for a checkout in /tmp?

Vasily Ilin (Dec 23 2024 at 05:56):

Eric Wieser said:

Could you repeat your test for a checkout in /tmp?

I didn't quite understand the suggestion. I make a new folder tmp and clone mathlib there? I don't see how that makes any difference.

Yury G. Kudryashov (Dec 23 2024 at 06:47):

It can make difference, if /tmp is on tmpfs (a.k.a. RAM)

Kevin Buzzard (Dec 23 2024 at 08:23):

This is /tmp in the root directory of a linux distribution, not a new directory tmp in your home directory. The point is that we want to rule out the conjecture that reading and writing from some disc or other is slow.

Kevin Buzzard (Dec 23 2024 at 08:28):

For example I have a fast disc and a slow disc on my computer, and on the slow disc there are various operations which take forever. I just spun up a mathlib on the slow disc and I get this:

Attempting to download 5786 file(s)
Downloaded: 5786 file(s) [attempted 5786/5786 = 100%] (100% success)
Decompressing 5786 file(s)
Unpacked in 92460 ms
Completed successfully!

i.e. 1.5 minutes to do some disc-intensive operation (unpacking mathlib compiled binaries).

Kevin Buzzard (Dec 23 2024 at 08:35):

On this slow drive, my timings for building Mathlib.Probability.Distributions.Gaussian after adding your test to Mathlib.MeasureTheory.Measure.MeasureSpace are: started 08:29:36, finished 08:33:48, so just over 4 minutes.

When I'm in this sort of situation, I do what's been suggested above: I add lemmas to the file I'm working on, and only move everything to the right directory right at the end of the process, just before pushing to github. Then github compiles everything for me and I can get cache later on.

Vasily Ilin (Dec 23 2024 at 09:07):

I did the same experiment on my local computer, with essentially the same result. Actually, it's a bit faster than on my university server (20 minutes instead of 30). I'll try the tmp thing now.

Vasily Ilin (Dec 23 2024 at 09:31):

I now did the experiment where I cloned mathlib in tmp, with essentially the same result -- it takes around 20-30 minutes to rebuilt imports.

Kevin Buzzard (Dec 23 2024 at 09:54):

(I used quite a fast machine even though the disc was slow)

Vasily Ilin (Dec 23 2024 at 10:22):

I tried the same experiemnt on a free Github codespace. It's a bit faster (~15 minutes) but that's still way slower than 4 minutes.

image.png

Vasily Ilin (Dec 23 2024 at 10:27):

The nice thing about a Github codespace is that it should be completely reproducible.

Kevin Buzzard (Dec 23 2024 at 10:29):

I think Kim's statement about something being "horribly wrong" might just be a reflection of the fact that they are used to working with fast computers. What concerns me here is the comment about the 60-core 512Gb-RAM server. I just have a high end PC with 8 cores and 32 gigs and it took 4 minutes to compile those files.

Vasily Ilin (Dec 23 2024 at 10:50):

Maybe it has to do with the fact that this high-end school server is used by many other graduate students in the applied math department. It's interesting that a free GitHub codespace is ~2x faster

Sebastian Ullrich (Dec 23 2024 at 11:12):

A higher number of cores can also be a hindrance if not enough files can be built in parallel as it usually means a lower per-CPU frequency

Junyan Xu (Dec 23 2024 at 11:14):

On latest master it starts with [2068/2179], so it's more like 110 files.

Junyan Xu (Dec 23 2024 at 11:24):

Tested on Gitpod:

gitpod /workspace/mathlib4 (master) $ time lake build Mathlib.Probability.Distributions.Gaussian
Build completed successfully.
real 8m20.599s
user 16m9.127s
sys 1m7.653s

I noticed that these files are slow, but maybe there are more when I was not watching:
[2142/2179] Running Mathlib.MeasureTheory.Integral.SetToL1
[2143/2179] Running Mathlib.MeasureTheory.Integral.Bochner

Vasily Ilin (Dec 23 2024 at 12:10):

Junyan Xu said:

Tested on Gitpod:

gitpod /workspace/mathlib4 (master) $ time lake build Mathlib.Probability.Distributions.Gaussian
Build completed successfully.
real 8m20.599s
user 16m9.127s
sys 1m7.653s

I noticed that these files are slow, but maybe there are more when I was not watching:
[2142/2179] Running Mathlib.MeasureTheory.Integral.SetToL1
[2143/2179] Running Mathlib.MeasureTheory.Integral.Bochner

Is the sum of these numbers the real wall clock time? I.e. ~25 minutes

Junyan Xu (Dec 23 2024 at 12:14):

I think the "user" number is close to the wall clock time.

Eric Wieser (Dec 23 2024 at 12:17):

Real is the wall time

Eric Wieser (Dec 23 2024 at 12:17):

User tells you it's running on two cpus there, I think

Vasily Ilin (Dec 23 2024 at 13:01):

Is there a guide to using Lean in Gitpod? I only found this old thread: https://leanprover-community.github.io/archive/stream/113488-general/topic/gitpod.html

Junyan Xu (Dec 23 2024 at 14:33):

You can just click "Open in Gitpod" here to work off master. Each PR also comes with such a button by default.

Eric Wieser (Dec 23 2024 at 14:45):

Which is the same as just putting gitpod.io/# before the url

Vasily Ilin (Dec 30 2024 at 19:41):

Is Gitpod recommended over GitHub codespaces?

Kevin Buzzard (Dec 30 2024 at 20:23):

They're "the same thing" but IIRC gitpod is being retired in April.

Yaël Dillies (Dec 30 2024 at 20:23):

I personally love gitpod and get 250h/month for free because of their open source plan, but indeed as Kevin says gitpod might soon disappear :sad:

Junyan Xu (Dec 30 2024 at 23:56):

I've started trying Codespaces and it has a nice feature that comments on the PR can be shown together with the code in the editor. Also I don't see the lean exe cache get run in the terminal, but the cache is indeed fetched; I'm not sure how it's done in the background. It seems to take longer to load a file, even though it's already built. It seems push/pull/fetch is faster probably because the server is native to GitHub. It seems I can't stop a codespace within the VSCode editor, and must navigate to the repo and click various buttons. The GitHub favicon causes more confusion than the Gitpod one, since you can't use it to distinguish codespaces from PRs ... Otherwise the experience is similar to Gitpod, so the transition should be smooth.

Should we add a Codespaces button next to the Gitpod button in every PR? This saves three clicks since currently you need to first click on the branch name, then on the green Code button, then the Codespaces tab, before you can finally create a codespace.

Eric Wieser (Dec 31 2024 at 01:12):

You don't have to click on the branch name, there is a "code" link in the top right of the PR page

Junyan Xu (Dec 31 2024 at 02:56):

Good find! Still two more clicks but the advantage is it's also accessible from the "Files Changed" view, not just the "Conversation" view.

Junyan Xu (Dec 31 2024 at 17:20):

It seems I can't stop a codespace within the VSCode editor, and must navigate to the repo and click various buttons.

I found that I can stop the codespace by clicking the green button at the lower left corner of the editor.


Last updated: May 02 2025 at 03:31 UTC