Zulip Chat Archive

Stream: mathlib4

Topic: running `cache get` as part of every `lake build`


Scott Morrison (Aug 16 2023 at 08:07):

@Mac Malone has put up the PR #6603, which modifies the lakefile so that every lake build will first run lake exe cache get automatically.

The thinking here is that lake exe cache get is now very fast (thanks @Mario Carneiro!) and you want to use the cache a sufficiently high fraction of the time that it's worth the slight cost of having it run when you don't.

Thoughts?

Eric Wieser (Aug 16 2023 at 08:07):

What does this do if there is no network connection?

Ruben Van de Velde (Aug 16 2023 at 08:08):

Or worse, if there is a network connection, but you're paying by the byte?

Oliver Nash (Aug 16 2023 at 08:17):

Personally I'd rather this be opt-in rather than default.

Mario Carneiro (Aug 16 2023 at 08:17):

I think this should hold off until https://github.com/digama0/lean-cache, which is even faster than the current lake exe cache get

Scott Morrison (Aug 16 2023 at 08:19):

With no network connection available, and no cache available, time lake exe cache get takes about 9.5s for me. With a full cache available it takes about 4.5s, with or without network.

Mario Carneiro (Aug 16 2023 at 08:19):

I think we should try to avoid unsolicited network requests, but we should also try to avoid unsolicited major local builds

Mario Carneiro (Aug 16 2023 at 08:20):

I think this behavior is more important for actions other than lake build, like opening a random file in the editor

Sebastian Ullrich (Aug 16 2023 at 08:22):

Without committing to this specific approach, note that it means you don't have to ever interact with the cmdline in order to get started with mathlib4. Of course many variations are conceivable that could also achieve this.

Eric Wieser (Aug 16 2023 at 08:22):

I think the sharpest edge here is still that cache get makes a mess if you already have a file open in the editor (or depending on your point of view, the editor is making the mess)

Scott Morrison (Aug 16 2023 at 08:23):

Eric, what is the mess here?

Scott Morrison (Aug 16 2023 at 08:23):

If you change lean-toolchain with the editor open, then yes there is a giant mess as the server for the editor and lake on the command line fight each other.

Scott Morrison (Aug 16 2023 at 08:25):

And it is annoying that opening a file can trigger a lake build.

Scott Morrison (Aug 16 2023 at 08:25):

But I don't experience problems caused by running cache get and then reloading a file in the editor...?

Ruben Van de Velde (Aug 16 2023 at 08:28):

I thought the issue was that, if you have a file open in vscode, this will write oleans, which makes cache get stop before it's done

Eric Wieser (Aug 16 2023 at 08:29):

My usual case is that I have a fresh checkout with no oleans, run lake exe cache get, then open a file to start reading it while the cache is running

Eric Wieser (Aug 16 2023 at 08:30):

The result is that the cache ends up incomplete in some way and I have to close the editor, pkill lean, and run lake exe cache get one more time

Eric Wieser (Aug 16 2023 at 08:30):

I think you created a thread about this before!

Scott Morrison (Aug 16 2023 at 08:32):

Ah, I thought you meant just having a file open, rather than opening a new one while cache was running.

Scott Morrison (Aug 16 2023 at 08:32):

Yes, this is exactly the request that opening a file doesn't start a build.

Sebastian Ullrich (Aug 16 2023 at 08:33):

Ah, we have lean4#2385 now but cache doesn't use the lock yet! Does anyone want to give using Lake.withLockFile a try?
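
A rough sketch of what using it in cache could look like (the lock-file path, the entry-point name, and the exact signature of Lake.withLockFile are assumptions here, not the actual lean4#2385 API):

```lean
import Lake  -- assuming `Lake.withLockFile` from lean4#2385 is exposed here
open System

-- Wrap the whole `cache get` action in the same lock that `lake build`
-- holds, so a concurrent server build and cache cannot clobber each
-- other's .oleans. `doGet` stands in for the existing cache logic.
def cacheGetLocked (doGet : IO Unit) : IO Unit :=
  Lake.withLockFile (FilePath.mk "build" / "lake.lock") doGet
```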

Sebastian Ullrich (Aug 16 2023 at 08:35):

At least that would be necessary for explicit invocations of cache. Making caching part of the build like in the PR at hand addresses the issue as well of course.

Scott Morrison (Aug 16 2023 at 08:45):

This will require bumping to the newest nightly (just landed), but I'm having some trouble there. lake has a new manifest version number, and I think we are going to have to get this updated in all the dependencies (Std, Aesop, Qq, Cli, ProofWidgets) before we can actually bump.

Scott Morrison (Aug 16 2023 at 08:46):

Pinging @Mac Malone in case I'm missing something there.

Kevin Buzzard (Aug 16 2023 at 08:49):

Just to say that if we can make it so that mathematicians can download and install a project without ever touching the command line then hang the people who are paying for their internet connection by the byte because the mathematicians who can't use the command line outnumber them 100-1

Eric Wieser (Aug 16 2023 at 08:50):

A "would you like to try downloading pre-compiled oleans" popup would make both camps happy

Scott Morrison (Aug 16 2023 at 08:50):

I'm with Kevin on this one --- as close as we can get to single click installation is really important.

Scott Morrison (Aug 16 2023 at 08:51):

https://github.com/JLimperg/aesop/pull/64 is the bump PR for aesop, @Jannis Limperg.

Scott Morrison (Aug 16 2023 at 08:53):

Hopefully that is the only one we need, as Std / quote4 / lean4-cli / ProofWidgets4 don't actually have lake-manifests (on account of having no dependencies).

Oliver Nash (Aug 16 2023 at 08:57):

Those of us who don't want unsolicited network requests could presumably override this behaviour with a command-line flag to lake build and a setting in the VS Code extension?

Scott Morrison (Aug 16 2023 at 09:45):

Sebastian Ullrich said:

Ah, we have lean4#2385 now but cache doesn't use the lock yet! Does anyone want to give using Lake.withLockFile a try?

@Sebastian Ullrich, I've done this in #6611, and it works, but there are a few other PRs that will need to be merged before this can be.

Jannis Limperg (Aug 16 2023 at 11:35):

Scott Morrison said:

https://github.com/JLimperg/aesop/pull/64 is the bump PR for aesop, Jannis Limperg.

On a clean checkout of this PR, I get the following warning, which I don't understand and which breaks my golden tests:

warning: manifest out of date: git revision of dependency 'std' changed, use `lake update` to update

lake update runs successfully, but doesn't change the manifest. @Mac Malone could you take a look at this?

Jannis Limperg (Aug 16 2023 at 11:53):

Reported as lean4#2427 (by Scott)

Jon Eugster (Aug 16 2023 at 16:59):

Oliver Nash said:

Those of us who don't want unsolicited network requests could presumably override this behaviour with a command-line flag to lake build and a setting in the VS Code extension?

Such a flag would be awesome! I'm quite frequently in networks that don't work well (e.g. train wifi/hotspot) and I see a lot of other programs just hanging up forever because there is a network, just no response coming back... (or if there is an answer it might be horrendously slow) but lake build --no-download would be completely satisfactory.

Gabriel Ebner (Aug 17 2023 at 15:56):

With a full cache available it takes about 4.5s, with or without network.

This is pretty bad. (In addition to downloading over metered/slow/unavailable connections.) Does this also apply to lake build in downstream projects that have mathlib as a dependency?

Scott Morrison (Aug 17 2023 at 22:37):

lake exe noop takes almost 2s on my system, and lake exe cache (i.e. just printing the message) takes 3.5s.

Mac Malone (Aug 17 2023 at 23:12):

@Scott Morrison The good thing is this will remove the noop overhead as lake is already loaded. I haven't looked deeply into the cache code, but I believe before the first message it computes the traces of all files to determine what to download. In addition, lake checks all of cache's dependencies to see if it needs rebuilding. In the auto scenario, cache sadly still needs to do the former (compute traces to determine what to download). However, the latter (cache dependency checking) will actually shave some time off the post-cache build because lake will not need to recheck the files it already checked to build cache.

Mac Malone (Aug 17 2023 at 23:15):

A potential extension to this PR is to pass cache the files it needs to download so it does not have to do trace checking. This could potentially be done by passing the module names line-by-line on stdin. Using command-line arguments is probably infeasible because it would almost certainly exceed the Windows command-line size limit on a fresh cache get.
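
On the cache side, the stdin handoff could be sketched like this (the function name is illustrative, not the actual cache API):

```lean
import Lean  -- for `String.toName`

-- Read one module name per line from stdin, e.g. `Mathlib.Data.Nat.Basic`,
-- until EOF (`getLine` returns "" at end of stream). This sidesteps the
-- Windows command-line length limit that argv passing would hit.
def readModulesFromStdin : IO (Array Lean.Name) := do
  let stdin ← IO.getStdin
  let mut mods := #[]
  repeat
    let line ← stdin.getLine
    if line.isEmpty then break
    mods := mods.push line.trim.toName
  return mods
```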

Sebastian Ullrich (Aug 18 2023 at 07:10):

I think any move away from duplicating Lake trace logic in Cache would be a good move - not just for performance but also robustness, especially in view of potential future Lake changes

Sebastian Ullrich (Aug 18 2023 at 07:17):

FWIW, this is what a no-op lake exe cache get consists of on my machine according to Hotspot:
image.png

Sebastian Ullrich (Aug 18 2023 at 07:39):

This is roughly:

  • 0.6s initialization and loadWorkspace, mostly elaboration it seems
  • 0.6s materializeDeps, again mostly elaboration
  • < 0.1s of trace hashing in Lake (when the many parallel threads in Lake appear)
  • 1s of cache initializing, most of it spent in String.replace in hashFileContents (oops)
  • 0.15s of leantar unpacking, which matches its own output (edit: in further runs, it is more like 20ms)
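
For the String.replace point: the cost is normalizing line endings by materializing a fresh copy of each file's contents before hashing. A streaming alternative could look like the sketch below (illustrative only; the hash seed and combinator are placeholders, and the actual fix in #6647 may differ):

```lean
-- Hash file contents while skipping '\r' on the fly, instead of first
-- allocating a normalized copy via `String.replace s "\r\n" "\n"`.
-- This keeps the hash platform-independent without the extra traversal
-- and allocation.
def hashContentsNoCR (s : String) : UInt64 :=
  s.foldl (init := (7 : UInt64)) fun h c =>
    if c == '\r' then h else mixHash h (hash c)
```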

Sebastian Ullrich (Aug 18 2023 at 07:51):

So yes, not duplicating effort would help, but also here we can start small by not duplicating a performance bottleneck: https://github.com/leanprover-community/mathlib4/pull/6647

Sebastian Ullrich (Aug 18 2023 at 08:00):

One more data point: with 0.7s, the proofwidgets lakefile is about as costly to elaborate as all other dependencies combined. Could we possibly elaborate them in parallel?

Sebastian Ullrich (Aug 18 2023 at 08:03):

Disabling the new compiler, which for lakefiles specifically seems unproblematic, shaves off another 0.3s in total

Scott Morrison (Aug 18 2023 at 08:11):

I would guess that proofwidgets does not actually need to do very much when it is running via a cloud release! Can the all target there detect that it is running out of a cloud release and short-circuit something?

Sebastian Ullrich (Aug 18 2023 at 08:16):

It's tricky, isn't it: to even know what all is we have to elaborate the file, at least without e.g. restructuring the manifest file (but personally I would think precompiling the lakefile into an olean would be simpler)

Mario Carneiro (Aug 18 2023 at 19:00):

Sebastian Ullrich said:

I think any move away from duplicating Lake trace logic in Cache would be a good move - not just for performance but also robustness, especially in view of potential future Lake changes

lean-cache (the cache rewrite) duplicates more of lake trace logic, but this is also the source of a lot of the speedup. There are two main modifications made to the calculation to make it faster:

  • Cache the results of lakefile.lean elaboration into a lake-workspace.json file
  • Don't hash olean file contents; precompute the result at build time and read it at hash time instead
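
The second point could look roughly like this (file layout and helper names are assumptions, not lean-cache's actual scheme):

```lean
open System

-- At build time, store the hash next to the artifact
-- (e.g. `Mathlib/Init.olean` -> `Mathlib/Init.hash`):
def writeHashFile (olean : FilePath) (h : UInt64) : IO Unit :=
  IO.FS.writeFile (olean.withExtension "hash") (toString h)

-- At `cache get` time, read the stored hash back instead of rehashing
-- the (potentially large) .olean contents:
def readHashFile (olean : FilePath) : IO UInt64 := do
  let s ← IO.FS.readFile (olean.withExtension "hash")
  match s.trim.toNat? with
  | some n => return n.toUInt64
  | none   => throw <| IO.userError s!"malformed hash file for {olean}"
```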

Scott Morrison (Aug 20 2023 at 23:11):

Summing up here:

  • The option -KnoCacheGet has been added to this PR, and allows people to opt out of the automatic call to cache get on every build.
  • Even after #6647 we're worried that the current cache get is too slow to run every time?

My inclination is to merge as is, and keep working on speeding things up (through whichever combination of https://github.com/digama0/lean-cache and ongoing work on Lake is appropriate). But if we don't want to merge yet, it would be good for those opposed to specify if there is a performance milestone that would be good enough.

Mac Malone (Aug 21 2023 at 04:28):

Scott Morrison said:

The option -KnoCacheGet has been added to this PR, and allows people to opt out of the automatic call to cache get on every build.

Note that this option is only really useful for a CLI lake build invocation. There is currently no easy way to set this option on the server.

Mac Malone (Aug 21 2023 at 04:32):

If one is working on mathlib itself and wants to avoid editing the committed lakefile, the only way to configure the server would be through editor-specific configuration options. Right now, the VSCode extension does not provide a way to set Lake server options like -K (it only provides for configuring Lean server options like -D).

Mario Carneiro (Aug 21 2023 at 04:34):

why wouldn't extra -K args just go to lake via lake serve?

Mac Malone (Aug 21 2023 at 04:36):

@Mario Carneiro They can, the problem is there is no current way to configure them (at least with the VSCode extension).

Mario Carneiro (Aug 21 2023 at 04:36):

I thought there was an extra args vscode option

Mario Carneiro (Aug 21 2023 at 04:37):

Lean4: Server Args looks a bit too structured

Mac Malone (Aug 21 2023 at 04:37):

@Mario Carneiro Yep, but those options are passed after the -- in the invocation. That is, the VSCode command is:

lake serve -- <vscode-options>

And the options after the -- are the ones forwarded to lean --server not the ones processed by Lake.

Mario Carneiro (Aug 21 2023 at 04:37):

you might be able to do it from Lean4: toolchain path?

Mario Carneiro (Aug 21 2023 at 04:39):

or rather Lean4: Lake Path

Sebastian Ullrich (Aug 21 2023 at 07:41):

Would this option (currently) not have to go to print-paths, not serve? Note that I proposed a scheme for forwarding such arguments from the editor as part of auto-build customization

Mac Malone (Aug 21 2023 at 22:00):

@Sebastian Ullrich You are correct! I am mistaken. That means the VSCode configuration options would be sufficient if the Lean server followed your proposal.

Mac Malone (Sep 09 2023 at 00:31):

Scott Morrison said:

But if we don't want to merge yet, it would be good for those opposed to specify if there is a performance milestone that would be good enough.

I concur. I would love to hear what TODOs the mathlib maintainers would like to see addressed before this kind of integration is acceptable. Hopefully, the number of recent performance improvements in both cache and Lake should get us closer to that position.

Scott Morrison (Sep 09 2023 at 04:27):

How about lake exe cache get (or replacement) under a second?

We are still a long way from that.

On nightly-testing I see:

% time lake env
...
lake env  0.67s user 0.22s system 98% cpu 0.910 total
% time lake exe cache
...
lake exe cache  1.34s user 0.50s system 109% cpu 1.692 total
% time lake exe cache get
No files to download
Decompressing 3725 file(s)
unpacked in 3227 ms
lake exe cache get  1.45s user 11.10s system 250% cpu 5.022 total

Scott Morrison (Sep 09 2023 at 04:35):

I just tried running lean-cache, which fails with

thread 'main' panicked at 'workspace-manifest hash does not match, run 'lake-ext'', src/main.rs:532:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Running lake-ext fails with

zsh: segmentation fault  ../lean-cache/lake-ext/build/bin/lake-ext

Mac Malone (Sep 09 2023 at 04:57):

@Mario Carneiro, do you have some advice on how to get the new cache working on @Scott Morrison's machine?

Mario Carneiro (Sep 09 2023 at 08:12):

I'm guessing lake has changed its data structures in a recent version

Mario Carneiro (Sep 09 2023 at 08:13):

lake-ext needs to stay up to date with lake (part of why I want to upstream it)

Mario Carneiro (Sep 09 2023 at 08:16):

actually maybe not, it was last built for rc4 and it looks like nothing changed between that and v4.0.0

Mario Carneiro (Sep 09 2023 at 08:19):

oh wait, this is on nightly - indeed https://github.com/leanprover/lean4/compare/v4.0.0...13ca443f058b0e33e44a9fdba347a53330463348 shows some lake changes

Scott Morrison (Sep 09 2023 at 10:06):

Yes, I wanted to test on nightly because that includes the lakefile.olean commit.

Mario Carneiro (Sep 09 2023 at 13:23):

you will at least need to set the lean-toolchain in lake-ext and recompile it, I will see whether code changes are necessary

Sebastian Ullrich (Sep 11 2023 at 20:47):

Scott Morrison said:

How about lake exe cache get (or replacement) under a second?

On nightly-testing with warm mathlib cache and file cache for me:

$ time lake exe cache get
No files to download
Decompressing 3733 file(s)
unpacked in 26 ms
lake exe cache get  0.70s user 0.45s system 116% cpu 0.994 total

Sebastian Ullrich (Sep 11 2023 at 20:51):

It is unfortunate that because of the lakefile.olean optimization, we have to initialize five separate Environments, which adds measurable time in finalizeImports as well as MT-marking and freeing time. But I don't see an easy way to avoid that.
image.png

Scott Morrison (Sep 12 2023 at 00:10):

Hmm... today I get (also on nightly-testing)

% time lake exe cache get
No files to download
Decompressing 3740 file(s)
unpacked in 634 ms
lake exe cache get  1.48s user 11.02s system 507% cpu 2.464 total

Which is significantly slower than what you are getting, but also significantly faster than what I was getting three days ago (5.022s, see above).

Scott Morrison (Sep 12 2023 at 00:11):

I suspect what has changed is that I noticed my .cache/mathlib directory had become enormous, and deleted it (even deleting it took a long time) in the intervening time.

Scott Morrison (Sep 12 2023 at 00:14):

Deleting my .cache/mathlib again brings it down to 2.2s, so I think there is still a significant effect here.

Mario Carneiro (Sep 12 2023 at 00:16):

what is your OS / file system?

Mario Carneiro (Sep 12 2023 at 00:16):

IIRC lake exe cache get no longer does directory traversals so it shouldn't make a big difference even if there are a ton of files

Scott Morrison (Sep 12 2023 at 00:17):

mac os (big sur), apfs filesystem

Mac Malone (Sep 12 2023 at 00:43):

Sebastian Ullrich said:

It is unfortunate that because of the lakefile.olean optimization, we have to initialize five separate Environments, which adds measurable time in finalizeImports as well as MT-marking and freeing time. But I don't see an easy way to avoid that.

A potential long-term way to avoid this is to import all of the lakefiles at once. This would require making all definitions in each module private (or package namespaced) to avoid name clashes.
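
Concretely, a lakefile under that scheme might look something like this (illustrative only; the helper name is made up, and whether fields like moreLeanArgs apply is package-specific):

```lean
import Lake
open Lake DSL

-- Per-package namespace so that importing many lakefiles into one
-- environment cannot produce name clashes between packages.
namespace MathlibConfig

-- `private` keeps helper definitions from colliding with an
-- identically named helper in another package's lakefile.
private def extraArgs : Array String := #["-DwarningAsError=true"]

package mathlib where
  moreLeanArgs := extraArgs

end MathlibConfig
```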

Mac Malone (Sep 12 2023 at 00:45):

Also, Sebastian Ullrich, do you have a performance / hotspot breakdown of where the major time sinks are in lake and/or cache other than finalizing the environment?

Mario Carneiro (Sep 12 2023 at 00:47):

What are these five Environments?

Mac Malone (Sep 12 2023 at 00:48):

@Mario Carneiro The environment of the mathlib configuration and each of its dependencies.

Mario Carneiro (Sep 12 2023 at 00:48):

that is, the lakefile.olean for mathlib and each dependency?

Mario Carneiro (Sep 12 2023 at 00:50):

or is this lakefile.lean elaboration stuff?

Sebastian Ullrich (Sep 12 2023 at 06:40):

It is the combination of Lake closure and specific lakefile.olean for each package

Sebastian Ullrich (Sep 12 2023 at 06:47):

Mac Malone said:

Also, Sebastian Ullrich, do you have a performance / hotspot breakdown of where the major time sinks are in lake and/or cache other than finalizing the environment?

For Lake, it really seems to be only that. For cache, it's
image.png

Mario Carneiro (Sep 12 2023 at 07:03):

Mac Malone said:

Sebastian Ullrich said:

It is unfortunate that because of the lakefile.olean optimization, we have to initialize five separate Environments, which adds measurable time in finalizeImports as well as MT-marking and freeing time. But I don't see an easy way to avoid that.

A potential long-term way to avoid this is to import all of the lakefiles at once. This would require making all definitions in each module private (or package namespaced) to avoid name clashes.

Alternatively, you could not use declarations at all and instead use environment extensions for the main data structures (Package, LeanLib etc). That wouldn't help with user declarations though, which could still conflict, although it wouldn't be too hard to institute conventions to avoid clashes here (like putting everything in a suitable namespace).

Sebastian Ullrich (Sep 12 2023 at 08:55):

Sebastian Ullrich said:

as well as MT-marking

@Mac Malone For what it's worth, job parallelism probably only slows us down in the no-op case. I noticed that the (I assume) Environment MT-marking is triggered by Package.fetchRelease, which it shouldn't do when the release is already fetched. But I don't know if that is the only async use.

Mario Carneiro (Sep 12 2023 at 08:56):

Why does anything have to be marked MT if it's loaded from oleans?

Mario Carneiro (Sep 12 2023 at 08:56):

those are already persistent

Sebastian Ullrich (Sep 12 2023 at 08:58):

I assume it is the extra state created by finalizeImports. The big hash table, for example.

Mario Carneiro (Sep 12 2023 at 08:59):

you mean the token table? I recently had a look at doing less at import time for that

Sebastian Ullrich (Sep 12 2023 at 08:59):

No, the map of all imported declarations in the environment

Mario Carneiro (Sep 12 2023 at 09:00):

Hm, right, hard to do much about that

Mario Carneiro (Sep 12 2023 at 09:01):

although maybe if there was a way to set a global flag to MT everything...?

Sebastian Ullrich (Sep 12 2023 at 09:01):

Only to import less, which is also hard without a module system

Mario Carneiro (Sep 12 2023 at 09:01):

That's definitely possible for lakefiles, we could curate something that is barely more than init, some macros, and bootstrapping definitions

Mario Carneiro (Sep 12 2023 at 09:03):

the tricky part is that unless it is a bootstrap command you have to import a lot of lean just to define an elab

Mario Carneiro (Sep 12 2023 at 09:03):

but maybe lake can get by just using macros?

Sebastian Ullrich (Sep 12 2023 at 09:49):

For what it's worth, here is the profile in Firefox Profiler, which may even be a tad better than Hotspot: https://share.firefox.dev/3PyO1JE

Mario Carneiro (Sep 12 2023 at 10:01):

how did you capture this? I haven't managed multi-process captures before

Sebastian Ullrich (Sep 12 2023 at 10:06):

perf record --call-graph dwarf lake exe cache get, it just works for me

Sebastian Ullrich (Sep 12 2023 at 10:14):

Some usage notes: the "tracks" for individual threads are hidden by default. The dark blue bars at the bottom of each track seem to show actual CPU use, i.e. time not spent waiting on locks or I/O.

Sebastian Ullrich (Sep 12 2023 at 10:19):

Scott Morrison said:

Hmm... today I get (also on nightly-testing)

% time lake exe cache get
No files to download
Decompressing 3740 file(s)
unpacked in 634 ms
lake exe cache get  1.48s user 11.02s system 507% cpu 2.464 total

Which is both significantly slower than what you are getting, but also significantly faster than I was getting three days ago (5.022s, see above).

@Scott Morrison I forgot to say, that is a surprising platform divergence but also very good to know. Is it a repeated run, i.e. with a warm file cache? It's at least clear that the surprisingly high system time must be in one of the parallel parts, though in the Linux profile these seem very much insignificant.

Scott Morrison (Sep 12 2023 at 11:27):

Yes, warmed up cache. I think we consistently see much higher system times for lean on macOS than on other systems.

Sebastian Ullrich (Sep 12 2023 at 11:43):

Ah yes, that is true. But here it is such an outsized part of the total time that it might be worth investigating further. Do we have anyone adept at macOS profiling? :)

Alex J. Best (Sep 12 2023 at 17:20):

I may regain access to a mac next week, in the meantime I'll just mention that I have found the builtin "Instruments" application in OSX useful for profiling Lean in the past.


Last updated: Dec 20 2023 at 11:08 UTC