Zulip Chat Archive

Stream: lean4

Topic: refresh file dependencies


Kayla Thomas (Jun 06 2023 at 03:35):

Is there a way to enable automatic file dependency checking? That is, make it so that it is not necessary to run refresh file dependencies when an imported file changes?

Kayla Thomas (Jun 06 2023 at 03:53):

That is, to set the behavior to what it was in Lean 3.

Scott Morrison (Jun 06 2023 at 05:25):

There's a significant change under the hood between Lean 3 and Lean 4 here: when VSCode rebuilds dependencies in Lean 3, it is not writing oleans to disk, but in Lean 4 it is.

This makes it a bit "scarier" to automatically refresh dependencies.

Scott Morrison (Jun 06 2023 at 05:26):

In fact, what I'd like is "more of the same". Even when you first open a file in VSCode, I don't want the extension to write to oleans --- only when I run refresh file dependencies explicitly.

Scott Morrison (Jun 06 2023 at 05:27):

@Kayla Thomas, you know there is a keyboard shortcut (option-R on a mac) for this refresh. I barely notice doing it now.

Kayla Thomas (Jun 06 2023 at 05:29):

I see.

James Gallicchio (Jun 09 2023 at 19:33):

i wonder if we could detect when dependencies are out of date and then have some sort of suggestion/warning at the top saying "dependencies out of date; CTRL+SHIFT+x to refresh"

James Gallicchio (Jun 09 2023 at 19:34):

since that's the best way to teach people to do it

Scott Morrison (Jun 09 2023 at 23:22):

Yes please! I would really like to turn off the automatic "let me mess with the oleans on your disk without asking you" feature. It is a constant productivity drain.

Scott Morrison (Jun 09 2023 at 23:23):

I know I should:

  • close VSCode
  • run lake exe cache get
  • (possibly run lake build to check that it actually worked and isn't in some error 139 state)
  • reopen VSCode

every single time I want to switch a branch.

But too often it is tempting to say"oh, maybe I won't actually need live Lean to do the inspection / fix I want, I'll just run lake exe cache get from inside VSCode while I get started", and then half then time find myself in the state of VSCode and lake exe cache get having fought over the oleans.

Scott Morrison (Jun 09 2023 at 23:23):

Worst of all is when VSCode automatically building oleans goes berserk, and manages to launch 50 or more lake processes, grinding the machine to a halt until I sudo pkill lake lean.

Scott Morrison (Jun 09 2023 at 23:25):

A related productivity killer is not being able to restart the Lean4 server if you don't have any Lean4 files open! I'm frequently in the bind that I want to kill the server, but opening a lean file will start overwriting my nice oleans with bad ones, so the only way to restart the server is to quit VSCode and restart. (Which is often painful for other reasons, particularly reseting other terminals I have open.)

Gabriel Ebner (Jun 09 2023 at 23:39):

I wholeheartedly support the proposal (see also lean4#2154), but I don't have any time to implement it.

Gabriel Ebner (Jun 09 2023 at 23:41):

Scott Morrison said:

I know I should:

  • close VSCode
  • run lake exe cache get
  • (possibly run lake build to check that it actually worked and isn't in some error 139 state)
  • reopen VSCode

every single time I want to switch a branch.

You forgot step (1.5) terminate any running Lake process. Closing VS Code is not enough to kill Lake's zombie minions.

Scott Morrison (Jun 10 2023 at 00:01):

At the end of the discussion in lean4#2154, it seems there's not a decision between:

  • have lake print-paths never build, but instead return an error and suggested command line invocation if building is required
  • have lake print-paths by default error if a rebuild is required, but have lake print-paths --rebuild actually proceed with the build.

The second option seems easier to implement, so if it's at all satisfactory I hope we can go with that, at least initially to get to a usable system.

Scott Morrison (Jun 10 2023 at 00:03):

Would the steps in that case be:

  1. add the --rebuild flag for print-paths
  2. adjust the VSCode extension to detect the error from lake print-paths, and offer a GUI button to re-run lake print-paths --rebuild.

and then optional extras:

  • provide a VSCode user setting that enables lake print-paths --rebuild always?

Scott Morrison (Jun 10 2023 at 00:04):

I have never tried modifying the Lake code, but if this description is all that would be required I could probably try. :-)

Gabriel Ebner (Jun 10 2023 at 00:07):

My objection to the --rebuild flag was that it's hard to script from the editor. "Rebuild file dependencies" should work even if the server is not started / the file has crashed / the file hasn't been restarted yet after the dependencies were modified, i.e., the editor should know how to rebuild a file's dependencies w/o having to ask Lean.

Gabriel Ebner (Jun 10 2023 at 00:11):

Practically speaking, I think the MVP would be:

  1. Add an environment variable LAKE_AUTO_BUILD_DEPS ( :bike: :house:) that allows VS Code to enable/disable automatic dependency building (I think that's the most straightforward way to make it configurable).
  2. Change lake print-path so that when the environment variable is not set, it prints an error message when it would need to rebuild anything.
  3. Make the Lean server ignore this message, i.e., you can work on a file with outdated deps.

Scott Morrison (Jun 10 2023 at 00:12):

Okay, that sounds even easier to implement. :-)

Gabriel Ebner (Jun 10 2023 at 00:12):

From what I can tell, the hardest part here is figuring out if Lake needs to rebuild anything. Lake doesn't have an equivalent of make --dry-run that we could reuse.

Scott Morrison (Jun 10 2023 at 00:13):

And where would --dry-run be used from VSCode? (If I understand right, you're now talking about something post-MVP?)

Gabriel Ebner (Jun 10 2023 at 00:14):

No, --dry-run is what we need for the MVP already. We want to show the error message (please run lake print-paths manually) if and only if --dry-run would print something.

Scott Morrison (Jun 10 2023 at 00:17):

What if instead of step 3. we just show the error message, and expect users to hit alt-r to rebuild? Maybe I'm misunderstand what alt-r does.

Gabriel Ebner (Jun 10 2023 at 00:17):

What does alt-r do on your machine? I think we've got different keybindings. :smile:

Scott Morrison (Jun 10 2023 at 00:17):

Rebuilds all dependencies.

James Gallicchio (Jun 10 2023 at 00:18):

(shft+ctrl+x i assume :joy:)

Scott Morrison (Jun 10 2023 at 00:18):

And I guess it is option-r on my machine. :-)

Gabriel Ebner (Jun 10 2023 at 00:18):

To be clear (3) is not strictly necessary.

Scott Morrison (Jun 10 2023 at 00:22):

My preference for UX would be:

  • when opening a new file, if all dependent oleans already exist and are up to date, the current file is processed as usual
  • otherwise, there is a red squiggle on the first line (I don't really care what it says?)
  • and hitting shift+ctrl+x / alt+r results in rebuilding (i.e. so I then get a blue squiggle on the first line showing progress)

In particular I don't think there's any particular need for pop-ups or banners, and in particular no buttons to press: Lean4 users are going to have shift+ctrl+x burnt into their brains anyway, we don't need to be afraid of asking them to use it.

Scott Morrison (Jun 10 2023 at 00:22):

Is this what would result from your proposed 2 step MVP above?

Gabriel Ebner (Jun 10 2023 at 00:24):

That's harder to implement, because the server doesn't know whether you opened a file or pressed ctrl-shift-x. (ctrl-shift-x is implemented as close+open)

Gabriel Ebner (Jun 10 2023 at 00:25):

The MVP would mainly allow you to call lake build manually without having vscode interfere.

Scott Morrison (Jun 10 2023 at 00:27):

Okay, even that would be an improvement over what we have today. Obviously sad that it then necessitates using the command line, but presumably later we could modify shift+ctrl+x to be:

  1. close
  2. run lake build File in a terminal
  3. open

Scott Morrison (Jun 10 2023 at 00:28):

that actually has the advantage of being much more explicit about what happens when you hit shift+ctrl+x in multiple windows at once (i.e. competing builds start that will overwrite each other).

Scott Morrison (Jun 10 2023 at 01:12):

In fact, changing the behaviour of shift+ctrl+x could be done independently of touching lake. That's purely a change in the VSCode extension.

Gabriel Ebner (Jun 10 2023 at 01:14):

Only once we can disable the auto-build of course. It's Lean itself that calls lake print-paths right now, not VS Code.

Scott Morrison (Jun 10 2023 at 01:17):

I mean: the change to shift+ctrl+x by itself wouldn't solve the underlying problem (of opening a new file overwriting oleans). But having it run lake build can be done now, and if nothing else solves the explicitness problem of users not seeing what multiple shift+ctrl+xs are doing. By the time the file is reopened in the server lake build will have already run, at which point it doesn't matter whether lake print-paths is still auto-building.

Sebastian Ullrich (Jun 10 2023 at 06:31):

Would the introduction of a lockfile to keep parts from trampling over each other be a simpler first change? I understand that eventually we want both though.

Sebastian Ullrich (Jun 10 2023 at 06:31):

And then there is the also somewhat orthogonal issue of how to reliably abort Lake builds

Mario Carneiro (Jun 10 2023 at 06:37):

Another thing I have been thinking about is adding checksums to oleans, because it's pretty ballsy to just mmap and start jumping around in an arbitrary file downloaded from somewhere even without assuming ill will

Mario Carneiro (Jun 10 2023 at 06:37):

I'm pretty sure error 139 is a segfault (EDIT: it is, SIGSEGV)

Mario Carneiro (Jun 10 2023 at 06:39):

or at least a length check, if we suppose the main failure mode is a process dying in the middle of writing an olean

Sebastian Ullrich (Jun 10 2023 at 07:28):

This should be covered by the lockfile as well

James Gallicchio (Jun 10 2023 at 08:03):

do we have a way to atomically get rw access to a filename? my impression was that we needed a primitive for making unnamed temp files, which can then be atomically moved to a specific filename

Gabriel Ebner (Jun 10 2023 at 17:56):

Would the introduction of a lockfile to keep parts from trampling over each other be a simpler first change? I understand that eventually we want both though.

The immediate issue is that Lake starts building when you don't want it to, and it continues to build even after you've closed everything. That's why people close VS Code before switching branches. I don't see how a lockfile would help with that, although locking seems like a good idea eventually.

Sebastian Ullrich (Jun 10 2023 at 21:30):

I'm not sure I understand the scenario yet. Is the action of switching branches itself triggering Lake? Because an open file may have had its imports changed by automatic reload?

Scott Morrison (Jun 11 2023 at 00:26):

Okay, let me see if I can describe the problematic scenario.

Scott Morrison (Jun 11 2023 at 00:26):

  1. I've been working on a branch, editing multiple files. Without having checked via a lake build that everything works, I'm nevertheless satisfied enough with the condition of the branch that I'm ready to commit and send everything off to CI.
  2. I now want to switch to a new branch and begin work, so I:
    a. run git checkout blah in a terminal
    b. run lake exe cache get in that terminal
    c. possibly before b. has finished, open a file to begin looking at the issue (deciding that I don't really need live compilation to begin reading)

Scott Morrison (Jun 11 2023 at 00:26):

At least, that is the dream! It is not at all possible today.

Scott Morrison (Jun 11 2023 at 00:26):

The problems are:

  1. Because I was opening and closing multiple files while editing the initial branch, there may be hidden lake processes still compiling oleans for files that aren't even open in the editor, and I have no way of knowing if these are still running (without ps or equivalent). These processes will continue overwriting oleans that are being produced by lake exe cache get.
  2. If I proceed to step 2c above (opening a file in the new branch to begin reading) before confirming that 2b is finished, VSCode will begin compiling the file, and overwriting the oleans being produced by lake exe cache get.

Scott Morrison (Jun 11 2023 at 00:26):

To avoid these problems today, I need to:

  • when I'm finished working on a branch, quit VSCode and run sudo pkill lake lean to kill off any stray processes. (This is of course incompatible with a very common workflow --- I often have 2 or 3 branches of mathlib4 checked out in different directories, jumping between them to avoid having to wait on compilation. I understand that this is less common.) Sometimes just killing the server from VSCode is enough at this step, but regularly processes seem to survive this.
  • carefully avoid opening any new files in VSCode until lake exe cache get has finished running, leaving me with nothing to do but :fencing: (particularly because the previous bullet point makes it hard to just switch to a different checked out copy of mathlib and work there for a moment).

Scott Morrison (Jun 11 2023 at 00:26):

Directly addressing @Sebastian Ullrich's question, switching branches itself is not triggering Lake --- VSCode does not (thankfully) trigger recompilation of any already open file just because its upstream dependencies have changed on disk. For this we need to hit ctrl+shift+x.

Scott Morrison (Jun 11 2023 at 00:26):

The suggestions higher up in this thread are:

  1. When VSCode opens a file, if dependencies are not already built it should simply error, rather than starting building upstream oleans. This allows us to start opening files and looking around even while oleans are still being retrieved (or we haven't even started retrieving them).
  2. When we hit ctrl+shift+x, instead of the current behaviour of "close in the server, then open in server" (relying on the automatic building of upstream oleans when we open in the server), change to "close in the server, run lake build Foo in a VSCode terminal, open in the server). This means that there are no hidden processes that may continue writing oleans even after we close editor windows: every (long running) lake process is tied to a terminal window in VSCode, so if those are closed, you know there are no zombie processes.

Scott Morrison (Jun 11 2023 at 00:26):

Both of these fixes seem like they are very doable right now.

Scott Morrison (Jun 11 2023 at 00:27):

A harder thing to do, which we really need to do at some point, is:

  • when there are multiple lake build Foo and lake build Bar processes running on the same directory, they should coordinate at least minimally, to avoid crashing when they both write the same file. Ideally they would coordinate even more, and not duplicate work.

Scott Morrison (Jun 11 2023 at 00:27):

A related problem to all of this is that I can not kill the server if no Lean 4 files are open (just because this command palette option is only available if a Lean4 file is open), so I either need to close VSCode and kill lake/lean processes on the command line (disruptive!) or open a Lean 4 file (thereby triggering automatically building oleans, even if there weren't already any processes doing this, and thus corrupting my oleans again). Again, this one should have a really easy fix in the VSCode extension.

Gabriel Ebner (Jun 11 2023 at 00:58):

A related problem to all of this is that I can not kill the server if no Lean 4 files are open

I'm pretty sure that merely restarting the server doesn't kill Lake.

Alex Keizer (Jun 11 2023 at 09:37):

How often do we actually want multiple lake builds running simultaneously for the same project? Could we circumvent some of the conflicts by having some lockfile mechanism in place?

Alex Keizer (Jun 11 2023 at 09:39):

That would allow both the current seamless experience of just opening a file and working on it, without having to manually trigger the dependency build, and, if some other process is currently holding the lock, vscode knows to wait until the lock is released before trying a build

Sebastian Ullrich (Jun 11 2023 at 13:18):

Scott Morrison said:

Because I was opening and closing multiple files while editing the initial branch, there may be hidden lake processes still compiling oleans for files that aren't even open in the editor, and I have no way of knowing if these are still running

With a lockfile shared by Lake and Cache, you will know. cache get will block until the Lake processes are finished building (which of course still shouldn't happen invisibly in the background).

If I proceed to step 2c above (opening a file in the new branch to begin reading) before confirming that 2b is finished, VSCode will begin compiling the file, and overwriting the oleans being produced by lake exe cache get.

Analogously, with a lockfile, here the language server will wait for cache get to finish before compiling any remaining missing dependencies, if any.

Kyle Miller (Jun 11 2023 at 16:20):

Scott Morrison said:

  1. When VSCode opens a file, if dependencies are not already built it should simply error, rather than starting building upstream oleans.

I'd be in favor of this change -- opening a file shouldn't automatically trigger an expensive (and sometimes a little destructive) process.

This would also help improve usability in Gitpod, where files are automatically saved as you edit them, so if you want to open a file that imports an edited one to consult it (since you might be wanting to see how something is used while refactoring), you're faced with a potentially large recompilation unless you remember to turn off the server manually -- and then when you turn the server back on you have to wait for your file to re-typecheck up to the place you're editing.

Lockfiles might solve some problems (while also introducing an extra piece of state to have to manually clear when things go wrong), but they won't solve the "I just want to see the contents of a file that's hundreds of imports away" problem

Sebastian Ullrich (Jun 11 2023 at 20:51):

I believe all three parts will be essential to make the process really robust, now we just have to implement them

Shreyas Srinivas (Jun 11 2023 at 21:02):

Sebastian Ullrich said:

I believe all three parts will be essential to make the process really robust, now we just have to implement them

Does the above discusssion cover the following scenario : Someone opens vscode without starting cache. So the extension acquires the lock and starts building files, and then the user runs cache inside vscode's terminal, which will wait for the build to finish before starting the cache process. The ideal behaviour is that the extension acquire that lock and ask the user if it should search for cached oleans, get them, and then build . Or ideally before running build, caches are always fetched first (with an option to turn it off if a user wants that).

Sebastian Ullrich (Jun 11 2023 at 21:08):

Integrating the cache into the extension and/or Lake is yet another orthogonal but welcome improvement

Alex Keizer (Jun 11 2023 at 21:09):

One of the three parts mentioned is that vscode should not automatically start building upon opening a file. Instead, if dependencies are outdated vscode should display some kind of prompt "Please press {hotkey} to build dependencies". But, indeed, it would be nice if lake build would transparently consult the cache

Shreyas Srinivas (Jun 11 2023 at 21:12):

Does the extension use lake to build?

Shreyas Srinivas (Jun 11 2023 at 21:13):

If yes, then making lake build cache aware and coordinated at the project level by a lockfile, would solve the problem all around.

Shreyas Srinivas (Jun 11 2023 at 21:16):

If no, then maybe eventually it should be using lake build. Separation and isolation of concerns and all that.

Mac Malone (Jun 13 2023 at 17:13):

Note that integrate cache into lake build does not require a Lake change, it can be done by using the existing extraDepTargets feature in the mathlib configuration file.

Sebastian Ullrich (Jun 27 2023 at 16:09):

Oh my, I just learned/rediscovered how elan makes sure your Lean build processes actually die on Ctrl-C on Windows: https://github.com/leanprover/elan/blob/master/src/elan-cli/job.rs#L3-L18
That's a lot of platform specific code!

Sebastian Ullrich (Jun 27 2023 at 16:15):

So perhaps we want to keep that logic in elan and add a Unix analogue as well. Then we would only have to make sure that we start worker processes through elan

Sebastian Ullrich (Jun 27 2023 at 16:16):

This should mean that on Windows, restarting the entire server already kills runaway lake/lean processes. Could anyone confirm that?

Floris van Doorn (Jun 28 2023 at 14:23):

Sebastian Ullrich said:

This should mean that on Windows, restarting the entire server already kills runaway lake/lean processes. Could anyone confirm that?

I think so, yes. I did the following to test:

  • Update mathlib, do not get cache
  • Open random deep mathlib file in VSCode -> many lean.exe processes start (and keep starting and ending)
  • Close file, open empty .lean file. -> processes still continue and new processes continue to spawn
  • Run Lean 4: Restart Server -> server restarts, Lean processes this file, but no lean.exe process that uses >=1% CPU remains, and no new lean.exe processes are spawned.

Floris van Doorn (Jun 28 2023 at 14:25):

So it seems like it is already mostly working correctly on Windows. It would be nice if closing a file would also stop the runaway lean processes.

Sebastian Ullrich (Jun 28 2023 at 14:50):

That's great to hear! And making sure the file worker is started through Lean as well would exactly take care of that last issue, yes!

Sebastian Ullrich (Jun 28 2023 at 14:51):

The rare scenario that actually works better on Windows than on all other platforms

Gabriel Ebner (Jun 28 2023 at 17:17):

Sebastian Ullrich said:

So perhaps we want to keep that logic in elan and add a Unix analogue as well. Then we would only have to make sure that we start worker processes through elan

I think the logic makes a lot of sense for lake serve. But not at all for lake env bash. IMHO this logic belongs in the file worker, not in elan.

Sebastian Ullrich (Jun 28 2023 at 17:34):

It might be more appropriate to solve in lake/lean than in elan, but now it also looks so much easier to solve in Rust than in Lean

Sebastian Ullrich (Jul 03 2023 at 11:05):

I want to address this issue this week, but I am undecided between the two approaches of where to handle the build discussed above:

Keep building in the file worker:

  • The issue of build process orphans is not affecting Windows and I know how to solve it on POSIX
  • An option for preventing building when opening a file could be smuggled in as an extension of the didOpen request so that the editor can decide which actions should trigger a build
  • Gabriel mentioned it would be nice to trigger dependency builds without depending on a running server, though I'm not sure what the use case for that is. In the end running the file worker is usually what we want to build the dependencies for. There is also a potential issue in racing server creation and Lake building when run as elan wrappers as elan toolchain installation is not safe to do concurrently.
  • We should change the build progress message to include the entire log, with sufficient debouncing

Move building to the editor:

  • Needs to be reimplemented in each editor. If we make the new behavior opt-in, we can start with changing vscode-lean4 only, but also that is one of the parts of Lean that I know least about.
  • Does it make sense not to reuse the info view for reporting progress on dependency builds? I feel an automatically popping up terminal window might be annoying, especially when having auto-build enabled. Though perhaps we can debounce that as well.
  • How do we synchronize starting the file worker with building when auto-build is enabled? Is delaying didOpen in the middleware indefinitely safe?

Sebastian Ullrich (Jul 03 2023 at 11:28):

There are technical limitations as noted above, but I would also be interested in what people think would be the ideal interface longer term (tl;dr of technical message: should dependency building happen in the info view or in an editor terminal?). Note that in an ideal system, building also isn't destructive: build results would go into a user-wide cache that merely destructs your free filesystem space

Sebastian Ullrich (Jul 03 2023 at 12:22):

The info view would also be the correct approach if we want to eventually make dependency build message structured instead of plain text, even if they will never be interactive like message from the current file

Sebastian Ullrich (Jul 03 2023 at 12:23):

On the other hand, decoupling building and serving means that you do not lose the open file's state if building its dependencies fails. Oh dear...

Mac Malone (Jul 03 2023 at 13:29):

@Sebastian Ullrich One other option for handling this would be to integrate the server watchdog with Lake and have it determine whether to build dependencies prior to opening a file worker on a didOpen request. I know we discussed integrated the watchdog with Lake previously and @Gabriel Ebner was for the idea. This would also eliminate possible race conditions, as we would no longer be spawning multiple instances of Lake and remove start-up overhead (e.g., loading the workspace).

Mac Malone (Jul 03 2023 at 13:31):

Though, admittedly, this may be more of a long-term solution, and maybe a shorter-term quick fix is more desirable.

Sebastian Ullrich (Jul 03 2023 at 13:37):

I'm not sure I understand the significance of calling a Lake API in the watchdog vs calling a Lake cmdline in the file worker in this context

Sebastian Ullrich (Jul 03 2023 at 13:38):

I.e. what part this is meant to solve/improve

Mac Malone (Jul 03 2023 at 13:49):

@Sebastian Ullrich The idea is to integrate watchdog code into lake serve, not the other way around. Here is the old thread, for reference. This would prevent races between multiple lakes, and inconsistencies that may result. It would also allow more configuration of how the server works through lakefile options (which may be useful for toggling didOpen builds or other such features).

Mac Malone (Jul 03 2023 at 13:51):

It would also potentially allow for smart import processing , tagging modules with failed builds or falling back to old oleans if a build fails and tagging a warning on the module.

Sebastian Ullrich (Jul 03 2023 at 14:00):

Oh I see, that direction. To be honest, I am not immediately concerned about any of these in the context of this thread. As for consistency, we still need process lock files if we want server and cmdline invocations (such as cache get) to coexist.

Scott Morrison (Jul 04 2023 at 07:43):

I'm not too concerned about the "lose the open file's state if building its dependencies fails" issue.

It seems like keeping things in the infoview is nicer to the user than opening terminal windows for them. I think I'd been in favour of terminal windows previously, but only to make it really obvious to the user where builds were actively happening. If we are confident that builds terminate when the relevant editor pane closes, and that builds initiated from different editor panes correctly wait for each other, then the terminal is clunky and the infoview is preferred.

It sounds from what you say above that this is also easier, or at least less editor specific! It would be great to have these changes, so easy is good. :-)

Sebastian Ullrich (Jul 04 2023 at 07:48):

Thanks Scott, that is exactly what I wanted to hear :big_smile: . It is at the very least the approach I am personally more comfortable with implementing, yes.

Sebastian Ullrich (Jul 04 2023 at 09:55):

Step 1: lean4#2307

Scott Morrison (Jul 04 2023 at 10:09):

Just tested locally, works great!

Sebastian Ullrich (Jul 04 2023 at 10:20):

Thanks for testing, we should have all platforms covered then!


Last updated: Dec 20 2023 at 11:08 UTC