Zulip Chat Archive

Stream: lean4

Topic: Lake server uses options from the wrong package


Mario Carneiro (Jul 20 2023 at 01:46):

If you ctrl-click on a import Std.Data.List.Basic from mathlib, you get errors when processing the file because of autoimplicits (mathlib uses relaxedAutoImplicit=false, std doesn't). This seems like a configuration issue in lake, it should be using the options from the original package.

Mario Carneiro (Jul 20 2023 at 01:49):

cc: @Mac

Mac Malone (Jul 20 2023 at 01:51):

@Mario Carneiro This is a VSCode issue. VSCode processes go-to-def files using the workspace server, which is configured according to the root package. Thus, if you go-to-def any file within a mathlib workspace, it will use mathlib's configuration.

Mario Carneiro (Jul 20 2023 at 01:52):

I'm also suspecting that mathlib should not be setting moreServerArgs at all, the option relaxedAutoImplicit=false is relevant to compilation and should be set directly on lean, not on the server

Mario Carneiro (Jul 20 2023 at 01:52):

it is currently being set on both

Mario Carneiro (Jul 20 2023 at 01:53):

do moreLeanArgs settings also take effect when browsing relevant files in the server?

Mario Carneiro (Jul 20 2023 at 01:55):

it doesn't seem to be

Mario Carneiro (Jul 20 2023 at 01:56):

which is awkward, I think it means it is impossible to set these options correctly

Mario Carneiro (Jul 20 2023 at 02:11):

@Mac This doesn't really sound like a vscode issue though. We have one server running, okay, but the options to pass to lean when running a file worker need to be customized per-file, how is that configuration passed along? How should it be passed along?

Mario Carneiro (Jul 20 2023 at 02:13):

also cc: @Sebastian Ullrich since this seems like a server issue

Mac Malone (Jul 20 2023 at 02:21):

@Mario Carneiro The most principled way to fix this would be for Lake to manage the watchdog and spawn workers with proper settings based on its knowledge of the workspace configuration (and the ability to load any workspaces in subdirectories).

Mac Malone (Jul 20 2023 at 02:22):

However, that is a large refactor, and the question is what is the best large refactor to do next.

Mario Carneiro (Jul 20 2023 at 02:23):

how is this data getting around right now?

Mac Malone (Jul 20 2023 at 02:25):

Lake spawns the server with fixed environment variables and arguments, the server spawns a lake process to build module dependencies and receives some information about their configuration in the build output before elaborating the module. That is the sum total of their interaction.

Mario Carneiro (Jul 20 2023 at 02:25):

am I right in understanding that:

  • lake serve is run at the workspace root by vscode
  • lake picks up the moreServerArgs settings from the root lakefile
  • lake calls lean --server with these args
  • Lean.Server stores these args somewhere and passes them to file workers

Mac Malone (Jul 20 2023 at 02:25):

Yes

Mario Carneiro (Jul 20 2023 at 02:26):

and I guess the only reason the server can even find the files is because lake sets the LEAN_SRC_PATH

Mario Carneiro (Jul 20 2023 at 02:26):

how does the server rebuild the file dependencies if it's out of date?

Mac Malone (Jul 20 2023 at 04:32):

@Mario Carneiro lake print-paths

Mario Carneiro (Jul 20 2023 at 04:33):

(let me take this moment to comment on how poorly named that command is)

Mac Malone (Jul 20 2023 at 04:35):

@Mario Carneiro my understanding is that Sebastian is responsible for the name ;)

Mac Malone (Jul 20 2023 at 04:36):

And, admittedly, it does print paths.

Mario Carneiro (Jul 20 2023 at 04:37):

On that note I also have a simple implementation of the JSON export mentioned at !lake#176 , and perhaps if it is merged into lake it can be used by the server as an actual "print-paths" so it knows where the packages are and the lake options for them, although this would require the json data structures being in the Lean package or otherwise in some commonly accessible location

Mario Carneiro (Jul 20 2023 at 04:37):

perhaps the recent move of lake into lean4 repo will simplify this

Mario Carneiro (Jul 20 2023 at 04:55):

Here is the export program: https://github.com/digama0/lean-cache/blob/main/lake-ext/Main.lean . Is there anything you think should not be included in the output @Mac ?

Mario Carneiro (Jul 20 2023 at 05:00):

sample output: workspace-manifest.json

Mario Carneiro (Jul 20 2023 at 05:04):

Most of the data is generic stuff useful for any tool like lake exe cache or the server to find your way around the packages and libraries in the workspace. Some of it is specific to the user's machine, e.g. absolute paths to the project itself or to the lean installation folder

Mario Carneiro (Jul 20 2023 at 05:05):

another unfortunate thing is that it does not reflect any -K flag specific things encoded by meta if in the lakefile, but this is kind of inherent in the way lake processes the lakefile

Mac Malone (Jul 20 2023 at 05:50):

@Mario Carneiro That is definitely very different from what I was thinking of for 176. I would not include build-related data like lean lib/exe configs or environment/platform data like Lake.Env in the manifest. What I would stick in there is just layout data from the workspace/package (e.g., buildDir, libDir, etc.) along with paths (e.g., leanPath, etc. -- some of which may just be derivable from the layout info) and package configurations options (e.g., -K settings).

Mac Malone (Jul 20 2023 at 05:51):

As such, I suspect you have a different kind of use case in mind. I am curious to see how you use those results in practice.

Mario Carneiro (Jul 20 2023 at 05:51):

See the other thread :)

Mac Malone (Jul 20 2023 at 05:51):

What other thread?

Mario Carneiro (Jul 20 2023 at 05:51):

https://leanprover.zulipchat.com/#narrow/stream/287929-mathlib4/topic/lean-cache.3A.20lake.20exe.20cache.20rewrite

Mac Malone (Jul 20 2023 at 05:57):

Oh, I see you are mostly using the exports to generate traces yourself.

Mac Malone (Jul 20 2023 at 05:59):

However, you are not really using much of what you export either. So, my initial thought of just layout would get you most of the way there, the only thing else you are looking for is the arguments.

Mario Carneiro (Jul 20 2023 at 06:00):

the only part of the Lake.Env I am using right now is the githash

Mario Carneiro (Jul 20 2023 at 06:01):

I guess because it's a component in the trace

Mac Malone (Jul 20 2023 at 06:01):

which you could also just call lean --githash to get.

Mario Carneiro (Jul 20 2023 at 06:02):

what a waste of 183 ms

Mario Carneiro (Jul 20 2023 at 06:04):

I don't see a reason to keep that out of the workspace manifest, in fact do you have a clear statement of what things should not be in there and why?

Mario Carneiro (Jul 20 2023 at 06:04):

because (as you can tell) I took a pretty maximalist approach, stashing pretty much everything that can be json encoded from lake's state

Mac Malone (Jul 20 2023 at 06:04):

Well, I would argue it is duplicate information.

Mario Carneiro (Jul 20 2023 at 06:05):

duplicating what?

Mac Malone (Jul 20 2023 at 06:05):

Information already derivable from the lean-toolchain.

Mario Carneiro (Jul 20 2023 at 06:05):

the lean toolchain does not have a githash?

Mario Carneiro (Jul 20 2023 at 06:05):

you have to hit the network to turn it into a githash

Mario Carneiro (Jul 20 2023 at 06:06):

that's the whole point of precomputing this stuff

Mario Carneiro (Jul 20 2023 at 06:06):

it is all derivable, of course, it just takes way too long to do it all the time

Mac Malone (Jul 20 2023 at 06:06):

Also, your custom trace computation is relying on underspecified behavior of Lake (the trace computation is not part of Lake's public API).

Mario Carneiro (Jul 20 2023 at 06:06):

that's nice

Mac Malone (Jul 20 2023 at 06:07):

It is fine for now (and it may be fine for a long while), but I would not consider breaking it a breaking change. It is also why I would keep your version of lake_ext independent.

Mario Carneiro (Jul 20 2023 at 06:08):

if you change it, this program would just change to match

Mac Malone (Jul 20 2023 at 06:08):

Mario Carneiro said:

you have to hit the network to turn it into a githash

In order to run Lake, you need to have already downloaded the toolchain, so it is just a CLI call, not a network call.

Mario Carneiro (Jul 20 2023 at 06:08):

I'm not running lake though, that's the point

Mac Malone (Jul 20 2023 at 06:09):

Or Lean.

Mario Carneiro (Jul 20 2023 at 06:09):

I'm not running lean either

Mario Carneiro (Jul 20 2023 at 06:09):

I'm running curl

Mac Malone (Jul 20 2023 at 06:09):

You cannot do anything of relevance without a downloaded toolchain.

Mac Malone (Jul 20 2023 at 06:09):

What is the point of downloading oleans without a toolchain?

Mario Carneiro (Jul 20 2023 at 06:10):

the point is that running external tools is slow and I'm trying to make lake exe cache not take 10 seconds

Mac Malone (Jul 20 2023 at 06:10):

Mario Carneiro said:

I'm not running lake though, that's the point

According to your stats, running Lake only has a 1s impact, that does not seem to be the major time factor here.

Mario Carneiro (Jul 20 2023 at 06:11):

That's a 1 second impact whenever you run anything using lake, including lake env

Mario Carneiro (Jul 20 2023 at 06:11):

I assure you env doesn't take 1 second

Mac Malone (Jul 20 2023 at 06:11):

Yes, and I agree that is something that needs to be fixed.

Mac Malone (Jul 20 2023 at 06:12):

But that is separate from the trace computation you are doing.

Mario Carneiro (Jul 20 2023 at 06:12):

the trace computation is about 4 seconds IIRC, it can be brought down to 1 second by caching the hashes of oleans (which I would like to upstream to lake as well)

Mac Malone (Jul 20 2023 at 06:13):

My point with the time save was that your new cache has went from 10s with lake+old-cache to 4s with new-cache plus 1 sec from lake.

Mac Malone (Jul 20 2023 at 06:14):

What interests me is why why new-cache is still 2x faster even with Lake factored out.

Mario Carneiro (Jul 20 2023 at 06:14):

old-cache isn't even doing the trace computation

Mario Carneiro (Jul 20 2023 at 06:15):

it's one of the more expensive parts of new-cache

Mac Malone (Jul 20 2023 at 06:15):

Okay, then I am very confused.

Mac Malone (Jul 20 2023 at 06:16):

Mario Carneiro said:

the trace computation is about 4 seconds IIRC, it can be brought down to 1 second by caching the hashes of oleans (which I would like to upstream to lake as well)

What do you mean by caching the hash of an olean?

Mario Carneiro (Jul 20 2023 at 06:20):

The trace computation for a lean file involves the olean+ilean hash of the dependencies. This is silly, because those are output files. You can calculate the hash of these files right after you build them and store a .olean.trace file, then when you are later looking for the hash of this file you instead just read it out of the .olean.trace file. These files are huge and numerous, reading them is the majority of the cost of the trace computation

Mac Malone (Jul 20 2023 at 06:20):

Mario Carneiro said:

the point is that running external tools is slow and I'm trying to make lake exe cache not take 10 seconds

Which is fair. My suggestion is that Lake be updated to export the layout data into the manifest and lean_ext can be used to export trace related stuff into a different manifest (e.g., the args and githash).

Mario Carneiro (Jul 20 2023 at 06:21):

sure, if you want two json files that's fine too

Mario Carneiro (Jul 20 2023 at 06:22):

it would be preferable to not have to ship lean_ext though, and it's really hard to write that code from rust without FFI bindings for lake

Mac Malone (Jul 20 2023 at 06:22):

Mario Carneiro said:

The trace computation for a lean file involves the olean+ilean hash of the dependencies. This is silly, because those are output files. You can calculate the hash of these files right after you build them and store a .olean.trace file, then when you are later looking for the hash of this file you instead just read it out of the .olean.trace file.

Ah so you are saying to precompute the hash on (every) build, store it, and used the stored version when doing a trace double-check. That sounds good to me.

Mac Malone (Jul 20 2023 at 06:27):

Mario Carneiro said:

it would be preferable to not have to ship lean_ext though

What is the difficulty with shipping a separate package? The reason why I don't want to include in Lake is simply because this part of Lake (trace computation / build details) is not meant to be public API and thus I do not want to present a false sense of guaranteed maintenance or continued functioning.

Mac Malone (Jul 20 2023 at 06:28):

Along these lines, it is worth noting your trace computation will break with custom dependencies / builds for a library / executable.

Mario Carneiro (Jul 20 2023 at 06:28):

which part of lake_ext don't you want to export?

Mac Malone (Jul 20 2023 at 06:28):

Primarily, build configs.

Mario Carneiro (Jul 20 2023 at 06:30):

The trace computation is partial, if you depend on precompileModules then it doesn't try to compute the trace, and as a result it doesn't have to deal with extern libs either

Mac Malone (Jul 20 2023 at 06:30):

Build configs are a user-facing customization feature. That is, the public interface is a stable part of Lake, but how they are used internally to perform builds is not.

Mario Carneiro (Jul 20 2023 at 06:30):

I'm pretty sure the computation is correct in all cases where it returns a value

Mac Malone (Jul 20 2023 at 06:31):

@Mario Carneiro oh, what about extraDepTargets?

Mac Malone (Jul 20 2023 at 06:31):

(and cloud releases i.e. preferReleaseBuild)

Mario Carneiro (Jul 20 2023 at 06:31):

you can't have any under these conditions, I checked

Mario Carneiro (Jul 20 2023 at 06:32):

(I had to include all of these things because they are inputs to the hash)

Mac Malone (Jul 20 2023 at 06:32):

@Mario Carneiro you can't have them under what condition?

Mac Malone (Jul 20 2023 at 06:32):

(you can certainly have them without precompileModules)

Mac Malone (Jul 20 2023 at 06:33):

or do you mean you check to make sure the array is empty?

Mario Carneiro (Jul 20 2023 at 06:34):

if precompileModules is false, then you always have 0 precompile modules, so you have 0 modJobs, 0 externJobs and 0 externDynlibs

Mac Malone (Jul 20 2023 at 06:34):

Also, a near-term TODO is to add extraDepTargets (or similar) to libraries and executables as well.

Mac Malone (Jul 20 2023 at 06:34):

Mario Carneiro said:

if precompileModules is false, then you always have 0 precompile modules, so you have 0 modJobs, 0 externJobs and 0 externDynlibs

Yes, but you can still set extraDepTargets on a package, which will feed into the library, executable, and module traces.

Mac Malone (Jul 20 2023 at 06:35):

(e.g., the extraDepJob in Module.recBuildDeps)

Mario Carneiro (Jul 20 2023 at 06:37):

(did you rebase origin/master?)

Mac Malone (Jul 20 2023 at 06:37):

In fact, more customization of builds is something that, for instance, Wojciech could really use in ProofWidgets.

Mac Malone (Jul 20 2023 at 06:37):

Mario Carneiro said:

(did you rebase origin/master?)

huh?

Mac Malone (Jul 20 2023 at 06:38):

not in a good while?

Mario Carneiro (Jul 20 2023 at 06:38):

I can't ff merge master, I'm on a commit from 3 weeks ago which looks like it got rewritten

Mac Malone (Jul 20 2023 at 06:39):

Mac said:

In fact, more customization of builds is something that, for instance, Wojciech could really use in ProofWidgets.

The reason I am mentioned this is because any such customization will further introduce opportunities to break static trace computation.

Mac Malone (Jul 20 2023 at 06:39):

Mario Carneiro said:

I can't ff merge master, I'm on a commit from 3 weeks ago which looks like it got rewritten

Yes, possibly, if you were unlucky and happen to pull right when I was fixing a test.

Mario Carneiro (Jul 20 2023 at 06:40):

okay I'm back on origin/master (I wasn't able to find recBuildDeps, I guess it was recently renamed)

Mac Malone (Jul 20 2023 at 06:41):

However, extraDepJob has been there for a long time.

Mario Carneiro (Jul 20 2023 at 06:41):

okay, I see it now... extraDepJob is there but it doesn't seem to contribute to the trace

Mac Malone (Jul 20 2023 at 06:42):

Mario Carneiro said:

okay I'm back on origin/master (I wasn't able to find recBuildDeps, I guess it was recently renamed)

Ah, yeah, that was introduced in a refactor 2 weeks ago. Sorry, I was going to mention that, but your rebase concern confused me and led me to believe something else was the matter. :sweat:

Mac Malone (Jul 20 2023 at 06:42):

Mario Carneiro said:

okay, I see it now... extraDepJob is there but it doesn't seem to contribute to the trace

Oh, you are right! That is a bug.

Mac Malone (Jul 20 2023 at 06:43):

I am surprised I never noticed that.

Mario Carneiro (Jul 20 2023 at 06:45):

I'll wait until you fix this upstream before handling it in new-cache, but I'm guessing that it will involve using none for the trace if anything we can't run shows up

Mario Carneiro (Jul 20 2023 at 06:48):

for the cache it's fine if the calculation is approximate, we just want to determine when lake will think the file is up to date so we don't have to download/unpack it

Mac Malone (Jul 20 2023 at 06:52):

@Mario Carneiro How often are you expecting to be able to run new-cache without a lake-ext right before? I am having a hard time coming up with cases where this would occur outside of exceptional circumstances.

Mac Malone (Jul 20 2023 at 06:53):

The reason I ask, is it might just be more useful to have lake spit out the dirty files instead?

Mario Carneiro (Jul 20 2023 at 06:55):

The intention is to have a workspace-manifest.trace file as well, so we know when to re-run lake-ext. (Right now it just prompts the user to do so)

Mario Carneiro (Jul 20 2023 at 06:56):

but in many situations it doesn't get invalidated even if you change branches

Mac Malone (Jul 20 2023 at 06:56):

How would that work? There does not seem to be a way to generate that without running lake, and if you are running lake you might as well just regenerate the file.

Mario Carneiro (Jul 20 2023 at 06:57):

generate what

Mac Malone (Jul 20 2023 at 06:57):

workspace-manifest.trace

Mario Carneiro (Jul 20 2023 at 06:57):

it's just the hash of the lakefile right now

Mac Malone (Jul 20 2023 at 06:58):

But the lakefile can dynamically change based on configuration and just pure IO?

Mac Malone (Jul 20 2023 at 06:58):

I guess the goal here is just best effort?

Mario Carneiro (Jul 20 2023 at 06:59):

sure, you could put a random number generator in the lakefile but then you have bigger problems

Mario Carneiro (Jul 20 2023 at 06:59):

we are indeed making the assumption that the workspace manifest is a pure function of the lakefile contents and the OS configuration

Mac Malone (Jul 20 2023 at 07:00):

The lakefile can also use things like platform, architecture, package versions, etc. to change the configuration.

Mario Carneiro (Jul 20 2023 at 07:00):

I'm not assuming it is platform independent

Mac Malone (Jul 20 2023 at 07:00):

Mario Carneiro said:

we are indeed making the assumption that the workspace manifest is a pure function of the lakefile contents and the OS configuration

Which is fine as a simplifying assumption, but not as a general one.

Mario Carneiro (Jul 20 2023 at 07:01):

I mean, you are already in trouble just with file paths

Mario Carneiro (Jul 20 2023 at 07:01):

it's fine for something that does caching, because if you have a random number generator in the lakefile then caching is pointless

Mac Malone (Jul 20 2023 at 07:01):

Mario Carneiro said:

I'm not assuming it is platform independent

Fair, but I was also thinking of a lakefile that builds differently depending on which system libraries are available.

Mario Carneiro (Jul 20 2023 at 07:02):

yeah, -K options are also problematic for the same reason

Mario Carneiro (Jul 20 2023 at 07:02):

there isn't much that can be done about it

Mario Carneiro (Jul 20 2023 at 07:03):

ultimately lakefile.lean is a program that generates output, and workspace-manifest.json is that output

Mario Carneiro (Jul 20 2023 at 07:04):

If there was a dont-cache-me-bro flag that you could set in the lakefile then it could be used to signal shenanigans and avoid this issue

Mario Carneiro (Jul 20 2023 at 07:05):

for most system configuration, I think it is safe to assume that it doesn't change that often, at least not between branches of the same project (which is the primary use case of lake exe cache)

Mario Carneiro (Jul 20 2023 at 07:14):

Mac said:

The reason I ask, is it might just be more useful to have lake spit out the dirty files instead?

This would be a useful feature regardless

Sebastian Ullrich (Jul 20 2023 at 09:09):

Is... there a tl;dr? Are we still talking about the original issue?

Sebastian Ullrich (Jul 20 2023 at 09:21):

Note that we have not yet implemented multi-root workspaces support in vscode-lean4, though I believe roots have to be added manually. In Emacs this all just works out of the box afaik...


Last updated: Dec 20 2023 at 11:08 UTC