Zulip Chat Archive
Stream: lean4
Topic: Lake server uses options from the wrong package
Mario Carneiro (Jul 20 2023 at 01:46):
If you ctrl-click on a import Std.Data.List.Basic
from mathlib, you get errors when processing the file because of autoimplicits (mathlib uses relaxedAutoImplicit=false
, std doesn't). This seems like a configuration issue in lake, it should be using the options from the original package.
Mario Carneiro (Jul 20 2023 at 01:49):
cc: @Mac
Mac Malone (Jul 20 2023 at 01:51):
@Mario Carneiro This is a VSCode issue. VSCode processes go-to-def files using the workspace server, which is configured according to the root package. Thus, if you go-to-def any file within a mathlib workspace, it will use mathlib's configuration.
Mario Carneiro (Jul 20 2023 at 01:52):
I'm also suspecting that mathlib should not be setting moreServerArgs
at all, the option relaxedAutoImplicit=false
is relevant to compilation and should be set directly on lean, not on the server
Mario Carneiro (Jul 20 2023 at 01:52):
it is currently being set on both
Mario Carneiro (Jul 20 2023 at 01:53):
do moreLeanArgs
settings also take effect when browsing relevant files in the server?
Mario Carneiro (Jul 20 2023 at 01:55):
it doesn't seem to be
Mario Carneiro (Jul 20 2023 at 01:56):
which is awkward, I think it means it is impossible to set these options correctly
Mario Carneiro (Jul 20 2023 at 02:11):
@Mac This doesn't really sound like a vscode issue though. We have one server running, okay, but the options to pass to lean when running a file worker need to be customized per-file, how is that configuration passed along? How should it be passed along?
Mario Carneiro (Jul 20 2023 at 02:13):
also cc: @Sebastian Ullrich since this seems like a server issue
Mac Malone (Jul 20 2023 at 02:21):
@Mario Carneiro The most principled way to fix this would be for Lake to manage the watchdog and spawn workers with proper settings based on its knowledge of the workspace configuration (and the ability to load any workspaces in subdirectories).
Mac Malone (Jul 20 2023 at 02:22):
However, that is a large refactor, and the question is what is the best large refactor to do next.
Mario Carneiro (Jul 20 2023 at 02:23):
how is this data getting around right now?
Mac Malone (Jul 20 2023 at 02:25):
Lake spawns the server with fixed environment variables and arguments, the server spawns a lake process to build module dependencies and receives some information about their configuration in the build output before elaborating the module. That is the sum total of their interaction.
Mario Carneiro (Jul 20 2023 at 02:25):
am I right in understanding that:
lake serve
is run at the workspace root by vscodelake
picks up themoreServerArgs
settings from the root lakefilelake
callslean --server
with these argsLean.Server
stores these args somewhere and passes them to file workers
Mac Malone (Jul 20 2023 at 02:25):
Yes
Mario Carneiro (Jul 20 2023 at 02:26):
and I guess the only reason the server can even find the files is because lake sets the LEAN_SRC_PATH
Mario Carneiro (Jul 20 2023 at 02:26):
how does the server rebuild the file dependencies if it's out of date?
Mac Malone (Jul 20 2023 at 04:32):
@Mario Carneiro lake print-paths
Mario Carneiro (Jul 20 2023 at 04:33):
(let me take this moment to comment on how poorly named that command is)
Mac Malone (Jul 20 2023 at 04:35):
@Mario Carneiro my understanding is that Sebastian is responsible for the name ;)
Mac Malone (Jul 20 2023 at 04:36):
And, admittedly, it does print paths.
Mario Carneiro (Jul 20 2023 at 04:37):
On that note I also have a simple implementation of the JSON export mentioned at !lake#176 , and perhaps if it is merged into lake it can be used by the server as an actual "print-paths" so it knows where the packages are and the lake options for them, although this would require the json data structures being in the Lean package or otherwise in some commonly accessible location
Mario Carneiro (Jul 20 2023 at 04:37):
perhaps the recent move of lake into lean4 repo will simplify this
Mario Carneiro (Jul 20 2023 at 04:55):
Here is the export program: https://github.com/digama0/lean-cache/blob/main/lake-ext/Main.lean . Is there anything you think should not be included in the output @Mac ?
Mario Carneiro (Jul 20 2023 at 05:00):
sample output: workspace-manifest.json
Mario Carneiro (Jul 20 2023 at 05:04):
Most of the data is generic stuff useful for any tool like lake exe cache
or the server to find your way around the packages and libraries in the workspace. Some of it is specific to the user's machine, e.g. absolute paths to the project itself or to the lean installation folder
Mario Carneiro (Jul 20 2023 at 05:05):
another unfortunate thing is that it does not reflect any -K
flag specific things encoded by meta if
in the lakefile, but this is kind of inherent in the way lake processes the lakefile
Mac Malone (Jul 20 2023 at 05:50):
@Mario Carneiro That is definitely very different from what I was thinking of for 176. I would not include build-related data like lean lib/exe configs or environment/platform data like Lake.Env
in the manifest. What I would stick in there is just layout data from the workspace/package (e.g., buildDir
, libDir
, etc.) along with paths (e.g., leanPath
, etc. -- some of which may just be derivable from the layout info) and package configurations options (e.g., -K
settings).
Mac Malone (Jul 20 2023 at 05:51):
As such, I suspect you have a different kind of use case in mind. I am curious to see how you use those results in practice.
Mario Carneiro (Jul 20 2023 at 05:51):
See the other thread :)
Mac Malone (Jul 20 2023 at 05:51):
What other thread?
Mario Carneiro (Jul 20 2023 at 05:51):
Mac Malone (Jul 20 2023 at 05:57):
Oh, I see you are mostly using the exports to generate traces yourself.
Mac Malone (Jul 20 2023 at 05:59):
However, you are not really using much of what you export either. So, my initial thought of just layout would get you most of the way there, the only thing else you are looking for is the arguments.
Mario Carneiro (Jul 20 2023 at 06:00):
the only part of the Lake.Env
I am using right now is the githash
Mario Carneiro (Jul 20 2023 at 06:01):
I guess because it's a component in the trace
Mac Malone (Jul 20 2023 at 06:01):
which you could also just call lean --githash
to get.
Mario Carneiro (Jul 20 2023 at 06:02):
what a waste of 183 ms
Mario Carneiro (Jul 20 2023 at 06:04):
I don't see a reason to keep that out of the workspace manifest, in fact do you have a clear statement of what things should not be in there and why?
Mario Carneiro (Jul 20 2023 at 06:04):
because (as you can tell) I took a pretty maximalist approach, stashing pretty much everything that can be json encoded from lake's state
Mac Malone (Jul 20 2023 at 06:04):
Well, I would argue it is duplicate information.
Mario Carneiro (Jul 20 2023 at 06:05):
duplicating what?
Mac Malone (Jul 20 2023 at 06:05):
Information already derivable from the lean-toolchain
.
Mario Carneiro (Jul 20 2023 at 06:05):
the lean toolchain does not have a githash?
Mario Carneiro (Jul 20 2023 at 06:05):
you have to hit the network to turn it into a githash
Mario Carneiro (Jul 20 2023 at 06:06):
that's the whole point of precomputing this stuff
Mario Carneiro (Jul 20 2023 at 06:06):
it is all derivable, of course, it just takes way too long to do it all the time
Mac Malone (Jul 20 2023 at 06:06):
Also, your custom trace computation is relying on underspecified behavior of Lake (the trace computation is not part of Lake's public API).
Mario Carneiro (Jul 20 2023 at 06:06):
that's nice
Mac Malone (Jul 20 2023 at 06:07):
It is fine for now (and it may be fine for a long while), but I would not consider breaking it a breaking change. It is also why I would keep your version of lake_ext
independent.
Mario Carneiro (Jul 20 2023 at 06:08):
if you change it, this program would just change to match
Mac Malone (Jul 20 2023 at 06:08):
Mario Carneiro said:
you have to hit the network to turn it into a githash
In order to run Lake, you need to have already downloaded the toolchain, so it is just a CLI call, not a network call.
Mario Carneiro (Jul 20 2023 at 06:08):
I'm not running lake though, that's the point
Mac Malone (Jul 20 2023 at 06:09):
Or Lean.
Mario Carneiro (Jul 20 2023 at 06:09):
I'm not running lean either
Mario Carneiro (Jul 20 2023 at 06:09):
I'm running curl
Mac Malone (Jul 20 2023 at 06:09):
You cannot do anything of relevance without a downloaded toolchain.
Mac Malone (Jul 20 2023 at 06:09):
What is the point of downloading oleans without a toolchain?
Mario Carneiro (Jul 20 2023 at 06:10):
the point is that running external tools is slow and I'm trying to make lake exe cache
not take 10 seconds
Mac Malone (Jul 20 2023 at 06:10):
Mario Carneiro said:
I'm not running lake though, that's the point
According to your stats, running Lake only has a 1s impact, that does not seem to be the major time factor here.
Mario Carneiro (Jul 20 2023 at 06:11):
That's a 1 second impact whenever you run anything using lake, including lake env
Mario Carneiro (Jul 20 2023 at 06:11):
I assure you env
doesn't take 1 second
Mac Malone (Jul 20 2023 at 06:11):
Yes, and I agree that is something that needs to be fixed.
Mac Malone (Jul 20 2023 at 06:12):
But that is separate from the trace computation you are doing.
Mario Carneiro (Jul 20 2023 at 06:12):
the trace computation is about 4 seconds IIRC, it can be brought down to 1 second by caching the hashes of oleans (which I would like to upstream to lake as well)
Mac Malone (Jul 20 2023 at 06:13):
My point with the time save was that your new cache has went from 10s with lake+old-cache to 4s with new-cache plus 1 sec from lake.
Mac Malone (Jul 20 2023 at 06:14):
What interests me is why why new-cache is still 2x faster even with Lake factored out.
Mario Carneiro (Jul 20 2023 at 06:14):
old-cache isn't even doing the trace computation
Mario Carneiro (Jul 20 2023 at 06:15):
it's one of the more expensive parts of new-cache
Mac Malone (Jul 20 2023 at 06:15):
Okay, then I am very confused.
Mac Malone (Jul 20 2023 at 06:16):
Mario Carneiro said:
the trace computation is about 4 seconds IIRC, it can be brought down to 1 second by caching the hashes of oleans (which I would like to upstream to lake as well)
What do you mean by caching the hash of an olean?
Mario Carneiro (Jul 20 2023 at 06:20):
The trace computation for a lean file involves the olean+ilean
hash of the dependencies. This is silly, because those are output files. You can calculate the hash of these files right after you build them and store a .olean.trace
file, then when you are later looking for the hash of this file you instead just read it out of the .olean.trace
file. These files are huge and numerous, reading them is the majority of the cost of the trace computation
Mac Malone (Jul 20 2023 at 06:20):
Mario Carneiro said:
the point is that running external tools is slow and I'm trying to make
lake exe cache
not take 10 seconds
Which is fair. My suggestion is that Lake be updated to export the layout data into the manifest and lean_ext
can be used to export trace related stuff into a different manifest (e.g., the args and githash).
Mario Carneiro (Jul 20 2023 at 06:21):
sure, if you want two json files that's fine too
Mario Carneiro (Jul 20 2023 at 06:22):
it would be preferable to not have to ship lean_ext
though, and it's really hard to write that code from rust without FFI bindings for lake
Mac Malone (Jul 20 2023 at 06:22):
Mario Carneiro said:
The trace computation for a lean file involves the
olean+ilean
hash of the dependencies. This is silly, because those are output files. You can calculate the hash of these files right after you build them and store a.olean.trace
file, then when you are later looking for the hash of this file you instead just read it out of the.olean.trace
file.
Ah so you are saying to precompute the hash on (every) build, store it, and used the stored version when doing a trace double-check. That sounds good to me.
Mac Malone (Jul 20 2023 at 06:27):
Mario Carneiro said:
it would be preferable to not have to ship
lean_ext
though
What is the difficulty with shipping a separate package? The reason why I don't want to include in Lake is simply because this part of Lake (trace computation / build details) is not meant to be public API and thus I do not want to present a false sense of guaranteed maintenance or continued functioning.
Mac Malone (Jul 20 2023 at 06:28):
Along these lines, it is worth noting your trace computation will break with custom dependencies / builds for a library / executable.
Mario Carneiro (Jul 20 2023 at 06:28):
which part of lake_ext
don't you want to export?
Mac Malone (Jul 20 2023 at 06:28):
Primarily, build configs.
Mario Carneiro (Jul 20 2023 at 06:30):
The trace computation is partial, if you depend on precompileModules then it doesn't try to compute the trace, and as a result it doesn't have to deal with extern libs either
Mac Malone (Jul 20 2023 at 06:30):
Build configs are a user-facing customization feature. That is, the public interface is a stable part of Lake, but how they are used internally to perform builds is not.
Mario Carneiro (Jul 20 2023 at 06:30):
I'm pretty sure the computation is correct in all cases where it returns a value
Mac Malone (Jul 20 2023 at 06:31):
@Mario Carneiro oh, what about extraDepTargets
?
Mac Malone (Jul 20 2023 at 06:31):
(and cloud releases i.e. preferReleaseBuild
)
Mario Carneiro (Jul 20 2023 at 06:31):
you can't have any under these conditions, I checked
Mario Carneiro (Jul 20 2023 at 06:32):
(I had to include all of these things because they are inputs to the hash)
Mac Malone (Jul 20 2023 at 06:32):
@Mario Carneiro you can't have them under what condition?
Mac Malone (Jul 20 2023 at 06:32):
(you can certainly have them without precompileModules
)
Mac Malone (Jul 20 2023 at 06:33):
or do you mean you check to make sure the array is empty?
Mario Carneiro (Jul 20 2023 at 06:34):
if precompileModules
is false, then you always have 0 precompile modules, so you have 0 modJobs, 0 externJobs and 0 externDynlibs
Mac Malone (Jul 20 2023 at 06:34):
Also, a near-term TODO is to add extraDepTargets
(or similar) to libraries and executables as well.
Mac Malone (Jul 20 2023 at 06:34):
Mario Carneiro said:
if
precompileModules
is false, then you always have 0 precompile modules, so you have 0 modJobs, 0 externJobs and 0 externDynlibs
Yes, but you can still set extraDepTargets
on a package
, which will feed into the library, executable, and module traces.
Mac Malone (Jul 20 2023 at 06:35):
(e.g., the extraDepJob
in Module.recBuildDeps
)
Mario Carneiro (Jul 20 2023 at 06:37):
(did you rebase origin/master
?)
Mac Malone (Jul 20 2023 at 06:37):
In fact, more customization of builds is something that, for instance, Wojciech could really use in ProofWidgets.
Mac Malone (Jul 20 2023 at 06:37):
Mario Carneiro said:
(did you rebase
origin/master
?)
huh?
Mac Malone (Jul 20 2023 at 06:38):
not in a good while?
Mario Carneiro (Jul 20 2023 at 06:38):
I can't ff merge master
, I'm on a commit from 3 weeks ago which looks like it got rewritten
Mac Malone (Jul 20 2023 at 06:39):
Mac said:
In fact, more customization of builds is something that, for instance, Wojciech could really use in ProofWidgets.
The reason I am mentioned this is because any such customization will further introduce opportunities to break static trace computation.
Mac Malone (Jul 20 2023 at 06:39):
Mario Carneiro said:
I can't ff merge
master
, I'm on a commit from 3 weeks ago which looks like it got rewritten
Yes, possibly, if you were unlucky and happen to pull right when I was fixing a test.
Mario Carneiro (Jul 20 2023 at 06:40):
okay I'm back on origin/master (I wasn't able to find recBuildDeps
, I guess it was recently renamed)
Mac Malone (Jul 20 2023 at 06:41):
However, extraDepJob
has been there for a long time.
Mario Carneiro (Jul 20 2023 at 06:41):
okay, I see it now... extraDepJob
is there but it doesn't seem to contribute to the trace
Mac Malone (Jul 20 2023 at 06:42):
Mario Carneiro said:
okay I'm back on origin/master (I wasn't able to find
recBuildDeps
, I guess it was recently renamed)
Ah, yeah, that was introduced in a refactor 2 weeks ago. Sorry, I was going to mention that, but your rebase concern confused me and led me to believe something else was the matter. :sweat:
Mac Malone (Jul 20 2023 at 06:42):
Mario Carneiro said:
okay, I see it now...
extraDepJob
is there but it doesn't seem to contribute to the trace
Oh, you are right! That is a bug.
Mac Malone (Jul 20 2023 at 06:43):
I am surprised I never noticed that.
Mario Carneiro (Jul 20 2023 at 06:45):
I'll wait until you fix this upstream before handling it in new-cache, but I'm guessing that it will involve using none
for the trace if anything we can't run shows up
Mario Carneiro (Jul 20 2023 at 06:48):
for the cache it's fine if the calculation is approximate, we just want to determine when lake
will think the file is up to date so we don't have to download/unpack it
Mac Malone (Jul 20 2023 at 06:52):
@Mario Carneiro How often are you expecting to be able to run new-cache
without a lake-ext
right before? I am having a hard time coming up with cases where this would occur outside of exceptional circumstances.
Mac Malone (Jul 20 2023 at 06:53):
The reason I ask, is it might just be more useful to have lake spit out the dirty files instead?
Mario Carneiro (Jul 20 2023 at 06:55):
The intention is to have a workspace-manifest.trace
file as well, so we know when to re-run lake-ext
. (Right now it just prompts the user to do so)
Mario Carneiro (Jul 20 2023 at 06:56):
but in many situations it doesn't get invalidated even if you change branches
Mac Malone (Jul 20 2023 at 06:56):
How would that work? There does not seem to be a way to generate that without running lake, and if you are running lake you might as well just regenerate the file.
Mario Carneiro (Jul 20 2023 at 06:57):
generate what
Mac Malone (Jul 20 2023 at 06:57):
workspace-manifest.trace
Mario Carneiro (Jul 20 2023 at 06:57):
it's just the hash of the lakefile right now
Mac Malone (Jul 20 2023 at 06:58):
But the lakefile can dynamically change based on configuration and just pure IO?
Mac Malone (Jul 20 2023 at 06:58):
I guess the goal here is just best effort?
Mario Carneiro (Jul 20 2023 at 06:59):
sure, you could put a random number generator in the lakefile but then you have bigger problems
Mario Carneiro (Jul 20 2023 at 06:59):
we are indeed making the assumption that the workspace manifest is a pure function of the lakefile contents and the OS configuration
Mac Malone (Jul 20 2023 at 07:00):
The lakefile can also use things like platform, architecture, package versions, etc. to change the configuration.
Mario Carneiro (Jul 20 2023 at 07:00):
I'm not assuming it is platform independent
Mac Malone (Jul 20 2023 at 07:00):
Mario Carneiro said:
we are indeed making the assumption that the workspace manifest is a pure function of the lakefile contents and the OS configuration
Which is fine as a simplifying assumption, but not as a general one.
Mario Carneiro (Jul 20 2023 at 07:01):
I mean, you are already in trouble just with file paths
Mario Carneiro (Jul 20 2023 at 07:01):
it's fine for something that does caching, because if you have a random number generator in the lakefile then caching is pointless
Mac Malone (Jul 20 2023 at 07:01):
Mario Carneiro said:
I'm not assuming it is platform independent
Fair, but I was also thinking of a lakefile that builds differently depending on which system libraries are available.
Mario Carneiro (Jul 20 2023 at 07:02):
yeah, -K
options are also problematic for the same reason
Mario Carneiro (Jul 20 2023 at 07:02):
there isn't much that can be done about it
Mario Carneiro (Jul 20 2023 at 07:03):
ultimately lakefile.lean
is a program that generates output, and workspace-manifest.json
is that output
Mario Carneiro (Jul 20 2023 at 07:04):
If there was a dont-cache-me-bro
flag that you could set in the lakefile then it could be used to signal shenanigans and avoid this issue
Mario Carneiro (Jul 20 2023 at 07:05):
for most system configuration, I think it is safe to assume that it doesn't change that often, at least not between branches of the same project (which is the primary use case of lake exe cache
)
Mario Carneiro (Jul 20 2023 at 07:14):
Mac said:
The reason I ask, is it might just be more useful to have lake spit out the dirty files instead?
This would be a useful feature regardless
Sebastian Ullrich (Jul 20 2023 at 09:09):
Is... there a tl;dr? Are we still talking about the original issue?
Sebastian Ullrich (Jul 20 2023 at 09:21):
Note that we have not yet implemented multi-root workspaces support in vscode-lean4, though I believe roots have to be added manually. In Emacs this all just works out of the box afaik...
Last updated: Dec 20 2023 at 11:08 UTC