Zulip Chat Archive
Stream: mathlib4
Topic: lake get-cache
Johan Commelin (Dec 20 2022 at 12:16):
How far are we from lake get-cache
? Switching branches and running lake build
is quite time consuming.
Arthur Paulino (Dec 20 2022 at 14:19):
Is it going to be a general lake
command or a Mathlib specific script (lake run get_cache
)?
My two cents:
- I don't think many (if any) other Lean 4 projects will implement caching as mathlib does, so implementing it as a
lake
command may be a bit wasteful - Mathlib maintainers will have much more freedom to adjust the script if it's a Mathlib self-contained thing
- If caching becomes a popular thing among Lean 4 projects someday, then upstreaming it to
lake
can become a more worthwhile effort
Implementing it as a lake
command has more complexities because it has to be more generic. For instance, the host might be a different one from Azure (a host config parameter enters the arena)
Shreyas Srinivas (Dec 20 2022 at 14:33):
Arthur Paulino said:
Is it going to be a general
lake
command or a Mathlib specific script (lake run get_cache
)?
My two cents:
- I don't think many (if any) other Lean 4 projects will implement caching as mathlib does, so implementing it as a
lake
command may be a bit wasteful- Mathlib maintainers will have much more freedom to adjust the script if it's a Mathlib self-contained thing
- If caching becomes a popular thing among Lean 4 projects someday, then upstreaming it to
lake
can become a more worthwhile effortImplementing it as a
lake
command has more complexities because it has to be more generic. For instance, the host might be a different one from Azure (a host config parameter enters the arena)
This might not be true. Aren't there projects like scilib
etc which might eventually get large enough? Other communities might benefit from lean
. I can imagine theoretical CS folks building their own tcslib
in the not too distant future. I am writing a tool that might get large for a formal verification project in lean and I can imagine this will be useful for me.
Kevin Buzzard (Dec 20 2022 at 14:36):
Right now the difference is that mathlib4 has many people working on it right now, and the other projects do not. I agree that in the future this might change in the sense that other projects might also get a lot of momentum, but mathlib4 is an extremely high priority project for lots of reasons.
Arthur Paulino (Dec 20 2022 at 14:38):
We agree that it can be useful, but I see it as an overengineering decision at this point in which mathlib folks can benefit from it today. There needs to be more discussions around caching as a lake
command. For instance, mathlib pushes content to Azure via a bash script that runs after merges. I doubt that would be the solution inherited by a generic lake
command. (a lake push-cache
command enters the arena)
Johan Commelin (Dec 20 2022 at 14:41):
I agree that it would be better to have a ml4-get-cache.sh
now rather then waiting 3 weeks for a lake get-cache
.
Mario Carneiro (Dec 20 2022 at 14:57):
it might still be lake run get_cache
though since you probably want that script to integrate with lake in some ways
Shreyas Srinivas (Dec 20 2022 at 17:25):
Makes sense to prioritise mathlib.
A question out of curiosity: Is there a solution to this issue along the lines of make?
Shreyas Srinivas (Dec 20 2022 at 17:32):
Something along the following lines of calling make from inside the lakefile
Shreyas Srinivas (Dec 20 2022 at 17:36):
then calling lean inside the make files with the target being the corresponding object file. I don't know enough about lake to make a meaningful conclusion about this possibility, hence the question.
Arthur Paulino (Dec 20 2022 at 17:40):
Lake scripts run in IO
already. It's simpler to write a single Lake script that does it all, even if it calls curl
indirectly
Shreyas Srinivas (Dec 20 2022 at 17:42):
Arthur Paulino said:
Lake scripts run in
IO
already. It's simpler to write a single Lake script that does it all, even if it callscurl
indirectly
I am framing the usage of makefiles as a quick and dirty fix. lake being a build tool is likely to have access to ways of calling external commands.
Shreyas Srinivas (Dec 20 2022 at 17:44):
But I see your point. Doing it inside lake scripts might be advantageous
Shreyas Srinivas (Dec 20 2022 at 17:46):
and also the remote cache solution will save the necessity of the first build as well. A few minutes ago I was watching the build process take > 18 minutes on the CI as it was apparently rebuilding every file (taking longer for some files). This seems wasteful.
Arthur Paulino (Dec 20 2022 at 17:49):
I find writing Lean code as easy (if not easier) than handling makefiles. It depends on the programmer, but as far as I can see, a Lake script is just as "quick and dirty" in this sense
Arthur Paulino (Dec 20 2022 at 19:07):
Since nobody has volunteered so far, I will do it as soon as I finish a task I'm currently working on
Scott Morrison (Dec 20 2022 at 22:53):
Please don't let perfect be the enemy of good (any caching solution is better than what we have now), but for discussion:
I'd love to see the following behaviour for a cache retrieving function:
- Hash the contents of
lakefile.lean
lake-manifest.json
andlean-toolchain
(and possibly any information about hardware architecture that makes oleans non-platform independent). Call that the root hash - For each lean file, combine the root hash, the hash of each imported file (in the same project), and the hash of the contents of the file, and call that the file hash.
- Send a request to a server with a list of all the hashes in current build job.
- Get back build artifacts for some subset (ideally everything, but take what you can get) of those.
- Build everything else, sending the resulting built artifacts back to the server.
In a zeroth approximation we would do no verification of the submitted build artifacts whatsoever. :-)
Scott Morrison (Dec 20 2022 at 22:54):
(i.e. roll our own minimal nix)
Arthur Paulino (Dec 20 2022 at 23:27):
I haven't started yet, but I am wondering if I should really use a Lake script since there isn't a self-contained solution for http requests. I will probably just do it in Python
Patrick Massot (Dec 20 2022 at 23:34):
Can't you just call curl
or something?
Arthur Paulino (Dec 20 2022 at 23:34):
I can but I don't know if it would work for Windows users
Arthur Paulino (Dec 20 2022 at 23:35):
(would it?)
Arthur Paulino (Dec 20 2022 at 23:45):
Okay, I'll call curl
then. If anything, it can be changed later
Sky Wilshaw (Dec 21 2022 at 00:03):
curl
does technically work on Windows, but depends on weird internet explorer stuff (which is susceptible to breakage) if I'm remembering correctly.
Sky Wilshaw (Dec 21 2022 at 00:03):
❯ curl https://www.google.com
curl : The response content cannot be parsed because the Internet Explorer engine is not available, or Internet Explorer's first-launch configuration is not complete. Specify the UseBasicParsing parameter and try again.
At line:1 char:1
+ curl https://www.google.com
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotImplemented: (:) [Invoke-WebRequest], NotSupportedException
+ FullyQualifiedErrorId : WebCmdletIEDomNotSupportedException,Microsoft.PowerShell.Commands.InvokeWebRequestCommand
Arthur Paulino (Dec 21 2022 at 00:18):
The lakefile will get pretty dirty. Should I make separate executables that can be run with lake exe get-cache
or something?
Arthur Paulino (Dec 21 2022 at 00:31):
(I will do it because some possibilities open up and dev time will be shorter)
Arthur Paulino (Dec 21 2022 at 01:30):
Alright, I was able to efficiently hash all relevant Lean files. Can someone point me to a place where I can check how to push and get files to/from the Azure server?
Gabriel Ebner (Dec 21 2022 at 01:45):
I can get you a blob storage on our Azure account. Do you like the name mathlib4cache
?
Gabriel Ebner (Dec 21 2022 at 01:46):
Or are you looking for API docs? https://learn.microsoft.com/en-us/rest/api/storageservices/put-blob
Arthur Paulino (Dec 21 2022 at 01:47):
Both actually
Arthur Paulino (Dec 21 2022 at 01:49):
This is the current status of the code btw: https://github.com/leanprover-community/mathlib4/blob/ap/caching/Caching/Utils.lean
Arthur Paulino (Dec 21 2022 at 01:50):
Gabriel Ebner said:
I can get you a blob storage on our Azure account. Do you like the name
mathlib4cache
?
Please let me know the URL once that's done :pray:
Scott Morrison (Dec 21 2022 at 02:14):
@Arthur Paulino , sorry I didn't suggest this earlier, but could you investigate whether you can use Lake.computeTrace
?
Scott Morrison (Dec 21 2022 at 02:14):
Reusing infrastructure from lake
, even if we're not planning on modifying lake itself in the short term, seems like a win.
Arthur Paulino (Dec 21 2022 at 02:31):
What does that function do?
Wojciech Nawrocki (Dec 21 2022 at 02:34):
Arthur Paulino said:
I haven't started yet, but I am wondering if I should really use a Lake script since there isn't a self-contained solution for http requests. I will probably just do it in Python
Fwiw Socket.lean
has a HTTP client example although the library seems pretty experimental.
Arthur Paulino (Dec 21 2022 at 02:37):
Yeah, curl
will make the code more stable and robust though
Gabriel Ebner (Dec 21 2022 at 02:45):
Fwiw Socket.lean has a HTTP client example although the library seems pretty experimental.
I just want to say out loud that this implementation is completely unsuitable for production. It doesn't do TLS, happy eyeballs or even just multiple A records, or really anything one might expect from an HTTP client.
Scott Morrison (Dec 21 2022 at 03:40):
Arthur Paulino said:
What does that function do?
I'm not a lake expert, and I'm struggling to find where all the bits and pieces are, but:
- at https://github.com/leanprover/lake/blob/master/Lake/Build/Common.lean#L35, you can see that
Lake.computeTrace file
wherefile : FilePath
computes something. - The lake README says: "A target's trace is derived from its various inputs (e.g., source file, Lean toolchain, imports, etc.)."
Scott Morrison (Dec 21 2022 at 03:40):
We could either experiment, or try summoning Mac. :-)
Johan Commelin (Dec 21 2022 at 05:16):
Scott Morrison said:
- Build everything else, sending the resulting built artifacts back to the server.
I do have a bit of an uneasy feeling about this final step. Can we at least require some security check before allowing random people to upload stuff? Because the build artifacts will effectively be executed on our personal computers, so they can do random crazy shit.
I guess it shouldn't be too hard to require the uploader to have write permissions to non-master branches of ml4?
Gabriel Ebner (Dec 21 2022 at 05:55):
I thought that was implicit, we're only going to give upload keys to trusted parties.
Gabriel Ebner (Dec 21 2022 at 05:56):
@Arthur Paulino I've tried to create a new storage account and container, but this keeps failing with "Server encountered an internal error. Please try again after some time." :sigh:
Scott Morrison (Dec 21 2022 at 06:00):
Gabriel Ebner said:
I thought that was implicit, we're only going to give upload keys to trusted parties.
Sounds reasonable. :-) As long as it is possible for us to have uploads from non-central servers, I think we get most of the benefit just by allowing those uploads from maintainers. (Or some similar set.)
Scott Morrison (Dec 21 2022 at 06:00):
I do an awful lot of building branches locally, then waiting for CI to do them again. :-)
Shreyas Srinivas (Dec 21 2022 at 07:32):
Does lean have a git library? AFAIK, Scott's comments are implementable in roughly this way:
- Query git for the current commit/staging area and the parent commit. (Git already has the hashes Scott mentions, so why reinvent the wheel)
- Query git for the list of files that have changed using the diff output
- Check if lakefile.lean has changed.
- Rebuild only the changed files. Pull either the local cache (existing build folder) or remote cache for the others
Sebastian Ullrich (Dec 21 2022 at 08:33):
Git does not help you at all with the imports-closed file hash Scott described. Which is the right way to do it. And which indeed should correspond to Lake's trace hash.
Sebastian Ullrich (Dec 21 2022 at 08:46):
Scott Morrison said:
- Send a request to a server with a list of all the hashes in current build job.
- Get back build artifacts for some subset (ideally everything, but take what you can get) of those.
So with a passive server these steps would likely be something like curl --parallel <server>/<trace1>.olean -o build/.../foo.olean <server>/<trace2>.olean -o build/.../bar.olean ...
. Which apparently does the right thing of continuing even after a missing file automatically.
Sebastian Ullrich (Dec 21 2022 at 08:48):
Note that you'll want to cache and download the .ilean
files as well, and have to create the corresponding .trace
files for Lake to accept the downloaded files.
Scott Morrison (Dec 21 2022 at 09:03):
Having to create the .trace
files pretty much decides that we will have to use Lake's hashing, I guess!
Arthur Paulino (Dec 21 2022 at 10:41):
I was enjoying the initial hashing idea because I could cache intermediate hashes in memory. I believe Lake's tracing relies on IO to see if a trace already exists?
Arthur Paulino (Dec 21 2022 at 10:46):
For example, if I want to hash file A that imports B and C but I've already hashed B and C, then I have to compute very little to hash A. Otherwise I compute what I need and cache it for future lookups.
How do I do that with Lake traces?
Arthur Paulino (Dec 21 2022 at 10:52):
I want to do this: https://github.com/leanprover-community/mathlib4/blob/0da94b48cea085f244b1b27f3d2aed81ff1cb21a/Caching/Utils.lean#L32
Arthur Paulino (Dec 21 2022 at 12:00):
Or maybe I can just put/get trace files as well
Henrik Böving (Dec 21 2022 at 12:28):
The only one that actually knows the ins and outs of lake is Mac, you can ping him but if he doesn't answer its pretty much on your own to figure out.
Arthur Paulino (Dec 21 2022 at 13:13):
It's okay. I'm going with the initial solution and it's computing all the hashes blazing fast. Now I just need to sync with Gabriel about URLs and authorizations. The rest of the infra already works well
Arthur Paulino (Dec 21 2022 at 13:14):
This is a small piece of the generated curl
command:
curl --parallel https://foo/2391993928140408502.olean -o build/lib/Mathlib/Init/ZeroOne.olean https://foo/2391993928140408502.ilean -o build/lib/Mathlib/Init/ZeroOne.ilean https://foo/2391993928140408502.trace -o build/lib/Mathlib/Init/ZeroOne.trace
Sebastian Ullrich (Dec 21 2022 at 14:11):
Arthur Paulino said:
It's okay. I'm going with the initial solution and it's computing all the hashes blazing fast. Now I just need to sync with Gabriel about URLs and authorizations. The rest of the infra already works well
What's your plan for making Lake accept these files then?
Sebastian Ullrich (Dec 21 2022 at 14:13):
I think the simplest solution would be:
- add an option to Lake that writes out trace files only, and also deletes old .oleans
- try and fetch any missing oleans by their trace files
- have Lake do a regular build to fill in any missing files
Arthur Paulino (Dec 21 2022 at 14:17):
Sebastian Ullrich said:
Arthur Paulino said:
It's okay. I'm going with the initial solution and it's computing all the hashes blazing fast. Now I just need to sync with Gabriel about URLs and authorizations. The rest of the infra already works well
What's your plan for making Lake accept these files then?
I'm pushing the trace files as well
Sebastian Ullrich (Dec 21 2022 at 14:19):
Hah, I guess that works!
Arthur Paulino (Dec 21 2022 at 14:26):
In a certain way, there are advantages to allowing mathlib maintainers to have a hashing algorithm that's agnostic to how Lake computes its hashes.
Also, I didn't look too closely but I got a little skeptical about Lake hashing files as Scott described (which I too think is the right way for our purposes). I couldn't find the piece of code that digs into the content of a source file and hashes that file according to the content of the imported files.
Sebastian Ullrich (Dec 21 2022 at 14:39):
That's Module.recBuildLean
Sebastian Ullrich (Dec 21 2022 at 14:40):
Arthur Paulino said:
In a certain way, there are advantages to allowing mathlib maintainers to have a hashing algorithm that's agnostic to how Lake computes its hashes.
I don't see how that can be helpful if Lake rebuilds anyway if the recomputed trace does not match the downloaded trace
Arthur Paulino (Dec 21 2022 at 14:46):
That would change the toolchain and thus change the root hash and all the hashes. But the advantage I'm talking about is in terms of freedom of maintenance for adaptation
Sebastian Ullrich (Dec 21 2022 at 14:51):
I don't know what toolchain you are talking about. But using a different hash for remote caching than for local recompilation checks surely is a quickfix at best and not a feature.
Sebastian Ullrich (Dec 21 2022 at 14:52):
Mathlib tooling should not fight against Lake
Arthur Paulino (Dec 21 2022 at 15:13):
Sebastian Ullrich said:
That's
Module.recBuildLean
Does it mean that it's tied to building?
Sebastian Ullrich (Dec 21 2022 at 15:23):
Yes, traces are currently generated during building afaik
Arthur Paulino (Dec 21 2022 at 15:34):
Alright, that's what I suspected because it makes so much sense to get the imports from a file by running Lean's frontend instead of fiddling with strings like I'm doing. I will go with the easier workaround for now so we don't have to build in order to know what we need to download
Arthur Paulino (Dec 21 2022 at 15:37):
In other words, I'm not dealing with stuff like
import
Lean.Data.HashMap
Which is accepted by Lean
Sebastian Ullrich (Dec 21 2022 at 16:04):
No, parsing imports is separate from building the actual file. How would it ever short-circuit builds otherwise? You're looking for docs4#Lean.parseImports'.
Arthur Paulino (Dec 21 2022 at 17:01):
Thanks!!
Arthur Paulino (Dec 23 2022 at 10:43):
Hi all :wave:
Hashing is working efficiently and (apparently) correctly as Scott firstly described. The last piece of the puzzle is performing the actual requests, for which I'm needing some help :pray:🏼
Sebastian Ullrich (Dec 23 2022 at 12:48):
Our curl templates above didn't look too bad to me, what's still missing?
Arthur Paulino (Dec 23 2022 at 13:35):
The curl for get
is formatted like that already, but it's missing the correct URL and possibly parameters to query from Azure (which I've never done before).
Now, for put
, I don't know how to format it. The only thing I know is that I need to get the auth key from a file like azure.key
or something
Sebastian Ullrich (Dec 23 2022 at 15:23):
I doubt you'll find many people on here with experience in Azure blob storage :) . But according to the docs Gabriel posted above, you mostly just need to set a few headers using -H
Arthur Paulino (Dec 23 2022 at 23:42):
Is there a dummy URL I can use for testing?
Arthur Paulino (Dec 27 2022 at 01:47):
Okay, I've finally found the time to finish it: mathlib4#1230
This PR implements caching for Mathlib build files. The basic API is:
lake exe cache get
to download missing build fileslake exe cache put
to upload files that are missing on the serverlake exe cache
to print the help menu
This caching method has a few enhancements on top of the solution we had for mathlib in Lean 3:
- Source files are content-addressed and build files are referenced by such hashes separately
- Minimized download traffic with queries that only pull files that are missing locally
- Minimized upload traffic with queries that only push files that are missing on the server
- A
zip
and aset
command that make branch switching easier, without required uploads/downloads
Mario Carneiro (Dec 27 2022 at 02:49):
nice christmas present :grinning_face_with_smiling_eyes:
Siddhartha Gadgil (Dec 27 2022 at 09:47):
Does this work for projects using mathlib?
Arthur Paulino (Dec 27 2022 at 10:17):
Not yet. I don't know if you can call lake exe cache
on this from a different project. But even if it's possible the code needs adjustments to know whether Mathlib
is a directory in the root folder or not
Arthur Paulino (Dec 27 2022 at 10:27):
But I don't want to worry about it for now. If anything, we can change it later. I want to avoid scope creep and get it ready to use for people doing the porting ASAP
Patrick Massot (Dec 27 2022 at 19:59):
Arthur Paulino said:
This caching method has a few enhancements on top of the solution we had for mathlib in Lean 3:
- A
zip
and aset
command that make branch switching easier, without required uploads/downloads
I don't understand that item. What is there that wasn't in leanproject?
Eric Wieser (Dec 27 2022 at 20:57):
Perhaps @Arthur Paulino is not aware of lean 3's mk-cache
(zip) and local behavior of get-cache
(set)?
Arthur Paulino (Dec 27 2022 at 20:58):
Maybe the difference is just that set
won't try downloading anything if a cache file isn't there
Eric Wieser (Dec 27 2022 at 21:01):
You can configure that with a flag for lean 3, the default is to try everything
Arthur Paulino (Dec 27 2022 at 21:02):
Then there is no difference on that feature. Let me remove it from the list
Arthur Paulino (Dec 27 2022 at 22:09):
Eric Wieser said:
You can configure that with a flag for lean 3, the default is to try everythingok not really, only with--from-url=example.com/has-no-caches
Oh then it's an upgrade indeed
Eric Wieser (Dec 27 2022 at 22:33):
Probably at best a new feature; I expect the number of people who want a cache but don't want to use ones from the server are counted on one hand
Eric Wieser (Dec 27 2022 at 22:34):
I've already learnt that the current --fallback=none
default in lean 3 is wanted by approximately no one (I chose it because it was backwards compatible)
Arthur Paulino (Dec 27 2022 at 23:14):
Yeah it has a very specific use. One example is working on multiple branches on a place without access to the internet
Eric Wieser (Dec 28 2022 at 09:46):
That works fine in lean3, because it still always tries local caches first
Eric Wieser (Dec 28 2022 at 09:47):
The only difference is that you get an error saying "no local cache and no internet connection" instead of "no local cache"
Arthur Paulino (Dec 28 2022 at 21:26):
mathlib4#1230 is ready for review again!
Now it also caches build files for Mathlib4 dependencies: Aesop
, Qq
and Std
. And in order to make that work, I had to implement a solution that made the implementation of caching for projects that import Mathlib a lot easier. So I went ahead and just did it (cc @Siddhartha Gadgil)
@Gabriel Ebner the infra for garbage collection is not ready yet, but I think people can already benefit from it.
Arthur Paulino (Dec 28 2022 at 21:32):
A lot changed so a new review round is super welcome. Tomorrow I will get the infra for garbage collection done (hopefully). It will just add more code, that's why a review is welcome at this point already
Arthur Paulino (Dec 29 2022 at 23:19):
Just finished the implementation of lake exe cache commit
, which stores a file that associates the current Git hash with a certain set of uploaded hashes. This will allow us to create garbage collection policies.
This is the output of lake exe cache
:
Mathlib4 caching CLI
Usage: cache [COMMAND]
Commands:
# No priviledge required
get Download and decompress linked files missing on the local cache
get! Download and decompress all linked files
mk Compress non-compressed build files into the local cache
mk! Compress build files into the local cache (no skipping)
set Decompress linked files
clear Delete non-linked files
clear! Delete everything on the local cache
# Priviledge required
put Run 'mk' then upload linked files missing on the server
put! Run 'mk' then upload all linked files
commit Run 'put' then writes a commit on the server
commit! Run 'put!' then (over)writes a commit on the server
collect TODO
* Linked files refer to local cache files with corresponding Lean sources
* Commands ending with '!' should be used manually, when hot-fixes are needed"
I added the collect
to the list but maybe it shouldn't be implemented in Lean
Henrik Böving (Dec 29 2022 at 23:37):
Just a little question, if i understand this correctly everyone can now upload cached .oleans right? Since .oleans can (I believe?) execute code via e.g. initialize
or overwriting syntax to some custom elaborator that escapes to IO
if I use this cache I am essentially extending attack surface of my machine to every person with push access to mathlib? Not that I want to claim people here have nefarious intentions, it's more of an attack surface thing. I now don't only have to trust myself to not mess stuff up but also dozens (in the future potentially more) people to not mess up themselves yes?
Jireh Loreaux (Dec 29 2022 at 23:38):
Yes, but I think this is only intended to be a temporary measure while we are porting, but maybe I'm mistaken.
Arthur Paulino (Dec 29 2022 at 23:39):
Not everyone can do that. Only people with the token for pushing stuff
Arthur Paulino (Dec 29 2022 at 23:40):
Once this is merged, I'm going to ask Gabriel to revoke the token he gave me for dev/testing purposes
Arthur Paulino (Dec 29 2022 at 23:42):
Ah wait, you mean forcing the mathlib4 bot to push it for me (if I were a malicious person). Yes. That's possible
Arthur Paulino (Dec 30 2022 at 01:28):
On a second thought, malicious code would have a different hash. The only way you can run bad code is if you, yourself, fetch a branch and run it on your PC. But the vulnerability isn't on the hashing scheme. It's a vulnerability that's already present by design
Arthur Paulino (Dec 30 2022 at 01:33):
The only way to make evil spread via this hashing scheme is by having a secret token that would allow you to overwrite olean files with malicious ones, really.
Arthur Paulino (Dec 30 2022 at 01:45):
everyone can now upload cached .oleans, right?
So, just to answer this clearly, no. Pushing data is restricted
Arthur Paulino (Dec 30 2022 at 02:04):
There is ONE genuine vulnerability: a hash collision with some malicious code whose olean was already sitting on the server!
Yaël Dillies (Dec 30 2022 at 06:38):
Nitpick: Isn't it "privilege"? or maybe my French is tricking me.
Arthur Paulino (Dec 30 2022 at 09:49):
No, you're right. I'll fix it, thanks!
Arthur Paulino (Jan 02 2023 at 10:03):
:ping_pong: mathlib#1230 is ready
Gabriel Ebner (Jan 04 2023 at 00:51):
FYI, this command is deployed now. On a mathlib4 checkout, you can run
lake exe cache get
And you should get fresh oleans!
Heather Macbeth (Jan 04 2023 at 01:16):
For those of us who are maintainers or "trusted users": are we supposed to be running some other command after local compiles, in order to upload the oleans we made? I think it was mentioned that that might be part of the workflow?
Heather Macbeth (Jan 04 2023 at 01:17):
Henrik Böving said:
Just a little question, if i understand this correctly everyone can now upload cached .oleans right?
Arthur Paulino said:
Not everyone can do that. Only people with the token for pushing stuff
Arthur Paulino (Jan 04 2023 at 01:26):
Gabriel already set it so that it will be done by a GitHub workflow routine, like it was for mathlib in Lean 3: https://github.com/leanprover-community/mathlib4/blob/35255de060eac3fa22895d074d6a713c42f2752a/.github/workflows/bors.yml#L102
Heather Macbeth (Jan 04 2023 at 01:26):
Even better!
Arthur Paulino (Jan 04 2023 at 01:29):
I haven't tested it on Windows. Please let me know if you find bugs
Mario Carneiro (Jan 04 2023 at 01:29):
I just tried it, it works great right out of the box. The only downside is that std is not cached, so when I pull master and then lake exe cache get
it still has to compile all the Std.*
files, but none of the Mathlib
files
Arthur Paulino (Jan 04 2023 at 01:31):
It caches Std, but for some reason it does that quick listing that only takes like 10 seconds
Arthur Paulino (Jan 04 2023 at 01:31):
I couldn't figure out why
Mario Carneiro (Jan 04 2023 at 01:33):
I ran lake build
after lake exe cache get
and it seemed to compile Std
, it took 15 s and did not list any Compiling Mathlib
lines
Arthur Paulino (Jan 04 2023 at 01:34):
Right, but how long would it take to go over the Std files without the cache? I think it would take a lot more
Arthur Paulino (Jan 04 2023 at 01:34):
I suspect Lake is recomputing the traces for some reason I couldn't figure out
Mario Carneiro (Jan 04 2023 at 01:35):
I see
Arthur Paulino (Jan 04 2023 at 01:39):
I really hope it works on Windows because that would be hard for a Linux user to debug :fear:
Gabriel Ebner (Jan 04 2023 at 01:53):
it still has to compile all the Std.* files
While it has to Compile
(cc) all files, it no longer has to Build
(lean) them. The binaries are platform-specific, so we didn't cache them.
Siddhartha Gadgil (Jan 04 2023 at 02:04):
It works beautifully for mathlib, but I haven't yet tried it for a project using mathlib as a dependency. For that will I have to add this to my lakefile?
lean_exe cache where
root := `Cache.Main
Arthur Paulino (Jan 04 2023 at 02:05):
I think you can just call lake exe cache get
. It works for LSpec, which allows us to call lake exe lspec
from any repo that uses it
Gabriel Ebner (Jan 04 2023 at 02:19):
You can call lake exe cache get
but it fails badly:
././lake-packages/mathlib/././Cache/Requests.lean:7:0: error: no such file or directory (error code: 2)
file: lake-packages/Mathlib/lakefile.lean
Arthur Paulino (Jan 04 2023 at 02:21):
I will take a look at it tomorrow, but this is not looking pretty. It seems to be failing to compile the cache
executable
Gabriel Ebner (Jan 04 2023 at 02:22):
I think this is just Mathlib
vs. mathlib
. I think it should be enough to lowercase the mathlibDepPath.
Arthur Paulino (Jan 04 2023 at 02:23):
Gabriel Ebner said:
it still has to compile all the Std.* files
While it has to
Compile
(cc) all files, it no longer has toBuild
(lean) them. The binaries are platform-specific, so we didn't cache them.
I see now. It's because of the runLinter
that's being compiled to a binary when we say lake build
right? It relies on some Std files
Mario Carneiro (Jan 04 2023 at 02:29):
I can confirm that the result of all those Compiling
lines was Linking runLinter
Mario Carneiro (Jan 04 2023 at 02:30):
I think runLinter
should probably be a bit more choosy about parts of Std it imports though
Mario Carneiro (Jan 04 2023 at 02:31):
oh I see, it imports Mathlib.Data.Array.Defs
and of course mathlib files love to import Std
Mario Carneiro (Jan 04 2023 at 02:31):
lol, Mathlib.Data.Array.Defs
is completely empty except for import Std
Mario Carneiro (Jan 04 2023 at 02:36):
Arthur Paulino (Jan 04 2023 at 12:12):
mathlib4#1326 should fix the issue related to cache get
when Mathlib4 is used as a dependency.
Btw, I find it confusing that there are Mathlib
(lib name), mathlib4
(repo name) and mathlib
(package name). What if they were all just Mathlib
?
Floris van Doorn (Jan 04 2023 at 12:15):
trying this on Windows gives me
Floris@MSI MINGW64 /d/projects/mathlib4 ((f46ac6ed...))
$ lake exe cache get
info: downloading component 'lean'
info: installing component 'lean'
warning: manifest out of date: package directory changed, use `lake update` to update
info: updating .\./lake-packages\std to revision 2919713bde15d55e3ea3625a03546531283bcb54
info: updating .\./lake-packages\aesop to revision c4477a2a7931e2490339d8087f599a45e89f25e7
info: Building Cache.IO
info: Building Cache.Hashing
info: Compiling Cache.IO
info: Compiling Cache.Hashing
info: Building Cache.Requests
info: Building Cache.Main
info: Compiling Cache.Requests
info: Compiling Cache.Main
info: Linking cache.exe
Attempting to download 787 file(s)
uncaught exception: unspecified system_category error (error code: 206)
Arthur Paulino (Jan 04 2023 at 12:22):
Hmm, seems like IO
errors aren't well defined for Windows. Do you have curl
and tar
on your PATH?
Floris van Doorn (Jan 04 2023 at 12:23):
yes
Floris@MSI MINGW64 /d/projects/mathlib (simps_aux_attr)
$ curl --version
curl 7.64.0 (x86_64-w64-mingw32) libcurl/7.64.0 OpenSSL/1.1.1a (Schannel) zlib/1.2.11 libidn2/2.1.1 nghttp2/1.36.0
Release-Date: 2019-02-06
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp smtp smtps telnet tftp
Features: AsynchDNS IDN IPv6 Largefile SSPI Kerberos SPNEGO NTLM SSL libz TLS-SRP HTTP2 HTTPS-proxy MultiSSL Metalink
Floris@MSI MINGW64 /d/projects/mathlib (simps_aux_attr)
$ tar --version
tar (GNU tar) 1.31
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by John Gilmore and Jay Fenlason.
Arthur Paulino (Jan 04 2023 at 12:25):
I think it's best to register that as an issue so maybe someone else can jump in and help us. I don't know what's happening with that error message :sad:
Mauricio Collares (Jan 04 2023 at 12:30):
"The filename or extension is too long". I guess there's a maximum number of chars that can be passed as arguments?
Reid Barton (Jan 04 2023 at 12:38):
Windows has a very low command line length limit--is that the issue? Is curl getting invoked with 787 arguments?
Reid Barton (Jan 04 2023 at 12:39):
(or it used to, at any rate--I don't really know anything about Windows)
Ruben Van de Velde (Jan 04 2023 at 12:43):
Also a low path length limit, depending on the API you use, as I recall. Also don't know much about windows :)
Reid Barton (Jan 04 2023 at 12:44):
https://learn.microsoft.com/en-us/troubleshoot/windows-client/shell-experience/command-line-string-limitation says 8KB. So that's definitely less than the length of 787 URLs (or even filenames).
Floris van Doorn (Jan 04 2023 at 12:45):
I think one issue is that the curl
I got with minGW
is very old, and doesn't support --parallel
yet.
Floris van Doorn (Jan 04 2023 at 12:45):
I'm now updating it
Reid Barton (Jan 04 2023 at 12:48):
Apparently there is curl -K
(or curl --config
) to specify the URLs to download with a file
Arthur Paulino (Jan 04 2023 at 12:52):
I'm going to use that for downloads and (hopefully) for uploads
Floris van Doorn (Jan 04 2023 at 12:53):
If I manually run the curl
command that Cache
uses, I indeed get the error
bash: /c/ProgramData/chocolatey/bin/curl: Argument list too long
Sebastian Ullrich (Jan 04 2023 at 13:32):
Ugh, it looks like libc++ doesn't correctly handle Windows error codes... high time to rewrite the Lean runtime in Lean
Raghuram (Jan 04 2023 at 14:13):
Arthur Paulino said:
mathlib4#1326 should fix the issue related to
cache get
when Mathlib4 is used as a dependency.
Why was this pull request closed?
Arthur Paulino (Jan 04 2023 at 14:13):
Because I'm including more fixes on a single PR
Arthur Paulino (Jan 04 2023 at 15:03):
Okay, mathlib #1333 should fix these issues
Yaël Dillies (Jan 04 2023 at 15:11):
Yeah, this is an optimisation I've been greatly looking forward to. But note that mathlib already reuses cache from previous pushes.
Yaël Dillies (Jan 04 2023 at 15:12):
The real difference is whether caches are still invalidated transitively or not.
Arthur Paulino (Jan 04 2023 at 15:13):
Yeah but they were addressed by commit hashes, so a new commit gets nothing from the cache and the building process will have to chew everything again
Arthur Paulino (Jan 04 2023 at 15:15):
Yaël Dillies said:
The real difference is whether caches are still invalidated transitively or not.
That still holds because the hash of a file involves the hash of all files imported by it
Yaël Dillies (Jan 04 2023 at 15:34):
So is there any optimisation compared to mathlib CI besides Lean 4s generally better performance?
Eric Wieser (Jan 04 2023 at 15:35):
So is there any optimisation compare to mathlib CI besides
Yes, mathlib CI does a worse job of merge commits
Yaël Dillies (Jan 04 2023 at 15:36):
Right. And anything more?
Eric Wieser (Jan 04 2023 at 15:37):
I would guess that downloading is faster, as if you modify data.set.basic
it sounds like the new system won't attempt to download any downstream oleans. Of course, that has to fight against the advantage of the tarball combining many downloads into one.
Yaël Dillies (Jan 04 2023 at 15:38):
Oh, and that also holds for the github CI?
Eric Wieser (Jan 04 2023 at 15:38):
By github CI I assume you mean mathlib3 CI?
Eric Wieser (Jan 04 2023 at 15:39):
Mathlib3 CI is basically a leanproject get-cache --fallback=download-first
but following a different parent commit on merges
Yaël Dillies (Jan 04 2023 at 15:39):
Yes, as opposed to leanproject get-c
run from a local terminal.
Eric Wieser (Jan 04 2023 at 15:41):
leanproject get-c
is stupid and only does an exact match (if you don't like that, pass --fallback
)
Eric Wieser (Jan 04 2023 at 15:43):
In terms of the resulting .olean
files and ignoring network traffic, --fallback=download-first
is essentially the same as what it sounds like @Arthur Paulino has implemented, but the new version does much better with merge commits
Eric Wieser (Jan 04 2023 at 15:44):
(the stupidness of get-c
is deliberate; it means that if you're checking out a random commit from mathlib, you get an error if you try to get the cache for a commit which was skipped over due to bors batching)
Eric Wieser (Jan 04 2023 at 15:46):
There is a poll to change the default
Arthur Paulino (Jan 04 2023 at 15:50):
Btw I don't take credit for the hashing scheme. Scott was the one who came up with it. I just implemented it
Eric Wieser (Jan 04 2023 at 15:54):
I would have implemented it in mathlib3, but the problem is that the hashing is all internal to Lean and difficult to expose to python
Arthur Paulino (Jan 04 2023 at 15:57):
Oh, I didn't use the hashing that Lake does btw. That would require some refactor in Lake because Lake only outputs traces during the building process and it wouldn't make sense to build in order to know the hashes. What I did use was Lean's capabilities to hash strings and to combine hashes into new ones
Arthur Paulino (Jan 04 2023 at 18:45):
@Floris van Doorn mathlib4#1333 has been merged. Can you try again please?
Gabriel Ebner (Jan 04 2023 at 18:52):
Eric Wieser said:
So is there any optimisation compare to mathlib CI besides
Yes, mathlib CI does a worse job of merge commits
Notably, the new cache also works with non-merge commits (i.e. squashes like bors does).
Kevin Buzzard (Jan 04 2023 at 19:04):
buzzard@buster:~/lean-projects/mathlib4$ lake exe cache get
Attempting to download 582 file(s)
Decompressing cache
uncaught exception:
gzip: stdin: unexpected end of file
tar: Child returned status 1
tar: Error is not recoverable: exiting now
buzzard@buster:~/lean-projects/mathlib4$ lake exe cache get
Attempting to download 432 file(s)
Decompressing cache
uncaught exception:
gzip: stdin: unexpected end of file
tar: Child returned status 1
tar: Error is not recoverable: exiting now
Ubuntu 22.04, lake Lake version 4.1.0-pre (Lean version 4.0.0-nightly-2022-12-23)
Gabriel Ebner (Jan 04 2023 at 19:05):
Does rm -r .cache
help?
Kevin Buzzard (Jan 04 2023 at 19:06):
buzzard@buster:~/lean-projects/mathlib4$ rm -r .cache
buzzard@buster:~/lean-projects/mathlib4$ lake exe cache get
Attempting to download 790 file(s)
Decompressing cache
uncaught exception:
gzip: stdin: unexpected end of file
tar: Child returned status 1
tar: Error is not recoverable: exiting now
Kevin Buzzard (Jan 04 2023 at 19:07):
Oh!
$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: lake-manifest.json
no changes added to commit (use "git add" and/or "git commit -a")
buzzard@buster:~/lean-projects/mathlib4$
No idea why that has happened.
Kevin Buzzard (Jan 04 2023 at 19:08):
I didn't modify any json file as far as I know.
Kevin Buzzard (Jan 04 2023 at 19:10):
I have stuff like
@@ -34,21 +10,9 @@
{"git":
{"url": "https://github.com/JLimperg/aesop",
"subDir?": null,
- "rev": "c4477a2a7931e2490339d8087f599a45e89f25e7",
+ "rev": "72610cec1cd686884a08704f5fa91b45262136f9",
"name": "aesop",
"inputRev?": "master"}},
- {"git":
- {"url": "https://github.com/hargonix/LeanInk",
- "subDir?": null,
- "rev": "2447df5cc6e48eb965c3c3fba87e46d353b5e9f1",
- "name": "leanInk",
- "inputRev?": "doc-gen"}},
- {"git":
- {"url": "https://github.com/xubaiw/Unicode.lean",
- "subDir?": null,
- "rev": "6dd6ae3a3839c8350a91876b090eda85cf538d1d",
- "name": "Unicode",
- "inputRev?": "main"}},
in the diffs.
Gabriel Ebner (Jan 04 2023 at 19:10):
That sounds like you did a lake update
(which you should only do if you're preparing a bump PR).
Gabriel Ebner (Jan 04 2023 at 19:10):
Just a hunch, does this print anything:
file .cache/* | grep -v gzip
Arthur Paulino (Jan 04 2023 at 19:12):
There is a slight possibility that I have pushed corrupted tar files during dev attempts :explosion:
Arthur Paulino (Jan 04 2023 at 19:12):
But it can be fixed with put!
Edit: no need... lake exe cache get
worked on master (on my machine) just now
Arthur Paulino (Jan 04 2023 at 19:24):
Another possibility: maybe this validation if flawed/flaky
Gabriel Ebner (Jan 04 2023 at 19:31):
This is a bit concerning.. I was just running curl --parallel
manually to download all of the blobs. Some were truncated, and curl got stuck in the middle. :fear:
Arthur Paulino (Jan 04 2023 at 19:40):
If it can get stuck on downloads, there's nothing stopping it from getting stuck during uploads
Gabriel Ebner (Jan 04 2023 at 19:42):
Uh-oh, when we cancel a build that probably kills curl. :scream:
Arthur Paulino (Jan 04 2023 at 19:43):
Can it be set on a separate workflow that never gets killed?
Gabriel Ebner (Jan 04 2023 at 19:47):
Apparently, if: always()
(which we set for the upload) prevents cancellation.
Arthur Paulino (Jan 04 2023 at 19:51):
Idea: doing a raw build with rm -rf .cache && lake clean && rm -rf lake-packages && lake exe cache get && lake build
right after uploading everything could spot flawed uploads
Arthur Paulino (Jan 04 2023 at 19:53):
If that fails, a lake exe cache put!
is needed. Then a new verification round is needed. Loop until it goes through
Gabriel Ebner (Jan 04 2023 at 19:53):
So far we haven't had any issues with uploads, and that check would not even detect missing uploads (it would just build a bit longer).
Gabriel Ebner (Jan 04 2023 at 19:54):
Silently doing a lake exe cache put!
seems like a bad idea, this would just hide hash collisions.
Arthur Paulino (Jan 04 2023 at 19:59):
Then maybe it's a manual get!
that may save the day for problematic downloads, since get
will skip (supposedly corrupted) files that are already on the FS. I mean, that's precisely the role of get!
Gabriel Ebner (Jan 04 2023 at 20:04):
This sounds good:
--remove-on-error
When curl returns an error when told to save output in a local
file, this option removes that saved file before exiting. This
prevents curl from leaving a partial file in the case of an er‐
ror during transfer.
Gabriel Ebner (Jan 04 2023 at 20:05):
I have no idea if this would help with the 404 pages:
-f, --fail
(HTTP) Fail fast with no output at all on server errors. This is
useful to enable scripts and users to better deal with failed
attempts. In normal cases when an HTTP server fails to deliver a
document, it returns an HTML document stating so (which often
also describes why and more). This flag will prevent curl from
outputting that and return error 22.
Arthur Paulino (Jan 04 2023 at 20:06):
Oh wow, so we may be able to get rid of that ugly ByteArray
check
Mauricio Collares (Jan 04 2023 at 20:07):
50 parallel transfers could be too much for some connections. Maybe pass --parallel-max 8
as well if it doesn't make things that much slower?
Mauricio Collares (Jan 04 2023 at 20:08):
There's also a --retry N
option
Arthur Paulino (Jan 04 2023 at 20:12):
Amazing, there are multiple possibilities for workarounds. @Gabriel Ebner do you want to go ahead and implement the solution you think is the best? Or do you want me to do so? In the later, I would rather being told the workaround you think will work the best
Gabriel Ebner (Jan 04 2023 at 20:19):
There's only issue here:
curl version 7.83.0 was released on April 27 2022
And ubuntu 22.04 still contains 7.81..
Arthur Paulino (Jan 04 2023 at 20:19):
There is a flexible solution that would allow extra args to be passed to curl
.
For example lake exe cache get --parallel-max 8 --retry 3
. But that's not very friendly
Arthur Paulino (Jan 04 2023 at 20:21):
Although the --remove-on-error
and the -f
flags look interesting regardless so maybe we can simply always use those
Gabriel Ebner (Jan 04 2023 at 20:21):
The problem is that --remove-on-error
was added in 7.83.
Arthur Paulino (Jan 04 2023 at 20:25):
Still, I think -f
with a check on exitCode == 0
should warn the user to try a get
again
Arthur Paulino (Jan 04 2023 at 20:26):
(I'm assuming -f
makes curl
return something !=0 in case of failure)
Gabriel Ebner (Jan 04 2023 at 20:29):
No --fail
does nothing of the sort, it will just ignore the errors. But apparently it does exactly what we want, it doesn't save 404 responses!
Arthur Paulino (Jan 04 2023 at 20:31):
I see. So we can get rid of that ugly ByteArray.startsWith
check. What about extra parameters like retry
and max-parallel
?
Gabriel Ebner (Jan 04 2023 at 20:36):
We have no indication that max-parallel
would fix any issues, so I'd leave it at the default value. Same with retry
.
Arthur Paulino (Jan 04 2023 at 20:37):
But I mean, allowing the user to send extra custom arguments to curl
Gabriel Ebner (Jan 04 2023 at 20:39):
Oh, that seems reasonable enough.
Sebastian Ullrich (Jan 04 2023 at 20:41):
You actually need a pretty high degree of parallelism to saturate HTTP 2.0 multiplexing when downloading many small files https://github.com/NixOS/nix/issues/5118
Reid Barton (Jan 04 2023 at 20:46):
curl --fail
does return a nonzero exit code on HTTP errors
Gabriel Ebner (Jan 04 2023 at 20:50):
Though we need to ignore that exit code since 404 is expected.
Floris van Doorn (Jan 04 2023 at 21:51):
Arthur Paulino said:
Floris van Doorn mathlib4#1333 has been merged. Can you try again please?
There is progress, I now get no error:
Floris@MSI MINGW64 /d/projects/Mathlib4 ((c0e8ec73...))
$ lake exe cache get
info: downloading component 'lean'
info: installing component 'lean'
warning: manifest out of date: package directory changed, use `lake update` to update
info: Building Cache.IO
info: Compiling Cache.IO
info: Building Cache.Hashing
info: Compiling Cache.Hashing
info: Building Cache.Requests
info: Compiling Cache.Requests
info: Building Cache.Main
info: Compiling Cache.Main
info: Linking cache.exe
Attempting to download 789 file(s)
Decompressing cache
However, doing lake build
immediately afterwards still recompiles everything. This takes about ~15 minutes for all of Std+Aesop+Mathlib, so I don't think it's using any caches.
Floris van Doorn (Jan 04 2023 at 21:52):
(git status
shows no changes)
Henrik Böving (Jan 04 2023 at 22:16):
Just a little remark, on my Linux machine everything is work correctly ootb :thumbs_up:
Would be a shame if lake was smart enough to notice the windows/linux difference here...
Arthur Paulino (Jan 04 2023 at 22:16):
@Floris van Doorn your manifest is out of date, meaning that its content may not be equivalent to the one used to make the hashing
Arthur Paulino (Jan 04 2023 at 22:17):
(the content of lake-manifest.json
is involved in the root hash, which, in turn, is in the mix of the hash of all files)
Henrik Böving (Jan 04 2023 at 22:18):
Should I add this to the doc-gen4 CI?
Arthur Paulino (Jan 04 2023 at 22:19):
What do you mean? (add what?)
Henrik Böving (Jan 04 2023 at 22:21):
Add the call to the cache to the doc-gen4 CI
Arthur Paulino (Jan 04 2023 at 22:21):
Ah, like a mini tutorial?
Gabriel Ebner (Jan 04 2023 at 22:22):
warning: manifest out of date: package directory changed, use
lake updateto update
@Floris van Doorn Do you get this every time on windows? I wonder if this is platform-independent?
"packagesDir": "./lake-packages",
Gabriel Ebner (Jan 04 2023 at 22:23):
@Floris van Doorn Can you do ls .cache
and check if any caches got downloaded at all? I wouldn't be surprised if hash filePath
gives a different result on windows than on linux/macos.
Henrik Böving (Jan 04 2023 at 22:24):
Arthur Paulino said:
Ah, like a mini tutorial?
no, doc-gen4's CI is responsible for building docs4#Nat so if I would use the cache I could basically shave 15 minutes off each CI run since I don't have to recompile
Arthur Paulino (Jan 04 2023 at 22:24):
Okay, that's the catch! It certainly will. Let me fix that ASAP
Arthur Paulino (Jan 04 2023 at 22:25):
Henrik Böving said:
Arthur Paulino said:
Ah, like a mini tutorial?
no, doc-gen4's CI is responsible for building docs4#Nat so if I would use the cache I could basically shave 15 minutes off each CI run since I don't have to recompile
Oh, definitely worth trying then
Henrik Böving (Jan 04 2023 at 22:26):
image.png
woop
Arthur Paulino (Jan 04 2023 at 22:29):
@Gabriel Ebner mathlib4#1339
Arthur Paulino (Jan 04 2023 at 22:33):
Oh wait, it's breaking
Arthur Paulino (Jan 04 2023 at 22:42):
Okay the PR is fixed but it will take some minutes to build because all the hashes changed
Arthur Paulino (Jan 04 2023 at 23:12):
Okay, it pushed everything and built correctly. Should be ready to merge :tada:
Arthur Paulino (Jan 04 2023 at 23:12):
@Floris van Doorn please try again when you have a chance :pray:
Gabriel Ebner (Jan 05 2023 at 00:29):
Another issue with packages depending on mathlib: the oleans get extracted to build/lib/Mathlib
and not lake-packages/mathlib/build/lib/Mathlib
.
Arthur Paulino (Jan 05 2023 at 00:36):
I see, -C
should do the trick here. Let me fix that
Arthur Paulino (Jan 05 2023 at 01:00):
Arthur Paulino (Jan 05 2023 at 01:05):
(sorry for so many PRs... it's hard to cover all use cases alone)
Siddhartha Gadgil (Jan 05 2023 at 01:52):
Things are still not working for me (on Ubuntu). After I did a lake update
, I deleted the .cache
folder and ran with this result:
$ lake exe cache get
Attempting to download 789 file(s)
Decompressing 501 file(s)
(base) gadgil@gadgil-XPS-15:~/code/LeanAide$ lake build
Building Mathlib.Init.Data.Nat.Notation
Building Mathlib.Data.KVMap
Building Mathlib.Util.MemoFix
Building Mathlib.Tactic.Relation.Rfl
Building Mathlib.Lean.Meta
Building Mathlib.Mathport.Rename
Building Mathlib.Tactic.Alias
Building Mathlib.Tactic.RunCmd
Building Mathlib.Mathport.Attributes
Building Mathlib.Tactic.LeftRight
Building Mathlib.Lean.LocalContext
Building Mathlib.Util.WhatsNew
error: > LEAN_PATH=././lake-packages/mathlib/build/lib:././lake-packages/Qq/build/lib:././lake-packages/aesop/build/lib:././lake-packages/std/build/lib:./build/lib LD_LIBRARY_PATH=/home/gadgil/.elan/toolchains/leanprover--lean4---nightly-2022-12-23/lib:/home/gadgil/software/z3/build::././lake-packages/mathlib/build/lib /home/gadgil/.elan/toolchains/leanprover--lean4---nightly-2022-12-23/bin/lean -DwarningAsError=true ././lake-packages/mathlib/././Mathlib/Data/KVMap.lean -R ././lake-packages/mathlib/./. -o ././lake-packages/mathlib/build/lib/Mathlib/Data/KVMap.olean -i ././lake-packages/mathlib/build/lib/Mathlib/Data/KVMap.ilean -c ././lake-packages/mathlib/build/ir/Mathlib/Data/KVMap.c
error: stdout:
././lake-packages/mathlib/././Mathlib/Data/KVMap.lean:17:31: error: expected token
error: external command `/home/gadgil/.elan/toolchains/leanprover--lean4---nightly-2022-12-23/bin/lean` exited with code 1
...
Arthur Paulino (Jan 05 2023 at 02:01):
Can you send the link to that repo please?
Gabriel Ebner (Jan 05 2023 at 02:05):
Oh, non-mathlib oleans now get extracted to interesting locations: :smile:
./lake-packages/mathlib/lake-packages/std/build/lib/Std/Tactic/NormCast/Lemmas.olean
Siddhartha Gadgil (Jan 05 2023 at 02:06):
Arthur Paulino said:
Can you send the link to that repo please?
https://github.com/siddhartha-gadgil/LeanAide.git
Siddhartha Gadgil (Jan 05 2023 at 02:07):
Sorry, I should clarify this is on the mathlib4
branch
Siddhartha Gadgil (Jan 05 2023 at 02:07):
And I have not pushed the lake update
that breaks things.
Arthur Paulino (Jan 05 2023 at 02:10):
Alright, it will be a lot easier to debug with an actual project at hand
Siddhartha Gadgil (Jan 05 2023 at 02:13):
Checking out the mathlib4
branch in the above and running "lake update" should get to the state I am working with.
Siddhartha Gadgil (Jan 05 2023 at 02:26):
Also, thanks a lot for working on this. Soon over 20 students will be checking out my course repository which depends on mathlib4
and building on various laptops (many fairly basic). And the mathlib4 will be updated periodically. Will help them hugely to have this.
Also I am hoping gitpod/github codespaces will be efficient with this.
Arthur Paulino (Jan 05 2023 at 03:43):
@Siddhartha Gadgil there's been a fix, but something's still weird. I'm taking a look
Arthur Paulino (Jan 05 2023 at 05:03):
The weird behavior was that some cache files weren't being found.
But it's not an issue with caching. It's just that your version of Aesop is not the same one listed on mathlib's manifest. And the hash divergences cascate downstream
Siddhartha Gadgil (Jan 05 2023 at 05:13):
Arthur Paulino said:
The weird behavior was that some cache files weren't being found on the server.
But it's not an issue with caching. It's just that your version of Aesop is not the same one listed on mathlib's manifest. And the hash divergences cascate downstream.
Mathlib's manifest needs to be updated
Thanks. I can test by changing my version of Aesop to match with mathlib (or just use Aesop as a transitive dependency)
Siddhartha Gadgil (Jan 05 2023 at 05:21):
My naive test failed. I will wait for the mathlib manifest update.
Floris van Doorn (Jan 05 2023 at 09:44):
Arthur Paulino said:
Floris van Doorn please try again when you have a chance :pray:
It works now! This is great! I get the files, lake build
is very quick, and opening a random mathlib4 file uses the cached files.
Floris van Doorn (Jan 05 2023 at 09:45):
Gabriel Ebner said:
warning: manifest out of date: package directory changed, use
lake updateto update
Floris van Doorn Do you get this every time on windows? I wonder if this is platform-independent?
"packagesDir": "./lake-packages",
Yes, I think I'm getting it on every commit. I was just ignoring it now. I can confirm that I'm getting it on the current commit (cfd16dd8
).
Floris van Doorn (Jan 05 2023 at 09:45):
Gabriel Ebner said:
Floris van Doorn Can you do
ls .cache
and check if any caches got downloaded at all? I wouldn't be surprised ifhash filePath
gives a different result on windows than on linux/macos.
You were correct that on yesterday's attempt no caches got downloaded.
Kevin Buzzard (Jan 05 2023 at 09:49):
I've still never managed to get this working :-( On the machine I'm currently using:
buzzard@brutus:~/lean-projects/mathlib4$ lake exe cache get
info: updating ././lake-packages/aesop to revision 72610cec1cd686884a08704f5fa91b45262136f9
error: unknown executable `cache`
buzzard@brutus:~/lean-projects/mathlib4$ lake --version
Lake version 4.1.0-pre (Lean version 4.0.0-nightly-2022-12-23)
buzzard@brutus:~/lean-projects/mathlib4$ git status
On branch j-loreaux/data.fin.basic
Your branch is up-to-date with 'origin/j-loreaux/data.fin.basic'.
nothing to commit, working tree clean
buzzard@brutus:~/lean-projects/mathlib4$
Mario Carneiro (Jan 05 2023 at 09:50):
You need to merge master into your branch
Floris van Doorn (Jan 05 2023 at 09:50):
@Kevin Buzzard does it work on current origin/master
?
Kevin Buzzard (Jan 05 2023 at 09:51):
Aah, this branch is several weeks old so that might well be the issue. Yes it works on master on this machine!
Floris van Doorn (Jan 05 2023 at 09:52):
I think it's good to mention somewhere (mathlib4 README, lake exe cache
output) that you need curl >= 7.66.0
to use the --parallel
flag. I'm probably not the only one with a 3+ year outdated curl.
Kevin Buzzard (Jan 05 2023 at 09:53):
I don't like that it says "Attempting to download 789 file(s)" and then just sits there, with me not knowing if it's broken or not.
Floris van Doorn (Jan 05 2023 at 09:53):
One further remark: if I (re)run lake exe cache get
when VSCode is open, I get an error on Windows.
Floris@MSI MINGW64 /d/projects/Mathlib4 ((cfd16dd8...))
$ lake exe cache get
warning: manifest out of date: package directory changed, use `lake update` to update
No files to download
Decompressing 789 file(s)
uncaught exception: lake-packages/std/build/lib/Std/Data/Array/Init/Lemmas.olean: Can't unlink already-existing object
tar: Error exit delayed from previous errors.
Kevin Buzzard (Jan 05 2023 at 09:55):
I still have the same problem on the other machine though (Ubuntu 20.04, curl 7.68.0)
buzzard@buster:~/lean-projects/mathlib4$ rm -rf .cache/
buzzard@buster:~/lean-projects/mathlib4$ lake exe cache get
Attempting to download 789 file(s)
Decompressing 789 file(s)
uncaught exception:
gzip: stdin: unexpected end of file
tar: Child returned status 1
tar: Error is not recoverable: exiting now
buzzard@buster:~/lean-projects/mathlib4$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
nothing to commit, working tree clean
buzzard@buster:~/lean-projects/mathlib4$ lake --version
Lake version 4.1.0-pre (Lean version 4.0.0-nightly-2023-01-04)
buzzard@buster:~/lean-projects/mathlib4$
Floris van Doorn (Jan 05 2023 at 10:04):
Just checking: git rev-parse HEAD
gives cfd16dd8874d0989ff38ccd8b4727b59f17683e7
? (i.e. you did pull on the other machine?)
Arthur Paulino (Jan 05 2023 at 10:29):
@Kevin Buzzard try lake exe cache get!
if you get an error like that. It looks like you have corrupted tar.gz files
Siddhartha Gadgil (Jan 05 2023 at 13:18):
While lake cache is really useful, I should report some weirdness in a project that has both a direct and transitive dependence on mathlib
. I can successfully build https://github.com/siddhartha-gadgil/proofs-and-programs-2023.git
git clone https://github.com/siddhartha-gadgil/proofs-and-programs-2023.git
cd proofs-and-programs-2023/
lake exe cache get
lake build
rm -rf build/
lake build
The weird thing is that the first time I try to build it complains about a missing file while building a dependency. If I delete the file and continue the build it resumes and builds fine.
My guess is that this is a hash collision.
Arthur Paulino (Jan 05 2023 at 13:23):
Does LeanAide build after cache get
?
Siddhartha Gadgil (Jan 05 2023 at 13:24):
Arthur Paulino said:
Does LeanAide build after
cache get
?
Yes, for LeanAide
cache get
is working perfectly
Arthur Paulino (Jan 05 2023 at 13:31):
Does it decompress all files it tries to download? (~790 files)
Kevin Buzzard (Jan 05 2023 at 13:41):
Arthur Paulino said:
Kevin Buzzard try
lake exe cache get!
if you get an error like that. It looks like you have corrupted tar.gz files
buzzard@buster:~/lean-projects/mathlib4$ git pull
Already up-to-date.
buzzard@buster:~/lean-projects/mathlib4$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
nothing to commit, working tree clean
buzzard@buster:~/lean-projects/mathlib4$ lake exe cache get!
Attempting to download 790 file(s)
Decompressing 790 file(s)
uncaught exception:
gzip: stdin: unexpected end of file
tar: Child returned status 1
tar: Error is not recoverable: exiting now
Siddhartha Gadgil (Jan 05 2023 at 13:49):
Arthur Paulino said:
Does it decompress all files it tries to download? (~790 files)
Yes, and I can even find in lake-packages
an .olean
file it claims is missing.
Siddhartha Gadgil (Jan 05 2023 at 13:50):
This could be some error in my intermediate dependency (there were some).
Arthur Paulino (Jan 05 2023 at 14:08):
@Kevin Buzzard I have a hunch that curl
isn't working as intended on your older machine. It could be tar
as well :thinking:
Kevin Buzzard (Jan 05 2023 at 14:58):
buzzard@buster:~/lean-projects$ git clone git@github.com:leanprover-community/mathlib4.git
Cloning into 'mathlib4'...
Warning: Permanently added the RSA host key for IP address '2a0c:5bc0:40:2fff::8c52:7904' to the list of known hosts.
remote: Enumerating objects: 21109, done.
remote: Counting objects: 100% (716/716), done.
remote: Compressing objects: 100% (327/327), done.
remote: Total 21109 (delta 448), reused 598 (delta 380), pack-reused 20393
Receiving objects: 100% (21109/21109), 6.75 MiB | 10.13 MiB/s, done.
Resolving deltas: 100% (14652/14652), done.
buzzard@buster:~/lean-projects$ cd mathlib4
buzzard@buster:~/lean-projects/mathlib4$ lake exe cache get!
info: cloning https://github.com/leanprover/std4 to ././lake-packages/std
info: cloning https://github.com/gebner/quote4 to ././lake-packages/Qq
info: cloning https://github.com/JLimperg/aesop to ././lake-packages/aesop
info: Building Cache.IO
info: Compiling Cache.IO
info: Building Cache.Hashing
info: Compiling Cache.Hashing
info: Building Cache.Requests
info: Compiling Cache.Requests
info: Building Cache.Main
info: Compiling Cache.Main
info: Linking cache
Attempting to download 790 file(s)
Decompressing 790 file(s)
uncaught exception:
gzip: stdin: unexpected end of file
tar: Child returned status 1
tar: Error is not recoverable: exiting now
buzzard@buster:~/lean-projects/mathlib4$ curl --version
curl 7.68.0 (x86_64-pc-linux-gnu) libcurl/7.68.0 OpenSSL/1.1.1f zlib/1.2.11 brotli/1.0.7 libidn2/2.2.0 libpsl/0.21.0 (+libidn2/2.2.0) libssh/0.9.3/openssl/zlib nghttp2/1.40.0 librtmp/2.3
Release-Date: 2020-01-08
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtmp rtsp scp sftp smb smbs smtp smtps telnet tftp
Features: AsynchDNS brotli GSS-API HTTP2 HTTPS-proxy IDN IPv6 Kerberos Largefile libz NTLM NTLM_WB PSL SPNEGO SSL TLS-SRP UnixSockets
buzzard@buster:~/lean-projects/mathlib4$ tar --version
tar (GNU tar) 1.30
Copyright © 2017 Free Software Foundation, Inc.
Licence GPLv3+: GNU GPL version 3 or later <https://gnu.org/licences/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by John Gilmore and Jay Fenlason.
buzzard@buster:~/lean-projects/mathlib4$
This is an Ubuntu 20.04 laptop. Has anyone else got it working on Ubuntu 20.04? The Ubuntu 22 machine I was using this morning was working fine.
Floris van Doorn (Jan 05 2023 at 15:27):
I also have Ubuntu 20.04, and I'm getting the same error indeed.
Floris van Doorn (Jan 05 2023 at 15:28):
I think curl
is indeed misbehaving, my .cache
folder seems to have a lot of empty files.
vandoorn@pc93-178:~/projects/mathlib4((HEAD detached at origin/master))$ ls .cache/ -al
total 73200
drwxrwxr-x 2 vandoorn vandoorn 36864 Jan 5 16:23 .
drwxrwxr-x 13 vandoorn vandoorn 4096 Jan 5 16:23 ..
-rw-rw-r-- 1 vandoorn vandoorn 502564 Jan 5 16:23 10004447746176569842.tar.gz
-rw-rw-r-- 1 vandoorn vandoorn 0 Jan 5 16:23 10015491352002644549.tar.gz
-rw-rw-r-- 1 vandoorn vandoorn 68157 Jan 5 16:23 10018821831230171317.tar.gz
-rw-rw-r-- 1 vandoorn vandoorn 0 Jan 5 16:23 1001945985573292759.tar.gz
-rw-rw-r-- 1 vandoorn vandoorn 0 Jan 5 16:23 10110306565386213522.tar.gz
-rw-rw-r-- 1 vandoorn vandoorn 174259 Jan 5 16:23 10127860650102912405.tar.gz
-rw-rw-r-- 1 vandoorn vandoorn 0 Jan 5 16:23 10128865545074468906.tar.gz
-rw-rw-r-- 1 vandoorn vandoorn 0 Jan 5 16:23 10130780366951701088.tar.gz
-rw-rw-r-- 1 vandoorn vandoorn 0 Jan 5 16:23 10152004466792651860.tar.gz
-rw-rw-r-- 1 vandoorn vandoorn 416066 Jan 5 16:23 10152297513229284087.tar.gz
-rw-rw-r-- 1 vandoorn vandoorn 25050 Jan 5 16:23 10181612834428050335.tar.gz
-rw-rw-r-- 1 vandoorn vandoorn 22944 Jan 5 16:23 10190268219082491724.tar.gz
-rw-rw-r-- 1 vandoorn vandoorn 85138 Jan 5 16:23 10213030320477203917.tar.gz
-rw-rw-r-- 1 vandoorn vandoorn 0 Jan 5 16:23 10214528561255169977.tar.gz
-rw-rw-r-- 1 vandoorn vandoorn 0 Jan 5 16:23 10221609186266928200.tar.gz
-rw-rw-r-- 1 vandoorn vandoorn 49324 Jan 5 16:23 10227971936698935467.tar.gz
-rw-rw-r-- 1 vandoorn vandoorn 169815 Jan 5 16:23 10237809332798837709.tar.gz
-rw-rw-r-- 1 vandoorn vandoorn 0 Jan 5 16:23 10329715650399075732.tar.gz
-rw-rw-r-- 1 vandoorn vandoorn 377804 Jan 5 16:23 10342692050553037471.tar.gz
-rw-rw-r-- 1 vandoorn vandoorn 0 Jan 5 16:23 10345492726296217978.tar.gz
-rw-rw-r-- 1 vandoorn vandoorn 329192 Jan 5 16:23 10358353886519116077.tar.gz
-rw-rw-r-- 1 vandoorn vandoorn 40472 Jan 5 16:23 10381171239261229408.tar.gz
-rw-rw-r-- 1 vandoorn vandoorn 200427 Jan 5 16:23 10388488594487384490.tar.gz
-rw-rw-r-- 1 vandoorn vandoorn 0 Jan 5 16:23 10394739949212461920.tar.gz
-rw-rw-r-- 1 vandoorn vandoorn 24468 Jan 5 16:23 10399526667226558710.tar.gz
-rw-rw-r-- 1 vandoorn vandoorn 0 Jan 5 16:23 10423936910949293364.tar.gz
[...]
Arthur Paulino (Jan 05 2023 at 15:31):
@Kevin Buzzard I just had that problem and then a get!
overwrote a file that wasn't downloaded properly.
What happens if you do this?
curl -X GET -f "https://lakecache.blob.core.windows.net/mathlib4/foo.bar" -o please.no
We expect curl
not to create that please.no
file on your file system
Kevin Buzzard (Jan 05 2023 at 15:33):
buzzard@buster:~/lean-projects/mathlib4/crap$ curl -X GET -f "https://lakecache.blob.core.windows.net/mathlib4/foo.bar" -o please.no
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
curl: (22) The requested URL returned error: 404 The specified blob does not exist.
buzzard@buster:~/lean-projects/mathlib4/crap$ ls
buzzard@buster:~/lean-projects/mathlib4/crap$
Kevin Buzzard (Jan 05 2023 at 18:04):
https://www.linuxcapable.com/how-to-install-upgrade-curl-on-ubuntu-20-04-lts/ fixed the problem for me, although I'm now running curl as modified by some random person on the internet
Eric Wieser (Jan 05 2023 at 21:42):
I get gzip: stdin: unexpected end of file
on gitpod (Ubuntu 20.04.5 LTS) too, with curl version
curl 7.68.0 (x86_64-pc-linux-gnu) libcurl/7.68.0 OpenSSL/1.1.1f zlib/1.2.11 brotli/1.0.7 libidn2/2.2.0 libpsl/0.21.0 (+libidn2/2.2.0) libssh/0.9.3/openssl/zlib nghttp2/1.40.0 librtmp/2.3
Release-Date: 2020-01-08
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtmp rtsp scp sftp smb smbs smtp smtps telnet tftp
Features: AsynchDNS brotli GSS-API HTTP2 HTTPS-proxy IDN IPv6 Kerberos Largefile libz NTLM NTLM_WB PSL SPNEGO SSL TLS-SRP UnixSockets
Eric Wieser (Jan 05 2023 at 21:45):
Following the instructions from the random person on the internet, I get the same error with the new version,
$ curl --version
curl 7.87.0 (x86_64-pc-linux-gnu) libcurl/7.87.0 OpenSSL/1.1.1f zlib/1.2.11 brotli/1.0.7 zstd/1.4.4 libidn2/2.2.0 libpsl/0.21.0 (+libidn2/2.2.0) libssh/0.9.3/openssl/zlib nghttp2/1.40.0 librtmp/2.3
Release-Date: 2022-12-21
Protocols: dict file ftp ftps gopher gophers http https imap imaps ldap ldaps mqtt pop3 pop3s rtmp rtsp scp sftp smb smbs smtp smtps telnet tftp
Features: alt-svc AsynchDNS brotli GSS-API HSTS HTTP2 HTTPS-proxy IPv6 Kerberos Largefile libz NTLM NTLM_WB PSL SPNEGO SSL threadsafe TLS-SRP UnixSockets zstd
Arthur Paulino (Jan 05 2023 at 21:48):
Have you tried get!
to overwrite corrupted downloads?
Eric Wieser (Jan 05 2023 at 21:49):
Yes, it was get!
that was failing
Eric Wieser (Jan 05 2023 at 21:49):
You should be able to debug this yourself if you want by opening gitpod with f1accc07b7ea20968ebf0d0d718e40d2a4528d7e checked out
Arthur Paulino (Jan 05 2023 at 21:59):
How do I access that?
Eric Wieser (Jan 05 2023 at 22:06):
gitpod.io/#https://github.com/leanprover-community/mathlib4, then git fetch origin $THAT_SHA && git checkout $THAT_SHA
Eric Wieser (Jan 05 2023 at 22:07):
You'll need to install the lean4 extension, open a lean file, then accept the popup to install elan
Arthur Paulino (Jan 05 2023 at 22:18):
Alright, I'll try to debug it later today. In the meantime you might want to try get!
again
Scott Morrison (Jan 05 2023 at 22:56):
Is this working for others on a mac? I'm just seeing:
% lake exe cache get
Attempting to download 797 file(s)
No cache files to decompress
returning instantly, and then lake build
compiles everything as normal.
Arthur Paulino (Jan 05 2023 at 23:03):
Hmm, more hash mismatches
Scott Morrison (Jan 05 2023 at 23:18):
Separately, could we make the cache commit
command more robust to build failures? E.g. see https://github.com/leanprover-community/mathlib4/actions/runs/3851007808/jobs/6561764041#step:8:9
Here CI built lots of files, but one failed, but then cache commit didn't upload anything at all.
Arien Malec (Jan 05 2023 at 23:23):
Scott Morrison said:
Is this working for others on a mac? I'm just seeing:
% lake exe cache get Attempting to download 797 file(s) No cache files to decompress
returning instantly, and then
lake build
compiles everything as normal.
I get cache downloads but am seeing separate issues
Gabriel Ebner (Jan 05 2023 at 23:26):
Separately, could we make the
cache commit
command more robust to build failures?
Arthur Paulino (Jan 05 2023 at 23:35):
But this has nothing to do with commit
right? We don't want to persist a commit with a file that didn't compile
Gabriel Ebner (Jan 05 2023 at 23:36):
Yeah, I think Scott just said commit
because it's the first command shown by github actions. While it's the second put
one that fails.
Arthur Paulino (Jan 05 2023 at 23:38):
Alright, this one should be a simple tweak
Arthur Paulino (Jan 05 2023 at 23:38):
@Eric Wieser I'm afraid I can't do much on that gitpod that doesn't decompress :sad:
Eric Wieser (Jan 05 2023 at 23:59):
What do you mean "that doesn't decompress"?
Eric Wieser (Jan 06 2023 at 00:00):
Do you mean the cache is just bad on that commit?
Arthur Paulino (Jan 06 2023 at 00:00):
No, I just mean that it's failing to decompress
Eric Wieser (Jan 06 2023 at 00:01):
I don't understand what "it" refers to
Arthur Paulino (Jan 06 2023 at 00:01):
The execution of the program on your gitpod
Eric Wieser (Jan 06 2023 at 00:02):
That's not "my" gitpod, that's a fresh gitpod instance spun up each time anyone clicks that link
Eric Wieser (Jan 06 2023 at 00:02):
Do you mean the docker image never started up and you never got to vscode?
Arthur Paulino (Jan 06 2023 at 00:03):
No, I just mean that I don't know what to do with files that may be arriving corrupted on that machine
Gabriel Ebner (Jan 06 2023 at 00:07):
FWIW, I just tried it out on gitpod and all files in .cache
were empty.
Eric Wieser (Jan 06 2023 at 00:11):
Arthur Paulino said:
No, I just mean that I don't know what to do with files that may be arriving corrupted on that machine
Oh, 'the program' refers to lake exe cache
; I thought you were referring to gitpod itself as "the program"!
Arthur Paulino (Jan 06 2023 at 00:57):
Arthur Paulino (Jan 06 2023 at 01:19):
About Mac issues... it's really hard for me to debug :dizzy:
Arthur Paulino (Jan 06 2023 at 01:20):
I don't have a computer with MacOS installed
Scott Morrison (Jan 06 2023 at 01:41):
@Arthur Paulino, I can give you shell access here if you promise to be nice. :-)
Arthur Paulino (Jan 06 2023 at 01:52):
I would need an instance of VS Code running. I'm not hardcore enough for emacs/neovim
Arthur Paulino (Jan 06 2023 at 01:55):
I'm up for a pair programming session though, so we can try to figure out why the hashes aren't matching on your machine
Arthur Paulino (Jan 06 2023 at 01:55):
Let me know if you have some free time tomorrow
Scott Morrison (Jan 06 2023 at 02:09):
I'm on Canberra time, which can be hard to coordinate with. I'm relatively flexible for the next 10 hours.
Scott Morrison (Jan 06 2023 at 02:10):
Separately, what do we think about giving out cache put
privileges to a few people. (In particular, me. :-) I have a big machine, and when I'm in reviewing mode, I do a lot of local compilation. Seems a shame to waste it.)
Scott Morrison (Jan 06 2023 at 02:12):
Very happy to try to diagnose the mac issues via pair programming, but you can in fact run a remote session in VS Code with nothing more than ssh access to the host. I do this all the time when I'm away from my desktop machine.
Scott Morrison (Jan 06 2023 at 02:12):
In particular, the terminal window in VSCode is just a shell on the remote machine.
Gabriel Ebner (Jan 06 2023 at 02:13):
I think lake exe cache
is way too flaky for that at the moment. For example it doesn't know if oleans are up to date. If you do echo this wont compile >> Mathlib/Algebra/Abs.lean
, then lake exe cache put
will upload the old oleans for the modified file.
Arthur Paulino (Jan 06 2023 at 02:33):
@Scott Morrison the advantage of pair programming is that I can run the hashing algorithm on my machine and you can run on yours and we have an optimized/parallelized way to identify the source of divergence
Scott Morrison (Jan 06 2023 at 02:33):
Sounds good. (That can also work with two VSCode windows, one local and one a remote session to my machine.)
Scott Morrison (Jan 06 2023 at 02:33):
I'm free now if it suits.
Arthur Paulino (Jan 06 2023 at 02:35):
It's almost midnight now... I need to sleep. Maybe I can just try the remote access tomorrow. We can set it up later
Xavier Roblot (Jan 06 2023 at 05:11):
Scott Morrison said:
Is this working for others on a mac? I'm just seeing:
% lake exe cache get Attempting to download 797 file(s) No cache files to decompress
returning instantly, and then
lake build
compiles everything as normal.
lake exe cache get
works perfectly on my Mac (M1 with OS 13.1 using MacPorts).
Yaël Dillies (Jan 06 2023 at 09:53):
Scott Morrison said:
Separately, what do we think about giving out
cache put
privileges to a few people.
I see two obvious options:
- allow all maintainers
- allow all reviewers
Certainly at this point in time, maintainers should be enough. But once mathlib4 has grown we will probably want reviewers to be in the loop as well.
Arthur Paulino (Jan 06 2023 at 11:06):
Gabriel Ebner said:
I think
lake exe cache
is way too flaky for that at the moment. For example it doesn't know if oleans are up to date. If you doecho this wont compile >> Mathlib/Algebra/Abs.lean
, thenlake exe cache put
will upload the old oleans for the modified file.
That's true but it and any file that, transitively or not, depends on it, will have some useless hash that should never match with anything anyone will download
Arthur Paulino (Jan 06 2023 at 11:09):
About put
privileges, just to say it out loud, I'm not the one who decides that. That's with the maintainers. I only have a key for dev purposes, which will expire in a few days
Arthur Paulino (Jan 06 2023 at 12:05):
@Scott Morrison cache
is computing the hashes correctly on your machine but cache get
was failing too fast (not even trying to download). So I checked curl --version
and it said "7.64.something", which is too old. It's exiting with an error because it can't do --parallel
Yaël Dillies (Jan 06 2023 at 12:42):
I too get Can't unlink already-existing object
on Windows from VScode.
Arthur Paulino (Jan 06 2023 at 12:50):
That looks like some IO error from tar
Arthur Paulino (Jan 06 2023 at 12:52):
Are you doing cache get
while you can see orange bars of compilation? The orange bars mean that olean
files might be in use and tar
won't have the permission to overwrite them
Yaël Dillies (Jan 06 2023 at 13:07):
Ah I see. Is there a Lean 4 equivalent to Lean: Check nothing
?
Yaël Dillies (Jan 06 2023 at 13:09):
Closing all files seems to solve it.
Arthur Paulino (Jan 06 2023 at 13:23):
Yaël Dillies said:
Ah I see. Is there a Lean 4 equivalent to
Lean: Check nothing
?
I don't know
Arthur Paulino (Jan 06 2023 at 13:24):
mathlib4#1383 implements a validation on the version of curl
It also removes an unnecessary workaround because of a bug that was fixed in core
Johan Commelin (Jan 06 2023 at 16:50):
Johan Commelin said:
How far are we from
lake get-cache
? Switching branches and runninglake build
is quite time consuming.
Since I started this thread, a lot has happened! @Arthur Paulino thank you so much for all the time and energy you've poured into this! It works flawlessly on my side. Great work! :octopus: :tada:
Arthur Paulino (Jan 06 2023 at 16:53):
Don't forget to lake exe cache clean
once in a while or your .cache
folder will just grow and grow
Yaël Dillies (Jan 06 2023 at 17:31):
Could we have a loading bar when lake
downloads the files? It's a bit dead in the chat terminal.
Yaël Dillies (Jan 06 2023 at 17:32):
Also, this Can't unlink already-existing object
is quite annoying as the only solution I found so far is closing all Lean files.
Arthur Paulino (Jan 06 2023 at 17:40):
I can try the bar later, but tar
won't trample a lock that your OS does
Arthur Paulino (Jan 06 2023 at 17:41):
If you don't want to lose your tabs, you can quit VS Code, do the caching thing and then open VS Code again
Arthur Paulino (Jan 06 2023 at 17:47):
Another thing you can try is pausing the Lean server in the top right corner of the infoview. Maybe that will unlock files that are being read so tar
will have the permission to overwrite them
Arthur Paulino (Jan 06 2023 at 17:55):
Please let ppl know if this works:pray:
Gabriel Ebner (Jan 06 2023 at 18:02):
Arthur Paulino said:
Gabriel Ebner said:
I think
lake exe cache
is way too flaky for that at the moment. For example it doesn't know if oleans are up to date. If you doecho this wont compile >> Mathlib/Algebra/Abs.lean
, thenlake exe cache put
will upload the old oleans for the modified file.That's true but it and any file that, transitively or not, depends on it, will have some useless hash that should never match with anything anyone will download
Other people will download it if you commit it. This can happen very easily, if you modify a file and then run lake exe cache put
without running lake build
first.
Sebastian Ullrich (Jan 06 2023 at 18:04):
Deleting an open file is, it turns out, not quite straightforward on Windows: https://github.com/leanprover/lean4/blob/fedf235cba35ed8bf6bf571cf38e6d8536b904ac/src/library/module.cpp#L85-L89. I'm not surprised tar
doesn't know about that. It would probably be easier to remove those files manually before invoking tar
, though that would require a new runtime function implementing the linked code.
Sebastian Ullrich (Jan 06 2023 at 18:06):
Or rather we should just make IO.FS.removeFile
do it
Arthur Paulino (Jan 06 2023 at 18:19):
@Sebastian Ullrich please let me know if/when you implement it so I can upgrade cache
Arien Malec (Jan 06 2023 at 19:02):
On MacOS 13.0.1:
lake exe cache clean!
arienmalec@Ariens-MBP-2 mwe % lake exe cache get
Attempting to download 804 file(s)
Decompressing 804 file(s)
arienmalec@Ariens-MBP-2 mwe % ls ./build/lib
Mathlib.ilean Mathlib.olean Mathlib.trace Mwe.ilean
object file './build/lib/Mathlib/Data/Sum/Basic.olean' of module Mathlib.Data.Sum.Basic does not exist
Arthur Paulino (Jan 06 2023 at 19:25):
Is that a project that imports mathlib?
Arthur Paulino (Jan 06 2023 at 19:33):
I have very little bandwidth to fully support cache
for projects that import Mathlib right now, sorry. I wanna focus on the support for people doing the port. Help is very much welcome though, and I'd be willing to answer questions and review PRs if someone wants to debug and fix cache
for that use case
Arien Malec (Jan 06 2023 at 19:37):
Gotcha -- yes, was trying to build an MWE space that used Mathlib
as a client.
Arthur Paulino (Jan 06 2023 at 19:39):
But FWIW, if your Lake project is properly set to use mathlib as a dependency, cache
should extract the olean files to ./lake-packages/...
, not ./build/...
. And you shouldn't be seeing an error message of a missed olean file from mathlib that is expected to be in ./build
. It should be in ./lake-packages/mathlib/build/...
Arien Malec (Jan 06 2023 at 19:43):
Then I assume there's some issue with lack of lake-packages
awareness here.
Arthur Paulino (Jan 06 2023 at 19:44):
There's probably something wrong with your lakefile.lean
. The #lean4 stream is the ideal place do get help about that
Arien Malec (Jan 06 2023 at 19:47):
Unlikely, since I wiped and recreated from scratch. But again, if using mathlib as a client isn't a current concern, no issue right now.
Yaël Dillies (Jan 07 2023 at 00:30):
Feature request: Now that we download files one by one, could we ask to get cache for a specific file?
Typically I don't need most files to work on a given pull request and it's a waste to download them.
Yaël Dillies (Jan 07 2023 at 00:32):
Arthur Paulino said:
Another thing you can try is pausing the Lean server in the top right corner of the infoview. Maybe that will unlock files that are being read so
tar
will have the permission to overwrite them
Doesn't work :(
Arthur Paulino (Jan 07 2023 at 08:02):
Yaël Dillies said:
Feature request: Now that we download files one by one, could we ask to get cache for a specific file?
Typically I don't need most files to work on a given pull request and it's a waste to download them.
Nice, I actually thought about that in the very beginning of the development. And now that the code is better structured, it's actually easy to implement it
Arthur Paulino (Jan 07 2023 at 08:20):
Do ilean
or olean
files keep track of the original paths of their respective Lean sources? I suspect this is probably why cache get
isn't working for projects that import mathlib, since the sources for mathlib files will live inside .lake-packages/mathlib
instead of .
. That would invalidate the olean/ilean files
Siddhartha Gadgil (Jan 07 2023 at 08:44):
It is actually mostly working, with an occasional failure as I reported earlier.
Eric Wieser (Jan 07 2023 at 23:55):
Gabriel Ebner said:
FWIW, I just tried it out on gitpod and all files in
.cache
were empty.
I now get only about half of the files being empty
Eric Wieser (Jan 07 2023 at 23:57):
(see https://github.com/leanprover-community/mathlib4/pull/1419 for an easy way to startup gitpod to test this)
Arthur Paulino (Jan 08 2023 at 23:02):
Yaël Dillies said:
Feature request: Now that we download files one by one, could we ask to get cache for a specific file?
Typically I don't need most files to work on a given pull request and it's a waste to download them.
Done in mathlib#1430 :+1:
Ruben Van de Velde (Jan 08 2023 at 23:05):
Arthur Paulino (Jan 08 2023 at 23:15):
I made it more powerful than what was requested to avoid multiple cache get
commands. Instead, you can request the download of cache files for lists of entire directories
Yaël Dillies (Jan 08 2023 at 23:17):
This is great. I had this in mind too, and then I thought it might be too much to ask :laughing:
Yaël Dillies (Jan 08 2023 at 23:18):
One improvement that would save me a lot of time is if there was an option to tell it to figure out which files to download based on the diff with master
. But that would be mathlib-specific, so I'm not sure how it would work.
Mario Carneiro (Jan 08 2023 at 23:22):
Doesn't it already only download the files that you don't have?
Mario Carneiro (Jan 08 2023 at 23:23):
so if you have been to master
recently then you will have cache files for it and any that are shared with the branch will be used
Yaël Dillies (Jan 08 2023 at 23:24):
Yes, but if I am working on two level files A
and B
which don't import each other, then I would like to spare myself writing lake exe cache get A B
because A
and B
will likely be longish names and I will need to get cache maybe 10 times in a day.
Mario Carneiro (Jan 08 2023 at 23:25):
you should get fish shell or something else with good command history
Yaël Dillies (Jan 08 2023 at 23:26):
Ehrm, Gitpod...
Arthur Paulino (Jan 08 2023 at 23:27):
I don't understand the idea. lake exe cache get
on master
should get you everything you don't have yet
Yaël Dillies (Jan 08 2023 at 23:31):
Yeah, and that's too much. I don't want to download the bazillion files that depend on A
or B
.
Arthur Paulino (Jan 08 2023 at 23:32):
Then lake exe cache get path/to/A path/to/B
Yaël Dillies (Jan 08 2023 at 23:33):
Yeah, which is slightly too long for me to type it out every time.
Yaël Dillies (Jan 08 2023 at 23:34):
Hence why it would be great if lake
could detect that only A
and B
were touched from master.
Arthur Paulino (Jan 08 2023 at 23:34):
Use tab completion (that's why .lean
is part of the file name, as opposed to what you suggested on the PR)
Arthur Paulino (Jan 08 2023 at 23:35):
Or just do lake exe cache get *A.lean *B.lean
Arthur Paulino (Jan 08 2023 at 23:37):
If A
is too generic and has a long path to it, like Some/Really/Big/Path/A.lean
, then do lake exe cache get *Path/A.lean
Arthur Paulino (Jan 08 2023 at 23:39):
I don't think the complexity you're asking for (parsing git diffs) is worth the effort here. Most people will just do history search on their terminals, when they're not just doing lake exe cache get
Yaël Dillies (Jan 08 2023 at 23:39):
Yes probably
Kevin Buzzard (Jan 08 2023 at 23:40):
This is just ridiculous. The time taken to type path/to/A is surely comparable to the time taken to just download the entire cache! The brilliant cache system means I don't have to wait ten minutes but surely we can wait ten seconds!
Yaël Dillies (Jan 08 2023 at 23:41):
Not if you're downloading it from my student room :grinning_face_with_smiling_eyes:
Kevin Buzzard (Jan 08 2023 at 23:41):
:-/
Arthur Paulino (Jan 08 2023 at 23:43):
The feature is also an attempt to circumvent the flakiness of curl -X GET
with less powerful connections, so that was another motivation
Yakov Pechersky (Jan 09 2023 at 01:22):
Another option is lake exe cache get **/*A.lean
to recursively go down. Or lake exe cache get $(find . -name "A.lean")
or $(find . -name "*A.lean")
, depending on your use case. Use the shell to help you! I think it's too much to ask cli utils to replicate all the file directory walking that other more basic utilities do.
Yakov Pechersky (Jan 09 2023 at 01:24):
If you care only (exactly) the files that have been modified wrt to master
, then $(git diff --name-only master --diff-filter=M)
.
Eric Wieser (Jan 09 2023 at 01:34):
Are the semantics of lake exe cache get $fname
"get the olean for $fname
only" or "get all transitively-imported oleans needed for $fname
"?
Eric Wieser (Jan 09 2023 at 01:38):
And isn't lake exe cache get *A.lean
expanded by the shell on Unix? If I do something sneaky like put a *
in a filename, your code is going to double-expand it
Mario Carneiro (Jan 09 2023 at 01:43):
Eric Wieser said:
If I do something sneaky like put a
*
in a filename, your code is going to double-expand it
please don't do that
Eric Wieser (Jan 09 2023 at 01:47):
Lean4 still allows anything you like as a module name as long as you import with \f<<>>
like lean 3 did, right? (I still agree that we certainly shouldn't do it in mathlib)
Mario Carneiro (Jan 09 2023 at 01:49):
Yes. Note that even Clear!.lean
caused some problems even though !
is an identifier character in lean 4. I think it's best to be conservative with filename characters for portability given how many ways people manipulate filenames
Mario Carneiro (Jan 09 2023 at 01:50):
shell scripts are quite often not hardened against this kind of chicanery
Gabriel Ebner (Jan 09 2023 at 01:50):
Eric Wieser said:
And isn't
lake exe cache get *A.lean
expanded by the shell on Unix? If I do something sneaky like put a*
in a filename, your code is going to double-expand it
Some shells also give an error if the wildcard can't be expanded. (I believe there's even a bash option for this.)
Eric Wieser (Jan 09 2023 at 01:50):
I assume @Arthur Paulino is a windows user
Eric Wieser (Jan 09 2023 at 01:51):
Does lean have any way to query "am I running on windows"?
Eric Wieser (Jan 09 2023 at 01:51):
Because based on this comment windows programs are expected to parse their own globs, but unix programs are not
Mario Carneiro (Jan 09 2023 at 01:51):
docs4#System.Platform.isWindows
Gabriel Ebner (Jan 09 2023 at 01:52):
windows programs are expected to parse their own globs
I believe windows programs are also expected to parse quotation marks. I thought both quotation marks and wildcard expansion were already done by libc though?
Eric Wieser (Jan 09 2023 at 01:53):
What does libc refer to on windows?
Mario Carneiro (Jan 09 2023 at 01:53):
windows has a libc
Eric Wieser (Jan 09 2023 at 01:54):
Windows has MSVCRT, doesn't it? Wouldn't libc only be around if you use mingw?
Arthur Paulino (Jan 09 2023 at 01:55):
I'm a linux user btw. Should I just use a different character then? Maybe %
?
Mario Carneiro (Jan 09 2023 at 01:55):
https://learn.microsoft.com/en-us/cpp/c-runtime-library/c-run-time-library-reference?view=msvc-170
Eric Wieser (Jan 09 2023 at 01:56):
Arthur Paulino said:
I'm a linux user btw. Should I just use a different character then? Maybe
%
?
Do your patterns have different semantics to the shell ones?
Eric Wieser (Jan 09 2023 at 01:56):
Because if they don't, then you probably don't need to implement them at all
Mario Carneiro (Jan 09 2023 at 01:56):
libc here meaning the library windows provides for making standards-conformant C programs (i.e. the ones with a main()
function instead of WinMain()
) run
Arthur Paulino (Jan 09 2023 at 01:56):
Eric Wieser said:
Because if they don't, then you probably don't need to implement them at all
Nope... then I can just match file names as they are?
Arthur Paulino (Jan 09 2023 at 01:57):
I mean, exact matches
Mario Carneiro (Jan 09 2023 at 01:57):
yes, on linux if you call foo *.lean
the shell will actually call foo A.lean B.lean
if those are the files in the directory
Eric Wieser (Jan 09 2023 at 01:57):
Gabriel Ebner said:
windows programs are expected to parse their own globs
I believe windows programs are also expected to parse quotation marks. I thought both quotation marks and wildcard expansion were already done by libc though?
By default, wildcards aren't expanded in command-line arguments. I don't know if Lean 4 is compiled with the opt-in parsing.
Arthur Paulino (Jan 09 2023 at 01:58):
*SomeFile.lean
did not work on my shell ootb btw. I relied on my match
Eric Wieser (Jan 09 2023 at 01:58):
What is the full path that you wanted*SomeFile.lean
to match?
Mario Carneiro (Jan 09 2023 at 02:00):
*SomeFile.lean
will match FooSomeFile.lean
in the same directory, but not Foo/SomeFile.lean
or Foo/BarSomeFile.lean
Mario Carneiro (Jan 09 2023 at 02:00):
you need to use **/*SomeFile.lean
to match the latter
Arthur Paulino (Jan 09 2023 at 02:01):
Ah, got it
Eric Wieser (Jan 09 2023 at 02:02):
You might need to change your shell settings to enable that, it's not on by default
Arthur Paulino (Jan 09 2023 at 02:02):
I will remove the custom matches then and use exact matches
Arthur Paulino (Jan 09 2023 at 02:13):
PR updated: mathlib4#1430
Arthur Paulino (Jan 09 2023 at 02:16):
I tried using curl
progress bar but it prints a lot of trash when trying to download files that don't exist on the server
Arthur Paulino (Jan 09 2023 at 02:17):
So I guess no progress bar until we have a Lean implementation of a proper lib for HTTP requests
Siddhartha Gadgil (Jan 09 2023 at 02:22):
This is far from anything I am competent with, but as a bridge would it be good to have a Lean ffi wrapper around libcurl instead of calling the different shells with curl?
Arthur Paulino (Jan 09 2023 at 02:26):
I believe someday we will have that. I definitely don't have time for it right now
Siddhartha Gadgil (Jan 09 2023 at 02:51):
I had considered this for LeanAide
but I am still unfamiliar with ffi (and in general unskilled in C/system stuff). But hopefully someone with the time and expertise will create a minimal lean-curl
which can then be expanded to more of the API.
Eric Wieser (Jan 09 2023 at 09:19):
I suspect distributing lake exe cache
with an appropriate libcurl might be rather annoying, especially on windows
Yaël Dillies (Jan 10 2023 at 08:30):
How hard would it be to have a button in the infoview that runs lake exe cache get X
when file X
is open?
Johan Commelin (Jan 10 2023 at 08:32):
Or Ctrl-Shift-P: Lake ...
something something?
Yaël Dillies (Jan 10 2023 at 08:32):
Yeah, that would be good too, although probably less discoverable.
Arthur Paulino (Jan 10 2023 at 08:34):
Yaël Dillies said:
How hard would it be to have a button in the infoview that runs
lake exe cache get X
when fileX
is open?
That won't work without the upgrade on IO.removeFile
to remove files being read
Yaël Dillies (Jan 10 2023 at 08:35):
Did that not already happen? I just ran lake exe cache get
with Algebra.Order.AbsoluteValue
and nothing complained about linked files.
Arthur Paulino (Jan 10 2023 at 08:36):
Nope... well, I haven't touched it.
Also, cache
is an application that lives in mathlib4
and the infoview is not mathlib4 specific
Yaël Dillies (Jan 10 2023 at 08:37):
That's not a problem. We can and do override infoview functions, or at least we could in Lean 3.
Yaël Dillies (Jan 10 2023 at 08:38):
See file#tactic/interactive_expr
Arthur Paulino (Jan 10 2023 at 08:38):
- But what would that do?
Arthur Paulino (Jan 10 2023 at 08:39):
Is it a button that only appears when you're in mathlib?
Yaël Dillies (Jan 10 2023 at 08:39):
Yes, that's how it would look like.
Arthur Paulino (Jan 10 2023 at 08:40):
So that's two things: I haven't touched the code that overwrites oleans being read and I don't know about these repo-specific buttons
Yaël Dillies (Jan 10 2023 at 08:40):
If you've ever tried to open core files, you will see that the infoview looks different to the one you get in mathlib. And that difference come from the two lines here: https://github.com/leanprover-community/mathlib/blob/master/src/tactic/interactive_expr.lean#L496-L497
Arthur Paulino (Jan 10 2023 at 08:42):
Arthur Paulino said:
So that's two things: I haven't touched the code that overwrites oleans being read and I don't know about these repo-specific buttons
But if someone else knows how to get around these two points, please go ahead
Yaël Dillies (Jan 10 2023 at 08:43):
I know how to do the second one. Indeed I had to touch the machinery myself in #15959. I have no idea about the first, though.
Arthur Paulino (Jan 10 2023 at 08:43):
For the former, we need an upgraded version of IO.removeFile
that Sebastian said should be implemented in core. I don't know the status of that
Arthur Paulino (Jan 10 2023 at 14:33):
:ping_pong: mathlib4#1430
Arthur Paulino (Jan 10 2023 at 23:25):
This one is a typo that @Yaël Dillies noticed. The PR is very easy to review:
mathlib4#1472
Yaël Dillies (Jan 12 2023 at 22:52):
@Arthur Paulino, whatever I try, I still cannot get lake exe cache get some_file
to work.
Yaël Dillies (Jan 12 2023 at 22:53):
$ lake exe cache get Mathlib/Data/Rat/*.lean
warning: manifest out of date: package directory changed, use `lake update` to update
uncaught exception: No match for Mathlib/Data/Rat/Basic.lean
Yaël Dillies (Jan 12 2023 at 22:53):
Is this a Windows bug, maybe?
Arthur Paulino (Jan 12 2023 at 22:54):
Aren't Windows paths build with \
instead of /
?
Yaël Dillies (Jan 12 2023 at 22:54):
As in, your match is expanded with /
, then it fails because Windows expects \
Yaël Dillies (Jan 12 2023 at 22:55):
Ah so your program isn't doing anything about it!
Yaël Dillies (Jan 12 2023 at 22:56):
It is unexpected that the exact syntax from the help menu can't be used because it is OS-dependent.
Yaël Dillies (Jan 12 2023 at 22:57):
lake exe cache get Mathlib\\Data\\Rat\\*.lean
does work, but that's unergonomic.
Arthur Paulino (Jan 12 2023 at 22:58):
Sorry, the paths in the hashmap are built with the separators that the current OS uses. Can't you at least use tab completion on Windows?
Yaël Dillies (Jan 12 2023 at 22:59):
Not sure how that works
Yaël Dillies (Jan 12 2023 at 22:59):
Actually, it seems like the tab completion only works with /
:face_palm:
Arthur Paulino (Jan 12 2023 at 23:03):
What the... :sad:
Yaël Dillies (Jan 12 2023 at 23:09):
Can your command just ignore Windows' weirdness and build the hasmap from /
?
Arthur Paulino (Jan 12 2023 at 23:11):
It would be a bit of a nightmare to implement that. I'm using System.FilePath
API to handle that for me
Yaël Dillies (Jan 12 2023 at 23:12):
/me is :face_with_symbols_on_mouth: Windows
Arthur Paulino (Jan 12 2023 at 23:14):
Even if I do replace /
for \\
on Windows on the arguments of cache get
, I'm not sure if automatic glob expansion works with /
on Windows
Arthur Paulino (Jan 12 2023 at 23:14):
@Yaël Dillies do you know how to test that?
Yaël Dillies (Jan 12 2023 at 23:15):
I'm afraid I am a Windows power-sufferer, not a Windows power-user.
Yaël Dillies (Jan 12 2023 at 23:15):
What is automatic glob expansion?
Arthur Paulino (Jan 12 2023 at 23:17):
Have you ever programmed something that reads arguments from the shell input?
Yaël Dillies (Jan 12 2023 at 23:18):
No, never
Arthur Paulino (Jan 12 2023 at 23:20):
Suppose we have
def main (args : List String) : IO Unit := do
IO.println args
that compiles to some binary ./build/bin/foo
. Then, with automatic glob expansion, if you call ./build/bin/foo *.txt
, your program will print every filepath that ends with .txt
in the current folder
Henrik Böving (Jan 12 2023 at 23:21):
Note that this only happens in a shell context
Henrik Böving (Jan 12 2023 at 23:22):
This is not the actual behaviour of the system calls that perform the call, your shell does the glob expansion.
Arthur Paulino (Jan 12 2023 at 23:22):
That's something the shell does for you. The source code of the program does not contain any logic to do that search
Sky Wilshaw (Jan 12 2023 at 23:26):
image.png It does work on Windows - at least in Powershell.
Sky Wilshaw (Jan 12 2023 at 23:26):
Unless cat
is doing some processing itself.
Arthur Paulino (Jan 12 2023 at 23:27):
Yeah can you test with a program that you write yourself please?
Arthur Paulino (Jan 12 2023 at 23:27):
It can be this simple:
def main (args : List String) : IO Unit := do
IO.println args
Arthur Paulino (Jan 12 2023 at 23:28):
If that works, I can do the upgrade tomorrow
Sky Wilshaw (Jan 12 2023 at 23:28):
I don't have easy access to a Lean executable at the minute but...
image.png
Sky Wilshaw (Jan 12 2023 at 23:29):
which is a bit disappointing.
Arthur Paulino (Jan 12 2023 at 23:31):
Well well well, I guess I can do a custom processing in get
/get!
parameters on Windows and do the match manually
Arthur Paulino (Jan 12 2023 at 23:32):
But it's not going to be as powerful as native glob expansions. I can handle one wildcard, at most. I had this logic implemented on a previous version
Eric Wieser (Jan 12 2023 at 23:38):
You shouldn't implement this in lake exe cache
Eric Wieser (Jan 12 2023 at 23:38):
If this isn't working, we need to set a compilation flag for lean somewhere
Eric Wieser (Jan 12 2023 at 23:39):
See this message
Sky Wilshaw (Jan 12 2023 at 23:40):
Oh wow, I love Windows' idiosyncrasies :/
Arthur Paulino (Jan 12 2023 at 23:42):
@Eric Wieser I don't understand it. Then it should be a matter of setting that flag in the compilation of Lean and everything should work ootb?
Eric Wieser (Jan 12 2023 at 23:47):
Yes, that would be my hope
Eric Wieser (Jan 12 2023 at 23:48):
In the meantime, windows users can just use a different shell
Eric Wieser (Jan 12 2023 at 23:48):
I would expect that most Lean users have git bash installed
Eric Wieser (Jan 12 2023 at 23:48):
Powershell has its own version of glob too
Eric Wieser (Jan 12 2023 at 23:49):
Arthur Paulino said:
Sorry, the paths in the hashmap are built with the separators that the current OS uses. Can't you at least use tab completion on Windows?
You should normalize the paths to use /
before using them in the hash map; otherwise won't your hash computation be wrong?
Arthur Paulino (Jan 12 2023 at 23:51):
If I did normalization, I would have to "denormalize" everytime i'd use it for pointing to files. That would be a mess. I don't hash the path per se. I hash its components (System.FilePath.components : FilePath -> List String
)
Arthur Paulino (Jan 12 2023 at 23:52):
Arthur Paulino (Jan 12 2023 at 23:55):
Eric Wieser said:
I would expect that most Lean users have git bash installed
@Yaël Dillies do you want to test git bash? :smile:
Eric Wieser (Jan 13 2023 at 00:03):
If I did normalization, I would have to "denormalize" everytime i'd use it for pointing to files
Windows should do this for you
Eric Wieser (Jan 13 2023 at 00:04):
Also, by normalize I also mean deal with paths like Mathlib/../Mathlib/../Data/Nat/../Nat/Basic.lean
Eric Wieser (Jan 13 2023 at 00:14):
Arthur Paulino said:
Eric Wieser I don't understand it. Then it should be a matter of setting that flag in the compilation of Lean and everything should work ootb?
Ah nevermind, Lean is built with msys2
(which expects you to run in a gnu runtime like git bash) not msvc
(for which the link above is relevant)
Eric Wieser (Jan 13 2023 at 00:14):
Any attempt to fix shell globbing within lake exe cache
itself will break people in shells that already handle this themselves.
Arthur Paulino (Jan 13 2023 at 00:37):
Eric Wieser said:
Any attempt to fix shell globbing within
lake exe cache
itself will break people in shells that already handle this themselves.
I disagree. We can process paths that get into the program and that still contain *
. That means the shell didn't do anything for the user. That's what I had in mind in my previous implementation
Arthur Paulino (Jan 13 2023 at 00:43):
@Eric Wieser @Gabriel Ebner if any of you give me a thumbs up, I can recover that customized post-processing behavior (and further replace /
s by \\
s on Windows in case the user used tab completion as Yael tried)
Eric Wieser (Jan 13 2023 at 00:54):
I can recover that customized post-processing behavior
I firmly believe that we shouldn't do that. If you want to be able to pass basic filename patterns to a program, use a shell that supports it. Everyone contributing to mathlib4 has git installed on the machine they develop on, and I'm pretty sure that comes with git bash on platforms not already equipped with a shell. I suppose if you really insist you could hide it between --cmd-glob
(meaning "I am using cmd
so need you to do glob expansion for me"). Even detecting windows would be a bad idea as you might be called from a reasonable shell.
Eric Wieser (Jan 13 2023 at 00:55):
(and further replace /s by \\s on Windows in case the user used tab completion as Yael tried)
did you see docs4#System.FilePath.normalize ? You should just use that all the time on the user input (and on the paths in your hash map)
Arthur Paulino (Jan 13 2023 at 00:57):
Eric Wieser said:
(and further replace /s by \\s on Windows in case the user used tab completion as Yael tried)
did you see docs4#System.FilePath.normalize ? You should just use that all the time on the user input (and on the paths in your hash map)
The paths in the hashmap are tailored from imported modules
Arthur Paulino (Jan 13 2023 at 00:58):
I can do an upgrade and do that on the input paths from the user in the get
and get!
commands
Eric Wieser (Jan 13 2023 at 00:59):
The paths in the hashmap are tailored from imported modules
What does this mean?
Arthur Paulino (Jan 13 2023 at 01:24):
It means that I don't build paths from the result of scanning the FS directly. If a file imports Foo.Bar
I make a file path from that module name
Arthur Paulino (Jan 13 2023 at 01:25):
Arthur Paulino said:
I can do an upgrade and do that on the input paths from the user in the
get
andget!
commands
Actually nvm. This won't solve any problem we are seeing today
Eric Wieser (Jan 13 2023 at 01:54):
I make a file path from that module name
How do you do that?
Arthur Paulino (Jan 13 2023 at 02:25):
Eric Wieser (Jan 13 2023 at 02:40):
Note you can use docs4#Lean.Name.components rather than stringifying and splitting
Eric Wieser (Jan 13 2023 at 02:44):
But I agree there's no need to normalize those paths
Eric Wieser (Jan 13 2023 at 02:45):
They already use a \
on windows
Arthur Paulino (Jan 13 2023 at 03:05):
Lean.Name.components return a list of names anyways
Arthur Paulino (Jan 13 2023 at 03:06):
I would have to stringfy each separately
Johan Commelin (Jan 13 2023 at 06:24):
Yaël Dillies said:
I'm afraid I am a Windows power-sufferer, not a Windows power-user.
It's time to leave the dark side! Step in to the light!
:penguin: :penguin: :penguin:
:octopus: :octopus: :octopus:
Kevin Buzzard (Jan 13 2023 at 07:29):
Yael are you not using git bash?
Yaël Dillies (Jan 13 2023 at 08:30):
I am using whatever VScode's terminal is, which seems to be bash. How do I change the default?
Eric Wieser (Jan 13 2023 at 08:42):
Arthur Paulino said:
I would have to stringfy each separately
Yes, but the names are allowed to contain .
so the two approaches are not equivalent
Eric Wieser (Jan 13 2023 at 10:15):
Yaël Dillies said:
I am using whatever VScode's terminal is, which seems to be bash. How do I change the default?
If it's bash then you already have the right things. cmd
is the bad one
Yaël Dillies (Jan 13 2023 at 10:16):
Ah well then my problem didn't get fixed :frown:
Eric Wieser (Jan 13 2023 at 10:28):
Can you paste the command you typed here?
Eric Wieser (Jan 13 2023 at 10:30):
Aha, I think I know the problem:
- The glob pattern match is working as expected
- You're using a unix shell so the glob generates
/
-style paths - Lean notices you're running on windows, so uses
\\
-style paths Cache
does string equality on the paths rather than path equality, so the match fails
Eric Wieser (Jan 13 2023 at 10:30):
Adding the call to .normalize
on user inputs as I suggested above will fix the problem
Arthur Paulino (Jan 13 2023 at 11:24):
Eric Wieser said:
Adding the call to
.normalize
on user inputs as I suggested above will fix the problem
Arthur Paulino (Jan 13 2023 at 11:27):
@Eric Wieser may I do a small optimization on your branch so we don't need to call Name.toString
at all?
Eric Wieser (Jan 13 2023 at 11:28):
I expect toString to be free
Arthur Paulino (Jan 13 2023 at 11:29):
Alright, I've approved it
Sky Wilshaw (Jan 13 2023 at 12:04):
@Yaël Dillies Might be useful: image.png
Yaël Dillies (Jan 13 2023 at 12:17):
Oh yeah I knew about that menu, but forgot that the default coudl be edited from here.
Eric Wieser (Jan 13 2023 at 12:19):
The fix should be merged in master, can you try again @Yaël Dillies?
Yaël Dillies (Jan 13 2023 at 12:28):
$ lake exe cache get Mathlib/Data/Rat/Floor.lean
warning: manifest out of date: package directory changed, use `lake update` to update
info: updating .\./lake-packages\std to revision fde95b16907bf38ea3f310af406868fc6bcf48d1
info: Compiling Cache.IO
info: Compiling Cache.Hashing
info: Compiling Cache.Requests
info: Compiling Cache.Main
error: > c:\Users\YaelD\.elan\toolchains\leanprover--lean4---nightly-2023-01-12\bin\leanc.exe -c -o .\build\ir\Cache\IO.o .\build\ir\Cache\IO.c -O3 -DNDEBUG
error: stderr:
uncaught exception: no such file or directory (error code: 2)
error: external command `c:\Users\YaelD\.elan\toolchains\leanprover--lean4---nightly-2023-01-12\bin\leanc.exe` exited with code 1
error: > c:\Users\YaelD\.elan\toolchains\leanprover--lean4---nightly-2023-01-12\bin\leanc.exe -c -o .\build\ir\Cache\Hashing.o .\build\ir\Cache\Hashing.c -O3 -DNDEBUG
error: stderr:
uncaught exception: no such file or directory (error code: 2)
error: external command `c:\Users\YaelD\.elan\toolchains\leanprover--lean4---nightly-2023-01-12\bin\leanc.exe` exited with code 1
error: > c:\Users\YaelD\.elan\toolchains\leanprover--lean4---nightly-2023-01-12\bin\leanc.exe -c -o .\build\ir\Cache\Requests.o .\build\ir\Cache\Requests.c -O3 -DNDEBUG
error: stderr:
uncaught exception: no such file or directory (error code: 2)
error: external command `c:\Users\YaelD\.elan\toolchains\leanprover--lean4---nightly-2023-01-12\bin\leanc.exe` exited with code 1
error: > c:\Users\YaelD\.elan\toolchains\leanprover--lean4---nightly-2023-01-12\bin\leanc.exe -c -o .\build\ir\Cache\Main.o .\build\ir\Cache\Main.c -O3 -DNDEBUG
error: stderr:
uncaught exception: no such file or directory (error code: 2)
error: external command `c:\Users\YaelD\.elan\toolchains\leanprover--lean4---nightly-2023-01-12\bin\leanc.exe` exited with code 1
Yaël Dillies (Jan 13 2023 at 12:28):
Seems to be unrelated, though?
Arthur Paulino (Jan 13 2023 at 12:32):
Try lake exe cache
to see if it prints the help menu
Arthur Paulino (Jan 13 2023 at 12:35):
If it doesn't print the help menu then it's indeed unrelated
Johan Commelin (Jan 17 2023 at 08:08):
Ignorant question: if I have 5 clones of mathlib4
, do they share a local cache? Or does each of them maintain its own cache?
Arthur Paulino (Jan 17 2023 at 10:35):
Each of them maintain their own cache (inside a gitignored .cache
folder)
Johan Commelin (Jan 17 2023 at 10:36):
I see. What do you think of using $HOME/.cache/lake/
or something like that?
Johan Commelin (Jan 17 2023 at 10:37):
I'm using git worktree
to checkout every branch in its own directory.
Johan Commelin (Jan 17 2023 at 10:37):
Which means I'm currently not really benefiting from local caching that much, and just redownloading the same files 10x per day
Arthur Paulino (Jan 17 2023 at 10:38):
I can add a check to see if the user has a MATHLIB_CACHE_DIR
env var defined. If so, use that instead
Arthur Paulino (Jan 17 2023 at 10:39):
Would that help?
Johan Commelin (Jan 17 2023 at 10:40):
@Arthur Paulino yeah, sounds good!
Arthur Paulino (Jan 17 2023 at 10:40):
But then you would have to export that env variable on every terminal session
Johan Commelin (Jan 17 2023 at 10:40):
I'm ok with that
Damiano Testa (Jan 17 2023 at 10:40):
I personally would prefer having control over where to store the cache: currently, I am unable to use mathlib4 in the department, since I cannot install things myself, due to permissions. If I can control as much as I can where files are placed, that would be helpful!
Mauricio Collares (Jan 17 2023 at 10:42):
Would writing it at $XDG_CACHE_HOME/mathlib be appropriate? A user could still do XDG_CACHE_HOME=/tmpdir lake exe cache...
to override it.
Damiano Testa (Jan 17 2023 at 10:44):
Mauricio, in this specific case, the $XDG_CACHE_HOME variable is set to a directory on which I have write permissions, so I would be happy with that as well!
Arthur Paulino (Jan 17 2023 at 11:40):
I will implement Mauricio's idea then
Johan Commelin (Jan 17 2023 at 11:41):
Thanks!
Arthur Paulino (Jan 17 2023 at 11:42):
Ah, and I will use .cache
as a fallback in case that's not defined
Arthur Paulino (Jan 17 2023 at 11:54):
Wait, is XDG_CACHE_HOME
some default path on Unix systems? I don't have it defined on my machine (Ubuntu 22.04)
Johan Commelin (Jan 17 2023 at 11:59):
https://wiki.archlinux.org/title/XDG_user_directories
Johan Commelin (Jan 17 2023 at 12:00):
Most distros do something with XDG
Arthur Paulino (Jan 17 2023 at 12:15):
Sebastian Ullrich (Jan 17 2023 at 12:26):
The correct fallback for XDG_CACHE_HOME
is $HOME/.cache
. It doesn't make sense to decide between a local and global cache based on the existence of this variable.
Arthur Paulino (Jan 17 2023 at 12:59):
I can change that. But what if HOME
is not defined?
Sebastian Ullrich (Jan 17 2023 at 13:02):
Then we're screwed
Sebastian Ullrich (Jan 17 2023 at 13:05):
And this is of course only on Linux, other platforms have completely different conventions https://crates.io/crates/dirs
Arthur Paulino (Jan 17 2023 at 13:14):
Then I'll leave .cache
as the safe fallback
Arthur Paulino (Jan 17 2023 at 13:14):
(but anyone is free to push to that branch and advocate for their solution, of course)
Arthur Paulino (Jan 17 2023 at 13:18):
mathlib4#1623 is updated
Rob Lewis (Jan 18 2023 at 05:06):
I may be misunderstanding something about the relation between get-cache and lake build
, so sorry if that's the case!
I have a project that uses mathlib as a dependency. I can run lake exe cache get
in the root of my project and it seems to successfully download and/or unpack a cache. But running lake build
tries to recompile everything. When I open a file with mathlib imports in vscode, it also starts recompiling. Any guess what I've done wrong here?
Curiously: this project is also set up for gitpod, and if I create a fresh gitpod instance (calling lake exe cache get
) I can open files with no recompilation. But lake build
still starts from the bottom.
Johan Commelin (Jan 18 2023 at 05:08):
@Rob Lewis I think Arthur has conjectured that we shouldn't be pinning master
in lakefiles, but pinning a specific hash/tag/release instead.
Johan Commelin (Jan 18 2023 at 05:08):
I haven't yet tested whether that helps, but it's worth a shot.
Rob Lewis (Jan 18 2023 at 05:08):
On a semi-related note, is there a standard way to stop recompilation when it gets triggered by vscode? In the above, when I opened a file with imports, a bunch of lake and lean processes started eating my CPU, and restarting the Lean server and even quitting vscode didn't kill them. I had to kill them manually
Rob Lewis (Jan 18 2023 at 05:09):
Johan Commelin said:
Rob Lewis I think Arthur has conjectured that we shouldn't be pinning
master
in lakefiles, but pinning a specific hash/tag/release instead.
Interesting -- will give that a shot!
Rob Lewis (Jan 18 2023 at 05:12):
Pinning a specific commit didn't seem to change anything. (Process: edit my project's lakefile.lean
, lake update
, lake exe cache get
, lake build
)
Heather Macbeth (Jan 18 2023 at 05:46):
I think maybe lake update
is something you're just not supposed to do when you have dependencies. (I seem to remember discussion of this.)
Heather Macbeth (Jan 18 2023 at 05:47):
So maybe instead of lake update
you want to manually copy in the lake-manifest of current mathlib. (well, or find a way to automate that.)
Heather Macbeth (Jan 18 2023 at 05:51):
Heather Macbeth said:
I think maybe
lake update
is something you're just not supposed to do when you have dependencies. (I seem to remember discussion of this.)
That's a big vague, but I guess my question is: is lake update
un-pinning your pinned hashes for the dependencies?
Rob Lewis (Jan 18 2023 at 06:07):
It would have to copy the manifest from mathlib and then add mathlib itself (per the error message when I tried). But even if I do that I see the same behavior. Updated process: my project pins mathlib4 to a commit and its manifest matches that of mathlib4, with the addition of mathlib4 itself. I clone a fresh copy of this repository (so no build artifacts) and run lake exe cache get
, lake build
Rob Lewis (Jan 18 2023 at 06:07):
Although, with this new process, I can open files that import mathlib before running lake build
Gabriel Ebner (Jan 18 2023 at 06:21):
I think Arthur has conjectured that we shouldn't be pinning
master
in lakefiles, but pinning a specific hash/tag/release instead.
This should make no difference whatsoever for the caching. We've been fixing quite a few bugs though over the last few weeks, are you on the latest mathlib4 already?
Gabriel Ebner (Jan 18 2023 at 06:22):
FWIW, mathport depends on mathlib4
and successfully uses lake exe cache get
in CI (i.e., avoiding building mathlib4)
Gabriel Ebner (Jan 18 2023 at 06:23):
Note: you need matching Lean versions of course. cp lake-packages/mathlib/lean-toolchain .
is your friend.
Gabriel Ebner (Jan 18 2023 at 06:24):
Note: of course std4/aesop/etc. versions also need to match exactly. Deleting any require std
etc. in your lakefile is a good start, followed by lake update
.
Gabriel Ebner (Jan 18 2023 at 06:25):
On a semi-related note, is there a standard way to stop recompilation when it gets triggered by vscode? In the above, when I opened a file with imports, a bunch of lake and lean processes started eating my CPU, and restarting the Lean server and even quitting vscode didn't kill them. I had to kill them manually
No. We should probably file that as a bug.
Heather Macbeth (Jan 18 2023 at 06:25):
Gabriel Ebner said:
Note: of course std4/aesop/etc. versions also need to match exactly. Deleting any
require std
etc. in your lakefile is a good start, followed bylake update
.
So will that ensure that in a project with just one dependency (mathlib), all the dependencies of that dependency (std/aesop/...) get updated only to whatever version mathlib is currently using -- not to latest version?
Gabriel Ebner (Jan 18 2023 at 06:27):
Yes. That's also been fixed recently (November last year).
Rob Lewis (Jan 18 2023 at 06:28):
I'm on the latest commit of mathlib4 and the toolchain files match. This is the entirety of my lakefile:
import Lake
open Lake DSL
require mathlib from git "https://github.com/leanprover-community/mathlib4" @ "d1e6d8643531bd36f7c7bb612413decb7cb07ede"
package «brown-cs22» {
-- add package configuration options here
}
lean_lib BrownCs22 {
-- add library configuration options here
}
@[default_target]
lean_exe «brown-cs22» {
root := `Main
}
Gabriel Ebner (Jan 18 2023 at 06:28):
lake update && cp lake-packages/mathlib/lean-toolchain .
and you should be good to go.
Rob Lewis (Jan 18 2023 at 06:28):
mathport is a good example for me to compare to -- I'll have to see if I can spot any relevant differences between that and my setup
Rob Lewis (Jan 18 2023 at 06:30):
rob@pop-os:~/teaching/cs22/s23/lean$ lake update && cp lake-packages/mathlib/lean-toolchain .
rob@pop-os:~/teaching/cs22/s23/lean$ lake exe cache get
No files to download
Decompressing 926 file(s)
rob@pop-os:~/teaching/cs22/s23/lean$ lake build
Compiling Std.Data.RBMap.Basic
Compiling Qq.Macro
Compiling Mathlib.Lean.Meta
Compiling Qq.Rw
Compiling Std.Lean.Meta.DiscrTree
Compiling Std.Data.List.Basic
Compiling Mathlib.Tactic.Relation.Symm
Compiling Mathlib.Tactic.Relation.Trans
^C
rob@pop-os:~/teaching/cs22/s23/lean$
:sad:
Gabriel Ebner (Jan 18 2023 at 06:30):
That's normal.
Gabriel Ebner (Jan 18 2023 at 06:30):
Compiling = running C compiler to produce native binary (not cached)
Gabriel Ebner (Jan 18 2023 at 06:30):
Building = running Lean (slow and cached)
Rob Lewis (Jan 18 2023 at 06:31):
Oh!
Heather Macbeth (Jan 18 2023 at 06:35):
Wait -- so which one does lake build
do?
Rob Lewis (Jan 18 2023 at 06:35):
So, what I'd like to do in Lean 3 language: I have my project that depends on mathlib. I'd like to build oleans for my project so that when a student opens one of my files, they don't have to wait. lean src --make
or whatever
Rob Lewis (Jan 18 2023 at 06:36):
Imagining that there's a file in my project that's slow to build that gets imported by lots of others, e.g. a "course library" file
Rob Lewis (Jan 18 2023 at 06:37):
I take it lake build
is not what I'm looking for?
Gabriel Ebner (Jan 18 2023 at 06:39):
lake build BrownCs22.MyModule
will build only that module and its dependencies. The reason lake build
compiles the C code is because it compiles runLinter
--it won't compile any the modules in your project. It will only build them.
Gabriel Ebner (Jan 18 2023 at 06:39):
Or are you asking if you can use lake exe cache get
for your project?
Rob Lewis (Jan 18 2023 at 06:43):
Nope! I'm looking for lake build BrownCs22.MyModule
, thanks so much
Arthur Paulino (Jan 18 2023 at 07:42):
Gabriel Ebner said:
Note: of course std4/aesop/etc. versions also need to match exactly. Deleting any
require std
etc. in your lakefile is a good start, followed bylake update
.
This is where I think things can go wrong: relying on Lake to figure out transitive dependencies. It's easy if your graph of dependencies is small, but it can become a nightmare for bigger graphs.
If I have a direct dependency on Std
and on Mathlib
but Mathlib
also depends on Std
, figuring out compatibility is a lot smoother if Mathlib
explicitly states the commit hash it needs from Std
in its lakefile.
Another situation where dependency UX can go bad is when you depend on two libs, but they implicitly depend on different versions of a certain lib.
Arthur Paulino (Jan 18 2023 at 07:44):
Ah, and parsing Lake manifests with the human eye is out of question in terms of UX. Please let's not rely on that
James Gallicchio (Jan 18 2023 at 07:47):
(are the current caching scripts really designed to work outside of the porting effort? It seems like more robust stuff will require a bit more infrastructure around dependency management)
Arthur Paulino (Jan 18 2023 at 07:56):
Yes and no. It is prepared to deal with the two situations: when you're in Mathlib or when you're importing Mathlib. I mean, it can understand in which scenario it's at, find sources and place olean files correctly. But it relies on precise management of dependencies. And Mathlib pinning the master
rev of Std
doesn't help
Arthur Paulino (Jan 18 2023 at 08:04):
Unless you're willing to rely on Lake to figure out transitive dependencies for you. But that will potentially make the life of people importing your project harder because they don't know which version of Std they will be downloading unless they look at your manifest
Arthur Paulino (Jan 18 2023 at 08:38):
Managing dependencies, however, is a world in itself. As a reference, Cargo adopts semantic versioning... but they also have a dedicated platform for hosting packages.
I will follow up on a thread in #lean4
Gabriel Ebner (Jan 18 2023 at 17:37):
If I have a direct dependency on Std and on Mathlib but Mathlib also depends on Std, figuring out compatibility is a lot smoother if Mathlib explicitly states the commit hash it needs from Std in its lakefile.
@Arthur Paulino I don't know how to get this through to you, but your description is exactly how Lake and mathlib4 are already set up right now. The lake-manifest.json
in mathlib4 explicitly states the commit hash from std4 (and from aesop, etc.). If you use lake update
in a project that imports mathlib4
, it will use these pinned revisions.
Gabriel Ebner (Jan 18 2023 at 17:45):
Another situation where dependency UX can go bad is when you depend on two libs, but they implicitly depend on different versions of a certain lib.
This is indeed a hard one. But I have an undocumented trick for you: the order of require
directives in a lakefile matters, i.e., they are resolved top-to-bottom. So as long as mathlib4 is your first require
, lake update
will take all dependency versions from mathlib4
. (And if you want a different version of aesop, you could put a require aesop
on top.)
Arthur Paulino (Jan 18 2023 at 18:37):
Alright, got it. Thanks! The order trick is important and I wasn't aware of it
Arthur Paulino (Jan 18 2023 at 18:48):
@Siddhartha Gadgil I think caching can work for you if you remove the requirements on Qq and Std
Siddhartha Gadgil (Jan 19 2023 at 00:44):
Thanks. i will try
Violeta Hernández (Jan 20 2023 at 04:48):
Violeta Hernández (Jan 20 2023 at 04:48):
What's going on here?
Violeta Hernández (Jan 20 2023 at 05:06):
Well, I just built Mathlib4 myself and it works
Violeta Hernández (Jan 20 2023 at 05:06):
Very very fast
Arthur Paulino (Jan 20 2023 at 07:41):
Looks like a partial failure
Arthur Paulino (Jan 20 2023 at 13:12):
@Violeta Hernández it's better to do lake exe cache get!
once to purge corrupted files
Adam Topaz (Jan 23 2023 at 16:33):
Should we add some instructions for lake exe get cache
on the mathlib4 readme.md?
Arthur Paulino (Jan 24 2023 at 12:01):
What's the plan for when we start getting hash collisions in UInt64
btw?
Eric Wieser (Jan 24 2023 at 12:04):
Use a more reasonable hash?
Arthur Paulino (Jan 24 2023 at 12:38):
That's the obvious part. But I mean, are we going to implement something in Lean? Use some C implementation? FFI to something that's done in Rust?
Sebastian Ullrich (Jan 24 2023 at 12:47):
Arthur Paulino said:
What's the plan for when we start getting hash collisions in
UInt64
btw?
What's the chance of that happening?
Arthur Paulino (Jan 24 2023 at 12:49):
It's low but I don't have a clear intuition for how likely that is to happen in the next year
Eric Wieser (Jan 24 2023 at 12:53):
Does the presence of a git
installation mean that you have a command-line hashing tool installed?
Arthur Paulino (Jan 24 2023 at 13:08):
Eric, I don't know why you're communicating with questions like that. Yes, Git has a hashing algorithm. But that doesn't give us an obvious way to plug it in our hashing scheme
Eric Wieser (Jan 24 2023 at 13:09):
Sorry, let me try and be clearer: could you build a long string of all the data you want to hash, and pipe it to git some-raw-hash-command
in order to punt the job of having native hashing down the road? Or would that be prohibitively slow? I'm not suggesting using git, but merely using the fact that git bundles a binary somewhere that can compute SHA1 hashes.
Arthur Paulino (Jan 24 2023 at 13:12):
I found this: https://git-scm.com/docs/git-hash-object
Arthur Paulino (Jan 24 2023 at 13:14):
The long string idea works, but we'd need to write it to a file and then hash. If we have ~1k files, that's ~1k writes and git hash calls though
Arthur Paulino (Jan 24 2023 at 13:18):
We have an implementation of the SHA3 hash here. But maybe that's an overkill. I know this is a very anticipated concern but I just don't want us to fall in a "fixing a plane while it's flying" territory. I think we should at least agree on a plan for when/if it happens
Eric Wieser (Jan 24 2023 at 13:23):
Arthur Paulino said:
The long string idea works, but we'd need to write it to a file and then hash. If we have ~1k files, that's ~1k writes and git hash calls though
Would piping it through stdin not work? Either way, the subprocess calls would add up I guess.
Arthur Paulino (Jan 24 2023 at 13:28):
Yes, the ideal solution wouldn't rely on more IO overhead, IMO
Johan Commelin (Jan 24 2023 at 13:43):
git hash-object
indeed has a --stdin
option.
James Gallicchio (Jan 24 2023 at 14:00):
i'm not sure it's worth worrying -- where do we rely on hashes to indicate equality?
James Gallicchio (Jan 24 2023 at 14:00):
(deleted)
Arthur Paulino (Jan 24 2023 at 14:03):
James Gallicchio said:
i'm not sure it's worth worrying -- where do we rely on hashes to indicate equality?
The hash indicates which olean
file we should download. A hash collision would trigger a download of some unrelated olean and I'm not fully aware of the consequences of that
Sebastian Ullrich (Jan 24 2023 at 14:05):
Unless Lake's trace has the same collision, it will surely rebuild the file
Arthur Paulino (Jan 24 2023 at 14:08):
That's good news then. So in terms of build safety, we're backed up by Lake
Eric Wieser (Jan 24 2023 at 14:08):
A collision could also cause uploads to either fail or invalidate old caches, right?
Arthur Paulino (Jan 24 2023 at 14:10):
Uploads would fail. But it's probably better to invalidate old caches :thinking:
Arthur Paulino (Jan 24 2023 at 14:34):
Sebastian Ullrich said:
Unless Lake's trace has the same collision, it will surely rebuild the file
Just to make sure... we also download .ilean
, .trace
, .c
and .c.trace
files. Would Lake still be able to recover from that? I'm asking because I don't know to what extend Lake trusts what's already in the file system versus always recomputing traces
Sebastian Ullrich (Jan 24 2023 at 14:59):
Yes, comparing the trace on disk with the recomputed trace in memory is how Lake decides what to rebuild.
Gabriel Ebner (Jan 24 2023 at 19:13):
Lake uses the same hash as cache
, so a hash collision in the file contents would affect both of them.
Gabriel Ebner (Jan 24 2023 at 19:17):
I've been thinking about adding a better, cryptographic hash function. For example blake3 is about as fast our current hash, at least if you use the optimized assembly version. But then I saw that Lake needs to be built by three different build systems (make, nix, and lake) and my motivation dropped drastically.
Arthur Paulino (Jan 24 2023 at 19:21):
We're using Rust's blake3 via FFI here. But that would require users to have cargo
installed
Gabriel Ebner (Jan 24 2023 at 19:34):
Using blake3 via Rust is a very roundabout way of doing this. Blake3 has a nice C implementation, there's no need to depend on Rust. (At least when using it in Lean/Lake. I guess you're already using Rust for other libraries as well, so it's not an extra dependency for you.)
Arthur Paulino (Jan 24 2023 at 19:40):
On that, we would prefer using the C implementation directly. But configuring the Rust FFI looked easier than adapting the C code.
Arthur Paulino (Jan 24 2023 at 20:12):
If you manage to get that into std4
, it would not only make mathlib4's caching better but also our team happier
Mario Carneiro (Jan 24 2023 at 20:23):
speaking of which, if someone wants to upstream the cache
script to std4
that would be great
Mario Carneiro (Jan 24 2023 at 20:24):
ideally in such a way that mathlib doesn't need to maintain a separate copy of the script
Arthur Paulino (Jan 24 2023 at 20:30):
The proper way of doing so is by having a generic package for caching
Arthur Paulino (Jan 24 2023 at 20:32):
A less ideal way of doing so is by migrating Cache
to std4
so mathlib4 would inherit lake exe cache
Mario Carneiro (Jan 24 2023 at 20:32):
I have no idea how packages and lake scripts interact, but if you can make something work that's fine
Mario Carneiro (Jan 24 2023 at 20:33):
I just want this capability to be widely available somehow
Mario Carneiro (Jan 24 2023 at 20:35):
I prefer not to add new packages though since they add maintenance overhead when we do lean bumps and the like
Mario Carneiro (Jan 24 2023 at 20:35):
that feels like a wider conversation though that we don't need to broach now
Heather Macbeth (Jan 28 2023 at 02:58):
lake exe cache get
currently fails for me on freshly-cloned mathlib4 with
uncaught exception: ./build/lib/Mathlib/Data/Nat/Prime.olean: truncated gzip input
tar: Error exit delayed from previous errors.
Does anyone else experience this?
Kevin Buzzard (Jan 28 2023 at 08:49):
The only trick I know right now is to change get
to get!
. Is that any use?
Arthur Paulino (Jan 28 2023 at 10:18):
Yes. @Heather Macbeth you might have ended up with some corrupted gz file(s). get
doesn't overwrite local files but get!
will try to download everything you need from scratch, overwriting existing (maybe corrupted) files
This is something that will be improved once we have our own reliable HTTP request package in Lean 4. Right now we are using curl
and it's hard to know what it does from outside with enough details without falling into further version restrictions
Arthur Paulino (Jan 28 2023 at 11:50):
Can we rename this thread to lake exe cache get
to avoid eventual confusions?
Last updated: Dec 20 2023 at 11:08 UTC