Zulip Chat Archive

Stream: mathlib4

Topic: lake get-cache


Johan Commelin (Dec 20 2022 at 12:16):

How far are we from lake get-cache? Switching branches and running lake build is quite time consuming.

Arthur Paulino (Dec 20 2022 at 14:19):

Is it going to be a general lake command or a Mathlib specific script (lake run get_cache)?
My two cents:

  • I don't think many (if any) other Lean 4 projects will implement caching as mathlib does, so implementing it as a lake command may be a bit wasteful
  • Mathlib maintainers will have much more freedom to adjust the script if it's a Mathlib self-contained thing
  • If caching becomes a popular thing among Lean 4 projects someday, then upstreaming it to lake can become a more worthwhile effort

Implementing it as a lake command has more complexities because it has to be more generic. For instance, the host might be a different one from Azure (a host config parameter enters the arena)

Shreyas Srinivas (Dec 20 2022 at 14:33):

Arthur Paulino said:

Is it going to be a general lake command or a Mathlib specific script (lake run get_cache)?
My two cents:

  • I don't think many (if any) other Lean 4 projects will implement caching as mathlib does, so implementing it as a lake command may be a bit wasteful
  • Mathlib maintainers will have much more freedom to adjust the script if it's a Mathlib self-contained thing
  • If caching becomes a popular thing among Lean 4 projects someday, then upstreaming it to lake can become a more worthwhile effort

Implementing it as a lake command has more complexities because it has to be more generic. For instance, the host might be a different one from Azure (a host config parameter enters the arena)

This might not be true. Aren't there projects like scilib etc which might eventually get large enough? Other communities might benefit from lean. I can imagine theoretical CS folks building their own tcslib in the not too distant future. I am writing a tool that might get large for a formal verification project in lean and I can imagine this will be useful for me.

Kevin Buzzard (Dec 20 2022 at 14:36):

Right now the difference is that mathlib4 has many people working on it right now, and the other projects do not. I agree that in the future this might change in the sense that other projects might also get a lot of momentum, but mathlib4 is an extremely high priority project for lots of reasons.

Arthur Paulino (Dec 20 2022 at 14:38):

We agree that it can be useful, but I see it as an overengineering decision at this point in which mathlib folks can benefit from it today. There needs to be more discussions around caching as a lake command. For instance, mathlib pushes content to Azure via a bash script that runs after merges. I doubt that would be the solution inherited by a generic lake command. (a lake push-cache command enters the arena)

Johan Commelin (Dec 20 2022 at 14:41):

I agree that it would be better to have a ml4-get-cache.sh now rather then waiting 3 weeks for a lake get-cache.

Mario Carneiro (Dec 20 2022 at 14:57):

it might still be lake run get_cache though since you probably want that script to integrate with lake in some ways

Shreyas Srinivas (Dec 20 2022 at 17:25):

Makes sense to prioritise mathlib.
A question out of curiosity: Is there a solution to this issue along the lines of make?

Shreyas Srinivas (Dec 20 2022 at 17:32):

Something along the following lines of calling make from inside the lakefile

Shreyas Srinivas (Dec 20 2022 at 17:36):

then calling lean inside the make files with the target being the corresponding object file. I don't know enough about lake to make a meaningful conclusion about this possibility, hence the question.

Arthur Paulino (Dec 20 2022 at 17:40):

Lake scripts run in IO already. It's simpler to write a single Lake script that does it all, even if it calls curl indirectly

Shreyas Srinivas (Dec 20 2022 at 17:42):

Arthur Paulino said:

Lake scripts run in IO already. It's simpler to write a single Lake script that does it all, even if it calls curl indirectly

I am framing the usage of makefiles as a quick and dirty fix. lake being a build tool is likely to have access to ways of calling external commands.

Shreyas Srinivas (Dec 20 2022 at 17:44):

But I see your point. Doing it inside lake scripts might be advantageous

Shreyas Srinivas (Dec 20 2022 at 17:46):

and also the remote cache solution will save the necessity of the first build as well. A few minutes ago I was watching the build process take > 18 minutes on the CI as it was apparently rebuilding every file (taking longer for some files). This seems wasteful.

Arthur Paulino (Dec 20 2022 at 17:49):

I find writing Lean code as easy (if not easier) than handling makefiles. It depends on the programmer, but as far as I can see, a Lake script is just as "quick and dirty" in this sense

Arthur Paulino (Dec 20 2022 at 19:07):

Since nobody has volunteered so far, I will do it as soon as I finish a task I'm currently working on

Scott Morrison (Dec 20 2022 at 22:53):

Please don't let perfect be the enemy of good (any caching solution is better than what we have now), but for discussion:

I'd love to see the following behaviour for a cache retrieving function:

  • Hash the contents of lakefile.lean lake-manifest.json and lean-toolchain (and possibly any information about hardware architecture that makes oleans non-platform independent). Call that the root hash
  • For each lean file, combine the root hash, the hash of each imported file (in the same project), and the hash of the contents of the file, and call that the file hash.
  • Send a request to a server with a list of all the hashes in current build job.
  • Get back build artifacts for some subset (ideally everything, but take what you can get) of those.
  • Build everything else, sending the resulting built artifacts back to the server.

In a zeroth approximation we would do no verification of the submitted build artifacts whatsoever. :-)

Scott Morrison (Dec 20 2022 at 22:54):

(i.e. roll our own minimal nix)

Arthur Paulino (Dec 20 2022 at 23:27):

I haven't started yet, but I am wondering if I should really use a Lake script since there isn't a self-contained solution for http requests. I will probably just do it in Python

Patrick Massot (Dec 20 2022 at 23:34):

Can't you just call curl or something?

Arthur Paulino (Dec 20 2022 at 23:34):

I can but I don't know if it would work for Windows users

Arthur Paulino (Dec 20 2022 at 23:35):

(would it?)

Arthur Paulino (Dec 20 2022 at 23:45):

Okay, I'll call curl then. If anything, it can be changed later

Sky Wilshaw (Dec 21 2022 at 00:03):

curl does technically work on Windows, but depends on weird internet explorer stuff (which is susceptible to breakage) if I'm remembering correctly.

Sky Wilshaw (Dec 21 2022 at 00:03):

 curl https://www.google.com
curl : The response content cannot be parsed because the Internet Explorer engine is not available, or Internet Explorer's first-launch configuration is not complete. Specify the UseBasicParsing parameter and try again.
At line:1 char:1
+ curl https://www.google.com
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotImplemented: (:) [Invoke-WebRequest], NotSupportedException
    + FullyQualifiedErrorId : WebCmdletIEDomNotSupportedException,Microsoft.PowerShell.Commands.InvokeWebRequestCommand

Arthur Paulino (Dec 21 2022 at 00:18):

The lakefile will get pretty dirty. Should I make separate executables that can be run with lake exe get-cache or something?

Arthur Paulino (Dec 21 2022 at 00:31):

(I will do it because some possibilities open up and dev time will be shorter)

Arthur Paulino (Dec 21 2022 at 01:30):

Alright, I was able to efficiently hash all relevant Lean files. Can someone point me to a place where I can check how to push and get files to/from the Azure server?

Gabriel Ebner (Dec 21 2022 at 01:45):

I can get you a blob storage on our Azure account. Do you like the name mathlib4cache?

Gabriel Ebner (Dec 21 2022 at 01:46):

Or are you looking for API docs? https://learn.microsoft.com/en-us/rest/api/storageservices/put-blob

Arthur Paulino (Dec 21 2022 at 01:47):

Both actually

Arthur Paulino (Dec 21 2022 at 01:49):

This is the current status of the code btw: https://github.com/leanprover-community/mathlib4/blob/ap/caching/Caching/Utils.lean

Arthur Paulino (Dec 21 2022 at 01:50):

Gabriel Ebner said:

I can get you a blob storage on our Azure account. Do you like the name mathlib4cache?

Please let me know the URL once that's done :pray:

Scott Morrison (Dec 21 2022 at 02:14):

@Arthur Paulino , sorry I didn't suggest this earlier, but could you investigate whether you can use Lake.computeTrace?

Scott Morrison (Dec 21 2022 at 02:14):

Reusing infrastructure from lake, even if we're not planning on modifying lake itself in the short term, seems like a win.

Arthur Paulino (Dec 21 2022 at 02:31):

What does that function do?

Wojciech Nawrocki (Dec 21 2022 at 02:34):

Arthur Paulino said:

I haven't started yet, but I am wondering if I should really use a Lake script since there isn't a self-contained solution for http requests. I will probably just do it in Python

Fwiw Socket.lean has a HTTP client example although the library seems pretty experimental.

Arthur Paulino (Dec 21 2022 at 02:37):

Yeah, curl will make the code more stable and robust though

Gabriel Ebner (Dec 21 2022 at 02:45):

Fwiw Socket.lean has a HTTP client example although the library seems pretty experimental.

I just want to say out loud that this implementation is completely unsuitable for production. It doesn't do TLS, happy eyeballs or even just multiple A records, or really anything one might expect from an HTTP client.

Scott Morrison (Dec 21 2022 at 03:40):

Arthur Paulino said:

What does that function do?

I'm not a lake expert, and I'm struggling to find where all the bits and pieces are, but:

  1. at https://github.com/leanprover/lake/blob/master/Lake/Build/Common.lean#L35, you can see that Lake.computeTrace file where file : FilePath computes something.
  2. The lake README says: "A target's trace is derived from its various inputs (e.g., source file, Lean toolchain, imports, etc.)."

Scott Morrison (Dec 21 2022 at 03:40):

We could either experiment, or try summoning Mac. :-)

Johan Commelin (Dec 21 2022 at 05:16):

Scott Morrison said:

  • Build everything else, sending the resulting built artifacts back to the server.

I do have a bit of an uneasy feeling about this final step. Can we at least require some security check before allowing random people to upload stuff? Because the build artifacts will effectively be executed on our personal computers, so they can do random crazy shit.
I guess it shouldn't be too hard to require the uploader to have write permissions to non-master branches of ml4?

Gabriel Ebner (Dec 21 2022 at 05:55):

I thought that was implicit, we're only going to give upload keys to trusted parties.

Gabriel Ebner (Dec 21 2022 at 05:56):

@Arthur Paulino I've tried to create a new storage account and container, but this keeps failing with "Server encountered an internal error. Please try again after some time." :sigh:

Scott Morrison (Dec 21 2022 at 06:00):

Gabriel Ebner said:

I thought that was implicit, we're only going to give upload keys to trusted parties.

Sounds reasonable. :-) As long as it is possible for us to have uploads from non-central servers, I think we get most of the benefit just by allowing those uploads from maintainers. (Or some similar set.)

Scott Morrison (Dec 21 2022 at 06:00):

I do an awful lot of building branches locally, then waiting for CI to do them again. :-)

Shreyas Srinivas (Dec 21 2022 at 07:32):

Does lean have a git library? AFAIK, Scott's comments are implementable in roughly this way:

  1. Query git for the current commit/staging area and the parent commit. (Git already has the hashes Scott mentions, so why reinvent the wheel)
  2. Query git for the list of files that have changed using the diff output
  3. Check if lakefile.lean has changed.
  4. Rebuild only the changed files. Pull either the local cache (existing build folder) or remote cache for the others

Sebastian Ullrich (Dec 21 2022 at 08:33):

Git does not help you at all with the imports-closed file hash Scott described. Which is the right way to do it. And which indeed should correspond to Lake's trace hash.

Sebastian Ullrich (Dec 21 2022 at 08:46):

Scott Morrison said:

  • Send a request to a server with a list of all the hashes in current build job.
  • Get back build artifacts for some subset (ideally everything, but take what you can get) of those.

So with a passive server these steps would likely be something like curl --parallel <server>/<trace1>.olean -o build/.../foo.olean <server>/<trace2>.olean -o build/.../bar.olean .... Which apparently does the right thing of continuing even after a missing file automatically.

Sebastian Ullrich (Dec 21 2022 at 08:48):

Note that you'll want to cache and download the .ilean files as well, and have to create the corresponding .trace files for Lake to accept the downloaded files.

Scott Morrison (Dec 21 2022 at 09:03):

Having to create the .trace files pretty much decides that we will have to use Lake's hashing, I guess!

Arthur Paulino (Dec 21 2022 at 10:41):

I was enjoying the initial hashing idea because I could cache intermediate hashes in memory. I believe Lake's tracing relies on IO to see if a trace already exists?

Arthur Paulino (Dec 21 2022 at 10:46):

For example, if I want to hash file A that imports B and C but I've already hashed B and C, then I have to compute very little to hash A. Otherwise I compute what I need and cache it for future lookups.

How do I do that with Lake traces?

Arthur Paulino (Dec 21 2022 at 10:52):

I want to do this: https://github.com/leanprover-community/mathlib4/blob/0da94b48cea085f244b1b27f3d2aed81ff1cb21a/Caching/Utils.lean#L32

Arthur Paulino (Dec 21 2022 at 12:00):

Or maybe I can just put/get trace files as well

Henrik Böving (Dec 21 2022 at 12:28):

The only one that actually knows the ins and outs of lake is Mac, you can ping him but if he doesn't answer its pretty much on your own to figure out.

Arthur Paulino (Dec 21 2022 at 13:13):

It's okay. I'm going with the initial solution and it's computing all the hashes blazing fast. Now I just need to sync with Gabriel about URLs and authorizations. The rest of the infra already works well

Arthur Paulino (Dec 21 2022 at 13:14):

This is a small piece of the generated curl command:

curl --parallel https://foo/2391993928140408502.olean -o build/lib/Mathlib/Init/ZeroOne.olean https://foo/2391993928140408502.ilean -o build/lib/Mathlib/Init/ZeroOne.ilean https://foo/2391993928140408502.trace -o build/lib/Mathlib/Init/ZeroOne.trace

Sebastian Ullrich (Dec 21 2022 at 14:11):

Arthur Paulino said:

It's okay. I'm going with the initial solution and it's computing all the hashes blazing fast. Now I just need to sync with Gabriel about URLs and authorizations. The rest of the infra already works well

What's your plan for making Lake accept these files then?

Sebastian Ullrich (Dec 21 2022 at 14:13):

I think the simplest solution would be:

  • add an option to Lake that writes out trace files only, and also deletes old .oleans
  • try and fetch any missing oleans by their trace files
  • have Lake do a regular build to fill in any missing files

Arthur Paulino (Dec 21 2022 at 14:17):

Sebastian Ullrich said:

Arthur Paulino said:

It's okay. I'm going with the initial solution and it's computing all the hashes blazing fast. Now I just need to sync with Gabriel about URLs and authorizations. The rest of the infra already works well

What's your plan for making Lake accept these files then?

I'm pushing the trace files as well

Sebastian Ullrich (Dec 21 2022 at 14:19):

Hah, I guess that works!

Arthur Paulino (Dec 21 2022 at 14:26):

In a certain way, there are advantages to allowing mathlib maintainers to have a hashing algorithm that's agnostic to how Lake computes its hashes.

Also, I didn't look too closely but I got a little skeptical about Lake hashing files as Scott described (which I too think is the right way for our purposes). I couldn't find the piece of code that digs into the content of a source file and hashes that file according to the content of the imported files.

Sebastian Ullrich (Dec 21 2022 at 14:39):

That's Module.recBuildLean

Sebastian Ullrich (Dec 21 2022 at 14:40):

Arthur Paulino said:

In a certain way, there are advantages to allowing mathlib maintainers to have a hashing algorithm that's agnostic to how Lake computes its hashes.

I don't see how that can be helpful if Lake rebuilds anyway if the recomputed trace does not match the downloaded trace

Arthur Paulino (Dec 21 2022 at 14:46):

That would change the toolchain and thus change the root hash and all the hashes. But the advantage I'm talking about is in terms of freedom of maintenance for adaptation

Sebastian Ullrich (Dec 21 2022 at 14:51):

I don't know what toolchain you are talking about. But using a different hash for remote caching than for local recompilation checks surely is a quickfix at best and not a feature.

Sebastian Ullrich (Dec 21 2022 at 14:52):

Mathlib tooling should not fight against Lake

Arthur Paulino (Dec 21 2022 at 15:13):

Sebastian Ullrich said:

That's Module.recBuildLean

Does it mean that it's tied to building?

Sebastian Ullrich (Dec 21 2022 at 15:23):

Yes, traces are currently generated during building afaik

Arthur Paulino (Dec 21 2022 at 15:34):

Alright, that's what I suspected because it makes so much sense to get the imports from a file by running Lean's frontend instead of fiddling with strings like I'm doing. I will go with the easier workaround for now so we don't have to build in order to know what we need to download

Arthur Paulino (Dec 21 2022 at 15:37):

In other words, I'm not dealing with stuff like

import
Lean.Data.HashMap

Which is accepted by Lean

Sebastian Ullrich (Dec 21 2022 at 16:04):

No, parsing imports is separate from building the actual file. How would it ever short-circuit builds otherwise? You're looking for docs4#Lean.parseImports'.

Arthur Paulino (Dec 21 2022 at 17:01):

Thanks!!

Arthur Paulino (Dec 23 2022 at 10:43):

Hi all :wave:
Hashing is working efficiently and (apparently) correctly as Scott firstly described. The last piece of the puzzle is performing the actual requests, for which I'm needing some help :pray:🏼

Sebastian Ullrich (Dec 23 2022 at 12:48):

Our curl templates above didn't look too bad to me, what's still missing?

Arthur Paulino (Dec 23 2022 at 13:35):

The curl for get is formatted like that already, but it's missing the correct URL and possibly parameters to query from Azure (which I've never done before).

Now, for put, I don't know how to format it. The only thing I know is that I need to get the auth key from a file like azure.key or something

Sebastian Ullrich (Dec 23 2022 at 15:23):

I doubt you'll find many people on here with experience in Azure blob storage :) . But according to the docs Gabriel posted above, you mostly just need to set a few headers using -H

Arthur Paulino (Dec 23 2022 at 23:42):

Is there a dummy URL I can use for testing?

Arthur Paulino (Dec 27 2022 at 01:47):

Okay, I've finally found the time to finish it: mathlib4#1230

This PR implements caching for Mathlib build files. The basic API is:

  • lake exe cache get to download missing build files
  • lake exe cache put to upload files that are missing on the server
  • lake exe cache to print the help menu

This caching method has a few enhancements on top of the solution we had for mathlib in Lean 3:

  1. Source files are content-addressed and build files are referenced by such hashes separately
  2. Minimized download traffic with queries that only pull files that are missing locally
  3. Minimized upload traffic with queries that only push files that are missing on the server
  4. A zip and a set command that make branch switching easier, without required uploads/downloads

Mario Carneiro (Dec 27 2022 at 02:49):

nice christmas present :grinning_face_with_smiling_eyes:

Siddhartha Gadgil (Dec 27 2022 at 09:47):

Does this work for projects using mathlib?

Arthur Paulino (Dec 27 2022 at 10:17):

Not yet. I don't know if you can call lake exe cache on this from a different project. But even if it's possible the code needs adjustments to know whether Mathlib is a directory in the root folder or not

Arthur Paulino (Dec 27 2022 at 10:27):

But I don't want to worry about it for now. If anything, we can change it later. I want to avoid scope creep and get it ready to use for people doing the porting ASAP

Patrick Massot (Dec 27 2022 at 19:59):

Arthur Paulino said:

This caching method has a few enhancements on top of the solution we had for mathlib in Lean 3:

  1. A zip and a set command that make branch switching easier, without required uploads/downloads

I don't understand that item. What is there that wasn't in leanproject?

Eric Wieser (Dec 27 2022 at 20:57):

Perhaps @Arthur Paulino is not aware of lean 3's mk-cache (zip) and local behavior of get-cache (set)?

Arthur Paulino (Dec 27 2022 at 20:58):

Maybe the difference is just that set won't try downloading anything if a cache file isn't there

Eric Wieser (Dec 27 2022 at 21:01):

You can configure that with a flag for lean 3, the default is to try everything

Arthur Paulino (Dec 27 2022 at 21:02):

Then there is no difference on that feature. Let me remove it from the list

Arthur Paulino (Dec 27 2022 at 22:09):

Eric Wieser said:

You can configure that with a flag for lean 3, the default is to try everything ok not really, only with --from-url=example.com/has-no-caches

Oh then it's an upgrade indeed

Eric Wieser (Dec 27 2022 at 22:33):

Probably at best a new feature; I expect the number of people who want a cache but don't want to use ones from the server are counted on one hand

Eric Wieser (Dec 27 2022 at 22:34):

I've already learnt that the current --fallback=none default in lean 3 is wanted by approximately no one (I chose it because it was backwards compatible)

Arthur Paulino (Dec 27 2022 at 23:14):

Yeah it has a very specific use. One example is working on multiple branches on a place without access to the internet

Eric Wieser (Dec 28 2022 at 09:46):

That works fine in lean3, because it still always tries local caches first

Eric Wieser (Dec 28 2022 at 09:47):

The only difference is that you get an error saying "no local cache and no internet connection" instead of "no local cache"

Arthur Paulino (Dec 28 2022 at 21:26):

mathlib4#1230 is ready for review again!

Now it also caches build files for Mathlib4 dependencies: Aesop, Qq and Std. And in order to make that work, I had to implement a solution that made the implementation of caching for projects that import Mathlib a lot easier. So I went ahead and just did it (cc @Siddhartha Gadgil)

@Gabriel Ebner the infra for garbage collection is not ready yet, but I think people can already benefit from it.

Arthur Paulino (Dec 28 2022 at 21:32):

A lot changed so a new review round is super welcome. Tomorrow I will get the infra for garbage collection done (hopefully). It will just add more code, that's why a review is welcome at this point already

Arthur Paulino (Dec 29 2022 at 23:19):

Just finished the implementation of lake exe cache commit, which stores a file that associates the current Git hash with a certain set of uploaded hashes. This will allow us to create garbage collection policies.

This is the output of lake exe cache:

Mathlib4 caching CLI
Usage: cache [COMMAND]

Commands:
  # No priviledge required
  get       Download and decompress linked files missing on the local cache
  get!      Download and decompress all linked files
  mk        Compress non-compressed build files into the local cache
  mk!       Compress build files into the local cache (no skipping)
  set       Decompress linked files
  clear     Delete non-linked files
  clear!    Delete everything on the local cache

  # Priviledge required
  put       Run 'mk' then upload linked files missing on the server
  put!      Run 'mk' then upload all linked files
  commit    Run 'put' then writes a commit on the server
  commit!   Run 'put!' then (over)writes a commit on the server
  collect   TODO

* Linked files refer to local cache files with corresponding Lean sources
* Commands ending with '!' should be used manually, when hot-fixes are needed"

I added the collect to the list but maybe it shouldn't be implemented in Lean

Henrik Böving (Dec 29 2022 at 23:37):

Just a little question, if i understand this correctly everyone can now upload cached .oleans right? Since .oleans can (I believe?) execute code via e.g. initialize or overwriting syntax to some custom elaborator that escapes to IO if I use this cache I am essentially extending attack surface of my machine to every person with push access to mathlib? Not that I want to claim people here have nefarious intentions, it's more of an attack surface thing. I now don't only have to trust myself to not mess stuff up but also dozens (in the future potentially more) people to not mess up themselves yes?

Jireh Loreaux (Dec 29 2022 at 23:38):

Yes, but I think this is only intended to be a temporary measure while we are porting, but maybe I'm mistaken.

Arthur Paulino (Dec 29 2022 at 23:39):

Not everyone can do that. Only people with the token for pushing stuff

Arthur Paulino (Dec 29 2022 at 23:40):

Once this is merged, I'm going to ask Gabriel to revoke the token he gave me for dev/testing purposes

Arthur Paulino (Dec 29 2022 at 23:42):

Ah wait, you mean forcing the mathlib4 bot to push it for me (if I were a malicious person). Yes. That's possible

Arthur Paulino (Dec 30 2022 at 01:28):

On a second thought, malicious code would have a different hash. The only way you can run bad code is if you, yourself, fetch a branch and run it on your PC. But the vulnerability isn't on the hashing scheme. It's a vulnerability that's already present by design

Arthur Paulino (Dec 30 2022 at 01:33):

The only way to make evil spread via this hashing scheme is by having a secret token that would allow you to overwrite olean files with malicious ones, really.

Arthur Paulino (Dec 30 2022 at 01:45):

everyone can now upload cached .oleans, right?

So, just to answer this clearly, no. Pushing data is restricted

Arthur Paulino (Dec 30 2022 at 02:04):

There is ONE genuine vulnerability: a hash collision with some malicious code whose olean was already sitting on the server!

Yaël Dillies (Dec 30 2022 at 06:38):

Nitpick: Isn't it "privilege"? or maybe my French is tricking me.

Arthur Paulino (Dec 30 2022 at 09:49):

No, you're right. I'll fix it, thanks!

Arthur Paulino (Jan 02 2023 at 10:03):

:ping_pong: mathlib#1230 is ready

Gabriel Ebner (Jan 04 2023 at 00:51):

FYI, this command is deployed now. On a mathlib4 checkout, you can run

lake exe cache get

And you should get fresh oleans!

Heather Macbeth (Jan 04 2023 at 01:16):

For those of us who are maintainers or "trusted users": are we supposed to be running some other command after local compiles, in order to upload the oleans we made? I think it was mentioned that that might be part of the workflow?

Heather Macbeth (Jan 04 2023 at 01:17):

Henrik Böving said:

Just a little question, if i understand this correctly everyone can now upload cached .oleans right?

Arthur Paulino said:

Not everyone can do that. Only people with the token for pushing stuff

Arthur Paulino (Jan 04 2023 at 01:26):

Gabriel already set it so that it will be done by a GitHub workflow routine, like it was for mathlib in Lean 3: https://github.com/leanprover-community/mathlib4/blob/35255de060eac3fa22895d074d6a713c42f2752a/.github/workflows/bors.yml#L102

Heather Macbeth (Jan 04 2023 at 01:26):

Even better!

Arthur Paulino (Jan 04 2023 at 01:29):

I haven't tested it on Windows. Please let me know if you find bugs

Mario Carneiro (Jan 04 2023 at 01:29):

I just tried it, it works great right out of the box. The only downside is that std is not cached, so when I pull master and then lake exe cache get it still has to compile all the Std.* files, but none of the Mathlib files

Arthur Paulino (Jan 04 2023 at 01:31):

It caches Std, but for some reason it does that quick listing that only takes like 10 seconds

Arthur Paulino (Jan 04 2023 at 01:31):

I couldn't figure out why

Mario Carneiro (Jan 04 2023 at 01:33):

I ran lake build after lake exe cache get and it seemed to compile Std, it took 15 s and did not list any Compiling Mathlib lines

Arthur Paulino (Jan 04 2023 at 01:34):

Right, but how long would it take to go over the Std files without the cache? I think it would take a lot more

Arthur Paulino (Jan 04 2023 at 01:34):

I suspect Lake is recomputing the traces for some reason I couldn't figure out

Mario Carneiro (Jan 04 2023 at 01:35):

I see

Arthur Paulino (Jan 04 2023 at 01:39):

I really hope it works on Windows because that would be hard for a Linux user to debug :fear:

Gabriel Ebner (Jan 04 2023 at 01:53):

it still has to compile all the Std.* files

While it has to Compile (cc) all files, it no longer has to Build (lean) them. The binaries are platform-specific, so we didn't cache them.

Siddhartha Gadgil (Jan 04 2023 at 02:04):

It works beautifully for mathlib, but I haven't yet tried it for a project using mathlib as a dependency. For that will I have to add this to my lakefile?

lean_exe cache where
  root := `Cache.Main

Arthur Paulino (Jan 04 2023 at 02:05):

I think you can just call lake exe cache get. It works for LSpec, which allows us to call lake exe lspec from any repo that uses it

Gabriel Ebner (Jan 04 2023 at 02:19):

You can call lake exe cache get but it fails badly:

././lake-packages/mathlib/././Cache/Requests.lean:7:0: error: no such file or directory (error code: 2)
  file: lake-packages/Mathlib/lakefile.lean

Arthur Paulino (Jan 04 2023 at 02:21):

I will take a look at it tomorrow, but this is not looking pretty. It seems to be failing to compile the cache executable

Gabriel Ebner (Jan 04 2023 at 02:22):

I think this is just Mathlib vs. mathlib. I think it should be enough to lowercase the mathlibDepPath.

Arthur Paulino (Jan 04 2023 at 02:23):

Gabriel Ebner said:

it still has to compile all the Std.* files

While it has to Compile (cc) all files, it no longer has to Build (lean) them. The binaries are platform-specific, so we didn't cache them.

I see now. It's because of the runLinter that's being compiled to a binary when we say lake build right? It relies on some Std files

Mario Carneiro (Jan 04 2023 at 02:29):

I can confirm that the result of all those Compiling lines was Linking runLinter

Mario Carneiro (Jan 04 2023 at 02:30):

I think runLinter should probably be a bit more choosy about parts of Std it imports though

Mario Carneiro (Jan 04 2023 at 02:31):

oh I see, it imports Mathlib.Data.Array.Defs and of course mathlib files love to import Std

Mario Carneiro (Jan 04 2023 at 02:31):

lol, Mathlib.Data.Array.Defs is completely empty except for import Std

Mario Carneiro (Jan 04 2023 at 02:36):

mathlib4#1322

Arthur Paulino (Jan 04 2023 at 12:12):

mathlib4#1326 should fix the issue related to cache get when Mathlib4 is used as a dependency.

Btw, I find it confusing that there are Mathlib (lib name), mathlib4 (repo name) and mathlib (package name). What if they were all just Mathlib?

Floris van Doorn (Jan 04 2023 at 12:15):

trying this on Windows gives me

Floris@MSI MINGW64 /d/projects/mathlib4 ((f46ac6ed...))
$ lake exe cache get
info: downloading component 'lean'
info: installing component 'lean'
warning: manifest out of date: package directory changed, use `lake update` to update
info: updating .\./lake-packages\std to revision 2919713bde15d55e3ea3625a03546531283bcb54
info: updating .\./lake-packages\aesop to revision c4477a2a7931e2490339d8087f599a45e89f25e7
info: Building Cache.IO
info: Building Cache.Hashing
info: Compiling Cache.IO
info: Compiling Cache.Hashing
info: Building Cache.Requests
info: Building Cache.Main
info: Compiling Cache.Requests
info: Compiling Cache.Main
info: Linking cache.exe
Attempting to download 787 file(s)
uncaught exception: unspecified system_category error (error code: 206)

Arthur Paulino (Jan 04 2023 at 12:22):

Hmm, seems like IO errors aren't well defined for Windows. Do you have curl and tar on your PATH?

Floris van Doorn (Jan 04 2023 at 12:23):

yes

Floris@MSI MINGW64 /d/projects/mathlib (simps_aux_attr)
$ curl --version
curl 7.64.0 (x86_64-w64-mingw32) libcurl/7.64.0 OpenSSL/1.1.1a (Schannel) zlib/1.2.11 libidn2/2.1.1 nghttp2/1.36.0
Release-Date: 2019-02-06
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp smtp smtps telnet tftp
Features: AsynchDNS IDN IPv6 Largefile SSPI Kerberos SPNEGO NTLM SSL libz TLS-SRP HTTP2 HTTPS-proxy MultiSSL Metalink

Floris@MSI MINGW64 /d/projects/mathlib (simps_aux_attr)
$ tar --version
tar (GNU tar) 1.31
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by John Gilmore and Jay Fenlason.

Arthur Paulino (Jan 04 2023 at 12:25):

I think it's best to register that as an issue so maybe someone else can jump in and help us. I don't know what's happening with that error message :sad:

Mauricio Collares (Jan 04 2023 at 12:30):

"The filename or extension is too long". I guess there's a maximum number of chars that can be passed as arguments?

Reid Barton (Jan 04 2023 at 12:38):

Windows has a very low command line length limit--is that the issue? Is curl getting invoked with 787 arguments?

Reid Barton (Jan 04 2023 at 12:39):

(or it used to, at any rate--I don't really know anything about Windows)

Ruben Van de Velde (Jan 04 2023 at 12:43):

Also a low path length limit, depending on the API you use, as I recall. Also don't know much about windows :)

Reid Barton (Jan 04 2023 at 12:44):

https://learn.microsoft.com/en-us/troubleshoot/windows-client/shell-experience/command-line-string-limitation says 8KB. So that's definitely less than the length of 787 URLs (or even filenames).

Floris van Doorn (Jan 04 2023 at 12:45):

I think one issue is that the curl I got with minGW is very old, and doesn't support --parallel yet.

Floris van Doorn (Jan 04 2023 at 12:45):

I'm now updating it

Reid Barton (Jan 04 2023 at 12:48):

Apparently there is curl -K (or curl --config) to specify the URLs to download with a file

Arthur Paulino (Jan 04 2023 at 12:52):

I'm going to use that for downloads and (hopefully) for uploads

Floris van Doorn (Jan 04 2023 at 12:53):

If I manually run the curl command that Cache uses, I indeed get the error
bash: /c/ProgramData/chocolatey/bin/curl: Argument list too long

Sebastian Ullrich (Jan 04 2023 at 13:32):

Ugh, it looks like libc++ doesn't correctly handle Windows error codes... high time to rewrite the Lean runtime in Lean

Raghuram (Jan 04 2023 at 14:13):

Arthur Paulino said:

mathlib4#1326 should fix the issue related to cache get when Mathlib4 is used as a dependency.

Why was this pull request closed?

Arthur Paulino (Jan 04 2023 at 14:13):

Because I'm including more fixes on a single PR

Arthur Paulino (Jan 04 2023 at 15:03):

Okay, mathlib #1333 should fix these issues

Yaël Dillies (Jan 04 2023 at 15:11):

Yeah, this is an optimisation I've been greatly looking forward to. But note that mathlib already reuses cache from previous pushes.

Yaël Dillies (Jan 04 2023 at 15:12):

The real difference is whether caches are still invalidated transitively or not.

Arthur Paulino (Jan 04 2023 at 15:13):

Yeah but they were addressed by commit hashes, so a new commit gets nothing from the cache and the building process will have to chew everything again

Arthur Paulino (Jan 04 2023 at 15:15):

Yaël Dillies said:

The real difference is whether caches are still invalidated transitively or not.

That still holds because the hash of a file involves the hash of all files imported by it

Yaël Dillies (Jan 04 2023 at 15:34):

So is there any optimisation compared to mathlib CI besides Lean 4s generally better performance?

Eric Wieser (Jan 04 2023 at 15:35):

So is there any optimisation compare to mathlib CI besides

Yes, mathlib CI does a worse job of merge commits

Yaël Dillies (Jan 04 2023 at 15:36):

Right. And anything more?

Eric Wieser (Jan 04 2023 at 15:37):

I would guess that downloading is faster, as if you modify data.set.basic it sounds like the new system won't attempt to download any downstream oleans. Of course, that has to fight against the advantage of the tarball combining many downloads into one.

Yaël Dillies (Jan 04 2023 at 15:38):

Oh, and that also holds for the github CI?

Eric Wieser (Jan 04 2023 at 15:38):

By github CI I assume you mean mathlib3 CI?

Eric Wieser (Jan 04 2023 at 15:39):

Mathlib3 CI is basically a leanproject get-cache --fallback=download-first but following a different parent commit on merges

Yaël Dillies (Jan 04 2023 at 15:39):

Yes, as opposed to leanproject get-c run from a local terminal.

Eric Wieser (Jan 04 2023 at 15:41):

leanproject get-c is stupid and only does an exact match (if you don't like that, pass --fallback)

Eric Wieser (Jan 04 2023 at 15:43):

In terms of the resulting .olean files and ignoring network traffic, --fallback=download-first is essentially the same as what it sounds like @Arthur Paulino has implemented, but the new version does much better with merge commits

Eric Wieser (Jan 04 2023 at 15:44):

(the stupidness of get-c is deliberate; it means that if you're checking out a random commit from mathlib, you get an error if you try to get the cache for a commit which was skipped over due to bors batching)

Eric Wieser (Jan 04 2023 at 15:46):

There is a poll to change the default

Arthur Paulino (Jan 04 2023 at 15:50):

Btw I don't take credit for the hashing scheme. Scott was the one who came up with it. I just implemented it

Eric Wieser (Jan 04 2023 at 15:54):

I would have implemented it in mathlib3, but the problem is that the hashing is all internal to Lean and difficult to expose to python

Arthur Paulino (Jan 04 2023 at 15:57):

Oh, I didn't use the hashing that Lake does btw. That would require some refactor in Lake because Lake only outputs traces during the building process and it wouldn't make sense to build in order to know the hashes. What I did use was Lean's capabilities to hash strings and to combine hashes into new ones

Arthur Paulino (Jan 04 2023 at 18:45):

@Floris van Doorn mathlib4#1333 has been merged. Can you try again please?

Gabriel Ebner (Jan 04 2023 at 18:52):

Eric Wieser said:

So is there any optimisation compare to mathlib CI besides

Yes, mathlib CI does a worse job of merge commits

Notably, the new cache also works with non-merge commits (i.e. squashes like bors does).

Kevin Buzzard (Jan 04 2023 at 19:04):

buzzard@buster:~/lean-projects/mathlib4$ lake exe cache get
Attempting to download 582 file(s)
Decompressing cache
uncaught exception:
gzip: stdin: unexpected end of file
tar: Child returned status 1
tar: Error is not recoverable: exiting now

buzzard@buster:~/lean-projects/mathlib4$ lake exe cache get
Attempting to download 432 file(s)
Decompressing cache
uncaught exception:
gzip: stdin: unexpected end of file
tar: Child returned status 1
tar: Error is not recoverable: exiting now

Ubuntu 22.04, lake Lake version 4.1.0-pre (Lean version 4.0.0-nightly-2022-12-23)

Gabriel Ebner (Jan 04 2023 at 19:05):

Does rm -r .cache help?

Kevin Buzzard (Jan 04 2023 at 19:06):

buzzard@buster:~/lean-projects/mathlib4$ rm -r .cache
buzzard@buster:~/lean-projects/mathlib4$ lake exe cache get
Attempting to download 790 file(s)
Decompressing cache
uncaught exception:
gzip: stdin: unexpected end of file
tar: Child returned status 1
tar: Error is not recoverable: exiting now

Kevin Buzzard (Jan 04 2023 at 19:07):

Oh!

$ git status
On branch master
Your branch is up-to-date with 'origin/master'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
    modified:   lake-manifest.json

no changes added to commit (use "git add" and/or "git commit -a")
buzzard@buster:~/lean-projects/mathlib4$

No idea why that has happened.

Kevin Buzzard (Jan 04 2023 at 19:08):

I didn't modify any json file as far as I know.

Kevin Buzzard (Jan 04 2023 at 19:10):

I have stuff like

@@ -34,21 +10,9 @@
   {"git":
    {"url": "https://github.com/JLimperg/aesop",
     "subDir?": null,
-    "rev": "c4477a2a7931e2490339d8087f599a45e89f25e7",
+    "rev": "72610cec1cd686884a08704f5fa91b45262136f9",
     "name": "aesop",
     "inputRev?": "master"}},
-  {"git":
-   {"url": "https://github.com/hargonix/LeanInk",
-    "subDir?": null,
-    "rev": "2447df5cc6e48eb965c3c3fba87e46d353b5e9f1",
-    "name": "leanInk",
-    "inputRev?": "doc-gen"}},
-  {"git":
-   {"url": "https://github.com/xubaiw/Unicode.lean",
-    "subDir?": null,
-    "rev": "6dd6ae3a3839c8350a91876b090eda85cf538d1d",
-    "name": "Unicode",
-    "inputRev?": "main"}},

in the diffs.

Gabriel Ebner (Jan 04 2023 at 19:10):

That sounds like you did a lake update (which you should only do if you're preparing a bump PR).

Gabriel Ebner (Jan 04 2023 at 19:10):

Just a hunch, does this print anything:

file .cache/* | grep -v gzip

Arthur Paulino (Jan 04 2023 at 19:12):

There is a slight possibility that I have pushed corrupted tar files during dev attempts :explosion:

Arthur Paulino (Jan 04 2023 at 19:12):

But it can be fixed with put!

Edit: no need... lake exe cache get worked on master (on my machine) just now

Arthur Paulino (Jan 04 2023 at 19:24):

Another possibility: maybe this validation if flawed/flaky

Gabriel Ebner (Jan 04 2023 at 19:31):

This is a bit concerning.. I was just running curl --parallel manually to download all of the blobs. Some were truncated, and curl got stuck in the middle. :fear:

Arthur Paulino (Jan 04 2023 at 19:40):

If it can get stuck on downloads, there's nothing stopping it from getting stuck during uploads

Gabriel Ebner (Jan 04 2023 at 19:42):

Uh-oh, when we cancel a build that probably kills curl. :scream:

Arthur Paulino (Jan 04 2023 at 19:43):

Can it be set on a separate workflow that never gets killed?

Gabriel Ebner (Jan 04 2023 at 19:47):

Apparently, if: always() (which we set for the upload) prevents cancellation.

Arthur Paulino (Jan 04 2023 at 19:51):

Idea: doing a raw build with rm -rf .cache && lake clean && rm -rf lake-packages && lake exe cache get && lake build right after uploading everything could spot flawed uploads

Arthur Paulino (Jan 04 2023 at 19:53):

If that fails, a lake exe cache put! is needed. Then a new verification round is needed. Loop until it goes through

Gabriel Ebner (Jan 04 2023 at 19:53):

So far we haven't had any issues with uploads, and that check would not even detect missing uploads (it would just build a bit longer).

Gabriel Ebner (Jan 04 2023 at 19:54):

Silently doing a lake exe cache put! seems like a bad idea, this would just hide hash collisions.

Arthur Paulino (Jan 04 2023 at 19:59):

Then maybe it's a manual get! that may save the day for problematic downloads, since get will skip (supposedly corrupted) files that are already on the FS. I mean, that's precisely the role of get!

Gabriel Ebner (Jan 04 2023 at 20:04):

This sounds good:

       --remove-on-error
              When curl returns an error when told to save output in  a  local
              file,  this  option removes that saved file before exiting. This
              prevents curl from leaving a partial file in the case of an  er
              ror during transfer.

Gabriel Ebner (Jan 04 2023 at 20:05):

I have no idea if this would help with the 404 pages:

       -f, --fail
              (HTTP) Fail fast with no output at all on server errors. This is
              useful to enable scripts and users to better  deal  with  failed
              attempts. In normal cases when an HTTP server fails to deliver a
              document, it returns an HTML document stating  so  (which  often
              also  describes  why and more). This flag will prevent curl from
              outputting that and return error 22.

Arthur Paulino (Jan 04 2023 at 20:06):

Oh wow, so we may be able to get rid of that ugly ByteArray check

Mauricio Collares (Jan 04 2023 at 20:07):

50 parallel transfers could be too much for some connections. Maybe pass --parallel-max 8 as well if it doesn't make things that much slower?

Mauricio Collares (Jan 04 2023 at 20:08):

There's also a --retry N option

Arthur Paulino (Jan 04 2023 at 20:12):

Amazing, there are multiple possibilities for workarounds. @Gabriel Ebner do you want to go ahead and implement the solution you think is the best? Or do you want me to do so? In the later, I would rather being told the workaround you think will work the best

Gabriel Ebner (Jan 04 2023 at 20:19):

There's only issue here:

curl version 7.83.0 was released on April 27 2022

And ubuntu 22.04 still contains 7.81..

Arthur Paulino (Jan 04 2023 at 20:19):

There is a flexible solution that would allow extra args to be passed to curl.
For example lake exe cache get --parallel-max 8 --retry 3. But that's not very friendly

Arthur Paulino (Jan 04 2023 at 20:21):

Although the --remove-on-error and the -f flags look interesting regardless so maybe we can simply always use those

Gabriel Ebner (Jan 04 2023 at 20:21):

The problem is that --remove-on-error was added in 7.83.

Arthur Paulino (Jan 04 2023 at 20:25):

Still, I think -f with a check on exitCode == 0 should warn the user to try a get again

Arthur Paulino (Jan 04 2023 at 20:26):

(I'm assuming -f makes curl return something !=0 in case of failure)

Gabriel Ebner (Jan 04 2023 at 20:29):

No --fail does nothing of the sort, it will just ignore the errors. But apparently it does exactly what we want, it doesn't save 404 responses!

Arthur Paulino (Jan 04 2023 at 20:31):

I see. So we can get rid of that ugly ByteArray.startsWith check. What about extra parameters like retry and max-parallel?

Gabriel Ebner (Jan 04 2023 at 20:36):

We have no indication that max-parallel would fix any issues, so I'd leave it at the default value. Same with retry.

Arthur Paulino (Jan 04 2023 at 20:37):

But I mean, allowing the user to send extra custom arguments to curl

Gabriel Ebner (Jan 04 2023 at 20:39):

Oh, that seems reasonable enough.

Sebastian Ullrich (Jan 04 2023 at 20:41):

You actually need a pretty high degree of parallelism to saturate HTTP 2.0 multiplexing when downloading many small files https://github.com/NixOS/nix/issues/5118

Reid Barton (Jan 04 2023 at 20:46):

curl --fail does return a nonzero exit code on HTTP errors

Gabriel Ebner (Jan 04 2023 at 20:50):

Though we need to ignore that exit code since 404 is expected.

Floris van Doorn (Jan 04 2023 at 21:51):

Arthur Paulino said:

Floris van Doorn mathlib4#1333 has been merged. Can you try again please?

There is progress, I now get no error:

Floris@MSI MINGW64 /d/projects/Mathlib4 ((c0e8ec73...))
$ lake exe cache get
info: downloading component 'lean'
info: installing component 'lean'
warning: manifest out of date: package directory changed, use `lake update` to update
info: Building Cache.IO
info: Compiling Cache.IO
info: Building Cache.Hashing
info: Compiling Cache.Hashing
info: Building Cache.Requests
info: Compiling Cache.Requests
info: Building Cache.Main
info: Compiling Cache.Main
info: Linking cache.exe
Attempting to download 789 file(s)
Decompressing cache

However, doing lake build immediately afterwards still recompiles everything. This takes about ~15 minutes for all of Std+Aesop+Mathlib, so I don't think it's using any caches.

Floris van Doorn (Jan 04 2023 at 21:52):

(git status shows no changes)

Henrik Böving (Jan 04 2023 at 22:16):

Just a little remark, on my Linux machine everything is work correctly ootb :thumbs_up:

Would be a shame if lake was smart enough to notice the windows/linux difference here...

Arthur Paulino (Jan 04 2023 at 22:16):

@Floris van Doorn your manifest is out of date, meaning that its content may not be equivalent to the one used to make the hashing

Arthur Paulino (Jan 04 2023 at 22:17):

(the content of lake-manifest.json is involved in the root hash, which, in turn, is in the mix of the hash of all files)

Henrik Böving (Jan 04 2023 at 22:18):

Should I add this to the doc-gen4 CI?

Arthur Paulino (Jan 04 2023 at 22:19):

What do you mean? (add what?)

Henrik Böving (Jan 04 2023 at 22:21):

Add the call to the cache to the doc-gen4 CI

Arthur Paulino (Jan 04 2023 at 22:21):

Ah, like a mini tutorial?

Gabriel Ebner (Jan 04 2023 at 22:22):

warning: manifest out of date: package directory changed, use lake update to update

@Floris van Doorn Do you get this every time on windows? I wonder if this is platform-independent?

 "packagesDir": "./lake-packages",

Gabriel Ebner (Jan 04 2023 at 22:23):

@Floris van Doorn Can you do ls .cache and check if any caches got downloaded at all? I wouldn't be surprised if hash filePath gives a different result on windows than on linux/macos.

Henrik Böving (Jan 04 2023 at 22:24):

Arthur Paulino said:

Ah, like a mini tutorial?

no, doc-gen4's CI is responsible for building docs4#Nat so if I would use the cache I could basically shave 15 minutes off each CI run since I don't have to recompile

Arthur Paulino (Jan 04 2023 at 22:24):

Okay, that's the catch! It certainly will. Let me fix that ASAP

Arthur Paulino (Jan 04 2023 at 22:25):

Henrik Böving said:

Arthur Paulino said:

Ah, like a mini tutorial?

no, doc-gen4's CI is responsible for building docs4#Nat so if I would use the cache I could basically shave 15 minutes off each CI run since I don't have to recompile

Oh, definitely worth trying then

Henrik Böving (Jan 04 2023 at 22:26):

image.png
woop

Arthur Paulino (Jan 04 2023 at 22:29):

@Gabriel Ebner mathlib4#1339

Arthur Paulino (Jan 04 2023 at 22:33):

Oh wait, it's breaking

Arthur Paulino (Jan 04 2023 at 22:42):

Okay the PR is fixed but it will take some minutes to build because all the hashes changed

Arthur Paulino (Jan 04 2023 at 23:12):

Okay, it pushed everything and built correctly. Should be ready to merge :tada:

Arthur Paulino (Jan 04 2023 at 23:12):

@Floris van Doorn please try again when you have a chance :pray:

Gabriel Ebner (Jan 05 2023 at 00:29):

Another issue with packages depending on mathlib: the oleans get extracted to build/lib/Mathlib and not lake-packages/mathlib/build/lib/Mathlib.

Arthur Paulino (Jan 05 2023 at 00:36):

I see, -C should do the trick here. Let me fix that

Arthur Paulino (Jan 05 2023 at 01:00):

mathlib4#1341

Arthur Paulino (Jan 05 2023 at 01:05):

(sorry for so many PRs... it's hard to cover all use cases alone)

Siddhartha Gadgil (Jan 05 2023 at 01:52):

Things are still not working for me (on Ubuntu). After I did a lake update, I deleted the .cache folder and ran with this result:

$ lake exe cache get
Attempting to download 789 file(s)
Decompressing 501 file(s)
(base) gadgil@gadgil-XPS-15:~/code/LeanAide$ lake build
Building Mathlib.Init.Data.Nat.Notation
Building Mathlib.Data.KVMap
Building Mathlib.Util.MemoFix
Building Mathlib.Tactic.Relation.Rfl
Building Mathlib.Lean.Meta
Building Mathlib.Mathport.Rename
Building Mathlib.Tactic.Alias
Building Mathlib.Tactic.RunCmd
Building Mathlib.Mathport.Attributes
Building Mathlib.Tactic.LeftRight
Building Mathlib.Lean.LocalContext
Building Mathlib.Util.WhatsNew
error: > LEAN_PATH=././lake-packages/mathlib/build/lib:././lake-packages/Qq/build/lib:././lake-packages/aesop/build/lib:././lake-packages/std/build/lib:./build/lib LD_LIBRARY_PATH=/home/gadgil/.elan/toolchains/leanprover--lean4---nightly-2022-12-23/lib:/home/gadgil/software/z3/build::././lake-packages/mathlib/build/lib /home/gadgil/.elan/toolchains/leanprover--lean4---nightly-2022-12-23/bin/lean -DwarningAsError=true ././lake-packages/mathlib/././Mathlib/Data/KVMap.lean -R ././lake-packages/mathlib/./. -o ././lake-packages/mathlib/build/lib/Mathlib/Data/KVMap.olean -i ././lake-packages/mathlib/build/lib/Mathlib/Data/KVMap.ilean -c ././lake-packages/mathlib/build/ir/Mathlib/Data/KVMap.c
error: stdout:
././lake-packages/mathlib/././Mathlib/Data/KVMap.lean:17:31: error: expected token
error: external command `/home/gadgil/.elan/toolchains/leanprover--lean4---nightly-2022-12-23/bin/lean` exited with code 1
...

Arthur Paulino (Jan 05 2023 at 02:01):

Can you send the link to that repo please?

Gabriel Ebner (Jan 05 2023 at 02:05):

Oh, non-mathlib oleans now get extracted to interesting locations: :smile:

./lake-packages/mathlib/lake-packages/std/build/lib/Std/Tactic/NormCast/Lemmas.olean

Siddhartha Gadgil (Jan 05 2023 at 02:06):

Arthur Paulino said:

Can you send the link to that repo please?

https://github.com/siddhartha-gadgil/LeanAide.git

Siddhartha Gadgil (Jan 05 2023 at 02:07):

Sorry, I should clarify this is on the mathlib4 branch

Siddhartha Gadgil (Jan 05 2023 at 02:07):

And I have not pushed the lake update that breaks things.

Arthur Paulino (Jan 05 2023 at 02:10):

Alright, it will be a lot easier to debug with an actual project at hand

Siddhartha Gadgil (Jan 05 2023 at 02:13):

Checking out the mathlib4 branch in the above and running "lake update" should get to the state I am working with.

Siddhartha Gadgil (Jan 05 2023 at 02:26):

Also, thanks a lot for working on this. Soon over 20 students will be checking out my course repository which depends on mathlib4 and building on various laptops (many fairly basic). And the mathlib4 will be updated periodically. Will help them hugely to have this.

Also I am hoping gitpod/github codespaces will be efficient with this.

Arthur Paulino (Jan 05 2023 at 03:43):

@Siddhartha Gadgil there's been a fix, but something's still weird. I'm taking a look

Arthur Paulino (Jan 05 2023 at 05:03):

The weird behavior was that some cache files weren't being found.

But it's not an issue with caching. It's just that your version of Aesop is not the same one listed on mathlib's manifest. And the hash divergences cascate downstream

Siddhartha Gadgil (Jan 05 2023 at 05:13):

Arthur Paulino said:

The weird behavior was that some cache files weren't being found on the server.

But it's not an issue with caching. It's just that your version of Aesop is not the same one listed on mathlib's manifest. And the hash divergences cascate downstream.

Mathlib's manifest needs to be updated

Thanks. I can test by changing my version of Aesop to match with mathlib (or just use Aesop as a transitive dependency)

Siddhartha Gadgil (Jan 05 2023 at 05:21):

My naive test failed. I will wait for the mathlib manifest update.

Floris van Doorn (Jan 05 2023 at 09:44):

Arthur Paulino said:

Floris van Doorn please try again when you have a chance :pray:

It works now! This is great! I get the files, lake build is very quick, and opening a random mathlib4 file uses the cached files.

Floris van Doorn (Jan 05 2023 at 09:45):

Gabriel Ebner said:

warning: manifest out of date: package directory changed, use lake update to update

Floris van Doorn Do you get this every time on windows? I wonder if this is platform-independent?

 "packagesDir": "./lake-packages",

Yes, I think I'm getting it on every commit. I was just ignoring it now. I can confirm that I'm getting it on the current commit (cfd16dd8).

Floris van Doorn (Jan 05 2023 at 09:45):

Gabriel Ebner said:

Floris van Doorn Can you do ls .cache and check if any caches got downloaded at all? I wouldn't be surprised if hash filePath gives a different result on windows than on linux/macos.

You were correct that on yesterday's attempt no caches got downloaded.

Kevin Buzzard (Jan 05 2023 at 09:49):

I've still never managed to get this working :-( On the machine I'm currently using:

buzzard@brutus:~/lean-projects/mathlib4$ lake exe cache get
info: updating ././lake-packages/aesop to revision 72610cec1cd686884a08704f5fa91b45262136f9
error: unknown executable `cache`
buzzard@brutus:~/lean-projects/mathlib4$ lake --version
Lake version 4.1.0-pre (Lean version 4.0.0-nightly-2022-12-23)
buzzard@brutus:~/lean-projects/mathlib4$ git status
On branch j-loreaux/data.fin.basic
Your branch is up-to-date with 'origin/j-loreaux/data.fin.basic'.

nothing to commit, working tree clean
buzzard@brutus:~/lean-projects/mathlib4$

Mario Carneiro (Jan 05 2023 at 09:50):

You need to merge master into your branch

Floris van Doorn (Jan 05 2023 at 09:50):

@Kevin Buzzard does it work on current origin/master?

Kevin Buzzard (Jan 05 2023 at 09:51):

Aah, this branch is several weeks old so that might well be the issue. Yes it works on master on this machine!

Floris van Doorn (Jan 05 2023 at 09:52):

I think it's good to mention somewhere (mathlib4 README, lake exe cache output) that you need curl >= 7.66.0 to use the --parallel flag. I'm probably not the only one with a 3+ year outdated curl.

Kevin Buzzard (Jan 05 2023 at 09:53):

I don't like that it says "Attempting to download 789 file(s)" and then just sits there, with me not knowing if it's broken or not.

Floris van Doorn (Jan 05 2023 at 09:53):

One further remark: if I (re)run lake exe cache get when VSCode is open, I get an error on Windows.

Floris@MSI MINGW64 /d/projects/Mathlib4 ((cfd16dd8...))
$ lake exe cache get
warning: manifest out of date: package directory changed, use `lake update` to update
No files to download
Decompressing 789 file(s)
uncaught exception: lake-packages/std/build/lib/Std/Data/Array/Init/Lemmas.olean: Can't unlink already-existing object
tar: Error exit delayed from previous errors.

Kevin Buzzard (Jan 05 2023 at 09:55):

I still have the same problem on the other machine though (Ubuntu 20.04, curl 7.68.0)

buzzard@buster:~/lean-projects/mathlib4$ rm -rf .cache/
buzzard@buster:~/lean-projects/mathlib4$ lake exe cache get
Attempting to download 789 file(s)
Decompressing 789 file(s)
uncaught exception:
gzip: stdin: unexpected end of file
tar: Child returned status 1
tar: Error is not recoverable: exiting now

buzzard@buster:~/lean-projects/mathlib4$ git status
On branch master
Your branch is up-to-date with 'origin/master'.

nothing to commit, working tree clean
buzzard@buster:~/lean-projects/mathlib4$ lake --version
Lake version 4.1.0-pre (Lean version 4.0.0-nightly-2023-01-04)
buzzard@buster:~/lean-projects/mathlib4$

Floris van Doorn (Jan 05 2023 at 10:04):

Just checking: git rev-parse HEAD gives cfd16dd8874d0989ff38ccd8b4727b59f17683e7? (i.e. you did pull on the other machine?)

Arthur Paulino (Jan 05 2023 at 10:29):

@Kevin Buzzard try lake exe cache get! if you get an error like that. It looks like you have corrupted tar.gz files

Siddhartha Gadgil (Jan 05 2023 at 13:18):

While lake cache is really useful, I should report some weirdness in a project that has both a direct and transitive dependence on mathlib. I can successfully build https://github.com/siddhartha-gadgil/proofs-and-programs-2023.git

git clone https://github.com/siddhartha-gadgil/proofs-and-programs-2023.git
 cd proofs-and-programs-2023/
 lake exe cache get
 lake build
 rm -rf build/
 lake build

The weird thing is that the first time I try to build it complains about a missing file while building a dependency. If I delete the file and continue the build it resumes and builds fine.

My guess is that this is a hash collision.

Arthur Paulino (Jan 05 2023 at 13:23):

Does LeanAide build after cache get?

Siddhartha Gadgil (Jan 05 2023 at 13:24):

Arthur Paulino said:

Does LeanAide build after cache get?

Yes, for LeanAide cache get is working perfectly

Arthur Paulino (Jan 05 2023 at 13:31):

Does it decompress all files it tries to download? (~790 files)

Kevin Buzzard (Jan 05 2023 at 13:41):

Arthur Paulino said:

Kevin Buzzard try lake exe cache get! if you get an error like that. It looks like you have corrupted tar.gz files

buzzard@buster:~/lean-projects/mathlib4$ git pull
Already up-to-date.
buzzard@buster:~/lean-projects/mathlib4$ git status
On branch master
Your branch is up-to-date with 'origin/master'.

nothing to commit, working tree clean
buzzard@buster:~/lean-projects/mathlib4$ lake exe cache get!
Attempting to download 790 file(s)
Decompressing 790 file(s)
uncaught exception:
gzip: stdin: unexpected end of file
tar: Child returned status 1
tar: Error is not recoverable: exiting now

Siddhartha Gadgil (Jan 05 2023 at 13:49):

Arthur Paulino said:

Does it decompress all files it tries to download? (~790 files)

Yes, and I can even find in lake-packages an .olean file it claims is missing.

Siddhartha Gadgil (Jan 05 2023 at 13:50):

This could be some error in my intermediate dependency (there were some).

Arthur Paulino (Jan 05 2023 at 14:08):

@Kevin Buzzard I have a hunch that curl isn't working as intended on your older machine. It could be tar as well :thinking:

Kevin Buzzard (Jan 05 2023 at 14:58):

buzzard@buster:~/lean-projects$ git clone git@github.com:leanprover-community/mathlib4.git
Cloning into 'mathlib4'...
Warning: Permanently added the RSA host key for IP address '2a0c:5bc0:40:2fff::8c52:7904' to the list of known hosts.
remote: Enumerating objects: 21109, done.
remote: Counting objects: 100% (716/716), done.
remote: Compressing objects: 100% (327/327), done.
remote: Total 21109 (delta 448), reused 598 (delta 380), pack-reused 20393
Receiving objects: 100% (21109/21109), 6.75 MiB | 10.13 MiB/s, done.
Resolving deltas: 100% (14652/14652), done.
buzzard@buster:~/lean-projects$ cd mathlib4
buzzard@buster:~/lean-projects/mathlib4$ lake exe cache get!
info: cloning https://github.com/leanprover/std4 to ././lake-packages/std
info: cloning https://github.com/gebner/quote4 to ././lake-packages/Qq
info: cloning https://github.com/JLimperg/aesop to ././lake-packages/aesop
info: Building Cache.IO
info: Compiling Cache.IO
info: Building Cache.Hashing
info: Compiling Cache.Hashing
info: Building Cache.Requests
info: Compiling Cache.Requests
info: Building Cache.Main
info: Compiling Cache.Main
info: Linking cache
Attempting to download 790 file(s)
Decompressing 790 file(s)
uncaught exception:
gzip: stdin: unexpected end of file
tar: Child returned status 1
tar: Error is not recoverable: exiting now

buzzard@buster:~/lean-projects/mathlib4$ curl --version
curl 7.68.0 (x86_64-pc-linux-gnu) libcurl/7.68.0 OpenSSL/1.1.1f zlib/1.2.11 brotli/1.0.7 libidn2/2.2.0 libpsl/0.21.0 (+libidn2/2.2.0) libssh/0.9.3/openssl/zlib nghttp2/1.40.0 librtmp/2.3
Release-Date: 2020-01-08
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtmp rtsp scp sftp smb smbs smtp smtps telnet tftp
Features: AsynchDNS brotli GSS-API HTTP2 HTTPS-proxy IDN IPv6 Kerberos Largefile libz NTLM NTLM_WB PSL SPNEGO SSL TLS-SRP UnixSockets
buzzard@buster:~/lean-projects/mathlib4$ tar --version
tar (GNU tar) 1.30
Copyright © 2017 Free Software Foundation, Inc.
Licence GPLv3+: GNU GPL version 3 or later <https://gnu.org/licences/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by John Gilmore and Jay Fenlason.
buzzard@buster:~/lean-projects/mathlib4$

This is an Ubuntu 20.04 laptop. Has anyone else got it working on Ubuntu 20.04? The Ubuntu 22 machine I was using this morning was working fine.

Floris van Doorn (Jan 05 2023 at 15:27):

I also have Ubuntu 20.04, and I'm getting the same error indeed.

Floris van Doorn (Jan 05 2023 at 15:28):

I think curl is indeed misbehaving, my .cache folder seems to have a lot of empty files.

vandoorn@pc93-178:~/projects/mathlib4((HEAD detached at origin/master))$ ls .cache/ -al
total 73200
drwxrwxr-x  2 vandoorn vandoorn   36864 Jan  5 16:23 .
drwxrwxr-x 13 vandoorn vandoorn    4096 Jan  5 16:23 ..
-rw-rw-r--  1 vandoorn vandoorn  502564 Jan  5 16:23 10004447746176569842.tar.gz
-rw-rw-r--  1 vandoorn vandoorn       0 Jan  5 16:23 10015491352002644549.tar.gz
-rw-rw-r--  1 vandoorn vandoorn   68157 Jan  5 16:23 10018821831230171317.tar.gz
-rw-rw-r--  1 vandoorn vandoorn       0 Jan  5 16:23 1001945985573292759.tar.gz
-rw-rw-r--  1 vandoorn vandoorn       0 Jan  5 16:23 10110306565386213522.tar.gz
-rw-rw-r--  1 vandoorn vandoorn  174259 Jan  5 16:23 10127860650102912405.tar.gz
-rw-rw-r--  1 vandoorn vandoorn       0 Jan  5 16:23 10128865545074468906.tar.gz
-rw-rw-r--  1 vandoorn vandoorn       0 Jan  5 16:23 10130780366951701088.tar.gz
-rw-rw-r--  1 vandoorn vandoorn       0 Jan  5 16:23 10152004466792651860.tar.gz
-rw-rw-r--  1 vandoorn vandoorn  416066 Jan  5 16:23 10152297513229284087.tar.gz
-rw-rw-r--  1 vandoorn vandoorn   25050 Jan  5 16:23 10181612834428050335.tar.gz
-rw-rw-r--  1 vandoorn vandoorn   22944 Jan  5 16:23 10190268219082491724.tar.gz
-rw-rw-r--  1 vandoorn vandoorn   85138 Jan  5 16:23 10213030320477203917.tar.gz
-rw-rw-r--  1 vandoorn vandoorn       0 Jan  5 16:23 10214528561255169977.tar.gz
-rw-rw-r--  1 vandoorn vandoorn       0 Jan  5 16:23 10221609186266928200.tar.gz
-rw-rw-r--  1 vandoorn vandoorn   49324 Jan  5 16:23 10227971936698935467.tar.gz
-rw-rw-r--  1 vandoorn vandoorn  169815 Jan  5 16:23 10237809332798837709.tar.gz
-rw-rw-r--  1 vandoorn vandoorn       0 Jan  5 16:23 10329715650399075732.tar.gz
-rw-rw-r--  1 vandoorn vandoorn  377804 Jan  5 16:23 10342692050553037471.tar.gz
-rw-rw-r--  1 vandoorn vandoorn       0 Jan  5 16:23 10345492726296217978.tar.gz
-rw-rw-r--  1 vandoorn vandoorn  329192 Jan  5 16:23 10358353886519116077.tar.gz
-rw-rw-r--  1 vandoorn vandoorn   40472 Jan  5 16:23 10381171239261229408.tar.gz
-rw-rw-r--  1 vandoorn vandoorn  200427 Jan  5 16:23 10388488594487384490.tar.gz
-rw-rw-r--  1 vandoorn vandoorn       0 Jan  5 16:23 10394739949212461920.tar.gz
-rw-rw-r--  1 vandoorn vandoorn   24468 Jan  5 16:23 10399526667226558710.tar.gz
-rw-rw-r--  1 vandoorn vandoorn       0 Jan  5 16:23 10423936910949293364.tar.gz
[...]

Arthur Paulino (Jan 05 2023 at 15:31):

@Kevin Buzzard I just had that problem and then a get! overwrote a file that wasn't downloaded properly.

What happens if you do this?
curl -X GET -f "https://lakecache.blob.core.windows.net/mathlib4/foo.bar" -o please.no

We expect curl not to create that please.no file on your file system

Kevin Buzzard (Jan 05 2023 at 15:33):

buzzard@buster:~/lean-projects/mathlib4/crap$ curl -X GET -f "https://lakecache.blob.core.windows.net/mathlib4/foo.bar" -o please.no
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
curl: (22) The requested URL returned error: 404 The specified blob does not exist.
buzzard@buster:~/lean-projects/mathlib4/crap$ ls
buzzard@buster:~/lean-projects/mathlib4/crap$

Kevin Buzzard (Jan 05 2023 at 18:04):

https://www.linuxcapable.com/how-to-install-upgrade-curl-on-ubuntu-20-04-lts/ fixed the problem for me, although I'm now running curl as modified by some random person on the internet

Eric Wieser (Jan 05 2023 at 21:42):

I get gzip: stdin: unexpected end of file on gitpod (Ubuntu 20.04.5 LTS) too, with curl version

curl 7.68.0 (x86_64-pc-linux-gnu) libcurl/7.68.0 OpenSSL/1.1.1f zlib/1.2.11 brotli/1.0.7 libidn2/2.2.0 libpsl/0.21.0 (+libidn2/2.2.0) libssh/0.9.3/openssl/zlib nghttp2/1.40.0 librtmp/2.3
Release-Date: 2020-01-08
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtmp rtsp scp sftp smb smbs smtp smtps telnet tftp
Features: AsynchDNS brotli GSS-API HTTP2 HTTPS-proxy IDN IPv6 Kerberos Largefile libz NTLM NTLM_WB PSL SPNEGO SSL TLS-SRP UnixSockets

Eric Wieser (Jan 05 2023 at 21:45):

Following the instructions from the random person on the internet, I get the same error with the new version,

$ curl --version
curl 7.87.0 (x86_64-pc-linux-gnu) libcurl/7.87.0 OpenSSL/1.1.1f zlib/1.2.11 brotli/1.0.7 zstd/1.4.4 libidn2/2.2.0 libpsl/0.21.0 (+libidn2/2.2.0) libssh/0.9.3/openssl/zlib nghttp2/1.40.0 librtmp/2.3
Release-Date: 2022-12-21
Protocols: dict file ftp ftps gopher gophers http https imap imaps ldap ldaps mqtt pop3 pop3s rtmp rtsp scp sftp smb smbs smtp smtps telnet tftp
Features: alt-svc AsynchDNS brotli GSS-API HSTS HTTP2 HTTPS-proxy IPv6 Kerberos Largefile libz NTLM NTLM_WB PSL SPNEGO SSL threadsafe TLS-SRP UnixSockets zstd

Arthur Paulino (Jan 05 2023 at 21:48):

Have you tried get! to overwrite corrupted downloads?

Eric Wieser (Jan 05 2023 at 21:49):

Yes, it was get! that was failing

Eric Wieser (Jan 05 2023 at 21:49):

You should be able to debug this yourself if you want by opening gitpod with f1accc07b7ea20968ebf0d0d718e40d2a4528d7e checked out

Arthur Paulino (Jan 05 2023 at 21:59):

How do I access that?

Eric Wieser (Jan 05 2023 at 22:06):

gitpod.io/#https://github.com/leanprover-community/mathlib4, then git fetch origin $THAT_SHA && git checkout $THAT_SHA

Eric Wieser (Jan 05 2023 at 22:07):

You'll need to install the lean4 extension, open a lean file, then accept the popup to install elan

Arthur Paulino (Jan 05 2023 at 22:18):

Alright, I'll try to debug it later today. In the meantime you might want to try get! again

Scott Morrison (Jan 05 2023 at 22:56):

Is this working for others on a mac? I'm just seeing:

% lake exe cache get
Attempting to download 797 file(s)
No cache files to decompress

returning instantly, and then lake build compiles everything as normal.

Arthur Paulino (Jan 05 2023 at 23:03):

Hmm, more hash mismatches

Scott Morrison (Jan 05 2023 at 23:18):

Separately, could we make the cache commit command more robust to build failures? E.g. see https://github.com/leanprover-community/mathlib4/actions/runs/3851007808/jobs/6561764041#step:8:9

Here CI built lots of files, but one failed, but then cache commit didn't upload anything at all.

Arien Malec (Jan 05 2023 at 23:23):

Scott Morrison said:

Is this working for others on a mac? I'm just seeing:

% lake exe cache get
Attempting to download 797 file(s)
No cache files to decompress

returning instantly, and then lake build compiles everything as normal.

I get cache downloads but am seeing separate issues

Gabriel Ebner (Jan 05 2023 at 23:26):

Separately, could we make the cache commit command more robust to build failures?

mathlib4#1336

Arthur Paulino (Jan 05 2023 at 23:35):

But this has nothing to do with commit right? We don't want to persist a commit with a file that didn't compile

Gabriel Ebner (Jan 05 2023 at 23:36):

Yeah, I think Scott just said commit because it's the first command shown by github actions. While it's the second put one that fails.

Arthur Paulino (Jan 05 2023 at 23:38):

Alright, this one should be a simple tweak

Arthur Paulino (Jan 05 2023 at 23:38):

@Eric Wieser I'm afraid I can't do much on that gitpod that doesn't decompress :sad:

Eric Wieser (Jan 05 2023 at 23:59):

What do you mean "that doesn't decompress"?

Eric Wieser (Jan 06 2023 at 00:00):

Do you mean the cache is just bad on that commit?

Arthur Paulino (Jan 06 2023 at 00:00):

No, I just mean that it's failing to decompress

Eric Wieser (Jan 06 2023 at 00:01):

I don't understand what "it" refers to

Arthur Paulino (Jan 06 2023 at 00:01):

The execution of the program on your gitpod

Eric Wieser (Jan 06 2023 at 00:02):

That's not "my" gitpod, that's a fresh gitpod instance spun up each time anyone clicks that link

Eric Wieser (Jan 06 2023 at 00:02):

Do you mean the docker image never started up and you never got to vscode?

Arthur Paulino (Jan 06 2023 at 00:03):

No, I just mean that I don't know what to do with files that may be arriving corrupted on that machine

Gabriel Ebner (Jan 06 2023 at 00:07):

FWIW, I just tried it out on gitpod and all files in .cache were empty.

Eric Wieser (Jan 06 2023 at 00:11):

Arthur Paulino said:

No, I just mean that I don't know what to do with files that may be arriving corrupted on that machine

Oh, 'the program' refers to lake exe cache; I thought you were referring to gitpod itself as "the program"!

Arthur Paulino (Jan 06 2023 at 00:57):

mathlib4#1361

Arthur Paulino (Jan 06 2023 at 01:19):

About Mac issues... it's really hard for me to debug :dizzy:

Arthur Paulino (Jan 06 2023 at 01:20):

I don't have a computer with MacOS installed

Scott Morrison (Jan 06 2023 at 01:41):

@Arthur Paulino, I can give you shell access here if you promise to be nice. :-)

Arthur Paulino (Jan 06 2023 at 01:52):

I would need an instance of VS Code running. I'm not hardcore enough for emacs/neovim

Arthur Paulino (Jan 06 2023 at 01:55):

I'm up for a pair programming session though, so we can try to figure out why the hashes aren't matching on your machine

Arthur Paulino (Jan 06 2023 at 01:55):

Let me know if you have some free time tomorrow

Scott Morrison (Jan 06 2023 at 02:09):

I'm on Canberra time, which can be hard to coordinate with. I'm relatively flexible for the next 10 hours.

Scott Morrison (Jan 06 2023 at 02:10):

Separately, what do we think about giving out cache put privileges to a few people. (In particular, me. :-) I have a big machine, and when I'm in reviewing mode, I do a lot of local compilation. Seems a shame to waste it.)

Scott Morrison (Jan 06 2023 at 02:12):

Very happy to try to diagnose the mac issues via pair programming, but you can in fact run a remote session in VS Code with nothing more than ssh access to the host. I do this all the time when I'm away from my desktop machine.

Scott Morrison (Jan 06 2023 at 02:12):

In particular, the terminal window in VSCode is just a shell on the remote machine.

Gabriel Ebner (Jan 06 2023 at 02:13):

I think lake exe cache is way too flaky for that at the moment. For example it doesn't know if oleans are up to date. If you do echo this wont compile >> Mathlib/Algebra/Abs.lean, then lake exe cache put will upload the old oleans for the modified file.

Arthur Paulino (Jan 06 2023 at 02:33):

@Scott Morrison the advantage of pair programming is that I can run the hashing algorithm on my machine and you can run on yours and we have an optimized/parallelized way to identify the source of divergence

Scott Morrison (Jan 06 2023 at 02:33):

Sounds good. (That can also work with two VSCode windows, one local and one a remote session to my machine.)

Scott Morrison (Jan 06 2023 at 02:33):

I'm free now if it suits.

Arthur Paulino (Jan 06 2023 at 02:35):

It's almost midnight now... I need to sleep. Maybe I can just try the remote access tomorrow. We can set it up later

Xavier Roblot (Jan 06 2023 at 05:11):

Scott Morrison said:

Is this working for others on a mac? I'm just seeing:

% lake exe cache get
Attempting to download 797 file(s)
No cache files to decompress

returning instantly, and then lake build compiles everything as normal.

lake exe cache get works perfectly on my Mac (M1 with OS 13.1 using MacPorts).

Yaël Dillies (Jan 06 2023 at 09:53):

Scott Morrison said:

Separately, what do we think about giving out cache put privileges to a few people.

I see two obvious options:

  • allow all maintainers
  • allow all reviewers

Certainly at this point in time, maintainers should be enough. But once mathlib4 has grown we will probably want reviewers to be in the loop as well.

Arthur Paulino (Jan 06 2023 at 11:06):

Gabriel Ebner said:

I think lake exe cache is way too flaky for that at the moment. For example it doesn't know if oleans are up to date. If you do echo this wont compile >> Mathlib/Algebra/Abs.lean, then lake exe cache put will upload the old oleans for the modified file.

That's true but it and any file that, transitively or not, depends on it, will have some useless hash that should never match with anything anyone will download

Arthur Paulino (Jan 06 2023 at 11:09):

About put privileges, just to say it out loud, I'm not the one who decides that. That's with the maintainers. I only have a key for dev purposes, which will expire in a few days

Arthur Paulino (Jan 06 2023 at 12:05):

@Scott Morrison cache is computing the hashes correctly on your machine but cache get was failing too fast (not even trying to download). So I checked curl --version and it said "7.64.something", which is too old. It's exiting with an error because it can't do --parallel

Yaël Dillies (Jan 06 2023 at 12:42):

I too get Can't unlink already-existing object on Windows from VScode.

Arthur Paulino (Jan 06 2023 at 12:50):

That looks like some IO error from tar

Arthur Paulino (Jan 06 2023 at 12:52):

Are you doing cache get while you can see orange bars of compilation? The orange bars mean that olean files might be in use and tar won't have the permission to overwrite them

Yaël Dillies (Jan 06 2023 at 13:07):

Ah I see. Is there a Lean 4 equivalent to Lean: Check nothing?

Yaël Dillies (Jan 06 2023 at 13:09):

Closing all files seems to solve it.

Arthur Paulino (Jan 06 2023 at 13:23):

Yaël Dillies said:

Ah I see. Is there a Lean 4 equivalent to Lean: Check nothing?

I don't know

Arthur Paulino (Jan 06 2023 at 13:24):

mathlib4#1383 implements a validation on the version of curl
It also removes an unnecessary workaround because of a bug that was fixed in core

Johan Commelin (Jan 06 2023 at 16:50):

Johan Commelin said:

How far are we from lake get-cache? Switching branches and running lake build is quite time consuming.

Since I started this thread, a lot has happened! @Arthur Paulino thank you so much for all the time and energy you've poured into this! It works flawlessly on my side. Great work! :octopus: :tada:

Arthur Paulino (Jan 06 2023 at 16:53):

Don't forget to lake exe cache clean once in a while or your .cache folder will just grow and grow

Yaël Dillies (Jan 06 2023 at 17:31):

Could we have a loading bar when lake downloads the files? It's a bit dead in the chat terminal.

Yaël Dillies (Jan 06 2023 at 17:32):

Also, this Can't unlink already-existing object is quite annoying as the only solution I found so far is closing all Lean files.

Arthur Paulino (Jan 06 2023 at 17:40):

I can try the bar later, but tar won't trample a lock that your OS does

Arthur Paulino (Jan 06 2023 at 17:41):

If you don't want to lose your tabs, you can quit VS Code, do the caching thing and then open VS Code again

Arthur Paulino (Jan 06 2023 at 17:47):

Another thing you can try is pausing the Lean server in the top right corner of the infoview. Maybe that will unlock files that are being read so tar will have the permission to overwrite them

Arthur Paulino (Jan 06 2023 at 17:55):

Please let ppl know if this works:pray:

Gabriel Ebner (Jan 06 2023 at 18:02):

Arthur Paulino said:

Gabriel Ebner said:

I think lake exe cache is way too flaky for that at the moment. For example it doesn't know if oleans are up to date. If you do echo this wont compile >> Mathlib/Algebra/Abs.lean, then lake exe cache put will upload the old oleans for the modified file.

That's true but it and any file that, transitively or not, depends on it, will have some useless hash that should never match with anything anyone will download

Other people will download it if you commit it. This can happen very easily, if you modify a file and then run lake exe cache put without running lake build first.

Sebastian Ullrich (Jan 06 2023 at 18:04):

Deleting an open file is, it turns out, not quite straightforward on Windows: https://github.com/leanprover/lean4/blob/fedf235cba35ed8bf6bf571cf38e6d8536b904ac/src/library/module.cpp#L85-L89. I'm not surprised tar doesn't know about that. It would probably be easier to remove those files manually before invoking tar, though that would require a new runtime function implementing the linked code.

Sebastian Ullrich (Jan 06 2023 at 18:06):

Or rather we should just make IO.FS.removeFile do it

Arthur Paulino (Jan 06 2023 at 18:19):

@Sebastian Ullrich please let me know if/when you implement it so I can upgrade cache

Arien Malec (Jan 06 2023 at 19:02):

On MacOS 13.0.1:

lake exe cache clean!
arienmalec@Ariens-MBP-2 mwe % lake exe cache get
Attempting to download 804 file(s)
Decompressing 804 file(s)
arienmalec@Ariens-MBP-2 mwe % ls ./build/lib
Mathlib.ilean   Mathlib.olean   Mathlib.trace   Mwe.ilean
object file './build/lib/Mathlib/Data/Sum/Basic.olean' of module Mathlib.Data.Sum.Basic does not exist

Arthur Paulino (Jan 06 2023 at 19:25):

Is that a project that imports mathlib?

Arthur Paulino (Jan 06 2023 at 19:33):

I have very little bandwidth to fully support cache for projects that import Mathlib right now, sorry. I wanna focus on the support for people doing the port. Help is very much welcome though, and I'd be willing to answer questions and review PRs if someone wants to debug and fix cache for that use case

Arien Malec (Jan 06 2023 at 19:37):

Gotcha -- yes, was trying to build an MWE space that used Mathlib as a client.

Arthur Paulino (Jan 06 2023 at 19:39):

But FWIW, if your Lake project is properly set to use mathlib as a dependency, cache should extract the olean files to ./lake-packages/..., not ./build/.... And you shouldn't be seeing an error message of a missed olean file from mathlib that is expected to be in ./build. It should be in ./lake-packages/mathlib/build/...

Arien Malec (Jan 06 2023 at 19:43):

Then I assume there's some issue with lack of lake-packages awareness here.

Arthur Paulino (Jan 06 2023 at 19:44):

There's probably something wrong with your lakefile.lean. The #lean4 stream is the ideal place do get help about that

Arien Malec (Jan 06 2023 at 19:47):

Unlikely, since I wiped and recreated from scratch. But again, if using mathlib as a client isn't a current concern, no issue right now.

Yaël Dillies (Jan 07 2023 at 00:30):

Feature request: Now that we download files one by one, could we ask to get cache for a specific file?

Typically I don't need most files to work on a given pull request and it's a waste to download them.

Yaël Dillies (Jan 07 2023 at 00:32):

Arthur Paulino said:

Another thing you can try is pausing the Lean server in the top right corner of the infoview. Maybe that will unlock files that are being read so tar will have the permission to overwrite them

Doesn't work :(

Arthur Paulino (Jan 07 2023 at 08:02):

Yaël Dillies said:

Feature request: Now that we download files one by one, could we ask to get cache for a specific file?

Typically I don't need most files to work on a given pull request and it's a waste to download them.

Nice, I actually thought about that in the very beginning of the development. And now that the code is better structured, it's actually easy to implement it

Arthur Paulino (Jan 07 2023 at 08:20):

Do ilean or olean files keep track of the original paths of their respective Lean sources? I suspect this is probably why cache get isn't working for projects that import mathlib, since the sources for mathlib files will live inside .lake-packages/mathlib instead of .. That would invalidate the olean/ilean files

Siddhartha Gadgil (Jan 07 2023 at 08:44):

It is actually mostly working, with an occasional failure as I reported earlier.

Eric Wieser (Jan 07 2023 at 23:55):

Gabriel Ebner said:

FWIW, I just tried it out on gitpod and all files in .cache were empty.

I now get only about half of the files being empty

Eric Wieser (Jan 07 2023 at 23:57):

(see https://github.com/leanprover-community/mathlib4/pull/1419 for an easy way to startup gitpod to test this)

Arthur Paulino (Jan 08 2023 at 23:02):

Yaël Dillies said:

Feature request: Now that we download files one by one, could we ask to get cache for a specific file?

Typically I don't need most files to work on a given pull request and it's a waste to download them.

Done in mathlib#1430 :+1:

Ruben Van de Velde (Jan 08 2023 at 23:05):

mathlib4#1430

Arthur Paulino (Jan 08 2023 at 23:15):

I made it more powerful than what was requested to avoid multiple cache get commands. Instead, you can request the download of cache files for lists of entire directories

Yaël Dillies (Jan 08 2023 at 23:17):

This is great. I had this in mind too, and then I thought it might be too much to ask :laughing:

Yaël Dillies (Jan 08 2023 at 23:18):

One improvement that would save me a lot of time is if there was an option to tell it to figure out which files to download based on the diff with master. But that would be mathlib-specific, so I'm not sure how it would work.

Mario Carneiro (Jan 08 2023 at 23:22):

Doesn't it already only download the files that you don't have?

Mario Carneiro (Jan 08 2023 at 23:23):

so if you have been to master recently then you will have cache files for it and any that are shared with the branch will be used

Yaël Dillies (Jan 08 2023 at 23:24):

Yes, but if I am working on two level files A and B which don't import each other, then I would like to spare myself writing lake exe cache get A B because A and B will likely be longish names and I will need to get cache maybe 10 times in a day.

Mario Carneiro (Jan 08 2023 at 23:25):

you should get fish shell or something else with good command history

Yaël Dillies (Jan 08 2023 at 23:26):

Ehrm, Gitpod...

Arthur Paulino (Jan 08 2023 at 23:27):

I don't understand the idea. lake exe cache get on master should get you everything you don't have yet

Yaël Dillies (Jan 08 2023 at 23:31):

Yeah, and that's too much. I don't want to download the bazillion files that depend on A or B.

Arthur Paulino (Jan 08 2023 at 23:32):

Then lake exe cache get path/to/A path/to/B

Yaël Dillies (Jan 08 2023 at 23:33):

Yeah, which is slightly too long for me to type it out every time.

Yaël Dillies (Jan 08 2023 at 23:34):

Hence why it would be great if lake could detect that only A and B were touched from master.

Arthur Paulino (Jan 08 2023 at 23:34):

Use tab completion (that's why .lean is part of the file name, as opposed to what you suggested on the PR)

Arthur Paulino (Jan 08 2023 at 23:35):

Or just do lake exe cache get *A.lean *B.lean

Arthur Paulino (Jan 08 2023 at 23:37):

If A is too generic and has a long path to it, like Some/Really/Big/Path/A.lean, then do lake exe cache get *Path/A.lean

Arthur Paulino (Jan 08 2023 at 23:39):

I don't think the complexity you're asking for (parsing git diffs) is worth the effort here. Most people will just do history search on their terminals, when they're not just doing lake exe cache get

Yaël Dillies (Jan 08 2023 at 23:39):

Yes probably

Kevin Buzzard (Jan 08 2023 at 23:40):

This is just ridiculous. The time taken to type path/to/A is surely comparable to the time taken to just download the entire cache! The brilliant cache system means I don't have to wait ten minutes but surely we can wait ten seconds!

Yaël Dillies (Jan 08 2023 at 23:41):

Not if you're downloading it from my student room :grinning_face_with_smiling_eyes:

Kevin Buzzard (Jan 08 2023 at 23:41):

:-/

Arthur Paulino (Jan 08 2023 at 23:43):

The feature is also an attempt to circumvent the flakiness of curl -X GET with less powerful connections, so that was another motivation

Yakov Pechersky (Jan 09 2023 at 01:22):

Another option is lake exe cache get **/*A.lean to recursively go down. Or lake exe cache get $(find . -name "A.lean") or $(find . -name "*A.lean"), depending on your use case. Use the shell to help you! I think it's too much to ask cli utils to replicate all the file directory walking that other more basic utilities do.

Yakov Pechersky (Jan 09 2023 at 01:24):

If you care only (exactly) the files that have been modified wrt to master, then $(git diff --name-only master --diff-filter=M).

Eric Wieser (Jan 09 2023 at 01:34):

Are the semantics of lake exe cache get $fname "get the olean for $fname only" or "get all transitively-imported oleans needed for $fname"?

Eric Wieser (Jan 09 2023 at 01:38):

And isn't lake exe cache get *A.lean expanded by the shell on Unix? If I do something sneaky like put a * in a filename, your code is going to double-expand it

Mario Carneiro (Jan 09 2023 at 01:43):

Eric Wieser said:

If I do something sneaky like put a * in a filename, your code is going to double-expand it

please don't do that

Eric Wieser (Jan 09 2023 at 01:47):

Lean4 still allows anything you like as a module name as long as you import with \f<<>> like lean 3 did, right? (I still agree that we certainly shouldn't do it in mathlib)

Mario Carneiro (Jan 09 2023 at 01:49):

Yes. Note that even Clear!.lean caused some problems even though ! is an identifier character in lean 4. I think it's best to be conservative with filename characters for portability given how many ways people manipulate filenames

Mario Carneiro (Jan 09 2023 at 01:50):

shell scripts are quite often not hardened against this kind of chicanery

Gabriel Ebner (Jan 09 2023 at 01:50):

Eric Wieser said:

And isn't lake exe cache get *A.lean expanded by the shell on Unix? If I do something sneaky like put a * in a filename, your code is going to double-expand it

Some shells also give an error if the wildcard can't be expanded. (I believe there's even a bash option for this.)

Eric Wieser (Jan 09 2023 at 01:50):

I assume @Arthur Paulino is a windows user

Eric Wieser (Jan 09 2023 at 01:51):

Does lean have any way to query "am I running on windows"?

Eric Wieser (Jan 09 2023 at 01:51):

Because based on this comment windows programs are expected to parse their own globs, but unix programs are not

Mario Carneiro (Jan 09 2023 at 01:51):

docs4#System.Platform.isWindows

Gabriel Ebner (Jan 09 2023 at 01:52):

windows programs are expected to parse their own globs

I believe windows programs are also expected to parse quotation marks. I thought both quotation marks and wildcard expansion were already done by libc though?

Eric Wieser (Jan 09 2023 at 01:53):

What does libc refer to on windows?

Mario Carneiro (Jan 09 2023 at 01:53):

windows has a libc

Eric Wieser (Jan 09 2023 at 01:54):

Windows has MSVCRT, doesn't it? Wouldn't libc only be around if you use mingw?

Arthur Paulino (Jan 09 2023 at 01:55):

I'm a linux user btw. Should I just use a different character then? Maybe %?

Mario Carneiro (Jan 09 2023 at 01:55):

https://learn.microsoft.com/en-us/cpp/c-runtime-library/c-run-time-library-reference?view=msvc-170

Eric Wieser (Jan 09 2023 at 01:56):

Arthur Paulino said:

I'm a linux user btw. Should I just use a different character then? Maybe %?

Do your patterns have different semantics to the shell ones?

Eric Wieser (Jan 09 2023 at 01:56):

Because if they don't, then you probably don't need to implement them at all

Mario Carneiro (Jan 09 2023 at 01:56):

libc here meaning the library windows provides for making standards-conformant C programs (i.e. the ones with a main() function instead of WinMain()) run

Arthur Paulino (Jan 09 2023 at 01:56):

Eric Wieser said:

Because if they don't, then you probably don't need to implement them at all

Nope... then I can just match file names as they are?

Arthur Paulino (Jan 09 2023 at 01:57):

I mean, exact matches

Mario Carneiro (Jan 09 2023 at 01:57):

yes, on linux if you call foo *.lean the shell will actually call foo A.lean B.lean if those are the files in the directory

Eric Wieser (Jan 09 2023 at 01:57):

Gabriel Ebner said:

windows programs are expected to parse their own globs

I believe windows programs are also expected to parse quotation marks. I thought both quotation marks and wildcard expansion were already done by libc though?

By default, wildcards aren't expanded in command-line arguments. I don't know if Lean 4 is compiled with the opt-in parsing.

Arthur Paulino (Jan 09 2023 at 01:58):

*SomeFile.lean did not work on my shell ootb btw. I relied on my match

Eric Wieser (Jan 09 2023 at 01:58):

What is the full path that you wanted*SomeFile.lean to match?

Mario Carneiro (Jan 09 2023 at 02:00):

*SomeFile.lean will match FooSomeFile.lean in the same directory, but not Foo/SomeFile.lean or Foo/BarSomeFile.lean

Mario Carneiro (Jan 09 2023 at 02:00):

you need to use **/*SomeFile.lean to match the latter

Arthur Paulino (Jan 09 2023 at 02:01):

Ah, got it

Eric Wieser (Jan 09 2023 at 02:02):

You might need to change your shell settings to enable that, it's not on by default

Arthur Paulino (Jan 09 2023 at 02:02):

I will remove the custom matches then and use exact matches

Arthur Paulino (Jan 09 2023 at 02:13):

PR updated: mathlib4#1430

Arthur Paulino (Jan 09 2023 at 02:16):

I tried using curl progress bar but it prints a lot of trash when trying to download files that don't exist on the server

Arthur Paulino (Jan 09 2023 at 02:17):

So I guess no progress bar until we have a Lean implementation of a proper lib for HTTP requests

Siddhartha Gadgil (Jan 09 2023 at 02:22):

This is far from anything I am competent with, but as a bridge would it be good to have a Lean ffi wrapper around libcurl instead of calling the different shells with curl?

Arthur Paulino (Jan 09 2023 at 02:26):

I believe someday we will have that. I definitely don't have time for it right now

Siddhartha Gadgil (Jan 09 2023 at 02:51):

I had considered this for LeanAide but I am still unfamiliar with ffi (and in general unskilled in C/system stuff). But hopefully someone with the time and expertise will create a minimal lean-curl which can then be expanded to more of the API.

Eric Wieser (Jan 09 2023 at 09:19):

I suspect distributing lake exe cache with an appropriate libcurl might be rather annoying, especially on windows

Yaël Dillies (Jan 10 2023 at 08:30):

How hard would it be to have a button in the infoview that runs lake exe cache get X when file X is open?

Johan Commelin (Jan 10 2023 at 08:32):

Or Ctrl-Shift-P: Lake ... something something?

Yaël Dillies (Jan 10 2023 at 08:32):

Yeah, that would be good too, although probably less discoverable.

Arthur Paulino (Jan 10 2023 at 08:34):

Yaël Dillies said:

How hard would it be to have a button in the infoview that runs lake exe cache get X when file X is open?

That won't work without the upgrade on IO.removeFile to remove files being read

Yaël Dillies (Jan 10 2023 at 08:35):

Did that not already happen? I just ran lake exe cache get with Algebra.Order.AbsoluteValue and nothing complained about linked files.

Arthur Paulino (Jan 10 2023 at 08:36):

Nope... well, I haven't touched it.

Also, cache is an application that lives in mathlib4 and the infoview is not mathlib4 specific

Yaël Dillies (Jan 10 2023 at 08:37):

That's not a problem. We can and do override infoview functions, or at least we could in Lean 3.

Yaël Dillies (Jan 10 2023 at 08:38):

See file#tactic/interactive_expr

Arthur Paulino (Jan 10 2023 at 08:38):

  1. But what would that do?

Arthur Paulino (Jan 10 2023 at 08:39):

Is it a button that only appears when you're in mathlib?

Yaël Dillies (Jan 10 2023 at 08:39):

Yes, that's how it would look like.

Arthur Paulino (Jan 10 2023 at 08:40):

So that's two things: I haven't touched the code that overwrites oleans being read and I don't know about these repo-specific buttons

Yaël Dillies (Jan 10 2023 at 08:40):

If you've ever tried to open core files, you will see that the infoview looks different to the one you get in mathlib. And that difference come from the two lines here: https://github.com/leanprover-community/mathlib/blob/master/src/tactic/interactive_expr.lean#L496-L497

Arthur Paulino (Jan 10 2023 at 08:42):

Arthur Paulino said:

So that's two things: I haven't touched the code that overwrites oleans being read and I don't know about these repo-specific buttons

But if someone else knows how to get around these two points, please go ahead

Yaël Dillies (Jan 10 2023 at 08:43):

I know how to do the second one. Indeed I had to touch the machinery myself in #15959. I have no idea about the first, though.

Arthur Paulino (Jan 10 2023 at 08:43):

For the former, we need an upgraded version of IO.removeFile that Sebastian said should be implemented in core. I don't know the status of that

Arthur Paulino (Jan 10 2023 at 14:33):

:ping_pong: mathlib4#1430

Arthur Paulino (Jan 10 2023 at 23:25):

This one is a typo that @Yaël Dillies noticed. The PR is very easy to review:
mathlib4#1472

Yaël Dillies (Jan 12 2023 at 22:52):

@Arthur Paulino, whatever I try, I still cannot get lake exe cache get some_file to work.

Yaël Dillies (Jan 12 2023 at 22:53):

$ lake exe cache get Mathlib/Data/Rat/*.lean
warning: manifest out of date: package directory changed, use `lake update` to update
uncaught exception: No match for Mathlib/Data/Rat/Basic.lean

Yaël Dillies (Jan 12 2023 at 22:53):

Is this a Windows bug, maybe?

Arthur Paulino (Jan 12 2023 at 22:54):

Aren't Windows paths build with \ instead of /?

Yaël Dillies (Jan 12 2023 at 22:54):

As in, your match is expanded with /, then it fails because Windows expects \

Yaël Dillies (Jan 12 2023 at 22:55):

Ah so your program isn't doing anything about it!

Yaël Dillies (Jan 12 2023 at 22:56):

It is unexpected that the exact syntax from the help menu can't be used because it is OS-dependent.

Yaël Dillies (Jan 12 2023 at 22:57):

lake exe cache get Mathlib\\Data\\Rat\\*.lean

does work, but that's unergonomic.

Arthur Paulino (Jan 12 2023 at 22:58):

Sorry, the paths in the hashmap are built with the separators that the current OS uses. Can't you at least use tab completion on Windows?

Yaël Dillies (Jan 12 2023 at 22:59):

Not sure how that works

Yaël Dillies (Jan 12 2023 at 22:59):

Actually, it seems like the tab completion only works with / :face_palm:

Arthur Paulino (Jan 12 2023 at 23:03):

What the... :sad:

Yaël Dillies (Jan 12 2023 at 23:09):

Can your command just ignore Windows' weirdness and build the hasmap from /?

Arthur Paulino (Jan 12 2023 at 23:11):

It would be a bit of a nightmare to implement that. I'm using System.FilePath API to handle that for me

Yaël Dillies (Jan 12 2023 at 23:12):

/me is :face_with_symbols_on_mouth: Windows

Arthur Paulino (Jan 12 2023 at 23:14):

Even if I do replace / for \\ on Windows on the arguments of cache get, I'm not sure if automatic glob expansion works with / on Windows

Arthur Paulino (Jan 12 2023 at 23:14):

@Yaël Dillies do you know how to test that?

Yaël Dillies (Jan 12 2023 at 23:15):

I'm afraid I am a Windows power-sufferer, not a Windows power-user.

Yaël Dillies (Jan 12 2023 at 23:15):

What is automatic glob expansion?

Arthur Paulino (Jan 12 2023 at 23:17):

Have you ever programmed something that reads arguments from the shell input?

Yaël Dillies (Jan 12 2023 at 23:18):

No, never

Arthur Paulino (Jan 12 2023 at 23:20):

Suppose we have

def main (args : List String) : IO Unit := do
  IO.println args

that compiles to some binary ./build/bin/foo. Then, with automatic glob expansion, if you call ./build/bin/foo *.txt, your program will print every filepath that ends with .txt in the current folder

Henrik Böving (Jan 12 2023 at 23:21):

Note that this only happens in a shell context

Henrik Böving (Jan 12 2023 at 23:22):

This is not the actual behaviour of the system calls that perform the call, your shell does the glob expansion.

Arthur Paulino (Jan 12 2023 at 23:22):

That's something the shell does for you. The source code of the program does not contain any logic to do that search

Sky Wilshaw (Jan 12 2023 at 23:26):

image.png It does work on Windows - at least in Powershell.

Sky Wilshaw (Jan 12 2023 at 23:26):

Unless cat is doing some processing itself.

Arthur Paulino (Jan 12 2023 at 23:27):

Yeah can you test with a program that you write yourself please?

Arthur Paulino (Jan 12 2023 at 23:27):

It can be this simple:

def main (args : List String) : IO Unit := do
  IO.println args

Arthur Paulino (Jan 12 2023 at 23:28):

If that works, I can do the upgrade tomorrow

Sky Wilshaw (Jan 12 2023 at 23:28):

I don't have easy access to a Lean executable at the minute but...
image.png

Sky Wilshaw (Jan 12 2023 at 23:29):

which is a bit disappointing.

Arthur Paulino (Jan 12 2023 at 23:31):

Well well well, I guess I can do a custom processing in get/get! parameters on Windows and do the match manually

Arthur Paulino (Jan 12 2023 at 23:32):

But it's not going to be as powerful as native glob expansions. I can handle one wildcard, at most. I had this logic implemented on a previous version

Eric Wieser (Jan 12 2023 at 23:38):

You shouldn't implement this in lake exe cache

Eric Wieser (Jan 12 2023 at 23:38):

If this isn't working, we need to set a compilation flag for lean somewhere

Eric Wieser (Jan 12 2023 at 23:39):

See this message

Sky Wilshaw (Jan 12 2023 at 23:40):

Oh wow, I love Windows' idiosyncrasies :/

Arthur Paulino (Jan 12 2023 at 23:42):

@Eric Wieser I don't understand it. Then it should be a matter of setting that flag in the compilation of Lean and everything should work ootb?

Eric Wieser (Jan 12 2023 at 23:47):

Yes, that would be my hope

Eric Wieser (Jan 12 2023 at 23:48):

In the meantime, windows users can just use a different shell

Eric Wieser (Jan 12 2023 at 23:48):

I would expect that most Lean users have git bash installed

Eric Wieser (Jan 12 2023 at 23:48):

Powershell has its own version of glob too

Eric Wieser (Jan 12 2023 at 23:49):

Arthur Paulino said:

Sorry, the paths in the hashmap are built with the separators that the current OS uses. Can't you at least use tab completion on Windows?

You should normalize the paths to use / before using them in the hash map; otherwise won't your hash computation be wrong?

Arthur Paulino (Jan 12 2023 at 23:51):

If I did normalization, I would have to "denormalize" everytime i'd use it for pointing to files. That would be a mess. I don't hash the path per se. I hash its components (System.FilePath.components : FilePath -> List String)

Arthur Paulino (Jan 12 2023 at 23:52):

https://github.com/leanprover-community/mathlib4/blob/aec2f1d48aa89d0e47360ef080ff0fb487766533/Cache/Hashing.lean#L79

Arthur Paulino (Jan 12 2023 at 23:55):

Eric Wieser said:

I would expect that most Lean users have git bash installed

@Yaël Dillies do you want to test git bash? :smile:

Eric Wieser (Jan 13 2023 at 00:03):

If I did normalization, I would have to "denormalize" everytime i'd use it for pointing to files

Windows should do this for you

Eric Wieser (Jan 13 2023 at 00:04):

Also, by normalize I also mean deal with paths like Mathlib/../Mathlib/../Data/Nat/../Nat/Basic.lean

Eric Wieser (Jan 13 2023 at 00:14):

Arthur Paulino said:

Eric Wieser I don't understand it. Then it should be a matter of setting that flag in the compilation of Lean and everything should work ootb?

Ah nevermind, Lean is built with msys2 (which expects you to run in a gnu runtime like git bash) not msvc (for which the link above is relevant)

Eric Wieser (Jan 13 2023 at 00:14):

Any attempt to fix shell globbing within lake exe cache itself will break people in shells that already handle this themselves.

Arthur Paulino (Jan 13 2023 at 00:37):

Eric Wieser said:

Any attempt to fix shell globbing within lake exe cache itself will break people in shells that already handle this themselves.

I disagree. We can process paths that get into the program and that still contain *. That means the shell didn't do anything for the user. That's what I had in mind in my previous implementation

Arthur Paulino (Jan 13 2023 at 00:43):

@Eric Wieser @Gabriel Ebner if any of you give me a thumbs up, I can recover that customized post-processing behavior (and further replace /s by \\s on Windows in case the user used tab completion as Yael tried)

Eric Wieser (Jan 13 2023 at 00:54):

I can recover that customized post-processing behavior

I firmly believe that we shouldn't do that. If you want to be able to pass basic filename patterns to a program, use a shell that supports it. Everyone contributing to mathlib4 has git installed on the machine they develop on, and I'm pretty sure that comes with git bash on platforms not already equipped with a shell. I suppose if you really insist you could hide it between --cmd-glob (meaning "I am using cmd so need you to do glob expansion for me"). Even detecting windows would be a bad idea as you might be called from a reasonable shell.

Eric Wieser (Jan 13 2023 at 00:55):

(and further replace /s by \\s on Windows in case the user used tab completion as Yael tried)

did you see docs4#System.FilePath.normalize ? You should just use that all the time on the user input (and on the paths in your hash map)

Arthur Paulino (Jan 13 2023 at 00:57):

Eric Wieser said:

(and further replace /s by \\s on Windows in case the user used tab completion as Yael tried)

did you see docs4#System.FilePath.normalize ? You should just use that all the time on the user input (and on the paths in your hash map)

The paths in the hashmap are tailored from imported modules

Arthur Paulino (Jan 13 2023 at 00:58):

I can do an upgrade and do that on the input paths from the user in the get and get! commands

Eric Wieser (Jan 13 2023 at 00:59):

The paths in the hashmap are tailored from imported modules

What does this mean?

Arthur Paulino (Jan 13 2023 at 01:24):

It means that I don't build paths from the result of scanning the FS directly. If a file imports Foo.Bar I make a file path from that module name

Arthur Paulino (Jan 13 2023 at 01:25):

Arthur Paulino said:

I can do an upgrade and do that on the input paths from the user in the get and get! commands

Actually nvm. This won't solve any problem we are seeing today

Eric Wieser (Jan 13 2023 at 01:54):

I make a file path from that module name

How do you do that?

Arthur Paulino (Jan 13 2023 at 02:25):

https://github.com/leanprover-community/mathlib4/blob/aec2f1d48aa89d0e47360ef080ff0fb487766533/Cache/Hashing.lean#L42-L49

Eric Wieser (Jan 13 2023 at 02:40):

Note you can use docs4#Lean.Name.components rather than stringifying and splitting

Eric Wieser (Jan 13 2023 at 02:44):

But I agree there's no need to normalize those paths

Eric Wieser (Jan 13 2023 at 02:45):

They already use a \ on windows

Arthur Paulino (Jan 13 2023 at 03:05):

Lean.Name.components return a list of names anyways

Arthur Paulino (Jan 13 2023 at 03:06):

I would have to stringfy each separately

Johan Commelin (Jan 13 2023 at 06:24):

Yaël Dillies said:

I'm afraid I am a Windows power-sufferer, not a Windows power-user.

It's time to leave the dark side! Step in to the light!

:penguin: :penguin: :penguin:
:octopus: :octopus: :octopus:

Kevin Buzzard (Jan 13 2023 at 07:29):

Yael are you not using git bash?

Yaël Dillies (Jan 13 2023 at 08:30):

I am using whatever VScode's terminal is, which seems to be bash. How do I change the default?

Eric Wieser (Jan 13 2023 at 08:42):

Arthur Paulino said:

I would have to stringfy each separately

Yes, but the names are allowed to contain . so the two approaches are not equivalent

Eric Wieser (Jan 13 2023 at 10:15):

Yaël Dillies said:

I am using whatever VScode's terminal is, which seems to be bash. How do I change the default?

If it's bash then you already have the right things. cmd is the bad one

Yaël Dillies (Jan 13 2023 at 10:16):

Ah well then my problem didn't get fixed :frown:

Eric Wieser (Jan 13 2023 at 10:28):

Can you paste the command you typed here?

Eric Wieser (Jan 13 2023 at 10:30):

Aha, I think I know the problem:

  • The glob pattern match is working as expected
  • You're using a unix shell so the glob generates/-style paths
  • Lean notices you're running on windows, so uses \\-style paths
  • Cache does string equality on the paths rather than path equality, so the match fails

Eric Wieser (Jan 13 2023 at 10:30):

Adding the call to .normalize on user inputs as I suggested above will fix the problem

Arthur Paulino (Jan 13 2023 at 11:24):

Eric Wieser said:

Adding the call to .normalize on user inputs as I suggested above will fix the problem

mathlib4#1532

Arthur Paulino (Jan 13 2023 at 11:27):

@Eric Wieser may I do a small optimization on your branch so we don't need to call Name.toString at all?

Eric Wieser (Jan 13 2023 at 11:28):

I expect toString to be free

Arthur Paulino (Jan 13 2023 at 11:29):

Alright, I've approved it

Sky Wilshaw (Jan 13 2023 at 12:04):

@Yaël Dillies Might be useful: image.png

Yaël Dillies (Jan 13 2023 at 12:17):

Oh yeah I knew about that menu, but forgot that the default coudl be edited from here.

Eric Wieser (Jan 13 2023 at 12:19):

The fix should be merged in master, can you try again @Yaël Dillies?

Yaël Dillies (Jan 13 2023 at 12:28):

$ lake exe cache get Mathlib/Data/Rat/Floor.lean
warning: manifest out of date: package directory changed, use `lake update` to update
info: updating .\./lake-packages\std to revision fde95b16907bf38ea3f310af406868fc6bcf48d1
info: Compiling Cache.IO
info: Compiling Cache.Hashing
info: Compiling Cache.Requests
info: Compiling Cache.Main
error: > c:\Users\YaelD\.elan\toolchains\leanprover--lean4---nightly-2023-01-12\bin\leanc.exe -c -o .\build\ir\Cache\IO.o .\build\ir\Cache\IO.c -O3 -DNDEBUG
error: stderr:
uncaught exception: no such file or directory (error code: 2)
error: external command `c:\Users\YaelD\.elan\toolchains\leanprover--lean4---nightly-2023-01-12\bin\leanc.exe` exited with code 1
error: > c:\Users\YaelD\.elan\toolchains\leanprover--lean4---nightly-2023-01-12\bin\leanc.exe -c -o .\build\ir\Cache\Hashing.o .\build\ir\Cache\Hashing.c -O3 -DNDEBUG
error: stderr:
uncaught exception: no such file or directory (error code: 2)
error: external command `c:\Users\YaelD\.elan\toolchains\leanprover--lean4---nightly-2023-01-12\bin\leanc.exe` exited with code 1
error: > c:\Users\YaelD\.elan\toolchains\leanprover--lean4---nightly-2023-01-12\bin\leanc.exe -c -o .\build\ir\Cache\Requests.o .\build\ir\Cache\Requests.c -O3 -DNDEBUG
error: stderr:
uncaught exception: no such file or directory (error code: 2)
error: external command `c:\Users\YaelD\.elan\toolchains\leanprover--lean4---nightly-2023-01-12\bin\leanc.exe` exited with code 1
error: > c:\Users\YaelD\.elan\toolchains\leanprover--lean4---nightly-2023-01-12\bin\leanc.exe -c -o .\build\ir\Cache\Main.o .\build\ir\Cache\Main.c -O3 -DNDEBUG
error: stderr:
uncaught exception: no such file or directory (error code: 2)
error: external command `c:\Users\YaelD\.elan\toolchains\leanprover--lean4---nightly-2023-01-12\bin\leanc.exe` exited with code 1

Yaël Dillies (Jan 13 2023 at 12:28):

Seems to be unrelated, though?

Arthur Paulino (Jan 13 2023 at 12:32):

Try lake exe cache to see if it prints the help menu

Arthur Paulino (Jan 13 2023 at 12:35):

If it doesn't print the help menu then it's indeed unrelated

Johan Commelin (Jan 17 2023 at 08:08):

Ignorant question: if I have 5 clones of mathlib4, do they share a local cache? Or does each of them maintain its own cache?

Arthur Paulino (Jan 17 2023 at 10:35):

Each of them maintain their own cache (inside a gitignored .cache folder)

Johan Commelin (Jan 17 2023 at 10:36):

I see. What do you think of using $HOME/.cache/lake/ or something like that?

Johan Commelin (Jan 17 2023 at 10:37):

I'm using git worktree to checkout every branch in its own directory.

Johan Commelin (Jan 17 2023 at 10:37):

Which means I'm currently not really benefiting from local caching that much, and just redownloading the same files 10x per day

Arthur Paulino (Jan 17 2023 at 10:38):

I can add a check to see if the user has a MATHLIB_CACHE_DIR env var defined. If so, use that instead

Arthur Paulino (Jan 17 2023 at 10:39):

Would that help?

Johan Commelin (Jan 17 2023 at 10:40):

@Arthur Paulino yeah, sounds good!

Arthur Paulino (Jan 17 2023 at 10:40):

But then you would have to export that env variable on every terminal session

Johan Commelin (Jan 17 2023 at 10:40):

I'm ok with that

Damiano Testa (Jan 17 2023 at 10:40):

I personally would prefer having control over where to store the cache: currently, I am unable to use mathlib4 in the department, since I cannot install things myself, due to permissions. If I can control as much as I can where files are placed, that would be helpful!

Mauricio Collares (Jan 17 2023 at 10:42):

Would writing it at $XDG_CACHE_HOME/mathlib be appropriate? A user could still do XDG_CACHE_HOME=/tmpdir lake exe cache... to override it.

Damiano Testa (Jan 17 2023 at 10:44):

Mauricio, in this specific case, the $XDG_CACHE_HOME variable is set to a directory on which I have write permissions, so I would be happy with that as well!

Arthur Paulino (Jan 17 2023 at 11:40):

I will implement Mauricio's idea then

Johan Commelin (Jan 17 2023 at 11:41):

Thanks!

Arthur Paulino (Jan 17 2023 at 11:42):

Ah, and I will use .cache as a fallback in case that's not defined

Arthur Paulino (Jan 17 2023 at 11:54):

Wait, is XDG_CACHE_HOME some default path on Unix systems? I don't have it defined on my machine (Ubuntu 22.04)

Johan Commelin (Jan 17 2023 at 11:59):

https://wiki.archlinux.org/title/XDG_user_directories

Johan Commelin (Jan 17 2023 at 12:00):

Most distros do something with XDG

Arthur Paulino (Jan 17 2023 at 12:15):

mathlib4#1623

Sebastian Ullrich (Jan 17 2023 at 12:26):

The correct fallback for XDG_CACHE_HOME is $HOME/.cache. It doesn't make sense to decide between a local and global cache based on the existence of this variable.

Arthur Paulino (Jan 17 2023 at 12:59):

I can change that. But what if HOME is not defined?

Sebastian Ullrich (Jan 17 2023 at 13:02):

Then we're screwed

Sebastian Ullrich (Jan 17 2023 at 13:05):

And this is of course only on Linux, other platforms have completely different conventions https://crates.io/crates/dirs

Arthur Paulino (Jan 17 2023 at 13:14):

Then I'll leave .cache as the safe fallback

Arthur Paulino (Jan 17 2023 at 13:14):

(but anyone is free to push to that branch and advocate for their solution, of course)

Arthur Paulino (Jan 17 2023 at 13:18):

mathlib4#1623 is updated

Rob Lewis (Jan 18 2023 at 05:06):

I may be misunderstanding something about the relation between get-cache and lake build, so sorry if that's the case!

I have a project that uses mathlib as a dependency. I can run lake exe cache get in the root of my project and it seems to successfully download and/or unpack a cache. But running lake build tries to recompile everything. When I open a file with mathlib imports in vscode, it also starts recompiling. Any guess what I've done wrong here?

Curiously: this project is also set up for gitpod, and if I create a fresh gitpod instance (calling lake exe cache get) I can open files with no recompilation. But lake build still starts from the bottom.

Johan Commelin (Jan 18 2023 at 05:08):

@Rob Lewis I think Arthur has conjectured that we shouldn't be pinning master in lakefiles, but pinning a specific hash/tag/release instead.

Johan Commelin (Jan 18 2023 at 05:08):

I haven't yet tested whether that helps, but it's worth a shot.

Rob Lewis (Jan 18 2023 at 05:08):

On a semi-related note, is there a standard way to stop recompilation when it gets triggered by vscode? In the above, when I opened a file with imports, a bunch of lake and lean processes started eating my CPU, and restarting the Lean server and even quitting vscode didn't kill them. I had to kill them manually

Rob Lewis (Jan 18 2023 at 05:09):

Johan Commelin said:

Rob Lewis I think Arthur has conjectured that we shouldn't be pinning master in lakefiles, but pinning a specific hash/tag/release instead.

Interesting -- will give that a shot!

Rob Lewis (Jan 18 2023 at 05:12):

Pinning a specific commit didn't seem to change anything. (Process: edit my project's lakefile.lean, lake update, lake exe cache get, lake build)

Heather Macbeth (Jan 18 2023 at 05:46):

I think maybe lake update is something you're just not supposed to do when you have dependencies. (I seem to remember discussion of this.)

Heather Macbeth (Jan 18 2023 at 05:47):

So maybe instead of lake update you want to manually copy in the lake-manifest of current mathlib. (well, or find a way to automate that.)

Heather Macbeth (Jan 18 2023 at 05:51):

Heather Macbeth said:

I think maybe lake update is something you're just not supposed to do when you have dependencies. (I seem to remember discussion of this.)

That's a big vague, but I guess my question is: is lake update un-pinning your pinned hashes for the dependencies?

Rob Lewis (Jan 18 2023 at 06:07):

It would have to copy the manifest from mathlib and then add mathlib itself (per the error message when I tried). But even if I do that I see the same behavior. Updated process: my project pins mathlib4 to a commit and its manifest matches that of mathlib4, with the addition of mathlib4 itself. I clone a fresh copy of this repository (so no build artifacts) and run lake exe cache get, lake build

Rob Lewis (Jan 18 2023 at 06:07):

Although, with this new process, I can open files that import mathlib before running lake build

Gabriel Ebner (Jan 18 2023 at 06:21):

I think Arthur has conjectured that we shouldn't be pinning master in lakefiles, but pinning a specific hash/tag/release instead.

This should make no difference whatsoever for the caching. We've been fixing quite a few bugs though over the last few weeks, are you on the latest mathlib4 already?

Gabriel Ebner (Jan 18 2023 at 06:22):

FWIW, mathport depends on mathlib4 and successfully uses lake exe cache get in CI (i.e., avoiding building mathlib4)

Gabriel Ebner (Jan 18 2023 at 06:23):

Note: you need matching Lean versions of course. cp lake-packages/mathlib/lean-toolchain . is your friend.

Gabriel Ebner (Jan 18 2023 at 06:24):

Note: of course std4/aesop/etc. versions also need to match exactly. Deleting any require std etc. in your lakefile is a good start, followed by lake update.

Gabriel Ebner (Jan 18 2023 at 06:25):

On a semi-related note, is there a standard way to stop recompilation when it gets triggered by vscode? In the above, when I opened a file with imports, a bunch of lake and lean processes started eating my CPU, and restarting the Lean server and even quitting vscode didn't kill them. I had to kill them manually

No. We should probably file that as a bug.

Heather Macbeth (Jan 18 2023 at 06:25):

Gabriel Ebner said:

Note: of course std4/aesop/etc. versions also need to match exactly. Deleting any require std etc. in your lakefile is a good start, followed by lake update.

So will that ensure that in a project with just one dependency (mathlib), all the dependencies of that dependency (std/aesop/...) get updated only to whatever version mathlib is currently using -- not to latest version?

Gabriel Ebner (Jan 18 2023 at 06:27):

Yes. That's also been fixed recently (November last year).

Rob Lewis (Jan 18 2023 at 06:28):

I'm on the latest commit of mathlib4 and the toolchain files match. This is the entirety of my lakefile:

import Lake
open Lake DSL


require mathlib from git "https://github.com/leanprover-community/mathlib4" @ "d1e6d8643531bd36f7c7bb612413decb7cb07ede"

package «brown-cs22» {
  -- add package configuration options here
}

lean_lib BrownCs22 {
  -- add library configuration options here
}

@[default_target]
lean_exe «brown-cs22» {
  root := `Main
}

Gabriel Ebner (Jan 18 2023 at 06:28):

lake update && cp lake-packages/mathlib/lean-toolchain . and you should be good to go.

Rob Lewis (Jan 18 2023 at 06:28):

mathport is a good example for me to compare to -- I'll have to see if I can spot any relevant differences between that and my setup

Rob Lewis (Jan 18 2023 at 06:30):

rob@pop-os:~/teaching/cs22/s23/lean$ lake update && cp lake-packages/mathlib/lean-toolchain .
rob@pop-os:~/teaching/cs22/s23/lean$ lake exe cache get
No files to download
Decompressing 926 file(s)
rob@pop-os:~/teaching/cs22/s23/lean$ lake build
Compiling Std.Data.RBMap.Basic
Compiling Qq.Macro
Compiling Mathlib.Lean.Meta
Compiling Qq.Rw
Compiling Std.Lean.Meta.DiscrTree
Compiling Std.Data.List.Basic
Compiling Mathlib.Tactic.Relation.Symm
Compiling Mathlib.Tactic.Relation.Trans
^C
rob@pop-os:~/teaching/cs22/s23/lean$

:sad:

Gabriel Ebner (Jan 18 2023 at 06:30):

That's normal.

Gabriel Ebner (Jan 18 2023 at 06:30):

Compiling = running C compiler to produce native binary (not cached)

Gabriel Ebner (Jan 18 2023 at 06:30):

Building = running Lean (slow and cached)

Rob Lewis (Jan 18 2023 at 06:31):

Oh!

Heather Macbeth (Jan 18 2023 at 06:35):

Wait -- so which one does lake build do?

Rob Lewis (Jan 18 2023 at 06:35):

So, what I'd like to do in Lean 3 language: I have my project that depends on mathlib. I'd like to build oleans for my project so that when a student opens one of my files, they don't have to wait. lean src --make or whatever

Rob Lewis (Jan 18 2023 at 06:36):

Imagining that there's a file in my project that's slow to build that gets imported by lots of others, e.g. a "course library" file

Rob Lewis (Jan 18 2023 at 06:37):

I take it lake build is not what I'm looking for?

Gabriel Ebner (Jan 18 2023 at 06:39):

lake build BrownCs22.MyModule will build only that module and its dependencies. The reason lake build compiles the C code is because it compiles runLinter--it won't compile any the modules in your project. It will only build them.

Gabriel Ebner (Jan 18 2023 at 06:39):

Or are you asking if you can use lake exe cache get for your project?

Rob Lewis (Jan 18 2023 at 06:43):

Nope! I'm looking for lake build BrownCs22.MyModule, thanks so much

Arthur Paulino (Jan 18 2023 at 07:42):

Gabriel Ebner said:

Note: of course std4/aesop/etc. versions also need to match exactly. Deleting any require std etc. in your lakefile is a good start, followed by lake update.

This is where I think things can go wrong: relying on Lake to figure out transitive dependencies. It's easy if your graph of dependencies is small, but it can become a nightmare for bigger graphs.

If I have a direct dependency on Std and on Mathlib but Mathlib also depends on Std, figuring out compatibility is a lot smoother if Mathlib explicitly states the commit hash it needs from Std in its lakefile.

Another situation where dependency UX can go bad is when you depend on two libs, but they implicitly depend on different versions of a certain lib.

Arthur Paulino (Jan 18 2023 at 07:44):

Ah, and parsing Lake manifests with the human eye is out of question in terms of UX. Please let's not rely on that

James Gallicchio (Jan 18 2023 at 07:47):

(are the current caching scripts really designed to work outside of the porting effort? It seems like more robust stuff will require a bit more infrastructure around dependency management)

Arthur Paulino (Jan 18 2023 at 07:56):

Yes and no. It is prepared to deal with the two situations: when you're in Mathlib or when you're importing Mathlib. I mean, it can understand in which scenario it's at, find sources and place olean files correctly. But it relies on precise management of dependencies. And Mathlib pinning the master rev of Std doesn't help

Arthur Paulino (Jan 18 2023 at 08:04):

Unless you're willing to rely on Lake to figure out transitive dependencies for you. But that will potentially make the life of people importing your project harder because they don't know which version of Std they will be downloading unless they look at your manifest

Arthur Paulino (Jan 18 2023 at 08:38):

Managing dependencies, however, is a world in itself. As a reference, Cargo adopts semantic versioning... but they also have a dedicated platform for hosting packages.

I will follow up on a thread in #lean4

Gabriel Ebner (Jan 18 2023 at 17:37):

If I have a direct dependency on Std and on Mathlib but Mathlib also depends on Std, figuring out compatibility is a lot smoother if Mathlib explicitly states the commit hash it needs from Std in its lakefile.

@Arthur Paulino I don't know how to get this through to you, but your description is exactly how Lake and mathlib4 are already set up right now. The lake-manifest.json in mathlib4 explicitly states the commit hash from std4 (and from aesop, etc.). If you use lake update in a project that imports mathlib4, it will use these pinned revisions.

Gabriel Ebner (Jan 18 2023 at 17:45):

Another situation where dependency UX can go bad is when you depend on two libs, but they implicitly depend on different versions of a certain lib.

This is indeed a hard one. But I have an undocumented trick for you: the order of require directives in a lakefile matters, i.e., they are resolved top-to-bottom. So as long as mathlib4 is your first require, lake update will take all dependency versions from mathlib4. (And if you want a different version of aesop, you could put a require aesop on top.)

Arthur Paulino (Jan 18 2023 at 18:37):

Alright, got it. Thanks! The order trick is important and I wasn't aware of it

Arthur Paulino (Jan 18 2023 at 18:48):

@Siddhartha Gadgil I think caching can work for you if you remove the requirements on Qq and Std

Siddhartha Gadgil (Jan 19 2023 at 00:44):

Thanks. i will try

Violeta Hernández (Jan 20 2023 at 04:48):

image.png

Violeta Hernández (Jan 20 2023 at 04:48):

What's going on here?

Violeta Hernández (Jan 20 2023 at 05:06):

Well, I just built Mathlib4 myself and it works

Violeta Hernández (Jan 20 2023 at 05:06):

Very very fast

Arthur Paulino (Jan 20 2023 at 07:41):

Looks like a partial failure

Arthur Paulino (Jan 20 2023 at 13:12):

@Violeta Hernández it's better to do lake exe cache get! once to purge corrupted files

Adam Topaz (Jan 23 2023 at 16:33):

Should we add some instructions for lake exe get cache on the mathlib4 readme.md?

Arthur Paulino (Jan 24 2023 at 12:01):

What's the plan for when we start getting hash collisions in UInt64 btw?

Eric Wieser (Jan 24 2023 at 12:04):

Use a more reasonable hash?

Arthur Paulino (Jan 24 2023 at 12:38):

That's the obvious part. But I mean, are we going to implement something in Lean? Use some C implementation? FFI to something that's done in Rust?

Sebastian Ullrich (Jan 24 2023 at 12:47):

Arthur Paulino said:

What's the plan for when we start getting hash collisions in UInt64 btw?

What's the chance of that happening?

Arthur Paulino (Jan 24 2023 at 12:49):

It's low but I don't have a clear intuition for how likely that is to happen in the next year

Eric Wieser (Jan 24 2023 at 12:53):

Does the presence of a git installation mean that you have a command-line hashing tool installed?

Arthur Paulino (Jan 24 2023 at 13:08):

Eric, I don't know why you're communicating with questions like that. Yes, Git has a hashing algorithm. But that doesn't give us an obvious way to plug it in our hashing scheme

Eric Wieser (Jan 24 2023 at 13:09):

Sorry, let me try and be clearer: could you build a long string of all the data you want to hash, and pipe it to git some-raw-hash-command in order to punt the job of having native hashing down the road? Or would that be prohibitively slow? I'm not suggesting using git, but merely using the fact that git bundles a binary somewhere that can compute SHA1 hashes.

Arthur Paulino (Jan 24 2023 at 13:12):

I found this: https://git-scm.com/docs/git-hash-object

Arthur Paulino (Jan 24 2023 at 13:14):

The long string idea works, but we'd need to write it to a file and then hash. If we have ~1k files, that's ~1k writes and git hash calls though

Arthur Paulino (Jan 24 2023 at 13:18):

We have an implementation of the SHA3 hash here. But maybe that's an overkill. I know this is a very anticipated concern but I just don't want us to fall in a "fixing a plane while it's flying" territory. I think we should at least agree on a plan for when/if it happens

Eric Wieser (Jan 24 2023 at 13:23):

Arthur Paulino said:

The long string idea works, but we'd need to write it to a file and then hash. If we have ~1k files, that's ~1k writes and git hash calls though

Would piping it through stdin not work? Either way, the subprocess calls would add up I guess.

Arthur Paulino (Jan 24 2023 at 13:28):

Yes, the ideal solution wouldn't rely on more IO overhead, IMO

Johan Commelin (Jan 24 2023 at 13:43):

git hash-object indeed has a --stdin option.

James Gallicchio (Jan 24 2023 at 14:00):

i'm not sure it's worth worrying -- where do we rely on hashes to indicate equality?

James Gallicchio (Jan 24 2023 at 14:00):

(deleted)

Arthur Paulino (Jan 24 2023 at 14:03):

James Gallicchio said:

i'm not sure it's worth worrying -- where do we rely on hashes to indicate equality?

The hash indicates which olean file we should download. A hash collision would trigger a download of some unrelated olean and I'm not fully aware of the consequences of that

Sebastian Ullrich (Jan 24 2023 at 14:05):

Unless Lake's trace has the same collision, it will surely rebuild the file

Arthur Paulino (Jan 24 2023 at 14:08):

That's good news then. So in terms of build safety, we're backed up by Lake

Eric Wieser (Jan 24 2023 at 14:08):

A collision could also cause uploads to either fail or invalidate old caches, right?

Arthur Paulino (Jan 24 2023 at 14:10):

Uploads would fail. But it's probably better to invalidate old caches :thinking:

Arthur Paulino (Jan 24 2023 at 14:34):

Sebastian Ullrich said:

Unless Lake's trace has the same collision, it will surely rebuild the file

Just to make sure... we also download .ilean, .trace, .c and .c.trace files. Would Lake still be able to recover from that? I'm asking because I don't know to what extend Lake trusts what's already in the file system versus always recomputing traces

Sebastian Ullrich (Jan 24 2023 at 14:59):

Yes, comparing the trace on disk with the recomputed trace in memory is how Lake decides what to rebuild.

Gabriel Ebner (Jan 24 2023 at 19:13):

Lake uses the same hash as cache, so a hash collision in the file contents would affect both of them.

Gabriel Ebner (Jan 24 2023 at 19:17):

I've been thinking about adding a better, cryptographic hash function. For example blake3 is about as fast our current hash, at least if you use the optimized assembly version. But then I saw that Lake needs to be built by three different build systems (make, nix, and lake) and my motivation dropped drastically.

Arthur Paulino (Jan 24 2023 at 19:21):

We're using Rust's blake3 via FFI here. But that would require users to have cargo installed

Gabriel Ebner (Jan 24 2023 at 19:34):

Using blake3 via Rust is a very roundabout way of doing this. Blake3 has a nice C implementation, there's no need to depend on Rust. (At least when using it in Lean/Lake. I guess you're already using Rust for other libraries as well, so it's not an extra dependency for you.)

Arthur Paulino (Jan 24 2023 at 19:40):

On that, we would prefer using the C implementation directly. But configuring the Rust FFI looked easier than adapting the C code.

Arthur Paulino (Jan 24 2023 at 20:12):

If you manage to get that into std4, it would not only make mathlib4's caching better but also our team happier

Mario Carneiro (Jan 24 2023 at 20:23):

speaking of which, if someone wants to upstream the cache script to std4 that would be great

Mario Carneiro (Jan 24 2023 at 20:24):

ideally in such a way that mathlib doesn't need to maintain a separate copy of the script

Arthur Paulino (Jan 24 2023 at 20:30):

The proper way of doing so is by having a generic package for caching

Arthur Paulino (Jan 24 2023 at 20:32):

A less ideal way of doing so is by migrating Cache to std4 so mathlib4 would inherit lake exe cache

Mario Carneiro (Jan 24 2023 at 20:32):

I have no idea how packages and lake scripts interact, but if you can make something work that's fine

Mario Carneiro (Jan 24 2023 at 20:33):

I just want this capability to be widely available somehow

Mario Carneiro (Jan 24 2023 at 20:35):

I prefer not to add new packages though since they add maintenance overhead when we do lean bumps and the like

Mario Carneiro (Jan 24 2023 at 20:35):

that feels like a wider conversation though that we don't need to broach now

Heather Macbeth (Jan 28 2023 at 02:58):

lake exe cache get currently fails for me on freshly-cloned mathlib4 with

uncaught exception: ./build/lib/Mathlib/Data/Nat/Prime.olean: truncated gzip input
tar: Error exit delayed from previous errors.

Does anyone else experience this?

Kevin Buzzard (Jan 28 2023 at 08:49):

The only trick I know right now is to change get to get!. Is that any use?

Arthur Paulino (Jan 28 2023 at 10:18):

Yes. @Heather Macbeth you might have ended up with some corrupted gz file(s). get doesn't overwrite local files but get! will try to download everything you need from scratch, overwriting existing (maybe corrupted) files

This is something that will be improved once we have our own reliable HTTP request package in Lean 4. Right now we are using curl and it's hard to know what it does from outside with enough details without falling into further version restrictions

Arthur Paulino (Jan 28 2023 at 11:50):

Can we rename this thread to lake exe cache get to avoid eventual confusions?


Last updated: Dec 20 2023 at 11:08 UTC