Zulip Chat Archive

Stream: mathlib4

Topic: problems with `lake exe cache`


Scott Morrison (Jun 16 2023 at 06:19):

We currently have two problems with lake exe cache get in mathlib4:

  • it does not cache anything from ProofWidgets (this is usually not a problem, but anytime you change lean versions or touch the lakefile these files will rebuild, but could be cached)
  • lake exe cache get Archive/Imo/Imo2001Q6.lean fails rather than either retrieving a cache for the IMO file and all its dependencies, or just retrieving a cache for the mathlib4 files it depends on.

!4#5109 fixes these two problems.

Scott Morrison (Jun 16 2023 at 06:20):

@Mario Carneiro asked to move the discussion of that PR here; he had concerns about whether we should be fixing these problems before planning how to allow non-mathlib caches, or upstreaming cache.

Scott Morrison (Jun 16 2023 at 06:21):

My preference is to not to wait on those problems (which we've neither made a plan for, nor done the work for) in order to fix these immediate problems.

That said, I totally agree that we should solve both of those problems!

Mario Carneiro (Jun 16 2023 at 06:21):

The ProofWidgets fix is unproblematic to me

Mario Carneiro (Jun 16 2023 at 06:21):

But I don't think we should be caching the archive files at all

Mario Carneiro (Jun 16 2023 at 06:22):

I'm not sure exactly what you meant by the cache not understanding archive inputs

Mario Carneiro (Jun 16 2023 at 06:23):

but I don't think we should waste azure space, and more importantly users' download time, on cache files for things that are never used as dependencies

Scott Morrison (Jun 16 2023 at 06:23):

Try running lake exe cache get Archive/Imo/Imo2001Q6.lean:

% lake exe cache get Archive/Imo/Imo2001Q6.lean
uncaught exception: No match for Archive/Imo/Imo2001Q6.lean

Mario Carneiro (Jun 16 2023 at 06:24):

that's a weird way to say file not found

Scott Morrison (Jun 16 2023 at 06:24):

No -- the file exists just fine.

Mario Carneiro (Jun 16 2023 at 06:24):

that's a weird way to say cache file not found

Scott Morrison (Jun 16 2023 at 06:25):

It's not even that --- it hasn't gone looking on the azure server by that point.

Mario Carneiro (Jun 16 2023 at 06:25):

yeah I get it

Mario Carneiro (Jun 16 2023 at 06:25):

what do you expect it to say if you cache get the readme file?

Scott Morrison (Jun 16 2023 at 06:25):

Okay, good point. :-)

Scott Morrison (Jun 16 2023 at 06:26):

It just seems surprising to me to not do this. The only reason I made this PR was that I honestly typed lake exe cache get Archive/Imo/Imo2001Q6.lean earlier today, when looking at Yury's IMO PRs!

Mario Carneiro (Jun 16 2023 at 06:27):

We can certainly improve the error message, like I said that's a weird way to say that the file's not there

Mario Carneiro (Jun 16 2023 at 06:27):

but the file isn't there

Mario Carneiro (Jun 16 2023 at 06:27):

and I don't think it should be

Mario Carneiro (Jun 16 2023 at 06:28):

Although, when you typed lake exe cache get Archive/Imo/Imo2001Q6.lean there is actually a reasonable interpretation of that command which isn't just an error, which is to download all the dependencies of that file

Mario Carneiro (Jun 16 2023 at 06:29):

ideally it should be able to do this even for files which are not mathlib files

Eric Wieser (Jun 16 2023 at 06:35):

I think there should either be a flag for this, or a different exit code (or both); for the sake of scripting, I think it's good to know whether exe cache get Foo.lean has obtained a complete or a partial cache

Mario Carneiro (Jun 16 2023 at 06:42):

right now cache get doesn't really have a plumbing version but it certainly could

Mario Carneiro (Jun 16 2023 at 06:42):

e.g. -q quiet mode and just return the result in the exit code

Mario Carneiro (Jun 16 2023 at 06:44):

I don't think there is any need to differentiate between a file like this which is not in the cache because it is not a mathlib file, compared to a file which is not in the cache because it just hasn't been uploaded yet

Mario Carneiro (Jun 16 2023 at 06:44):

at least in terms of the usual progress bar mechanics

Mario Carneiro (Jun 16 2023 at 06:45):

so you would just get 99% success in this case

Kevin Buzzard (Jun 16 2023 at 06:56):

I tell Xena people "first lake exe cache get, then lake build" and if you're just installing lean 4/mathlib for the first time then often the lake build does something

Scott Morrison (Jun 16 2023 at 07:01):

So it seems we have several options for lake exe cache get Archive/Foo.lean

  1. get everything, including Foo.olean
  2. get whatever dependencies of Foo are available in the cache, warning that Foo.olean is not covered by the caching policy
  3. get whatever dependencies of Foo are available in the cache, with the only clue that the olean is not available being a 99% success message
  4. fail with an error message that Foo is not covered by the caching policy
  5. fail with an inscrutable error message

We are currently at 5. My PR results in 1. (which, as Mario points out, comes at the cost of lake exe cache get now retrieving some extra files that users might prefer not to download.)

My personal preference for behaviour is in the order listed above. I'm happy to try to implement 2., 3., or 4. if there is consensus that one of those is preferable to 1. (Or, of course, to let someone else do this.)

Yaël Dillies (Jun 16 2023 at 07:04):

It seems you are missing
1.5 get all the dependencies of Foo.olean, including itself, when running lake exe cache get Archive/Foo.lean, but do not get it when running lake exe cache get

Mario Carneiro (Jun 16 2023 at 07:07):

I don't think there is any reason to download Archive/Foo.olean, not even your initial use case

Mario Carneiro (Jun 16 2023 at 07:08):

if you want to open the file in the editor Foo.olean is not of use to you

Yaël Dillies (Jun 16 2023 at 07:09):

Does that not apply to all leaf files? That would be a great way of reducing the size of the cache.

Mario Carneiro (Jun 16 2023 at 07:09):

well there are technically no leaf files in mathlib except Mathlib.lean

Yaël Dillies (Jun 16 2023 at 07:10):

Uh okay but maybe we don't want cache for Mathlib.lean either

Mario Carneiro (Jun 16 2023 at 07:10):

plus external users want to be able to import Mathlib

Mario Carneiro (Jun 16 2023 at 07:10):

so we do want it

Yaël Dillies (Jun 16 2023 at 07:11):

What about a --no-leaf flag?

Mario Carneiro (Jun 16 2023 at 07:11):

I think it would just be cache get --deps Mathlib/Foo.lean

Mario Carneiro (Jun 16 2023 at 07:12):

This is one of many reasons I would like to hurry up and get this integrated with lake, we are literally reinventing lake here and I don't like it

Scott Morrison (Jun 16 2023 at 07:27):

So we'll stick with 5. until such a time?

Mario Carneiro (Jun 16 2023 at 07:31):

well I would assume there are some easy ways to make the error message slightly less inscrutable

Scott Morrison (Jun 16 2023 at 07:31):

!4#5128 is the part we can do in the meantime, teaching cache about ProofWidgtes.

Scott Morrison (Jun 16 2023 at 07:34):

!4#5129 replaces uncaught exception: No match for Archive/Imo/Imo2001Q6.lean with uncaught exception: Archive/Imo/Imo2001Q6.lean is not covered by the olean cache..

Mario Carneiro (Jun 16 2023 at 07:37):

(you should double check that my println suggestion actually works as intended)

Scott Morrison (Jun 16 2023 at 07:49):

Maybe something useful we could do is plan some other features we would like to add to lake. I feel like with the current situation of "only Mac has ever contributed to lake", it might be good to have some small things we could add to get the ball rolling.

My suggestion would be lake mk-all (perhaps a better name), which regenerates Mathlib.lean, Archive.lean, and so on. (But obviously doing so generically, by looking in the lakefile to decide what needs doing.)

Other ideas?

Mario Carneiro (Jun 16 2023 at 07:51):

maybe mk-all should just be an exe, utilizing a lake plumbing command to make it a one-liner

Mario Carneiro (Jun 16 2023 at 07:51):

I don't see how lake would know what files to generate otherwise

Scott Morrison (Jun 16 2023 at 09:22):

I was imagining:

  • for each lean_lib XYZ where in the lakefile, generate XYZ.lean importing every .lean file under XYZ/

as we currently do in .github/workflows/build.yml.

Mario Carneiro (Jun 16 2023 at 09:34):

But lean_lib XYZ where imports XYZ.lean

Mario Carneiro (Jun 16 2023 at 09:34):

which doesn't exist (presumably) or is out of date

Scott Morrison (Jun 16 2023 at 09:38):

I don't think that's right. e.g. with current mathlib4, try:

  • delete ./Archive.lean
  • verify lake build works fine
  • (and therefore lake mk-all could work too)
  • but lake build Archive fails

Scott Morrison (Jun 16 2023 at 09:38):

the fact that one particular XYZ.lean is missing or out of date doesn't prevent us from running lake

Arthur Paulino (Jun 16 2023 at 11:32):

Mario Carneiro said:

This is one of many reasons I would like to hurry up and get this integrated with lake, we are literally reinventing lake here and I don't like it

Yes, please. lake#153

My initial idea for cache was that it would be a quick and dirty workaround while Lake itself doesn't implement a proper solution. The target audience was ppl doing the porting.

Yaël Dillies (Jun 16 2023 at 11:36):

The linkifier is broken because it links to leanprover-community/lake instead of leanprover/lake.

Arthur Paulino (Jun 16 2023 at 11:37):

https://github.com/leanprover/lake/issues/153


Last updated: Dec 20 2023 at 11:08 UTC