Zulip Chat Archive

Stream: general

Topic: cache clean


Michael Rothgang (Jan 08 2025 at 13:22):

Arthur Paulino said:

About mathlib cache, I will suggest something I suggested a while ago, hoping someone might pick up the task.

cache could run a counting routine to warn the user whenever more than X% of the cached files are orphan. This routine could be triggered after every call to the get command. This warn could

  • Just suggest calling cache clean (which only gets rid of orphan files) or
  • Prompt the user asking whether cache clean should run, as an y/n question

Sorry for the tangent!

Your plan worked, I had some time and was bored enough: #20567.
(It doesn't prompt the user, as I don't know how to do so in Lean, and didn't feel like diving into that rabbit hole.)

Sébastien Gouëzel (Jan 08 2025 at 13:31):

I've never used cache clean (sometimes, I just wipe by hand the content of the cache directory), so I've just tried it after your message. I've done lake exe cache clean maybe 5 minutes ago, and since then there is absolutely no output, I don't know if something is happening or if the command has crashed. If you're still bored, do you think adding a progress report or something to cache clean would be feasible?

Julian Berman (Jan 08 2025 at 13:33):

5 minutes to wait on a command that removes files would probably worry me (enough to kill it and make sure my operating system still exists)?

Sébastien Gouëzel (Jan 08 2025 at 13:34):

Yes, it worried me, so I went to the cache directory to check what is going on. The number of files is slowly decreasing.

Sébastien Gouëzel (Jan 08 2025 at 13:35):

And now it's done. Probably some bad interaction with the integrated antivirus (which I've disabled on mathlib, but I should probably also disable on the cache directory).

Notification Bot (Jan 08 2025 at 14:19):

5 messages were moved here from #general > new (mac) laptop for Lean by Arthur Paulino.

Arthur Paulino (Jan 08 2025 at 14:29):

I took the liberty to factor out the discussion around cache clean.

First, I'm glad the feature worked!

Now, if it's a common practice for people to keep multiple mathlib folders, that's an use case I didn't envision when I first thought of how clean should work. So let's talk about that.

Note that clean will remove files that are orphan in the current mathlib directory but won't care about whether those files are orphan on different mathlib directories or not. For the same reason, the counting heuristic proposed in #20567 will overestimate the number of orphan files.

In order for the multiple-mathlib-folders use case to work properly, there's no way around keeping a centralized file in the .cache directory pointing back to the mathlib folders that are using it. Then knowing whether a cache file is orphan or not has to be done w.r.t. every mathlib folder. That will fix the cache clean behavior as well as the orphan count.

A word of caution: beware of cases where different mathlib folders have different versions of cache such that the hashing scheme changes among them. I know the hashing scheme has been stable for months, but it's still worth being aware of this invariant


Last updated: May 02 2025 at 03:31 UTC