Zulip Chat Archive

Stream: lean4

Topic: Exclude .lake directory from backups with CACHEDIR.TAG


Henrik Lievonen (Aug 06 2024 at 13:26):

The .lake/ directory can grow quite large, and I think it should be marked for exclusion from backups as it can be recreated from the rest of the files. Adding a CACHEDIR.TAG file to .lake/ directory would accomplish this as many backup tools support it, including restic, Borg and GNU tar. And while at it, (some parts of) ~/.elan/ directory should also be marked as cache.

Other programming languages are already doing this, like Rust that is excluding both target/ and ~/.cargo/{git,registry} directories.

Kim Morrison (Aug 10 2024 at 01:25):

Sounds like a good idea to me, and if we can just follow exactly what rust does it's probably a no brainer.

Would you mind opening this as an issue on the lean4 repository? If it is sufficiently straightforward @Mac Malone might also be happy to see a PR implementing this, but I'd defer any decision making to him.

Mac Malone (Aug 10 2024 at 01:28):

I would be happy with a PR creating a .lake/CACHDIR.TAG on lake init/lake new. Also, feel free to make an issue for this feature on the Lean 4 repository, so that we can keep tract of the request.

Henrik Lievonen (Aug 12 2024 at 08:25):

Opened an issue.

Henrik Lievonen (Aug 12 2024 at 08:27):

Mac Malone said:

I would be happy with a PR creating a .lake/CACHDIR.TAG on lake init/lake new.

I think creating .lake/CACHEDIR.TAG only on lake init and lake new is too conservative. Say you clone Mathlib and run lake build or lake exe cache get. In this case, .lake/CACHEDIR.TAG should also be created.

Henrik Lievonen (Aug 12 2024 at 08:29):

The specification says the following:

The application should also regenerate the cache directory tag if it disappears: e.g., if the entire contents of the cache directory is deleted without the directory itself being deleted.

I don't think we should be this strict with the re-creation of the tag. Maybe a suitable middle ground would be to create .lake/CACHEDIR.TAG whenever .lake/ directory gets created, whether it is by lake new or lake build.

Kim Morrison (Aug 12 2024 at 10:54):

Given we are in a git-only ecosystem, there is some advantage to only creating it at new and init: if the maintainer of the project really doesn't want it, they can make it go away simply by not committing it.

Kim Morrison (Aug 12 2024 at 10:55):

I think I prefer the simplicity of just having this be part of the new project templating system, rather than new logic attached to the "create .lake" step.

Edward van de Meent (Aug 12 2024 at 11:05):

this sounds like more of a "what should be the default" than a "should this exist" argument... an option allowing the user to opt-in or -out of "always creating the tag" seems reasonable to me...

Edward van de Meent (Aug 12 2024 at 11:06):

and then the logic would be "if an author really doesn't want it, they can disable the feature adding it"

Kim Morrison (Aug 12 2024 at 11:06):

I agree, but also -- having an option to disable is extra complexity for a marginally valuable feature.

Joachim Breitner (Aug 12 2024 at 11:11):

Henrik Lievonen said:

Mac Malone said:

I would be happy with a PR creating a .lake/CACHDIR.TAG on lake init/lake new.

I think creating .lake/CACHEDIR.TAG only on lake init and lake new is too conservative. Say you clone Mathlib and run lake build or lake exe cache get. In this case, .lake/CACHEDIR.TAG should also be created.

I agree with this. Wiping .lake is a very common rescue action, so it’s valuable that .lake as a whole is never committed to git.

Henrik Lievonen (Aug 12 2024 at 11:32):

Considering that the default template has /.lake in .gitignore, none of .lake/ should be included in the repository. This includes CACHEDIR.TAG. It is there to only mark that this directory is something that can be automatically recreated if necessary, and hence deleting it e.g. by ignoring it from backups should not have a catastrophic effect.

Henrik Lievonen (Aug 12 2024 at 11:33):

Also, if ignoring whole .lake/ directory seems too much, adding CACHEDIR.TAG to all .lake/build/ directories would already take us a long way. Those should definitely never include anything that cannot be recreated.

Kim Morrison (Aug 12 2024 at 11:55):

Can't we just have .lake/CACHEDIR.TAG in git, and everything else ignored?

Henrik Lievonen (Aug 12 2024 at 11:58):

I guess, by adding !/.lake/CACHEDIR.TAG to .gitignore, but I don't see what it would accomplish. The file is there just to indicate that the contents of the directory are computer-generated.

Kim Morrison (Aug 12 2024 at 12:00):

Given that people routinely rm -rf .lake it does sound like we would have to regenerate this file on every lake build. To be honest it sounds like not worth the bother, but I'll defer to Mac.

Jon Eugster (Aug 12 2024 at 12:28):

I also agree with Joachim that rm -rf .lake/ is a very common thing to do when debugging if something is a problem with your own setup or a bug in lake.

I wonder if it's rather lake clean (or lake update, dont know the exact details about which parts of them are overlapping) that could regenerate such a file. I thought lake build calls one of them if .lake (or the manifest) is missing and that's probably the right time to create this file, too.

Henrik Lievonen (Aug 12 2024 at 13:21):

I would take rm -rf .lake/ as one more reason to not have CACHEDIR.TAG in git. Instead, it should be recreated whenever .lake/ directory gets created by lake.

Henrik Lievonen (Aug 15 2024 at 15:40):

Managed to find a couple of hours to implement this: lean4#5059


Last updated: May 02 2025 at 03:31 UTC