Zulip Chat Archive

Stream: lean4

Topic: deleting unused cache


Jon Eugster (Apr 10 2024 at 14:48):

I need a routine to delete "unused" files to reduce the size of the .lake folder. As my old routine is failing, I would like to ask if anybody knows how to achieve that in a more robust way.

The use case is the following: A project (NNG) imports Mathlib and downloads the entire cache (multiple GB worth of .oleans), but it only imports Mathlib.Tactic.Common. I know that the user can't/doesn't modify the imports, so I would like to get rid of anything that hasn't been used (in the sense that it is not built by running lake build inside the project, i.e. not imported by the project's main file).

What's the best way to determine which oleans I should delete? My ideas so far were

  • use unix timestamps to see which files were touched during lake build. Obviously that's buggy and broken now.
  • write a tool copying most of import-graph's logic to get a list of all used files, then delete the rest.

Is there a better approach? Maybe there are parts of the lake API that could be used?

Kim Morrison (Apr 15 2024 at 04:04):

Can you just have it run lake exe cache get Mathlib.Tactic.Common?

Jon Eugster (Apr 15 2024 at 16:23):

In theory, that's a good shout! I see two issues:

1) the post-update-hook calls the entire lake exe cache get. I assume I can work around that disabling Mathlib's post-update hook.

2) The typical usage scenario is the following: Kevin edits the NNG and imports a bit more of Mathlib; CI then builds everything and sends the built stuff to the server. Ideally Kevin does not need to manually update the CI build script to specify which parts of the cache it needs to download.


Last updated: May 02 2025 at 03:31 UTC