Zulip Chat Archive

Stream: lean4

Topic: lake -> global shared cache (like pnpm does)


Serhii Khoma (srghma) (Jun 15 2025 at 08:00):

Problem:

every time I need mathlib in some project I cp ~/projects/lean-some-old-project-that-already-has-mathlib/.lake/packages/mathlib ~/projects/lean-my-new-project/.lake/packages/mathlib

this creates many copies of the same mathlib

Solution:

Could lake build ln -s $XDG_CACHE_HOME/lake/mathlib ~/projects/lean-my-new-project/.lake/packages/mathlib just like pnpm makes links for node_modules?

Kim Morrison (Jun 16 2025 at 03:52):

It's quite complicated: every project might be using a different version of Mathlib. Who is responsible for cleaning up old ones? Deciding that they are old? etc.

Shreyas Srinivas (Jun 16 2025 at 22:40):

Garbage collection is not new in package management. Nix has one. As long as the standard tools (lake and elan) are used to manage toolchains and dependencies, they can add enough metadata (essentially reference counting) to handle garbage collection

Shreyas Srinivas (Jun 16 2025 at 22:42):

One can even add timestamps of last access from each project and ignore references which are too old during garbage collection. This would result in more aggressive garbage collection. Later, caches for old toolchains can be redownloaded if the corresponding projects are touched via lake again.

Kim Morrison (Jun 16 2025 at 23:10):

Looking forward to your PRs, @Shreyas Srinivas. :-)

Shreyas Srinivas (Jun 16 2025 at 23:16):

A PR for this feature makes much more sense after the lake caching feature has fully landed.

Shreyas Srinivas (Jun 16 2025 at 23:16):

Currently we only have mathlib caches.

Shreyas Srinivas (Jun 16 2025 at 23:17):

Although on second thoughts, a mathlib cache PR might be a good place to start testing ideas.

Shreyas Srinivas (Jun 17 2025 at 00:16):

My proposal involves changes to elan and lake. It is best that I know whether such PRs are even welcome and in line with the FROs plans before I start writing anything. And I am exceptionally skeptical that this would be met with anything other than laughter or serious derision given the scale of proposed changes. So maybe this is a script for a standup comedy. The idea is essentially an over-simplification of opam switches. At a bird's eye view level (not a crow, more like an eagle, altitude-wise):

  1. Equip elan with the ability to clone and store packages per toolchain and manage its lifecycle. This includes downloading caches, building binaries, maintaining metadata, garbage collection etc through the corresponding local lake installation. This is non-trivial work that might take a few months. But the basic idea is that elan will store all packages for a given toolchain.
  2. Add a package metadata file (or sqlite database) with the aforementioned metadata for each package that is downloaded, per toolchain.
  3. Every time lake requests a dependency, for a project on toolchain Y, it passes on the request to elan which looks into the folder and metadata for toolchain Y. In turn elan returns a symlink for lake to use for the package and performs updates to the metadata. This is another major departure from current practice and will also require months to get right.

The more bandaid like approach (essentially where everything is held together by coat hangers) is to do something mathlib cache specific, and modify cache to use a centralised storage, but this makes no sense if caching is arriving for all projects in a shape and form as yet unknown to the wider public.

Sebastian Ullrich (Jun 17 2025 at 01:22):

We are already onto that topic, stay tuned


Last updated: Dec 20 2025 at 21:32 UTC