Zulip Chat Archive
Stream: batteries
Topic: Move #find command to Std?
François G. Dorais (Feb 01 2024 at 07:52):
There are plenty of library search tools moving to Std. This one is older and not as sophisticated as newer search tools but it's a freebie at this time: it doesn't depend on anything properly Mathlib so moving it is basically just moving one file from Mathlib to Std.
Joachim Breitner (Feb 01 2024 at 12:08):
For good UX, it depends on having a pre-computed index, and the whole concept of these index file works well within mathlib, but doesn't easily clearly scale to a multi-package world. The tactics that are being moved to omega either have mathlib-specific hacks (not great), or could be sped up to not need a precomputed cache (like exact?
). I think this needs a solution first.
(Related thread: https://leanprover.zulipchat.com/#narrow/stream/113488-general/topic/Loogle.20is.20live!/near/417982759)
Mario Carneiro (Feb 01 2024 at 12:10):
Why does precomputed cache = mathlib-specific hack?
Mario Carneiro (Feb 01 2024 at 12:10):
just make the API to add a precomputed cache generic
Joachim Breitner (Feb 01 2024 at 12:12):
The std4 code no longer mentions mathlib in the file path, but at least in the coments:
/--
Once we reach Mathlib, and have `cache` available,
we may still want to load a precomputed cache for `exact?` from a `.olean` file.
This makes no sense here in Std, where there is no caching mechanism.
-/
def cachePath : IO FilePath := do
let sp ← searchPathRef.get
if let buildPath :: _ := sp then
let path := buildPath / "LibrarySearch.extra"
if ← path.pathExists then
return path
return ".lake" / "build" / "lib" / "LibrarySearch.extra"
Mario Carneiro (Feb 01 2024 at 12:13):
I think that path should not be hardcoded
Joachim Breitner (Feb 01 2024 at 12:13):
Even then:
Loading such a cache file only really makes sense when it was created for a root module (like Mathlib
) that imports everything you care about. This exists in mathlib, but as soon as I am working in my own little project, that imports packages A, B and C, then which index could exist, and built by whom?
Mario Carneiro (Feb 01 2024 at 12:13):
built by you
Mario Carneiro (Feb 01 2024 at 12:14):
obviously we also need an interface to make it easy for people to build their own cache
Joachim Breitner (Feb 01 2024 at 12:14):
So no cloud-built indexes (for the packages where you have some kind of cloud cache for oleans) at all?
Mario Carneiro (Feb 01 2024 at 12:15):
if there is also a method to use an existing cache (which mathlib uses), then you will benefit from that
Joachim Breitner (Feb 01 2024 at 12:15):
It seems to not degrade wrt the mathlib experience, but scale to multiple packages, we’d need something like “for every package known to lake, look for an index, load that, efficiently merge them, and then amend the index for everything not covered”
Mario Carneiro (Feb 01 2024 at 12:16):
I'm not going to suggest that because it sounds like work but if you want to do it go ahead
Mario Carneiro (Feb 01 2024 at 12:17):
mathlib + more and built from scratch are the main use cases we need to worry about for now
François G. Dorais (Feb 01 2024 at 12:17):
FWIW: I would prefer moving and _then_ improving. I think that is a good policy for things that are in the wrong place. Of course, whether this is in the wrong place is debatable.
Mario Carneiro (Feb 01 2024 at 12:18):
and as far as a MVP, I think "cache file with path provided downstream" and "built from scratch" suffice
Joachim Breitner (Feb 01 2024 at 12:18):
Even “mathlib + more” doesn’t work well with the current approach: It finds the mathlib index, loads it and now you won’t find anything from the “more” set
Mario Carneiro (Feb 01 2024 at 12:18):
which is to say, no "more"
Joachim Breitner (Feb 01 2024 at 12:19):
Fair enough… do we have an idea for a mechanism to find a suitable cache file?
Mario Carneiro (Feb 01 2024 at 12:19):
but I think it's not too hard to implement "more", you go over the environment and for everything not in mathlib you add it to the discrimination tree
Mario Carneiro (Feb 01 2024 at 12:19):
initialize : Option FilePath <- pure none
?
Joachim Breitner (Feb 01 2024 at 12:19):
So users manually set the file path?
Joachim Breitner (Feb 01 2024 at 12:20):
Or does every package set the filepath, and the last one wins? (not too bad actually)
Joachim Breitner (Feb 01 2024 at 12:22):
Since we already use the first component of the module name to more or less identify the package, another approach could be to look through
{ P.extra | P.M.M in imported module names}
and then a little bit of logic to pick the “best” one if multiple ones are found…
François G. Dorais (Feb 01 2024 at 12:22):
That's how $PATH works in shell, except that you have a choice to modify the path if you need to.
Mario Carneiro (Feb 01 2024 at 12:22):
for this MVP, there is at most one file path, i.e. either mathlib's extra file or nothing
Mario Carneiro (Feb 01 2024 at 12:23):
so there would be exactly one instance of setting it, in an early mathlib file
Joachim Breitner (Feb 01 2024 at 12:23):
That’s mathlib specific again, isn’t it? I thought we are trying to find a MVP that isn't mathlib-specific?
Joachim Breitner (Feb 01 2024 at 12:23):
Ah, you are looking for a mechainsm that isn’t inherently mathlib-specific, but would (initially) only used by mathlib?
Mario Carneiro (Feb 01 2024 at 12:23):
this isn't mathlib specific in the sense that std doesn't have to directly refer to a file in mathlib's file hierarchy
Mario Carneiro (Feb 01 2024 at 12:24):
and it means someone could have a not-mathlib, although I doubt that really happens in practice
Mario Carneiro (Feb 01 2024 at 12:25):
this is following on François G. Dorais 's point that we should get a basic version out without regressions and work to something more general later
Joachim Breitner (Feb 01 2024 at 12:25):
As long as we doubt that there is little point moving those tactics that don’t have good UX (i.e. are too slow) without an index, though, aren’t they?
Joachim Breitner (Feb 01 2024 at 12:25):
I guess that depends on whether nothing or something that’s slow is worse :-)
Mario Carneiro (Feb 01 2024 at 12:25):
Even without an index it's okay, users can just call #find
once at the top of the file, or init_find_index
or something
Mario Carneiro (Feb 01 2024 at 12:25):
or put that in an upstream file
Mario Carneiro (Feb 01 2024 at 12:26):
that's the "from scratch" workflow
Joachim Breitner (Feb 01 2024 at 12:26):
Yeah, or keep a separate tab open where you just use #find
without having to change imports too often. Fair enough.
Mario Carneiro (Feb 01 2024 at 12:27):
you can just import everything when preparing the index
Mario Carneiro (Feb 01 2024 at 12:27):
in fact, ideally #find
would be able to index files that aren't even imported
Mario Carneiro (Feb 01 2024 at 12:27):
but maybe that's too sci-fi
Joachim Breitner (Feb 01 2024 at 12:28):
BTW, are we taking about moving the #find
from mathlib, or importing the #find
provided by the Loogle library? (I originally wanted to replace #find
in mathlib, but then never got all relevant PRs merged, and now it’s in a library.)
Mario Carneiro (Feb 01 2024 at 12:28):
uh... are they not the same?
Joachim Breitner (Feb 01 2024 at 12:29):
No, https://github.com/leanprover-community/mathlib4/pull/6363 got never merged
Joachim Breitner (Feb 01 2024 at 12:30):
And, in a way, any project can already import loogle as a library dependency and have (that) #find
without needing mathlib; it’s not std, but still closer to that goal. (This isn’t done a lot yet and may have rough edges, but in principle should be possible.)
Mario Carneiro (Feb 01 2024 at 12:31):
looks like you are the one who closed the PR
Joachim Breitner (Feb 01 2024 at 12:33):
Right, I am not blaming anyone here!
Joachim Breitner (Feb 01 2024 at 12:33):
But as we all agree, this isn’t mathlib specific functionality.
Mario Carneiro (Feb 01 2024 at 12:34):
well it will have to be reviewed at some point, and TBH mathlib has a lot more bandwidth for that than std
Mario Carneiro (Feb 01 2024 at 12:35):
does #find
use the same cache as library_search?
Mario Carneiro (Feb 01 2024 at 12:36):
or are there two caches in there
Joachim Breitner (Feb 01 2024 at 12:36):
#loogle
(to disambiguate with the old #find
that Sebastian wrote) uses it’s own cache with different data structures
Mario Carneiro (Feb 01 2024 at 12:36):
is mathlib generating this cache?
Joachim Breitner (Feb 01 2024 at 12:37):
No. (But it could, of course, should it depend on Loogle
)
Mario Carneiro (Feb 01 2024 at 12:37):
so that means that adding #loogle
to mathlib would increase the download size of mathlib significantly?
Mario Carneiro (Feb 01 2024 at 12:38):
how much?
Joachim Breitner (Feb 01 2024 at 12:38):
~230MB uncompressed
(The suffix-trie for the name matching taking most space, the constant-map was I think much smaller)
Mario Carneiro (Feb 01 2024 at 12:43):
hm, I'm sure we can do better but it's not important for the MVP
Mario Carneiro (Feb 01 2024 at 12:43):
I would encourage you to get this into mathlib first
Joachim Breitner (Feb 01 2024 at 12:43):
Yes, I’m not claiming exhaustive reaping of low hanging fruit there
Mario Carneiro (Feb 01 2024 at 12:44):
this will resolve a lot of the logistical issues
Joachim Breitner (Feb 01 2024 at 12:44):
By copying the code, or by require
the library?
Mario Carneiro (Feb 01 2024 at 12:44):
copying the code I think
Mario Carneiro (Feb 01 2024 at 12:45):
require the library is also an option for mathlib but I'm not sure it will address the logistical issues
Mario Carneiro (Feb 01 2024 at 12:45):
because when it moves to std it will be closer to a "copy the code" situation
Mario Carneiro (Feb 01 2024 at 12:46):
how is loogle currently handling the generation of the index?
Mario Carneiro (Feb 01 2024 at 12:46):
doesn't mathlib need to be imported for that?
Joachim Breitner (Feb 01 2024 at 12:47):
Is there someone interested in co-owning that code? I would prefer to keep at least the code that runs the loogle service separate, for easier iteration, so maybe someone who actually wants to use it in mathlib can help maintain that copy?
Mario Carneiro (Feb 01 2024 at 12:47):
you want it to be separate from the version in std as well?
Joachim Breitner (Feb 01 2024 at 12:47):
echo "Replace toolchain by mathlib's"
curl -L https://raw.githubusercontent.com/leanprover-community/mathlib4/master/lean-toolchain -o lean-toolchain
cat lean-toolchain
rm ./lake-manifest.json
echo "Updating lake"
export LOOGLE_SECCOMP=true
lake update
cat ./lake-manifest.json
echo "Fetching mathlib cache"
lake exe cache get
echo "Building loogle"
lake build loogle
echo "Building mathlib cache"
lake exec loogle --write-index ./.lake/build/lib/LoogleMathlibCache.extra
Joachim Breitner (Feb 01 2024 at 12:48):
It can also be built mathlib-style, using #runCmd
, but since I am anyways building a CLI this is faster, as it runs in compiled code.
Mario Carneiro (Feb 01 2024 at 12:49):
I don't see a thing in that script that adds mathlib to the lakefile
Joachim Breitner (Feb 01 2024 at 12:50):
you want it to be separate from the version in std as well?
I did not think much about that yet; I guess I was waiting first for an answer to the question of ergonomic (lake-managed? reservoir-cloud-build caches) precomputed indexes; to me that seems a prerequisite
Mario Carneiro (Feb 01 2024 at 12:51):
I think the #find
in std should be agnostic to cache distribution methods
Joachim Breitner (Feb 01 2024 at 12:51):
It’s already in the lakefile; although I’d probably separate that out into a “loogle-deploy” package so that the loogle lakefile itself doesn’t have to, once people want to import the loogle package
Mario Carneiro (Feb 01 2024 at 12:51):
right, my point being that mathlib can't require
loogle if loogle requires mathlib
Joachim Breitner (Feb 01 2024 at 12:52):
Luckily, Loogle the library doesn't, and I can change that when and if mathlib wants to import Loogle.
Mario Carneiro (Feb 01 2024 at 12:52):
You will still want the loogle library of course, for the web stuff
Mario Carneiro (Feb 01 2024 at 12:53):
but I think the core logic of #find
should move out of the loogle package and into std
Mario Carneiro (Feb 01 2024 at 12:54):
and I want it to go via mathlib because otherwise I don't think it will be reviewed and tested well enough
Mario Carneiro (Feb 01 2024 at 12:54):
mathlib is one of the primary clients of #find
Joachim Breitner (Feb 01 2024 at 12:55):
No objections. Does anyone here want to help with that?
Mario Carneiro (Feb 01 2024 at 12:55):
in the short term the fact that loogle imports mathlib already means that it won't even break or need duplication in the interim
Mario Carneiro (Feb 01 2024 at 12:56):
I think you are probably going to have to be the one taking point on this
Mario Carneiro (Feb 01 2024 at 12:56):
since you are the maintainer of the loogle library
Mario Carneiro (Feb 01 2024 at 12:56):
but well, if someone volunteers then that's great
Mario Carneiro (Feb 01 2024 at 12:59):
it doesn't seem like a major priority, so it's fine by me if this just sits until you or me or someone else has the time to take it on
Joachim Breitner (Feb 01 2024 at 12:59):
Agreed
Joachim Breitner (Feb 01 2024 at 13:01):
I’m just a bit hesitant because I personally find (heh) I don’t need it as #find
, as the web interface scratches all the itches I have, and completely egoistically it’s more fun to work on code that I can just push to. So my intrinsic motivation to get it into mathlib isn’t that high, and I’d prefer to support someone who actually wants to use it.
Eric Rodriguez (Feb 01 2024 at 13:06):
I'm trying to get my metaprogramming up to scratch, and I think I'm one of Loogle's most prolific (ab)users - I'm happy to help out; I assume most of this would be modularising and porting to mathlib? I can fully understand wanting to keep it in a state of being able to edit easily - iteration within Mathlib is not fun.
Joachim Breitner (Feb 01 2024 at 13:16):
yes, this file and dependencies: https://github.com/nomeata/loogle/blob/master/Loogle/Find.lean (the main Loogle
is the CLI). If you want to see that through to become Mathlib.Tactic.Find
, that’d be great! I think it’s ok if they go out of sync temporarily; I can copy back improvements to keep the diff smaller.
Mario Carneiro (Feb 01 2024 at 13:21):
and completely egoistically it’s more fun to work on code that I can just push to
This is probably obvious and goes without saying, but this is unfortunately mutually exclusive with getting it into std
Mario Carneiro (Feb 01 2024 at 13:23):
if it's still changing rapidly or experimental, it should not be moved to std yet, although you can still iterate in mathlib without too much friction as long as you allow for enough lead time
Mario Carneiro (Feb 01 2024 at 13:25):
(the initial commit is the hardest PR, bugfixes to existing tactics are relatively easier to review)
Joachim Breitner (Feb 01 2024 at 13:31):
Even with a #find implementation in std or mathlib it could still be that loogle.lean-fro.org runs on a copy of the code, or a non-merged PR, so I there is probably cake to be had and eaten too.
Mario Carneiro (Feb 01 2024 at 13:43):
sure, if you want to do that it's fine as well
Last updated: May 02 2025 at 03:31 UTC