Zulip Chat Archive

Stream: general

Topic: Reservoir search


Martin Dvořák (May 22 2024 at 14:29):

We need a program (ideally a tactic) that will search the entire Reservoir for lemmas that

  • can be stated using only (at most) Mathlib;
  • are proved in any project that uses a reasonably new version of Lean.

Towards the future, we cannot expect that all potentially useful lemmas will find their way to Mathlib.

Henrik Böving (May 22 2024 at 15:02):

That's pretty much just loogle with a sufficiently large database.

Martin Dvořák (May 22 2024 at 15:10):

I often have trouble specifying the loogle query precisely enough.

Martin Dvořák (May 22 2024 at 15:11):

Something at least as powerful as exact? will be highly desirable.

Henrik Böving (May 22 2024 at 15:17):

Having an exact like tactic that runs locally on your computer would require you to download all of reservoir and compile it (or once we have a global build cache download all of the cache) and then set up a local search data structure for it and run queries against that. This is not feasible. If you look at other tools with larger eco systems such as Isabelle you will find that despite having super powerful tactics like sledgehammer they do (for the same reason) only work on imported theories. The way to go for them was also a web search engine, namely Serapis: https://behemoth.cl.cam.ac.uk/search/.

Henrik Böving (May 22 2024 at 15:25):

Note that even this is a very hard undertaking: If you are looking for lemmas across the entire eco system you always have to ask yourself which version of each package are you interested in. Sure we could always have the latest version indexed and do a search like that but it is not entirely clear that the latest version will always work for everyone. One approach to solve this is to forcibly integrate the entire eco system like Haskell does it with stack. They basically freeze the eco system every $TIME_INTERVAL and provide a "release of the entire eco system that works well together" if you wish. This then allows them to provide Hoogle (the similarity to Loogle is not a coincidence) instances that are capeable of searching the entire eco system at a certain point in time.

However such an approach might not be very popular with high velocity projects such as Mathlib as you get lots and lots of lemmas in each day so having an index from 2 weeks ago might already not be enough.

Martin Dvořák (May 22 2024 at 15:27):

How big is currently the total size of all .oleans in Reservoire?

Henrik Böving (May 22 2024 at 15:27):

Reservoir doesn't have a build cache yet.

Martin Dvořák (May 22 2024 at 15:28):

All right. And do you any rough estimate? Are we talking about terabytes?

Henrik Böving (May 22 2024 at 15:29):

It entirely depends on how big Lean is going to become. If you check the statistics from Rust: https://lib.rs/stats they have tens of thousands of packages right now. If we grow that big sure terrabytes are imaginable. But this type of growth is still far away.

Martin Dvořák (May 22 2024 at 15:30):

I am happy to buy a new HDD and store one or two TB of .oleans on my computer.

Joachim Breitner (May 22 2024 at 15:47):

Why do you need something like exact?? Based on your first description it seems you want to search for existing theorems, and filter them based on where their definitions come from. So no proof search needed.

My approach would be to build each package of interest (i.e. one that imports mathlib, it seems), import it, traverse the environment, look at each theorem, collect the constants mentioned therein, and check if they are all from mathlib or below.

Martin Dvořák (May 22 2024 at 16:10):

Yeah, I don't really know what is feasible, but something like what @Joachim Breitner mentioned will be needed so that we can reasonable reuse each other's work.

Martin Dvořák (May 23 2024 at 09:10):

Is it on the roadmap?

Kim Morrison (May 23 2024 at 09:20):

No. This seems like the sort of thing that can be worked on externally; no FRO superpowers required.

Martin Dvořák (May 23 2024 at 10:25):

How can I incentivize the creation? Would RFC in Mathlib make sense?

Kim Morrison (May 23 2024 at 10:31):

Write up a more detailed spec than Joachim's suggestion? Try writing a prototype?

Utensil Song (May 23 2024 at 10:51):

It would be nice if loogle can be adapted to work on any project, export the index, remove the Mathlib part to reduce the size, then it could be another build target that Reservoir builds, and these indexes are gathered on loogle server.

Then a loogle search can be performed on such a combined index, could be disabled by default to save resources, but I doubt that it would really consume much more than mathlib.

Utensil Song (May 23 2024 at 10:56):

Actually loogle CLI already supports such functionality, then the only missing part is to create a dummy project to import all packages of interest (that builds on the latest toolchain on Reservoir), then have loogle export its index.

Joachim Breitner (May 23 2024 at 10:56):

I'd say a prototype by anyone who needs this feature (what's the use case even?) would be a first step. It seems rather specialized to me, but maybe I'm not seeing the full picture yet.

Joachim Breitner (May 23 2024 at 10:57):

I mean the originally discussed feature

Joachim Breitner (May 23 2024 at 10:58):

Integrating Loogle into reservoir, either per package or even per toolchain indexing all packages would be great and I think something we clearly want eventually.

Joachim Breitner (May 23 2024 at 11:00):

But here too: this can be built by anyone, separately of reservoir, initially. And more likely sooner than waiting for a fully integrated solution

Martin Dvořák (May 23 2024 at 11:02):

I often have troubles to specify loogle queries precisely enough.
Towards the future, I'd need something at least as strong as exact? is on Mathlib.

Martin Dvořák (May 23 2024 at 11:08):

Joachim Breitner said:

It seems rather specialized

I disagree with the quoted part.

I think it is very natural, when I have a goal whose statement is meaningful to everybody, to ask whether anybody has a proof of it.

I think something like what I requested would be the most fundamental tool to enable collaboration on a scale larger than one project.

Martin Dvořák (May 23 2024 at 11:27):

Note that such an incentive is very well aligned with most of our philosophies.

One of the things I often hear is that almost every lemma should be public. At this point, we have Mathlib, which is full of public lemmas, but lemmas in all other projects are reüsable almost as little as if they were all private.

Another important initiative that (indirectly) brought many of us to Lean is Thomas Hales's Formal Abstracts.
https://formalabstracts.github.io/
The underlying assumption is that, once something is formalized (even if it was only the statement, not the proof), it is machine-searchable and accessible for everybody who can express the statement in given formal language. Well, in the first place, we should be able to formally search for theorems that have been fully formalized. That's why we need a search that covers more than just projects we have already imported. We cannot wait until every useful lemma gets into Mathlib.

The only philosophy that my proposal is misaligned with, we want Mathlib to be a large monorepo. This reduces the need to have a search covering more than one project.

Joachim Breitner (May 23 2024 at 12:02):

Martin Dvořák said:

I think it is very natural, when I have a goal whose statement is meaningful to everybody, to ask whether anybody has a proof of it.

I originally misunderstood your specification, it sounded to me like you want to list theorems with the property that they could be expressed in mathlib, but are proved in a separate library. A general search for a given goal is different, generally useful and indeed similar to exact?. Sorry for the confusion.

Martin Dvořák (May 28 2024 at 10:48):

Is anybody going to work on it soon? Unfortunately, I don't have skills to implement such a search; and, in the upcoming 19 months, I will not have time to acquire the necessary skills.

What I can offer is to start a library Mathlib++ that will not contain any public definitions but will accept almost any lemmas that were not accepted into Mathlib. Mathlib++ would not be supposed to be human-searchable; it would be just something everybody can import in place of Mathlib in order to make exact? more powerful (albeit slower). My task would be to regularly bumb Mathlib++ to the newest Mathlib so that it can be reasonably used by Mathlib users. Like an unofficial extension. What do you think?

Martin Dvořák (May 28 2024 at 11:19):

Upon a bit more thinking I believe that Mathlib++ would be more useful anyways because new Lean version are currently too frequent to gather sufficient body of lemmas on the same Lean version when they are not in one library.

Utensil Song (May 28 2024 at 11:23):

Maybe a require and import only Mathlib++ is more maintainable.

Martin Dvořák (May 28 2024 at 11:27):

I don't know what you mean by "require" but it would be used in place of importing Mathlib because it would give you Mathlib plus some extra lemmas.

Utensil Song (May 28 2024 at 11:44):

I mean the way to accept no specific lemmas, but only require other repos in the lakefile and import them into the MathlibPlusPlus package then people can import MathlibPlusPlus then run exact? etc.

Martin Dvořák (May 28 2024 at 11:56):

Well, I don't want to do it because

  • you would have to deal with different versions of Lean and Mathlib;
  • you would get too many things into context (many theorems about non-Mathlib definitions);
  • you would have to be worried about name collisions;
  • I want to keep at least a tiny bit of quality control (e.g. 89455616 + 4123186 < 9465152145644156321 should not be a lemma).

Utensil Song (May 28 2024 at 12:01):

I think these lemmas should go into mathlib or other topic-centric repos.

And what I suggested was something that could fulfill your original search idea, with certain control over the eligible criteria. Just my random thoughts after seeing this topic.

Martin Dvořák (May 28 2024 at 12:05):

Note that there are many lemmas that would be useful for Mathlib users but not for Mathlib developers.
I guess I should start a new topic about Mathlib++.
I am starting to thing that such an "extras" library could be developed almost independently of reservoir search.
The important thing is that we need at least one of them to happen asap, otherwise we keep reïnventing lemmas others have proved.

Utensil Song (May 28 2024 at 12:12):

That sound like a MathlibSnippets or MathlibRecipes to me.

Martin Dvořák (May 28 2024 at 12:15):

Are you suggesting a name for the new library?

Martin Dvořák (May 28 2024 at 12:15):

My first idea was MathlibExtras but it already means something else.

Kim Morrison (May 28 2024 at 12:19):

(MathlibExtras is no longer used. Nevertheless, I'm skeptical this is a good idea.)

Martin Dvořák (May 28 2024 at 12:21):

Alright, I'll create a new topic for it.

Utensil Song (May 28 2024 at 12:24):

Martin Dvořák said:

Are you suggesting a name for the new library?

No, I was trying to understand its scope by trying to name it. And I'm also skeptical because of the old MathlibExtras.

Martin Dvořák (May 28 2024 at 12:43):

Martin Dvořák said:

Alright, I'll create a new topic for it.

https://leanprover.zulipchat.com/#narrow/stream/113488-general/topic/Mathlib.2B.2B

Martin Dvořák (Jan 20 2025 at 09:58):

Any progress?
Is there a way to search for theorems outside of Mathlib?
Is there a way to search for theorems in Mathlib outside of the master branch?
@Joachim Breitner

Joachim Breitner (Jan 20 2025 at 09:59):

With loogle? Only if you run loogle locally, so far.

Martin Dvořák (Jan 20 2025 at 10:00):

Is there a way to run it locally but consider the entire reservoir?

Joachim Breitner (Jan 20 2025 at 10:01):

If you create a dummy package that imports all the packages you care about, yes.

Martin Dvořák (Jan 20 2025 at 10:01):

I tried to do it last year, and as others had warned me, it was horrible.

Joachim Breitner (Jan 20 2025 at 10:02):

Building all of reservoir locally, or using loogle on it?

Martin Dvořák (Jan 20 2025 at 10:02):

Importing multiple projects with Mathlib dependencies.

Joachim Breitner (Jan 20 2025 at 10:03):

Ah, that’s likely :-) But unavoidable with loogle’s current design, I fear.

Eric Wieser (Jan 20 2025 at 10:20):

What made it horrible?

Eric Wieser (Jan 20 2025 at 10:20):

Doesn't checking out the v4.15.0 tag of every project mostly just work?

Martin Dvořák (Jan 20 2025 at 10:40):

Eric Wieser said:

What made it horrible?

Lack of my technical skills, probably.

Martin Dvořák (Jan 20 2025 at 10:41):

Eric Wieser said:

Doesn't checking out the v4.15.0 tag of every project mostly just work?

Is it something that projects in Lean 4 generally do? Or something that reservoir annotates on top of them?

Kim Morrison (Jan 21 2025 at 02:29):

No, this is up to individual projects for now.

Srivatsa Srinivas (Jan 21 2025 at 04:50):

Joachim Breitner said:

Integrating Loogle into reservoir, either per package or even per toolchain indexing all packages would be great and I think something we clearly want eventually.

Can't we just build the Decl2Cache over all of Reservoir? Or is that extremely inefficient

Srivatsa Srinivas (Jan 21 2025 at 05:05):

I guess the find tactic works by using the NameRel map in order to find terms that contain a particular name in the query, for example "_ \leq 0" will try to find all the terms that Loogle has processed that contain both \leq and 0. After this it uses the names of those terms to get their Expr values from the environment. It can then filter those terms which satisfy the particular pattern. Therefore Loogle only stores two data structures, NameRel which maps the names of subterms to those of the terms containing it, and SuffixTrie to search for substrings of names

Srivatsa Srinivas (Jan 21 2025 at 07:25):

(deleted)

Joachim Breitner (Jan 21 2025 at 08:51):

But note that lean supports pattern as queries, and they are elaborated normally, so you need a single environment in which to elaborate them.

I’m not saying that it’s impossible to somehow juggle many incompatible environments, but I shy away from the complexity.

Srivatsa Srinivas (Jan 21 2025 at 17:27):

Thanks! My next project is going to have to deal with this some how.

Thanks for Loogle and LeanEgg as good starting points on how to deal with exprs

Martin Dvořák (Apr 09 2025 at 17:31):

What is the best way to search for theorems in projects downstream of Mathlib?

Henrik Böving (Apr 09 2025 at 17:40):

The answer hasn't changed since you asked the last time.


Last updated: May 02 2025 at 03:31 UTC