Zulip Chat Archive

Stream: general

Topic: Mathlib++

Martin Dvořák (May 28 2024 at 12:43):

There are many lemmas that would be potentially useful for Mathlib users but not for Mathlib developers.
I think we should create a separate library for them because pushing these lemmas into Mathlib often meets resistance of Mathlib maintainers.
There is a growing frustration among Mathlib contributors whose PRs get rejected.
At the same time, there is a lot of frustration among Mathlib users who think that "a lemma for this basic thing should definitely already exist" but exact? gives them nothing.
I don't think, however, that we should aim to change Mathlib standards.
It is amazing we have Mathlib as a a nearly-all-encompassing monorepo and it is important that Mathlib has high standards for contributions.
The problem is that Mathlib is de-facto the only mathematical library in Lean 4.
There are many lemmas that would be great to have somewhere but Mathlib is not the right place for them and "ordinary" downstream projects are not well searchable in the current ecosystem (see previous discussion for context).
That's why I propose we should create a library Mathlib++ that would import Mathlib and contain some "extras".
Some principles I suggest:

Mathlib++ must not contain any public definitions or abbreviations (private definitions for the sake of proving something are acceptable and encouraged).
Mathlib++ must not contain lemmas that exact? can one-shot after importing Mathlib; if something gets to Mathlib, it must be removed from Mathlib++.
Mathlib++ should contain lemmas that have at least a tiny chance of being used by Mathlib users.
Mathlib++ must be frequently updated to the newest version of Mathlib (so that it can be imported in place of Mathlib without losing access to new Mathlib features).
Mathlib++ lemmas don't have to be human-searchable, they don't have to be documented, and it is not strictly necessary that files are well organized; the important thing is that exact? can find them in some meaningful use cases.

What do you think?

Martin Dvořák (May 28 2024 at 12:44):

/poll If we happen to create such a library, what should its name be?
Mathlib++
MathlibExtras
Mathlibits

Yaël Dillies (May 28 2024 at 12:54):

Somewhat tangentially, here is a message I sent to the reviewers stream.
Yaël Dillies said:

My views on the utility of a mathematical library are as such:

It should provide a single coherent framework for mathematics

It should list most of the reasonably mature developments that have been made in Lean

Mathlib achieves the first but not the second. Isabelle AFP achieves the second but not the first.

Hence I was wondering if we could acquire a Lean AFP repo whose entry requirements would be much lower than Mathlib. Similarly to the Isabelle AFP, it would be made of many Lean libraries, which would each have their own requirements (in particular they could use different Lean/Mathlib versions) and could import each other. It would act as a staging place for developments that are not yet mature enough for Mathlib, but mature enough that they shouldn't lie forgotten in someone's random repo

LeanCamCombi de facto acts as a "LeanAFP for combinatorics". I've been collecting there a bunch of combinatorics formalisations that didn't make it to mathlib3 and PRing them to Mathlib from there.

PS: I know that the "they could use different Lean/Mathlib versions" part would be a challenge for Lake, so having a single version for all libraries in the AFP would be an acceptable compromise at the start

Yaël Dillies (May 28 2024 at 12:55):

It was however remarked that this was not much more than "Reservoir, but in one repo"

Chris Birkbeck (May 28 2024 at 13:02):

I had wondered about something like this, but I was just thinking of a mathlib+++ where anything that passes CI is ok (and imports mathlib like the above). But I have no idea how not to spend all one's time just fixing everything that breaks each time we updated mathlib. I agree that having things in repos that don't make it into mathlib seems like a waste, but I don't know a solution to this, other than PRing often.

On the other hand, if this became too popular maybe people wouldn't bother PRing to mathlib which would be bad.

Michael Rothgang (May 28 2024 at 13:06):

The other day, talking to @Yves Jäckle gave me a similar idea: I've informally called it "not-yet-mathlib"; perhaps "mathlib-staging" or "archive of formal Lean proofs" is an even nicer name.
Not everybody who formalises something useful has the time (or energy, or perhaps even expertise) to bring this all the way into mathlib. Often, polishing something for mathlib may mean generalising it a lot (to the point you're re-proving things), lots of code improvements and refactoring --- and waiting for the sometimes glacial review process to complete.

That said: there's a lot of value in moving all things possible to mathlib. If possible, hiring somebody to polish formalisation code and move it into mathlib would be very worthwhile. (Sometimes, that's not so hard. In general, it might require subject-specific knowledge.)

Features I like about this idea:

lowers the bar to entry
could increase discoverability of formalisations (though: perhaps the real solution is to have better reservoir search, at least on the level of concepts)
being maintained to match mathlib is useful

I am worried

that updating this huge body of code will be too much work for an individual
this lowers incentives to PR material to mathlib --- which would be a net negative, in my opinion

Martin Dvořák (May 28 2024 at 13:07):

Chris Birkbeck said:

On the other hand, if this became too popular maybe people wouldn't bother PRing to mathlib which would be bad.

Mathlib++ wouldn't contain any "new theories", so Mathlib would have its future uncontested by Mathlib++.
Mathlib++ would be for utilities for working with "existing theories" that did not meat Mathlib standards but might be useful for ordinary Mathlib users.

Chris Birkbeck (May 28 2024 at 13:08):

Martin Dvořák said:

Chris Birkbeck said:

On the other hand, if this became too popular maybe people wouldn't bother PRing to mathlib which would be bad.

Mathlib++ wouldn't contain any "new theories", so Mathlib would have its future certain.
Mathlib++ would be for utilities for working with "existing theories" that did not meat Mathlib standards but might be useful for Mathlib users.

Yeah sure I know what you mean, I was just thinking of the updating problem and how it would also apply to mathlib++, since as it grows it would be an increasingly harder task to update mathlib.

Michael Rothgang (May 28 2024 at 13:10):

I just wrote the following: I wonder if this is the best solution, though. Taking a step back: can you articulate precisely, @Martin Dvorak, what problem you're trying to solve? Is it

review process to mathlib takes too long
review process can mean rewriting code many times over (to make it general enough)
not all work is in mathlib, since moving something to mathlib takes time and energy
as a user, my basic lemma X is not found (since it's a very special case of a much more general mathlib lemma)

I think all of these are worth addressing, but have different solutions.

So, as I understand it, the answer is "just the last item": mathlib++ is only for convenience lemmas, not about new definitions.

Martin Dvořák (May 28 2024 at 13:12):

Michael Rothgang said:

I just wrote the following: I wonder if this is the best solution, though. Taking a step back: can you articulate precisely, @Martin Dvorak, what problem you're trying to solve? Is it

review process to mathlib takes too long

review process can mean rewriting code many times over (to make it general enough)

not all work is in mathlib, since moving something to mathlib takes time and energy

as a user, my basic lemma X is not found (since it's a very special case of a much more general mathlib lemma)

I think all of these are worth addressing, but have different solutions.

So, as I understand it, the answer is "just the last item": mathlib++ is only for convenience lemmas, not about new definitions.

Basic lemma X is not found because it didn't make it to Mathlib (for various reasons).

My point is that (1) and (2) and (3) lead to (4), which is bad especially for ordinary Mathlib users who just want to have a convenient tool and don't plan to contribute to Mathlib.

Michael Rothgang (May 28 2024 at 13:23):

So, to give a particular example, an "elementary probability" library that re-defines probability theory over finite probability spaces is not in scope of Mathlib++, right?

Andrew Yang (May 28 2024 at 13:39):

I'd say the review process for lemmas are more lenient than definitions to begin with, and the suggestions are mostly for maintainability. So either Mathlib++ will have no review process and it would be painful to bump it (to achieve point 4), or it will have some form of a code style check and one might as well PR it into mathlib itself.

Martin Dvořák (May 28 2024 at 13:53):

Andrew Yang said:

So either Mathlib++ will have no review process and it would be painful to bump it (to achieve point 4), or it will have some form of a code style check and one might as well PR it into mathlib itself.

False dichotomy.

Have a look at #10629 for example. This lemma would be potentially useful but not as useful as the generalized version, which I did not provide. For contribution to Mathlib, such a generalization would be needed, but I am not willing to work on it. Of course, the general version would be best, but if the available options are having only my version in Mathlib++ or no version at all being accessible for other users, I think the former option is better.

Or have a look at #12806 for a different kind of blocker. I suggested two lemmas, but they were presumably deemed as bad for the library design (I am extrapolating from a single review here because everybody else ignored it for so long that it acquired a merge conflict and hence disappeared from the queue) even tho they would help Mathlib users.

There are just so many things that are bad for Mathlib maintainers but good for Mathlib users.

Yaël Dillies (May 28 2024 at 13:56):

Martin Dvořák said:

everybody else ignored it for so long that it acquired a merge conflict

Just noting that PRs acquire merge conflicts pretty quickly these days

Shreyas Srinivas (May 28 2024 at 14:18):

If you think mathlib PRs take time, check out the batteries queue. We are clearly beginning to hit scale problems

Yaël Dillies (May 28 2024 at 14:19):

#batt-queue ?

Shreyas Srinivas (May 28 2024 at 14:22):

What we need is a way of spreading out responsibilities of maintaining to non-100x lean devs and users without compromising on the quality of the ecosystem.

Ruben Van de Velde (May 28 2024 at 14:23):

Just send out the batt-signal

Josha Dekker (May 28 2024 at 14:24):

My two cents on the matter after having read the above:

I think that having a place to collect 'pre-Mathlib' results (e.g. those that don't meet style standard, are not sufficiently general, those that would require effort to get into Mathlib) might be good for multiple reasons:

it shows people that the result exists
it allows people to prepare minor results which may eventually be relevant for Mathlib, but not yet (as they are preludes for some bigger result, where only the bigger results mandates including them to Mathlib)
it provides for a nice to-do list for people seeking to contribute to Mathlib

As such, I think perhaps that the format of Mathlib++ as another library may not be the optimal route to go: what if instead you make it into some dictionary/wiki: each (group of strongly related) results can go to a separate page, which could e.g. include:

Lean code that works with some Lean + Mathlib version (as demonstrated by the fact that some PR or something built)
The main objections to adding this to Mathlib in the current form
Any discussions that have already been had on adding the results to Mathlib + future pointers
A list of projects/results which would need these results
The author of the original code or someone who claims responsibility

I think this approach has several advantages:

it does not require constant bumping
it makes insightful why the result is useful and what needs to be done to add it
it is lightweight (a wiki) rather than another library. In particular, I would be afraid that, if it easier to add to Mathlib++ as a library, it might start growing at a faster pace than Mathlib, which may take up quite a bit of memory...
it provides for a to-do list for people that have some Lean time but don't know what exactly to tackle

Antoine Chambert-Loir (May 28 2024 at 14:25):

For the two lemmas that are pointed out by Martin, I believe something else could be useful: a stackoverflow-like wiki of “how to do this in Lean?”.
Lemmas that are direct application of a mathlib mechanism are unlikely to be added to mathlib, probably because the library won't be searchable, there will be tons of ways to do simple things. Rather, we need, and especially newcomers need a place where simple stuff is solved, with explanations and pointers.
(I don't believe that stackoverflow is now the good place to collect this.)

Yaël Dillies (May 28 2024 at 14:27):

Mostly, it would be great if this wiki was not review-bound. That would relieve a lot of review pressure

Josha Dekker (May 28 2024 at 14:38):

Even requests for new results could go in such a wiki
(missing results are proven results which have been generalized in the wrong direction, after all)

Jireh Loreaux (May 28 2024 at 14:43):

@Martin Dvořák I've noticed that you have mentioned several times you want exact? to work and it fails, and then you complain (sorry, perhaps this is not the right word; I don't mean to sound negative) that it should work and then you suggest some new piece of engineering on top of Mathlib to make it work. Personally, I think you're just expecting too much of exact?. It's really not meant to find every simple thing. Often other tactics / approaches are the way to go.

Martin Dvořák (May 28 2024 at 15:18):

Josha Dekker said:

My two cents on the matter after having read the above:
(...)

You suggest the optimistic approach: Organize information about unfinished things and somebody will create a Mathlib-ready version that will eventually get into Mathlib.

I suggest the pessimistic approach: Assume that many things will never get done in the most general way that Mathlib would like to have — how to make them useful for Mathlib users nevertheless?

Martin Dvořák (May 28 2024 at 15:19):

Antoine Chambert-Loir said:

For the two lemmas that are pointed out by Martin, I believe something else could be useful: a stackoverflow-like wiki of “how to do this in Lean?”.

Searching a wiki is such a pain — compared to writing exact? and immediately getting the result.

Josha Dekker (May 28 2024 at 15:21):

Martin Dvořák said:

Josha Dekker said:

My two cents on the matter after having read the above:
(...)

You suggest the optimistic approach: Organize information about unfinished things and somebody will create a Mathlib-ready version that will eventually get into Mathlib.

I suggest the pessimistic approach: Assume that many things will never get done in the most general way that Mathlib would like to have — how to make them useful for Mathlib users nevertheless?

No, not necessarily. A wiki will still have good searchability, so even results that never make it into Mathlib can be found there. I think that putting everything in one large library would not necessarily be extremely helpful, as exact? will still fail on occasions. However, if you like, you could automatically generate a library from such a wiki by sorry’ing every proof, giving you some more flexibility

Martin Dvořák (May 28 2024 at 15:46):

Jireh Loreaux said:

Martin Dvořák I've noticed that you have mentioned several times you want exact? to work and it fails, and then you complain (sorry, perhaps this is not the right word; I don't mean to sound negative) that it should work and then you suggest some new piece of engineering on top of Mathlib to make it work. Personally, I think you're just expecting too much of exact?. It's really not meant to find every simple thing. Often other tactics / approaches are the way to go.

I expected somebody would say that I want from exact? too much.
There is one important thing I need to say about it.
You are making a mistake that people who very well understand certain area often make.

I recently saw a very inspirational video. It is about Magic, a game I sometimes play, but it provides some lessons that are applicable to other areas. Here, I would like to highlight the lesson No.1, which starts at this timestamp: https://youtu.be/QHHg99hwQGY?si=v5Td-G1irdy-9u-6&t=134

The mistake is Fighting Against Human Nature.
If something feels like there should a ready-to-use lemma for it, there should better be a ready-to-use lemma for it.
Experts like you will say "why would you need a lemma — you can call positivity to obtain it" or something like it (ignore for a moment a the tactic in question does not exist yet if we talk about the last example I gave; I am arguing why the lemma should exist even if given tactic exists at the same time).
The problem is that this is not how a beginner thinks!
If something feels like "there must be a lemma for it", the thoughts don't immediately go into "could I use a tactic to automate it" (in the best scenario, exact? fails and the user proceeds to use the recommended tactic instead; in the worst scenario, exact? fails and the user gives up, leaves the project unfinished and never touches Lean again).
Expecting the user not to expect a lemma for basic things is Fighting Against Human Nature.

Imagine an Average Joe, who (to make things worse) has very little will to learn new things, needs to get something done.
What tools would have to be at his hand so that Average Joe finishes the job before he gives up?

As Mark Rosewater puts it:
"Don't change your players to match your game. Change the game to match your players."
Lean and Mathlib is used by many "players" who never comment on Zulip, so we don't even know how much their mindset is different from the mindset of Lean users on Zulip.

Martin Dvořák (May 28 2024 at 16:00):

Michael Rothgang said:

So, to give a particular example, an "elementary probability" library that re-defines probability theory over finite probability spaces is not in scope of Mathlib++, right?

Your suggestion falls under a second thing that was on my mind (which I didn't bring up because I didn't want to post too much).
There should be a library for things that Mathlib doesn't have and Mathlib maintainers are sure Mathlib will never have.
Most importantly, such a library should contain API for special cases of "difficult general theories".
For example, there should be ε-δ API for basic limits that undergraduates can easily use.
Such a thing isn't expected to be a part of Mathlib; separate teaching libraries are developed.
However, it would be convenient to have one centralized library for such things that would be regularly updated to the newest Mathlib in order to easy the transition between "beginner stuff" and "advanced stuff".
The current teaching libraries often use "Lean dialects" which are too far from "standard Lean" — which may be a good thing, but also a bad thing.

I think such a library should be built on top of Mathlib++.

Henrik Böving (May 28 2024 at 16:12):

Regarding the "lemma instead of a tactic" point. I feel like this approach will not scale. Many of our tactics can cover infinitely large theories, you will not be able to avoid tactics indefinitely. The solution for this type of stuff could instead be a tactic like hint (iirc that's the name?) Which tries out many tactics and sees if one of them succeeds. If we wanted to make this more peformant we could additionally inform a hint style tactic with "triggers" that make it try certain tactics on a goal based on the shape of a goal.

In general this process can also be improved with the things we are already doing, more automation, standards for the way that a lemma should be named to ease discover ability, tools like loogle. These approaches are proven to work with people that know what they are doing very well already. I don't believe that we should develop a library geared towards people that cannot use these tools but instead make them accessible enough such that they can be used. This can be done throigh improvements on the tools or their integration, the way they are taught to people including good documentation etc.

Regarding less generalized APIs than mathlib. I'm not an expert on mathlib or large scale formalization at all but providing theories that are powered by mathlib but only expose an educational framework does sound reasonable to me for teaching purposes. However such a project would also need to have high standards. This is because it would be one of the first contact points of many mathematics students with Lean and should thus be very high quality in order to not scare them away.

Martin Dvořák (May 28 2024 at 16:17):

Henrik Böving said:

Many of our tactics can cover infinitely large theories, you will not be able to avoid tactics indefinitely.

My point is not that we should provide so many lemmas that beginners wouldn't need other tactics at all.
My point is that we should provide a lemma for situations where most beginners feel like there should a lemma.
If a beginner writes exact? and it fails, even if subsequent hint or suggest succeeds, we are already losing popularity points for not providing the thing the user expected.

Henrik Böving (May 28 2024 at 16:18):

My point is that maybe pointing users to exact? As a first step is already a wrong idea, precisely because we will never be able to generally make it succeed.

Martin Dvořák (May 28 2024 at 16:20):

Thanks for your point! I will keep it in mind for the next time I teach Lean.

Patrick Massot (May 28 2024 at 17:52):

Martin, I really don’t think this is a good idea. What you propose is not a Mathlib++, it’s a Mathlib--, a bad quality version of Mathlib that would be useful to nobody (not even you). I think trying to make it useful would reinvent being Mathlib (or a library with similar quality goals). You seem to think that Mathlib has high standards to annoy contributors, but it’s the opposite.

Patrick Massot (May 28 2024 at 17:53):

And of course Henrik is right about tactics. The only road to scalability is to get automation good enough to have less lemmas per definition, not more.

Patrick Massot (May 28 2024 at 17:54):

I think what you want, but don’t know that you want, is to replace exact? by auto. I’m not saying that auto is already good enough, but it will be eventually.

Bolton Bailey (May 28 2024 at 19:28):

I, for one, agree with the need for more discoverability for lemmas that are not yet in mathlib. Seeing this tweet has made me very nervous that someone might duplicate 10 hours of work at some point on account of not searching the PRs for a theorem.

Bolton Bailey (May 28 2024 at 19:32):

It would be nice, perhaps, to at least have the docs/loogle/moogle show results for lemmas that are in CI-passing PRs, appropriately tagged with a :construction_zone: to indicate that the result is not in mathlib yet.

Bolton Bailey (May 28 2024 at 19:50):

I think a point of tension here is this: There are mathlib contributors who do most of their mathlib contribution directly in mathlib PRs, and there are or will be people who write lean projects who have no interest in upstreaming their stuff to mathlib or any other repository. So as long as these two types of Lean users exist, we can't have all Lean lemmas in one repo, so for this reason it seems like the only way to make all lean lemmas accessible is through some web search interface.

Bolton Bailey (May 28 2024 at 20:16):

Chris Birkbeck said:

I had wondered about something like this, but I was just thinking of a mathlib+++ where anything that passes CI is ok (and imports mathlib like the above). But I have no idea how not to spend all one's time just fixing everything that breaks each time we updated mathlib. I agree that having things in repos that don't make it into mathlib seems like a waste, but I don't know a solution to this, other than PRing often.

On the other hand, if this became too popular maybe people wouldn't bother PRing to mathlib which would be bad.

I had a similar thought - a version of mathlib where the only PRs that get reviewed are ones that add new proof_wanted theorems, and which simply accept automatically any pull request that improves some measures of performance, or which doesn't hurt performance too much while completing a wanted theorem. It would be interesting to see how that would develop (but of course it would be something of a technical lift).

llllvvuu (May 28 2024 at 21:01):

My impression is that Lean and Mathlib are both quite bleeding-edge and take a lot of work to coordinate upgrades, so it might not be sustainable to try to maintain another (even larger) monorepo. A decentralized approach (Reservoir) seems more scalable since, while it pushes maintenance onto users (dependency conflicts), it is more surmountable as people have more independence in fixing and working around breakage.

Indeed having great categorization / descriptions / search over Reservoir projects is probably all that is needed.

Tactics (and AI) are probably the future, but we mortals can only speculate. It seems good to also expect a traditional software ecosystem.

llllvvuu (May 28 2024 at 21:07):

Also, I could imagine getting to a point where there is a de-facto non-Mathlib project for whatever area you're working on, such that most of the time you only need one non-Mathlib project, which doesn't create dependency conflicts.

This is actually already what I observe, every project I've seen has exactly one non-Mathlib dependency.

llllvvuu (May 28 2024 at 21:21):

Regarding wiki, I hope search and AI will grow to be able to extract answers out of Zulip and GitHub (including people's random projects) instead of needing to maintain a separate explicitly structured thing

Jireh Loreaux (May 28 2024 at 21:43):

Martin Dvořák said:

The problem is that this is not how a beginner thinks!
If something feels like "there must be a lemma for it", the thoughts don't immediately go into "could I use a tactic to automate it" (in the best scenario, exact? fails and the user proceeds to use the recommended tactic instead; in the worst scenario, exact? fails and the user gives up, leaves the project unfinished and never touches Lean again).
Expecting the user not to expect a lemma for basic things is Fighting Against Human Nature.

Some of this has already been addressed above, but let me point out a few things.

From my experience newcomers (especially students) using Lean try to use tactics in all kinds of ways that don't work. For instance, they use simp when they mean norm_num, they try to rw with implications, they think simp will solve anything and everything they deem "simple", and all kinds of other problems. So, I could argue that these are "human nature", but doing so would lead to scope creep on the part of all these various tactics. This could make them unusably slow, or blur the boundaries between them. This is not good for the ecosystem. Instead, users just need to spend time learning. If they think one tactic is going to save them from any mess, they are sorely mistaken.
As above, beginners just need to spend time learning, and that's a nontrivial problem; learning takes time and effort. Of course, we want to ease the transition where it makes sense, but I think here more automation is key, not necessarily more lemmas. (Indeed, more lemmas can make searching harder, not easier.)
In stark contrast to your point that users should be looking for lemmas and not reaching for tactics, Heather Macbeth's course / textbook aimed at introducing students to proof takes the exact opposite approach. She has as an explicit goal that students should never need to go searching the library for some lemma, and instead should reach for automation (some of it expressly designed for her course), and she teaches them how to use all of those tactics. Of course, at some point, users will need to appeal to existing theorems, but I think her approach shows that it has the potential to be much later than one might expect. Moreover, by the time they get to that point, they should be reasonably proficient at using Lean and its various tactics, so they won't need to reach for tools like exact? all the time.

Martin Dvořák (May 28 2024 at 21:58):

Jireh Loreaux said:

Some of this has already been addressed above, but let me point out a few things.

Interesting perspective; thanks for sharing!

Mario Carneiro (May 28 2024 at 22:54):

Shreyas Srinivas said:

If you think mathlib PRs take time, check out the batteries queue. We are clearly beginning to hit scale problems

Batteries is currently suffering a severe staffing problem re: maintenance. Unfortunately I don't have the time to organize the "hiring process" for new maintainers either :(

Bulhwi Cha (May 28 2024 at 23:27):

At least, I can review batteries#809. But I also have to do other work.

Filippo A. E. Nuccio (May 29 2024 at 08:10):

Antoine Chambert-Loir said:

For the two lemmas that are pointed out by Martin, I believe something else could be useful: a stackoverflow-like wiki of “how to do this in Lean?”.
Lemmas that are direct application of a mathlib mechanism are unlikely to be added to mathlib, probably because the library won't be searchable, there will be tons of ways to do simple things. Rather, we need, and especially newcomers need a place where simple stuff is solved, with explanations and pointers.
(I don't believe that stackoverflow is now the good place to collect this.)

@Antoine Chambert-Loir I believe this is one of the most interesting suggestions of the whole thread, but I wonder why you think that stackoverflow is not the right place. You mean that the "general" stackoverflow is not, or that something like LeanOverflow/MathlibOverflow would still not be the right place?

Eric Wieser (May 29 2024 at 08:15):

Perhaps worth noting that the proof assistant stack exchange seems to be struggling to meet the activity bar; if pursuing that platform to collect information is of interest, it would be better to do so sooner rather than later

Martin Dvořák (May 29 2024 at 08:17):

Antoine Chambert-Loir said:

Lemmas that are direct application of a mathlib mechanism are unlikely to be added to mathlib, probably because the library won't be searchable, there will be tons of ways to do simple things.

I have a different opinion. If there are tons of ways to do simple things, then the first thing users try is more likely to succeed.

I think that better than educating the users how to use our tool is to make our tool so that it can be used without being educated.

Filippo A. E. Nuccio (May 29 2024 at 08:19):

Eric Wieser said:

Perhaps worth noting that the proof assistant stack exchange seems to be struggling to meet the activity bar; if pursuing that platform to collect information is of interest, it would be better to do so sooner rather than later

Right, I have been wondering whether the simultaneous presence of several proof assistants was partly to blame for this state of affairs.

Antoine Chambert-Loir (May 29 2024 at 10:16):

The issue with SO, for me, is its behavior wrt generating AI, in particular the fact that they own everything which is on the website and resell it.

Yves Jäckle (May 29 2024 at 18:16):

Would it be possible to have a stream here on zulip where people can point to their stuff and add "tags", so that zulips search engine can be used to find them ? A sort of "Here's code for X".

I'll take Yael's APAP repo as an example, though I don't know how much of it has been brought to mathlib. A post pointing to it might look like this:

Name: Arithmetic Progressions - Almost Periodicity
Link: https://github.com/YaelDillies/LeanAPAP

Content tags:
Kelley-Meka bound on Roth numbers, discrete (difference) convolution, discrete Lp norms, discrete Fourier transform, Marcinkiewicz-Zygmund inequality, ...

API tags:
Indicator function, Finset density, WithLp.equiv, double dual embedding, ...

I am not familiar with that project, so there is a lot that could be added in terms of tags.

The point is that one would simply write a verbose version of the major topics or lemmas and definitions of ones files as tags. People exploring the project could add tags as responses in a thread (topic), if they would have liked to see them. The worst that could happen when using this search method is that you look at project API that has already been brought to mathlib. Assuming users have checked mathlib or used loogle or moogle before trying this method, this would only result in a small waste of time, which is, in my opinion, better then rewriting existing code.

Patrick Massot (May 29 2024 at 18:21):

This could easily be part of reservoir.

Mario Carneiro (May 29 2024 at 18:35):

I think this would be a useful value-add on top of reservoir, that is, a curated "most useful projects in lean" list independent of reservoir's "projects with a pulse" list

Mario Carneiro (May 29 2024 at 18:36):

But it could certainly be presented from within reservoir, modulo the question of who is doing the curating

Siddhartha Gadgil (May 30 2024 at 04:04):

What is Mathlib++ supposed to add over a collection of separate repositories with proper dependencies?

One of the goals of Mathlib is to make sure all parts work together, to avoid duplication and fragmentations. I can see Mathlib++ with the goal of this and only this (for example, suboptimal proofs are fine). But for this goal, perhaps something like scala's community build will work.

Concretely, I suggest:

Have a repository that just has Mathlib and a bunch of other repositories (with concrete tags) as dependencies, plus CI to make sure things are fine.
Some scrutiny to minimize duplication.
It is the responsibility of each repo to have a version for each stable toolchain and the corresponding Mathlib. Failing this the repo is removed.

Ruben Van de Velde (May 30 2024 at 05:11):

One thing to consider is how we can make it possible for the community to step in to keep projects up to date if the original author is (temporarily) unavailable to take PRs

Ralf Stephan (May 30 2024 at 07:18):

PRs with a pre-specified keyword in the name that keep the project up to date could be checked by CI and flagged as "official" fix, without being merged.

Kim Morrison (May 30 2024 at 07:34):

My take from the discussion so far is that everything valuable will be eventually covered by Reservoir, and that Mathlib++ itself as discussed here only adds negative value (by actively encouraging poor quality work).

Kim Morrison (May 30 2024 at 07:36):

I think a great idea would be for someone to volunteer to maintain a "SupLean" repository, which aims to require/import everything on Reservoir meeting some quality/popularity threshold, and reports back to project maintainers when they have made choices that make it impossible to import everything simultaneously. Then either users or a web service could run exact? in this "import everything" file.

Kim Morrison (May 30 2024 at 07:37):

Hopefully this could even be automated, with some tooling identifying pairs of projects in the collection set that can't be simultaneously imported.

Kim Morrison (May 30 2024 at 07:37):

(I am very keen to namespace all of Mathlib, and I think this would be a prerequisite for something like this working at all.)

Martin Dvořák (May 30 2024 at 07:40):

Kim Morrison said:

Mathlib++ itself as discussed here only adds negative value (by actively encouraging poor quality work).

I wouldn't say I am "actively encouraging poor quality work".
If anything, it will be "actively collecting poor quality work".
I believe it is almost always better to have a messy proof with a lot of room for improvement than not having the lemma proved at all.

Martin Dvořák (May 30 2024 at 07:44):

I think we all can agree that Mathlib++ will not be necessary for collaboration on a larger scale.
We just need a mechanism that will allow users to discover "the lemma you want has been already proved; you can import it here".

Jireh Loreaux (May 30 2024 at 07:44):

Martin Dvořák said:

I wouldn't say I am "actively encouraging poor quality work".
If anything, it will be "actively collecting poor quality work".

I don't think these are meaningfully different. If there's a place to collect them, people who don't want to put forth the effort / time to contribute to Mathlib will be incentivized to add poor quality things to this repo. Without this, they may be incentivized to contribute to Mathlib to avoid the code rotting.

Martin Dvořák (May 30 2024 at 07:47):

Jireh Loreaux said:

Without this, they may be incentivized to contribute to Mathlib to avoid the code rotting.

Here I would like to return to the first sentence of the whole thread:
"There are many lemmas that would be potentially useful for Mathlib users but not for Mathlib developers."
Where will these lemmas go so that users can find them?

Martin Dvořák (May 30 2024 at 07:57):

Kim Morrison said:

I think a great idea would be for someone to volunteer to maintain a "SupLean" repository, which aims to require/import everything on Reservoir meeting some quality/popularity threshold, and reports back to project maintainers when they have made choices that make it impossible to import everything simultaneously. Then either users or a web service could run exact? in this "import everything" file.

Sounds great, indeed!

Do you think we will have a tool that will allow a Lean v4.8.0 project to import a Lean v4.6.0 project?

Yaël Dillies (May 30 2024 at 08:18):

Kim Morrison said:

(I am very keen to namespace all of Mathlib, and I think this would be a prerequisite for something like this working at all.)

This is still completely unrealistic if

doc-search still shows the fully qualified name of lemmas
Mathlib still contains super long namespace names like CategoryTheory, MeasureTheory or OmegaCompletePartialOrder

Martin Dvořák (May 30 2024 at 08:24):

Kim Morrison said:

(I am very keen to namespace all of Mathlib, and I think this would be a prerequisite for something like this working at all.)

This is one of the things I don't like hearing but cannot argue against.
Deep down I know you are right.
At the same time, I don't want to have all the boilerplate.

Utensil Song (May 30 2024 at 09:38):

Martin Dvořák said:

Kim Morrison said:

I think a great idea would be for someone to volunteer to maintain a "SupLean" repository, which aims to require/import everything on Reservoir meeting some quality/popularity threshold, and reports back to project maintainers when they have made choices that make it impossible to import everything simultaneously. Then either users or a web service could run exact? in this "import everything" file.

Sounds great, indeed!

This is actually the same idea I suggested in the previous topic, but you stated why you don't want to do it, and started this topic. What's changed?

In priciple this can be easily implemented automatically as now we have lakefile.toml. The missing piece is an API to query reservior to get the projects that pass the build in recent toolchains.

Martin Dvořák (May 30 2024 at 09:46):

Utensil Song said:

What's changed?

First I misunderstood your proposal, then I misjudged.

As for the decision Mathlib++ vs Sink project, my opinion will probably depend on whether we can import projects that use different versions of Lean (and Mathlib).

Ruben Van de Velde (May 30 2024 at 09:48):

If you actually want to use the results in those projects, you're going to need them on the same versions anyway

Utensil Song (May 30 2024 at 10:08):

The idea is to use the same toolchain to import them. It's reservior's goal to help tracking and somewhat incentivizing the maintainers of Lean packages to build in the latest toolchain.

The automatic bump script in the SupLean project could bump Lean versions in the Lean rc version stage, check if all already-required projects pass in the latest toolchain, if not, open an issue and gently ping the maintainer. Hopefully this process could make them all build in the latest stable toolchains, then SupLean would at least be able to work for each stable Lean version.

Utensil Song (May 30 2024 at 10:14):

Before these package build in the latest toolchain, they could be automatically removed from the SupLean project. And when they build again, SupLean will add them back, so SupLean will alway have the complete set of packages that work for the chosen toolchain, available to a exact? query.

Ideally, to improve the stability, a package that passed build for a toolchain but failed in a later commit will not be removed, but to fix to the commit that works for the toolchain, until it has a newer commit that builds on the latest toolchain. This might require extra support from reservior query API.

Utensil Song (May 30 2024 at 10:22):

IMHO, the SupLean idea is more practical than Mathlib++ because the former is decentralized and automatable in nature, and the latter is a centralized non-official repo, manually maintained, so it would diverge maintenance efforts, contributor efforts, and might not be sustainable.

Kim Morrison (May 30 2024 at 10:37):

Someone could start on SupLean right away. Just pick half a dozen down-stream of Mathlib projects, and worry about automating the list once Reservoir is a bit more ready for prime time.

Martin Dvořák (May 30 2024 at 10:40):

Kim Morrison said:

Someone could start on SupLean right away.

All right, I'll do it.

Martin Dvořák (May 30 2024 at 10:43):

Ruben Van de Velde said:

If you actually want to use the results in those projects, you're going to need them on the same versions anyway

@Shreyas Srinivas has some insight into dependencies and versions. Can you please share it here?

Martin Dvořák (May 30 2024 at 10:46):

/poll What should be the name of the project that imports other projects?
SupLean
LeanSup
LeanTop
LeanSink

Shreyas Srinivas (May 30 2024 at 10:49):

It is not a huge insight so much as an observation that sooner or later some form of version numbering and dependency management will be needed. Further that exact toolchain matching may not be necessary. But all this discussion already happened in the middle of last year. No point rehashing it.

Utensil Song (May 30 2024 at 10:50):

I think Sup in SupLean means docs#Sup, no need for a LeanSup variant.

Shreyas Srinivas (May 30 2024 at 10:51):

The other thing is, no matter what you call this project, it is going to be a huge and frustrating maintenance nightmare to chase after people to keep their projects up to date.

Shreyas Srinivas (May 30 2024 at 10:52):

Currently reservoir tracks which project build against which toolchain

Martin Dvořák (May 30 2024 at 10:53):

Fortunately, I am dumb enough to jump head-first into it.

Shreyas Srinivas (May 30 2024 at 10:53):

So people make their informed choices. They can fork the projects they wish to use an updated version of and so on.

Yaël Dillies (May 30 2024 at 10:53):

Note that we had automated version branches in Lean 3 using a script. It would be great to restore that

Eric Wieser (May 30 2024 at 10:53):

Version branches help for lean versions, but not downstream projects with mathlib versions

Eric Wieser (May 30 2024 at 10:55):

Encouraging downstream projects to bump via a mathlib tagged commit would probably go a long way

Shreyas Srinivas (May 30 2024 at 10:55):

Martin Dvořák said:

Fortunately, I am dumb enough to jump head-first into it.

Consider the utility of this for the end user. What incentive do I have to use your super lean library which, with the promised cloud caching on reservoir, will simply fill up my storage.

Eric Wieser (May 30 2024 at 10:56):

I would be surprised if it uses up more than twice the storage of mathlib

Shreyas Srinivas (May 30 2024 at 10:56):

Eric Wieser said:

Encouraging downstream projects to bump via a mathlib tagged commit would probably go a long way

Where does that leave batteries dependent projects

Shreyas Srinivas (May 30 2024 at 10:57):

Eric Wieser said:

I would be surprised if it uses up more than twice the storage of mathlib

As I understand it, the FRO has much grander goals.

Shreyas Srinivas (May 30 2024 at 10:57):

Also Lean Copilot takes up 10GB

Shreyas Srinivas (May 30 2024 at 10:57):

That's 2.5x Mathlib already

Martin Dvořák (May 30 2024 at 10:58):

Shreyas Srinivas said:

What incentive do I have to use your super lean library which, with the promised cloud caching on reservoir, will simply fill up my storage.

For me, storage is cheaper than my time I need to reprove lemmas that other Lean users proved.

Shreyas Srinivas (May 30 2024 at 11:01):

One of two things is going to happen:

You don't have to re-prove anything. You can already just import their project.
You are going to have to update the proof because the author can't be reached or bothered. Because you will be maintaining it, this process might have to done over and over

So, depending on the repo and the lemma, you either haven't really solved the re-proving problem, or it doesn't even exist

Shreyas Srinivas (May 30 2024 at 11:02):

You are trying to solve a tooling issue with a non-tooling fix. With toolchain and dependency management that should/hopefully will be solved in lake and reservoir.

Mario Carneiro (May 30 2024 at 11:02):

Shreyas Srinivas said:

Eric Wieser said:

I would be surprised if it uses up more than twice the storage of mathlib

As I understand it, the FRO has much grander goals.

Sure, but so does mathlib. In a previous discussion I estimated that a quarter of all extant public lean code is in mathlib, and IIRC Mac measured it and it was even more, like 30% or so. It's an interesting metric to track, since it measures the extent of decentralization in the lean world

Mario Carneiro (May 30 2024 at 11:04):

My guess is that in the past 5 or so months this percentage has been increasing, not decreasing, although I'm well aware the FRO would like the needle to tip the other direction. But that's just speculation, we need more data to say for sure (and maybe reservoir could start collecting it?)

Mario Carneiro (May 30 2024 at 11:07):

Shreyas Srinivas said:

Also Lean Copilot takes up 10GB

Note that in the above metrics I'm counting lean code, not model weights. And it would be really silly to cache this as build data, since it's just downloading the data from somewhere else

Shreyas Srinivas (May 30 2024 at 11:09):

So 70% of lean code isn't mathlib? It has been less than a year since the port. That's pretty impressive.

Mario Carneiro (May 30 2024 at 11:10):

I don't think it's ever been more than 50%

Mario Carneiro (May 30 2024 at 11:10):

there are tons of student projects and such on github

Mario Carneiro (May 30 2024 at 11:11):

plus there are a few high-profile projects outside mathlib

Shreyas Srinivas (May 30 2024 at 11:11):

I am waiting for the wave of "implement the language in lean" similar to "implement it all in rust"

Shreyas Srinivas (May 30 2024 at 11:11):

Lean 4 is still very very young

Mario Carneiro (May 30 2024 at 11:11):

? That wave came and went

Ruben Van de Velde (May 30 2024 at 11:11):

Did something come of the automated mathlib bumping github action?

Mario Carneiro (May 30 2024 at 11:13):

dunno if that's on the lean-action roadmap cc: @Kim Morrison

Shreyas Srinivas (May 30 2024 at 11:15):

Anyway, to return to the point, @Martin Dvořák : I see this as an extremely laborious and thankless undertaking with few long term rewards. If the promised version management and cloud caching lands even within two years from now, and lake becomes more and more like cargo, people will use that over one large super project import.

Ruben Van de Velde (May 30 2024 at 11:16):

Mario Carneiro said:

dunno if that's on the lean-action roadmap cc: Kim Morrison

No, someone on here promised to create it because they didn't like me updating the old lean 3 one

Shreyas Srinivas (May 30 2024 at 11:18):

Also, I wonder what fraction of mathlib dependent projects could very well work with something like a Mathlib-lite library

Mario Carneiro (May 30 2024 at 11:18):

Most can, of course. The issue is that everyone has a different idea of what the subset is

Mario Carneiro (May 30 2024 at 11:19):

What I was actually suggesting was not really project that imports everything; that might be useful as an 'integration test' and probably run by the FRO, it's also a maintenance nightmare (or more likely, broken all the time). Rather, I was thinking something more like https://github.com/rust-unofficial/awesome-rust : a wiki page with a bunch of links to projects considered useful by someone with opinions

Mario Carneiro (May 30 2024 at 11:20):

and no attempt to actually integrate them or keep them up to date

Shreyas Srinivas (May 30 2024 at 11:21):

Mario Carneiro said:

Most can, of course. The issue is that everyone has a different idea of what the subset is

Coq's stdpp could be a useful guideline

Mario Carneiro (May 30 2024 at 11:22):

that's literally batteries

Shreyas Srinivas (May 30 2024 at 11:22):

Surely it isn't there yet, but it could be.

Mario Carneiro (May 30 2024 at 11:22):

what is 'it'?

Shreyas Srinivas (May 30 2024 at 11:23):

Batteries

Mario Carneiro (May 30 2024 at 11:23):

Note that lean core is way more featureful than the coq stdlib, so there is less need for batteries to have a ton of things in it

Mario Carneiro (May 30 2024 at 11:24):

or rather, it used to have more things but they are now in core

Martin Dvořák (May 30 2024 at 11:24):

Shreyas Srinivas said:

Anyway, to return to the point, Martin Dvořák : I see this as an extremely laborious and thankless undertaking with few long term rewards. If the promised version management and cloud caching lands even within two years from now, and lake becomes more and more like cargo, people will use that over one large super project import.

SupLean will be a temporary solution.

I will be more than happy if, one day (the sooner the better), we will have tools that will make SupLean obsolete.

Mario Carneiro (May 30 2024 at 11:24):

Martin, I think you should try it, if only to see why it's so difficult

Mario Carneiro (May 30 2024 at 11:25):

I would be surprised if you can get it to compile

Martin Dvořák (May 30 2024 at 11:26):

Mario Carneiro said:

there are tons of student projects and such on github

I've heard of a guy named Pareto...

Shreyas Srinivas (May 30 2024 at 11:27):

Mario Carneiro said:

Note that lean core is way more featureful than the coq stdlib, so there is less need for batteries to have a ton of things in it

I'd like to use tactics like nlinarith, field_simp, ring, set, zify, qify etc without downloading Mathlib cache. It is mostly just the tactics and basic math data types like naturals, rationals, and reals.

Mario Carneiro (May 30 2024 at 11:28):

I would too. This is a lake issue that should be fixed

Mario Carneiro (May 30 2024 at 11:29):

I think it might be on Mac's agenda for this month or next IIRC?

Shreyas Srinivas (May 30 2024 at 11:32):

Oh that's a lot sooner than the safe two years estimate I had in mind. Martin there is a really short window of utility for this superlibrary then.

Eric Wieser (May 30 2024 at 11:32):

I'd like to use tactics like nlinarith, field_simp, ring, set, zify, qify etc without downloading Mathlib cache.

Isn't this solvable today with lake exe cache get YourFile?

Eric Wieser (May 30 2024 at 11:33):

Which should download only the cache for the files that YourFile imports

Shreyas Srinivas (May 30 2024 at 11:33):

In Mathlib. I think Martin wants more

Shreyas Srinivas (May 30 2024 at 11:36):

BTW, Martin, do you have a plan for cloud caches for this super library? Without that, building your projects depending on them is going to be expensive and time-consuming.

Mario Carneiro (May 30 2024 at 11:40):

Eric Wieser said:

I'd like to use tactics like nlinarith, field_simp, ring, set, zify, qify etc without downloading Mathlib cache.

Isn't this solvable today with lake exe cache get YourFile?

Yes, the improvements in question (essentially upstreaming lake exe cache) would mean that other components like the server and lake build transparently use this version of cache get instead of the completionist one

Eric Wieser (May 30 2024 at 11:40):

lake build doesn't use the cache at all, does it?

Mario Carneiro (May 30 2024 at 11:41):

I think this is the main issue with using lake exe cache get YourFile in practice, you have to calibrate it based on the imports of your file

Mario Carneiro (May 30 2024 at 11:41):

that's correct, it doesn't use the cache

Shreyas Srinivas (May 30 2024 at 11:41):

How big is the cache with your import is Mathlib.Tactic?

Mario Carneiro (May 30 2024 at 11:41):

try it and see

Eric Wieser (May 30 2024 at 11:42):

Exactly as big as it needs to be for you to use every tactic that exists

Mario Carneiro (May 30 2024 at 11:42):

note that Mathlib.Tactic is also completionist, you should probably import the specific tactics you want

Yaël Dillies (May 30 2024 at 11:43):

Eric Wieser said:

Encouraging downstream projects to bump via a mathlib tagged commit would probably go a long way

That's what I do with PFR (and APAP when I bump PFR)

Mario Carneiro (May 30 2024 at 11:43):

hopefully that's what every downstream project should be doing, except those working closely with some PR sequence (like FLT I think)

Yaël Dillies (May 30 2024 at 11:47):

Well, APAP also works closely with some PR sequence. It's just that I also bump when a new Lean version comes along

Yaël Dillies (May 30 2024 at 11:48):

Actually, I must have been doing it wrong since I update to the new rc1 rather than the stable :thinking:

Yaël Dillies (May 30 2024 at 11:48):

I also do not tag the bump commit. Mind giving some advice on what I should be doing, Mario?

Mario Carneiro (May 30 2024 at 11:50):

I think you should bump to the stable and also tag your project if you are also concerned about projects downstream of APAP

Yaël Dillies (May 30 2024 at 11:51):

What does "bump to the stable" mean in terms of lake update and in terms of mathlib version?

Utensil Song (May 30 2024 at 11:51):

There should be a tagging convetion that reservior recognizes (and to show which tag/commit builds on which toolchain) then it would naturally become everyone's practice.

Yaël Dillies (May 30 2024 at 11:51):

Yeah, my question basically boils down to whether such a convention already exists

Mario Carneiro (May 30 2024 at 11:52):

On the tag/branch for that stable, you should use require mathlib "..." @ "v4.37.0" and then lake update

Yaël Dillies (May 30 2024 at 11:53):

I believed something like lake update stable existed? Is this still a daydream?

Shreyas Srinivas (May 30 2024 at 11:53):

It does

Mario Carneiro (May 30 2024 at 11:53):

lake update foobar exists, but it will update the dependency foobar

Utensil Song (May 30 2024 at 11:53):

image.png

It doens't exist yet. But reservior remembers the commit that passes.

Shreyas Srinivas (May 30 2024 at 11:53):

You change the version string to "stable" and call lake update

Shreyas Srinivas (May 30 2024 at 11:56):

Change the mathlib require line in lakefile.lean to require mathlib "..." @ "stable". Call lake update

Utensil Song (May 30 2024 at 11:56):

Mario Carneiro said:

On the tag/branch for that stable, you should use require mathlib "..." @ "v4.37.0" and then lake update

I use a script to change the lean-toolchain file according to the branch name, and use the following to require Mathlib:

def leanVersion : String := s!"v{Lean.versionString}"

require mathlib from git
  "https://github.com/leanprover-community/mathlib4" @ leanVersion

Yaël Dillies (May 30 2024 at 11:58):

Mario Carneiro said:

On the tag/branch for that stable, you should use require mathlib "..." @ "v4.37.0" and then lake update

And where should I do that? On master? on a new branch called stable?

Yaël Dillies (May 30 2024 at 11:58):

Does Reservoir look at non-master branches of projects?

Utensil Song (May 30 2024 at 11:58):

That's how I can test which version did my project fail, when I wake up a few months later, and there are many rc/stable versions happened.

Shreyas Srinivas (May 30 2024 at 12:04):

I set up a fresh math project to experiment with lake exe cache get YourFile. The Sample/Basic.lean contained only one import import Mathlib.Tactic.

The project is called Sample. There was an "uncaught exception"

:~/Projects/Programs/Lean/Sample$ lake exe cache get Sample.lean

Attempting to download 3848 file(s)
Downloaded: 3848 file(s) [attempted 3848/3848 = 100%] (100% success)
Decompressing 4570 file(s)
unpacked in 4576 ms
info: Sample: no previous manifest, creating one from scratch
info: mathlib: cloning https://github.com/leanprover-community/mathlib4.git to '././.lake/packages/mathlib'
info: batteries: cloning https://github.com/leanprover-community/batteries to '././.lake/packages/batteries'
info: Qq: cloning https://github.com/leanprover-community/quote4 to '././.lake/packages/Qq'
info: aesop: cloning https://github.com/leanprover-community/aesop to '././.lake/packages/aesop'
info: proofwidgets: cloning https://github.com/leanprover-community/ProofWidgets4 to '././.lake/packages/proofwidgets'
info: Cli: cloning https://github.com/leanprover/lean4-cli to '././.lake/packages/Cli'
info: importGraph: cloning https://github.com/leanprover-community/import-graph.git to '././.lake/packages/importGraph'
info: mathlib: running post-update hooks
uncaught exception: Unknown package directory for Sample

Shreyas Srinivas (May 30 2024 at 12:04):

The total space used was 4.1 G

Mario Carneiro (May 30 2024 at 12:05):

that's because the post-update hook ran, which calls lake exe cache get

Mario Carneiro (May 30 2024 at 12:05):

you can set the environment variable MATHLIB_NO_CACHE_ON_UPDATE=1 to disable the hook when running lake commands that indirectly call it

Mario Carneiro (May 30 2024 at 12:06):

the uncaught exception is because you requested a file from the cache that does not exist in mathlib

Mario Carneiro (May 30 2024 at 12:06):

you need to request the root file(s) in mathlib that you are importing

Mario Carneiro (May 30 2024 at 12:07):

i.e. Mathlib.Tactic in this case

Shreyas Srinivas (May 30 2024 at 12:07):

Oh okay. I thought YourFile was a file from my project with its imports

Utensil Song (May 30 2024 at 12:12):

Utensil Song said:

image.png

It doens't exist yet. But reservior remembers the commit that passes.

And they are recorded in the index repo e.g. https://github.com/leanprover/reservoir-index/blob/master/leanprover-community/batteries/builds.json

Yay, no need for an extra API support, one can just work with the index repo.

Shreyas Srinivas (May 30 2024 at 12:16):

Here's the output now:

Lean/Sample$ MATHLIB_NO_CACHE_ON_UPDATE=1 lake exe cache get Mathlib.Tactic
info: mathlib: cloning https://github.com/leanprover-community/mathlib4.git to '././.lake/packages/mathlib'
info: batteries: cloning https://github.com/leanprover-community/batteries to '././.lake/packages/batteries'
info: Qq: cloning https://github.com/leanprover-community/quote4 to '././.lake/packages/Qq'
info: aesop: cloning https://github.com/leanprover-community/aesop to '././.lake/packages/aesop'
info: proofwidgets: cloning https://github.com/leanprover-community/ProofWidgets4 to '././.lake/packages/proofwidgets'
info: Cli: cloning https://github.com/leanprover/lean4-cli to '././.lake/packages/Cli'
info: importGraph: cloning https://github.com/leanprover-community/import-graph.git to '././.lake/packages/importGraph'
Warning: .lake/packages/mathlib/Mathlib.Tactic not found. Skipping all files that depend on it
No files to download
Decompressing 2013 file(s)
unpacked in 1804 ms

Folder size : 2.1 GB. That's a massive improvement

Shreyas Srinivas (May 30 2024 at 12:18):

Though I don't understand that warning

Utensil Song (May 30 2024 at 12:19):

I'm very optimistic that SupLean with proper automation would not be a maintance nightmare. The use case is very limited though, so I personally would not invest time to implement it.

But such a project should be feasible now, and become easier and easier to maintain until one day it no longer needs to exist thanks to tooling improvements in the ecosystem.

Shreyas Srinivas (May 30 2024 at 12:20):

@Utensil Song : Reservoir remembers the last version the project builds with. Martin wants/needs it to work with the latest toolchain that mathlib works on. The reservoir index is definitely part of a better tooling solution. Either Martin needs to create a fork of each project and keep them updated or persistently remind project owners or maintainers to do this.

Mario Carneiro (May 30 2024 at 12:24):

The reason I don't think SupLean would work is because you need all projects involved to be on the same toolchain, and even if there is a 90% chance that the average project has a version on that toolchain, that means that if you have 10+ projects then it's no longer likely to be possible to combine them

Shreyas Srinivas (May 30 2024 at 12:24):

The flaw in the plan is the cache

Mario Carneiro (May 30 2024 at 12:24):

if you want 1000 lean projects to agree on a version the chances of this working are vanishingly small

Utensil Song (May 30 2024 at 12:24):

Shreyas Srinivas said:

Utensil Song : Reservoir remembers the last version the project builds with. Martin wants/needs it to work with the latest toolchain that mathlib works on. The reservoir index is definitely part of a better tooling solution

My goal aims a little bit lower. Helping the exact? users to figure out what's available in a few recent toolchains would be good enough. By a few recent toolchains I mean maybe the needed lemma doesn't build in the latest toolchain, but on 2 previous toolchains, the user then have found the projects that contains the needed lemma, that's already a success for Martin's goal.

Mario Carneiro (May 30 2024 at 12:25):

unless you kick projects out according to whether they build, but then you will have to constantly change the list of projects every new version depending on who is on top of things

Shreyas Srinivas (May 30 2024 at 12:25):

And I don't see Martin's plan for a cloud cache for this super library yet. It requires cache to be upstreamed to lake and work in a less version constrained way. Without cloud caching, builds are going to take way too long.

Mario Carneiro (May 30 2024 at 12:26):

I don't see what a cloud cache has to do with it

Shreyas Srinivas (May 30 2024 at 12:26):

If I import a super library

Utensil Song (May 30 2024 at 12:26):

IIRC, reservior already has plans for cloud cache/release.

Shreyas Srinivas (May 30 2024 at 12:26):

and most of it isn't cached

Mario Carneiro (May 30 2024 at 12:26):

Why would you import the super library?

Shreyas Srinivas (May 30 2024 at 12:26):

Isn't that Martin's idea for Suplean?

Mario Carneiro (May 30 2024 at 12:26):

I don't expect it to be any more than an integration test

Utensil Song (May 30 2024 at 12:27):

lean4#4270

Utensil Song (May 30 2024 at 12:27):

SupLean only needs to fetch the packed artifacts.

Yaël Dillies (May 30 2024 at 12:28):

Yaël Dillies said:

Mario Carneiro said:

On the tag/branch for that stable, you should use require mathlib "..." @ "v4.37.0" and then lake update

And where should I do that? On master? on a new branch called stable?

:ping_pong:

Utensil Song (May 30 2024 at 12:29):

I consider it exactly a thought experiment of the integration test of a Sup of Lean packages, and I'm optimistic that in near future, more project will be able to agree on a few recent toolchains, no need to be the latest.

Shreyas Srinivas (May 30 2024 at 12:30):

what are the version constraints for lean-toolchain to allow lake pack/unpack to work? Are they similar to Mathlib cache (viz. Project toolchain === Mathlib toolchain) @Mac Malone ?

Shreyas Srinivas (May 30 2024 at 12:31):

Yaël Dillies said:

Yaël Dillies said:

Mario Carneiro said:

On the tag/branch for that stable, you should use require mathlib "..." @ "v4.37.0" and then lake update

And where should I do that? On master? on a new branch called stable?

:ping_pong:

This needs a separate thread

Martin Dvořák (May 30 2024 at 12:35):

Kim Morrison said:

I think a great idea would be for someone to volunteer to maintain a "SupLean" repository, which aims to require/import everything on Reservoir meeting some quality/popularity threshold, and reports back to project maintainers when they have made choices that make it impossible to import everything simultaneously. Then either users or a web service could run exact? in this "import everything" file.

First experiment:
https://github.com/madvorak/sup-lean

I tried to import these two projects at the same time:
https://github.com/AlexKontorovich/PrimeNumberTheoremAnd
https://github.com/leanprover-community/flt-regular

Build failed. It seems that the latter project requires Fin.castIso which was last week (re)moved in Mathlib.
Should I try picking a specific commit so that it builds?

Shreyas Srinivas (May 30 2024 at 12:36):

Shreyas Srinivas said:

Though I don't understand that warning

What is this warning about?

PS : I really like the reduction in space consumption to 2.1 GB from 4.1GB

Shreyas Srinivas (May 30 2024 at 12:37):

@Martin Dvořák : This is just a glimpse of the kind of versioning CSPs you will have to solve

Martin Dvořák (May 30 2024 at 12:38):

Well, in the future, I may decide to import only those projects that will "collaborate" with me.

Shreyas Srinivas (May 30 2024 at 12:38):

Trust the FRO. Trust Mac. The tooling will land soon.

Utensil Song (May 30 2024 at 12:40):

Yaël Dillies said:

Does Reservoir look at non-master branches of projects?

I checked the Reservoir script, no. It indexes repos, then in build time, it just clones the default branch.

Checking tags would also be time consuming, and easily exceeds Github API limits, so it might not scale.

Utensil Song (May 30 2024 at 12:41):

Martin Dvořák said:

Should I try picking a specific commit so that it builds?

Yes, you can pick it according to the index, see here.

Mario Carneiro (May 30 2024 at 12:42):

Shreyas Srinivas said:

Shreyas Srinivas said:

Though I don't understand that warning

What is this warning about?

PS : I really like the reduction in space consumption to 2.1 GB from 4.1GB

From the code it seems like it should accept either Mathlib.Tactic or Mathlib/Tactic.lean, but the error message suggests that the latter will work better

Shreyas Srinivas (May 30 2024 at 12:42):

Tracking specific commits and then commit chains is going to multiply the maintenance nightmares over just following the main branch.

Shreyas Srinivas (May 30 2024 at 12:47):

Martin Dvořák said:

Well, in the future, I may decide to import only those projects that will "collaborate" with me.

This also means you need to import this suplean as opposed to just integration testing that Mario suggests. I hope you have plans for a cache.

Shreyas Srinivas (May 30 2024 at 12:54):

Mario Carneiro said:

From the code it seems like it should accept either Mathlib.Tactic or Mathlib/Tactic.lean, but the error message suggests that the latter will work better

Yes this works without warnings. Maybe there is a bug with handling module names. The new storage consumption is still 2.1 GB which is a nice 50% reduction in size.

Arthur Paulino (May 30 2024 at 12:55):

Just an idea, ignore if useless: what if you reduce this problem to branch management within the mathlib repository itself by cherry-picking commits and unifying them in a single/non-conflicting branch? This way you make use of the whole infra that's already set for mathlib

I mean, this is common practice in software development. You pick enough updates you want to merge on main next. This alternate branch is often called dev

Arthur Paulino (May 30 2024 at 12:59):

The strategy is that dev contains relevant work, but not quite ready to be merged into main right away. But it's available for people to consciously use it if intended

Utensil Song (May 30 2024 at 13:09):

Just wrote a quick script to see if the repos on reservior build on recent toolchains:

image.png

Utensil Song (May 30 2024 at 13:13):

I would say this is better than how I imagined.

Shreyas Srinivas (May 30 2024 at 13:15):

Kim Morrison said:

(I am very keen to namespace all of Mathlib, and I think this would be a prerequisite for something like this working at all.)

I hope this message doesn't get lost in the thread. This is worth a separate discussion. IMO namespacing mathlib is not avoidable if you want to interoperate with other projects.

Ruben Van de Velde (May 30 2024 at 14:02):

Martin Dvořák said:

Kim Morrison said:

I think a great idea would be for someone to volunteer to maintain a "SupLean" repository, which aims to require/import everything on Reservoir meeting some quality/popularity threshold, and reports back to project maintainers when they have made choices that make it impossible to import everything simultaneously. Then either users or a web service could run exact? in this "import everything" file.

First experiment:
https://github.com/madvorak/sup-lean

I tried to import these two projects at the same time:
https://github.com/AlexKontorovich/PrimeNumberTheoremAnd
https://github.com/leanprover-community/flt-regular

Build failed. It seems that the latter project requires Fin.castIso which was last week (re)moved in Mathlib.
Should I try picking a specific commit so that it builds?

try again

Patrick Massot (May 30 2024 at 14:31):

I think the most important goal here should be that nobody spends time on this idea except Martin if they really want to.

Jireh Loreaux (May 30 2024 at 15:03):

Martin, I think what would actually be much more useful to you than SupLean + exact? is for all (most?) reservoir projects to build a loogle cache, and then have a loogle search bar on the reservoir page that can search all packages. I think @Joachim Breitner may have already mentioned this above. It seems very useful, but probably will take some time to engineer.

Alex Meiburg (May 30 2024 at 15:24):

Patrick Massot said:

This could easily be part of reservoir.

Siddhartha Gadgil said:

What is Mathlib++ supposed to add over a collection of separate repositories with proper dependencies?

Echoing these -- I'm having trouble imagining what purpose Mathlib++ would serve that isn't served by reservoir.

If you have need of a fact that is "out of scope" for mathlib, (because it's some esoteric or highly specialized theory) then that could be it's own repo. If you want a better way of finding lemmas, that's best served by a specialized search tool or wiki. Right?

Mac Malone (May 30 2024 at 17:09):

Shreyas Srinivas said:

what are the version constraints for lean-toolchain to allow lake pack/unpack to work? Are they similar to Mathlib cache (viz. Project toolchain === Mathlib toolchain) Mac Malone ?

What do you mean? pack/unpack just compress/uncompress a set of build artifacts. There are no compatibility concerns I can think of.

Mac Malone (May 30 2024 at 17:11):

That is, right now, it is just caling tar. The goal with these commands is to provide a standardized interface for this action that can eventually be upgraded / customized.

llllvvuu (May 30 2024 at 18:06):

This reminds me of: https://uncenter.dev/posts/npm-install-everything/

Kim Morrison (May 30 2024 at 21:55):

Mario Carneiro said:

dunno if that's on the lean-action roadmap cc: Kim Morrison

Yes, I think we'd like it to be. There is some technical question about combining cron actions into the existing lean-action, but I might leave that to @Austin Letson to comment on what they think is the best approach.

Kim Morrison (May 30 2024 at 21:56):

Eric Wieser said:

Encouraging downstream projects to bump via a mathlib tagged commit would probably go a long way

Note that both Batteries and Mathlib have regularly generated tags each time the move to a new RC or release toolchain.

Kim Morrison (May 30 2024 at 21:56):

They both also have nightly-testing-YYYY-MM-DD tags for every day that has a CI-grean nightly-testing branch.

Kim Morrison (May 30 2024 at 21:57):

There would be no obstacle to also adding master-YYYY-MM-DD tags daily if someone wanted finer granularity "versions".

Eric Wieser (May 30 2024 at 22:27):

Kim Morrison said:

Eric Wieser said:

Encouraging downstream projects to bump via a mathlib tagged commit would probably go a long way

Note that both Batteries and Mathlib have regularly generated tags each time the move to a new RC or release toolchain.

I should have been clearer; I meant encouraging projects downstream of mathlib / batteries to follow suit, perhaps with a bot

Eric Wieser (May 30 2024 at 22:28):

If lake upgrade could upgrade via every intermediate set of dependency tags, creating tags along the way, that would be handy (but probably our of scope, and certainly low priority)

Dean Young (May 31 2024 at 00:27):

Right now these redirect to linearlibrary.net, where I've set up a jQuery terminal to connect to my VPS. I think it would be cool if these terminals got to be connected to the gits somehow.

Utensil Song (May 31 2024 at 03:27):

Utensil Song said:

Yaël Dillies said:

Does Reservoir look at non-master branches of projects?

I checked the Reservoir script, no. It indexes repos, then in build time, it just clones the default branch.

Checking tags would also be time consuming, and easily exceeds Github API limits, so it might not scale.

Actually my comment was wrong, the Reservoir script does a full clone of the repo, so in principle it can check a tag matching the toolchain version, it's just a 1+1 operation, default+tag (and maybe fallback to earlier versions).

The real question is, in other package ecosystems, downstream packages are tagged with their own version, not some toolchain version. How does these 2 versions interact?

Now Lean is kind of moving faster than most Lean projects' release schedule (if there are any :joy: ), so it's not a now-issue. But what if in future, a Lean project releases a few versions when Lean has only one release in the meanwhile, how does that work? (Maybe it's discussed before?)

Utensil Song (May 31 2024 at 03:46):

Utensil Song said:

I consider it exactly a thought experiment of the integration test of a Sup of Lean packages, and I'm optimistic that in near future, more project will be able to agree on a few recent toolchains, no need to be the latest.

I have some more thoughts on this integration test idea. SupLean is an import integration test, it measures how many projects can work together in the same toolchain, and they can actually work together coherently without name/notation clashes etc.

I'm well aware that now Lean package ecosystem is nowhere near this, and it's not a priority now, but still it's an interesting (a bit hypothetical) metric.

Another observation is that there are many types of Lean projects, teaching repos, homework repos, example repos, general programming utility repos, formalization repos etc. , not all types of them should be working together, but maybe for at least the last 2 categories.

Andrés Goens (May 31 2024 at 11:37):

Jireh Loreaux said:

Martin, I think what would actually be much more useful to you than SupLean + exact? is for all (most?) reservoir projects to build a loogle cache, and then have a loogle search bar on the reservoir page that can search all packages. I think Joachim Breitner may have already mentioned this above. It seems very useful, but probably will take some time to engineer.

Echoing this (and also disergarding the question of who would do that), perhaps even more useful would be to have a version of Loogle+ in the same way that hoogle+ does for Haskell's hoogle. It kind of iterates on the request of matching exact? that @Martin Dvořák is asking. Probably some of those sepecial-case lemmas could be found by a sort of Loogle+ (and would of course synergyize even better with a more comprehensive loogle cache for all of reservoir)

Martin Dvořák (Jun 01 2024 at 08:42):

Yaël Dillies said:

It was however remarked that this was not much more than "Reservoir, but in one repo"

Well, not quite.

Reservoir currently doesn't have a mechanism that would allow an authority to step in and give you a stamp saying "this code indeed corresponds to the informal notions it claims to represent".

Yaël Dillies (Jun 01 2024 at 09:06):

Uh sure, but are you claiming to become that authority?

Shreyas Srinivas (Jun 01 2024 at 09:15):

Actually reservior does do that. It checks if your library builds against the corresponding toolchain.

Yaël Dillies (Jun 01 2024 at 09:15):

Shreyas, no it doesn't. Read Martin's message again.

Shreyas Srinivas (Jun 01 2024 at 09:17):

Oh, you mean it doesn't check for whether the specification is what it claims to be for any given def or theorem in a library.

Shreyas Srinivas (Jun 01 2024 at 09:18):

That can only be done by people reviewing and giving it their stamp of authority. This is a huge amount of responsibility to take on.

Martin Dvořák (Jun 01 2024 at 09:22):

Yaël Dillies said:

Uh sure, but are you claiming to become that authority?

No, I definitely don't want to play this role.

However, you were making a comparison with Isabelle AFP, and they do have something like that.

Martin Dvořák (Jun 01 2024 at 09:28):

There are good reasons why I don't want to do it (and that's why I wanted to start Mathlib++, not LeanAFP).

Ideally, I would like to disconnect myself from informal MathematiCS completely.
Ideally, I would like Mathlib to have the unquestionable monopoly on definitions.
Ideally, I would like my job to be "here is a statement (in a file that imports only Mathlib); prove it using any means necessary; once your proof does not use sorry and does not assume nonstandard axioms, you automatically get a publication".

Ruben Van de Velde (Jun 01 2024 at 13:04):

That's a completely different discussion from the more achievable goals you were proposing before, though

Frédéric Dupuis (Jun 01 2024 at 13:38):

I don't think this is something we should aim for. Publications are not zero-knowledge proofs: the goal is to explain the result to a human in a way that provides as much insight as a possible into the subject, not merely to inform the reader that some theorem is true. Formalization can help by relieving the human from having to spend too much time on unimportant details while remaining confident about the correctness of the result, but it definitely doesn't make papers obsolete!

Dean Young (Jun 01 2024 at 14:58):

The terminal at mathlibplusplus.com can be set up with a password system and the usual features of a Linux terminal interface. The one at linearlibrary.net was my thought as to a cool front end to... anything really. For example, the password system on the VPS could be made so that anyone with an account can accept a pull request from the other end. In that case, it could be the responsibility of the user to ensure the addition not disturb existing work.

Dean Young (Jun 01 2024 at 15:12):

I'm not an expert but some of these problems could suggest a dedicated server rather than a repository alone.

Martin Dvořák (Jun 01 2024 at 15:28):

[I thought I was replying to a PM.]

llllvvuu (Jun 01 2024 at 21:48):

Martin Dvořák said:

Yaël Dillies said:

It was however remarked that this was not much more than "Reservoir, but in one repo"

Well, not quite.

Reservoir currently doesn't have a mechanism that would allow an authority to step in and give you a stamp saying "this code indeed corresponds to the informal notions it claims to represent".

Legitimacy is actually an argument against centralization. It is hard to imagine such a centralized authority at scale. Also, theorems aren't trusted code (only defs are) so it doesn't matter if theorems come from arbitrary sources.

Shreyas Srinivas (Jun 01 2024 at 21:49):

defs aren't trusted either. Proofs are trusted given theorems and defs as they are.

llllvvuu (Jun 01 2024 at 21:59):

What Martin seems to be worried about is bad defs that don't "corresponds to the informal notions it claims to represent". But I agree that it doesn't really seem like an issue as anyone can view the defs for themselves (and there are many trustworthy people in the potential Reservoir community so it seems like a waste to foist the role trustworthiness upon one authority as Martin suggests)

llllvvuu (Jun 01 2024 at 22:03):

I'd also argue that it is impossible for an authority to fully certify that "this code indeed corresponds to the informal notions it claims to represent". I would argue that that is, to some degree, in the eye of the beholder.

Last updated: May 02 2025 at 03:31 UTC