Zulip Chat Archive

Stream: mathlib4

Topic: Grammars downstream of Mathlib

Martin Dvořák (Mar 12 2025 at 13:56):

Recently, there has been a lot of PRs about context-free languages that Mathlib maintainers didn't have time for. Off the top of my head:
@Alex Loitzl 🐓
@Tristan Figueroa-Reid
@Rudy Peterson
@Stefan Hetzl
@Tobias Leichtfried

One of us should start a new repository downstream of Mathlib where we will focus all grammar-related endeavors.
Who wants to create and maintain it?

Tristan Figueroa-Reid (Mar 12 2025 at 17:13):

Unfortunately, I don't know that much about computability and complexity to be a maintainer.

Shreyas Srinivas (Mar 12 2025 at 17:20):

Happy to help in any way I can

Shreyas Srinivas (Mar 12 2025 at 17:21):

But you probably want a repo under leanprover-community?

Tristan Figueroa-Reid (Mar 12 2025 at 18:10):

Also, it might be better to move all of computability and complexity to its own repository.

Eric Wieser (Mar 12 2025 at 18:47):

there has been a lot of PRs about context-free languages that Mathlib maintainers didn't have time for

A new repository isn't a bad option, but another good option is for people contributing in this area to start reviewing each other's PRs!

Martin Dvořák (Mar 12 2025 at 18:59):

another good option is for people contributing in this area to start reviewing each other's PRs!

I afaik reviewed all of them but nobody merged them.

Martin Dvořák (Mar 13 2025 at 07:34):

My impression is that Mathlib maintainers don't find it sufficient when we review each other PRs. They want to review our PRs themselves and they say they don't have people for that.

Tristan Figueroa-Reid (Mar 13 2025 at 07:51):

Martin Dvořák said:

My impression is that Mathlib maintainers don't find it sufficient when we review each other PRs. They want to review our PRs themselves and they say they don't have people for that.

It's a shame that it's quite hard to reach the same level of code quality that the mathlib4 maintainers work at, but I feel like your impression raises some more issues about the state of mathlib4 rather than computability specifically. The other easy (nondichotomous) choice is that mathlib4 lowers the bar for code quality, which doesn't seem like a good idea.

Ruben Van de Velde (Mar 13 2025 at 07:54):

The current policy is that a maintainer needs to review everything that goes into mathlib, yeah, but that's certainly easier (though not necessarily easy) if someone else who knows the mathematics has already given a thumbs-up

Martin Dvořák (Mar 13 2025 at 08:15):

What makes me sad the most is that @Alex Loitzl 🐓 's result
https://github.com/AlexLoitzl/pumping_cfg/blob/cc4c1f157919f94c2c4cdbaabf50b60abba2822a/PumpingCfg/Pumping.lean#L319
cannot find its way to Mathlib.

If it were up to me, I would make all those 5000 or so lines that lead to it private and expose only the final result, then ask Mathlib reviewers to review the final results only, and merge it. Unfortunately, it will not fly with the Mathlib policies.

Martin Dvořák (Mar 13 2025 at 08:18):

We should absolutely strive for Language.IsContextFree.pumping being in Mathlib. It is really sad that Mathlib reviewers don't let it pass.

Eric Wieser (Mar 13 2025 at 08:21):

If it were up to me, I would make all those 5000 or so lines that lead to it private and expose only the final result, then ask Mathlib reviewers to review the final results only,

5000 lines of unreviewed code in mathlib is a time bomb waiting to blow up in Kim's face when they update mathlib to the next Lean version, or when anyone changes how a tactic works, adds a new simp lemma, etc

Johan Commelin (Mar 13 2025 at 08:22):

I see ~ 600 lines leading up to it... Am I missing some dependency chain?

Eric Wieser (Mar 13 2025 at 08:23):

(which PR are we even talking about here; a thread without a PR link is much less likely to attract reviews)

Tristan Figueroa-Reid (Mar 13 2025 at 08:25):

The original PR that was the cherry on top for this discussion was https://github.com/leanprover-community/mathlib4/pull/21585, but the linked repository does not have a full associated PR.

Martin Dvořák (Mar 13 2025 at 08:26):

Johan Commelin said:

I see ~ 600 lines leading up to it... Am I missing some dependency chain?

Yes, it depends on #19943 being merged first.

Alex Loitzl 🐓 (Mar 13 2025 at 08:28):

Which might not be totally out of the picture?
I am hoping to at least wrap up the PR in the coming weeks, when I have some more time. (Whatever the final decision may be)

Johan Commelin (Mar 13 2025 at 08:29):

Martin Dvořák said:

Yes, it depends on #19943 being merged first.

Understood. Thanks for the pointer.

Stefan Hetzl (Mar 13 2025 at 08:31):

I understand that unreviewed code should not go into mathlib in order not to become a burder later on. On the other hand, I have a similar sentiment as @Martin Dvořák concerning the formalisation of the equivalence between context-free grammars and pushdown automata by my student @Tobias Leichtfried .

Stefan Hetzl (Mar 13 2025 at 08:32):

I am a bit of an outsider to formal maths and ITP, but I think that the policy of having one monolithic block of formal mathematics (as Lean does with mathlib) makes a lot of sense and makes Lean stand apart from most, if not all, other ITPs. I am not sure which vision for the entire field is behind this policy, but when the vision is that, at some point, mathematicians are supposed to submit a paper with partial or full formalisation to journals, then I think there is no way around making this monolithic block as comprehensive as possible. Whether that should be done in one repository or in several ones, and what the maintenance structure should look like is to decide for people who are much more familiar with the technical matters than I am.

Stefan Hetzl (Mar 13 2025 at 08:32):

In any case, to outsiders, the perspective of contributing something that will in the end receive formal approval as part of that monolithic block, whatever this may mean in technical terms, is very attractive and a big part of the motivation for formalising something.

Yaël Dillies (Mar 13 2025 at 08:54):

To be quite clear, I am happy to review the PRs on computability people send my way, but I am not a maintainer

Yaël Dillies (Mar 13 2025 at 08:55):

(and when I do review them, please implement the feedback!)

Stefan Hetzl (Mar 13 2025 at 08:59):

@Yaël Dillies Thank you! Could you have a look at PR #22159? It adds the definition of pushdown automata (which is the first step for the result about the equivalence of context-free grammars and pushdown automata).

Patrick Massot (Mar 13 2025 at 09:22):

I think all this grammar stuff is the perfect example of something that should happily live outside of Mathlib, even more that combinatorial game theory.

Yaël Dillies (Mar 13 2025 at 12:33):

My point of view is and has been that reviewing this content is useful regardless of whether it ends up in mathlib or a submathlib or something else yet again

Peter Nelson (Mar 13 2025 at 22:42):

Patrick Massot said:

I think all this grammar stuff is the perfect example of something that should happily live outside of Mathlib, even more that combinatorial game theory.

Are there guidelines for how the maintainers decide something like this? I am very happy that matroids didn’t end up on the wrong side of such a judgment call - I would very likely have given up if they had, knowing that everything rots the day I stop paying attention.

Kim Morrison (Mar 13 2025 at 23:17):

We're in a sort of transitional period at the moment, where we would all like to get started on "child of Mathlib" repositories, but we don't have a successful example yet to follow.

To summarize, here's what I have in mind:

Such a repository would live in the leanprover-community github organization, and would be named mathlib-XXX for some XXX.
"Politically" it could have a separate set of maintainers (e.g. with their own private zulip stream for coordination), although it is very likely that at first this set would overlap with the mathlib maintainers. Further, the "mathlib admin" team would have final jurisdiction in the event of a serious problem.
mathlib-XXX repositories would maintain the same technical standards as Mathlib:
- no sorry on the main branch
- all Mathlib linters (indeed, all of Mathlib CI) run all the time
- note this step requires some engineering work --- it would be a disaster to copy-paste Mathlib's github workflows, as they would rot fast. We need to factor out as much Mathlib CI as possible into Github actions and reusable scripts
mathlib-XXX repositories would regularly update to the latest Mathlib --- in particular this requires a commitment from the maintainers to undertake or coordinate this work. They would bump to the latest toolchain monthly, like Mathlib does
the queueboard should support all these repositories too

Kim Morrison (Mar 13 2025 at 23:17):

Now all this sounds like a lot of work, and I appreciate that currently it is an obstacle to launching such mathlib-XXX repositories! But it's important to remember that all this work already has to happen at Mathlib, and the reason Mathlib is having to throttle growth in some directions is that we're having trouble keeping up with all this.

Kim Morrison (Mar 13 2025 at 23:17):

There is a lot of engineering work to make this work nicely (abstracting common CI components, automation to bump downstream repositories and notify maintainers about problems requiring manual intervention, the queueboard, automated assistance for identifying and moving material upstream)... Some of this will be done by the enthusiastic volunteers who've made all the CI and automation around Mathlib possible, some will be done by FRO employees, and we are working really hard to source money for paid engineering time on these problems.

Kim Morrison (Mar 13 2025 at 23:17):

I think there are already great initial steps. Projects like Carleson and FLT are helping build the infrastructure, but these are not mathlib-XXX projects --- you can tell that because they have sorry on their main branch. Likely candidates for mathlib-XXX projects are Yael and Bhavik's additive combinatorics project, the nascent combinatorial game theory project, and the whole languages/grammars/automata project.

Kim Morrison (Mar 13 2025 at 23:17):

But it's really important that the people running these projects understand what they are signing up to --- it's really not possible to run a durable project where you allow in "5000 lines of private code", because the maintainers are signing up to managing this forever, as Lean changes, and as Mathlib changes. Obviously a huge fraction of the maintenance work does not actually fall on the maintainers because other contributors do so much, but they have to be ready to backstop this work, and to set standards so it is possible. Hopefully this gives some context for while Mathlib has been having to throttle expansion into new areas. It is a massive commitment to take in new code!

Yaël Dillies (Mar 14 2025 at 00:47):

I am hoping to have successfully separated add-combi from LeanAPAP and set CI up for it within the next few months. As Kim says, this is a lot of work!

Yaël Dillies (Mar 14 2025 at 00:49):

For context, I am currently managing 11 Lean projects of various sizes and forms, and it is a HUGE pain that workflows aren't uniformised. I have a strong incentive to improve on the status quo

Joseph Myers (Mar 14 2025 at 00:50):

Also:

mathlib-XXX should be included in the online API documentation for mathlib, so people can search and browse the API for all the repositories together.
mathlib-XXX should update frequently (say daily or more frequently) to the same mathlib version so that an external project can use multiple mathlib-XXX repositories. Each one updating daily to a slightly different mathlib version from that day would be a lot less helpful to external users.

Joseph Myers (Mar 14 2025 at 00:58):

(Slight caveat on the frequent updating: if there's one common mathlib version each day that gets a matching commit in every mathlib-XXX, it should be OK for some mathlib-XXX to have additional updates during the day - in particular, if a lemma of use to that mathlib-XXX has just been merged to the main mathlib4 repository and people want to put something depending on that lemma in that mathlib-XXX repository immediately without waiting for the next nightly update. That might need extra smarts in lake update to be able to automatically pick compatible versions of all mathlib-XXX dependencies, however.)

Tristan Figueroa-Reid (Mar 14 2025 at 01:49):

I think that having a separate repository is a good idea, but I am not in favor of having a bunch of different repositories over different accounts/a bunch of repositories under mathlib4. From the limited experience I have in this community, it seems that what these projects are looking for is a place to store lower quality code that isn't exactly to par with mathlib4 but still at least uses reasonable names, can afford minor breaking changes, publishes to lake automatically, and other conveniences.

It doesn't seem too wise to have a bunch of duplicated infrastructure spread about potentially dozens of separate repositories (as we would have to develop another separate piece of infrastructure to roll out these changes) when this code could be kept in mathlib4, or (and perhaps a better idea), some kind of mathlib4-extras, where both core mathlib4 maintainers and maintainers that don't have the code quality standards of mathlib4 (but still have a good understanding of what is 'good' code) can build projects that aren't as crucial to the building blocks of commonly-used mathematics.

The main motivation I see for a mathlib4-extras is to still establish a good form of community between these projects: for example, one of combinatorial game theory's few applications is in its numerous algorithms that can be classified into various complexity classes, so a well-developed complexity library might become a dependency of combinatorial game theory: it is much harder to collaborate if they are across different repositories (as a case study, even mathlib4 takes a while to upstream certain theorems to batteries)

Kim Morrison (Mar 14 2025 at 04:54):

@Tristan Figueroa-Reid, I disagree with a few things above.

Anyone looking for a place to store lower quality code should work outside of mathlib-XXX until such a time as they are ready to contribute to the "durable" ecosystem (or to blueprint projects). What I'm proposing is not for that. (So in particular if the contributors to a candidate mathlib-XXX do want a place to store lower quality code, it would be good to know that so we find a better solution for them.) Lower quality code is simply a maintenance impossibility, and I don't think the mathlib ecosystem should take that trade-off at all.

The point is that it is no longer viable to put everything in mathlib, because there is only so much that the maintainer group can take responsibility for, and simply "adding more maintainers" becomes impossible after some point. We need a more distributed system to enable growth.

But you're absolutely right that it's not wise to duplicate infrastructure --- this is why we want to refactor Mathlib CI into reusable components, so hopefully setting up all the CI for a mathlib-XXX is a matter of turning on a few Github actions (which will update automatically as the CI evolves). I think we'll have some Lean FRO employee time to help with this next quarter, and as Yael says above, they and others are keen to work on this too!

Tristan Figueroa-Reid (Mar 14 2025 at 05:00):

(That was on me for the wording, but I do not mean lower quality as something impossible to maintain - perhaps a better upper example would be the lowered standard of code quality that is given to IMO formalizations, where proofs are allowed to be denser than usual)

The motivation above completely makes sense to keep these out of mathlib4, though I am struggling to see any benefit for the mathlib-xxx repositories being all separate when they will all be owned by the same leanprover-community organization anyways.

Tristan Figueroa-Reid (Mar 14 2025 at 05:04):

(e.g. DefinitelyTyped, a 3m sLoC monorepo with dozens of large projects and thousands of smaller projects manages to have similar code quality across the board, and doesn't need to deal with syncing up infra across different repositories). While mathlib4 and the mathlib-xxx might have different levels of code quality, I doubt that it would be desirable that the mathlib-xxx repositories have varying levels of code quality *when compared against each other, so it seems better for them to be combined into some extras repository.

Tristan Figueroa-Reid (Mar 14 2025 at 05:30):

Generally, I do see the motivation for this if mathlib-xxx repositories were owned by other users. I agree that this infrastructure should be developed for external Lean projects that aren't in leanprover-community. However, I suspect that the move of projects that are too hard to maintain in mathlib4 and fragmenting them to mathlib4-xxx repositories anticipates the limits of monorepos, when there are (albeit enterprise) open source projects like gecko or blink that surpass +50m sLoC, or non-enterprise repositories with hundreds of maintainers like nixpkgs.

Johan Commelin (Mar 14 2025 at 07:01):

Those are good observations. We definitely need to explore ways in which the ecosystem can become more distributed. One thing I'm a bit worried about is if we have mathlib4-xyz and mathlib4-abc, and they both have some more advanced theory that wants to import more basic files from the other.

Ruben Van de Velde (Mar 14 2025 at 08:21):

Also, I guess we'll want reverse CI (testing mathlib PRs against mathlib-***)

Tristan Figueroa-Reid (Mar 14 2025 at 08:23):

Ruben Van de Velde said:

Also, I guess we'll want reverse CI (testing mathlib PRs against mathlib-***)

We shouldn't need to do this as long as the mathlib-*** repo(s) are not doing anything blatantly against the style guide like using non-terminal simps.

Ruben Van de Velde (Mar 14 2025 at 08:26):

In my experience bumping downstream repositories, that's overly optimistic

Tristan Figueroa-Reid (Mar 14 2025 at 08:35):

(Thanks for that notice! I don't have much experience with downstream repositories as I am a new contributor, but I have seen that complaint in Zulip quite a few times)

My original thinking was that the repositories mathlib4, batteries, and a mathlib4-extras that covers all of our potential mathlib4-xxx repos would be a good balance of maintainability and infrastructure deduplication. I think that having "reverse CI" under that system is do-able, but I am worried about the magnitude at which there will be PRs where:

Author makes a PR to mathlib4 and the "reverse CI" fails.
The author makes a temporary patch for the mathlib4 PR to make the reverse CI work, and this PR gets merged
A PR is merged to mathlib4-extras (this would get worse if we had various mathlib-xxx repositories) that fixes what was blocking the original mathlib4 PR
A new mathlib4 PR gets made that finally does the intended behavior.

Without the CI this would only get worse, where we would have randomly failing builds if mathlib-xxx automatically updates, but with it, I could easily imagine this annoying cycle of PRs.

Tristan Figueroa-Reid (Mar 14 2025 at 08:38):

This is a good reason to have the potential mathlib4-xxx repositories in mathlib4, but that would require some adjoint maintainer system where the current list of maintainers maintains Mathlib and we have a secondary list of maintainers that maintain the extra side projects in some other subfolder.

Mario Carneiro (Mar 14 2025 at 09:24):

Johan Commelin said:

Those are good observations. We definitely need to explore ways in which the ecosystem can become more distributed. One thing I'm a bit worried about is if we have mathlib4-xyz and mathlib4-abc, and they both have some more advanced theory that wants to import more basic files from the other.

In metamath this situation also arises (because there is a central portion and then "mathboxes" managed by individuals), and the solution is quite simple: if something is needed by more than one mathbox, it needs to be upstreamed to the central portion, here mathlib. The fact that there is this cross cutting requirement is exactly the justification one needs (generally) to upstream something to mathlib.

Tristan Figueroa-Reid (Mar 14 2025 at 09:29):

Indeed. (I believe) The main suggestion here is to have extended mathlib modules where they can be maintained by other people who don't fit the requirements to be a core mathlib4 maintainer. The other 'fix' is to relax the maintainer requirements / the bar for code quality, but this core idea is to hit a balance between code quality and maintainability, where the core mathlib4 can have its high code quality, while the extended mathlib4 modules have high enough quality code to be depended on, but not high that it excludes potential maintainers.

Tristan Figueroa-Reid (Mar 14 2025 at 09:32):

The increasing number of PRs and the variety of fields with not enough qualified maintainers was what drove the idea to have these extra mathlib modules. The other main question is where to put this: after Ruben's comment, I'm leaning to having these mathlib4 extensions be subfolders in the mathlib4 repository, but outside of Mathlib/ (like Archive/). Though, it seems reasonable to have a single repository for this too. I did explain this earlier, but I should clarify that I am in opposition of having several leanprover-community/mathlib-xxx repositories and at the very least, I want these mathlib4 extensions in one repo.

Ruben Van de Velde (Mar 14 2025 at 09:43):

I guess an alternative solution would be to keep everything inside the mathlib repo, and assign "module maintainers" who can review changes that stay within particular modules

Tristan Figueroa-Reid (Mar 14 2025 at 09:59):

It would be nice if these non-core modules were separate libraries on reservoir, though. I am in favor of keeping it inside the mathlib repo but I do see motivation for them to be separately published libraries at least.

Eric Wieser (Mar 14 2025 at 10:12):

Can I elaborate on what that motivation is? Why does the granularity on reservoir matter?

Shreyas Srinivas (Mar 14 2025 at 13:00):

Eric Wieser said:

Can I elaborate on what that motivation is? Why does the granularity on reservoir matter?

Granular cache I guess (assuming caches exist for all reservoir packages in the future).

Eric Wieser (Mar 14 2025 at 16:37):

For now cache is granular per file anyway

Eric Wieser (Mar 14 2025 at 16:37):

Just the tools to download things at that granularity need some work

Tristan Figueroa-Reid (Mar 14 2025 at 18:03):

I was originally thinking of doing that separate reservoir packaging to establish a clear boundary between Mathlib and these mathlib extras, but we absolutely could just have, as Ruben suggested:

The current core maintainers that can review and merge any PR
Code owners for certain modules, perhaps with notices that these modules are allowed to be merged by X person/people.

Tristan Figueroa-Reid (Mar 14 2025 at 19:11):

Johan Commelin said:

Those are good observations. We definitely need to explore ways in which the ecosystem can become more distributed. One thing I'm a bit worried about is if we have mathlib4-xyz and mathlib4-abc, and they both have some more advanced theory that wants to import more basic files from the other.

I think that it is good to have supporting infrastructure for external projects outside of leanprover-community, but @Mario Carneiro did put up a good point that when code needs to be shared, the best place that should go in some kind of mathlib.

For example, in vihdzp/combinatorial-games, there is currently a transfer of ownership of NatOrdinal from mathlib4 to combinatorial-games, since, even though NatOrdinal does have other uses, it is most closely related to combinatorial games. (to clarify, I am not against this move, but I wanted to bring it up for this point) - without adding more core maintainers than necessary, it seems that it would be a lot easier for core maintainers to review NatOrdinal changes if they saw the exact motivation in combinatorial game theory.

With the dependent PR system, this is made a lot easier, too:

For some mathlib module X that isn't covered by a primary maintainer of mathlib itself, a potential contributor to X can make some PR to mathlib, and make a new PR contributing to X that depends on the previous PR. A core maintainer can review the first PR, and a maintainer of X can review the second PR. This workflow is much easier to accomplish within a single repository.

Tristan Figueroa-Reid (Mar 14 2025 at 19:11):

It seems like a good idea to at least try this code ownership system: it seems a lot easier than trying to make the infrastructure for a polyrepo, and we're already seeing the conflicts of code ownership in a repository that isn't even in leanprover-community yet.

Yaël Dillies (Mar 14 2025 at 19:12):

What do you mean by "code ownership" precisely? We previously tried github's CODEOWNER file, and it was not so good

Tristan Figueroa-Reid (Mar 14 2025 at 19:14):

Yaël Dillies said:

What do you mean by "code ownership" precisely? We previously tried github's CODEOWNER file, and it was not so good

Better infrastructure would be needed for that than GitHub's (unfortunately weak) code owner system. I don't have a good example of a system that works well for this, as I am only aware of the other end of the contributor spectrum, where there are thousands of modules, each with its own set of maintainers.

Yaël Dillies (Mar 14 2025 at 19:17):

We do have our own fork of bors now, so we could experiment with it

Mario Carneiro (Mar 14 2025 at 19:17):

I've always wanted to, but at the time I was messing with it cache was on fire so I had other things to do

Mario Carneiro (Mar 14 2025 at 19:19):

it's also kind of annoying to work with because there is a bunch of heroku stuff involved in deploying it

Mario Carneiro (Mar 14 2025 at 19:19):

you can run it locally though, it's an elixir program

Tristan Figueroa-Reid (Mar 14 2025 at 19:21):

A simple system would be that:

The core 20-ish maintainers still have the same permissions as they have.
We have a file that contains metadata about which non-core maintainers can merge PRs targeting specific folders. These folders will only be in this list if no core maintainer is willing to maintain them (complexity, combinatorial game theory, etc.). If a folder isn't in that list, that implies that it's maintained by said core maintainers.
Those non-core maintainers can then merge those targeted PRs.

Tristan Figueroa-Reid (Mar 14 2025 at 19:22):

I think this would work quite well with the dependent PRs system, and it still preserves the benefits of a monorepo (codebase-wide golfing, shared infra, etc...)

Mario Carneiro (Mar 14 2025 at 19:22):

I would not have the "only if no core maintainer is willing" clause, I think we can just have topic maintainers in addition to core maintainers

Mario Carneiro (Mar 14 2025 at 19:22):

and topic maintainers would only have permission to merge to certain files/folders

Violeta Hernández (Mar 14 2025 at 19:39):

My own two cents, I do prefer having separate repositories. Mathlib is huge, and even downloading the cache takes me a couple of minutes. In that same time, I can build the entire CGT repo on my machine.

Mario Carneiro (Mar 14 2025 at 19:40):

I argue that this is an issue with cache (or rather, lake), but fixing it never seems to make it to the top of the issue list

Tristan Figueroa-Reid (Mar 14 2025 at 19:40):

Violeta Hernández said:

My own two cents, I do prefer having separate repositories. Mathlib is huge, and even downloading the cache takes me a couple of minutes. In that same time, I can build the entire CGT repo on my machine.

The main argument here is to try to get the best of both worlds, where there are modules where non-core contributors can merge pull requests instead of having to wait on core maintainers to do so.

Tristan Figueroa-Reid (Mar 14 2025 at 19:42):

The original suggestion was to have a polyrepo of mathlib-xxx (e.g. mathlib-cgt), but it was suggested earlier that it would be easier to have some code ownership system inside Mathlib where a similar process could take place. The only place where this would cause a slowdown in development is when certain theorems need to be upstreamed to *parts of Mathlib maintained by core maintainers, but I think that if core maintainers need to worry about less PRs, that issue should resolve itself.

Violeta Hernández (Mar 14 2025 at 19:43):

Perhaps tangentially, I feel that whenever I try and downstream results to Mathlib I often get met with some sort of pushback, usually in the form of "this is unnecessary". I had this happen recently with #22617, where two small auxiliary lemmas that have been convenient to me in the CGT repo got struck down because you could replicate them with a somewhat unintuitive use of mt.

Tristan Figueroa-Reid (Mar 14 2025 at 19:47):

That is even more reason to keep this in Mathlib! I'm unopinionated about the PR you linked, but for PRs like that, it would motivate the contributors to merge if they saw another PR that was clearly beneficial while depending on the original PR.

Tristan Figueroa-Reid (Mar 14 2025 at 19:48):

For example, changes to NatOrdinal seem quite strange out of the context of combinatorial game theory, but having two PRs: one that makes changes to NatOrdinal, and one that uses those changes to golf / prove new parts of combinatorial game theory gives much more motivation for a maintainer to believe that your changes are useful.

Tristan Figueroa-Reid (Mar 14 2025 at 19:49):

Plus, the early conflict helps! I doubt this would be the case for many parts of combinatorial game theory, but if a file from CGT or any other mathlib-xxx project were to be added to mathlib4, it would make more sense to have those conflicts earlier on than when the files are eventually moved.

Violeta Hernández (Mar 14 2025 at 19:56):

I'm not sure if much would change from the current status quo? It's not like I haven't explained "this PR is used within CGT because of this", and I'm not confident that having CGT as an enshrined part of Mathlib would change the considerations of other maintainers.

Violeta Hernández (Mar 14 2025 at 19:58):

What I'm trying to get at is, the nice thing about having separate repos is that we can parallelize using new material in the new repo, and slowly porting/adapting that material for Mathlib.

Violeta Hernández (Mar 14 2025 at 20:00):

I think docs#IncompRel is a good example of this. It's nominally useful for more than just game theory, but it took months to be approved (#19580) because of discussions about notation and whether we should also have the comparability relation as well. And that was one of the blockers for doing the IGame refactor within Mathlib.

Tristan Figueroa-Reid (Mar 14 2025 at 20:12):

I do see the argument (though again, as established earlier, a monorepo would be technically better than a polyrepo while doing the same thing), but in terms of keeping everything in Mathlib itself, perhaps it might be a good idea to have some kind of experimental staging for definitions? i.e. some libraries like Node.JS use "experimental methods" that have debated on syntax, yet are still in the main deployment of the software, and can afford breaking changes.

Violeta Hernández (Mar 14 2025 at 20:30):

I think that's potentially a separate issue? Like, we did do some major refactors on the CGT repo which were long overdue on Mathlib, but at this point it should be a stable-ish codebase.

Tristan Figueroa-Reid (Mar 14 2025 at 20:45):

(Note that I do mean this for repositories that aren't just combinatorial game theory, but I can also see this happening again even in CGT) - since this happened in CGT, it's quite likely that these sorts of refactors and blocking definitions will be needed for other parts of mathlib4.

Violeta Hernández (Mar 14 2025 at 20:47):

We do already have long-stretching Mathlib refactors, even ones that don't happen at the level of leaf files.

Violeta Hernández (Mar 14 2025 at 20:49):

IMO if we want to uphold the deprecation policy, what we already currently do, slowly phasing out definitions by others and adding the required deprecation tags, is about as good as can get. Experimental staging would be a big red tape essentially prohibiting downstream usage of these files.

Violeta Hernández (Mar 14 2025 at 20:49):

And if we're not willing to uphold the deprecation policy for these Mathlib side projects then it's even more difficult to justify that they're being held at Mathlib standards...

Violeta Hernández (Mar 14 2025 at 20:52):

(if we just want to test things out, there's always the tried and true "make a branch for it")

Tristan Figueroa-Reid (Mar 14 2025 at 20:54):

Violeta Hernández said:

And if we're not willing to uphold the deprecation policy for these Mathlib side projects then it's even more difficult to justify that they're being held at Mathlib standards...

(Note that they won't be held directly at Mathlib standards, but still with some standards: the general idea is to have a monorepo where these projects can exist at a slightly lower code quality)

Violeta Hernández said:

(if we just want to test things out, there's always the tried and true "make a branch for it")

Would this work for the case above with IncompRel, where the notation was being bikeshedded?

Violeta Hernández (Mar 14 2025 at 21:04):

Tristan Figueroa-Reid said:

Violeta Hernández said:

And if we're not willing to uphold the deprecation policy for these Mathlib side projects then it's even more difficult to justify that they're being held at Mathlib standards...

(Note that they won't be held directly at Mathlib standards, but still with some standards: the general idea is to have a monorepo where these projects can exist at a slightly lower code quality)

Violeta Hernández said:

(if we just want to test things out, there's always the tried and true "make a branch for it")

Would this work for the case above with IncompRel, where the notation was being bikeshedded?

You do have a point, experimental staging on that file would allow us to get work on games done more quickly...

Violeta Hernández (Mar 14 2025 at 21:20):

I guess I'm simply skeptical due to the usual "we've never needed this before" argument

Tristan Figueroa-Reid (Mar 14 2025 at 21:57):

That makes sense. I suppose so far the suggestions are:

We move the repositories that were recently removed from Mathlib back into Mathlib
We experiment with a codeowner-like system, where we have non-core maintainers that are able to merge PRs that target only specific files and folders: it is okay if these files and folders aren't perfect.
We experiment with tagging definitions as experimental, where bikeshedding can happen when they are stabilized.

Shreyas Srinivas (Mar 14 2025 at 23:35):

An intermediate solution might be to have some staging branches from which mathlib maintainers can slowly draw material into the main branch

Tristan Figueroa-Reid (Mar 14 2025 at 23:36):

In my experience, staging branches don't work quite well since they don't play to git's strengths (quite the opposite, actually). Since we already have attributes, it doesn't seem too unreasonable to have experimental attributes. Though, this "experimental feature" topic should probably be a different thread.

Tristan Figueroa-Reid (Mar 14 2025 at 23:37):

(#mathlib4 > Experimental Attribute)

Tristan Figueroa-Reid (Mar 14 2025 at 23:39):

For the sake of this thread, then, I think it seems like a good idea to experiment with modifying bors such that we can have a fine-grained code ownership system and see how well that works with our "sub-maintainers." I believe the (absolutely non-exhaustive: I'm not aware of the state of the other proposed mathlib-xxx projects) people that would be on that list is @Violeta Hernández for combinatorial game theory, and @Martin Dvořák for computability and complexity?

Shreyas Srinivas (Mar 15 2025 at 00:24):

I think complexity should be kept separate from computability

Tristan Figueroa-Reid (Mar 15 2025 at 00:27):

Good point! I forgot about that. The original issue was that we didn't have enough domain experts on _computability_ (regarding CFGs, right?) in the core maintainer team. In this case, maybe ./Complexity wouldn't have a code owner, but files in Complexity related to Computability can have Martin Dvořák [meta: how do you ping people silently on zulip?] as a code owner.

Shreyas Srinivas (Mar 15 2025 at 00:32):

That being said, since we have almost no complexity theory (in any textbook sense at least), any discussion of complexity is moot for now. Also complexity is such a gigantic and broad area that I think it should first develop in its own repository with members from that community taking the lead.

Tristan Figueroa-Reid (Mar 15 2025 at 00:35):

A big point in this discussion is enabling the same fluidity that people working in their own repositories and putting a (restricted portion) of that freedom inside Mathlib.

Mario Carneiro (Mar 15 2025 at 00:37):

[meta: how do you ping people silently on zulip]

Like this: Martin Dvořák @_**Martin Dvořák**

Tristan Figueroa-Reid (Mar 15 2025 at 00:37):

(Also, while I am painfully aware about the magnitude of tcs complexity theory as a field, some fields are being formalized in mathlib4 that are bigger than complexity, and they can be built just as well inside Mathlib.)

Violeta Hernández (Mar 15 2025 at 01:22):

Tristan Figueroa-Reid said:

A big point in this discussion is enabling the same fluidity that people working in their own repositories and putting a (restricted portion) of that freedom inside Mathlib.

It might be that these are inherently opposite goals. Mathlib tries to have a solid foundation, whereas one usually wants to test out different ideas at the onset of a large project.

Tristan Figueroa-Reid (Mar 15 2025 at 02:29):

Violeta Hernández said:

It might be that these are inherently opposite goals. Mathlib tries to have a solid foundation, whereas one usually wants to test out different ideas at the onset of a large project.

Indeed! The original idea was to have parts of Mathlib that had slightly less code quality than the rest: they would still be "part of mathlib," but would be marked as maintained by certain non-core maintainers.

Ruben Van de Velde (Mar 15 2025 at 07:03):

Much depends on what you mean by "less code quality". Are we talking about not working in the right generality or missing API lemmas, or not following code style or naming conventions, or not following the rules to make proofs maximally robust against upstream changes?

Tristan Figueroa-Reid (Mar 15 2025 at 07:11):

The inherent problem is that this highly depends on the individual non-core maintainer. I am not strictly sure what "less code quality" would entail myself. My best frame of reference is in combinatorial-games, which does not sparingly use aesop. However, due to my inexperience, I am unaware of any underlying style issues beyond that.

(I should re-establish, in my intent to discourage the use of a polyrepo, that I am mainly discussing whether it's a good idea to keep all of the mathlib-extra projects inside Mathlib or outside of it, whether that be a single external repository or some subfolder in mathlib4 like Archive - there are no technical reasons *that I am aware of as to why we should split each mathlib-extras project into its own repository.)

Yaël Dillies (Mar 15 2025 at 07:51):

Here is one such technical reason: If something breaks in a submathlib during a mathlib refactor, then the mathlib contributor will have to fix the submathlib, regardless of the code quality of the latter

Yaël Dillies (Mar 15 2025 at 07:52):

This is because all projects in a single repo share their lakefile, and therefore must be on the same version of everything (and in particular of each other)

Yaël Dillies (Mar 15 2025 at 07:55):

Yaël Dillies said:

Here is one such technical reason: If something breaks in a submathlib during a mathlib refactor, then the mathlib contributor will have to fix the submathlib, regardless of the code quality of the latter

This is a genuine concern of mine by the way, not a theoretical one. I expect that what will happen in such a (rather common) situation is that:

the mathlib contributor starts a refactor
the bad quality code breaks, so they can't finish the refactor
they instead must first clean the bad quality code up

Tristan Figueroa-Reid (Mar 15 2025 at 07:55):

Yaël Dillies said:

This is because all projects in a single repo share their lakefile, and therefore must be on the same version of everything (and in particular of each other)

(I suppose this is more of an argument against having these mathlib modules outside of Mathlib4, but I can't see why that would specifically justify a polyrepo)

Yaël Dillies (Mar 15 2025 at 07:56):

Yaël Dillies said:

This is because all projects in a single repo share their lakefile, and therefore must be on the same version of everything (and in particular of each other)

Have you read :wait_one_second: ?

Tristan Figueroa-Reid (Mar 15 2025 at 07:57):

I have :+1:, though I did doubt it - are lakefiles really tied to git repositories directly? I was unaware of that.

Yaël Dillies (Mar 15 2025 at 07:58):

Technically, you can have lakefiles in subfolders. I am not sure of the implications with respect to git

Yaël Dillies (Mar 15 2025 at 07:59):

In the current setup, what happens is

the mathlib contributor starts a refactor
the bad quality code doesn't break, because it still uses an older version of mathlib
the refactor gets merged
when the bad quality code is updated to post the refactor, the submathlib contributors have to fix their bad quality code

Tristan Figueroa-Reid (Mar 15 2025 at 08:00):

Yaël Dillies said:

This is a genuine concern of mine by the way, not a theoretical one. I expect that what will happen in such a (rather common) situation is that:

the mathlib contributor starts a refactor

the bad quality code breaks, so they can't finish the refactor

they instead must first clean the bad quality code up

I do think that it is a reasonable assumption to say that the code won't be of horrible quality - there are plenty of people in the mathlib community that have been here for a few years and can absolutely recognize immediate red flags with bad style like odd instances, non-terminal simps, etc...

It wouldn't be a horrible idea, though, to go back through some of the mathlib4 PRs that weren't made by the current mathlib4 maintainers and seeing how many reviews were targeted towards these kinds of bad styles that weren't about naming / formatting.

Tristan Figueroa-Reid (Mar 15 2025 at 08:02):

I am not one such person (as I am still not fully aware of all of the best practices of mathlib4), but in this thread alone, there were about three or four such people that aren't mathlib4 maintainers but are absolutely aware of how to make good-style PRs.

Yaël Dillies (Mar 15 2025 at 08:03):

As a pretty active mathlib reviewer, I can tell you that a lot of pull requests contain code that if it were to break during a refactor of mine I would feel uncomfortable working around, and instead would first clean up.

Tristan Figueroa-Reid (Mar 15 2025 at 08:56):

(Just to clarify, are you saying that it's quite common [at least for you] to worry about how your potential refactors affect open pull requests?)

Patrick Massot (Mar 15 2025 at 09:59):

Yaël Dillies said:

Here is one such technical reason: If something breaks in a submathlib during a mathlib refactor, then the mathlib contributor will have to fix the submathlib, regardless of the code quality of the latter

I don’t think this is meant to work like this. Otherwise there is no point having separate repositories. “Submathlib” maintainers will have to fix the code, not Mathlib maintainers.

Yaël Dillies (Mar 15 2025 at 10:33):

Yes, Patrick, that is exactly my point

Patrick Massot (Mar 15 2025 at 11:18):

I probably read too quickly, sorry.

Violeta Hernández (Mar 15 2025 at 23:10):

Tristan Figueroa-Reid said:

My best frame of reference is in combinatorial-games, which does not sparingly use aesop.

That's Mathlib guidance? I wasn't aware of it!

To be fair, it would be very painful and probably undesirable to get rid of a lot of these usages, as they take place within very deep structural inductive proofs.

Kyle Miller (Mar 15 2025 at 23:22):

I've never heard any guidance against aesop.

If an aesop proof takes a long time to compute, it might be worth using aesop? to eliminate and optimize it, for sake of compile times. Or, if a goal is really delicate in some way, where small changes might make aesop succeed or fail, I might be wary of using it, but these are all I can think of.

Violeta Hernández (Mar 15 2025 at 23:25):

Yaël Dillies said:

In the current setup, what happens is

the mathlib contributor starts a refactor

the bad quality code doesn't break, because it still uses an older version of mathlib

the refactor gets merged

when the bad quality code is updated to post the refactor, the submathlib contributors have to fix their bad quality code

I've had experience doing refactors and then having to fix up very weird code I barely understand and which barely worked. It's not an enviable position. And if these subrepos are going to be about topics which are already niche for the standards of Mathlib (which requires a college degree at minimum to understand in its entirety) then I think it's even more important to avoid people constantly running into problems with them.

Violeta Hernández (Mar 15 2025 at 23:28):

I'd feel particularly bad if I was constantly getting called to fix my CGT code because Mathlib doesn't build and no one else can fix it, instead of just bumping the CGT repo on every release and fixing whatever breaks in a single swoop.

Kyle Miller (Mar 15 2025 at 23:34):

@Tristan Figueroa-Reid Beware of conflating code quality with style. Code quality has many factors, some are style (useful for making sure people don't get distracted by irrelevant details), but more important are large-scale features. Do the interfaces make sense? Are they complete? Do they prevent exposing internal details? Is the code structured in a way where maintaining it when upstream changes are made is generally easy? Is it structured where changes to internal details don't affect downstream projects too badly? Is functionality well-factored-out? Can people find things? Figure out how to use things? Is the design set up in a way that supports most interests without much compromise? Are the compromises well understood and documented? Etc etc etc.

Mathlib is also this big experiment to see if all of modern mathematics can fit together. No one knows if it's possible. Large-scale cross-cutting refactors have been necessary as people have gained a better understanding of how mathematics should be set up. This is a very mathematical activity too — mathematicians like gaining insight by thinking deeply about things we think we already know well.

Kyle Miller (Mar 15 2025 at 23:58):

With my Lean developer hat on, mathlib has varying code quality, and some of mathlib has made necessary changes painful to carry out. Mathlib is a great test suite for Lean, since it seems to exercise every observable behavior of the system, whether it's an intended feature or not. But also, sometimes proofs have inscrutable intermediate states, or they make use of some defeq abuse that should never have worked in the first place. The FRO has made a commitment to keep mathlib up-to-date though, so we do the work to fix these issues when we modify Lean (many thanks to Kim here!).

Earlier in the thread there were some comments about 500 line proofs. For theoretical areas it's hard to imagine an irreducibly complex proof that has to be that long (for proofs about concrete complicated programs, that's a different story). I could be off the mark, but if a proof could be written in a maintainable way at 500 lines, likely it is obvious how to decompose it into pieces and it would have been done. There is no value judgment here about whether it is "right" or "wrong" to have long proofs. I'm just thinking about this from the perspective of someone who regularly has to go into other people's proofs to fix them up, and who appreciates that Mathlib reviews do a lot of work toward making this sort of work actually manageable.

Violeta Hernández (Mar 16 2025 at 00:18):

(the proof that surreal number multiplication is well-defined is about 500 lines too... thankfully its contributors figured out a pretty ingenious way to split it up into sub-lemmas. it took a lot of effort, though.)

Tristan Figueroa-Reid (Mar 16 2025 at 01:46):

I see! That makes more sense. I wasn't aware of the depth as to how much mathlib needs to be kept up to date. Perhaps it would be best then to have a bunch of lake projects inside a separate mathlib4-extras repository, then? We would still have the benefits of a monorepo (shared infrastructure, codebase-wide golfs) while still enabling some separation of concerns between core mathlib4 and these child projects.

(One of the few benefits to having each mathlib4-extra module being its own repository is the clone size and less clutter in issues and pull requests, but a depthless clone and GitHub's labels easily give that same benefit to the shared potential mathlib4-extra monorepo.)

Perhaps even Archive can be moved to this potential mathlib4-extra space. That way, the three big _libraries_ in the leanprover-community space would be batteries, mathlib4, and mathlib4-extra.

Martin Dvořák (Mar 16 2025 at 10:02):

Kyle Miller said:

Earlier in the thread there were some comments about 500 line proofs. For theoretical areas it's hard to imagine an irreducibly complex proof that has to be that long (for proofs about concrete complicated programs, that's a different story).

This is not really a dichotomy. In the study of grammars, even if you study them "theoretically" (whatever it means), you sometimes need to analyze long sequences of rewrites. When the invariants preserved throughout the rewriting get complicated, the proofs that the invariants are preserved get even more complicated.

Of course, you can split the long proof into many shorter proofs, but then you have to copypaste enormous intermediate steps from the local context into the lemma assumptions.

Violeta Hernández (Mar 16 2025 at 19:21):

I feel like having all the mathlib-extra repositories in the same repo, even separate from Mathlib, still leads to the same sort of problems, where we can't do a version bump because some code broke and only the 3 people working within that specific section of the repository are able to fix it easily.

Kevin Buzzard (Mar 16 2025 at 22:21):

You could run your repos like I run FLT, not in leanprover-community, and I bump it every week or two and never feel under pressure.

Tristan Figueroa-Reid (Mar 17 2025 at 02:38):

(Note that FLT isn't a library - one of the more weirder things that I wasn't expecting, but is something that we should indeed worry about, is that people will depend on these mathlib4-extras repositories - for example, the formalization of Hex based on mathlib4's CGT.)

Tristan Figueroa-Reid (Mar 17 2025 at 02:40):

Violeta Hernández said:

I feel like having all the mathlib-extra repositories in the same repo, even separate from Mathlib, still leads to the same sort of problems, where we can't do a version bump because some code broke and only the 3 people working within that specific section of the repository are able to fix it easily.

Absolutely! I suggested this earlier, but we can have separate lake projects in the same repository. It feels weird, but that's the core idea of a monorepo. mathlib4 has a lot of interconnected parts with no clear dependency tree. The definition of 'monorepo' is arguable, but I consider a repository to be a proper monorepo if it has subprojects with well-defined relationships.

Tristan Figueroa-Reid (Mar 17 2025 at 02:41):

I was wondering if it was a good idea to have these projects in Mathlib itself, but the arguments against that were compelling :+1:

Mario Carneiro (Mar 17 2025 at 10:35):

Tristan Figueroa-Reid said:

Violeta Hernández said:

I feel like having all the mathlib-extra repositories in the same repo, even separate from Mathlib, still leads to the same sort of problems, where we can't do a version bump because some code broke and only the 3 people working within that specific section of the repository are able to fix it easily.

Absolutely! I suggested this earlier, but we can have separate lake projects in the same repository. It feels weird, but that's the core idea of a monorepo. mathlib4 has a lot of interconnected parts with no clear dependency tree. The definition of 'monorepo' is arguable, but I consider a repository to be a proper monorepo if it has subprojects with well-defined relationships.

I'm quite sure we want master branch on any mathlib or submathlib to be always-green, ensured by CI, so having temporarily broken subfolders sounds like a non-starter. If they are to live in the mathlib monorepo then they will need to get fixes in the same PRs that cause the breakage, whether that means airlifting in a topic maintainer or whatever else.

Yaël Dillies (Mar 17 2025 at 10:37):

I think the folders wouldn't be "temporarily broken" but rather "temporarily out of sync"

Mario Carneiro (Mar 17 2025 at 10:39):

out of sync with what? In the one-repo-to-rule-them-all plan that folder would be the source of truth for that submathlib, so it would be going in and out of breakage

Yaël Dillies (Mar 17 2025 at 10:40):

The folder would have its own lakefile pointing at a non-necessarily-master revision of mathlib

Mario Carneiro (Mar 17 2025 at 10:41):

That is horrific

Mario Carneiro (Mar 17 2025 at 10:41):

A git repo should not need to reference its own history, this causes all sorts of problems

Mario Carneiro (Mar 17 2025 at 10:42):

if it needs to be versioned separately from mathlib, for god's sake please use a separate repo with separate change tracking

Yaël Dillies (Mar 17 2025 at 10:42):

I agree, just wanted you to know exactly what you were arguing against :grinning_face_with_smiling_eyes:

Tristan Figueroa-Reid (Mar 17 2025 at 15:46):

[Oh, I did mean that in the sense that they should be in a separate repository because its pointing at mathlib4, though now I do see that there is less of a point in that regard, since they're all pointing at different versions of mathlib4 anyway]

The arguments presented make sense! I was operating under the (idealistic) assumption that 'okay' code generally doesn't need to be refactored often to prevent CI from breaking, but I was unaware of, as many of you explained, the ways that proofs and definitions constantly break CI in mathlib4 in ways that are only consistently recognizable to a few people. My general assumption was that a lot of the reviews were preventing stylistic drift and maintaining quality, i.e. that PRs were kept to a similar, up-to-date fashion as the rest of the codebase, and, as Kyle Miller describes, that the code is designed nicely in such a way that anticipates user needs - not that those reviews were necessary to have the project continue on. I do appreciate that you all took the time to entertain this idea, though! :+1:

Given that information above, it does make sense that these mathlib4 modules are in different repositories.

Tristan Figueroa-Reid (Mar 18 2025 at 22:45):

(@Martin Dvořák did you want to make the new repository for grammar-related formalizations?)

Last updated: May 02 2025 at 03:31 UTC