Zulip Chat Archive

Stream: mathlib4

Topic: How do you decide what to add to mathlib?


Noble Mushtak (he/him) (Dec 26 2024 at 08:16):

Hi, I am a new member still learning about Lean, and I would be interested in contributing to mathlib at some point, as I have worked on formalizations in Coq before. I was just wondering, how do you decide what new lemmas/theorems to add to mathlib? I saw there's this list of missing undergrad topics: https://leanprover-community.github.io/undergrad_todo.html so is this list the highest priority of what lemmas/theorems should be added to mathlib first?

Kevin Buzzard (Dec 26 2024 at 08:56):

No not really, in fact many of the results there are not in mathlib because we don't particularly need them. Groups have their own ongoing projects which they add to mathlib bit by bit.

Yury G. Kudryashov (Dec 26 2024 at 21:07):

There is no entity that makes decisions what to formalize. There are many individual contributors, each formalizes whatever they like. Maintainers decide whether the code is in good shape and the material is in scope of Mathlib.

Yury G. Kudryashov (Dec 26 2024 at 21:07):

Quite often, people collaborate to formalize something big.

Yury G. Kudryashov (Dec 27 2024 at 00:14):

Generally, people choose some big goal and formalize whatever they need for this goal.

Violeta Hernández (Jan 05 2025 at 22:38):

Mathlib has a very big scope. I've only come across a handful of situations where something isn't Mathlib appropriate, mostly because it's specialized and not too mathematical (e.g. Rubik's cubes, sudokus, my stuff on ordinal notations)

Violeta Hernández (Jan 05 2025 at 22:38):

By default if you want to add something and there's any chance someone else might make use of it, then you can and should.

Violeta Hernández (Jan 05 2025 at 22:39):

(but you can still ask if you're unsure)

Kim Morrison (Jan 06 2025 at 00:45):

Violeta Hernández said:

Mathlib has a very big scope. I've only come across a handful of situations where something isn't Mathlib appropriate, mostly because it's specialized and not too mathematical (e.g. Rubik's cubes, sudokus, my stuff on ordinal notations)

This is one end of a spectrum of views, not universally held. If it will not be used by someone in an adjacent (or preferably far away!) field --- then I would strongly encourage development in a separate repository. As an example, I think all concrete "named" combinatorial games (domineering, hackenbush, etc) should be developed outside of Mathlib.

Edward van de Meent (Jan 06 2025 at 08:13):

Nim(bers) :face_with_peeking_eye:

Patrick Massot (Jan 06 2025 at 09:52):

Mathlib having a very big scope is kind of a historical accident. We are bound to move towards a smaller scope as more and more people start using Lean. We must leave space for other projects to grow and also keep Mathlib manageable.

Artie Khovanov (Jan 06 2025 at 15:00):

What is the smaller scope?

Kevin Buzzard (Jan 06 2025 at 15:31):

My vision for the smaller scope is "things discussed in lectures or seminars in the top maths departments in the world" but others have different visions.

Artie Khovanov (Jan 06 2025 at 16:58):

(and, crucially, their prerequisites, right)

Kevin Buzzard (Jan 06 2025 at 17:28):

I was taught all prerequisites in university lectures. I saw how to build maths from scratch.

Ruben Van de Velde (Jan 06 2025 at 17:41):

Except for type theory, right :innocent:

Kevin Buzzard (Jan 06 2025 at 17:43):

That's true -- I was taught set-theoretic foundations. I learnt Lean's type theory in #tpil and I think it's safe to say that we have had all that for 7 years now :-)

Yaël Dillies (Jan 06 2025 at 18:21):

My view for a reasonable and useful scope would be "Anything that will be reused", or maybe "Anything that will be reused in a different area". This has the advantage of being actionable a posteriori: If you want to depend on a library that's not mathlib, then upstream the content you need from that library to mathlib

Ruben Van de Velde (Jan 06 2025 at 18:24):

... And that has reviewers

Yaël Dillies (Jan 06 2025 at 18:27):

If it's going to be reused, then hopefully that means there's >= 2 people interested in it :wink:

Martin Dvořák (Jan 06 2025 at 18:48):

Kim Morrison said:

If it will not be used by someone in an adjacent (or preferably far away!) field --- then I would strongly encourage development in a separate repository.

Does it mean that if I raised the question #mathlib4 > VCSP theory whether VCSP theory is in the scope of Mathlib, the answer in 2025 would be "no"? Not because of the lack of reviewers but simply because VCSP would be out of scope today?

Kevin Buzzard (Jan 06 2025 at 19:41):

I don't think that I've come across that stuff in the departments I've worked in so I wouldn't be arguing for it to be in mathlib, but there are many maintainers and people have many different opinions, so I would just remain silent when faced with a question like that. I've only expressed my personal view above, I'm not speaking for the maintainers in general.

Joseph Myers (Jan 06 2025 at 20:32):

As a maximalist, I think "anything likely to be reused" is appropriate for mathlib, and any other known mathematics (not likely to be reused) can go in the mathlib archive.

However, I think we should distinguish here between mathlib the highly integrated library and mathlib the monorepo. We could have a highly integrated library made up of multiple repositories designed to be used together, and once mathlib gets to more than 10 to 100 times its present size, we may need to have the library in multiple repositories. It's mathlib the highly integrated library that I think should have maximal scope of everything that might be reused (and likewise, a mathlib archive of known mathematics in general would also not need to be in one repository).

The highly integrated part is important for a multi-repository mathlib. All the repositories should be leanprover-community/mathlib[something] repositories, not in random personal GitHub accounts where they could be lost if the original author deletes them for some reason, or cease to be kept up to date with mathlib. They should all have the same naming, code style etc. conventions; all have the same linters enabled; all have the same CI and review processes; code shouldn't go in one of those repositories if it can sensibly go in the underlying core mathlib (and likewise if it can go somewhere intermediate in the dependency tree, once there is more than one level of dependencies). All the repositories should be updated to a new version of the core mathlib (and other intermediate dependencies) at least once a day, so there is a collection of versions of all the repositories no more than a day old that can be used together. The mathlib documentation on the website should cover all such repositories (with the ability to search either all of them together, or just a particular repository and its dependencies). (And something similar applies for a multi-repository mathlib archive, following similar conventions to the present mathlib archive and also frequently updated but not all in a single repository.)

Kim Morrison (Jan 06 2025 at 23:30):

I do like Joseph's vision for a highly integrated multi-repository mathlib. I think it's worth thinking about the resources required to maintain this. Realistically, I think there is at least one full-time technical position required just to maintain the infrastructure that is implied by this vision, not to mention the implied expansion of the maintainer/reviewer team.

Ruben Van de Velde (Jan 06 2025 at 23:32):

Also presumably reverse-dependency CI so you notice when a mathlib-the-core-repo change breaks things in mathlib-the-library-in-many-repos

Violeta Hernández (Jan 13 2025 at 15:21):

Yaël Dillies said:

"Anything that will be reused in a different area".

Does this mean the entirety of ordinal arithmetic is out? :frown:

Ruben Van de Velde (Jan 13 2025 at 15:24):

If anything, it would mean you get a whole project to yourself :)


Last updated: May 02 2025 at 03:31 UTC