Zulip Chat Archive

Stream: mathlib4

Topic: Is a backward-compatible subset of Mathlib possible?

Wen Yang (杨文) (May 27 2025 at 14:03):

Hi all,

I understand that Mathlib is not backward-compatible, and this remains a challenge even recently.

I believe the lack of backward compatibility could hinder the adoption of Mathlib in applications. It causes trained LLMs to quickly become outdated. It also makes it harder for developers to build programs upon Mathlib.

While we don’t have sufficient resources to make the entire Mathlib backward-compatible, perhaps we could focus on creating a sufficiently small subset that maintains compatibility?

My proposal consists of three components:

Wen Yang (杨文) (May 27 2025 at 14:03):

1. Stable-Mathlib

A project depending on Mathlib that serves as a stable subset of Mathlib. Its goal is to cover undergraduate common math courses, such as Calculus, Linear Algebra, and Probability Theory.

Initially, we assume certain definitions/theorems in Mathlib are "stable" (i.e., their names and types won’t change). Then we create formal mirrors (like symbolic links) in Stable-Mathlib.
Example: Create StableMathlib/Data/Real/Basic.lean with just one line:

import Mathlib.Data.Real.Basic

Users of Stable-Mathlib would write:

import StableMathlib.Data.Real.Basic

Note: This doesn’t mean we need to consider all content in Mathlib.Data.Real.Basic. Instead, we identify specific definitions/theorems we need and create test suites to monitor changes in Mathlib that could affect Stable-Mathlib users. When necessary, we’ll add actual code in Stable-Mathlib to maintain compatibility.

Wen Yang (杨文) (May 27 2025 at 14:03):

2. UG-common-math

A project depending on Stable-Mathlib that adds more definitions/theorems (possibly redundant ones) to cover the same areas: Calculus, Linear Algebra, and Probability Theory. Ideally, the main branch would strictly build on Stable-Mathlib, never directly using Mathlib’s definitions/theorems.

Wen Yang (杨文) (May 27 2025 at 14:04):

P.S. I’d also like to create a Chinese version of UG-common-math, but currently I can only use \f<<>> to do it. I’m curious when Lean will support native Chinese identifiers for definitions and theorems.

Wen Yang (杨文) (May 27 2025 at 14:04):

3. Textbook formalizations

Projects depending on UG-common-math, each corresponding to a formalized mathematics textbook.

Wen Yang (杨文) (May 27 2025 at 14:04):

The workflow would be inverted:

Start with textbook formalizations (component 3).
Identify needed content for component 3 → add to component 2 (UG-common-math).
Identify dependencies in UG-common-math → create test suites in component 1 (Stable-Mathlib).

Wen Yang (杨文) (May 27 2025 at 14:05):

We do not need to limit the scope to undergraduate common math courses. While I think many people might be interested in these courses, someone suggested starting with Abstract Algebra instead.

Wen Yang (杨文) (May 27 2025 at 14:05):

I’d like to know if this approach is feasible. Specifically:

Are any of these steps already being implemented by others?
Could performance issues become significant?
Would the maintenance workload be too heavy?

Wen Yang (杨文) (May 27 2025 at 14:05):

Looking forward to your feedback!

Wrenna Robson (May 27 2025 at 14:06):

To be honest my response to "it could consider trained LLMs to become outdated" is "that sounds like an LLM developer problem". If they can only cope with stable APIs then they're less useful than just getting people to do it.

Wrenna Robson (May 27 2025 at 14:07):

But I think in general this is an interesting idea and a good point. (Your point about chinese identifiers is also a really good one and probably deserves a separate post.)

Michael Rothgang (May 27 2025 at 14:08):

In my perception, recent efforts towards backwards compatibility are mostly spent towards "make it easy to migrate to a new mathlib version". If you could migrate your project to a newer version of mathlib mostly automatically/with little effort, is that much worse than a guaranteed-stable project?

Ruben Van de Velde (May 27 2025 at 14:09):

Who will do this work?

Ruben Van de Velde (May 27 2025 at 14:09):

And what will the benefit be?

Michael Rothgang (May 27 2025 at 14:10):

I don't see mathlib maintainers committing to maintaining a stable subset of mathlib soon. (Caveat: I'm a mathlib reviewer, but not a maintainer.) There is lots of work to be done, and so far I don't see why one should spent all the energy on this project.
Having an unofficial-stable-mathlib depending on mathlib is possible, of course: the question is who will do the work.

Michael Rothgang (May 27 2025 at 14:10):

And why: I'm not convinced yet the massive work required has sufficient use to justify it.

Michael Rothgang (May 27 2025 at 14:13):

Wen Yang said:

The workflow would be inverted:

Start with textbook formalizations (component 3).

Identify needed content for component 3 → add to component 2 (UG-common-math).

Identify dependencies in UG-common-math → create test suites in component 1 (Stable-Mathlib).

I'm sceptical this can work. Mathlib's completeness varies a lot by area, and e.g. in differential geometry a large number of relatively basic things are still missing. (Sometimes because they are hard!) In such an area, I'd rather push for contributions to mathlib, not an intermediate "stable mathlib". To add, if I formalise something for the first time, it's unlikely to be "good enough" for the next three years without change. I'd rather add material to mathlib first and let it mature there.

Wen Yang (杨文) (May 27 2025 at 14:14):

Ruben Van de Velde 发言道：

Who will do this work?

I'm planning to gather some people to work on it, provided that it doesn't require too much effort or resources.

Michael Rothgang (May 27 2025 at 14:14):

Michael Rothgang said:

In my perception, recent efforts towards backwards compatibility are mostly spent towards "make it easy to migrate to a new mathlib version". If you could migrate your project to a newer version of mathlib mostly automatically/with little effort, is that much worse than a guaranteed-stable project?

I should add: help towards these efforts is very welcome. Some ideas simply need somebody to implement them "well enough", so we can standardise on one tool.

Wen Yang (杨文) (May 27 2025 at 14:18):

Michael Rothgang 发言道：

If you could migrate your project to a newer version of mathlib mostly automatically/with little effort, is that much worse than a guaranteed-stable project?

I think it's unlikely that migrating my project(s) to a newer version of mathlib can be done mostly automatically or with little effort — in fact, the process could be quite painful.

Wen Yang (杨文) (May 27 2025 at 14:22):

Ruben Van de Velde 发言道：

And what will the benefit be?

I don't know. But at the very least, I hope to start formalizing a university-level math textbook now. However, I think that keeping up with Mathlib's updates will be a challenge.

Wen Yang (杨文) (May 27 2025 at 14:29):

Michael Rothgang 发言道：

And why: I'm not convinced yet the massive work required has sufficient use to justify it.

I believe we need a mathematical library that is more accommodating to specific, possibly redundant definitions, theorems, and examples than Mathlib currently allows. For instance, there are theorems that hold true only for some functions but cannot be expected to apply to more general functions. And the textbooks often include examples and counterexamples that might not accepted by Mathlib.

Wen Yang (杨文) (May 27 2025 at 14:35):

Michael Rothgang 发言道：

Mathlib's completeness varies a lot by area, and e.g. in differential geometry a large number of relatively basic things are still missing. (Sometimes because they are hard!) In such an area, I'd rather push for contributions to mathlib, not an intermediate "stable mathlib". To add, if I formalise something for the first time, it's unlikely to be "good enough" for the next three years without change. I'd rather add material to mathlib first and let it mature there.

@Michael Rothgang I fully agree with your view. However, I think it's also important to recognize that many of the definitions, theorems, and examples required for formalizing a university-level math textbook might not fit well within Mathlib — often because they are context-specific or pedagogical in nature, rather than part of a generalized mathematical library. This doesn't diminish their value, but suggests that there might be a need for complementary resources alongside Mathlib.

Ziyu Zhou (May 27 2025 at 14:39):

Even in basic tutorials, such as math in lean4, we see a lot of changes caused by mathlib updates. I'm not sure how small "a stable subset" will be.
For example:
https://github.com/avigad/mathematics_in_lean_source/commit/82c6e7a045000c5c99c89483209e960455322b79
https://github.com/avigad/mathematics_in_lean_source/commit/12c71e9a4ca3b0a4c495fdc18e094e76d05c510c

Wen Yang (杨文) (May 27 2025 at 14:46):

Ziyu Zhou 发言道：

Even in basic tutorials, such as math in lean4, we see a lot of changes caused by mathlib updates. I'm not sure how small "a stable subset" will be.

I’m not sure either. In my view, the scope covered by MIL is quite limited — and perhaps this is precisely because Mathlib itself is still evolving and not yet sufficiently stable to support a broader teaching base.

Ziyu Zhou (May 27 2025 at 14:51):

Brainstorm: A better idea might be an AI agent that can track mathlib updates in real time to help us manage the project.

Kevin Buzzard (May 27 2025 at 14:55):

If @Jireh Loreaux gets his way (and he might well) then Group will one day be refactored yet again in mathlib (a pnat_smul field might be added), so I cannot imagine that anyone in charge of the library would commit to any stability right now.

Wen Yang (杨文) (May 27 2025 at 22:51):

Kevin Buzzard 发言道：

If Jireh Loreaux gets his way (and he might well) then Group will one day be refactored yet again in mathlib (a pnat_smul field might be added), so I cannot imagine that anyone in charge of the library would commit to any stability right now.

@Kevin Buzzard I just found the relevant code in #22517:

class Semigroup (G : Type u) extends Mul G where
  /-- Multiplication is associative -/
  protected mul_assoc : ∀ a b c : G, a * b * c = a * (b * c)
  /-- Positive integer power operation -/
  ppow : ∀ n : ℕ, 0 < n → G → G := ppowRec
  ppow_one : ∀ g, ppow 1 Nat.one_pos g = g := by intros; rfl
  ppow_succ : ∀ g n, ppow (n + 2) (n + 1).succ_pos g = g * ppow (n + 1) n.succ_pos g :=
    by intros; rfl

Are you referring to this part? From my perspective, this is indeed an important example, and I should prepare for such changes. However, I’m not fully grasping your point here. I don’t see this code having a major impact because the new fields (ppow, ppow_one, ppow_succ) appear to be automatically generated. It seems unlikely that these additions would break existing theorems or proofs about Semigroup.

I just read this:
Jireh Loreaux 发言道：

#22517 implements the forgetful inheritance approach, but the performance impact is absurdly large (1.1 tera-instructions). It seems to be a "death by a thousand cuts situation" for the most part with the largest single-file instruction increase being 24 billion. Does anyone know what might be a good potential culprit?

Are you suggesting that even though downstream code continues to work after the refactor, the performance overhead introduced by the changes might be significantly higher than expected?

Jireh Loreaux (May 28 2025 at 02:27):

I'm suggesting that such a large performance increase is unacceptable, and I need to figure out how to reduce it significantly before the PR can be reasonably considered.

Kevin Buzzard (May 28 2025 at 04:23):

Yes you've found the right PR. The fields have optional parameters so if you leave them out then they'll be generated automatically, but sometimes the default values will be the "wrong" things and in this case one will have to fill them in manually (indeed if the automatic solution was correct always then I guess we would not need the extra fields).

Last updated: Feb 28 2026 at 14:05 UTC