Zulip Chat Archive

Stream: mathlib4

Topic: Philosophy: single-purpose lemma

Martin Dvořák (Apr 24 2024 at 11:35):

We had some disagreement in a current PR thread (#12380) which was rather philosophical in nature than an issue of the particular code we wrote, so I would like to bring the philosophical issue here.

Is it acceptable to declare a lemma that will be used only once?

Pro: It makes the complicated proof (that uses this lemma) shorter.
Con: It makes the total length of the code longer.
Pro: Even tho we think that it is a single-purpose lemma, another use of the lemma might pop up later.
Con: If the lemma is too idiosyncratic, we will probably not be able to name it in such a way that a potential later user will be able to find it even if they happen to need it.
Pro: The lemma can always be found by importing the entire Mathlib and writing exact? or similar.
Con: If the lemma is too idiosyncratic, its potential second use case will be probably phrased a bit differently, which will make exact? fail.
Pro: If we inline the proof of the lemma inside the theorem that uses it (instead of keeping the single-purpose lemma in the production code), a potential second user will definitely have to duplicate the code as well as if the lemma existed and was not found; however, it is easier to do refactoring that will search for duplicate lemmas than refactoring that will search for duplicate blocks inside different proofs.

A second question is whether such a single-purpose lemma should be private.

Ruben Van de Velde (Apr 24 2024 at 11:39):

I think it's fine if it does improve the complicated proof - either in readability or in performance. I would probably keep it next to the complicated proof rather than trying to find a "logical" place for it

Yaël Dillies (Apr 24 2024 at 12:08):

... and mark it private

Sébastien Gouëzel (Apr 24 2024 at 13:45):

I'd say it's only acceptable if the lemma is nontrivial (i.e., the proof of the lemma has more than three lines). Otherwise, you should inline it in the proof of the main result.

Martin Dvořák (Apr 24 2024 at 15:21):

However, if the lemma is used more than once (even if it was used twice by the same theorem), is it OK to have a trivial lemma, right?

Sébastien Gouëzel (Apr 24 2024 at 15:32):

Do you have a concrete example for me? My general impression is that lemmas which are really specific and have a one-line or two-lines proof should almost never be stated separately, because you won't gain any readability compared to just giving the proof where you need it.

Rida Hamadani (Apr 24 2024 at 15:35):

One example that comes to mind is this.

Mario Carneiro (Apr 24 2024 at 15:46):

I would normally do that as a have

Mario Carneiro (Apr 24 2024 at 15:47):

If there are reasons why the theorem will never be referenced again later (e.g. it has additional hypotheses not in the main theorem with the same statement), then I think it should be inlined

Rida Hamadani (Apr 24 2024 at 16:01):

My reason for preferring have is that when going through a file, I'm usually searching for a "useful lemma", so having less things to search through is more efficient (a single-purpose lemma doesn't fit the definition for a "useful lemma").

Martin Dvořák (Apr 24 2024 at 17:58):

Sébastien Gouëzel said:

Do you have a concrete example for me? My general impression is that lemmas which are really specific and have a one-line or two-lines proof should almost never be stated separately, because you won't gain any readability compared to just giving the proof where you need it.

Currently for example this lemma:

lemma Set.between_inter {I T X : Set α} (hIT : I ⊆ T) (hTX : T ⊆ X) (E : Set α) :
    I ∩ E ⊆ T ∩ E ∧ T ∩ E ⊆ X ∩ E :=
  ⟨Set.inter_subset_inter_left E hIT, Set.inter_subset_inter_left E hTX⟩

It is completely trivial and brings no extra value for Mathlib as a whole, but it perfectly fits into the shape of our goals, so we use it.

Patrick Massot (Apr 24 2024 at 18:00):

This example seems very clear: this should not be stated, even as a private lemma.

Martin Dvořák (Apr 24 2024 at 18:03):

In my older project, this would be a good example:

lemma List.length_append_append :
    (x ++ y ++ z).length = x.length + y.length + z.length := by
  rw [List.length_append, List.length_append]

Again, completely trivial and useless for "how many features does Mathlib provide", but I used it 13 times throughout my project, so it paid off.

Sébastien Gouëzel (Apr 24 2024 at 18:21):

This one is just simp [List.length_append], so I don't see how the lemma could help compared to this single command whose name is not longer than the name of the lemma. Maybe it's a #xy problem: if you show me how you use this lemma, I can probably show you how to avoid it without any loss in readability.

Martin Dvořák (Apr 24 2024 at 18:26):

Let's say the current goal is

(a ++ b).length = (x ++ y ++ z).length

and the desired goal is

(a ++ b).length = x.length + y.length + z.length

after which I want to do some other transformations, probably exploiting some properties of (a ++ b) without ripping them apart. The easiest way how to get from the former proof state to the latter proof state is to use rw [List.length_append_append].

Yaël Dillies (Apr 24 2024 at 18:27):

I thought similar things until I embraced calc. Maybe you could calc a try?

Sébastien Gouëzel (Apr 24 2024 at 18:28):

For this one you just need rw [List.length_append _ z, List.length_append x], so there is essentially no gain.

Martin Dvořák (Apr 24 2024 at 18:29):

The gain is in both readability and how fast I can write it.

Martin Dvořák (Apr 24 2024 at 18:30):

I think there is a value in being able to express "one step of my thinking" by "one step in Lean".

Sébastien Gouëzel (Apr 24 2024 at 18:34):

The issue is for your readers: if you introduce more lemmas, they have more to remember. Just like when you write a math paper, the focus should not be on how easy it is for you to write something, but how you're making it as easy as possible to read.

Martin Dvořák (Apr 24 2024 at 18:36):

The reader should read the main result and all definitions that the main result depends on.

Martin Dvořák (Apr 24 2024 at 18:39):

Sébastien Gouëzel said:

I'd say it's only acceptable if the lemma is nontrivial (i.e., the proof of the lemma has more than three lines). Otherwise, you should inline it in the proof of the main result.

This reminds me of something our teacher said when I was in the second year of my undergrad:
"The difference between mathematicians and computer scientists is that mathematicians like when things are nontrivial."
Back then I didn't understand what he meant.

Jireh Loreaux (Apr 24 2024 at 18:51):

Martin Dvořák said:

The reader should read the main result and all definitions that the main result depends on.

This presupposes that the reader doesn't care about the proof just because it is formally verified. I would argue that's false in general.

David Loeffler (Apr 25 2024 at 05:58):

Jireh Loreaux said:

Martin Dvořák said:

The reader should read the main result and all definitions that the main result depends on.

This presupposes that the reader doesn't care about the proof just because it is formally verified. I would argue that's false in general.

The assertion may not always be true, but I think it's true rather a substantial fraction of the time – particularly for mathlib files near the leaves of the import tree, which often consist of long chains of obscure technical lemmas leading up to a few big theorems at the end, with the preliminary lemmas being utterly useless outside the context of that particular proof. If I'm looking for some specific well-known theorem in the manual because I want to use it elsewhere, I don't care about the technical lemmas leading to it – only the theorem and the definitions needed to state it.

I would argue for a wider use of private in mathlib, in order to avoid cluttering the reference manual and the autocomplete results with single-use lemmas like this. At one time I recall being told that private was always discouraged in mathlib contributions, and I am pleasantly surprised to see Yael's suggestion in this thread to use private getting some upvotes. Another possibility is to wrap the technical results in a dedicated namespace and only export the "interesting" results to the root namespace, but this only partially solves the clutter issue.

Martin Dvořák (Apr 25 2024 at 07:27):

Martin Dvořák said:

lemma Set.between_inter {I T X : Set α} (hIT : I ⊆ T) (hTX : T ⊆ X) (E : Set α) :
    I ∩ E ⊆ T ∩ E ∧ T ∩ E ⊆ X ∩ E :=
  ⟨Set.inter_subset_inter_left E hIT, Set.inter_subset_inter_left E hTX⟩

It is completely trivial and brings no extra value for Mathlib as a whole, but it perfectly fits into the shape of our goals, so we use it.

Unpopular opinion:

Lemmas like Set.between_inter become (locally) helpful once you bracket your conjunctions in a logical way instead of going with the flow.

Eric Wieser (Apr 25 2024 at 08:02):

Popular opinion: conjunctions should almost never appear in the conclusions of lemmas, especially if each half of the conjunction already exists as a lemma

Martin Dvořák (Apr 25 2024 at 08:13):

I agree with you. What we did is some kind of compromise between declaring a Prop "is between" and writing every inclusion explicitly. In literature, they say T is a set of sets T := { I' | I ⊆ I' ⊆ X } and then talk about "belonging to T"; however, for formalization, it is better not to define T as a set of sets. Perhaps is best not to define T at all.

Martin Dvořák (Apr 25 2024 at 08:17):

Sébastien Gouëzel said:

The issue is for your readers: if you introduce more lemmas, they have more to remember. Just like when you write a math paper, the focus should not be on how easy it is for you to write something, but how you're making it as easy as possible to read.

I think that List.length_append_append gets bonus points for having a name that is easy to understand without having to press F12.

Martin Dvořák (Apr 25 2024 at 08:21):

Rida Hamadani said:

having less things to search through is more efficient

In general, I am gradually turning away from searching stuff by scrolling.

Martin Dvořák (Apr 25 2024 at 08:26):

Eric Wieser said:

Popular opinion: conjunctions should almost never appear in the conclusions of lemmas, especially if each half of the conjunction already exists as a lemma

I agree that the "main logical connective" in the conclusions of lemmas should not be AND. However, it is perfectly fine if AND appears inside OR or inside IFF that is the conclusion of the lemma.

Nevertheless, for private lemmas I'd drop even the first requirement. I wish a private lemma could be whatever the caller finds convenient.

Martin Dvořák (Feb 17 2025 at 09:23):

Yaël Dillies said:

... and mark it private

Is #20433 a good candidate for a private lemma? It is pretty specific but not so niche that I would be sure it will never be used elsewhere.

Yaël Dillies (Feb 17 2025 at 11:08):

Yes, I would make it private at the location you need it

Last updated: May 02 2025 at 03:31 UTC