Zulip Chat Archive

Stream: new members

Topic: Coding style – splitting long proofs

Martin Dvořák (Aug 17 2022 at 22:48):

Soft question (coding style):

Do you split long proofs into several lemmata just because the proof is too long? Even if they aren't logically independent pieces by using only a small subset of the propositions that are present inside the proof?

Jireh Loreaux (Aug 18 2022 at 01:13):

For mathlib, I would say "generally not." However, unless the proof is about a deep result, there is often a way to split it up into pieces, even if it means a major rewrite, or using other parts of the library better. That being said, there can always be exceptions to both statements above.

Mario Carneiro (Aug 18 2022 at 01:24):

One reason to split up a large proof is if it improves compile times

Mario Carneiro (Aug 18 2022 at 01:25):

lemmas are processed in parallel, and additionally a long proof can involve a large local context which makes all other tactics slower

Mario Carneiro (Aug 18 2022 at 01:26):

it's also just easier to work with when you are writing the proof since the response time is smaller

Mario Carneiro (Aug 18 2022 at 01:28):

that last reason is probably the most direct reason to split a long proof, it just gets too annoying to write otherwise

Martin Dvořák (Aug 18 2022 at 18:55):

I think I made a mistake that I was taking only the logical structure into account but not the length of the respective parts. I made a few levels of decomposition, after which I basically stopped breaking things apart. As a result, I have a proof that consists of 36 lemmata; many of them are under 10 lines, but some of them are longer, and one of them is a monstrous lemma with a proof of 1100 lines.

The response time cries in Lean was really annoying.

Jireh Loreaux (Aug 18 2022 at 19:05):

A proof of 1100 lines is certainly a good enough reason to split it. Are you sure that lemma can't be deconstructed further? It seems very surprising to me. We have a few long proofs in mathlib, but I'm not sure I know of any that are over say, 200 lines [other community members please insert links to counterexamples here].

Mario Carneiro (Aug 18 2022 at 19:07):

I think there are counterexamples, but I generally associate ultra-long proofs with folks who haven't quite learned the coding style yet

Mario Carneiro (Aug 18 2022 at 19:08):

It would be nice if someone could do some statistics on mathlib proof length distribution and outliers

Mario Carneiro (Aug 18 2022 at 19:11):

unfortunately environment.decl_pos only has the start position and not the end

Martin Dvořák (Aug 18 2022 at 19:13):

Jireh Loreaux said:

A proof of 1100 lines is certainly a good enough reason to split it. Are you sure that lemma can't be deconstructed further? It seems very surprising to me. We have a few long proofs in mathlib, but I'm not sure I know of any that are over say, 200 lines [other community members please insert links to counterexamples here].

It can be. I just made a mistake that I stopped after a few levels of decomposition.

Martin Dvořák (Aug 18 2022 at 19:14):

After which I was splitting things only if they were using only a small subset of propositions in the place where they were used.

Martin Dvořák (Aug 18 2022 at 19:14):

Mario Carneiro said:

(...) folks who haven't quite learned the coding style yet

Yep, this is me!

Jireh Loreaux (Aug 18 2022 at 19:18):

@Mario Carneiro As a first approximation, you could just subtract successive environment.decl_pos values. It wouldn't be accurate because of variable declarations, section headings, but I think only really long docstrings would feed into the outliers, and these might be small enough in number to discard manually.

Mario Carneiro (Aug 18 2022 at 19:19):

I just wrote a fix for lean so that the .ast.json stores command ends correctly, so if we update lean and run it on mathlib we can get accurate data here

Mario Carneiro (Aug 18 2022 at 19:23):

lean#756

Last updated: May 02 2025 at 03:31 UTC