Zulip Chat Archive
Stream: new members
Topic: Coding style – splitting long proofs
Martin Dvořák (Aug 17 2022 at 22:48):
Soft question (coding style):
Do you split long proofs into several lemmata just because the proof is too long? Even if they aren't logically independent pieces by using only a small subset of the propositions that are present inside the proof?
Jireh Loreaux (Aug 18 2022 at 01:13):
For mathlib, I would say "generally not." However, unless the proof is about a deep result, there is often a way to split it up into pieces, even if it means a major rewrite, or using other parts of the library better. That being said, there can always be exceptions to both statements above.
Mario Carneiro (Aug 18 2022 at 01:24):
One reason to split up a large proof is if it improves compile times
Mario Carneiro (Aug 18 2022 at 01:25):
lemmas are processed in parallel, and additionally a long proof can involve a large local context which makes all other tactics slower
Mario Carneiro (Aug 18 2022 at 01:26):
it's also just easier to work with when you are writing the proof since the response time is smaller
Mario Carneiro (Aug 18 2022 at 01:28):
that last reason is probably the most direct reason to split a long proof, it just gets too annoying to write otherwise
Martin Dvořák (Aug 18 2022 at 18:55):
I think I made a mistake that I was taking only the logical structure into account but not the length of the respective parts. I made a few levels of decomposition, after which I basically stopped breaking things apart. As a result, I have a proof that consists of 36 lemmata; many of them are under 10 lines, but some of them are longer, and one of them is a monstrous lemma with a proof of 1100 lines.
The response time cries in Lean was really annoying.
Jireh Loreaux (Aug 18 2022 at 19:05):
A proof of 1100 lines is certainly a good enough reason to split it. Are you sure that lemma can't be deconstructed further? It seems very surprising to me. We have a few long proofs in mathlib, but I'm not sure I know of any that are over say, 200 lines [other community members please insert links to counterexamples here].
Mario Carneiro (Aug 18 2022 at 19:07):
I think there are counterexamples, but I generally associate ultra-long proofs with folks who haven't quite learned the coding style yet
Mario Carneiro (Aug 18 2022 at 19:08):
It would be nice if someone could do some statistics on mathlib proof length distribution and outliers
Mario Carneiro (Aug 18 2022 at 19:11):
unfortunately environment.decl_pos
only has the start position and not the end
Martin Dvořák (Aug 18 2022 at 19:13):
Jireh Loreaux said:
A proof of 1100 lines is certainly a good enough reason to split it. Are you sure that lemma can't be deconstructed further? It seems very surprising to me. We have a few long proofs in mathlib, but I'm not sure I know of any that are over say, 200 lines [other community members please insert links to counterexamples here].
It can be. I just made a mistake that I stopped after a few levels of decomposition.
Martin Dvořák (Aug 18 2022 at 19:14):
After which I was splitting things only if they were using only a small subset of propositions in the place where they were used.
Martin Dvořák (Aug 18 2022 at 19:14):
Mario Carneiro said:
(...) folks who haven't quite learned the coding style yet
Yep, this is me!
Jireh Loreaux (Aug 18 2022 at 19:18):
@Mario Carneiro As a first approximation, you could just subtract successive environment.decl_pos
values. It wouldn't be accurate because of variable declarations, section headings, but I think only really long docstrings would feed into the outliers, and these might be small enough in number to discard manually.
Mario Carneiro (Aug 18 2022 at 19:19):
I just wrote a fix for lean so that the .ast.json stores command ends correctly, so if we update lean and run it on mathlib we can get accurate data here
Mario Carneiro (Aug 18 2022 at 19:23):
Last updated: Dec 20 2023 at 11:08 UTC