## Stream: general

### Topic: huge term trees

#### Daniel Selsam (Jan 16 2020 at 14:17):

I have been doing more performance analysis on mathlib, and I found something rather intriguing. At some point during compilation, there is a typeclass goal with a ~1GB string representation. The consequences of such huge term trees are subtle. Although I did not compute the size of the term DAG (as opposed to the term tree), my suspicion is that it is relatively small. Most modules in Lean3 (and Lean4) are careful to traverse expressions as DAGs instead of as trees, and for these modules, explosion in term tree sizes may not have much ill-effect. Indeed, typeclass synthesis only takes 5ms for this example despite the enormous term tree size. However, not all modules are designed to avoid this blowup, and so huge term trees do carry the risk of massive slowdowns in some parts of the system. In some cases, Lean has even been consciously removing support for efficiently handling huge term trees, since many of the things it seems to require actually hurt performance in the majority of cases. In particular, there is no current plan for Lean4 to address performance problems arising from huge term trees.

What is causing these huge term trees? Can you design the libraries in a way that avoids this blowup?

#### Daniel Selsam (Jan 16 2020 at 14:17):

In case it helps, here are the first 1,000 characters of the typeclass goal mentioned above:

"""
decidable (Pi {X : category_theory.functor.{v₁ (max u₂ v₃) u₁ (max v₂ v₃ u₂ u₃)} C 𝒞 (category_theory.functor.{v₂ v₃ u₂ u₃} D 𝒟 E ℰ) (category_theory.functor.category.{v₂ v₃ u₂ u₃} D 𝒟 E ℰ)} {Y : category_theory.functor.{v₁ (max u₂ v₃) u₁ (max v₂ v₃ u₂ u₃)} C 𝒞 (category_theory.functor.{v₂ v₃ u₂ u₃} D 𝒟 E ℰ) (category_theory.functor.category.{v₂ v₃ u₂ u₃} D 𝒟 E ℰ)} {Z : category_theory.functor.{v₁ (max u₂ v₃) u₁ (max v₂ v₃ u₂ u₃)} C 𝒞 (category_theory.functor.{v₂ v₃ u₂ u₃} D 𝒟 E ℰ) (category_theory.functor.category.{v₂ v₃ u₂ u₃} D 𝒟 E ℰ)} (f : category_theory.has_hom.hom.{(max u₁ u₂ v₃) (max v₁ (max u₂ v₃) u₁ v₂ v₃ u₂ u₃)} (category_theory.functor.{v₁ (max u₂ v₃) u₁ (max v₂ v₃ u₂ u₃)} C 𝒞 (category_theory.functor.{v₂ v₃ u₂ u₃} D 𝒟 E ℰ) (category_theory.functor.category.{v₂ v₃ u₂ u₃} D 𝒟 E ℰ)) (category_theory.category_struct.to_has_hom.{(max u₁ u₂ v₃) (max v₁ (max u₂ v₃) u₁ v₂ v₃ u₂ u₃)} (category_theory.functor.{v₁ (max u₂ v₃) u₁ (max v₂ v₃ u₂ u₃)} C 𝒞 (category_theory.functor.{v₂ v₃ '
"""

#### Daniel Selsam (Jan 16 2020 at 14:20):

And here is a toy pattern that could explain:

class C1 (α : Type)  : Type := (u:Unit:=())
instance (α : Type)  : C1 α := {}
class C2 (α : Type) [C1 α]  : Type := (u:Unit:=())
instance (α : Type) [C1 α]  : C2 α := {}
class C3 (α : Type) [C1 α] [C2 α]  : Type := (u:Unit:=())
instance (α : Type) [C1 α] [C2 α]  : C3 α := {}
class C4 (α : Type) [C1 α] [C2 α] [C3 α]  : Type := (u:Unit:=())
instance (α : Type) [C1 α] [C2 α] [C3 α]  : C4 α := {}
class C5 (α : Type) [C1 α] [C2 α] [C3 α] [C4 α]  : Type := (u:Unit:=())
instance (α : Type) [C1 α] [C2 α] [C3 α] [C4 α]  : C5 α := {}
-- and so on


#### Johan Commelin (Jan 16 2020 at 14:23):

First of all: Thanks for all your efforts. The Lean 4 team is doing a really great job!
That said:

Can you design the libraries in a way that avoids this blowup?

I understand this question. But my first reaction is: mathematicians shouldn't have to care...

#### Johan Commelin (Jan 16 2020 at 14:24):

@Daniel Selsam Do you have the name and line numbers for this instance in mathlib?

#### Mario Carneiro (Jan 16 2020 at 21:31):

In particular, there is no current plan for Lean4 to address performance problems arising from huge term trees.

This is a bad idea. Large trees before deduplication is an inevitable consequence of dependent type theory due to redundancy inside types and dependent arguments. It's not a thing you can easily work around as a user. This will be the thing that we will all notice and hope to be improved in the next version if you don't fix it at this early stage.

#### Mario Carneiro (Jan 16 2020 at 21:32):

There is a coq paper about typeclass slowness using exactly your C1-C5 example

#### Daniel Selsam (Jan 16 2020 at 21:49):

There is a coq paper about typeclass slowness using exactly your C1-C5 example

Using unbundled typeclasses.

#### Mario Carneiro (Jan 16 2020 at 21:49):

I think that's what you are doing in the example as well

#### Mario Carneiro (Jan 16 2020 at 21:50):

the parameters are stacking up, and each one refers to all the previous, so instances on C5 are already O(n^5)

#### Daniel Selsam (Jan 16 2020 at 21:50):

I think that's what you are doing in the example as well

Yes, but that is not the way typeclasses are generally used in mathlib.

#### Mario Carneiro (Jan 16 2020 at 21:50):

It is, when more types get involved

#### Mario Carneiro (Jan 16 2020 at 21:51):

For example module A B needs ring A and add_group B as parameters

#### Johan Commelin (Jan 16 2020 at 21:51):

And topological_vector_space needs ...

#### Mario Carneiro (Jan 16 2020 at 21:51):

I'm not exactly sure where your example is coming from but category theory has a similar problem with mandatory dependencies

#### Daniel Selsam (Jan 16 2020 at 21:53):

Yes but ring A and add_group B internally bundle many layers. Do you know which class in mathlib takes the most instance parameters? It seems harder to accidentally take a ton of instance parameters in Lean since it doesn't support the implicit generalization that Coq does.

#### Mario Carneiro (Jan 16 2020 at 21:53):

topological vector space is probably a contender

#### Johan Commelin (Jan 16 2020 at 21:54):

complete topological vector space?

#### Daniel Selsam (Jan 16 2020 at 21:54):

In Coq, the same example looks harmless :

Class C1 (X : Set) : Set := {}.
Instance I1 (X : Set) : C1 X := {}.
Class C2 (X : Set) {C1 X} : Set := {}.
Instance I2 (X : Set) {C1 X} : C2 X := {}.
Class C3 (X : Set) {C2 X} : Set := {}.
Instance I3 (X : Set) {C2 X} : C3 X := {}.
Class C4 (X : Set) {C3 X} : Set := {}.
Instance I4 (X : Set) {C3 X} : C4 X := {}.
Class C5 (X : Set) {C4 X} : Set := {}.
Instance I5 (X : Set) {C4 X} : C5 X := {}.
Example failing_tower : C5 unit := _.


#### Mario Carneiro (Jan 16 2020 at 21:54):

I recall that being a feature request on lean 3

#### Patrick Massot (Jan 16 2020 at 21:56):

Looking harmless was a feature request?

#### Mario Carneiro (Jan 16 2020 at 21:56):

Being able to write [module A B] and getting {A} {B} [ring A] [add_group B] [module A B] automatically

#### Mario Carneiro (Jan 16 2020 at 21:57):

This is very nice when you are doing e.g. linear algebra between two or three vector spaces

Oooh...

#### Patrick Massot (Jan 16 2020 at 21:58):

I didn't even know you could ask Santa Claus to get that.

#### Kevin Buzzard (Jan 16 2020 at 21:59):

It would be delicate to get right -- if B was already an add_group for some other reason then you don't want to add it again.

#### Mario Carneiro (Jan 16 2020 at 22:00):

I would have pointed to the issue page but they are all gone :(

#### Daniel Selsam (Jan 17 2020 at 00:26):

In particular, there is no current plan for Lean4 to address performance problems arising from huge term trees.

This is a bad idea. Large trees before deduplication is an inevitable consequence of dependent type theory due to redundancy inside types and dependent arguments. It's not a thing you can easily work around as a user. This will be the thing that we will all notice and hope to be improved in the next version if you don't fix it at this early stage.

I agree that it might be this critical, but I don't have enough information yet to have an informed opinion about what to do. In particular, there might be acceptable ways of keeping the heights of these towers acceptably bounded, and the possible backend solutions might all have substantial downsides. For now, let us all work together to investigate the problem and to assess our options.

#### Daniel Selsam (Jan 17 2020 at 00:30):

Daniel Selsam Do you have the name and line numbers for this instance in mathlib?

Unfortunately, no. It would take quite a bit more bookkeeping to get location information for arbitrary typeclass queries, and I won't have time or CPU cycles for this for several days. Is somebody else able to investigate in the meantime with the new (and much improved) leancrawler? It is very well-documented. Here is what I suggest to start: create a lean file that imports all of mathlib, export it to a .yaml as described in the README, and run a simple script like the following:

from crawler import LeanLib
import networkx

lib = LeanLib.from_yaml('FindBig', 'export.yaml')

decls = list(lib)

print("\nTop-10 biggest decls by type:\n")
decls.sort(key=(lambda decl: - decl.type_size))
for i, decl in enumerate(decls[:10]):
print(decl, decl.type_size, decl.type_dedup_size)

print("\nTop-10 biggest decls by value:\n")
decls.sort(key=(lambda decl: - decl.value_size))
for i, decl in enumerate(decls[:10]):
print(decl, decl.value_size, decl.value_dedup_size)


#### Daniel Selsam (Jan 17 2020 at 00:31):

I just ran this, but only importing a few mathlib theories. This was the output:

Top-10 biggest decls by type:

category_theory.limits.is_colimit.hom_desc 35592 1159
category_theory.limits.is_limit.hom_lift 7508 1350
gcd_domain.mk.inj 6650 3306
gcd_domain.mk.inj_arrow 6643 3319
algebraic_geometry.PresheafedSpace.ext 6407 549
decidable_linear_ordered_semiring.mk.inj 6387 3709
decidable_linear_ordered_semiring.mk.inj_arrow 6378 3758
lattice.complete_boolean_algebra.mk.inj 6252 4008
lattice.complete_boolean_algebra.mk.inj_arrow 6241 4025
linear_ordered_semiring.mk.inj 5958 3444

Top-10 biggest decls by value:

algebraic_geometry.PresheafedSpace.stalk_map.comp 63928223 16816
emetric.countable_closure_of_compact 6553913 35855
real.exists_sup 6422084 11798
Top.presheaf.stalk_pushforward.comp 4524414 13016
category_theory.equivalence.unit_inverse_comp 3779575 9474
is_measurable.diff_null 1158921 7082
Top.presheaf.stalk_pushforward.id 899051 5074


#### Daniel Selsam (Jan 17 2020 at 01:29):

Also, to stress: most parts of the system are very careful to only traverse terms as DAGs. So, even with the status quo I don't think a term with a $2^{10}$ tree-to-DAG ratio would in general suffer anywhere near $2^{10}$ overhead overall. But if 10 becomes 20 or 30, then any weak link anywhere in the system could cause epic slowdowns.

#### Johan Commelin (Jan 17 2020 at 05:49):

This is completely crazy. The top two entries are

category_theory.limits.is_colimit.hom_desc 35592 1159
category_theory.limits.is_limit.hom_lift 7508 1350


which are exactly each others duals. So why does the first one get a 35592 and the second one a 7508.

#### Johan Commelin (Jan 17 2020 at 05:49):

@Daniel Selsam Thanks for this little demo. I'm sure we can find a machine to crunch some numbers for us.

#### Daniel Selsam (Jan 24 2020 at 01:50):

Here is the experiment I suggest starting with:

• import all of mathlib (as described at https://github.com/leanprover-community/mathlib/pull/1281)
• use leancrawler and the script above to find the biggest top-level types and values
• for the def with the biggest value, list the K biggest subterms that appear >N times for some big Ns.
• if there is a blowup, hopefully this list will provide a strong hint as to the cause

#### Tim Daly (Jan 24 2020 at 11:18):

There is a phenomena in computer algebra called "intermediate expression swell". A computation can start with a small input and end with a small output but the intermediate steps generate many megabytes. I wonder if there is a similar phenomena that occurs in certain proofs. It may not be a "bug" per se, just an emergent property of the particular proof technique.

#### Daniel Selsam (Jan 24 2020 at 18:33):

Ran the script posted above on all of mathlib:
"""
Top-10 biggest decls by value:

tangent_bundle_model_space_chart_at 2499367138 21824
Gromov_Hausdorff.totally_bounded 2448909819 42588
...
"""
With commas: 21,824 -> 2,499,367,138 which is >100,000x tree-to-dag ratio.

#### Daniel Selsam (Jan 24 2020 at 18:37):

The proof seems to be mostly simp. Could someone who understands the proof comment on why there might be so much sharing?

#### Johan Commelin (Jan 24 2020 at 18:39):

@Sebastien Gouezel Is the expert on that proof. @Daniel Selsam I'm trying to understand what you mean. What are those numbers representing?

#### Johan Commelin (Jan 24 2020 at 18:39):

The number of nodes in a dag (resp. tree) representation of the proof?

#### Daniel Selsam (Jan 24 2020 at 18:40):

The number of nodes in a dag (resp. tree) representation of the proof?

Yes.

#### Daniel Selsam (Jan 24 2020 at 18:46):

To clarify: this kind of repeated subterm might not be a problem at all. Based on Lean3's --profile report, it seems to only take a few seconds to elaborate and type-check it, and the moderate slowness might not even have anything to do with the tree size.

#### Johan Commelin (Jan 24 2020 at 18:51):

I can't really parse this proof. But it's clear that it's doing some massive simps (-;

#### Johan Commelin (Jan 24 2020 at 18:52):

I don't think I can really help here

#### Rob Lewis (Jan 24 2020 at 18:53):

There's something kind of funny going on there with the simps followed by erw too.

#### Kevin Buzzard (Jan 24 2020 at 19:05):

@Daniel Selsam just to let you know in case you didn't already -- this work by @Sebastien Gouezel on real manifolds is mathematics which is not in any of the other theorem provers. It is interesting to note that it is also (final year) undergraduate-level mathematics. I have argued that the fact that Lean could do it at all means that Lean is more suitable for formalising all mathematics than Isabelle is. Isabelle is good at analysis but never got this far into the complex definitions in the theory.

#### Sebastien Gouezel (Jan 24 2020 at 19:39):

Ran the script posted above on all of mathlib:
"""
Top-10 biggest decls by value:
Top-10 biggest decls by value:

tangent_bundle_model_space_chart_at 2499367138 21824
Gromov_Hausdorff.totally_bounded 2448909819 42588
"""
With commas: 21,824 -> 2,499,367,138 which is >100,000x tree-to-dag ratio.

It's funny, I am responsible for both of them in two unrelated areas. What I can tell is that both involve complicated objects, but otherwise they are of a somewhat different nature. The first one has big simps, but otherwise it is mostly unfolding definitions. The main unusual point in the second one involves constructing new types depending on some objects in the proof.

#### Sebastien Gouezel (Jan 25 2020 at 09:55):

For the very big tree-to-dag ratio in Gromov_Hausdorff.totally_bounded, this proof uses a lot the Hausdorff distance. I remember that at the beginning these proofs were super-slow, and this was solved by making Hausdorff distance irreducible. If I understand correctly, making something irreducible does not change anything for the kernel, which still has to deal with the full definition. So maybe the definition of Hausdorff distance is already big, and since it appears a lot of times in the proof this might explain what is going on. Hausdorff distance is defined in terms of supremums of infimums of real numbers, which are themselves defined with Cauchy sequences, so depending on the inner workings this might definitely be huge.

#### Mario Carneiro (Jan 25 2020 at 09:57):

The kernel will only see this if it is unfolded. Making it irreducible should prevent the elaborator from creating terms which require such unfolding

#### Mario Carneiro (Jan 25 2020 at 10:04):

I think what is actually happening is that those have statements are not actual haves, they are repeated subterms. For example, I can clearly see the full proof of

  have : ∀p:GH_space, ∃s : set (p.rep), ∃N ≤ K n, ∃E : equiv s (fin N),
p ∈ t → univ ⊆ ⋃x∈s, ball x (u n),


copied many times in the #print output

#### Mario Carneiro (Jan 25 2020 at 10:35):

Hm, I tried writing a variant of have that actually inserts beta redexes in the term, but they might be optimized away by intro or another tactic

#### Mario Carneiro (Jan 25 2020 at 10:40):

Looks like intro_core will check to see if the current result has the form ?m a and replaces it with ?m'[a], causing the substitution to occur when the metavariable is filled

#### Mario Carneiro (Jan 25 2020 at 10:46):

example : true :=
begin
tactic.assertv_core H (true) (trivial : true),
tactic.result >>= tactic.trace, -- ?m_1 trivial
intro z,
tactic.result >>= tactic.trace, -- ?m_1[trivial]
exact trivial,
tactic.result >>= tactic.trace, -- trivial
end


#### Mario Carneiro (Jan 25 2020 at 10:49):

It's a nice optimization that probably avoids a lot of superfluous redexes, but it's not obvious how to work around it from the lean side. I think you just have to detect these on the C++ side and introduce have`s then. (Lean can't see subterm sharing, which limits the ability to do anything about this. It also means that you can't write any efficient dag traversal tactics in lean. I guess lean 4 must have fixed this, if you wrote the elaborator in lean.)

Last updated: May 10 2021 at 19:16 UTC