Zulip Chat Archive
Stream: general
Topic: Visualizing/quantifying proof complexity
Arndt Schnabel (Mar 06 2025 at 20:03):
I wrote a script to go through all .lean files of Mathlib4 and generate the import dependency graph, then applied a topological sort to that graph so an ordering by dependencies arises (there where a handful of cyclical imports which i arbitrarily broke manually):
lean_toposort.png
Not sure if there is a tool that instead generates a list of theorems in a file (and a list of the specific theorems they depend on for each), that may be interesting to apply such a topological sort too, since different theorems in one file may be of varying complexity.
Arndt Schnabel (Mar 06 2025 at 20:06):
Here is the same idea but with theorems of Metamath's set.mm
proof library
toposort_metamath.png
Arndt Schnabel (Mar 06 2025 at 20:07):
These make me wonder at which depth cutting edge math proofs would be found
Oliver Dressler (Mar 06 2025 at 21:43):
I made a similar visualization of theorem dependencies in Mathlib (v4.15-rc1).
The underlying data is extracted using LeanDojo and processed into a 3MB zip.
Topological sort is a cool idea, I might give that a try and include the resulting complexity in the visualization!
suhr (Mar 06 2025 at 22:03):
Another good thing to compute is the dominator tree of the graph. It is basically "Y is needed only because it's used by X" relation.
suhr (Mar 06 2025 at 22:04):
There's a rather simple algorithm for that: https://www.cs.tufts.edu/~nr/cs257/archive/keith-cooper/dom14.pdf
Michael Rothgang (Mar 07 2025 at 13:04):
Arndt Schnabel said:
there where a handful of cyclical imports which i arbitrarily broke manually
I'm surprised by that statement: I thought/think that import cycles are disallowed. Do you remember one example of such a cycle?
Arndt Schnabel (Mar 07 2025 at 19:04):
Me too, but since i don't understand Lean yet or my script is too simple it may not be correct yet. But it found these:
Mathlib.Data.Complex.Exponential <- Mathlib.Data.Complex.Trigonometric <- Mathlib.Tactic.FunProp <- Mathlib.Tactic.ContinuousFunctionalCalculus <- Mathlib.Algebra.Order.Star.Basic <- Mathlib.Data.Complex.Exponential
Mathlib.Order.Interval.Set.UnorderedInterval <- Mathlib.Order.Interval.Set.OrderEmbedding <- Mathlib.Order.Interval.Set.OrdConnected <- Mathlib.Order.Cover <- Mathlib.Order.Interval.Finset.Basic <- Mathlib.Order.Interval.Multiset <- Mathlib.Order.Interval.Finset.Nat <- Mathlib.Order.Interval.Finset.Fin <- Mathlib.Data.Fintype.Fin <- Mathlib.Algebra.BigOperators.Fin <- Mathlib.Algebra.BigOperators.Finsupp <- Mathlib.Data.Finsupp.Basic <- Mathlib.Data.Finsupp.SMul <- Mathlib.LinearAlgebra.Finsupp.Defs <- Mathlib.LinearAlgebra.Finsupp.LSum <- Mathlib.Algebra.MonoidAlgebra.Defs <- Mathlib.Algebra.Polynomial.Basic <- Mathlib.Tactic.ExtractGoal <- Mathlib.Tactic.Common <- Mathlib.Order.Interval.Set.UnorderedInterval
Arndt Schnabel (Mar 07 2025 at 19:05):
here is the script:
topo.py
Arndt Schnabel (Mar 07 2025 at 19:06):
Oliver Dressler said:
I made a similar visualization of theorem
ooh, LeanDojo looks very useful for this purpose :)
Jireh Loreaux (Mar 07 2025 at 19:06):
Data.Rat.Defs
doesn't import Algebra.Ring.Rat
Arndt Schnabel (Mar 07 2025 at 19:07):
right, strange
Arndt Schnabel (Mar 07 2025 at 19:10):
ah, the order is the other way around
Arndt Schnabel (Mar 07 2025 at 19:17):
okay, i shouldn't have stripped the first component of the path, since it could be Mathlib or Batteries, also, i didn't consider multiline comments - the two remaining "cycles" are probably also only artifacts due to an import example in a comment - which my very basic script doesn't recognize as such
Arndt Schnabel (Mar 07 2025 at 19:25):
updated the lean_toposort.png image above
Arndt Schnabel (Mar 07 2025 at 20:25):
Oliver Dressler said:
The underlying data is extracted using LeanDojo and processed into a 3MB zip.
I hope I understood the data.json structure correctly, but here is a list of the top 3 used theorems of each level of the topological sort on its graph :)
topolevelstop3.txt
main.py
Last updated: May 02 2025 at 03:31 UTC