Zulip Chat Archive

Stream: general

Topic: processing the import graph


Eric Wieser (Dec 09 2021 at 07:14):

I think the graphviz output from leanproject produces a graph laid out in that way, but I think the result is far too cluttered to be interpretable.

Chris Lovett (Dec 09 2021 at 07:27):

Eric Wieser said:

I think the graphviz output from leanproject produces a graph laid out in that way, but I think the result is far too cluttered to be interpretable.

I might know some tricks. Can someone point me to the .dot file behind that graph? Even better the tool that generates the dot graph so I can also run it on the Lean4 codebase?

Eric Wieser (Dec 09 2021 at 07:31):

A raw and auto-updated gexf file is available at https://leanprover-community.github.io/mathlib_docs/import.gexf

Eric Wieser (Dec 09 2021 at 07:32):

I think leanproject import-graph can generate a .dot file, but it's very slow and it's easy and faster to generate one from the gexf

Johan Commelin (Dec 09 2021 at 07:34):

@Chris Lovett @Eric Wieser I changed this to a new topic, because I think it's interesting enough to deserve further discussion.

Johan Commelin (Dec 09 2021 at 07:35):

@Chris Lovett I don't think that the existing tools will work directly on Lean 4 code.

Eric Wieser (Dec 09 2021 at 07:38):

If you can generate a similar gexf file you can point the online import viewer at it and it will mostly work

Junyan Xu (Dec 09 2021 at 08:09):

It seems what's asked by @Chris Lovett is along the lines of this:

The problem of order preserving hierarchical agglomerative clustering can be said to belong to the family of acyclic graph partitioning problems (Herrmann et al., 2017). If we consider the
strict partial order to be a directed acyclic graph (DAG), the task is to partition the vertices into groups (levels?) so that the groups together with the arrows still makes up a DAG.

From the paper https://arxiv.org/abs/2004.12488 that I just found

Chris Lovett (Dec 09 2021 at 09:39):

Thanks for the link. I did grouping by simply splitting the names on "." and ":" and made a hierarchical graph that also automatically creates links between parent groups if any child node in group A has a link to a child node in group B. Then I run tarjans algorithm on the resulting top level graph for mathlib, and it is essentially one big strongly coupled component (very circular, everything is reachable from everything pretty much), see all the red in the diagram.

So when this happens another trick is to "slice and dice" by throwing away detail - for example, I can take one node and focus down to it's immediate neighborhood ( only links 1 step away from that node), and break all circularity turning it into a tree, and I saved each of those with the name of that node, for example data.svg or group_theory, from this we find there are some top level things like representation_theory and there are some bottom level things like order, data, tactic and logic but number_theory kind of turns that upside down). You can also drill down and expand each one of these areas and see what is inside.svg and you find that set_theory has a pretty loverly internal structure... I also see some diamonds :-)

Johan Commelin (Dec 09 2021 at 10:01):

That first graph looks like some Chinese dragon (-;

Chris Lovett (Dec 10 2021 at 05:22):

Here's another view inside "tactic" namespace, while it is complex, unlike most of the other namespaces it does not have any cycles which means it has very nice top to bottom layering.
image.png

Chris Lovett (Dec 10 2021 at 05:26):

Group theory on the other hand has a couple cycles in it:
image.png

Chris Lovett (Dec 10 2021 at 05:27):

What I don't know is whether the "import" graph is the right one to be looking at... I mean I guess it should be interesting because it implies some sort of type dependencies across those imports otherwise they would not be there, assuming we've done proper garbage collection on unnecessary imports.

Mario Carneiro (Dec 10 2021 at 06:00):

You should be looking at files, not folders. All of the cycles in your graph are coming from folders

Mario Carneiro (Dec 10 2021 at 06:02):

In particular, folders represent "topical" hierarchy, while files respect dependency hierarchy. A given topic might need to be revisited several times with more and more dependencies, and any given mathematical topic has no upper bound on the dependencies it might need, so you are sure to get cycles if you merge all nodes within a folder


Last updated: Dec 20 2023 at 11:08 UTC