Zulip Chat Archive
Stream: general
Topic: import graph
Kenny Lau (Dec 19 2018 at 08:46):
could someone kindly create a graph of imports?
Kenny Lau (May 28 2020 at 07:55):
do we have an import graph online?
Johan Commelin (May 28 2020 at 07:59):
Not yet, I think. It's WIP
Johan Commelin (May 28 2020 at 07:59):
Also... now you have to update the title and first post of your PR :rofl:
Kenny Lau (May 28 2020 at 08:03):
what do you mean?
Johan Commelin (May 28 2020 at 08:04):
Well... you're now attacking nonzero_divisors
as well, right?
Johan Commelin (May 28 2020 at 08:04):
It's less drastic, but it would be nice to mention it.
Johan Commelin (Nov 23 2020 at 10:08):
@Eric Wieser Here are some bash functions that will build the import graph for you within seconds:
function file_to_import {
sed -e 's/src\///;s/.lean//;s/\//./g'
}
function import_to_file {
sed -e 's/[.]/\//g;s/.*/src\/&.lean/'
}
function list_deps {
find src/ -name '*.lean' \
| xargs grep "^import " \
| sed -e 's/src\///;s/.lean:import//;s/\//./g;s/ --.*//' \
| grep -v "tactic.norm_num data.real.basic" \
| grep -v "tactic.find tactic.find"
}
Eric Wieser (Nov 23 2020 at 10:08):
Where does the code live for leanproject import-graph
?
Johan Commelin (Nov 23 2020 at 10:09):
Output is in tsort
format, in other words: one edge per line
A B
C D
E F
A D
A F
etc...
Eric Wieser (Nov 23 2020 at 10:09):
Searching mathlib-tools
for import-graph
finds only one match in a changelog
Eric Wieser (Nov 23 2020 at 10:10):
ImportGraph
finds a slightly better match, but only an unused definition.
Johan Commelin (Nov 23 2020 at 10:11):
@Eric Wieser it really depends on what you want to do. I guess the rendering is what is taking a long time.
Johan Commelin (Nov 23 2020 at 10:11):
But if you just want the data, I guess the functions I gave above are a good first start.
Eric Wieser (Nov 23 2020 at 10:11):
The default is .dot
though right, which should be trivial to render
Eric Wieser (Nov 23 2020 at 10:11):
Yeah, thanks for the above
Johan Commelin (Nov 23 2020 at 10:12):
Not sure if rendering dot files the size of mathlibs import graph is trivial.
Johan Commelin (Nov 23 2020 at 10:12):
I've never studied graph rendering algorithms
Eric Wieser (Nov 23 2020 at 10:12):
.dot is a text format
Eric Wieser (Nov 23 2020 at 10:12):
I'm not asking it to render the dot
Eric Wieser (Nov 23 2020 at 10:12):
Just write it
Johan Commelin (Nov 23 2020 at 10:12):
list_deps | wc -l
2779
Eric Wieser (Nov 23 2020 at 10:13):
Although I suppose it might be running the layout engine in advance.
Johan Commelin (Nov 23 2020 at 10:13):
Ooh, so you just want a ->
between the source and target, basically. And then a digraph FOOBAR {
at the start of the file, and a }
at the end.
Johan Commelin (Nov 23 2020 at 10:13):
Anyway, feel free to improve those bash functions
Eric Wieser (Nov 23 2020 at 10:13):
Not sure I'd go as far as saying that's what I want, but that's what I imagine import-graph
already does
Eric Wieser (Nov 23 2020 at 10:14):
I just don't know what it actually does because I've never managed to invoke it successfully
Eric Wieser (Nov 23 2020 at 10:14):
Hence the request for the source
Johan Commelin (Nov 23 2020 at 10:14):
leanproject
is producing a dot file with layout instructions, not just the raw graph data.
Eric Wieser (Nov 23 2020 at 10:14):
Patrick Massot (Nov 23 2020 at 12:23):
Eric Wieser said:
Searching
mathlib-tools
forimport-graph
finds only one match in a changelog
What about actually reading that page instead of complaining about GitHub? It says it found two relevant commits.
Patrick Massot (Nov 23 2020 at 12:24):
You'll then see that this command runs lean --deps
to get reliable import information that regex will never give you. However we are pretty strict in mathlib about import lines formatting so ideally the regex based version should almost give the same information (much much faster).
Patrick Massot (Nov 23 2020 at 12:25):
Of course running the full thing on uncompiled mathlib is a very bad idea. But on a compiled mathlib it should take less than 30 seconds.
Eric Wieser (Nov 23 2020 at 12:28):
Yeah, the key takeaway here was that .dot
output is also doing graph layout, which I wasn't expecting. Using graphml
output was much faster
Eric Wieser (Nov 23 2020 at 12:28):
Perhaps a compromise regarding docs would be to add an "imports" and "imported by" section to each page?
Eric Wieser (Nov 23 2020 at 12:28):
That might help with discovering related lemmas
Eric Wieser (Nov 23 2020 at 12:29):
I'll have a go at that once a decision is made about leanprover-community/doc-gen#102, since they'll interact quite a bit and the ImportName
object there will help resolve the paths.
Eric Wieser (Nov 30 2020 at 12:46):
Eric Wieser said:
Perhaps a compromise regarding docs would be to add an "imports" and "imported by" section to each page?
Done in leanprover-community/doc-gen#103
Eric Wieser (Nov 30 2020 at 16:23):
With thanks to @Gabriel Ebner for putting it somewhere sensible on the page
Mario Carneiro (Nov 30 2020 at 16:24):
bit of a funny name
Kevin Buzzard (Nov 30 2020 at 16:24):
good name for a band.
Eric Wieser (Dec 01 2020 at 11:32):
I had a go at building an interactive import graph for my lean project:
https://observablehq.com/@eric-wieser/lean-ga-imports
I don't know how well this would perform with a full mathlib graph...
Yaël Dillies (Jun 17 2021 at 07:41):
This visualisation is great! I want to see the full mathlib.
Yaël Dillies (Jun 17 2021 at 07:45):
But I was having a more pragmatic question: Could we have somewhere in the docs a list of import leaves? I keep encountering files that are never imported and sometimes there's some code that's duplication of a more popular file. So I think that having such a list of orphan files will reduce this accidental duplication, and also provide instantaneous insight so as to where the mathlib frontier is.
Yaël Dillies (Jun 17 2021 at 07:46):
I think this could fit into a bigger idea of making a list of all src
files with a few stats about them (in the same style as the contributors page):
- number of imports
- max depth of import
- number of files importing it
- max depth of subsequent import
- number of lines
- number of declarations
...
Mario Carneiro (Jun 17 2021 at 07:47):
Accidental duplication is best addressed by putting content where it is supposed to go, instead of at the top of the file you are working on like some kind of mini for_mathlib
Yaël Dillies (Jun 17 2021 at 07:49):
Yeah of course, but I'm speaking about a retroactive change. Some duplication has already been done and I want to find it out without roaming through all files.
Mario Carneiro (Jun 17 2021 at 07:50):
if there is any content that is out of place, we should fix that first. Code duplication will become obvious at that point
Mario Carneiro (Jun 17 2021 at 07:51):
If content is out of place there is no guarantee that it will be in a leaf file, unless you subscribe to the thesis that orphan files are new files and review quality has been decreasing over time
Yaël Dillies (Jun 17 2021 at 07:53):
No, I think orphan files are either new and simply yet to be used anywhere (which is fine) or somewhat old with a drift of interest that means nobody is aware of that file anymore, to the risk of redoing the work.
Yaël Dillies (Jun 17 2021 at 07:53):
I'm just trying to have an awareness-raising page.
Mario Carneiro (Jun 17 2021 at 07:54):
I'm still confused as to how this can happen. If you have the theory of foobars it should be in the algebra.foobar
file, and then the next person who wants to do the theory of foobars will write their code, put it in the algebra.foobar
file and then notice that it already exists
Mario Carneiro (Jun 17 2021 at 07:55):
I have on several occasions proved a theorem only to discover that it already exists because lean tells me there is a name clash
Yaël Dillies (Jun 17 2021 at 07:56):
Yeah, but mathematics is the art of giving different names to the same thing :wink:
Mario Carneiro (Jun 17 2021 at 07:56):
I thought it was the other way around?
Yaël Dillies (Jun 17 2021 at 07:57):
Typically, nat.xgcd
and pnat.xgcd
are basically the same thing. But they still both exist. nat.xgcd
is well incorporated in the import graph. pnat.xgcd
is long forgotten.
Mario Carneiro (Jun 17 2021 at 07:58):
In those cases, there is a reason for both to exist
Mario Carneiro (Jun 17 2021 at 07:59):
and in any case, when you want nat.xgcd
you will look for nat.xgcd
and find it, similarly for pnat
Mario Carneiro (Jun 17 2021 at 07:59):
whether or not you knew it existed before you went looking
Yaël Dillies (Jun 17 2021 at 07:59):
Well maybe but I have a hard believing that given that in two years nobody has ever used pnat.xgcd
Mario Carneiro (Jun 17 2021 at 07:59):
I can believe it... both pnat
and xgcd
are individually rare, so the combination is even more so
Mario Carneiro (Jun 17 2021 at 08:01):
but it's not like constants are competing to be the most used thing. You use what you need for the task in front of you. If that is pnat.xgcd
then nothing else will substitute
Yaël Dillies (Jun 17 2021 at 08:01):
This for me underlines what the problem with this file is: It's not really about pnat
. The use of pnat
here is artificial.
Yaël Dillies (Jun 17 2021 at 08:02):
And I still don't see why that's an incentive against having a list of orphan files.
Mario Carneiro (Jun 17 2021 at 08:03):
what is the putative duplication in this example? Do you think pnat.xgcd
does too much work itself instead of delegating to nat.xgcd
?
Yaël Dillies (Jun 17 2021 at 08:03):
Yes. Or the other way around.
Mario Carneiro (Jun 17 2021 at 08:03):
I don't mean to suggest that a list of orphan files isn't useful. I was just responding to the claim about how this relates to code duplication
Yaël Dillies (Jun 17 2021 at 08:03):
Okay :smile:
Mario Carneiro (Jun 17 2021 at 08:04):
graph depictions of mathlib always end up kind of useless for actual consultation
Mario Carneiro (Jun 17 2021 at 08:05):
but a simple database would be easier to work with
Yaël Dillies (Jun 17 2021 at 08:05):
To be fair, I think there's not much code duplication. Mayb some definitions and a few individual lemmas.
Yaël Dillies (Jun 17 2021 at 08:05):
Could one incorporate such a database into a webpage?
Mario Carneiro (Jun 17 2021 at 08:06):
The contributor table looks pretty nice and databasy
Yaël Dillies (Jun 17 2021 at 08:06):
Yeah okay. That's exactly what I want!
Yaël Dillies (Jun 17 2021 at 08:07):
If anybody were to direct me to the relevant resources, I could do my import statistics table myself.
Mario Carneiro (Jun 17 2021 at 08:08):
I guess you just need the same data set that leancrawler uses (cc: @Patrick Massot )
Yaël Dillies (Jun 17 2021 at 08:09):
Yeah most probably. But that means we can't update it regularly? Unless we let leanprover_bot
do it?
Mario Carneiro (Jun 17 2021 at 08:10):
it could be incorporated into the community web page and/or docgen
Eric Wieser (Jun 17 2021 at 08:30):
Perhaps doc-gen should publish its intermediate export json (which I used in the visualization up thread) to make it easier to perform an analysis of import graphs
Eric Wieser (Jun 17 2021 at 08:30):
It's probably as easy as adding a cp
to the CI script somewhere
Eric Wieser (Oct 30 2021 at 21:54):
I had a go at making a full mathlib import graph visualization here:
https://eric-wieser.github.io/doc-gen/import-graph.html
Should this be part of the mathlib docs? Or should doc-gen just host the data file?
Yaël Dillies (Oct 30 2021 at 22:00):
Okay this is scary. Not sure I want to work on a monster...
Yaël Dillies (Oct 30 2021 at 22:01):
Did you remove the transitive imports? It might clutter quite a lot.
Eric Wieser (Oct 30 2021 at 22:01):
All I removed was core:init.default
because that node is stupid
Eric Wieser (Oct 30 2021 at 22:04):
Fun fact; complete_lattice
imports field
: image.png
Mario Carneiro (Oct 30 2021 at 23:11):
This is a really nice visualization. It's really hard to make any kind of dependency graph visualization that doesn't look like a ball of yarn, but the interactivity helps a lot here
Eric Wieser (Oct 30 2021 at 23:14):
There's a reasonable amount of power to tweak the cost function along edges with the library I'm using, which helps a little with untangling the ball of yarn. The heuristic is mainly "edges strengths between files are exponential in the number of prefix components they share", which keeps the colors together (but also induces nasty oscillations in the really deep parts of mathlib!)
Bryan Gin-ge Chen (Oct 30 2021 at 23:21):
This is very well done, thanks!
I don't have a strong preference regarding where the graph itself should be hosted, but the mathlib stats page should definitely link to the graph.
Should this be part of the mathlib docs? Or should doc-gen just host the data file?
I'd just go with whatever is more convenient.
Yaël Dillies (Oct 30 2021 at 23:21):
I am once again asking you to get bored more often.
Eric Wieser (Oct 30 2021 at 23:33):
Let's just do the latter for now: leanprover-community/doc-gen#142
Johan Commelin (Nov 01 2021 at 06:09):
One idea might be to have a fish-eye lens (en/dis-abled via some checkbox) that can follow the mouse. Does sigma.js
support that out of the box?
Eric Wieser (Nov 01 2021 at 09:31):
That would be cool, but I highly doubt it is built in, and writing custom GLSL shaders to achieve it is outside the scope of what seems worthwhile
Eric Wieser (Nov 01 2021 at 10:48):
I've moved the graph to a standalone repo at https://github.com/eric-wieser/mathlib-import-graph (now live at https://eric-wieser.github.io/mathlib-import-graph/).
This version now shows some information about how many files are selected, and node sizes reflect file sizes.
Eric Wieser (Nov 01 2021 at 11:30):
Bryan Gin-ge Chen said:
but the mathlib stats page should definitely link to the graph.
I made a PR at leanprover-community/leanprover-community.github.io#218. Feel free to adjust.
Jireh Loreaux (Aug 23 2022 at 20:10):
So, the import graph is awesome, but there are a few features I think it would be really nice to have. I find that as I have become more accustomed to the library, I have a vague sense of (parts of) the import graph in my head, but sometimes it would be nice to have more specific information. For example, it would be nice to have easy access to each of the following bits of information:
- which files are direct children or parents of a given file, what about grandchildren and grandparents (i.e., transitive import distance = 2), etc.? I know that the first question can be answered by the docs, and therefore the second also, but the clicking and page loads can be tedious sometimes.
- the (transitive) import subgraph of all ancestors, what about all descendents? I know you can see the former in the current import graph, but it would be cool if it could be re-displayed as its own subgraph, perhaps as a DAG.
- The subgraph between a node and one of its descendents (again represented as a DAG). This would be especially handy for knowing whether I should try to compile locally, or just push it to the CI runners and wait for the oleans.
I have no idea how much work it would be to implement any of these things.
Patrick Massot (Aug 23 2022 at 20:13):
Are you familiar with the output of leanproject import-graph --help
?
Yaël Dillies (Aug 23 2022 at 20:13):
For 1., you already have it in the mathlib docs.
Patrick Massot (Aug 23 2022 at 20:14):
Adding more functionalities to leanproject import-graph
is very easy, you can use anything you fancy from https://networkx.org/
Jireh Loreaux (Aug 23 2022 at 20:21):
I had no idea leanproject import-graph
existed, thanks!
Eric Wieser (Aug 23 2022 at 21:29):
Note that there is a gexf file hosted alongside the mathlib docs that contains the graph in a form importable by networkx; which is much faster than building it yourself if you just care about master
Last updated: Dec 20 2023 at 11:08 UTC