Zulip Chat Archive
Stream: PhysLean
Topic: Visual representation of the content in PhysLean?
Joseph Tooby-Smith (Aug 21 2025 at 09:00):
Does anyone know of a nice way which we could represent visually the content of PhysLean, something that could be used in talks etc. so people can see roughly what it contains without having to go into the repo itself.
Kevin Buzzard (Aug 21 2025 at 10:20):
I don't, but I am wondering the same thing about FLT, where I have a blueprint but the graph is chaotic and doesn't really represent what is happening.
Taewoo Lee (Aug 21 2025 at 10:42):
I only recently start learning Lean and I ponder the visualisation would be easier using tree diagram, like family tree. goal + tactic lays another goal and continues... I am not sure I understand your concern, but this is my take.
Joseph Tooby-Smith (Aug 21 2025 at 10:54):
Hi @Taewoo Lee nice to see you here! I think you are thinking about this in terms of individual theorems and proof steps? Here I'm more interested in the global structure of the project e.g. the different topics being worked on, how much covarage each area of physics has etc. Saying that, let me add, that I think (if I've understood correctly) what you may be thinking about already exists in the form of: paperproof .
Joseph Tooby-Smith (Aug 21 2025 at 11:04):
Mathlib has this import graph, which is of a similar flavor to what I am after, but I struggle to see anything in this graph other then: Mathlib is big and lots of files are imported by lots of other files.
Taewoo Lee (Aug 21 2025 at 13:50):
Thanks, @Joseph Tooby-Smith . haha, sorry I have a approximate to 0 knowledge of Lean, so I did suggest completely wrong idea. "paperproof" is identical to the idea I suggested above, and it is no surprise such proof-visulisation programme is already materialised.
Joseph Tooby-Smith (Aug 21 2025 at 14:47):
Ha, no worries @Taewoo Lee ! I think my original message probably lacked key words, which made what I was asking for confusing :).
Joseph Tooby-Smith (Aug 22 2025 at 07:32):
Ok, so this github workflow in a branch called Diagram:
workflow
Generates the following diagram: https://github.com/HEPLean/PhysLean/blob/Diagram/diagram.svg. It's not ideal, but it is along the lines I was thinking.
Shlok Vaibhav (Aug 22 2025 at 17:30):
I have an alternate scheme in mind, using python generate a hierarchical tree data structure from file and folder hierarchies in the repo. Then all hierarchical data visualizations can be easily applied as shown below
image.png
One added benefit is that customizability to any level, all data in files can be read and just like Taewoo suggested, even statistics about theorems can be visualized or even how the size of repo or a folder is growing with time/users. May help with internal planning and analysis as well
Shlok Vaibhav (Aug 22 2025 at 17:31):
If this seems relevant, i can share and implement a quick basic scheme, maybe a treemap and everyone can give feedback on this approach?
Joseph Tooby-Smith (Aug 22 2025 at 18:08):
Yeah something like this would be great, and seems very relevant! I would especially like to be able to visualize how things change with time!
Daniel Morrison (Aug 23 2025 at 00:31):
I would be interested to see the code for this as well, I was wondering if there was a way to use the hierarchical structure to algorithmically determine the best places to start learning a repo. Not just for PhysLean but also Mathlib. But I'm not sure how to start pulling that kind of data.
Shlok Vaibhav (Aug 23 2025 at 02:45):
So two separate aims, i found lots of tools online to visualize the meta-data of a repo, this seems reasonable given ubiquity of need of such tools. The easiest one as mentioned in this blog: https://cupprum.github.io/git-render/ (Took 5 mins to setup, two py packages - gitpython to generate the data structure and plotly to generate treemap) generates an interactive tree map and a sunburst where you can keep descending. (The commits are shown on log10 scale). I will share list of some more tools that i feel relevant, like found a tool that shows growth of commits/files over time (https://github.com/wdm0006/git-pandas, though very few stars so needs scrutiny). As to the original problem of generating good visualizations to convey the scope and activity of PhysLean, will need to explore more data visualization tools like plotly as text-visibility is also crucial thereimage.png
image.png
Here's a short video of using the treemap and sunburst:
PhysLean_Plotly.mp4
Shlok Vaibhav (Aug 23 2025 at 02:51):
This is the code from that blog:
import os
import plotly.express as px
from git import Repo
import numpy as np
repo = Repo("./Github/PhysLean")
blobs = [{
'parent': os.path.dirname(blob.path) or "/",
'path': blob.path,
'commits': np.log10(len(commits := list(repo.iter_commits(paths=blob.path)))),
} for blob in repo.tree().traverse()]
px.treemap(
blobs,
names='path',
parents='parent',
color='commits',
color_continuous_scale='amp',
).show()
Joseph Tooby-Smith (Aug 23 2025 at 09:29):
Nice! This looks awesome! Do you know if this something which can be hosted with a simple python server?
Daniel Morrison (Aug 23 2025 at 23:16):
I realized what I wanted was actually the import structure and not file structure, and found import-graph which you get with Mathlib and can generate graph structures of the imports.
Joscha Mennicken (Aug 23 2025 at 23:23):
If you're looking for cool visualizations of git repo contents over time (including who is changing which files), there's also gource. It's fun to look at, and you can spot things like large rewrites, or which contributors usually touch which areas. It can be a bit difficult to spot individual files though.
Daniel Morrison (Aug 24 2025 at 06:27):
I was able to get some code working for determining the transitive imports and transitive dependents based on the import graph I mentioned above. Here's a test plot of the number of transitive dependents versus the number of transitive imports, and a zoom in excluding the outliers. Does this mean anything? Idk lol
Joseph Tooby-Smith (Aug 24 2025 at 06:51):
Nice @Joscha Mennicken thanks for sharing! Will have to explore how to get this working for PhysLean!
Joseph Tooby-Smith (Aug 24 2025 at 06:54):
@Daniel Morrison ooh thanks! I think the outlier at 350+ (to the right) is the main file PhysLean.lean, right? This does actually have a lot of benefit I think. In particular for speed of building, we want the number of transitive imports to be as small as possible, so focusing on reducing it for those files with high transitive imports would be a good task.
Joseph Tooby-Smith (Aug 24 2025 at 06:55):
Would be interested in the outlier to the top left is?
Daniel Morrison (Aug 24 2025 at 07:02):
Maybe a more interesting example, I made a histogram of the product of the fraction of files imported times the fraction of files that depend on it as a simple model of measuring the importance of a file. Basically it's useful in that many files use that information, but not so low level that you import everything.
As expected, most files are basically zero, but there is a separation of some outliers above .015. These are those outliers:
- PhysLean.SpaceAndTime.SpaceTime.Basic
- PhysLean.Relativity.Tensors.RealTensor.Vector.MinkowskiProduct
- PhysLean.Relativity.Tensors.RealTensor.Vector.Basic
- PhysLean.Relativity.Tensors.RealTensor.Metrics.Basic
- PhysLean.Relativity.Tensors.RealTensor.Basic
- PhysLean.QFT.PerturbationTheory.FieldSpecification.Basic
Interestingly 5 of the 6 are the Basic files in different folders, ones that I would usually go to as a starting point when understanding the project. So maybe there is something to this measure of "importance".
Daniel Morrison (Aug 24 2025 at 07:05):
You are right that the outlier to the right is PhysLean.lean, because by construction it imports everything. I'll check what the other outliers are, but I would guess that maybe they are the meta files.
Shlok Vaibhav (Aug 24 2025 at 07:07):
Joseph Tooby-Smith said:
Nice! This looks awesome! Do you know if this something which can be hosted with a simple python server?
I assume here we want to consider the ways to put something like this interactive on physlean homepage?
This implementation works on a local clone of the repo, so I think this is not something that can be put on physlean. So need to look for other ways
Daniel Morrison (Aug 24 2025 at 07:09):
I was correct, the outliers are
- PhysLean.Meta.TODO.Basic: 270 files
- PhysLean.Meta.Informal.Basic: 168 files
While I'm doing this anyway, the files with 80+ imports are all in QFT.PerturbationTheory.WickAlgebra and QFT.PerturbationTheory.WickContraction.
Joseph Tooby-Smith (Aug 24 2025 at 07:15):
Nice @Daniel Morrison ! This all makes sense, and this seems very useful information!
Related to @Shlok Vaibhav I would be interested in putting this sort of thing on the website, or at least having an automated workflow which could allow us to see this sort of information updated on new commits.
Joseph Tooby-Smith (Aug 24 2025 at 07:18):
I assume here we want to consider the ways to put something like this interactive on physlean homepage?
This implementation works on a local clone of the repo, so I think this is not something that can be put on physlean. So need to look for other ways
Ok, it may still be possible with a local clone. There is a server for the PhysLean website, basically to run:
- https://live.physlean.com
- https://loogle.physlean.com (which is currently down and needs fixing)
But 'static' ways of displaying it are naturally better and easier to maintain.
Joseph Tooby-Smith (Aug 24 2025 at 07:20):
Daniel Morrison said:
While I'm doing this anyway, the files with 80+ imports are all in QFT.PerturbationTheory.WickAlgebra and QFT.PerturbationTheory.WickContraction.
Ok, I think this somewhat makes sense. Those files are part of the proof of Wick's theorem in QFT and setting up perturbation theory, as such they depend on a lot of the relativity stuff etc.
Shlok Vaibhav (Aug 24 2025 at 12:32):
image.png
Cleaned up it little just for presentation ( but no distinguishing information amongst code files of same folder)
Code:
import os
import plotly.express as px
import itertools
from git import Repo
import numpy as np
import pandas as pd
repo = Repo("./Github/PhysLean")
physlean_path = os.path.join(repo.working_tree_dir, "PhysLean")
folders = [name for name in os.listdir(physlean_path) if os.path.isdir(os.path.join(physlean_path, name))]
folder_map = {folder: idx + 1 for idx, folder in enumerate(folders)}
print(folder_map)
blobs = [{
'parent': os.path.dirname(blob.path) or "/",
'idee': blob.path,
'labeling': blob.path.rsplit("/", 1)[-1].removesuffix("lean"),
#'commits': len(commits := list(repo.iter_commits(paths=blob.path))),
'value': (
folder_map.get(blob.path.split("/")[1], 0)
if len(blob.path.split("/")) > 1 and blob.path.startswith("PhysLean/")
else 0
)} for blob in repo.tree().traverse() if "PhysLean" in blob.path.rsplit("/", 1)[0]]
print(blobs[-1])
fig = px.treemap(
blobs,
names='labeling',
ids = 'idee',
parents='parent',
labels = 'labeling',
color='value',
#hover_data=['commits'],
#values='commits',
color_continuous_scale='plasma',
).show()
Shlok Vaibhav (Aug 24 2025 at 12:33):
Tried to ensure that every file name is visible above, colormap choice needs to be better. Tried spectral color scale but the yellow hue needs to be avoided, so need to change the value assignment to folders:
image.png
Joseph Tooby-Smith (Aug 24 2025 at 14:49):
This looks great @Shlok Vaibhav , do the size of the boxes represent the sizes of the files?
A more general comment: the PhysLean repo has a ./scripts/... directory which may be a good place for this kind of program, if you and @Daniel Morrison would be interested in contributing them there? We could then think about integrating them into the website, but it would be nice to make sure they nevertheless don't get lost.
Daniel Morrison (Aug 24 2025 at 19:40):
Yeah that's a good idea. I have some ideas for stuff to add and I want to clean up some of the code first but I'll eventually share it
Shlok Vaibhav (Aug 25 2025 at 01:35):
@Joseph Tooby-Smith ,have added the file size part now.
Yes, it would be good to have such a treemap on website because i found it informative to look at, didn't know before there was thermodynamics as well.
image.png
Joseph Tooby-Smith (Aug 25 2025 at 06:09):
@Shlok Vaibhav What is the outputted format of this, e.g. is it svg, or html or just png?
Shlok Vaibhav (Aug 25 2025 at 06:52):
Default is an interactive widget which is what I generate, it allows you to zoom into a portion as well as hover over to view some data like commits, file size etc. the output can also be saved as png or html. Probably just having png is good enough as presently all files can be comfortably seen in the image, later on interactive may become important as files grow numerous or some metric is to be seen on hovering
Shlok Vaibhav (Aug 25 2025 at 06:55):
What do you feel needs to be emphasized in such plots to be usable in 1) talks 2) website . For now, i emphasized visibility of all files and file size
Joseph Tooby-Smith (Aug 25 2025 at 06:57):
Shlok Vaibhav said:
Default is an interactive widget which is what I generate, it allows you to zoom into a portion as well as hover over to view some data like commits, file size etc. the output can also be saved as png or html. Probably just having png is good enough as presently all files can be comfortably seen in the image, later on interactive may become important as files grow numerous or some metric is to be seen on hovering
Nice, ok, thanks this makes sense.
Joseph Tooby-Smith (Aug 25 2025 at 07:02):
Shlok Vaibhav said:
What do you feel needs to be emphasized in such plots to be usable in 1) talks 2) website . For now, i emphasized visibility of all files and file size
I actually think this is already emphasizing the right things - it shows what areas have more development then others, and I think in talks it would be pretty easy to point to different areas of the project to demonstrate specific things which has been done.
For the website, I think the ideal situation would be to have a github workflow which automatically generates this diagram either as html or png. Do you think this would be possible?
Shlok Vaibhav (Aug 25 2025 at 07:12):
I too feel it will be necessary to find such automation, I will try to find how to incorporate into GitHub workflow, never dealt with such automations before :sweat_smile:
Joseph Tooby-Smith (Aug 25 2025 at 08:46):
I think for such automation the starting point is to write down the command line prompts one would need to e.g. generate the image, since this makes up the bulk of the workflow. There are specific commands to e.g. clone a repo, download python etc. As an example, this workflow on the PhysLean website repo clones the main PhysLean repo and builds it and then performs some basic commands.
Joscha Mennicken (Aug 25 2025 at 11:19):
I've opened a PR with a python script that generates an mp4 video using gource at PhysLean#711. You may want to add more authors and/or tweak some gource and ffmpeg options before merging.
Joscha Mennicken (Aug 25 2025 at 11:20):
If you add authors to the script, their GitHub profile pictures will be used instead of a generic placeholder in the final video.
Joseph Tooby-Smith (Aug 25 2025 at 13:13):
Nice! Thanks @Joscha Mennicken , I think with the author list it is probably easier to either scrape it from GitHub or leave it empty. Maybe the latter is the simpler?
Joscha Mennicken (Aug 25 2025 at 13:19):
I think scraping GitHub would be a bit more complicated. The authors list exists not only for the fancy profile pictures, but also to ensure that commits under different aliases are recognized as by the same person. For example, you have made commits both as jstoobysmith as well as Joseph Tooby-Smith, which would be displayed as two different avatars by default.
Joseph Tooby-Smith (Aug 25 2025 at 15:53):
Ok, thanks for explaining! I think the most democratic thing to do then is for now to leave the author list empty, and if someone gets chance latter they could fill it in with everyone that has contributed, then that's all good. I'm happy to make this change to the PR - just let me know.
Also, didn't realize we could use the notation PhysLean#711 :joy: , I wonder when that got added :slight_smile: .
Joscha Mennicken (Aug 25 2025 at 16:18):
It was added earlier today: https://leanprover.zulipchat.com/#narrow/channel/236604-Zulip-meta/topic/PhysLean.20linkifier.20broken/near/536029225
Joseph Tooby-Smith (Aug 25 2025 at 16:21):
Awesome thanks for getting that sorted @Joscha Mennicken !
Joscha Mennicken (Aug 25 2025 at 16:54):
Note that when you run my script, it shows you the list of names that it doesn't yet know authors for. Completing the authors list is a matter of going through those names and assigning them GitHub account names and aliases.
Feel free to edit or remove the author list as you wish.
Shlok Vaibhav (Aug 26 2025 at 01:23):
Update: Had Silly query about workflow, got resolved on its own, trying to workflow with python on a simple repo, seems doable, thanks for pointing to the action! Using that in conjunction with this guide on using python in workflows: https://github.com/actions/setup-python
image.png
Shlok Vaibhav (Aug 31 2025 at 16:37):
Sorry for the delay, I am able to run a workflow that creates and commits a png file. Have raised a PR (724).
Last updated: Dec 20 2025 at 21:32 UTC