Zulip Chat Archive

Stream: general

Topic: mathlib graph

Johan Commelin (Sep 10 2024 at 04:47):

What is the best way to get the directed graph of all mathlib declarations (as nodes) and their direct dependencies (as edges)? I know this will be a massive dataset, but I think it should still be manageable? (Probably hard to visualize though.)

Damiano Testa (Sep 10 2024 at 05:12):

You want the edges to be just the "covering relations", right? I've been wanting something like this as well, but at the moment I do not think that there is a single command that does this.

Johan Commelin (Sep 10 2024 at 05:48):

Yup, if lemma foo mentions def bar in its proof term, then there should be an edge foo -> bar.

Damiano Testa (Sep 10 2024 at 05:49):

Ah, that is different from what I thought. I often want the smallest set of edges whose transitive closure is the full graph, while you want the full graph, I believe.

Damiano Testa (Sep 10 2024 at 05:50):

(For me a "covering relation" is an edge that cannot be replaced by a path of length at least two.)

Damiano Testa (Sep 10 2024 at 05:55):

I have an experiment computing the "longest" declaration and I wanted to improve it to finding "maximal" declarations, but a naive approach did not work. So I think that you are right that it should be doable, but you probably need to know what you are doing!

Damiano Testa (Sep 10 2024 at 05:55):

Once I'm back at a computer, I can send you the links to the experiments that I made.

Kim Morrison (Sep 10 2024 at 06:18):

@Johan Commelin, I recently implemented this for the gray shading in lake exe graph.

See def Name.transitivelyUsedConstants (n : Name) : CoreM NameSet := do in import-graph.

Kim Morrison (Sep 10 2024 at 06:19):

There's existing code in import-graph for doing transitive reduction given Name -> NameSet function, currently used where the names are modules. But the same code should work when the names are delcarations, to giving the covering graph that @Damiano Testa wants.

Kim Morrison (Sep 10 2024 at 06:21):

i.e. transitiveReduction

Johan Commelin (Sep 10 2024 at 06:29):

Aah, right. I don't want the transitive reduction. Or maybe I also want that, at a later stage.

Jovan Gerbscheid (Sep 10 2024 at 07:38):

I was experimenting with this before. Here's the code I used to construct the HashMap of all mathlib constants with their proof term (or type if it doesn't have a definition, e.g. an axiom or inductive):

def isBadDecl (n : Name) (cinfo : ConstantInfo) (env : Environment) : Bool :=
  (cinfo matches .thmInfo .. | .opaqueInfo ..)
    || cinfo.isUnsafe
    || cinfo.isPartial
    || n.isInternalDetail
    || isAuxRecursor env n
    || isNoConfusion env n

def allDependencies : MetaM (Std.HashMap Name (Array Name)) := do
  let env ← getEnv
  return env.constants.map₁.fold (init := {}) fun map name cinfo =>
    if isBadDecl name cinfo env then map else
    let e := match cinfo.value? with
      | none => cinfo.type
      | some val => val
    let consts := e.foldConsts #[] fun n arr =>
      if isBadDecl n (env.constants.find! n) env then
        arr
      else
        arr.push n
    map.insert name consts

In particular, the function Expr.foldConsts is what I used to get the constants from an expression.

Kim Morrison (Sep 10 2024 at 07:52):

This doesn't do transitive dependency?

Kim Morrison (Sep 10 2024 at 07:52):

And if constants appear multiple times they will be repeated.

Kim Morrison (Sep 10 2024 at 07:53):

Better to use the functions in import-graph.

Johan Commelin (Sep 10 2024 at 07:54):

I actually don't want transitive dependencies.

Jon Eugster (Sep 10 2024 at 07:58):

Btw @Johan Commelin if you have some code doing what you want, I think it would be very helpful if you shared that and we can see what can be integrated in import-graph. That sounds like API that would be useful to have!

Johan Commelin (Sep 10 2024 at 08:03):

Atm I don't have any code (-;

Kim Morrison (Sep 10 2024 at 08:05):

Johan Commelin said:

I actually don't want transitive dependencies.

Ah, you said above you didn't want transitive reduction. We're way ahead of you. :-)

Kim Morrison (Sep 10 2024 at 08:05):

Even not wanting transitive dependencies you should still use import-graph, just because it uses NameMap and NameSet which will be more efficient.

Johan Commelin (Sep 10 2024 at 08:06):

Right, I want neither. I want the raw data first. Some graph theorists asked me for it, and they want to do their graph theory magic on the raw data.

Damiano Testa (Sep 10 2024 at 08:07):

Raw data means just looking at the constants that appear in the Expr for the declaration?

Johan Commelin (Sep 10 2024 at 08:09):

Yep, deduplicating should be fine. But not transitive closures or reductions.

Damiano Testa (Sep 10 2024 at 08:11):

So, for each Expr, you recurse and extract all Names from all the .consts that you see and then for each declaration you do that for the value, if it exists and for the type otherwise. Does this sound right?

Damiano Testa (Sep 10 2024 at 08:12):

I certainly remember doing this and failing to make it in any way efficient! :smile:

Jovan Gerbscheid (Sep 10 2024 at 08:15):

Expr.foldConsts already does the deduplication. It is implemented by checking pointer equality, making it more efficient.

Damiano Testa (Sep 10 2024 at 08:25):

What I think would be very useful to have is the partial order on Names that works on Names that are module names or declaration names.

Damiano Testa (Sep 10 2024 at 08:25):

Maybe it is there now, but I am pretty sure that it was not available a few weeks ago.

Adam Topaz (Sep 12 2024 at 17:28):

@Johan Commelin didn’t I send you such a graph a little while ago?

Adam Topaz (Sep 12 2024 at 17:39):

This code will generate a 351M file (at least with the version of mathlib I have) in less than one minute:

import Mathlib

open Lean

#eval show CoreM Unit from do
  let handle ← IO.FS.Handle.mk "output" .write
  let env ← getEnv
  let mut counter := 0
  for (n,c) in env.constants do
    if ← n.isBlackListed then continue -- or some other filter?
    for d in c.getUsedConstantsAsSet do
      if ← d.isBlackListed then continue
      handle.putStrLn s!"{n} {d}"
      counter := counter + 1
  println! s!"Found {counter} connections"

That file is too big to upload to zulip unfortunately.

Adam Topaz (Sep 12 2024 at 17:44):

(Note that this doesn't even do anything multithreaded, so it's very naive, but it's still reasonably fast)

Daniel Weber (Sep 12 2024 at 17:47):

I don't think simply ignoring blacklisted nodes is the correct thing to do in this context, as that removes some transitive connections

Adam Topaz (Sep 12 2024 at 17:48):

ok...

Adam Topaz (Sep 12 2024 at 17:49):

running it now without the filter...

Adam Topaz (Sep 12 2024 at 17:49):

done. Result is now 912M, still under 2 mins

Adam Topaz (Sep 12 2024 at 17:50):

If you want to run it yoruself:

import Mathlib

open Lean

#eval show CoreM Unit from do
  let handle ← IO.FS.Handle.mk "output" .write
  let env ← getEnv
  let mut counter := 0
  for (n,c) in env.constants do
    --if ← n.isBlackListed then continue -- or some other filter?
    for d in c.getUsedConstantsAsSet do
      --if ← d.isBlackListed then continue
      handle.putStrLn s!"{n} {d}"
      counter := counter + 1
  println! s!"Found {counter} connections"

Johan Commelin (Sep 12 2024 at 17:54):

Adam Topaz said:

Johan Commelin didn’t I send you such a graph a little while ago?

Possibly, my brain is a bit like a colender sometimes...

Adam Topaz (Sep 12 2024 at 17:57):

BTW, if you want to run this, I recommend doing the following:

#check show CoreM Unit from do
  let handle ← IO.FS.Handle.mk "output" .write
  let env ← getEnv
  let mut counter := 0
  for (n,c) in env.constants do
    --if ← n.isBlackListed then continue -- or some other filter?
    for d in c.getUsedConstantsAsSet do
      --if ← d.isBlackListed then continue
      handle.putStrLn <| Json.mkObj [("src",toJson n),("dep",toJson d)] |>.compress
      counter := counter + 1
  println! s!"Found {counter} connections"

just because processing the lines in the file and parsing them as JSON is easier this way. This makes it quite easy to get the data into python for example.

Johan Commelin (Sep 12 2024 at 18:00):

Ok, and now I should figure out how to run a single Lean file again to pipe the output to a json file.

Adam Topaz (Sep 12 2024 at 18:00):

you can just run this in your editor. It will produce a file called "output" with json objects on each line.

Adam Topaz (Sep 12 2024 at 18:01):

you know what... I'll upload the output to my google drive and give you the link so you can just download it without running the file

Johan Commelin (Sep 12 2024 at 18:02):

oh, I can run it myself, but either is fine

Adam Topaz (Sep 12 2024 at 18:03):

(if anyone wants the file, feel free to DM me)

Adam Topaz (Sep 12 2024 at 18:06):

I like writing json lines like this because it's then real easy to process this with python as follows:

import json

data = []
with open("output",'r') as f:
    for l in f:
        data.append(json.loads(l))

Johan Commelin (Sep 12 2024 at 18:07):

This version seems to include decls like UInt32.reduceOfNat._lambda_1._cstage2, it seems

Adam Topaz (Sep 12 2024 at 18:07):

yeah, because I was told not to filter out blacklisted names :)

Tony Beta Lambda (Sep 13 2024 at 05:03):

https://github.com/reaslab/jixia can already do that, generating both the type dependencies and value dependencies in its "symbol" plugin.

Deming Xu (Sep 13 2024 at 22:57):

https://github.com/semorrison/lean-training-data

There is a "premises" Lean file in this repo:

premises: For each declaration imported in a target file (e.g. Mathlib), list all of its direct dependencies (i.e. constants referenced from its type or proof). Constants appearing in explicit arguments are prefixed *, and constants used by the simplifier are prefixed s.

Last updated: May 02 2025 at 03:31 UTC