Zulip Chat Archive
Stream: lean4
Topic: lake show dependencies
Eric Wieser (May 01 2023 at 10:36):
Is there a way to ask lake
to show the dependencies of a file without building that file and everything below it from scratch?
Alex J. Best (May 01 2023 at 11:36):
There is the builtin
lake env lean --deps Mathlib/Data/Nat/Basic.lean
but this only shows immediate dependencies (I'm also not sure if it builds as everything is already built for me)
Eric Wieser (May 01 2023 at 12:18):
Starting from everything already being built is fine for me
Eric Wieser (May 01 2023 at 12:19):
I only mentioned a clean build because from that you get all the filenames from the progress messages
Eric Wieser (May 01 2023 at 15:35):
I tried calling lean --deps
recursively, but it's prohibitively slow:
Henrik Böving (May 01 2023 at 15:37):
As I understand it lake as a build system is inherently monadic. That is the dependency graph is dynamic to the point where you cannot know it without executing it. So I would say generally speaking it is impossible to figure this out without executing some part of the build system. But it might be possible in the case where the full power of the monadic part is not used?
Mario Carneiro (May 01 2023 at 15:38):
I'm pretty sure it stages that at least somewhat, i.e. you can get the deps without actually building everything
Henrik Böving (May 01 2023 at 15:40):
Hmmm, I mean I can for example define a facet that branches on the execution of some IO operation and based on that decides to add something as a dependency or not. How are you going to figure that out?
Mario Carneiro (May 01 2023 at 15:40):
You run the IO operation
Eric Wieser (May 01 2023 at 15:41):
I'm very happy to extract the dependency graph from build artifacts
Mario Carneiro (May 01 2023 at 15:41):
I'm not saying that it's a pure function (I mean obviously it's not pure since you have to at the very least read some .lean files), but it doesn't require calling lean on all those files
Eric Wieser (May 01 2023 at 15:42):
I just don't want to have to build all of mathlib from scratch. If lake exe cache get
downloads what I need to get the data through Lake, then that's fine for me
Henrik Böving (May 01 2023 at 15:42):
Eric Wieser said:
I'm very happy to extract the dependency graph from build artifacts
If you figure out how to parse the .trace files that sounds doable
Eric Wieser (May 01 2023 at 15:43):
Surely Lake already has all the information I need internally?
Henrik Böving (May 01 2023 at 15:43):
Yes the deps of a file are stored in the .trace file accompanying it as I understand
Henrik Böving (May 01 2023 at 15:44):
I just dont know what the trace file means...but in e.g. the doc gen facet there is a lot of "mixing of traces" to basically delacre a dependency
Henrik Böving (May 01 2023 at 15:45):
A trace is a piece of data (generally a hash) which is used to verify whether a given target is up-to-date. If the trace stored with a built target matches the trace computed during build, then a target is considered up-to-date. A target's trace is derived from its various inputs (e.g., source file, Lean toolchain, imports, etc.).
ah well, almost /o\
Eric Wieser (May 01 2023 at 15:48):
It sounds like you're telling me that lake never actually constructs the transitive build graph
Mario Carneiro (May 01 2023 at 15:49):
I wouldn't be surprised if it didn't, IIRC it is somewhat expensive
Mario Carneiro (May 01 2023 at 15:49):
Although honestly it feels like I could do pretty well on this problem with a regex parser and no lean calls at all
Eric Wieser (May 01 2023 at 15:50):
I think that's what leanproject does, but it's pretty silly that lean --deps
is so slow that it's easier to use regex
Mario Carneiro (May 01 2023 at 15:50):
it doesn't feel like it should be a computationally expensive task
Eric Wieser (May 01 2023 at 15:50):
I imagine lake env
adds some overhead?
Mario Carneiro (May 01 2023 at 15:50):
I don't think it is that much, and it is a one time cost
Eric Wieser (May 01 2023 at 15:51):
Not in my script above
Eric Wieser (May 01 2023 at 15:51):
I guess I should run my entire script through lake env
?
Eric Wieser (May 01 2023 at 15:51):
Either way, it only takes 4 minutes on the file I care about, and I only need to run it twice
Eric Wieser (May 01 2023 at 15:59):
Update: the overhead is almost entirely lake env
Eric Wieser (May 01 2023 at 15:59):
The script goes from 4 minutes to 20s.
Mario Carneiro (May 01 2023 at 16:03):
this is why we can't have nice things
Mario Carneiro (May 01 2023 at 16:07):
Looking at the code, it is moderately surprising to me (until I give it a bit of thought) that the lean entry point (and what parses the --deps
argument) is part of the C++ part. Has anyone tried to write a lean frontend in lean itself?
Mario Carneiro (May 01 2023 at 16:08):
it's doing some stuff like loading shared libraries that make sense for leanc
but I don't think would be necessary if you just want to run the initial part of the lean module parser
Mario Carneiro (May 01 2023 at 16:10):
oh fun, there is a thing that parses #lang lean4
at the start of files in here. I don't think that works...
Henrik Böving (May 01 2023 at 16:11):
Works anymore! iirc it was used for the switch between the old and the new language frontend back in the bootstrapping days
Mario Carneiro (May 01 2023 at 16:24):
Here's a demo: Make a new lake project deps
with this:
import Lean
open Lean Parser
def main (args : List String) : IO Unit := do
let some (path : String) := args[0]? | throw (.userError "usage: deps <FILE>")
let input ← IO.FS.readFile path
let inputCtx := mkInputContext input path
let (stx, _, _) ← parseHeader inputCtx
for x in stx[1].getArgs do
IO.println s!"{x.getIdAt 2}"
result:
$ time build/bin/deps ../mathlib4/Mathlib/AlgebraicTopology/MooreComplex.lean
Mathlib.Algebra.Homology.HomologicalComplex
Mathlib.AlgebraicTopology.SimplicialObject
Mathlib.CategoryTheory.Abelian.Basic
________________________________________________________
Executed in 29.27 millis fish external
usr time 25.79 millis 271.00 micros 25.52 millis
sys time 0.04 millis 42.00 micros 0.00 millis
Mario Carneiro (May 01 2023 at 16:26):
compare:
$ time lake env lean --deps Mathlib/AlgebraicTopology/MooreComplex.lean
/home/mario/.elan/toolchains/leanprover--lean4---nightly-2023-04-11/lib/lean/Init.olean
./build/lib/Mathlib/Algebra/Homology/HomologicalComplex.olean
./build/lib/Mathlib/AlgebraicTopology/SimplicialObject.olean
./build/lib/Mathlib/CategoryTheory/Abelian/Basic.olean
________________________________________________________
Executed in 349.30 millis fish external
usr time 275.22 millis 320.00 micros 274.90 millis
sys time 76.69 millis 44.00 micros 76.65 millis
Mario Carneiro (May 01 2023 at 16:27):
This of course isn't doing anything about transitive dependencies, but it shouldn't be too hard to bolt that on similar to the python implementation
Mario Carneiro (May 01 2023 at 16:46):
Okay here's a version that does transitive dependencies (within mathlib):
import Lean
open Lean Parser
def main (args : List String) : IO Unit := do
let root :: mod :: _ := args | throw (.userError "usage: deps <ROOT> <MODULE>")
let mut visited : NameSet := {}
let mut to_visit : Array Name := #[mod.toName]
repeat do
let mut next_to_visit : Array Name := #[]
for p in to_visit do
if p.getRoot == `Mathlib then
let path := modToFilePath root p "lean"
let input ← IO.FS.readFile path
let inputCtx := mkInputContext input path.toString
let (stx, _, _) ← parseHeader inputCtx
for c in stx[1].getArgs do
let c := c.getIdAt 2
if !visited.contains c then
next_to_visit := next_to_visit.push c
visited := visited.insert p
to_visit := next_to_visit
if to_visit.isEmpty then
break
for x in visited do
IO.println s!"{x}"
Mario Carneiro (May 01 2023 at 16:48):
$ time build/bin/deps ../mathlib4 Mathlib.AlgebraicTopology.MooreComplex
Mathlib.Algebra.GroupPower.Ring
Mathlib.Data.Sigma.Basic
Mathlib.CategoryTheory.Limits.Preserves.Limits
...
Mathlib.Data.Rat.Defs
Mathlib.Data.Fintype.Powerset
Mathlib.CategoryTheory.Preadditive.Basic
Mathlib.Init.Propext
Mathlib.CategoryTheory.Limits.Shapes.RegularMono
Mathlib.Data.List.NodupEquivFin
Mathlib.Tactic.SimpRw
Mathlib.Data.Set.NAry
Mathlib.Init.Algebra.Classes
Mathlib.Tactic.SplitIfs
________________________________________________________
Executed in 423.00 millis fish external
usr time 391.22 millis 640.00 micros 390.58 millis
sys time 31.88 millis 0.00 micros 31.88 millis
Mario Carneiro (May 01 2023 at 16:49):
it's 1.48s to do the transitive closure of Mathlib
Eric Wieser (May 01 2023 at 17:12):
Does this print them in the order they populate the environment?
Mario Carneiro (May 01 2023 at 17:13):
no, they are I think sorted by Name.quickLt
Mario Carneiro (May 01 2023 at 17:13):
which is basically random
Mario Carneiro (May 01 2023 at 17:14):
if you use RBTree Name Name.cmp
instead of NameSet
for the type of visited
they come out in alphabetical order
Mario Carneiro (May 01 2023 at 17:15):
uh... not alphabetical order, looks like length then alphabetical order by component
Mario Carneiro (May 01 2023 at 17:15):
meh, just pipe it through sort
Mario Carneiro (May 01 2023 at 17:31):
Hm, it looks like there were some bugs in the DFS code I borrowed from the python script which were causing it to revisit files multiple times. This version prints the files out in preorder (postorder is a bit more difficult with this work list implementation), if you care about maintaining a topological sort, and it takes only half the time, 574ms for Mathlib
:
def main (args : List String) : IO Unit := do
let root :: mod :: _ := args | throw (.userError "usage: deps <ROOT> <MODULE>")
let mut out : String := ""
let mut visited : NameSet := {}
let mut toVisit : Array Name := #[mod.toName]
while !toVisit.isEmpty do
let mut nextToVisit : Array Name := #[]
for p in toVisit do
if !visited.contains p then
let path := modToFilePath root p "lean"
let input ← IO.FS.readFile path
let inputCtx := mkInputContext input path.toString
let (stx, _, _) ← parseHeader inputCtx
for c in stx[1].getArgs do
let c := c.getIdAt 2
if c.getRoot == `Mathlib then
nextToVisit := nextToVisit.push c
visited := visited.insert p
out := out ++ s!"{p}\n"
toVisit := nextToVisit
IO.print out
Eric Wieser (May 01 2023 at 17:38):
Does this belong in lake itself as a helper? Or would it go in Std like cache did?
Mario Carneiro (May 01 2023 at 17:42):
I think it could just go in whatever script is using it. A lot of the speed boosts it is getting come from it hardcoding various properties of the use case and not being as robust as lake
Mario Carneiro (May 01 2023 at 17:42):
I mean, it's basically the same length as your original python script
Mario Carneiro (May 01 2023 at 17:44):
We could add some parts of this to std and/or lake to make the script even shorter, but it would have to get quite a bit more complex and probably slower if it was upstreamed wholesale
Mario Carneiro (May 01 2023 at 17:47):
There is also another version of this program that could be written, which reads the .olean files instead of the .lean files. This is likely even faster, since the oleans actually contain transitive dependencies (they don't, but they do contain the parsed imports)
Mario Carneiro (May 01 2023 at 17:48):
this is a bit closer to what lean itself does
Mario Carneiro (May 01 2023 at 17:49):
Actually I'm not sure if it would be faster, 500ms is hard to beat and I think even just loading the oleans for mathlib already takes that long (maybe it is also paying for initialization though)
Mario Carneiro (May 01 2023 at 17:57):
wowza, 88ms for Mathlib
:
import Lean
open Lean Parser
def main (args : List String) : IO Unit := do
let root :: mod :: _ := args | throw (.userError "usage: deps <ROOT> <MODULE>")
let mut out : Array String := #[]
let mut visited : NameSet := {}
let mut toVisit : Array Name := #[mod.toName]
while !toVisit.isEmpty do
let mut nextToVisit : Array Name := #[]
for p in toVisit do
if !visited.contains p then
let path := modToFilePath root p "olean"
let (mod, _) ← readModuleData path
nextToVisit := mod.imports.foldr (init := nextToVisit) fun i nextToVisit =>
let c := i.module
if c.getRoot == `Mathlib then nextToVisit.push c
else nextToVisit
visited := visited.insert p
out := out.push s!"{p}"
toVisit := nextToVisit
out.forRevM IO.println
$ time build/bin/deps ../mathlib4/build/lib/ Mathlib
...
Mathlib.Util.AssertNoSorry
Mathlib.Util.AtomM
Mathlib.Util.Export
Mathlib.Util.IncludeStr
Mathlib.Util.LongNames
Mathlib.Util.MemoFix
Mathlib.Util.Syntax
Mathlib.Util.SynthesizeUsing
Mathlib.Util.Tactic
Mathlib.Util.Time
Mathlib.Util.WhatsNew
Mathlib.Util.WithWeakNamespace
Mathlib
________________________________________________________
Executed in 88.77 millis fish external
usr time 40.51 millis 781.00 micros 39.73 millis
sys time 47.85 millis 175.00 micros 47.67 millis
Mac Malone (May 07 2023 at 16:30):
Mario Carneiro said:
I wouldn't be surprised if it didn't, IIRC it is somewhat expensive
Lake does compute dependencies and it does it in a few different ways. The direct dependencies are available in the deps
facet and the transitive dependencies in transDeps
. However, using them requires a script; they are not available from the CLI (yet).
Last updated: Dec 20 2023 at 11:08 UTC