Zulip Chat Archive
Stream: metaprogramming / tactics
Topic: Counting declarations
Jon Eugster (Sep 01 2024 at 12:09):
What is the best way to get the number of declarations defined in a particular file? Probably from CoreM
...
I found docs#Std.Tactic.Lint.getDeclsInPackage which seems to do something like that, but for example for `Lean
it returns all declarations in any file Lean/XY/_.lean
and I thought having it return 0
would be better in my use case, as Lean.lean
is an empty file (with some imports).
Damiano Testa (Sep 01 2024 at 12:44):
The Environment
contains const2ModIdx
, though you then have to find the correct modIdx
. I think that there is a function that tells you directly the declarations in the current module, but maybe that's not what you want (and I'm on mobile and cannot find it now).
Damiano Testa (Sep 01 2024 at 13:09):
E.g., something like this?
import Mathlib
open Lean Elab Command
elab "#decls_in " id:ident : command => do
let env ← getEnv
match env.moduleIdxForModule? id.getId with
| none => logWarning m!"Sorry, module '{id}' does not exist"
| some modIdx =>
let csts := env.const2ModIdx
let cInM ← csts.foldM (fun a b c => do
if c == modIdx && (! (← b.isBlackListed)) then return a.insert b else return a) ({} : NameSet)
logInfo m!"Found {cInM.size} declarations in '{id}':\n\n{cInM.toArray.qsort (·.toString < ·.toString)}"
#decls_in Mathlib.Data.Nat.Defs
Jon Eugster (Sep 01 2024 at 13:09):
That's docs#Std.Tactic.Lint.getDeclsInCurrModule.
I see, that's roughly what I'm trying now with const2ModIdx
, thanks!
Dumb sanity question, if I have a file with two def
s in there and nothing else, like this here: Batteries/Lean/IO/Process.lean, should the number of decls be 2?
Damiano Testa (Sep 01 2024 at 13:09):
I removed the "blacklisted" decls, otherwise you get a lot of _aux
stuff.
Damiano Testa (Sep 01 2024 at 13:10):
Try filtering out more or less what you want, to see what suits you.
Damiano Testa (Sep 01 2024 at 13:14):
In general, I find it always a little tricky to find out exactly what declarations I care about: normally, everything that is "visible" is something that I would want, to_additive
stuff tends also to be what I want, but anything after that... it depends!
Jon Eugster (Sep 01 2024 at 13:15):
Ah thanks, that's awesome! (I see, I didn't use env.moduleIdxForModule?
so probably got a random index :sweat_smile:
I'm not in Mathlib
so I have to see what to replace isBlacklisted
with (I vaguely remember that Lean had some isInternal
or so.
Damiano Testa (Sep 01 2024 at 13:16):
For the particular file that you mentioned, with isBlacklisted
you do get only two declarations, but there 16 more that are blacklisted.
Damiano Testa (Sep 01 2024 at 13:16):
def isBlackListed {m} [Monad m] [MonadEnv m] (declName : Name) : m Bool := do
if declName == ``sorryAx then return true
if declName matches .str _ "inj" then return true
if declName matches .str _ "noConfusionType" then return true
let env ← getEnv
pure <| declName.isInternalDetail
|| isAuxRecursor env declName
|| isNoConfusion env declName
<||> isRec declName <||> isMatcher declName
Damiano Testa (Sep 01 2024 at 13:21):
This may be a "minimized" version:
import Lean.Elab.Command
open Lean Meta
/- from `Mathlib.Lean.Expr.Basic` -/
def Lean.Name.isBlackListed {m} [Monad m] [MonadEnv m] (declName : Name) : m Bool := do
if declName == ``sorryAx then return true
if declName matches .str _ "inj" then return true
if declName matches .str _ "noConfusionType" then return true
let env ← getEnv
pure <| declName.isInternalDetail
|| isAuxRecursor env declName
|| isNoConfusion env declName
<||> isRec declName <||> isMatcher declName
/- from `Batteries.Tactic.OpenPrivate` -/
def Lean.Environment.moduleIdxForModule? (env : Environment) (mod : Name) : Option ModuleIdx :=
(env.allImportedModuleNames.indexOf? mod).map fun idx => idx.val
elab "#decls_in " id:ident : command => do
let env ← getEnv
match env.moduleIdxForModule? id.getId with
| none => logWarning m!"Sorry, module '{id}' does not exist"
| some modIdx =>
let csts := env.const2ModIdx
let cInM ← csts.foldM (fun a b c => do
if c == modIdx && (! (← b.isBlackListed)) then
return a.insert b
else
return a) ({} : NameSet)
logInfo m!"Found {cInM.size} declarations in '{id}':\n\n\
{cInM.toArray.qsort (·.toString < ·.toString)}"
Jon Eugster (Sep 02 2024 at 10:09):
so, yesterday I ended up with this
/-- Get all declarations in the specified file. -/
def getDeclsInFile (module : Name) : CoreM NameSet := do
let env ← getEnv
match env.moduleIdxForModule? module with
| none => return {}
| some modIdx =>
let decls := env.const2ModIdx
let declsIn ← decls.foldM (fun acc n idx => do
if idx == modIdx && (! (← isBlackListed n)) then
return acc.insert n
else
return acc) ({} : NameSet)
return declsIn
(source)
However, I have a hunch that this is quite slow and all I need later is (← getDeclsInFile n).size
to create some node-sizes in this mathlib graph (which some might remember from Lean3):
Screenshot_20240901_231516.png
Do you have a suggestion for a faster method to get some meaningful node sizes? Maybe just retrieving the file size?
Jon Eugster (Sep 02 2024 at 10:21):
Hmm apparently de-monadifying everything and only calling let env ← getEnv
once outside is already an improvement. Is ← getEnv
something that's usually slow?
Damiano Testa (Sep 02 2024 at 10:26):
I am not sure about getEnv
being slow: could it be that re-getting the environment means that a lot of cached information is thrown away?
Damiano Testa (Sep 02 2024 at 10:28):
If you just want to count the definitions that are "visible", you might also look at the .ilean
s that contain information about what is defined in a file, though only the declarations that have some syntax, I believe.
Damiano Testa (Sep 02 2024 at 10:28):
So, to_additive
declarations would not be there, probably.
Damiano Testa (Sep 02 2024 at 10:29):
As a general rule, I do not have a very good feeling for what is slow and what is fast, so it is very likely that the code above could be made immensely more efficient by small changes that I ignore...
Jon Eugster (Sep 02 2024 at 10:48):
I don't understand this too well, so maybe your guess about the cache is something.
In any case, only getting the env
once and passing it into the functions reduced the time to 6 minutes for mathlib, which I guess is acceptable. (but it takes 5 seconds in total if I just set the size to 1
)
I'll try the .ilean
. Do you know if there's an intended way to read them? I guess finding them in .lake
and parsing as JSON is somewhat hacky...
Damiano Testa (Sep 02 2024 at 11:04):
If you want the count for all modules, maybe you can setup a HashMap Name Nat
that loops through the declarations once and adds them where they should go?
Damiano Testa (Sep 02 2024 at 11:30):
For instance, this takes a few seconds on the server, although I have not tested it very much:
import Mathlib
open Lean Meta
elab "#tally_decls" : command => do
let mut invert : HashMap Nat Name := {}
let env ← getEnv
for e in env.allImportedModuleNames do
invert := invert.insert ((env.getModuleIdx? e).getD default) e
let mut tots : HashMap Nat Nat := {}
for (c, mod) in env.const2ModIdx.toArray do
if ! (← c.isBlackListed) then
let curr := (tots.find? mod).getD 0
tots := tots.insert mod (curr + 1)
logInfo m!"{tots.toArray.map fun (idx, tot) => ((invert.find? idx).getD default, tot)}"
#tally_decls
/-
[(Init.Prelude, 864),
(Init.Coe, 63),
(Init.Notation, 126),
(Init.Tactics, 167),
(Init.SizeOf, 75),
(Init.Core, 494),
(Init.SimpLemmas, 136),
(Init.Data.Nat.Basic, 332),
(Init.WF, 67),
(Init.MetaTypes, 97),
(Init.WFTactics, 5),
(Init.Data.Nat.Div, 59),
(Init.Data.List.Notation, 2),
(Init.Data.List.Basic, 290),
(Init.Data.Nat.Bitwise.Basic, 18),
(Init.Data.Fin.Basic, 51),
(Init.Data.UInt.Basic, 197),
(Init.Data.Char.Basic, 23),
...
(Mathlib.Topology.Sheaves.SheafCondition.EqualizerProducts, 51),
(Mathlib.Topology.Sheaves.Sheafify, 8),
(Mathlib.Topology.Sheaves.Skyscraper, 39),
(Mathlib.Topology.UniformSpace.AbsoluteValue, 2),
(Mathlib.Topology.UniformSpace.CompareReals, 13),
(Mathlib.Util.AssertNoSorry, 1),
(Mathlib.Util.Export, 54),
(Mathlib.Util.FormatTable, 12),
(Mathlib.Util.GetAllModules, 2),
(Mathlib.Util.IncludeStr, 1),
(Mathlib.Util.LongNames, 3),
(Mathlib.Util.SleepHeartbeats, 2),
(Mathlib.Util.TermBeta, 2),
(Mathlib.Util.Time, 2)]
-/
Jon Eugster (Sep 02 2024 at 11:32):
Damiano Testa said:
If you want the count for all modules, maybe you can setup a
HashMap Name Nat
that loops through the declarations once and adds them where they should go?
actually true, I'm looping way to often! Thanks!
Damiano Testa (Sep 02 2024 at 11:35):
Btw, I find the dance around ModuleIdx
and Name
very annoying: I never remember which way the functions go and do not really know why it is so hard to get a human-readable name out of a ModuleIdx
!
Jon Eugster (Sep 02 2024 at 11:56):
Ah wow, down to 7 seconds :star_struck:
I did this now:
def getNumberOfDeclsPerFile (env: Environment) : NameMap Nat :=
env.const2ModIdx.fold (fun acc n idx =>
let mod := env.allImportedModuleNames.get! idx
if isBlackListed env n then acc else acc.insert mod ((acc.findD mod 0) + 1)
) {}
from your code I assume env.allImportedModuleNames.get! idx
should be how one could retrieve the module name, is that right?
Jon Eugster (Sep 02 2024 at 11:58):
Because this is how it's defined under the hood:
def getModuleIdx? (env : Environment) (moduleName : Name) : Option ModuleIdx :=
env.header.moduleNames.findIdx? (· == moduleName)
Damiano Testa (Sep 02 2024 at 11:58):
I do not know whether the order of the ModuleIdx
is the same as the one in which they are stored in allImportedModuleNames
. It would make sense, but I do not know.
Jon Eugster (Sep 02 2024 at 11:59):
ah yes, let me double-check these definitions
Jon Eugster (Sep 02 2024 at 11:59):
def allImportedModuleNames (env : Environment) : Array Name :=
env.header.moduleNames
Damiano Testa (Sep 02 2024 at 12:00):
I typed my comment, while you sent your message with the definition of getModuleIdx?
, so do not take my comment as a reply to your message just above it, but to the one before that!
Damiano Testa (Sep 02 2024 at 12:00):
I.e., I still do not know what is the order, since I have not yet fully parsed what the functions do! :smile:
Damiano Testa (Sep 02 2024 at 12:04):
Ok, I have not tested it, but you are probably right: it seems that moduleNames
is an array and the ModuleIdx
is the index within that array of the module name.
Damiano Testa (Sep 02 2024 at 12:04):
So, then also my command above can be shortened, since invert
was just a way of doing what you suggested with get!
ing the Array entry.
Jon Eugster (Sep 02 2024 at 12:05):
Yep, the messages are a bit out of order. So to restate this, it looks like env.header.moduleNames
is just an Array Name
, env.allImportedModuleNames
is just a wrapper for the same array and env.getModuleIdx? `MyModule
seems to just use Array.findIdx?
to find the index in this array, which is then the ModuleIdx
(as def ModuleIdx := Nat
)
Damiano Testa (Sep 02 2024 at 12:06):
I agree! Maybe test it on a couple of modules, just to make sure, but I am convinced that this is what it is.
Damiano Testa (Sep 02 2024 at 12:06):
I was just worried that these arrays were obtained after storing their entries in a HashSet or something, forgetting any ordering, but it seems like this is not the case.
Jon Eugster (Sep 02 2024 at 12:15):
Btw, here is the result: https://github.com/leanprover-community/import-graph/pull/30
Damiano Testa (Sep 02 2024 at 12:18):
Very cool!
Damiano Testa (Sep 02 2024 at 12:18):
I wonder whether we should add the graph to the summary of every PR. :lol:
Jon Eugster (Sep 02 2024 at 12:57):
... and at a visual-based linter to test the yarn ball isn't messier than previously :wink:
Last updated: May 02 2025 at 03:31 UTC