Zulip Chat Archive
Stream: general
Topic: using `PersistentEnvExtension`s
Damiano Testa (Aug 20 2024 at 09:24):
I am learning about PersistentEnvExtension
, but there is something that I do not understand.
I created a new file that contains only the definition of a new structure
and initializes the extension:
import Mathlib.Init
import Lean.Elab.Command
open Lean Elab Meta Command
namespace Mathlib.AssertNotExist
structure AssertExists where
isDecl : Bool
givenName : Name
modName : Name
/-- Defines the `assertExistsExt` extension for adding an `Array` of `AssertExists`s
to the environment. -/
initialize assertExistsExt : PersistentEnvExtension AssertExists AssertExists (Array AssertExists) ←
registerPersistentEnvExtension {
mkInitial := pure {}
addImportedFn := (return .flatten ·)
addEntryFn := .push
exportEntriesFn := id
}
end Mathlib.AssertNotExist
In a separate file, I define a function that modifies the environment, by adding an entry to the extension:
def addDeclEntry (isDecl : Bool) (declName : Name) : CommandElabM Unit := do
let modName ← getMainModule
modifyEnv fun env =>
assertExistsExt.addEntry env
{ isDecl := isDecl, givenName := declName, modName := modName }
This addDeclEntry
is then used in some places in Mathlib. I pushed this to #15974 and CI fails, e.g.:
✖ [1521/4982] Building Mathlib.Order.PrimeSeparator
trace: .> LEAN_PATH=././.lake/packages/batteries/.lake/build/lib:././.lake/packages/Qq/.lake/build/lib:././.lake/packages/aesop/.lake/build/lib:././.lake/packages/proofwidgets/.lake/build/lib:././.lake/packages/Cli/.lake/build/lib:././.lake/packages/importGraph/.lake/build/lib:././.lake/build/lib LD_LIBRARY_PATH= /home/lean/.elan/toolchains/leanprover--lean4---v4.11.0-rc2/bin/lean -Dpp.unicode.fun=true -DautoImplicit=false -Dweak.linter.hashCommand=true -Dweak.linter.missingEnd=true -Dweak.linter.cdot=true -Dweak.linter.longLine=true -Dweak.linter.oldObtain=true -Dweak.linter.refine=true -Dweak.linter.setOption=true ././././Mathlib/Order/PrimeSeparator.lean -R ./././. -o ././.lake/build/lib/Mathlib/Order/PrimeSeparator.olean -i ././.lake/build/lib/Mathlib/Order/PrimeSeparator.ilean -c ././.lake/build/ir/Mathlib/Order/PrimeSeparator.c --json
error: Lean exited with code 137
If I tried locally lake build Mathlib.Order.PrimeSeparator
it also seems to hang, but if I open the file in VSCode, everything works.
What am I doing wrong?
Damiano Testa (Aug 20 2024 at 09:27):
Actually, lake build
just took a long time: it finished now:
$ lake build Mathlib.Order.PrimeSeparator
Build completed successfully.
(It took quite a bit of time, though, to build just this file, as I had already build the imported files.)
Matthew Ballard (Aug 20 2024 at 09:31):
It looks like you are writing a monstrous amount to the environment.
Damiano Testa (Aug 20 2024 at 09:32):
The function is only used to add an entry every thing that the assert_not_exists
/assert_not_imported
is used.
Damiano Testa (Aug 20 2024 at 09:32):
Is this really a lot?
Damiano Testa (Aug 20 2024 at 09:32):
There are 427 uses of either in all of Mathlib.
Damiano Testa (Aug 20 2024 at 09:33):
Though maybe I am misunderstanding how this works...
Matthew Ballard (Aug 20 2024 at 09:33):
No, that is not much at all in theory. But you must be capturing a lot more in implementation.
Matthew Ballard (Aug 20 2024 at 09:34):
It took my machine at 10x the normal amount of time to unpack the oleans with on the first 30% being present
Damiano Testa (Aug 20 2024 at 09:35):
Ok, so you are saying that if I implemented this right, it should work, but, for some reason, I am adding much more stuff to the environment than what I think?
Matthew Ballard (Aug 20 2024 at 09:35):
That is my guess yes
Damiano Testa (Aug 20 2024 at 09:35):
Is there some what to find out what it is that I am doing that is too much?
Matthew Ballard (Aug 20 2024 at 09:36):
Get the state after some asserts and see what it looks like
Damiano Testa (Aug 20 2024 at 09:36):
Ok, note that addDeclEntry
is the only entry point for the modifications that I do.
Damiano Testa (Aug 20 2024 at 09:37):
This should narrow it down to
addDeclEntry
adds too much;addDeclEntry
is used too much.
Matthew Ballard (Aug 20 2024 at 09:39):
It is probably in the modifications to assert_not_exists
and assert_not_imported
which call that but I haven't looked closely. addEntry
looks reasonable to my eye but I can't be sure atm
Damiano Testa (Aug 20 2024 at 09:40):
This is how it is used in assert_not_exists
:
elab "assert_not_exists " n:ident : command => do
let decl ← try liftCoreM <| realizeGlobalConstNoOverloadWithInfo n catch _ => return
let decl ← ((liftCoreM <| realizeGlobalConstNoOverloadWithInfo n) <|> return .anonymous)
if decl == .anonymous then
addDeclEntry true n.getId
else
[old code, not modifications]
Damiano Testa (Aug 20 2024 at 09:41):
Ah, when I was going to paste the modifications in assert_not_imported
, I realized that there are if ... then
not else
that I thought were ite
s!!
Damiano Testa (Aug 20 2024 at 09:42):
So, I think that I am accumulating all the times that if
applies, when I thought that I was accumulating just the else
s...
Mario Carneiro (Aug 20 2024 at 09:43):
I think I see the issue: your extension is a "fork bomb", it copies the entries from everything upstream and adds it to every file, resulting in exponential growth
Damiano Testa (Aug 20 2024 at 09:44):
I am happy to believe that it is not correctly implemented, but would this show up already with just 427 increments?
Mario Carneiro (Aug 20 2024 at 09:45):
exportEntriesFn
is only supposed to produce entries from the current file, which is why most env extensions use a pair to separate stuff from the imports from stuff to export later
Mario Carneiro (Aug 20 2024 at 09:45):
Even with only 1 entry, as long as the import depth is enough it will get copied and pasted exponentially many times
Damiano Testa (Aug 20 2024 at 09:46):
But then how is information passed down to all files?
Matthew Ballard (Aug 20 2024 at 09:46):
Wouldn't a docs#Lean.SimplePersistentEnvExtension be enough here?
Damiano Testa (Aug 20 2024 at 09:47):
Oh, you are saying that the past entries get added to one another if there is an "import diamond"?
Mario Carneiro (Aug 20 2024 at 09:47):
addImportedFn
takes the entries from all upstream files and adds them to the current state, then exportEntriesFn
records that as the state for the file, so if there is an entry in A and B imports A then it will copy A's entry, and if C imports B it will see two entries and copy those two, so if D imports C it will see 4 entries...
Matthew Ballard (Aug 20 2024 at 09:48):
I learned this when I set my addImportedFn
to map to the empty constant. A simple extension manages the latter for you
Damiano Testa (Aug 20 2024 at 09:49):
Ok, so exportedEntriesFn
should return the "difference" of what the file started with and what it ends up with (implemented probably as adding a Bool
field, rather than what I said).
Damiano Testa (Aug 20 2024 at 09:50):
Matt, thanks! I will use a SimplePersistentEnvExtension
. Honestly, it looked correct, but since there was no documentation for it, I thought that I would use the one that had some docs...
Damiano Testa (Aug 20 2024 at 09:50):
Also, thank you both for your help! I now understand the issue and I think that I can get along with a solution!
Mario Carneiro (Aug 20 2024 at 09:51):
The usual way it's implemented is that the state is a pair of some state for lookups combining everything so far (e.g. a hashmap of entries), and a List
of entries that have been declared in the current file. The initial state from addImportedFn
combines all the upstream stuff to produce the hashmap and initializes the list to []
, and exportEntriesFn
ignores the hashmap and outputs the list as an array
Mario Carneiro (Aug 20 2024 at 09:52):
I believe that's exactly what SimplePersistentEnvExtension
does, so if you don't need to modify that structure then you can use it directly
Damiano Testa (Aug 20 2024 at 09:55):
Ok, great! I do not need to modify anything: I simply want to collect the information that some module/declaration was asserted not to exist somewhere and make sure that, once Mathlib.lean
has been build, that the enviroment eventually contains the non-existing things.
Damiano Testa (Aug 20 2024 at 09:56):
So, simply accumulating all the "assertions" (even with repetitions, given the total numbers) is good enough.
Damiano Testa (Aug 20 2024 at 09:58):
Although, while I am at it, I may as well store the "assertions" in a HashSet
, rather than an Array
.
Matthew Ballard (Aug 20 2024 at 09:59):
I think that's common practice but may not make much of a difference here
Damiano Testa (Aug 20 2024 at 10:00):
Since converting Array
s into HashSet
s is completely straightforward, I might as well do it.
Matthew Ballard (Aug 20 2024 at 10:00):
For reference, here is the initialization of the attribute extension in core
builtin_initialize attributeExtension : AttributeExtension ←
registerPersistentEnvExtension {
mkInitial := AttributeExtension.mkInitial
addImportedFn := AttributeExtension.addImported
addEntryFn := addAttrEntry
exportEntriesFn := fun s => s.newEntries.reverse.toArray
statsFn := fun s => format "number of local entries: " ++ format s.newEntries.length
}
Damiano Testa (Aug 20 2024 at 10:03):
Does this look correct?
initialize assertExistsExt :
SimplePersistentEnvExtension AssertExists (HashSet AssertExists) ←
registerSimplePersistentEnvExtension {
addImportedFn := fun as => as.foldl HashSet.insertMany {}
addEntryFn := .insert
}
Damiano Testa (Aug 20 2024 at 10:11):
At least, with the change above, the time to build just the first "broken" file (with the imports already built) is very short:
$ time lake build Mathlib.Order.PrimeSeparator
Build completed successfully.
real 0m1,990s
user 0m1,502s
sys 0m0,352s
Damiano Testa (Aug 20 2024 at 12:48):
CI at least was able to build mathlib with the definition above. It also checked that all declarations and imports currently checked to not exist somewhere are in fact eventually present! See the check assertions
tab in this CI run.
Damiano Testa (Aug 20 2024 at 12:48):
I will prepare a cleaned up PR and will make sure to !bench
it, to see how it performs.
Last updated: May 02 2025 at 03:31 UTC