Zulip Chat Archive

Stream: general

Topic: using `PersistentEnvExtension`s


Damiano Testa (Aug 20 2024 at 09:24):

I am learning about PersistentEnvExtension, but there is something that I do not understand.

I created a new file that contains only the definition of a new structure and initializes the extension:

import Mathlib.Init
import Lean.Elab.Command

open Lean Elab Meta Command

namespace Mathlib.AssertNotExist
structure AssertExists where
  isDecl : Bool
  givenName : Name
  modName : Name

/-- Defines the `assertExistsExt` extension for adding an `Array` of `AssertExists`s
to the environment. -/
initialize assertExistsExt : PersistentEnvExtension AssertExists AssertExists (Array AssertExists) 
  registerPersistentEnvExtension {
    mkInitial := pure {}
    addImportedFn := (return .flatten ·)
    addEntryFn := .push
    exportEntriesFn := id
  }

end Mathlib.AssertNotExist

In a separate file, I define a function that modifies the environment, by adding an entry to the extension:

def addDeclEntry (isDecl : Bool) (declName : Name) : CommandElabM Unit := do
  let modName  getMainModule
  modifyEnv fun env =>
    assertExistsExt.addEntry env
      { isDecl := isDecl, givenName := declName, modName := modName }

This addDeclEntry is then used in some places in Mathlib. I pushed this to #15974 and CI fails, e.g.:

  [1521/4982] Building Mathlib.Order.PrimeSeparator
trace: .> LEAN_PATH=././.lake/packages/batteries/.lake/build/lib:././.lake/packages/Qq/.lake/build/lib:././.lake/packages/aesop/.lake/build/lib:././.lake/packages/proofwidgets/.lake/build/lib:././.lake/packages/Cli/.lake/build/lib:././.lake/packages/importGraph/.lake/build/lib:././.lake/build/lib LD_LIBRARY_PATH= /home/lean/.elan/toolchains/leanprover--lean4---v4.11.0-rc2/bin/lean -Dpp.unicode.fun=true -DautoImplicit=false -Dweak.linter.hashCommand=true -Dweak.linter.missingEnd=true -Dweak.linter.cdot=true -Dweak.linter.longLine=true -Dweak.linter.oldObtain=true -Dweak.linter.refine=true -Dweak.linter.setOption=true ././././Mathlib/Order/PrimeSeparator.lean -R ./././. -o ././.lake/build/lib/Mathlib/Order/PrimeSeparator.olean -i ././.lake/build/lib/Mathlib/Order/PrimeSeparator.ilean -c ././.lake/build/ir/Mathlib/Order/PrimeSeparator.c --json
error: Lean exited with code 137

If I tried locally lake build Mathlib.Order.PrimeSeparator it also seems to hang, but if I open the file in VSCode, everything works.

What am I doing wrong?

Damiano Testa (Aug 20 2024 at 09:27):

Actually, lake build just took a long time: it finished now:

$ lake build Mathlib.Order.PrimeSeparator
Build completed successfully.

(It took quite a bit of time, though, to build just this file, as I had already build the imported files.)

Matthew Ballard (Aug 20 2024 at 09:31):

It looks like you are writing a monstrous amount to the environment.

Damiano Testa (Aug 20 2024 at 09:32):

The function is only used to add an entry every thing that the assert_not_exists/assert_not_imported is used.

Damiano Testa (Aug 20 2024 at 09:32):

Is this really a lot?

Damiano Testa (Aug 20 2024 at 09:32):

There are 427 uses of either in all of Mathlib.

Damiano Testa (Aug 20 2024 at 09:33):

Though maybe I am misunderstanding how this works...

Matthew Ballard (Aug 20 2024 at 09:33):

No, that is not much at all in theory. But you must be capturing a lot more in implementation.

Matthew Ballard (Aug 20 2024 at 09:34):

It took my machine at 10x the normal amount of time to unpack the oleans with on the first 30% being present

Damiano Testa (Aug 20 2024 at 09:35):

Ok, so you are saying that if I implemented this right, it should work, but, for some reason, I am adding much more stuff to the environment than what I think?

Matthew Ballard (Aug 20 2024 at 09:35):

That is my guess yes

Damiano Testa (Aug 20 2024 at 09:35):

Is there some what to find out what it is that I am doing that is too much?

Matthew Ballard (Aug 20 2024 at 09:36):

Get the state after some asserts and see what it looks like

Damiano Testa (Aug 20 2024 at 09:36):

Ok, note that addDeclEntry is the only entry point for the modifications that I do.

Damiano Testa (Aug 20 2024 at 09:37):

This should narrow it down to

  • addDeclEntry adds too much;
  • addDeclEntry is used too much.

Matthew Ballard (Aug 20 2024 at 09:39):

It is probably in the modifications to assert_not_exists and assert_not_imported which call that but I haven't looked closely. addEntry looks reasonable to my eye but I can't be sure atm

Damiano Testa (Aug 20 2024 at 09:40):

This is how it is used in assert_not_exists:

elab "assert_not_exists " n:ident : command => do
  let decl  try liftCoreM <| realizeGlobalConstNoOverloadWithInfo n catch _ => return
  let decl  ((liftCoreM <| realizeGlobalConstNoOverloadWithInfo n) <|> return .anonymous)
  if decl == .anonymous then
    addDeclEntry true n.getId
  else
  [old code, not modifications]

Damiano Testa (Aug 20 2024 at 09:41):

Ah, when I was going to paste the modifications in assert_not_imported, I realized that there are if ... then not else that I thought were ites!!

Damiano Testa (Aug 20 2024 at 09:42):

So, I think that I am accumulating all the times that if applies, when I thought that I was accumulating just the elses...

Mario Carneiro (Aug 20 2024 at 09:43):

I think I see the issue: your extension is a "fork bomb", it copies the entries from everything upstream and adds it to every file, resulting in exponential growth

Damiano Testa (Aug 20 2024 at 09:44):

I am happy to believe that it is not correctly implemented, but would this show up already with just 427 increments?

Mario Carneiro (Aug 20 2024 at 09:45):

exportEntriesFn is only supposed to produce entries from the current file, which is why most env extensions use a pair to separate stuff from the imports from stuff to export later

Mario Carneiro (Aug 20 2024 at 09:45):

Even with only 1 entry, as long as the import depth is enough it will get copied and pasted exponentially many times

Damiano Testa (Aug 20 2024 at 09:46):

But then how is information passed down to all files?

Matthew Ballard (Aug 20 2024 at 09:46):

Wouldn't a docs#Lean.SimplePersistentEnvExtension be enough here?

Damiano Testa (Aug 20 2024 at 09:47):

Oh, you are saying that the past entries get added to one another if there is an "import diamond"?

Mario Carneiro (Aug 20 2024 at 09:47):

addImportedFn takes the entries from all upstream files and adds them to the current state, then exportEntriesFn records that as the state for the file, so if there is an entry in A and B imports A then it will copy A's entry, and if C imports B it will see two entries and copy those two, so if D imports C it will see 4 entries...

Matthew Ballard (Aug 20 2024 at 09:48):

I learned this when I set my addImportedFn to map to the empty constant. A simple extension manages the latter for you

Damiano Testa (Aug 20 2024 at 09:49):

Ok, so exportedEntriesFn should return the "difference" of what the file started with and what it ends up with (implemented probably as adding a Bool field, rather than what I said).

Damiano Testa (Aug 20 2024 at 09:50):

Matt, thanks! I will use a SimplePersistentEnvExtension. Honestly, it looked correct, but since there was no documentation for it, I thought that I would use the one that had some docs...

Damiano Testa (Aug 20 2024 at 09:50):

Also, thank you both for your help! I now understand the issue and I think that I can get along with a solution!

Mario Carneiro (Aug 20 2024 at 09:51):

The usual way it's implemented is that the state is a pair of some state for lookups combining everything so far (e.g. a hashmap of entries), and a List of entries that have been declared in the current file. The initial state from addImportedFn combines all the upstream stuff to produce the hashmap and initializes the list to [], and exportEntriesFn ignores the hashmap and outputs the list as an array

Mario Carneiro (Aug 20 2024 at 09:52):

I believe that's exactly what SimplePersistentEnvExtension does, so if you don't need to modify that structure then you can use it directly

Damiano Testa (Aug 20 2024 at 09:55):

Ok, great! I do not need to modify anything: I simply want to collect the information that some module/declaration was asserted not to exist somewhere and make sure that, once Mathlib.lean has been build, that the enviroment eventually contains the non-existing things.

Damiano Testa (Aug 20 2024 at 09:56):

So, simply accumulating all the "assertions" (even with repetitions, given the total numbers) is good enough.

Damiano Testa (Aug 20 2024 at 09:58):

Although, while I am at it, I may as well store the "assertions" in a HashSet, rather than an Array.

Matthew Ballard (Aug 20 2024 at 09:59):

I think that's common practice but may not make much of a difference here

Damiano Testa (Aug 20 2024 at 10:00):

Since converting Arrays into HashSets is completely straightforward, I might as well do it.

Matthew Ballard (Aug 20 2024 at 10:00):

For reference, here is the initialization of the attribute extension in core

builtin_initialize attributeExtension : AttributeExtension 
  registerPersistentEnvExtension {
    mkInitial       := AttributeExtension.mkInitial
    addImportedFn   := AttributeExtension.addImported
    addEntryFn      := addAttrEntry
    exportEntriesFn := fun s => s.newEntries.reverse.toArray
    statsFn         := fun s => format "number of local entries: " ++ format s.newEntries.length
  }

Damiano Testa (Aug 20 2024 at 10:03):

Does this look correct?

initialize assertExistsExt :
    SimplePersistentEnvExtension AssertExists (HashSet AssertExists) 
  registerSimplePersistentEnvExtension {
    addImportedFn := fun as => as.foldl HashSet.insertMany {}
    addEntryFn := .insert
  }

Damiano Testa (Aug 20 2024 at 10:11):

At least, with the change above, the time to build just the first "broken" file (with the imports already built) is very short:

$ time lake build Mathlib.Order.PrimeSeparator
Build completed successfully.

real    0m1,990s
user    0m1,502s
sys 0m0,352s

Damiano Testa (Aug 20 2024 at 12:48):

CI at least was able to build mathlib with the definition above. It also checked that all declarations and imports currently checked to not exist somewhere are in fact eventually present! See the check assertions tab in this CI run.

Damiano Testa (Aug 20 2024 at 12:48):

I will prepare a cleaned up PR and will make sure to !bench it, to see how it performs.


Last updated: May 02 2025 at 03:31 UTC