Zulip Chat Archive

Stream: lean4

Topic: difficulties with `Frontend`


Scott Morrison (Aug 09 2023 at 04:02):

I am trying to use Lean.Elab.Frontend, and running into difficulties with persistent extensions. I'm not sure if I'm "doing it wrong", or what I want isn't feasible.

To begin, I define a

compileModule (module : Name) (initializers := true) : IO (Environment × List Message × List InfoTree)

function which is intended to load the source for a specified module, and compile it, return the resulting Environment and any Messages and InfoTrees produced.

I also pass initializers, which can control whether enableInitializersExecution will be run.

Scott Morrison (Aug 09 2023 at 04:02):

Here's the implementation:

-- `Frontend/Compile.lean`
import Lean.Elab.Frontend

open Lean Elab System

/--
Wrapper for `IO.processCommands` that enables info states, and returns
* the new environment
* messages
* info trees
-/
def processCommandsWithInfoTrees
    (inputCtx : Parser.InputContext) (parserState : Parser.ModuleParserState)
    (commandState : Command.State) : IO (Environment × List Message × List InfoTree) := do
  let commandState := { commandState with infoState.enabled := true }
  let s  IO.processCommands inputCtx parserState commandState <&> Frontend.State.commandState
  pure (s.env, s.messages.msgs.toList, s.infoState.trees.toList)

/--
Process some text input, with or without an existing environment.
If there is no existing environment, we parse the input for headers (e.g. import statements),
and create a new environment.
Otherwise, we add to the existing environment.
Returns the resulting environment, along with a list of messages and info trees.
-/
unsafe def processInput (input : String) (initializers := true) (env? : Option Environment)
    (opts : Options) (fileName : Option String := none) :
    IO (Environment × List Message × List InfoTree) := do
  let fileName   := fileName.getD "<input>"
  let inputCtx   := Parser.mkInputContext input fileName
  let (parserState, commandState)  match env? with
  | none => do
    if initializers then enableInitializersExecution
    let (header, parserState, messages)  Parser.parseHeader inputCtx
    let (env, messages)  processHeader header opts messages inputCtx
    pure (parserState, (Command.mkState env messages opts))
  | some env => do
    pure ({ : Parser.ModuleParserState }, Command.mkState env {} opts)
  processCommandsWithInfoTrees inputCtx parserState commandState

def findLean (mod : Name) : IO FilePath := do
  return FilePath.mk (( findOLean mod).toString.replace "build/lib/" "") |>.withExtension "lean"

/-- Read the source code of the named module. -/
def moduleSource (mod : Name) : IO String := do
  IO.FS.readFile ( findLean mod)

unsafe def compileModule (mod : Name) (initializers := true) : IO (Environment × List Message × List InfoTree) := do
  processInput ( moduleSource mod) initializers none {}

Scott Morrison (Aug 09 2023 at 04:04):

Now, suppose I build a dummy PersistentEnvExtension, e.g. as:

-- `Frontend/DummyExt.lean`
import Lean.Environment

open Lean

initialize dummyExt :
    PersistentEnvExtension Unit Unit Unit 
  registerPersistentEnvExtension {
    mkInitial := pure ()
    addImportedFn := fun _ => pure ()
    addEntryFn := fun _ _ => ()
    exportEntriesFn := fun _ => #[]
  }

Scott Morrison (Aug 09 2023 at 04:05):

and create a file that imports it:

-- `Frontend/Main.lean`
import Frontend.DummyExt

def f := 37

Scott Morrison (Aug 09 2023 at 04:06):

Now I try to compile it:

-- `Frontend/Test1.lean`
import Frontend.Compile

open Lean

#eval show MetaM _ from do
  let (_, msgs, _)  compileModule `Frontend.Main
  for m in msgs do dbg_trace  m.data.format
  return msgs.length

and all is well: no messages are printed.

Scott Morrison (Aug 09 2023 at 04:08):

However I now try to compile it from a file that also depends on Frontend.DummyExt:

-- `Frontend/Test2/lean`
import Frontend.Compile
import Frontend.DummyExt

open Lean

#eval show MetaM _ from do
  let (_, msgs, _)  compileModule `Frontend.Main
  for m in msgs do dbg_trace  m.data.format
  return msgs.length

and I get the error message from compilation:

invalid environment extension, 'dummyExt' has already been used`

Scott Morrison (Aug 09 2023 at 04:11):

Is this feasible? I want to be able to compile other files, without having to worry about what I've already imported myself.

Mac Malone (Aug 09 2023 at 04:11):

@Scott Morrison First thing, I would advise defining the following to avoid marking meaningful functions as unsafe:

@[implemented_by enableInitializersExecution]
opaque enableInitializersExecutionWrapper : IO Unit

Scott Morrison (Aug 09 2023 at 04:12):

(In my real use case I just use unsafe from Std.Util.TermUnsafe.)

Scott Morrison (Aug 09 2023 at 04:13):

For completeness, I'll note that this test does work fine if I don't run enableInitializersExecution:

--- `Frontend/Test3.lean`
import Frontend.Compile
import Frontend.DummyExt

open Lean

#eval show MetaM _ from do
  let (_, msgs, _)  compileModule `Frontend.Main (initializers := false)
  for m in msgs do dbg_trace  m.data.format
  return msgs.length

Scott Morrison (Aug 09 2023 at 04:15):

However I'm pretty sure that's not the right approach? Certainly it prevents me from compiling interesting files. One example is:

--- `Frontend/Test4.lean`
-- Requires Mathlib on the path:
import Frontend.Compile

open Lean

#eval show MetaM _ from do
  let (_, msgs, _)  compileModule `Mathlib.Init.ZeroOne (initializers := false)
  for m in msgs do dbg_trace  m.data.format
  return msgs.length

which will crash the Lean server with:

libc++abi: terminating with uncaught exception of type lean::exception: cannot evaluate `[init]` declaration 'Mathlib.Prelude.Rename.linter.uppercaseLean3' in the same module

Mac Malone (Aug 09 2023 at 04:15):

@Scott Morrison Do you need to run initializers? Generally, frontends that are not producing oleans to import (or oleans with initializers) do not need it.

Scott Morrison (Aug 09 2023 at 04:16):

(I am a bit confused there, in any case: compiling either of the dependencies of Mathlib.Init.ZeroOne works fine.)

Scott Morrison (Aug 09 2023 at 04:17):

Certainly I want to be able to run this on any single file from Mathlib (retrieving the resulting Environment and InfoTrees).

Mac Malone (Aug 09 2023 at 04:19):

@Scott Morrison does running your frontend from the command line (via lake env lean) work?

Mac Malone (Aug 09 2023 at 04:20):

Regardless, I am bit concerned where that linter @[init] is coming from. It seems to me that this may be some oddity (perhaps a bug) with how the linter works (or perhaps to to_additive?)

Scott Morrison (Aug 09 2023 at 04:22):

Running lake env lean Frontend/Test4.lean gives the same error:

cannot evaluate `[init]` declaration 'Mathlib.Prelude.Rename.linter.uppercaseLean3' in the same module

Mac Malone (Aug 09 2023 at 04:23):

I was expecting it to, just wanted to make sure. :thumbs_up:

Mac Malone (Aug 09 2023 at 04:24):

Assuming you do, in fact, need to initialize the extension twice (i.e., use initializers := true). You can do what I do in Alloy's extensions and create a findOrRegisterPersistentExtension helper.

Scott Morrison (Aug 09 2023 at 04:26):

Interesting. It seems though that we would have to use it everywhere, which feels like it might be a workaround for a deeper issue.

Mac Malone (Aug 09 2023 at 04:27):

@Scott Morrison no, this is expected when using a persistent extension that is both part of imported compiled module and used by the frontend (and is not builtin).

Mac Malone (Aug 09 2023 at 04:29):

Both the imported module and the frontend wish to initialize the extension, so you have to tell it to pick one.

Scott Morrison (Aug 09 2023 at 04:30):

But then shouldn't every persistent extension be registered this way? i.e. this findOrRegisterPersistentExtension should be moved to core, possibly replacingregisterPersistentExtension?

Mac Malone (Aug 09 2023 at 04:30):

The source of this is that persistentEnvExtensionsRef is a static ref for all of lean and not part of the individual environment.

Mac Malone (Aug 09 2023 at 04:31):

@Scott Morrison In core, builtin_initialize avoids this.

Scott Morrison (Aug 09 2023 at 04:31):

Okay, but what is everyone else meant to be doing?

Mac Malone (Aug 09 2023 at 04:33):

@Scott Morrison I am not sure custom frontends are a well-specified feature of Lean at the moment (so I doubt anything is currently meant). Recall that custom user-defined extensions did not even exist when Lake was first developed.

Scott Morrison (Aug 09 2023 at 04:35):

In any case, is your suggestion:

  • I should not be using enableInitializersExecution for my use case, and instead
  • I should try to minimize the crash when compiling Mathlib.Init.ZeroOne, and see if that reveals something?

Mac Malone (Aug 09 2023 at 04:35):

Originally, the idea was that end users would also use builtin_initialize and add extensions via ---plugin. However, that has yet to come to pass due to the fact that custom frontends themselves cannot support plugins (though they can utilize them). That is, a custom frontend can be built with a plugin, but it cannot build things with plugins.

Mac Malone (Aug 09 2023 at 04:39):

@Scott Morrison Yes. I think your use case should not need enableInitializersExecution. However, I would like you to try compile and running your custom frontend as executable. I may be mixing up the rules for executables with interpreted contexts.

Scott Morrison (Aug 09 2023 at 04:57):

Ahha! When compiled, I do not get the invalid environment extension, 'dummyExt' has already been used error message, even when running with enableInitializersExecution, and import Frontend.DummyExt.

Scott Morrison (Aug 09 2023 at 05:07):

I can also compile things from all over Mathlib successfully, extracting InfoTrees!

It would certainly be nice to be able to do this interpreted, because I want to write code that manipulates these InfoTrees in complicated ways and it's easiest to do that in a live session in the editor.

Scott Morrison (Aug 09 2023 at 05:20):

@Mac Malone, I'm surprised that your findOrRegisterPersistentExtension is not implemented as:

unsafe def findOrRegisterPersistentEnvExtensionUnsafe {α β σ : Type} [Inhabited σ] (descr : PersistentEnvExtensionDescr α β σ) : IO (PersistentEnvExtension α β σ) := do
  let pExts  persistentEnvExtensionsRef.get
  match pExts.find? (fun ext => ext.name == descr.name) with
  | some ext => return unsafeCast ext
  | none =>
  let ext  registerEnvExtension do
    let initial  descr.mkInitial
    let s : PersistentEnvExtensionState α σ := {
      importedEntries := #[],
      state           := initial
    }
    pure s
  let pExt : PersistentEnvExtension α β σ := {
    toEnvExtension  := ext,
    name            := descr.name,
    addImportedFn   := descr.addImportedFn,
    addEntryFn      := descr.addEntryFn,
    exportEntriesFn := descr.exportEntriesFn,
    statsFn         := descr.statsFn
  }
  persistentEnvExtensionsRef.modify fun pExts => pExts.push (unsafeCast pExt)
  return pExt

@[implemented_by findOrRegisterPersistentEnvExtensionUnsafe]
opaque findOrRegisterPersistentEnvExtension {α β σ : Type} [Inhabited σ] (descr : PersistentEnvExtensionDescr α β σ) : IO (PersistentEnvExtension α β σ)

i.e. having the same signature as registerPersistentEnvExtension.

Scott Morrison (Aug 09 2023 at 05:21):

In any case, using this one, if I just replace my

initialize dummyExt :
    PersistentEnvExtension Unit Unit Unit 
  registerPersistentEnvExtension { ... }

with

initialize dummyExt :
    PersistentEnvExtension Unit Unit Unit 
  findOrRegisterPersistentEnvExtension { ... }

then the interpreted version of my examples above all start working.

Scott Morrison (Aug 09 2023 at 05:22):

I'm not really sure what to do with this observation, however. Should I try to persuade Std and Mathlib that they should be using findOrRegister... rather than register...?

Scott Morrison (Aug 09 2023 at 05:22):

I might try pinging @Mario Carneiro at this point (sorry it is mid-thread!) to see what they think.

Mario Carneiro (Aug 09 2023 at 05:27):

Loading lean files from within lean is basically unsupported because of the way initializers work

Mario Carneiro (Aug 09 2023 at 05:27):

honestly it seems like a design mistake to me

Scott Morrison (Aug 09 2023 at 05:28):

(you mean, the current way initializers work?)

Scott Morrison (Aug 09 2023 at 05:28):

(or: trying to load lean files from within lean?)

Mario Carneiro (Aug 09 2023 at 05:28):

yes, you basically have to load every lean file in its own process

Mario Carneiro (Aug 09 2023 at 05:29):

the reason is because initializer state is a global

Mario Carneiro (Aug 09 2023 at 05:29):

and the set of environment extensions is part of that

Mario Carneiro (Aug 09 2023 at 05:30):

this is the same reason why it's impossible to interact with an environment extension defined in the current file without crashing lean

Mario Carneiro (Aug 09 2023 at 05:34):

That's not to say that you can't hack around it though: the affected statics are basically append-only arrays so it's possible that something like Mac's findOrRegisterPersistentExtension will work

Mario Carneiro (Aug 09 2023 at 05:35):

I think enableInitializersExecution should already be on for an interactive lean session

Mario Carneiro (Aug 09 2023 at 05:36):

you usually only need to call it if you are calling lean code from main

Scott Morrison (Aug 09 2023 at 05:36):

It definitely has an effect here: see the difference between Test2 and Test3 above.

Mario Carneiro (Aug 09 2023 at 05:40):

oh, looks like the flag is reset to false after processing the imports (of Test2.lean itself)

Mario Carneiro (Aug 09 2023 at 05:40):

so you indeed have to call it again to turn it back on for the imports of the target module

Mario Carneiro (Aug 09 2023 at 05:48):

on further review I think findOrRegisterPersistentExtension is a solution at the wrong level: it is trying to handle the case of multiple initialization of a module but modules already track whether they have been initialized and skip repeat initialization

Scott Morrison (Aug 09 2023 at 05:52):

Okay, so my summary of the situation is as follows:

  • initializer state is global, so
  • if you either run compileModule from an interpreted session (where initializers have already been initialized), or if you run compileModule twice on different files from any session, global state will clash in interesting and unfun ways.
  • but in a compiled session, you can turn on enableInitializerExecution and then get a successful compile of one other file.

Scott Morrison (Aug 09 2023 at 05:52):

That's not what I was hoping for, but if that is correct the situation is probably workable, and at least no longer confusing.

Mario Carneiro (Aug 09 2023 at 05:59):

there is some internal state to prevent multiple import from causing multiple initialization of the same files; I think it should be possible to hook into that to avoid this issue (lean modifications will be required though)

Mario Carneiro (Aug 09 2023 at 07:36):

woohoo, I have a working patch for this issue: lean4#2398

Mario Carneiro (Aug 09 2023 at 07:37):

this now compiles error-free:

#eval show MetaM _ from do
  enableReinitialization ( getEnv)
  let (_, msgs, _)  compileModule `Frontend.Main
  for m in msgs do dbg_trace  m.data.format
  return msgs.length

Scott Morrison (Aug 09 2023 at 10:16):

This is working well! I can run compileModule all over Mathlib, in the same session, without initialization errors.

Scott Morrison (Aug 09 2023 at 10:16):

If you'd like I can make a self-contained test to add to your PR?

Mario Carneiro (Aug 09 2023 at 10:21):

Please do!

Mac Malone (Aug 09 2023 at 17:09):

Scott Morrison said:

Mac Malone, I'm surprised that your findOrRegisterPersistentExtension is not implemented as:

There was no deep though beyond it. :laughing: I just needed to get Alloy to work and went with the first thing that worked and seemed reasonably principled (even if the not the most optimal).

Adam Topaz (Aug 11 2023 at 02:50):

I've also been fighting similar issues recently. Here's something curious...

I have a copy of mathlib4 with an additional entry in the lakefile:

lean_exe compile where
  root := `Compile.Main
  supportInterpreter := true

and, obviously, a file called Compile/Main.lean.

First, suppose that the following is the contents of this file:

import Lean

open Lean Elab Parser System

def compile (module : Name) : IO Unit := do
  let contents  IO.FS.readFile (modToFilePath "." module "lean")
  let inputCtx : InputContext := mkInputContext contents "<input>"
  let header, parserState, messages  parseHeader inputCtx
  if messages.toList.length != 0 then
    for m in messages.toList do
      IO.println <|  m.toString
  let env, messages  processHeader header {} messages inputCtx
  if messages.toList.length != 0 then
    for m in messages.toList do
      IO.println <|  m.toString
  let cmdState : Command.State :=
    { Command.mkState env messages {} with infoState.enabled := true }
  let _  IO.processCommands inputCtx parserState cmdState
  IO.println s!"Success! {module}"
  return

def modules : IO (Array Name) := do
  let out  IO.FS.lines "Mathlib.lean"
  let out := out.map fun i => i.splitOn[1]!
  let out := out.map fun i => i.toName
  return out

unsafe def main (arg : List String) : IO Unit := do
  let arg := arg[0]!.toName
  enableInitializersExecution
  initSearchPath ( findSysroot)
  compile arg

In this case, lake exe compile Mathlib.Algebra.Abs succeeds with Success! Mathlib.Algebra.Abs.
A similar result happens with Mathlib.Algebra.AddTorsor, which is the second import appearing in our current Mathlib.lean file.

Now, consider the following variant of the main function above:

unsafe def main : IO Unit := do
  enableInitializersExecution
  initSearchPath ( findSysroot)
  let modules  modules
  for module in modules do
    compile module

This fails, with the following

Success! Mathlib.Algebra.Abs
libc++abi: terminating due to uncaught exception of type lean::exception: cannot evaluate `[init]` declaration 'Mathlib.Tactic.reflExt' in the same module

suggesting it failed on Mathlib.Algebra.AddTorsor.

So, basically I'm very confused. I tried a few other files from mathlib4 with the successful variant, and they all worked fine. What am I doing wrong?

Scott Morrison (Aug 11 2023 at 04:29):

Haha, looks we have been doing almost identical things. :-) I am trying to write something up that describes every possible variation here.

Scott Morrison (Aug 11 2023 at 04:35):

For me, these experiments give the same results whether or not I specify supportInterpreter := true in the lakefile. @Adam Topaz, could you check if that is the same for you?

Scott Morrison (Aug 11 2023 at 08:03):

@Adam Topaz, my summary is at lean4#2408, with a (failing) test case at lean4#2407, and @Mario Carneiro's relevant PR, updated to including the (succeeding) test case, is at lean4#2398.

Adam Topaz (Aug 11 2023 at 11:46):

In the single-file example without supportInterpreter, it seems to compile, but I get an error:

<input>:6:0: error: could not find native implementation of external declaration 'IO.getRandomBytes' (symbols 'l_IO_getRandomBytes___boxed' or 'l_IO_getRandomBytes')

Adam Topaz (Aug 11 2023 at 11:48):

Same with the for loop, and in that case the error pops up for each file.

Adam Topaz (Aug 11 2023 at 11:48):

But the original error doesn't appear in this case.

Adam Topaz (Aug 11 2023 at 11:49):

Strange. I still don't really understand what's going on internally...

Scott Morrison (Aug 11 2023 at 12:04):

I don't understand at all when supportInterpreter := true is needed. I don't seem to need it in any of my tests locally, but it was needed in the test case lean4#2407?

Sebastian Ullrich (Aug 11 2023 at 12:08):

I believe this option does not have any effect on non-Linux (even on macOS where it is at least accepted by the linker)

Sebastian Ullrich (Aug 11 2023 at 12:11):

@Mac Malone Perhaps true would be a better default for it then, the Linux binary size overhead is probably insignificant compared to the footgun size

Adam Topaz (Aug 11 2023 at 12:38):

FWIW, my main use case for such things right now is in the following file:
https://github.com/adamtopaz/lean_grader/blob/master/Main.lean

I'm setting up something for automatic grading in github classroom. Right now I'm only checking things in a single file whose name is hardcoded, but it would be nice to generalize and allow multiple files.


Last updated: Dec 20 2023 at 11:08 UTC