Zulip Chat Archive

Stream: Is there code for X?

Topic: Restore information from olean files


Yicheng Tao (Nov 14 2024 at 06:26):

I wonder if there is a way to restore information from the olean files. For example, a map from module to constants defined in that module. I found PersistentEnvExtension that may be relevant, but I don't know how to work with it. Is there any tutorials for such kind of problems?

Kim Morrison (Nov 14 2024 at 08:35):

Essentially everything is stored in the Environment. You can use withImportModules to start a session with a specified set of oleans loaded.

Kim Morrison (Nov 14 2024 at 08:36):

Searching for withImportModules in either Mathlib, Batteries, or lean-training-data should give you some examples to work off.

Yicheng Tao (Nov 14 2024 at 08:59):

Yes, for now I was struggling with Environment. The first thing I want to do is to extract the theorems declared in a module. Taking piece of code from LeanDojo's ExtractData.lean, I have the following function:

import Lean
import Lake


open Lean Elab System

set_option maxHeartbeats 2000000  -- 10x the default maxHeartbeats.

unsafe def processFile (path : FilePath) : IO Unit := do
  println! path
  let input  IO.FS.readFile path
  enableInitializersExecution
  let inputCtx := Parser.mkInputContext input path.toString
  let (header, parserState, messages)  Parser.parseHeader inputCtx
  let (env, messages)  processHeader header {} messages inputCtx
  if messages.hasErrors then
    for msg in messages.toList do
      if msg.severity == .error then
        println! "ERROR: {← msg.toString}"
    throw $ IO.userError "Errors during import; aborting"
  let env := env.setMainModule ( moduleNameOfFileName path none)
  let commandState := { Command.mkState env messages {} with infoState.enabled := true }
  let s  IO.processCommands inputCtx parserState commandState
  let env' := s.commandState.env
  let constmap := env'.constants.map₂.toList
  println! s!"{constmap.length}"
  for (name, cinfo) in constmap do
    -- println! s!"{name}"
    match cinfo with
    | .thmInfo val =>
        println! s!"{val.name}"
    | _ => pure ()

unsafe def main (args : List String) : IO Unit := do
  match args with
  | [file]=> processFile file
  | _ => throw $ IO.userError "Invalid arguments"

It works, but it works too much. I randomly choose a file from mathlib .lake/packages/mathlib/Mathlib/Algebra/CubicDiscriminant.lean and run the script. The outputs contain some unexpected items like «.lake».packages.mathlib.Mathlib.Algebra.CubicDiscriminant._auxLemma.3. I only want theorems that explicitly declared using the keyword theorem or lemma. How to filter these items out?

I noticed both LeanDojo and lean-training-data analyzed the InfoTree to get these data. Is that necessary? I'm also reading code from doc-gen4. If everything can be done by Environment, that would be best for me.


Last updated: May 02 2025 at 03:31 UTC