Zulip Chat Archive
Stream: lean4
Topic: Parsing expressions and types from file/string
Alex Nekrasov (Jun 26 2025 at 23:18):
Hello, everyone! I have a problem: I want to write a program which takes lean file and then changes all definitions, theorems and lemmas types, s.t. they contain all necessary information (for example in the code below double should have type double (n : Nat) : Nat), and not double (n : Nat) with unspecified output type). I tried using some code from lean repl and I was able to get Lean.Environment object. It contains all types of constants, but there are much more constants than there were defined in the file. The types there are also very low-level formatted. Here is the code:
import Lean.Elab.Frontend
set_option linter.unusedVariables false
open Lean Elab
def processCommandsWithInfoTrees
(inputCtx : Parser.InputContext) (parserState : Parser.ModuleParserState)
(commandState : Command.State) : IO (Command.State × List Message × List InfoTree) := do
let commandState := { commandState with infoState.enabled := true }
let s ← IO.processCommands inputCtx parserState commandState <&> Frontend.State.commandState
pure (s, s.messages.toList, s.infoState.trees.toList)
def processInput (input : String) (cmdState? : Option Command.State)
(opts : Options := {}) (fileName : Option String := none) :
IO (Command.State × Command.State × List Message × List InfoTree) := unsafe do
Lean.initSearchPath (← Lean.findSysroot)
enableInitializersExecution
let fileName := fileName.getD "<input>"
let inputCtx := Parser.mkInputContext input fileName
match cmdState? with
| none => do
let (header, parserState, messages) ← Parser.parseHeader inputCtx
let (env, messages) ← processHeader header opts messages inputCtx
let headerOnlyState := Command.mkState env messages opts
let (cmdState, messages, trees) ← processCommandsWithInfoTrees inputCtx parserState headerOnlyState
return (headerOnlyState, cmdState, messages, trees)
| some cmdStateBefore => do
let parserState : Parser.ModuleParserState := {}
let (cmdStateAfter, messages, trees) ← processCommandsWithInfoTrees inputCtx parserState cmdStateBefore
return (cmdStateBefore, cmdStateAfter, messages, trees)
def test (input : String) : IO Unit := do
IO.println input
IO.println "--------------------------------"
let (headerOnlyState, cmdState, messages, trees) ← processInput input none
let env : Environment := cmdState.env
let constants := List.take 5 env.constants.toList
IO.println constants.length
for constant in constants do
IO.println constant.1.toString
IO.println constant.2.type.dbgToString
IO.println constant.2.value?
IO.println "--------------------------------"
for message in messages do
let str ← message.toString
IO.println str
IO.println "--------------------------------"
return ()
def someString := "def double (n : Nat) := n + n\ntheorem double_two : double 2 = 4 := by\n rfl\n"
def main : IO Unit := do
test someString
#eval main
Because of this I have several questions. How to parse lean code into ast? How to find which functions were defined in given code? And how to reconstruct source code from ast? Is it even possible?
Eric Wieser (Jun 26 2025 at 23:48):
Alex Nekrasov said:
all necessary information (for example in the code below double should have type
double (n : Nat) : Nat), and notdouble (n : Nat)with unspecified output type).
What do you mean by "necessary"? The unspecified output was by definition not necessary if Lean was able to compile without it!
Alex Nekrasov (Jun 27 2025 at 00:00):
I think I meant not necessary, but complete: I want to know all the information about type variables that I can possibly get without needing to infer any types. I think that #check does exactly that, but I couldn't find where it is implemented. And one more example:
variable {A : Type}
def th1 (a : A) := a
#check th1
Here #check shows th1 {A : Type} (a : A) : A. This output reveals more detailed information about the type than the original definition.
Eric Wieser (Jun 27 2025 at 00:09):
What about for
def ohno := (0 : Fin 37)
#check ohno
set_option pp.all true
#check ohno
The second one is longer, so "more complete", right?
Alex Nekrasov (Jun 27 2025 at 00:13):
I think I'm more concerned about the fact that in th1 output type and used variables are omitted.
Last updated: Dec 20 2025 at 21:32 UTC