Zulip Chat Archive

Stream: lean4

Topic: Producing the syntax of a file


Damiano Testa (Sep 21 2023 at 14:44):

Dear All,

I understand very little of how Syntax works, so this question may be very confused -- I am trying to learn about it.

If I create a .lean file and I want to extract from it the parsed syntax, how can I do that? I found some functions in Lean.Parser.Types (e.g. parseHeader and parseCommand), but I imagine that there should be something that already takes a file path and directly returns the syntax object associated to it, right? Can anyone point me in the right direction?

Thanks!

Matthew Ballard (Sep 21 2023 at 14:53):

Is it hiding in the info tree?

Damiano Testa (Sep 21 2023 at 14:56):

Matt, I know so little that I am not really sure how to use your hint...

Matthew Ballard (Sep 21 2023 at 14:57):

I reached the edge of my knowledge with that comment. But I know where to look next for an example: std linters (or maybe core linters...)

Damiano Testa (Sep 21 2023 at 14:57):

Ok, I am looking at docs#Lean.Elab.InfoTree !

Damiano Testa (Sep 21 2023 at 15:04):

Ok, looking at InfoTree, it seems to store Syntax inside the Info nodes -- this is good progress!

I'm moving on to the std linters now to get hints on how to produce InfoTrees!

Matthew Ballard (Sep 21 2023 at 15:04):

I edited the message above because it might be core linters I am thinking about

Damiano Testa (Sep 21 2023 at 15:10):

docs#Lean.Elab.Command.State seems to store InfoTrees, so maybe in the end in the right monad it is simply let infotree ← get... I find this so hard!

Damiano Testa (Sep 21 2023 at 15:14):

So, this is probably very roundabout, but does it seem ok?

Use docs#Lean.Elab.runFrontend to get the environment, then use docs#Lean.Elab.Command.mkState to get the Lean.Elab.Command.State and from there tease out infostate and trees as needed.

Damiano Testa (Sep 21 2023 at 15:14):

I'm going to try and see if I get to the bottom of this before giving up for the day!

Damiano Testa (Sep 21 2023 at 15:44):

I have created a file called A.lean containing only

def A : Nat := 0

Then, in a separate file, I run

import Std

open Lean Elab  Command

def myTrees : IO Lean.Elab.Command.State := do
  let (env, _)  runFrontend "Mathlib/A.lean" .empty "Mathlib/A.lean" (.str (.str .anonymous "Mathlib") "A")
  let state := mkState env
  let trees := state.infoState.trees
  dbg_trace f!"\nNumber of trees: {trees.size}"
  return state

#eval do
  let _  myTrees
/-
Mathlib/A.lean:1:0: error: unexpected identifier; expected command

Number of trees: 0
-/

What am I doing wrong? Why are there 3 inputs to runFrontend that all feel like they should be the same file name?

Matthew Ballard (Sep 21 2023 at 15:45):

What is .empty?

Damiano Testa (Sep 21 2023 at 15:45):

Options. I did not know what to write there...

Damiano Testa (Sep 21 2023 at 15:46):

This is what runFrontend wants:

@[export lean_run_frontend]
def runFrontend
    (input : String)
    (opts : Options)
    (fileName : String)
    (mainModuleName : Name)
    (trustLevel : UInt32 := 0)
    (ileanFileName? : Option String := none)
    : IO (Environment × Bool) := do

Damiano Testa (Sep 21 2023 at 15:46):

Ah, maybe input is what I read from the file!

Damiano Testa (Sep 21 2023 at 15:46):

I'll try IO.FS.readFile in that field!

Damiano Testa (Sep 21 2023 at 15:47):

Ok, the error is gone, but there are still no trees...

Damiano Testa (Sep 21 2023 at 16:27):

I don't understand what is going on.

Lean seems to react to the content of the file and produces errors, if there are some. However, it produces no infoTrees.

Damiano Testa (Sep 21 2023 at 16:47):

Got it! I changed runFrontend to return the actual State, rather than the Environment.

A posteriori, everything is obvious: the Environment discards the Syntax information, since it has completed its job. Returning the State that it was in when it created the Environment remembers it!

Damiano Testa (Sep 21 2023 at 17:08):

For future reference, this is what I have and I can leave happy for the rest of the day! :smile:

import Std
import Mathlib.Tactic.RunCmd

open Lean Elab Tactic Command

/--  Copied from `Lean.Elab.runFrontend`, the only difference is in the last line: the original has
`pure (s.commandState.env, !s.commandState.messages.hasErrors)`
instead of `pure s.commandState`. -/
@[export lean_run_frontend]
def my_runFrontend
    (input : String)
    (opts : Options)
    (fileName : String)
    (mainModuleName : Name)
    (trustLevel : UInt32 := 0)
    (ileanFileName? : Option String := none)
    : IO Command.State := do
  let inputCtx := Parser.mkInputContext input fileName
  let (header, parserState, messages)  Parser.parseHeader inputCtx
  let (env, messages)  processHeader header opts messages inputCtx trustLevel
  let env := env.setMainModule mainModuleName
  let mut commandState := Command.mkState env messages opts

  if ileanFileName?.isSome then
    -- Collect InfoTrees so we can later extract and export their info to the ilean file
    commandState := { commandState with infoState.enabled := true }

  let s  IO.processCommands inputCtx parserState commandState
  for msg in s.commandState.messages.toList do
    IO.print ( msg.toString (includeEndPos := getPrintMessageEndPos opts))

  if let some ileanFileName := ileanFileName? then
    let trees := s.commandState.infoState.trees.toArray
    let references := Lean.Server.findModuleRefs inputCtx.fileMap trees (localVars := false)
    let ilean := { module := mainModuleName, references : Lean.Server.Ilean }
    IO.FS.writeFile ileanFileName $ Json.compress $ toJson ilean

  pure s.commandState

example : True := by
  run_tac do
    let state  my_runFrontend ("#eval \"hello!\"") ( getOptions) "whatFileIsThis?" `whatNameIsThis?
    dbg_trace f!"\nNumber of trees: {state.infoState.trees.size}"
  trivial
/-
"hello!"

Number of trees: 2

-/

Scott Morrison (Sep 22 2023 at 01:39):

@Damiano Testa, please look at https://github.com/semorrison/lean-training-data/blob/master/TrainingData/Frontend.lean which gives a access to everything you might want from the frontend. :-) I would like to polish this and put it somewhere more findable, but hopefully you can extract something you like from it already.

Damiano Testa (Sep 22 2023 at 03:06):

Wow, thanks Scott!

I am not going to be able to look at it before next week, but scanning the file it looks like it is extracting just the information that I wanted!

Scott Morrison (Sep 22 2023 at 03:14):

There is an off-by-one error still in the "source for each declaration" field, which I haven't fixed. Patches welcome!

Damiano Testa (Sep 22 2023 at 03:17):

Is that related to the fact that the .ileans also count from zero the number of lines? Or are you referring to something else? (I only looked at the code from my mobile, so my understanding is very shallow.)

Scott Morrison (Sep 22 2023 at 04:31):

Sorry, I haven't looked at it for a while. It should be easy to track down, it just an off-by-one error in the line counting, but I forgot to track it down.

Sebastian Ullrich (Sep 22 2023 at 07:00):

(not even all the editors agree whether the first line is 0 or 1, if it's about that :) )

Damiano Testa (Sep 22 2023 at 07:33):

Ok, that's not a big deal: my internal convention is to start counting from 1, but I'm used to the unnatural convention of the natural numbers including 0.

Damiano Testa (Sep 22 2023 at 07:34):

Anyway, I'll look at it, but it may take me a while before I have some serious time to do it.


Last updated: Dec 20 2023 at 11:08 UTC