Zulip Chat Archive

Stream: lean4

Topic: parsing strings


Damiano Testa (Jan 11 2023 at 12:49):

Dear All,

if I have a string like this:

def code : String := "#eval IO.println \"Hello\""

is there a way to get Lean to parse the string and print in the Infoview "Hello" (and fail, in case the string is not valid lean code)?

Thanks!

Arthur Paulino (Jan 11 2023 at 12:59):

(deleted)

Hanting Zhang (Jan 11 2023 at 17:09):

AFAIK, there's no function that "parses" a string like this in the way you want it to be parsed. (And how would it locally know what IO.println is, anyway?) The best you can do is parse it as an entire Lean file with Lean.Elab.runFrontend

Henrik Böving (Jan 11 2023 at 21:05):

Well it is a single command so there should be a parser and an elaborator for the resulting syntax object somewhere (that you probably have to feed with an envrionment etc as well) the questions are mostly do you actually want to do this.

Damiano Testa (Jan 11 2023 at 21:24):

Ok, thank you for your feedback!

I was hoping that this would be simpler: after all, if I had typed the sequence of symbols after the " on the line above where I did, Lean would have done what I wanted already. This made me think that it could be relatively simple to parse the string.

If it is complicated, then I will simply create a temporary lean file stripping all the weird stuff and simply let lean run on that reformatted file.

Damiano Testa (Jan 11 2023 at 21:29):

To un #xy, I have some .md files where I type some lean code and I would like to automatically add the Lean output of the lean code as a comment next to each

```lean
 [some lean code]
```

block. I have achieved this using some shell script to reformat the md file into a lean file, run lean on that file and then the shell picks up the pieces to recreate the final md file.

I was hoping that I could do this entirely inside lean, but it seems not!

Thanks again!

Andrés Goens (Jan 11 2023 at 22:33):

Have you looked at LeanInk? Sounds like a very similar use case

Damiano Testa (Jan 11 2023 at 23:46):

Thanks for the link! I could not find any clear example, but it seems to be what I want!

However, I installed leanInk and get a segmentation fault on my files. I will investigate more what could be the cause tomorrow.

Adam Topaz (Jan 12 2023 at 02:14):

We have emit_code or something with a similar name in lean3 that does this.

Damiano Testa (Jan 12 2023 at 04:28):

Adam, thanks!
docs#lean.parser.emit_code_here

Damiano Testa (Jan 12 2023 at 04:33):

Looking for emit in docs4, gave several hits in the Lean.IR.EmitC namespace: I'll investigate once I am at a computer.

Mario Carneiro (Jan 12 2023 at 04:34):

This is not difficult to do but there is not an interface specifically for doing this

Mario Carneiro (Jan 12 2023 at 04:34):

emitC is on the wrong track

Damiano Testa (Jan 12 2023 at 04:36):

Mario, then thanks for saving my future-me some time this afternoon failing to making it work!

Damiano Testa (Jan 12 2023 at 04:41):

Oh, is the C above the computer program, rather than a shortened version of Code? :man_facepalming:

Mario Carneiro (Jan 12 2023 at 04:41):

C the language

Mario Carneiro (Jan 12 2023 at 04:51):

import Lean.Meta.Eval
import Std.Util.TermUnsafe

open Lean Elab Command

elab "#run_string" e:term : command => do
  let str  runTermElabM fun _ => do
    let e  Term.elabTermEnsuringType e (some (.const ``String []))
    unsafe Meta.evalExpr String (.const ``String []) e
  let .ok stx := Parser.runParserCategory ( getEnv) `command str | throwError "failed to parse"
  elabCommand stx

def foo := "#eval \"Hello world\""

#run_string foo

The info message appears at the top of the file though, because runParserCategory uses a dummy file name and an initial position of zero, so using elabCommand to run it in the current context results in mixed up parser positions

Damiano Testa (Jan 12 2023 at 04:55):

That's great, Mario! Thank you very much! I'll experiment with it in a few hours!

Damiano Testa (Jan 12 2023 at 04:56):

I can probably fix the position issue by making it print the current line number as well.

Mario Carneiro (Jan 12 2023 at 05:02):

It is not hard to filter the output messages to remove position information, but I don't think it's really possible to work on the input syntax to give it useful positions since it was created by metaprogramming and does not really exist in the source at all

Mario Carneiro (Jan 12 2023 at 05:03):

like, the message wants to highlight the five characters of #eval, but what five characters would that be? For all we know foo has constructed "#eval" by a completely crazy method and those five characters appear nowhere

Damiano Testa (Jan 12 2023 at 05:13):

Yes, I understand. What I meant is that since I creat foo, I simply pass

def code : String := "#eval IO.println \"Hello!\nI am on line 75\""

Mario Carneiro (Jan 12 2023 at 05:14):

What I meant is that you can postprocess the error messages to put them all on #run_string regardless of where they said they wanted to be

Damiano Testa (Jan 12 2023 at 05:18):

Ah, I see! That's a more structured approach: I like it!

Damiano Testa (Jan 12 2023 at 05:21):

In any case, for what I have in mind, I can probably rig it so that I need a single #run_string per use.

Mario Carneiro (Jan 12 2023 at 05:22):

What do you have in mind anyway? This seems like a bad idea generally, a lot of stuff is weird inside this kind of context

Damiano Testa (Jan 12 2023 at 05:22):

Once I can pass a string, I can make the string contain all the uses that I want, I simply store the various bits and run_string after the fact.

Mario Carneiro (Jan 12 2023 at 05:23):

There is almost certainly a less general and more appropriate way to handle whatever kind of input you have

Damiano Testa (Jan 12 2023 at 05:23):

I simply want to update automatically Lean code-blocks in a .md file. So, I type the file and if I change the lean code, I don't have to re-copy-paste all outputs.

Mario Carneiro (Jan 12 2023 at 05:25):

In that case, rather than using run_string you should be calling runFrontend from a standalone program which reads the .md file, parses out the code blocks, calls lean on them and then interpolates the results back into the string

Mario Carneiro (Jan 12 2023 at 05:26):

or are you trying to make the lean server work directly on the .md file?

Damiano Testa (Jan 12 2023 at 05:26):

Sample file:

Title

This is a Lean4 command:

```lean
#eval 5
/-
5
-/
```

EOF

A couple of days later, I decide to change 5 to 6 and I don't want to update the comment.

Damiano Testa (Jan 12 2023 at 05:27):

I think that I did what you said in bash, but thought that using lean directly would have been better/more robust.

Mario Carneiro (Jan 12 2023 at 05:27):

I am suggesting to write that in lean

Mario Carneiro (Jan 12 2023 at 05:28):

but the difference with #run_string is that rather than running the program at compile time, we are using lean to construct a standalone program that you then call when you want to update your .md files

Damiano Testa (Jan 12 2023 at 05:28):

Essentially, I awk the .md file, pass the output to lake env lean file.output and then paste the result back into the original file.

Damiano Testa (Jan 12 2023 at 05:30):

Ok, I'll investigate (and try to understand) what you said.

Mario Carneiro (Jan 12 2023 at 05:30):

that approach basically already works; the only difference with doing it in lean would be you don't have to spin up a new process each time

Mario Carneiro (Jan 12 2023 at 05:30):

you also get a bit more structured view of what lean's output is

Mario Carneiro (Jan 12 2023 at 05:31):

which might be helpful if you want to put just 5 instead of <input>:0:1: 5 in the comment

Damiano Testa (Jan 12 2023 at 05:32):

Yes, the reason I am not entirely happy with my current approach is that it works well on __one__ file, but I already imagine that I'll want more functionalities when I start using it more extensively.

Having the parsing done by lean seemed like a better idea, hence this thread!

Mario Carneiro (Jan 12 2023 at 05:33):

note that #run_string as written only works with one command; if you want to process multiple commands you will need some kind of loop similar to what runFrontend is doing

Damiano Testa (Jan 12 2023 at 05:37):

Ok, I'll look at runFrontend. My idea for run_string was this.
Scan the .md file, putting all the consecutive lines that are not lean code in the string #eval IO.println ("lines of text")
Then append the string with the code block and the continue alternating text and code blocks, creating a huge string.

Finally run_string-it and save the output to a final .md file that I publish.

Damiano Testa (Jan 12 2023 at 05:40):

So the "string" looks like

"#eval IO.println "text
actual code, not parsed"
[actual code, now parsed]
[repeat]"

Mario Carneiro (Jan 12 2023 at 05:41):

yeah, there are several unnecessary steps there

Damiano Testa (Jan 12 2023 at 05:41):

The bash script that I wrote to do this is under 60 lines.

Mario Carneiro (Jan 12 2023 at 05:41):

why are you putting an IO.println in the middle of text?

Mario Carneiro (Jan 12 2023 at 05:42):

why not just print the text

Damiano Testa (Jan 12 2023 at 05:42):

I wanted to remove the escapes for the " that might appear in the text.

Mario Carneiro (Jan 12 2023 at 05:42):

(unnecessary for a lean program I mean, this seems par for the course for a bash script)

Mario Carneiro (Jan 12 2023 at 05:42):

no I mean for the non-lean code you can literally just print it

Mario Carneiro (Jan 12 2023 at 05:43):

rather than putting it in quotes, prepend #eval IO.println, and passing it to run_string where it parses that to find out you wanted to print text

Damiano Testa (Jan 12 2023 at 05:43):

If the text of the .md file says We call this "group", then I do not want the " to break the flow

Mario Carneiro (Jan 12 2023 at 05:43):

that is only a problem you have because of this workflow

Damiano Testa (Jan 12 2023 at 05:46):

I think that I see: if I pass a String to Lean that contains a " character, I don't need to escape it. This is already what I was hoping would happen by doing the parsing directly with lean!

Damiano Testa (Jan 12 2023 at 05:47):

I'll look at runFrontend and rewrite the script in lean.

Damiano Testa (Jan 12 2023 at 05:47):

Thanks!

Mario Carneiro (Jan 12 2023 at 05:47):

Right. If you do def main : IO Unit := do let str <- IO.FS.readFile "foo.txt"; IO.println str then it doesn't matter if the text in foo.txt had quotes or not, everything is printed verbatim

Damiano Testa (Jan 12 2023 at 05:50):

Great, this is the kind of simplification that I was hoping for, when I thought of using lean. I simply lack the language skills to instruct lean!

Damiano Testa (Jan 12 2023 at 05:50):

I felt like a butcher, running awk on correct lean code!

Damiano Testa (Jan 13 2023 at 22:07):

A small update: with the modification below to runFrontend, I get access to the MessageLog. From there, I can get what I want: thanks a lot for your help!

/--  Copied from `Lean.Elab.runFrontend`.  Returns the `MessageLog`. -/
 @[export lean_run_frontend]
 def Lean.Elab.myRun
     (input : String)
     (opts : Options)
     (fileName : String)
     (mainModuleName : Name)
     (trustLevel : UInt32 := 0)
     (ileanFileName? : Option String := none)
     : IO MessageLog := do
   let inputCtx := Parser.mkInputContext input fileName
   let (header, parserState, messages)  Parser.parseHeader inputCtx
   let (env, messages)  processHeader header opts messages inputCtx trustLevel
   let env := env.setMainModule mainModuleName
   let mut commandState := Command.mkState env messages opts

   if ileanFileName?.isSome then
     -- Collect InfoTrees so we can later extract and export their info to the ilean file
     commandState := { commandState with infoState.enabled := true }

   let s  IO.processCommands inputCtx parserState commandState
 --  for msg in s.commandState.messages.toList do
 --    IO.print (← msg.toString (includeEndPos := getPrintMessageEndPos opts))

   if let some ileanFileName := ileanFileName? then
     let trees := s.commandState.infoState.trees.toArray
     let references := Lean.Server.findModuleRefs inputCtx.fileMap trees (localVars := false)
     let ilean := { module := mainModuleName, references : Lean.Server.Ilean }
     IO.FS.writeFile ileanFileName $ Json.compress $ toJson ilean

   pure s.commandState.messages

Damiano Testa (Jan 13 2023 at 22:08):

What I still need to figure out is how to pass the "correct" environment, but I'll worry about this next week!


Last updated: Dec 20 2023 at 11:08 UTC