Zulip Chat Archive

Stream: general

Topic: ast

Iocta (Mar 23 2021 at 06:40):

I want to fiddle with a lean syntax graph in another language but I don't want to write a parser. Is it possible to get lean to dump the parse tree?

Alcides Fonseca (Mar 23 2021 at 07:55):

I am also interested in this and in the reverse process: convert the AST back to Lean (is a pretty printer and re-parsing the only way to go?)

Jason Rute (Mar 23 2021 at 11:15):

Can both of you explain a bit more about that part of the code you want an AST for (and maybe the applications)? In general, parsing Lean code is something you need a full Lean parser to do. Every time a new tactic, user command, or notation is added to the language it changes the behavior of the parser. Your three options are to (1) write a Lean parser or (2) hack into the C++ Lean parser, or (3) in certain cases you can write a poor Lean parser which doesn't cover all the cases and combine it with data from hacked versions of basic lean functions in Lean's parser and tactic monads.

Jason Rute (Mar 23 2021 at 11:15):

We took the final approach for the LeanStep dataset for the LeanStep/PACT paper. There we extracted an AST for all Lean tactic proofs. To read a bit more about how we did this, see this slide for Lean Together 2021 and the next few slides. The repo is here, but like a lot of code for papers, it got rushed at the end to meet a deadline and isn't as usable as I'd like. I hope to clean it up.

Jason Rute (Mar 23 2021 at 11:15):

Now if you just want the internal AST of expressions, that is much easier. There are a number of more parsable formats for Lean expressions that can (with varying degrees of difficulty) be coverted back into Lean. But again, you can't just read them easily from a Lean file. You need to work from within the Lean metaprogramming framework to export the data.

Jason Rute (Mar 23 2021 at 11:15):

Finally, it should be noted that this is all easier in Lean 4. (I assume you are asking about Lean 3.)

Jason Rute (Mar 23 2021 at 11:19):

It's also easy actually, in Lean 3, to get the AST of all definitions and theorems in the environment as well as the low-level term proofs that Lean saves. Again, you need to work from within the Lean metaprogramming framework to export this data.

Iocta (Mar 23 2021 at 18:34):

That's interesting. How do I get the AST of definitions out?

Iocta (Mar 23 2021 at 18:39):

Is that meta def trace_parser_state

Iocta (Mar 23 2021 at 18:42):

I see the slide mentioned Large was run over 8 run and Larger was run over 1, but not Larger over 8?

Jason Rute (Mar 23 2021 at 22:22):

Iocta said:

Is that meta def trace_parser_state

No. I've actually not seen that function before, but whatever it is it probably doesn't do anything like what you are asking. (It probably just traces helpful information about the parser state for debugging).

Jason Rute (Mar 23 2021 at 22:26):

Iocta said:

I see the slide mentioned Large was run over 8 run and Larger was run over 1, but not Larger over 8?

If you are interested in the results, then please look at the paper, and a more recent talk with slides. Everything in the last part of those other slides I shared is really out of date. We have better results now. I just shared the older slides since they contained more info on the tactic AST extraction.

Jason Rute (Mar 23 2021 at 22:46):

Iocta said:

That's interesting. How do I get the AST of definitions out?

I'll have to find a code snippet. How comfortable are you with metaprogramming in lean? The main idea is something like the following:

You should be working in a tactic or in an io function (like def main : IO unit).
The tactic tactic.get_envreturns an environment object env.
In particular, there is a method on env to iterate over all declarations in the environment.
You can filter out only the definitions.
For each definition you will have information about the name, the type, and the value of the definition. The type and value are expressions. There are a number of ways to print an expression to a string, including the usual pretty-printed version, the fully elaborated (pp.all) version, or expr.to_raw_fmt. All of these serializations can be converted back into a Lean expression as well, with lots of caveats. In particular, a Lean expression is an algebraic datatype and you could also write your own serializer to say S-expressions or JSON.
To output the definition to file, you can use Lean's IO framework, or just trace your data using tactic.trace. When you run lean foo.lean (or lean --main foo.lean), then any traced messages goes to stdout where you can pipe it into a file.

Jason Rute (Mar 23 2021 at 22:48):

Maybe start with this example: https://leanprover.zulipchat.com/#narrow/stream/113489-new-members/topic/printing.20lemma.20statements/near/213518206

Jason Rute (Mar 23 2021 at 22:52):

Or this one which gets all theorems: https://leanprover.zulipchat.com/#narrow/stream/219941-Machine-Learning.20for.20Theorem.20Proving/topic/ML.20for.20Lean.3A.20How.20to.20do.20it.3F/near/186859952

Iocta (Mar 23 2021 at 22:55):

I don't have much experience with lean's metaprogramming, but I'll look around. Thanks for the info.

Last updated: May 02 2025 at 03:31 UTC