Zulip Chat Archive

Stream: lean4

Topic: Learning about Lean IR


Lev Stambler (Sep 05 2023 at 20:35):

Is there a place where I can read up on Lean's IR and how to work with with it?

Henrik Böving (Sep 05 2023 at 20:36):

Which IR in particular are you talking about? I can think of at least 3 from the top of my head

Henrik Böving (Sep 05 2023 at 20:43):

@Matthew Ballard what does red square mean :P

Matthew Ballard (Sep 05 2023 at 20:43):

Infrared?

Henrik Böving (Sep 05 2023 at 20:43):

Ahhhh

Matthew Ballard (Sep 05 2023 at 20:45):

I assume there is no Lean UV though :)

Henrik Böving (Sep 05 2023 at 20:48):

In the hardware verification world there exists a tool called UVM for universal veritifcation methodology. So it would be pretty cool if we had Lean UV :P

Lev Stambler (Sep 07 2023 at 12:47):

Henrik Böving said:

Which IR in particular are you talking about? I can think of at least 3 from the top of my head

My bad :smile: , the intermediate representation.

Jason Rute (Sep 07 2023 at 14:06):

@Lev Stambler I think @Henrik Böving is saying he can think of at least 3 intermediate representations. I think (but could be mistaken) that there is the Syntax IR representing the syntax tree of the code, and the MacroM Syntax IR representing the syntax tree after expanding macros, and the Expr IR representing the proofs and objects at the kernel level after elaboration. The last is used for type checking and checking proofs. For running code, C++ is also used as an IR during compiling, and I think there is also a representation used for running in the virtual machine interpreter but I don’t know how it works. You can see some details in these fairly old slides: https://leanprover.github.io/presentations/20181012_MSR/#/3 You can also look at the metaprogramming book: https://github.com/leanprover-community/lean4-metaprogramming-book

Henrik Böving (Sep 07 2023 at 14:11):

Only half of that is correct.

indeed there is Syntax for the syntax tree but it is used before and after macro expansion MacroM Syntax is a value wrapped in a reader+state monad stack not an actual value (although it is the monad that is used during macro expansion). Then there is indeed Expr. There is also no C++ used as an IR ever. There are two internal IRs that are both referred to as LCNF (Lean Compiler Normal Form), one for the old and one for the new code generator. But we do indeed translate LCNF into C (or LLVM) in the end.

And indeed LCNF can also be executed by an interpreter if we want, that's e.g. what usually happens if you run #eval or also how tactic execution works (modulo pre compiled tactics).

Jason Rute (Sep 07 2023 at 16:15):

@Lev Stambler You may also want to say more what you want to do when you say “work with” the IR. For example, the meta-programming books says a lot about writing macros and tactics. There are also other resources on say using Lean code compiled to C in another non-Lean project.

Jason Rute (Sep 07 2023 at 16:21):

(And some in machine learning are interested in mining Lean expressions for data, and some in PL are interested in external type checkers for Lean proofs or translating Lean to other ITP languages. So there are a lot of ways to interact with Lean’s various internal representations, and some existing tools for various use cases. :smile: ).

Lev Stambler (Sep 07 2023 at 16:53):

@Jason Rute and @Henrik Böving, thank you for the response! I am looking more for the Expr representation but was hoping to get it into a format like JSON or some other portable format where I would not have to write a parser for Expr.

Overall, I am looking to run compression algorithms over an "expanded out" version of all the theorems in Mathlib. I.e. I want to expand each theorem into its fully elaborated version, replace any names with generic names, and then run compression. I hope to then see if there is some decent way to measure the amount of "information" conveyed in one proof

Lev Stambler (Sep 07 2023 at 17:02):

I have gone through a good bit of the meta-programming book before and found the part on expressions quite helpful! I was wondering if there is a canonical way to do things like replace other theorems/ definitions with their elaborated versions and then get out the fully elaborated expression in a portable format? (as mentioned above)

Jannis Limperg (Sep 07 2023 at 17:07):

This should be easy if you stay within Lean. The fully elaborated version of a theorem is precisely its Expr; you can get the Exprs for all theorems by iterating through docs#Lean.Environment. Replacing each constant (name) with something generic can be done with docs#Expr.replace. Then you can toString the Expr and compress the resulting textual representation.

Lev Stambler (Sep 07 2023 at 17:11):

Jannis Limperg said:

This should be easy if you stay within Lean. The fully elaborated version of a theorem is precisely its Expr; you can get the Exprs for all theorems by iterating through docs#Lean.Environment. Replacing each constant (name) with something generic can be done with docs#Expr.replace. Then you can toString the Expr and compress the resulting textual representation.

Fantastic. Thank you, I will check it out!

Lev Stambler (Sep 07 2023 at 17:17):

Jannis Limperg said:

This should be easy if you stay within Lean. The fully elaborated version of a theorem is precisely its Expr; you can get the Exprs for all theorems by iterating through docs#Lean.Environment. Replacing each constant (name) with something generic can be done with docs#Expr.replace. Then you can toString the Expr and compress the resulting textual representation.

I am getting a 404 for the Expr.replace page (https:/​/​leanprover-community.github.io/​mathlib4_docs/​404.html#Expr.replace)

Kevin Buzzard (Sep 07 2023 at 17:18):

It's in the Lean namespace: docs#Lean.Expr.replace

Patrick Massot (Sep 07 2023 at 17:23):

Lev Stambler said:

Overall, I am looking to run compression algorithms over an "expanded out" version of all the theorems in Mathlib. I.e. I want to expand each theorem into its fully elaborated version, replace any names with generic names, and then run compression. I hope to then see if there is some decent way to measure the amount of "information" conveyed in one proof

You will get a lot of implementation-dependent noise.

Jason Rute (Sep 07 2023 at 18:23):

One way that Lean’s expressions including proofs is stored is in the olean files. It’s a binary format. And there has been a lot of work trying to compress them as much as possible, so you might just be able to look at those compressed olean files. For Lean 3 at least, there is also a similar export format which is serialized. It just stores the expression data in a text format which could be compress with say gzip.


Last updated: Dec 20 2023 at 11:08 UTC