Zulip Chat Archive

Stream: general

Topic: Lean grammar definition


Alex Sweeney (Apr 09 2023 at 17:50):

Is the Lean (3 or 4) grammar defined anywhere in a convenient format like Antlr .g4 files? I would like to generate a lexer and parser that I can use with Python to analyze and extract information from lean code

Sebastian Ullrich (Apr 09 2023 at 18:00):

The short answer is that the most/only reasonable language to accurately parse Lean 4 code with is Lean 4, arguably even before you get to syntax extensions. There are various tools such as LeanInk that do this (if we don't have a list of them yet, we should probably make one)

Jason Rute (Apr 09 2023 at 19:58):

One very common and concrete problem is notation. To create lean-proof-recording I ended up writing a simple parser for Lean 3 tactic proofs. It was very brittle and buggy. For example, it was hard to know when a comma ends a tactic or is inside the tactic. The comma could be part of a binder and the user can add new binders, so to do it right I needed a list of all the binders in the environment for that part of the file which one doesn’t normally have if one is just processing a text file. This was just one of many similar issues. It is best to avoid this if possible. At some point Mario wrote an AST extractor for Lean 3 which might be the sort of thing you need. For Lean 4, this information is fairly easy to get using Lean 4 and one could probably write a simple AST exporter for Lean 4 as well.

Jason Rute (Apr 09 2023 at 20:00):

Also another way to get information from Lean code is to use the language server. All the information available in VS Code comes from the language server. There is even a simple Python wrapper around the Lean 3 server called lean-proof-recording.

Alex Sweeney (Apr 09 2023 at 21:46):

@Jason Rute Thank you, your lean-proof-recording looks like it's exactly what I wanted to make. My goal is to fine-tune GPT-3 and see what that is capable of. I am very new to ML stuff and I'm just excited for the possibility of fully automated theorem proving. I have no idea how we'd get there, but I thought I'd give it a shot. I know I'm on the wrong end of the dunning kruger graph, but I'm just surprised to see relatively few people exploring Lean + GPT. ATP seems like it's doable now, and would totally change everything.

Alex Sweeney (Apr 09 2023 at 22:02):

Since this is something you clearly had an interest in as well, would you mind sharing your thoughts? Do you think ATP is on the horizon and is Lean good enough to get us there? Has this recent spike in AI discussion and research got you excited for it? Are there still critical missing pieces or are we just waiting for someone to stitch existing technologies together in the right way?

Jason Rute (Apr 09 2023 at 22:16):

@Alex Sweeney You should look at #Machine Learning for Theorem Proving and maybe my recent IPAM survey talk (https://m.youtube.com/watch?v=P5ew0BrRm_I) as well as Tony’s IPAM talk (https://m.youtube.com/watch?v=_pqJYnQua58). My general thoughts are that we are moving quite fast but have significant hurdles. We can prove simple stuff, but for anything too involved, all the approaches fail. As for when you say “Lean + GPT” that can mean a few things. Many papers apply an AlphaGo style approach to theorem proving using a combination of ML guided tree search and reinforcement learning. The first papers used simple ML approaches like k nearest neighbors and they still work well. Others used neural networks of various types. The PACT paper literally used a fine-tuned (smaller variant of) GPT-3 on Lean. Other papers do premise selection which is similar to document retrieval. Again the older papers use hand crafted features or TF-IDF but the newest ones (like MagnusHammer) use embeddings from LLMs. Finally, with foundation models like Codex and ChatGPT it is possible to do some theorem proving tasks, like auto-formalization and synthesizing whole proofs, out of the box with clever prompt engineering. (And Baldur finetunes a foundation model to correct its incorrect proofs.) This is very exciting times, but unless scaling does miraculous things I think we still need some big ideas to make these systems powerful and practical.

Scott Morrison (Apr 10 2023 at 05:07):

@Alex Sweeney, I just now posted links to sagredo, a conversational tactic between GPT and Lean 4.


Last updated: Dec 20 2023 at 11:08 UTC