Zulip Chat Archive

Stream: new members

Topic: API for Getting Proof State Information from the Compiler

Seth Poulsen (Jun 29 2021 at 16:28):

Hi all, I'm new here, hopefully this is an ok place to ask my question, if not let me know.

I'm a PhD student at the University of Illinois, doing research in early proof education. We've been having students construct proofs by dragging-and-dropping prewritten proof lines into the correct order (see the system description here: https://arxiv.org/abs/2106.11032)

I would love to be able to generate these questions automatically from a proof assistant to scale up the amount of practice questions we can provide to students. I'm drawn to lean because I know you guys have a large library of proofs that are meant for educational purposes.

To generate natural language proofs from the formal proofs, I would likely need some sort of API where I could get info about the proof context and goal state from the prover at each step of the proof (like Coq's plugin API). Does lean have anything like this?

Thanks!

Yakov Pechersky (Jun 29 2021 at 16:34):

Proof blocks are awesome! One obstacle that you might find in trying to do this is that for simple arithmetic proofs that involve calculation, or expansion of terms, mathlib often uses high-powered tactics that are very general and do not create a sequence of steps that are human-parseable. What I mean is that

import tactic.ring

lemma my_lemma (a b : nat) : (a + b) ^ 2 = a ^ 2 + 2 * a * b + b ^ 2 := by ring

produces a term-mode proof which is wild!

Seth Poulsen (Jun 29 2021 at 16:39):

Thanks! That's good to know about how some tactics tend hide decision procedures that end up with really ugly proof terms. It seems to be the same with other proof assistants too from what I can tell. I think when I'm first getting started I'll have to only allow very simple algebraic manipulations and just show the equality in a single step (e.g. X is obviously equal to Y). In the future it would be cool to come up with a way to show the actual algebra steps for more complicated manipulations, but its certainly far from obvious how to do that in a human-friendly way.

Huỳnh Trần Khanh (Jun 30 2021 at 05:21):

For your purposes, the metaprogramming API can be helpful. You might also want to look into lean-gptf for inspiration.

Jason Rute (Jun 30 2021 at 12:36):

Can you describe what you would like your interface to look like some more? Here are some quick thoughts:

The lean language server provides some support for this. There is a python wrapper around it, but it is not being well maintained (by me). If you use it, use this branch in this PR. Also, the language server (which is what drives the interactive communication with VS Code), is sort of a blunt tool and may or may not be what you need.
You can roll your own interface. This isn't going to be a lot of code, but it will involve messing around with Lean metaprogramming, which is basically learning a new (and not well documented) programming language.
You should also check out Lean gym (https://github.com/openai/lean-gym). While intended for training AI, it does provide a nice back and forth interface for sending tactics to Lean and getting goal information back. One problem is that it requires proving a theorem already in Lean (although that could be modified in the interface).
Also, if you don't mind using Lean 4, you might also check out the lean-gym prototype for Lean 4 (https://github.com/dselsam/lean-gym) (see discussion at #Machine Learning for Theorem Proving > Lean gym for Lean 4 for setting it up). Again, this is for machine learning agents, but this type you can enter in the theorem you want. Note however, that it is just a quick prototype so it might have many bugs.
In general, if you are going to muck around with a Lean interface and don't need the advanced math in mathlib, you might consider Lean 4 since it is just a better all around programming language. (However, it is also in constant flux.)

Seth Poulsen (Jun 30 2021 at 12:55):

@Jason Rute Thanks for the response! I looked a little bit at the language server and as you said I'm afraid it won't provide the level of detail I am hoping for (e.g. type information about each of the terms in the assumptions and goal). It looks like the meta programming language and lean-gym are somewhere in the neighborhood of what I want. I'll have a look at them in more detail.

I'll admit it is hard for me to define exactly what I want right now because its not yet clear to me what exactly all the features are I'm going to need, and it probably won't be clear until I'm in the middle of building the formal proof -> natural language translation I'm hoping to do.

Seth Poulsen (Jun 30 2021 at 12:57):

One problem is that it requires proving a theorem already in Lean

What exactly do you mean by this?

Jason Rute (Jun 30 2021 at 13:00):

lean-gym requires you to provide a declaration name, e.g. nat.add_comm and then it sets the goal to the statement of that theorem. This is because the point of Lean gym is to train and test and AI on theorems already in the library. I assume you want fresh theorems. Honestly, it wouldn't be hard to modify lean-gym (or a similar tool) to support manually entering in a theorem to prove.

Jason Rute (Jun 30 2021 at 13:04):

Overall, if you know you want to use Lean for this, I recommend taking the time to learn meta programming in Lean, in particular both the tactic and io monads. After you understand that well enough, you should have the tools to make whatever sort of interface you like. You will have to soon decide if you want to use Lean 3 or Lean 4. Lean 4 is a better programming language and is the future of Lean. It has ok documentation for some stuff, but it is new and still changing. Lean 3 is a more difficult language for metaprogramming, but it is perfectly usable. Also, you will have access to all of the theorems and tactics in mathlib. For both, the documentation isn't great and you will need to ask a lot of questions here.

Seth Poulsen (Jun 30 2021 at 13:20):

Ok, I see. For now, I'll probably be focusing on translating already proven theorems to a natural language representation, but certainly in the future it would be cool to produce the proof and the natural language translation from scratch.

Seth Poulsen (Jun 30 2021 at 13:22):

I was just digging around the threads at #Machine Learning for Theorem Proving --it looks like you originally were going to use the lean server for the ML stuff but then instead shifted to using the meta-langauge, is that right? Did the server just not give detailed enough information or was there some other reason?

Seth Poulsen (Jun 30 2021 at 13:35):

Right now I'm a little bit on the fence between using Lean and Coq. Factors I'm weighing are (1) access to documentation and help integrating with the system, (2) ease of translating tactics to some sensible natural language representation, and (3) if there are libraries of proofs existing that could be of use for me

Seth Poulsen (Jun 30 2021 at 14:44):

@Huỳnh Trần Khanh Thank you for your response as well. I will have a look at lean-gptf.

Jason Rute (Jun 30 2021 at 17:01):

In short, for proof recording, the server doesn't tokenize as expected, doesn't include information about tactics inside tactics consistently, and most importantly doesn't provide any information about where a tactic ends. (This last point is honestly the largest pain of proof recording.) As for creating an interactive environment with Lean, the server is slow (it has a hard coded delay of 200ms every time the file is changed), and it is difficult to consistently enter expressions into Lean exactly as the pretty printer outputs them.

Seth Poulsen (Jul 01 2021 at 03:18):

@Jason Rute Thank you for your time in giving these explanations, I really appreciate it.

Last updated: May 02 2025 at 03:31 UTC