Zulip Chat Archive

Stream: general

Topic: Large transformer models like OpenAI Codex and Lean

Yaël Dillies (Aug 24 2021 at 09:21):

Golol said:

2) In formal mathematics it can be immediately verified if the Model has managed to produce code that achieves the subgoal. The programmer does have to carefully check if the theorem and definition statements are correct, but if any proof works, then it just works. There are no bugs in Lean in this sense.

Mostly yes, but definitions escape that pattern. It's very hard to know whether you got the right definition before proving what you want about it.

Golol (Aug 24 2021 at 09:30):

Yaël Dillies said:

Golol said:

2) In formal mathematics it can be immediately verified if the Model has managed to produce code that achieves the subgoal. The programmer does have to carefully check if the theorem and definition statements are correct, but if any proof works, then it just works. There are no bugs in Lean in this sense.

Mostly yes, but definitions escape that pattern. It's very hard to know whether you got the right definition before proving what you want about it.

Do you mean this for creating new definitions, or formalizing existing definitions in lean?

Yaël Dillies (Aug 24 2021 at 09:48):

Either. Mathematical definitions often have to undergo some adaptation due to technicalities.

I think if you are just trying to port existing definitions into lean and you have already laid out the foundation of a theory, i.e. decided how the basic objects will look exactly in lean, making sure that the definitions in the rest of the theory are correct is not so hard.

Patrick Massot (Aug 24 2021 at 10:12):

This whole thing has been discussed many times already.

Golol (Aug 24 2021 at 10:18):

Particularly in the context of recent advances with large transformer models? If not then I think it is worth a new discussion with this in mind. There is a difference to what I am saying and just "Let's use machine learning on mathlib".
The point is not to use AI to generate proofs. The point is that it has now been demonstrated that these models can translate back and forth between natural language and several syntactic/formal languages (for statements with reasonable complexity).
I'd love to read previois discussions on this.
edit: Ah great, I've found it with search

Jason Rute (Aug 24 2021 at 11:45):

Just to make this easier to find, the relevant prior discussions are in these places:

#Machine Learning for Theorem Proving (Probably the best place to have these conversations)
#lean-gptf An application of large language models to Lean. (Best location to discuss this particular project.)
Search for "codex" to see similar discussions on OpenAI Codex and OpenAI copilot (as well as TabNine). #general > github copilot in particular has some good comments by Stan Polu at OpenAI about continuing support for Lean.

Jason Rute (Aug 24 2021 at 11:46):

Jason Rute said:

I don’t know enough about Codex yet, but you may be interested in Lean gpt-f which itself is a collaboration with Stan Polu at OpenAI. Stan and Jesse Han and others are working on a number of projects applying AI to theorem proving. Here are some resources on Lean gpt-f.

stream: #lean-gptf

paper: https://arxiv.org/abs/2102.06203

repo: https://github.com/jesse-michael-han/lean-gptf

short talk: https://iclr.cc/virtual/2021/workshop/2124#collapse-sl-3634

long talk: https://m.youtube.com/watch?v=EXpmbAfBNnw

Golol (Aug 24 2021 at 12:26):

Thanks! I am really happy to hear that this is being worked on. I really think that this is the key to Lean being widely adopted. I hope that this will allow future mathematicians to formalize proofs after only a very brief introduction into type theory.

Justus Springer (Aug 25 2021 at 09:19):

Golol said:

I hope that this will allow future mathematicians to formalize proofs after only a very brief introduction into type theory.

I have my doubts about this tbh, at least for the short to medium term. The idea that a future lean user won't have to think about all the small technical details anymore and just let a transformer model translate his natural language descriptions into valid lean code certainly is exciting. But what happens when the proof produced by the model doesn't work (as it certainly will at some point)? Then you'll have to start debugging, and for that you need a lot of experience in lean and type theory.

I guess my point is: To use such a transformer model effectively, you'll need to have a very good intuition about what it can and can't do. I.e: Is this proof, written in natural language, precise enough so that this model can fill in the gaps and give me valid lean code? Or does it need more explaining? If so, which parts should I try to explain differently (more precisely), so that it works? To build this intuition requires lots of experience of dealing with small "annoying" details in a theorem prover. Details that a normal mathematician never thinks about, hence has no experience with explaining in a formal style. I myself consider myself very much at the beginning of building such an intuition, and I've been formalizing in lean for about half a year. So I believe the entry barrier will remain quite high, but I'd love to be proven wrong.

Justus Springer (Aug 25 2021 at 09:36):

I'm sure nothing of what I just wrote is new or hasn't already been said many times here in other discussions. I just felt compelled to counterbalance your arguments (which are certainly interesting) and say that, from what I've seen, evidence that AI can significantly lower the entry barrier for theorem provers is yet to be provided.

Golol (Aug 26 2021 at 07:55):

Thanks, I didn't quite think about it like that. This of course also happens in programming. In this case people usually break down the problem into smaller pieces and give more detailled descriptions. So I could say the way I believe this would work in the end ideally is that
1) 19/20 times with moderately detailled proofs (Beginning/Medium undergread level) The model manages to prove your subgoal.
2) In the remaining cases, 19/20 times the model manages to prove your subgoal after you sat down and tried to break it into really small pieces.
3) For that 1 tricky proof out of 400 in the paper you'e trying to formalize you ask a Lean specialist friend.
Of course this is not an argument as we don't have any clear evidence. Just a description of how I think things could work. We'll find out I suppose.

Mario Carneiro (Aug 26 2021 at 07:58):

By the way, you could replace ML by sledgehammer or other ATP in that description as well. It seems to work fairly well for the isabelle folks, but I don't think isabelle users are fooling themselves into thinking it's just like informal maths

Golol (Aug 26 2021 at 08:31):

That's also a good point. I'd just like to emphasize once more that the core difference between a language model and other ATPs is that an ATP work with two things: A collection of known truths and a subgoal.
A natural language model works with three things: A collection of known truths, a subgoal, and a natural language instruction by the user.
The task can be made easier for an ATP by increasing the known truths or simplifying the subgoal. For a natural language model you can also make it easier by giving more detailled instructions.

Scott Morrison (Aug 26 2021 at 08:51):

This point about "instruction by the user" is an excellent reminder that if we write good comments in all of our (formal) proofs, this good training data for ML models, as well as helpful for the humans today.

In particular, if we hope to be able to give natural language prompts to descendants of lean-gptf, it is going to need (want?) aligned training data: interspersed NL comments and formal tactic invocations are perfect.

Patrick Massot (Aug 26 2021 at 08:56):

I think we should rather have a tool that produces the natural language comments from the tactic scripts (including variations), probably rather in Lean 4 than in Lean 3. If this isn't possible because our tactics do too many different things (to the point that even a meta-program that has access to the goal before and after running the tactic can't write a sentence about it) then I interpret it as a signal that our tactics are too versatile and we should use more specialized versions that would be nicer even for regular readers.

Patrick Massot (Aug 26 2021 at 08:57):

Independently from machine learning, generating that kind of text is part of my big dream of a next generation math document anyway.

Last updated: May 02 2025 at 03:31 UTC

leanprover-community / mathlib