Zulip Chat Archive
Stream: Machine Learning for Theorem Proving
Topic: GPT generated theorem descriptions.
Kim Morrison (Apr 30 2024 at 06:28):
Continuing from https://leanprover.zulipchat.com/#narrow/stream/113486-announce/topic/GPT-4.20generated.20theorem.20descriptions/near/436189082
Kim Morrison (Apr 30 2024 at 06:30):
I'm inclined to think these are much too verbose.
theorem Nat.Prime.one_lt : ∀ {p : ℕ}, Nat.Prime p → 1 < p
The theoremNat.Prime.one_lt
states that for all natural numbersp
, ifp
is a prime number, thenp
is greater than 1. In other words, if a natural number is prime, it must be strictly larger than 1.
would be much better as just "if a natural number is prime, it must be strictly larger than 1"
Kevin Buzzard (Apr 30 2024 at 06:30):
So we can now document all mathlib lemmas right? :-)
Kim Morrison (Apr 30 2024 at 06:33):
I'd also like to see some automatic way of quantifying the quality of auto-informalization. I think a reasonable one is to atuo-formalize the resulting text, and see if it is identical (equivalent up to aesop??) to the original.
Kevin Buzzard (Apr 30 2024 at 06:55):
This would be more likely to work the more verbose the translation, right?
I think that formalising these informal statements sounds like a great idea! If we can get machines translating latex to lean then I can get FLT done much quicker. Right now I guess it's science fiction but practicing on theorem statements is surely a good place to start
Siddhartha Gadgil (Apr 30 2024 at 07:43):
Kevin Buzzard said:
This would be more likely to work the more verbose the translation, right?
I think that formalising these informal statements sounds like a great idea! If we can get machines translating latex to lean then I can get FLT done much quicker. Right now I guess it's science fiction but practicing on theorem statements is surely a good place to start
Actually LeanAide
translates statements pretty well already. And many of the failure cases are because of lack of example terminology, which can be partly fixed by using these informalizations.
Siddhartha Gadgil (Apr 30 2024 at 07:47):
I agree with both the points about having brief definitions and informalizing to test. There are issues with resources though. For context, I informalized with the following prompt template, including relevant definitions (with docstrings if available):
"The following are some definitions in Lean 4 that are involved in the statement of a theorem which you have to describe in natural language.\n```lean\n${definitions}\n```\n\nDescribe the following theorem (whose proof is suppressed for brevity) in natural language, similar to a documentation string. The description should be either a single sentence or a paragraph with 2-3 sentences, and may contain LaTeX mathematics.\n```lean\n${theorem}\n```\n"
If I queried too often GPT-4 throttled. So I had to add delays between queries and it took about a week to generate this data. Formalization needs larger prompts (examples of translations) so a full run will take about a month.
Siddhartha Gadgil (Apr 30 2024 at 07:49):
I feel the best bet for having more concise explanations is to use a smaller local model to shorten the GPT-4 descriptions, perhaps Mistral 7B or Lamma 3 8B. In general Mistral is much less chatty than GPT-4 so should be inclined to give briefer explanations.
Siddhartha Gadgil (Apr 30 2024 at 07:50):
I will try using Mistral unless someone has a better idea or someone with more ML skills and GPUs is trying something similar.
Siddhartha Gadgil (Apr 30 2024 at 08:23):
@Kevin Buzzard I feel that we can already try to use machine translation a bit like the Mathlib3 port: with a machine step (or rather steps) followed by a human step. The machine step as of now will be weaker so much of the burden will be on humans. But over time productivity can go up.
Further, ML can already do rewriting tasks well, for instance transforming proofs into a structured form an in this example. The individual claims can then be autoformalized. Ideally the result should then be provable by aesop
configured to use ring
etc.
In practice though there will be errors along the way to be fixed, and proofs that have to be hand completed.
I feel part of the reason this has not been more developed is that you cannot make benchmarks for "useful partial formalization", and ML community is geared to targetting benchmarks.
Kevin Buzzard (Apr 30 2024 at 09:00):
Interesting comments!
I'm finding with FLT that it's hard to point to the book and say "this is the lemma we need", because we tend to write things in a different generality.
Jason Rute (Apr 30 2024 at 13:14):
@Zhangir Azerbayev and @Rob Lewis did something similar with Codex in Lean 3. See here and #general>Translation to natural language (specifically this message). In their work, I noticed the descriptions tended to favor big famous theorems. So for example a technical theorem about a Haar measure would be interpreted as some well-known result about a Haar measure. The worst was theorems about coercions since that concept was completely foreign to the model. I imagine there are probably good ways to separate good definitions from bad ones automatically, including some form of self-consistency with back translation (as Kim suggests above), but these approaches get harder as the models get more expensive to run.
Siddhartha Gadgil (Apr 30 2024 at 13:24):
I don't know what their prompt was. What I did was delaborated the theorem statement to Syntax and included all the definitions with doc strings. So the model had something to work with. This may not have been so easy in Lean 3.
Jason Rute (Apr 30 2024 at 13:27):
I think they just put in the pretty printed theorem (or maybe the Lean original code) with a few examples of translations, but I'm not positive. Does your approach lead to better translations @Siddhartha Gadgil?
Jason Rute (Apr 30 2024 at 13:29):
Also, I unfortunately see now that my link to their example (which was a clone of the mathlib docs but with AI generated descriptions) no longer contains the AI generated descriptions. :sad:
Siddhartha Gadgil (Apr 30 2024 at 13:31):
I haven't compared with their's but in my own experiments with and without such information it did make a significant difference. Indeed these models are not great at "recalling" lean terminology but can pick it up from examples in the prompt fairly well.
Jason Rute (Apr 30 2024 at 13:31):
Is your prompt adaptive to the specific theorem you are translating?
Jason Rute (Apr 30 2024 at 13:32):
I mean the examples in the prompt. I assume you give some examples of good translations first, right?
Siddhartha Gadgil (Apr 30 2024 at 13:34):
For informalization I don't give examples of translation. I instead give definitions, with doc strings if available, of the terms that appear in the syntax of the statement of the theorem. Since the target is English I don't think examples of that are needed.
For formalization we give examples of related theorems and their translation. The hope is that these include most of the terms and syntax needed for the translation.
Miguel Marco (Apr 30 2024 at 17:38):
Siddhartha Gadgil said:
I feel the best bet for having more concise explanations is to use a smaller local model to shorten the GPT-4 descriptions, perhaps Mistral 7B or Lamma 3 8B. In general Mistral is much less chatty than GPT-4 so should be inclined to give briefer explanations.
From my limited experience, mixtral 8x7B works surprisingly well for a model that runs reasonably fast in a (high end-ish) domestic GPU. But even before that, i would give a try to llam3-8B, since it is really fast (so, if it doesn't happen to work, you wouldn't lose much time trying).
I just did a quick try with llama3. with just this prompt:
The following is the statement of a theorem writen in Lean4, translate it to natural language: theorem AlgHom.map_sub : ∀ {R : Type u} {A : Type v} {B : Type w} [inst : CommSemiring R] [inst_1 : Ring A] [inst_2 : Ring B] [inst_3 : Algebra R A] [inst_4 : Algebra R B] (φ : A →ₐ[R] B) (x y : A), φ (x - y) = φ x - φ y
It gave this response:
Here's the translation:
"Theorem: For any types `R`, `A`, and `B` with a commutative semiring structure on `R`, ring structures on `A` and `B`, and algebra homomorphisms from `R` to `A` and from `R` to `B`, if `φ` is an algebra homomorphism from `A` to
`B`, then for any elements `x` and `y` in `A`, the image of their difference under `φ` is equal to the difference of their images under `φ`. In other words, `φ(x - y) = φ(x) - φ(y)`."
Note that I've used more natural language than the original theorem statement, which is written in a more concise and formal style typical of mathematical proofs.
in a couple of seconds.
Miguel Marco (Apr 30 2024 at 17:44):
mixtral 8x7b gave this in about seven seconds:
Given a commutative semiring R, rings A and B, and an algebra homomorphism φ from A to B, for all elements x and y in A, the homomorphism maps their difference to the difference of their images under φ: that is, φ(x - y) = φ(x) -
φ(y).
Siddhartha Gadgil (May 01 2024 at 00:45):
@Kim Morrison @Kevin Buzzard I used Mistral 7B to rewrte the descriptions in more concise form. I have included the more concise descriptions italicised, for example in the Prime numbers page . They seem reasonable.
@Miguel Marco Thanks for the suggestions. I went with Mistral 7B for this task of making translations concise, because I do not have the resources to run Mistral 8x7b locally (I have one A4500 20GB CPU) and am having technical issues with stopping when running Llama 3 8B on vLLM (this is a known issue).
Miguel Marco (May 01 2024 at 08:27):
Which prompts did you use?
Siddhartha Gadgil (May 01 2024 at 08:28):
For summarization with Mistral 7B, this function:
def prompt (description):
text = f"""The following is a description of the statement of a theorem in Lean 4.
---
{description}
---
Give a concise, single-sentence mathematical statement of the theorem. Give ONLY the statement
"""
return text
Siddhartha Gadgil (May 01 2024 at 08:30):
The original translation used relevant definitions in the template:
The following are some definitions in Lean 4 that are involved in the statement of a theorem which you have to describe in natural language.\nlean\n${definitions}\n
\n\nDescribe the following theorem (whose proof is suppressed for brevity) in natural language, similar to a documentation string. The description should be either a single sentence or a paragraph with 2-3 sentences, and may contain LaTeX mathematics.\nlean\n${theorem}\n
\n
Shashank Pathak (May 06 2024 at 13:42):
Hi! For the JSON file, what do the name
and frequency
entries denote? For example, here
{"name": "id", "frequency": 37823}
{"name": "Eq.refl", "frequency": 33722}
{"name": "Eq.mpr", "frequency": 33681}
Siddhartha Gadgil (May 06 2024 at 15:36):
Shashank Pathak said:
Hi! For the JSON file, what do the
name
andfrequency
entries denote? For example, here
{"name": "id", "frequency": 37823} {"name": "Eq.refl", "frequency": 33722} {"name": "Eq.mpr", "frequency": 33681}
The name of the theorem and the frequency with which it appears in term proofs.
Xiyu Zhai (Jun 25 2024 at 05:12):
Kevin Buzzard said:
This would be more likely to work the more verbose the translation, right?
I think that formalising these informal statements sounds like a great idea! If we can get machines translating latex to lean then I can get FLT done much quicker. Right now I guess it's science fiction but practicing on theorem statements is surely a good place to start
Personally I believe it's not a science fiction but a within reach problem that's going to be fundamentally resolved in the coming years. Statistically speaking, the task is not difficult at all. We need some good AI system technique that guarantees both accuracy and speed.
Xiyu Zhai (Jun 25 2024 at 05:14):
I believe there should be a systematic way of evaluating trustworthiness of autoformalization. Formalized proof in Lean is 100% correct, then the correctness of autoformalization dominates the error.
Xiyu Zhai (Jun 25 2024 at 05:15):
I suggest a Chain-Of-Though approach for complicated translation. I'm working on it. But it shall take about a year.
Xiyu Zhai (Jun 25 2024 at 05:17):
There should be a hybrid AI system that easily incorporates new Lean formalized concepts into itself and runs with optimal speed yet gives sound guarantees.
Jason Rute (Jun 25 2024 at 23:45):
@Xiyu Zhai I look forward to seeing what you will come up with!
As for what is and isn't currently science fiction, that is probably a matter of degree (and how much you are factoring in future research).
- Translating self-contained mathematical statements that use standard mathematical objects into Lean or Isabelle seems to be making good progress. This includes using good feedback and reinforcement learning loops like in the recent DeepSeek-Prover (#Machine Learning for Theorem Proving > DeepSeek-Prover), which is probably state-of-the-art here.
- It is harder once the math concepts are more esoteric even if they already are found in a library like Mathlib. (Although using RAG and Lean's metaprogramming like in LeanAid are probably good approaches.)
- It is much harder once one has to translate new definitions. Now you really have to start to understand the concepts involved better, especially if the translation isn't routine. I'm not aware of any work in this direction.
- Translating proofs is also more of a challenge.
- I think once we start talking about translating an article, textbook, or research book, then we really are at least several years, if not decades, into the future. Current deep learning tools just don't do high-level organization and planning yet. It is the same difference as having ChatGPT write a few lines of code versus plan and develop a whole large project. (Although I could see auto-formalization being one of the first successful applications of such a project-level system since a formal theorem prover provides so much feedback and reward signals to help the agent know it is on the right path.)
Jason Rute (Jun 25 2024 at 23:52):
- And even if we could translate very well-written, self-contained articles or textbooks, real math is incredibly messy: citations, mistakes, left as an exercise to the reader, abuse of notation and terminology, lazy writing, the reader needs time to develop their understanding and intuition about the field, etc. That would be even another order of magnitude more difficult. However, this might have an interesting effect if we tell mathematician that our AI could formalize and check their work as long as it is well-written. :)
Xiyu Zhai (Jun 26 2024 at 03:38):
I have a new programming language for writing AI agents, the value of which I want to demonstrate in this endeavor. (In simple terms, in the language one easily breaks a complicated tasks into smaller ones, and using larger llms as labeling functions we can train smaller llms to achieve these simple tasks. The complete story is much more convoluted though)
The points you gave here are invaluable. It gives me a clearer idea of what difficulties lie in here. I'll definitely cite this. (I don't know what's the right way to quote a discussion in zulip though, but sometimes I feel the information in zulip can be much more valuable than those in papers)
Xiyu Zhai (Jun 26 2024 at 03:41):
From a theoretical viewpoint, I want to understand what's the computation process of translating a human language proof to a formalized one, and how to decompose it properly so that it can be reliably solved by an AI system with as little human intervention as possible.
Joseph Tooby-Smith (Jun 26 2024 at 17:49):
@Siddhartha Gadgil Is there an easy way I can use your auto-informalization on my own theorems and definitions?
Ideally I want to be able to run an auto-informlalizer on all of the lemmas and definitions in a file, certify them individually (as a human) as good or bad (or better or worse then the existing doc-string if it exists), and place the good ones in the file as a doc-strings.
Siddhartha Gadgil (Jun 26 2024 at 21:05):
It is open source in LeanAide but currently tangled in other code. I will try to make an easy to use version
Last updated: May 02 2025 at 03:31 UTC