Zulip Chat Archive

Stream: Machine Learning for Theorem Proving

Topic: Automatically generated lemmas

Cedric Holle (Sep 22 2022 at 08:29):

Hey, I generated ~180k lemmas without proof. I hope someone has a use for them ;)
https://github.com/todbeibrot/lemma-set

Scott Morrison (Sep 22 2022 at 08:34):

Um, are these any more useful than randomly stringing together tokens drawn from mathlib? It doesn't look like anything there would even typecheck.

Cedric Holle (Sep 22 2022 at 08:37):

They all typecheck. They don't necessarily make a lot of sense.

Timothee Lacroix (Sep 22 2022 at 08:42):

Hey Cedric, do you have more details / maybe some code that describes how these lemmas were generated?

Cedric Holle (Sep 22 2022 at 08:54):

I used Deep Reinforcement Learning. It basically uses one definition at each step. It only optimizes for validity (and diversity). I am planning on uploading everything. But it might take a while.

Timothee Lacroix (Sep 22 2022 at 09:14):

Cool, wouldn't this amount to building the proof term then?( In which case you do have the proof)

Is there a paper?

Jason Rute (Sep 22 2022 at 10:38):

I second @Timothee Lacroix request for a paper or more general system description.

Jason Rute (Sep 22 2022 at 10:42):

When you say optimizing for validity, you mean (1) that the lemma statement type checks (as a Prop), (2) that it has a trivial proof solvable with say finish, tidy, simp, or some other simp tactic or set of tactics, (3) that it produces stuff already in mathlib, or (4) that you can synthesize a proof as well also using Deep Reinforcement Learning.

Jason Rute (Sep 22 2022 at 10:46):

Oh, looking at the README, I’m guessing (2), specifically you are trying finish on both the conjecture and it’s negation. Then you have three possibilities: solved with finish, negated with finish, and not resolvable withfinish. Then I assume you give each option a reward for RL purposes.

Jason Rute (Sep 22 2022 at 13:02):

Also by “uses one definition” at each step, I assume you mean you use “apply <ident>” tactics, right? I also assume you are using a language model if you are using lean-gym, right?

Cedric Holle (Sep 22 2022 at 18:04):

Timothee Lacroix said:

Cool, wouldn't this amount to building the proof term then?( In which case you do have the proof)

Is there a paper?

Unfortunately no, I only have a proof for the really trivial lemmas. You can find the solving tactic at the end of each line in the comment.

Jason Rute said:

Also by “uses one definition at each step”, I assume you mean you use “apply <ident>” tactics, right? I also assume you are using a language model if you are using lean-gym, right?

Yes, I proof Prop by using fapply <ident>, where <ident> is selected from a set of definitions in mathlib. The resulting proof is then the new lemma.
A language model is not required for this task. I use PPO by Stable Baselines3.

Jason Rute said:

Oh, looking at the README, I’m guessing (2), specifically you are trying finish on both the conjecture and it’s negation. Then you have three possibilities: solved with finish, negated with finish, and not resolvable withfinish. Then I assume you give each option a reward for RL purposes.

Exactly. And try_finishtries trivial, refl, dec_trivial, ...

Jason Rute said:

I second Timothee Lacroix request for a paper or more general system description.

I would like to write my master's thesis about it and explain every detail. At the moment I'm stuck with the registration. So it might take some time.

Jason Rute (Sep 22 2022 at 18:44):

@Cedric Holle As for PPO, of course you don’t need a language model for a classification task, but how do you process the input (i.e. the goal state) which is provided as text since PPO by Stable Baselines3 doesn’t seem to handle text input? Do you tokenize it and extract features (e.g. n-grams)? Also do you use the partial proof term at all? (Feel free to wait until you have a paper to share details if you want. I’m just curious.)

Cedric Holle (Sep 22 2022 at 21:08):

I use a simple dictionary of words to numbers. And when a new word appears, I make a new entry in the dictionary. The dictionary has a limited size because I don't want to change the network architecture. Of course, this can lead to some problems. But if you only prove Prop, the amount of different words remains relatively small.
I also ignore parts of the goals if they are too long or if there are too many goals.

Ziyu Zhou (Sep 24 2022 at 15:09):

Hey Cedric, is it possible to derive natural language versions of these lemmas? I m working on autoformalization so that aligned data is much more valuable to me.

Jason Rute (Sep 24 2022 at 16:15):

@Ziyu Zhou You could just use Codex as in the recent autoformalization paper.

Jason Rute (Sep 24 2022 at 16:51):

To make my remark more clear, I doubt Cedric’s method would lead well to informal statements since he uses Lean to construct the formal version at the level of every token in the definition. Of course one could systematically translate formal tokens to informal by traversing the formal term but my impression is that past attempts to do that have led to rigid, unnatural translations. However there has been recent work showing Codex works well on “unformalizing” Isabelle (and likely Lean) code.

Jason Rute (Sep 24 2022 at 16:51):

But if your goal is data for training an autoformalizer, then I guess the question is if Cedric’s data is useful for use in cycle consistency methods where one improves a model by having it translate (formalize) what it just translated (unformalized) in the other direction.

Jason Rute (Sep 24 2022 at 16:56):

I guess this leads to another question: how do we evaluate if a conjecturing model is useful? What would one even use a conjecturing model for? It would certainly be a source of synthetic data for other tasks (like automatic theorem proving). It could also be a tool inside another tool, like a lemma creation module. It even could be used directly for scientific discovery, but that would require a really good model.

Ziyu Zhou (Sep 25 2022 at 04:00):

@Jason Rute Thank you. Codex is powerful, but I would perfer some lightweight model to achieve similar or even better functionality. As for Cedirc's work and my question, actually I just intended to see his idea about it.

Ziyu Zhou (Sep 25 2022 at 04:12):

From my experience and my intuition, Codex will have trouble generalizing to more "advanced" mathematical propositions, such as perfectoid space. I'd like to try something new, perhaps somewhat rule-based(but not rigid), but nothing has come of it yet.

Cedric Holle (Sep 25 2022 at 08:49):

@Ziyu Zhou I totally agree with Jason

Last updated: May 02 2025 at 03:31 UTC