Zulip Chat Archive

Stream: Machine Learning for Theorem Proving

Topic: Paper: Enhancing Neural Theorem Proving through Data Augm...


Jason Rute (Dec 28 2023 at 11:43):

This paper just came up on my feed. https://arxiv.org/abs/2312.14188 Looks like another neural theorem prover for Lean. I haven't looked at the details yet.

Jason Rute (Dec 28 2023 at 11:45):

Maybe the authors @Rahul Vishwakarma and Subhankar Mishra would like to say more about it.

Jason Rute (Dec 28 2023 at 11:51):

The model seems to be called DS-Prover.

Adam Topaz (Dec 28 2023 at 14:54):

Is the code for splitting up tactic steps into multiple steps available somewhere?

Adam Topaz (Dec 28 2023 at 14:55):

BTW, simp [p1, ..., pn] and simp [p1]; ...; simp [pn] are generally not equivalent, but maybe that's not too much of an issue for the purposes of getting training data.
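
For instance, here is a hypothetical Lean 4 snippet (not from the paper) where the split fails while the combined call succeeds, because simp fails when it makes no progress:

```lean
-- Hypothetical example: the combined call succeeds, but the split version
-- fails because `simp [h₁]` makes no progress on the goal `a = 1`.
example (a b : Nat) (h₁ : b = 1) (h₂ : a = b) : a = 1 := by
  simp [h₁, h₂]   -- rewrites a ↦ b ↦ 1, closing the goal

example (a b : Nat) (h₁ : b = 1) (h₂ : a = b) : a = 1 := by
  simp [h₁]       -- error: simp made no progress
  simp [h₂]
```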

Jason Rute (Dec 28 2023 at 15:02):

@Adam Topaz are you referring to something in the paper?

Adam Topaz (Dec 28 2023 at 15:04):

Yes

Jason Rute (Dec 28 2023 at 15:04):

Oh I see, 3.2.2. Yes, that simp example seems problematic. It would be better to do this with metaprogramming to get things right. Some of the Coq papers split tactics like this with metaprogramming.

Adam Topaz (Dec 28 2023 at 15:05):

Appendix A.1 has a table of all the conversions.

Eric Rodriguez (Dec 28 2023 at 15:11):

(image attachment: image.png)
!!!

Rahul Vishwakarma (Dec 28 2023 at 15:59):

In this paper, we enhance the neural theorem prover using data augmentation and a dynamic sampling method.

Data Augmentation: The ML model learns to predict tactics from the training data. To enhance the training data, we created new tactic-goal pairs by decomposing rewrite and simplification tactics with multiple premises into tactics with single premises, e.g. changing 'rw [p1, p2]' into 'rw [p1]' and 'rw [p2]'. This gives the model a larger number of examples to learn from and helps it handle diverse goals better when generating tactics.
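
For rw the split is faithful, since rw rewrites with its premises sequentially; a minimal Lean 4 illustration (hypothetical hypotheses, not from the paper):

```lean
-- `rw [h₁, h₂]` behaves like `rw [h₁]; rw [h₂]`, so each single-premise
-- step can be paired with the intermediate goal it actually sees.
example (a b : Nat) (h₁ : a = b) (h₂ : b = 0) : a = 0 := by
  rw [h₁, h₂]

example (a b : Nat) (h₁ : a = b) (h₂ : b = 0) : a = 0 := by
  rw [h₁]  -- goal becomes `b = 0`
  rw [h₂]  -- goal becomes `0 = 0` and is closed by `rfl`
```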

Dynamic Sampling (DS-Prover) is a small improvement over the fixed sampling method (sampling the same number of tactics each time a goal is expanded): we decide the number of tactics to sample and apply to expand the current goal based on the time remaining relative to the total time allocated for proving the theorem.

In the fixed sampling method, the number of nodes at a given depth increases exponentially as the search tree deepens. This makes it hard for the method to explore proofs with a larger number of proof steps (higher tree depth).

In dynamic sampling, we start with a larger number (n) of tactics to sample and then decrease n as time passes. This counters the exponential growth of the number of nodes at increasing depths of the search tree, making the prover better at exploring proofs with a larger number of proof steps as well as shorter ones.
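
As a rough illustration of the idea (an illustrative sketch only; the exact schedule and constants in the paper may differ):

```python
import math

def num_tactics_to_sample(elapsed: float, total: float,
                          n_max: int = 64, n_min: int = 4) -> int:
    """Illustrative decreasing schedule, not necessarily the paper's formula:
    sample many tactics early in the search, fewer as the budget runs out."""
    remaining_frac = max(0.0, 1.0 - elapsed / total)
    return max(n_min, math.ceil(n_max * remaining_frac))
```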

Rahul Vishwakarma (Dec 28 2023 at 16:06):

Adam Topaz said:

BTW, simp [p1, ..., pn] and simp [p1]; ...; simp [pn] are generally not equivalent, but maybe that's not too much of an issue for the purposes of getting training data.

Hmm, we cannot replace them directly. So, we had to look at all the possibilities. We have explained this in detail in section 3.2.2 (Data Collection) of the paper.

Adam Topaz (Dec 28 2023 at 16:08):

Thanks for the clarification!

Rahul Vishwakarma (Dec 28 2023 at 16:12):

Jason Rute said:

Oh I see, 3.2.2. Yes, that simp example seems problematic. It would be better to do this with metaprogramming to get things right. Some of the Coq papers split tactics like this with metaprogramming.

Here is the description of how we collected this data:
The list of premises in the simplification tactics (simp, dsimp, simpa, simp_rw) is not ordered, i.e. the premises can be used in any order to simplify the goal. So, to collect the goals corresponding to each new simplification tactic, we used LeanDojo to apply the new tactics one by one at the goal of the step where the original simplification tactic was supposed to be applied, and recorded whichever tactic-goal pair succeeded. We kept repeating this on the subgoals generated by the first successful tactic, removing each successful tactic from the list of candidates for subsequent steps.
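
In pseudocode, the loop looks roughly like this (an illustrative sketch built on LeanDojo's Dojo.run_tac; the helper name and details are placeholders, not our exact code):

```python
from lean_dojo import Dojo, LeanError, TacticState

def collect_decomposed_pairs(dojo: Dojo, state, premises: list[str]):
    """Illustrative sketch of the described procedure: try each single-premise
    simp on the current goal, record whichever succeeds, then continue on the
    resulting state with that premise removed from the candidate list."""
    pairs = []
    remaining = list(premises)
    while remaining and isinstance(state, TacticState):
        for p in remaining:
            tactic = f"simp [{p}]"
            result = dojo.run_tac(state, tactic)
            if not isinstance(result, LeanError):  # tactic succeeded
                pairs.append((state.pp, tactic))   # record the (goal, tactic) pair
                remaining.remove(p)                # drop it from future attempts
                state = result                     # continue from the new state
                break
        else:
            break  # no remaining premise applies; stop
    return pairs
```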

Eric Rodriguez (Dec 28 2023 at 17:00):

The simpa example there ought to fail no matter what if this is done; I think the only way you could approximate simpa with a chain of tactics is turning simpa [p1, p2, ...] using <a> into have := <a>; simp [p1] at this |-; ...; simpa [pn] using this. Are there any successful simpa calls in the generated dataset?

Rahul Vishwakarma (Dec 28 2023 at 17:26):

Eric Rodriguez said:

The simpa example there ought to fail no matter what if this is done; I think the only way you could approximate simpa with a chain of tactics is turning simpa [p1, p2, ...] using <a> into have := <a>; simp [p1] at this |-; ...; simpa [pn] using this. Are there any successful simpa calls in the generated dataset?

Thanks for pointing this out.
I wasn't aware of this property of the 'simpa' tactic, so I applied the same method to all the simplification tactics.
Regarding the presence of 'simpa' in the generated data, I will check it and provide an update here.

Jason Rute (Dec 28 2023 at 17:48):

I’m happy to see papers looking more into search algorithms. I think we are currently too focused on prediction algorithms and not on how to use them most efficiently to search.

Alex Sanchez-Stern (Dec 28 2023 at 21:46):

Proverbot9001 runs into a similar issue when decomposing tactics for data augmentation, because many compound tactics have non-trivial semantics that can't be fully decomposed. But the way people use them most of the time doesn't run into those cases. So what we do to solve that is that our decomposition procedure (in Proverbot terminology, the "linearizer") runs the decomposed proofs as it produces them, both because it needs some dynamic information to do the decomposition (; in Coq is complicated), and because that means it can check the decomposed proofs at the end. Then, if decomposition fails on a particular proof, we can just return the original proof and move on. So it's a "best-effort" decomposition that works in 95% of cases, and is always sound.

So if you want to keep your ability to decompose more tactics, you might want to try that sort of "check and fallback" strategy. You can even do it at a much finer granularity than full proofs with some effort: just keep track of what each tactic produced in its original form and make sure it produces an identical state in its decomposed form. That way, if one simpa can't decompose, you can keep that one compound and still decompose the rest of the proof.
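
Roughly, that pattern looks like this (a generic sketch, not Proverbot9001's actual code; run and the state-equality check are placeholders):

```python
def decompose_with_fallback(run, state, compound_tactic, decomposed_tactics):
    """Generic check-and-fallback sketch: run the compound tactic and its
    decomposition from the same state, and keep the decomposition only if it
    reproduces the identical proof state; otherwise fall back to the original."""
    reference = run(state, compound_tactic)
    candidate = state
    for t in decomposed_tactics:
        candidate = run(candidate, t)
        if candidate is None:                # a decomposed piece failed to apply
            return [compound_tactic], reference
    if candidate == reference:               # identical resulting state: safe split
        return decomposed_tactics, candidate
    return [compound_tactic], reference      # semantics differ: keep compound form
```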

Alex Sanchez-Stern (Dec 28 2023 at 21:53):

Also, do you have any evidence that allowing more predictions at the beginning and fewer predictions at the end is better than allowing fewer predictions at the beginning and then more at the end? Both combat the exponential growth of the search tree equally, but from a human perspective it is often more "obvious" what the correct step is during the initial steps of a proof than deep in a proof.

Rahul Vishwakarma (Dec 29 2023 at 09:32):

Alex Sanchez-Stern said:

Also, do you have any evidence that allowing more predictions at the beginning and fewer predictions at the end is better than allowing fewer predictions at the beginning and then more at the end? Both combat the exponential growth of the search tree equally, but from a human perspective it is often more "obvious" what the correct step is during the initial steps of a proof than deep in a proof.

We chose a decreasing (high-to-low) tactic sampling schedule for the following reasons:

  1. The search algorithm often encounters situations where the queue containing the nodes to explore becomes empty. This occurs when the number of new nodes being added to the queue is smaller than the number of nodes being explored and removed, leading to an early stop without any proof. To prevent such cases, we start with a larger number of tactics to sample. This adds more nodes to the queue early on, maintaining an adequate number of nodes throughout the proof search.

  2. Upon analyzing the proofs of the theorems in mathlib, we found that there are many more theorems with short proofs than with long ones. As the tree depth increases, the number of paths reaching that depth grows exponentially, giving the prover more options for finding a proof; at shallow depths this is not the case. Hence, we need to sample a higher number of tactics initially so that the prover has enough options to find the shorter proofs.

  3. Another reason we thought of this idea is exploration versus exploitation. Initially (when we start the proof search) we have enough time to explore, so we even look at the less probable tactics generated by the model, whereas as time passes we try to save time (for exploring higher depths) by focusing only on the tactics that are more probable (have a higher chance of success).

Rahul Vishwakarma (Dec 29 2023 at 11:30):

Eric Rodriguez said:

Are there any successful simpa calls in the generated dataset?

Upon analyzing the data, I found that only 102 out of 7384 total decomposed 'simpa' tactics were able to reduce their current goal into subgoals, and only those 102 tactic-goal pairs are present in the augmented dataset. Since we are only adding the successful tactic-goal pairs to the augmented data and discarding the unsuccessful ones, I don't think this method will cause any issues.


Last updated: May 02 2025 at 03:31 UTC