Zulip Chat Archive

Stream: Machine Learning for Theorem Proving

Topic: How to run a simple ATP baseline for Lean


Jason Rute (Nov 06 2023 at 20:26):

One idea I've long had is to run a simple ATP baseline (or two) for Lean 3 or Lean 4 to compare to the existing SoTA methods (which almost all rely on medium or large transformer models). I'm wondering what tools are out there to make this baseline quick to implement. Say I create a tactic tac_predictor : Tactic (List (Nat × String)) which just uses the current tactic state to predict the next tactic as a list of weighted strings. It wouldn't use the pretty-printed version of the state, but instead the types found in the environment. I'd ideally like a way to plug tac_predictor into an existing open-source evaluation framework. Does something like that exist in, say, LeanDojo (@Kaiyu Yang, @Alex Gu) or llmstep (@Sean Welleck) or even mathlib (@Scott Morrison)? I'd ideally like the following:

  1. Some code I could adapt or reuse to extract the data I need. Again, this isn't a text representation, so I couldn't use an existing textual dataset, but it would still be tactic based, so I would need access to the Lean proof state and the text of the tactic that was applied to it.
  2. Some code to incorporate my tac_predictor into a tree search. Ideally the tree search would be fast, since my tactic predictor would be really fast.
  3. Some code to do an apples-to-apples comparison with an existing system and benchmark (either MiniF2F or a held out portion of Mathlib).

If I had to implement this on my own, I'd likely just reuse our code from PACT, but I was wondering if there was something more modern and standard. Or if I did it from scratch in Lean 4, I guess it would be a good project to learn Lean 4 meta-programming. :smile: (Also, this is more hypothetical than a concrete plan. I think it will depend on how easy it is to execute.)
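For concreteness, a Lean 4 sketch of what such a predictor's interface might look like (the names, candidate tactics, and scores are illustrative, not an existing API):

```lean
import Lean
open Lean Elab Tactic

-- Hypothetical predictor: look at the current goal and return candidate
-- next tactics as weighted strings. A real version would featurize the
-- goal (e.g. head symbols of hypothesis types) and score candidates with
-- a classical ML model instead of returning constants.
def tacPredictor : TacticM (List (Nat × String)) := do
  let _goal ← getMainGoal
  return [(10, "simp"), (3, "rfl")]
```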

Jason Rute (Nov 06 2023 at 20:34):

Just to be clear, by ATP baseline I don’t mean a first order ATP. This isn’t a hammer. Think more hand crafted features put through a pre-deep-learning ML algorithm.

Kaiyu Yang (Nov 06 2023 at 20:36):

Some code to incorporate my tac_predictor into a tree search. Ideally the tree search would be fast, since my tactic predictor would be really fast.

@Jason Rute It looks like our recent work on extending aesop to take a "tactic generator" is exactly what you need? (see https://github.com/leanprover-community/aesop/pull/70 and https://github.com/leanprover-community/aesop/blob/master/AesopTest/TacGen.lean). It is part of our ongoing improvement to LeanInfer that combines LLM-generated tactics with aesop. Here is an example of how to use it within LeanInfer: https://github.com/lean-dojo/LeanInfer/blob/main/LeanInferTests/Aesop.lean
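Based on the type mentioned in that PR (`Aesop.TacGen := MVarId → MetaM (Array (String × Float))`), a trivial hand-written generator might look like the following sketch; the tactic strings and scores are arbitrary, and the exact registration syntax should be checked against AesopTest/TacGen.lean:

```lean
import Aesop

-- A minimal constant tactic generator: ignores the goal and proposes two
-- fixed tactic strings with hand-picked scores. A real generator would
-- inspect the `MVarId` and score candidates with a model.
def constTacGen : Aesop.TacGen := fun (_ : Lean.MVarId) =>
  return #[("rfl", 0.9), ("simp", 0.5)]
```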

Kaiyu Yang (Nov 06 2023 at 20:41):

Some code to do an apples-to-apples comparison with an existing system and benchmark (either MiniF2F or a held out portion of Mathlib).

With this infrastructure, if you want to compare method A with method B, you can create two tactic generators and add them to two aesop rule sets, respectively. Then it's easy to just call aesop with different rule sets.

Jason Rute (Nov 06 2023 at 21:05):

Interesting. I’ll have a look. I just assumed Aesop was like tidy and didn’t backtrack to a previous state. Does this search/backtrack or is it more of a greedy search like tidy?

Kaiyu Yang (Nov 06 2023 at 21:08):

Aesop uses best-first search, so it should backtrack.

Jason Rute (Nov 06 2023 at 21:10):

Ok, I clearly don’t know how Aesop works. What does vanilla Aesop use for a heuristic in its best-first search? (Or maybe I should just read the Aesop paper.)

Jason Rute (Nov 06 2023 at 21:15):

Also, would your enhanced Aesop be able to, say, replicate your ReProver results (maybe just the baseline without retrieval)?

Kaiyu Yang (Nov 06 2023 at 21:18):

Aesop automatically applies some built-in tactics, e.g., simp_all, and it also provides a set of "rule builders" that allow users to configure the set of tactics used to expand a node during best-first search. Our extension basically allows users to provide not only fixed tactics but also tactic generators of type MVarId → MetaM (Array (String × Float)). The tactic generator may or may not be based on LLMs.

Also, would your enhanced Aesop be able to, say, replicate your ReProver results (maybe just the baseline without retrieval)?

We tried on MiniF2F's validation set, and their performance was similar. However, they are not exactly the same, due to details of aesop (e.g., it automatically normalizes proof states) and differences in resource limits (Lean 4 does not have Lean 3's try_for_time, so we can only use heartbeats as a proxy for timeouts).

Jason Rute (Nov 06 2023 at 21:54):

As for my (1), do you have some tool which could take a function of the form MVarId → MetaM (Array Float) and use it to get a vector embedding for each proof state (along with the text for the tactic)? Then I could extract this embedding-tactic pair for every goal in Mathlib.

Jason Rute (Nov 06 2023 at 21:56):

Or even MVarId → MetaM String would be fine. I could just serialize the data I want to extract.
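A minimal sketch of the string-valued variant, assuming only the standard Lean 4 metaprogramming API (`Lean.Meta.ppGoal` pretty-prints an `MVarId`'s local context and target; any featurization or custom serialization would be built on top of this):

```lean
import Lean
open Lean Meta

-- Sketch: render a goal's proof state as a string, suitable for
-- serializing (state, tactic) pairs during data extraction.
def goalToString (g : MVarId) : MetaM String := do
  let fmt ← ppGoal g
  return fmt.pretty
```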

Adam Topaz (Nov 06 2023 at 21:57):

Jason Rute said:

As for my (1), do you have some tool which could take a function of the form MVarId → MetaM (Array Float) and use it to get a vector embedding for each proof state (along with the text for the tactic)? Then I could extract this embedding-tactic pair for every goal in Mathlib.

I think the code from lean-training-data shouldn't be too hard to modify to obtain this.

Siddhartha Gadgil (Nov 07 2023 at 03:46):

Kaiyu Yang said:

Some code to incorporate my tac_predictor into a tree search. Ideally the tree search would be fast, since my tactic predictor would be really fast.

Jason Rute It looks like our recent work on extending aesop to take a "tactic generator" is exactly what you need? (see https://github.com/leanprover-community/aesop/pull/70 and https://github.com/leanprover-community/aesop/blob/master/AesopTest/TacGen.lean). It is part of our ongoing improvement to LeanInfer that combines LLM-generated tactics with aesop. Here is an example of how to use it within LeanInfer: https://github.com/lean-dojo/LeanInfer/blob/main/LeanInferTests/Aesop.lean

This is splendid. I had made a hacky equivalent in LeanAide, but this is much cleaner. It is also very nice that common tasks are getting upstreamed to avoid duplication.

Min-Hsien Weng (Nov 08 2023 at 21:25):

Kaiyu Yang said:

Aesop automatically applies some built-in tactics, e.g., simp_all, and it also provides a set of "rule builders" that allow users to configure the set of tactics for expanding a node during best-first search. Our extension basically allows users to provide not only fixed tactics but also "tactic generators MVarId → MetaM (Array (String × Float))". The tactic generator may or may not be based on LLMs.

I came across the TensorFlow GNN (Graph Neural Networks) library (https://github.com/tensorflow/gnn). It helps with working on graph data structures in neural networks, such as using GNNs to model abstract syntax trees (ASTs) in compilers.

It seems that the output of tactic generators could be graph-structured data combined with rankings. GNNs may make it easier to frame "tactic generation/Aesop" as a neural-network problem.


Last updated: Dec 20 2023 at 11:08 UTC