Zulip Chat Archive

Stream: Machine Learning for Theorem Proving

Topic: Paper: Learning to Prove Theorems via Interacting with Proof


Jason Rute (May 09 2020 at 23:49):

Learning to Prove Theorems via Interacting with Proof Assistants by Kaiyu Yang and Jia Deng
The paper with CoqGym and ASTactic. (It's been around for a while, but I want to talk about it now.)

Jason Rute (May 09 2020 at 23:50):

@Brando Miranda I'm continuing the ASTactic/CoqGym conversation from here.

Jason Rute (May 09 2020 at 23:54):

@Brando Miranda asked:

Can someone shed some light on why generation of tactics via tree expansion is not a popular method? Do people know what its drawbacks are?

Ok, I looked at the CoqGym/ASTactic paper more. First, the "tree" in "tree expansion on tactic grammar" is neither of the two trees I mentioned in the other thread. As far as I can tell, their approaches to those two trees are similar to the other papers'. What they mean by "tree expansion" on the tactic grammar refers to how to build a single tactic command like rewrite IHa'.

Now let's first talk about how DeepHOL (and other similar AI agents) does this. DeepHOL takes the goal state and uses it to compute a tactic such as REWRITE. This step is called tactic selection. Then, if the tactic takes premise arguments, it uses that same goal state (and in some models also the chosen tactic) to rank the available premises and takes the best one. (Or maybe it treats that ranking as a probability distribution and randomly samples one; I forget the details.) If the tactic takes a list of premises (like SIMP_TAC), DeepHOL takes a fixed number (if I remember correctly) of the highest-ranked premises. Now, in my mind, the big difference in the ASTactic paper is that they fill in more of the tactic arguments.
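To make that two-stage pipeline concrete, here is a rough Python sketch of the kind of selection I'm describing. All of the names (tactic_scorer, premise_scorer, takes_premise_list) are invented for illustration; this is not the actual DeepHOL code.

```python
# Hypothetical sketch of DeepHOL-style tactic selection followed by
# premise ranking. None of these names come from the real codebase.

def predict_tactic_application(goal_emb, tactics, premises,
                               tactic_scorer, premise_scorer, k=3):
    # Stage 1: tactic selection -- score every tactic against the goal
    # state and take the argmax.
    tactic = max(tactics, key=lambda t: tactic_scorer(goal_emb, t))

    # Stage 2: premise ranking -- score each available premise against
    # the same goal (and, in some models, the chosen tactic too).
    ranked = sorted(premises,
                    key=lambda p: premise_scorer(goal_emb, tactic, p),
                    reverse=True)

    # For list-valued tactics (like SIMP_TAC), keep a fixed number of
    # the highest-ranked premises; otherwise just the single best one.
    args = ranked[:k] if tactic.takes_premise_list else ranked[:1]
    return tactic, args
```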

Jason Rute (May 09 2020 at 23:54):

Whereas DeepHOL only supplies a tactic and premises, ASTactic might supply more arguments. Some of this may simply be required by certain Coq tactics, but in the long run it is important to be able to fill in more tactic arguments so that the AI is not constrained in the types of proofs it can find. (Currently, DeepHOL and most of the other machine learning proof AIs can't generate the witness terms for existential proofs.) The CoqGym/ASTactic authors call this "tree expansion on the tactic grammar", but as a concept it seems similar to the other papers, just with more tactic arguments filled in (and perhaps more attention to the distinctions between argument types, e.g. not all lemmas can be used for rewriting).

Jason Rute (May 09 2020 at 23:54):

Now, I'm really confused about how extensive their AI actually is at filling in tactic arguments. They give a complicated context-free grammar in the appendix, but the paper only lists three types of arguments (see the sketch after this list):

  • identifiers of premises. Note that, like DeepHOL, they only use premises which have already been named with identifiers. (Note that in Coq or Lean, it is essential to take local premise arguments in addition to global ones. Some tactics also only take local premises.)
  • numbers. I don't know enough about Coq to know how numbers are used in tactics, but they have a model for choosing the numbers 1 through 4.
  • variable names. For example, the induction tactic needs a variable name. They randomly pick a universally quantified variable in the goal. (I assume other Coq AIs do something similar. Of course this could also be considered a form of premise selection with the premise n : nat.)
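Here is the toy Python sketch promised above of how those three argument kinds might be filled in. The scorer and the goal attributes (premises, forall_vars) are hypothetical; only the 1-through-4 number range and the random variable choice come from the paper.

```python
import random

# Toy illustration of ASTactic's three argument kinds. The scorer and
# the goal attributes are invented; the 1..4 range and the random
# variable choice are from the paper as I read it.

def fill_argument(kind, goal, premise_scorer):
    if kind == "premise":
        # Identifiers of already-named premises, ranked by a model.
        return max(goal.premises, key=lambda p: premise_scorer(goal, p))
    if kind == "number":
        # The paper's model only chooses among the integers 1 through 4
        # (random.choice here is a stand-in for the learned model).
        return random.choice(range(1, 5))
    if kind == "variable":
        # e.g. for induction: pick a universally quantified variable
        # of the goal uniformly at random.
        return random.choice(goal.forall_vars)
```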

Jason Rute (May 09 2020 at 23:55):

That is it! They also show an example of choosing the location of a rewrite. Like in Lean, Coq can apply a rewrite to the local context. Maybe they consider this to be part of the tactic tree generation process: after choosing the rewrite tactic, they have to choose among the four rewrite location types, and then, if one chooses to rewrite the local context, one has to choose which local hypothesis to rewrite.
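For concreteness, here is a hedged sketch of that two-step location choice, with the tactic text assembled as Python strings. The four location forms below are just the standard Coq occurrence-clause variants, which may not match the paper's exact carving.

```python
# Hedged sketch of the rewrite-location choice: first pick one of four
# location kinds, then (if needed) pick which hypothesis to rewrite.
# These are standard Coq occurrence clauses; the paper may partition
# the cases differently.

def build_rewrite(lemma, location_kind, hyp=None):
    if location_kind == "goal":          # rewrite the goal only
        return f"rewrite {lemma}."
    if location_kind == "hyp":           # rewrite one local hypothesis
        return f"rewrite {lemma} in {hyp}."
    if location_kind == "everywhere":    # all hypotheses and the goal
        return f"rewrite {lemma} in *."
    if location_kind == "hyp_and_goal":  # one hypothesis plus the goal
        return f"rewrite {lemma} in {hyp} |- *."

print(build_rewrite("IHa'", "hyp", hyp="H0"))  # rewrite IHa' in H0.
```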

Jason Rute (May 09 2020 at 23:55):

My (slightly uninformed) conclusion is this. Yes, it is important to fill in more of the tactic arguments and to use the tactics to their fullest. In HOL Light, I don't think the tactics were as complicated, so DeepHOL isn't missing as much by not using all the tactic possibilities. (However, it may still be good to filter down, say, the rewrite premises to those of the correct form. Also, they probably should eventually fill in variable names.) As for other Coq agents, I'd have to look more at how they do things. Those agents are generally more powerful, so I don't think they are missing out too much on Coq's tactic features if they choose not to use a tactic option. Maybe they apply heuristics for some things like variable names, or maybe they just treat the problem as premise selection as well. (In Coq and Lean, n : nat and h : n = n are both a form of "premise". It may be easy for the AI to realize that induction needs a local premise of the first type and rewrite needs a local or global premise of the second type.)

Also, I don't know that it is even a good idea to use the grammar for human tactics. If it is important to rewrite the goal and to rewrite the local hypotheses, one could just break that into two tactics. Honestly, the "context-free grammar" is really flat and is mostly just a list of tactics with very few options per tactic. (For example, they don't use any of the tactic combinators like repeat, try, <|>, ;, etc.)

Last, none of the current AIs can yet handle the trickiest arguments, like existential proofs. (ASTactic limits existential arguments to local hypotheses.) Once we start supplying term arguments, we will definitely need something like this. So if that is where your thoughts are headed, I agree we will need ways to build arbitrary term trees.
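To illustrate the "break it into two tactics" idea: if each location variant becomes its own action, the agent only ever does tactic selection plus argument selection, with no grammar expansion. A purely hypothetical sketch:

```python
# Hypothetical flat action space: each rewrite location variant is its
# own "tactic", so no grammar expansion is needed -- just enumerate and
# score the concrete actions.

def flat_actions(premises, hypotheses, variables):
    """Yield concrete tactic strings for a flat action space."""
    for p in premises:
        yield f"rewrite {p}."            # "rewrite goal" as one tactic
        for h in hypotheses:
            yield f"rewrite {p} in {h}." # "rewrite hypothesis" as another
    for v in variables:
        yield f"induction {v}."          # variable choice is just another argument
```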

Jason Rute (May 09 2020 at 23:55):

One more comment about the ASTactic paper. They use a very complicated model for predicting the tactic and arguments; in particular, they use the premises, the goal state, and the previous hidden states of an RNN during the tree-building process. As usual with more complicated models, especially RNNs, I think they can be more difficult to train, and they didn't do an ablation to show that the GRU RNN added anything. Indeed, this is not the best-performing Coq AI (maybe the worst-performing!), and I am even skeptical about the way they handle premise selection, since they only choose 10 premises from the global context. Are those the 10 used in the human proof or 10 random ones? Anyway, I think these ideas hold a lot of promise, but currently they are not adding value for the added complexity.
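For what it's worth, here is a minimal PyTorch-style sketch of the kind of GRU decoder I mean: each AST expansion step is conditioned on the goal embedding and the previous hidden state. All dimensions and wiring are invented; this only conveys the shape of such a model and does not reproduce ASTactic.

```python
import torch
import torch.nn as nn

# Minimal sketch of a GRU-based tactic-AST decoder: at each expansion
# step, the cell sees the goal embedding plus a one-hot encoding of the
# previous grammar production, and scores the next production.

class TacticDecoder(nn.Module):
    def __init__(self, n_productions, goal_dim=256, hidden_dim=256):
        super().__init__()
        self.gru = nn.GRUCell(goal_dim + n_productions, hidden_dim)
        self.out = nn.Linear(hidden_dim, n_productions)

    def forward(self, goal_emb, prev_production, hidden):
        x = torch.cat([goal_emb, prev_production], dim=-1)
        hidden = self.gru(x, hidden)     # carry state across expansions
        return self.out(hidden), hidden  # logits over grammar productions
```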


Last updated: Dec 20 2023 at 11:08 UTC