Zulip Chat Archive

Stream: Machine Learning for Theorem Proving

Topic: HOList or Lean?


Aidan Swope (Nov 10 2020 at 20:40):

Hello! I'm interested in getting started developing ML-based search strategies for ATP, but I'm having some trouble deciding between hacking on HOList and hacking on Lean. Lean is appealing because it's going to be used for the IMO challenge and I'm more familiar with it as a prover, but HOList seems easier to get started with because it has a "gym environment" with a fixed training set and a leaderboard of existing results. I'm also worried about developing for Lean 3 knowing that Lean 4 is coming.

Does anyone here have advice on whether Lean is in a good state to begin integrating with ML-based search? In particular I'm curious as to whether there's work on a similar "gym environment" for Lean, and whether proofs & methods developed for Lean 3 will be portable to Lean 4.

Kevin Lacker (Nov 10 2020 at 21:20):

I am interested in the same area. I don't think Lean 3 is in a great state to integrate with ML right now. The problem is that any sort of interfacing with an external process, while doing a moderate amount of work on the Lean side, is pretty slow. Gabriel Ebner has done the most work like this - see his talk in January. That was about integrating with ATPs rather than with a machine-learning-driven system, but the challenges are similar. He's waiting on Lean 4 before doing more work there, and I think his judgment is correct.

Kevin Lacker (Nov 10 2020 at 21:21):

To me, HOList doesn't seem promising to work with either. There's a leaderboard, but the only people who interact with it in any way are at one particular group at Google. So it isn't so much a big public thing as a specific approach by that group. I suspect they will abandon it at some point; it's not like there's a community using it for anything. Maybe it's a good fit for what you're doing, though, if you're interested in work that would map nicely onto what they are doing.

Aidan Swope (Nov 10 2020 at 21:23):

I see, thanks. After looking into it more, it does look like Lean 4 will have nicer utilities for integrating with external ATPs. Have you found other environments you see as more promising?

Kevin Lacker (Nov 10 2020 at 21:24):

I think it depends on what you want to do - do you want to publish an AI paper, publish an ATP paper, make a tool that will help mathematicians, or something else entirely?

Aidan Swope (Nov 10 2020 at 21:25):

I'm primarily from an ML background, and I'm interested in developing reinforcement-learning-based search strategies that will help big ATP efforts like the IMO challenge (the work would look much more like an ML paper than an ATP paper)

Kevin Lacker (Nov 10 2020 at 21:29):

  1. If you have some theory for how to beat the results in the "Learning to Reason in Large Theories without Imitation" paper, then hacking your idea into HOList would make sense.

Kevin Lacker (Nov 10 2020 at 21:31):

  2. Kind of "lower in the stack" is work like https://arxiv.org/abs/2005.02094 on the automated theorem provers - those aim at simpler logical forms like first-order logic instead of all this meta-inductive-type stuff. There, the state of the art generally does not use machine learning, but it seems possible for machine learning to help out. I don't know the best way to approach it - maybe find one of the open-source theorem provers and tweak it; it depends on your idea.

Kevin Lacker (Nov 10 2020 at 21:33):

  3. "Higher in the stack" would be theorem proving that lives in the interface layer. AFAICT Lean 3 is slower than Isabelle, so the coolest automated stuff happens in Isabelle there, but neither one is really suitable for modern AI techniques unless you pipe all the data over to some other process that uses the GPU, or do something else that is kind of mucking around in the internals.
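
For concreteness, the tactic-level search loop being discussed - a prover exposing a reset/step interface that a learned policy drives - has roughly the shape below. Everything here is an illustrative toy, not HOList's or Lean's actual API: the "goal" is just an integer and the "policy" a hand-written heuristic standing in for a learned one.

```python
class ToyProverEnv:
    """Toy 'prover': the goal is an int, and it is proved once reduced to 0."""
    TACTICS = ["sub1", "halve", "noop"]

    def reset(self):
        self.goal = 12
        return self.goal

    def step(self, tactic):
        # Apply the chosen "tactic" to the current goal.
        if tactic == "sub1":
            self.goal -= 1
        elif tactic == "halve":
            self.goal //= 2
        done = self.goal == 0
        reward = 1.0 if done else -0.01  # per-step cost, bonus for closing the goal
        return self.goal, reward, done

def greedy_policy(goal):
    """Stand-in for a learned policy: halve even goals, else subtract 1."""
    return "halve" if goal % 2 == 0 else "sub1"

env = ToyProverEnv()
goal, done, steps = env.reset(), False, 0
while not done and steps < 100:
    goal, reward, done = env.step(greedy_policy(goal))
    steps += 1
print("proved:", done, "in", steps, "steps")  # proved: True in 5 steps
```

The point of the reset/step shape is that the learning code never needs to understand the prover's internals - which is exactly the interface boundary the HOList "gym environment" provides and Lean 3 currently lacks.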

Aidan Swope (Nov 10 2020 at 21:35):

My work would probably be most appropriate "higher in the stack" -- at a high level I want to help turn interactive provers into automated ones with learned policies, possibly by imitating human proofs + RL, similar to the existing work on HOList. Mainly this is because proofs are shorter there.

I think I read that Lean 4 will have utilities to help with "symbolic simulation" so you don't have to translate Lean terms into your library before working with them, am I interpreting that right?

Kevin Lacker (Nov 10 2020 at 21:38):

I don't know - I don't think the features of Lean 4 are quite all figured out yet

Kevin Lacker (Nov 10 2020 at 21:41):

So I think it will just not be easy to integrate Lean 3 with an external AI process. Hopefully Lean 4 makes this a lot better, but don't hold your breath waiting for that to happen. With Isabelle it is certainly possible to integrate with external processes - sledgehammer does this - but I don't know enough about the Isabelle system to firmly recommend for or against it.

Aidan Swope (Nov 10 2020 at 21:44):

This was very very helpful, thank you!

Jason Rute (Nov 10 2020 at 23:22):

My short answer is that ML for Lean would be bleeding edge. There is no good interface or dataset yet, but I'm optimistic that there will be more infrastructure soon. Having said that, I hope you keep an interest in Lean. It has a fast-growing database, and I think ML for theorem proving does not need to be exclusive to any single theorem prover.

Jason Rute (Nov 10 2020 at 23:22):

There is a lot of interest in ML for Lean, some from large research groups like yours (I'm assuming you are at NVIDIA from your LinkedIn), and some from hobbyists or those firmly in the theorem-proving community. I think one of the best things you and others can do now is clearly state what you would want if an interface for Lean 3 existed (or if a dataset existed). There are a few of us interested in building such things, and we would be happy to discuss further.

Jason Rute (Nov 10 2020 at 23:22):

As for Lean 4, it is unclear how long it will take. I do think there will be advantages in theory, but as far as I'm aware there won't be an ML framework ready when it arrives, so I'm inclined to start building something for Lean 3.

Jason Rute (Nov 10 2020 at 23:22):

As for the Grand Challenge, I'm not convinced that ML and RL done on human-written Lean proofs will get very far in proving Olympiad problems, but I do think it will go really far in improving the ability of Lean to do trivial and simple tasks. Also, I see the appeal of having at least some dataset in the same language as the Grand Challenge.

Jason Rute (Nov 10 2020 at 23:26):

As for other ML for ITP projects, it really depends on what you want. HOList and CoqGym have some of the best datasets and RL interfaces, but unfortunately no one besides the developers uses them yet. Also look into ProverBot9001 for Coq. There are more datasets (for Isabelle, for example), but I don't think they have interactive RL environments. Also, you can be like OpenAI and build your own Metamath environment, since the logic is really simple to implement.
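
On the "simple to implement" point: the core of a Metamath-style checker really is tiny - applying an axiom is just simultaneous substitution of its variables, checked against a proof stack. A minimal sketch (toy token lists and names, not the real Metamath database format):

```python
# Toy sketch of the core Metamath step rule: substitute into an axiom's
# hypotheses and conclusion, pop matching hypotheses off the stack, push
# the substituted conclusion. All names here are illustrative.

def substitute(tokens, subst):
    """Replace each variable token by its substitution (a token list)."""
    out = []
    for t in tokens:
        out.extend(subst.get(t, [t]))
    return out

def apply_axiom(stack, hyps, conclusion, subst):
    """Check substituted hypotheses against the stack, then push the conclusion."""
    for hyp in reversed(hyps):
        expected = substitute(hyp, subst)
        actual = stack.pop()
        if actual != expected:
            raise ValueError(f"hypothesis mismatch: {actual} != {expected}")
    stack.append(substitute(conclusion, subst))
    return stack

# Modus ponens as a toy axiom: from "|- P" and "|- ( P -> Q )" infer "|- Q".
mp_hyps = [["|-", "P"], ["|-", "(", "P", "->", "Q", ")"]]
mp_concl = ["|-", "Q"]

stack = [["|-", "ph"], ["|-", "(", "ph", "->", "ps", ")"]]
stack = apply_axiom(stack, mp_hyps, mp_concl, {"P": ["ph"], "Q": ["ps"]})
print(stack)  # [['|-', 'ps']]
```

A real verifier additionally tracks frames and disjoint-variable conditions, but the whole kernel still fits in a few hundred lines, which is why Metamath is attractive for rolling your own RL environment.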

Aidan Swope (Nov 11 2020 at 00:58):

I definitely agree that naive RL on individual proof steps won't get us all the way to IMO problems. One hope I have is that ML-guided "mid-level provers" will enable the kind of "very-high-level" tactics Daniel Selsam described in his IMO challenge talk by dispatching intermediate goals. Another is that hierarchical procedural abstraction would allow learning multiple levels of guidance at once.

As for a Lean automation interface / dataset: I'd definitely be happy to discuss more. I have personal interest in Lean from a type theory and formal proving perspective, and I think big datasets with well-defined measures of progress are responsible for a lot of success in ML.

In the immediate future I'm leaning towards HOList for this project, partly for the performance reasons Kevin brought up. After discussing with the HOList authors, time taken to run the provers seems to be one of the biggest bottlenecks; their latest work uses a lot of compute already, mostly for generating experience. Fortunately, I don't think any given ML algorithm is likely to be too specialized to the prover it was developed in.

Kevin Lacker (Nov 11 2020 at 04:33):

Jason Rute said:

I think one of the best things you and others can do now is clearly state what you would want if an interface for Lean 3 existed (or if a dataset existed). There are a few of us interested in building such things, and we would be happy to discuss further.

I would like functionality similar to Isabelle's sledgehammer, where you could export a theorem statement to some external prover and then have the Lean kernel verify the resulting proof. On top of that, I would like a script to which you could give your own theorem prover, ML-based or otherwise, and it would check it against all the existing theorems in mathlib to see which ones your prover could handle. That combo would basically let you use "all the theorems in mathlib" as training data. I suspect that over time it would become the largest & best ML-for-math training dataset, given the velocity of the mathlib community.
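
The sweep harness described here can be sketched in a few lines. Everything below is a stand-in: `my_prover` fakes an external prover, the statement list fakes a mathlib export, and a real harness would have the Lean kernel re-check every returned proof instead of trusting the prover.

```python
from typing import Optional

def my_prover(statement: str) -> Optional[str]:
    """Stand-in for an external (possibly ML-based) prover: it only
    'solves' goals that are literally reflexivity."""
    return "rfl" if statement.endswith("x = x") else None

def benchmark(statements, prover):
    """Run the prover over every statement; return the ones it claims to close."""
    solved = {}
    for name, stmt in statements.items():
        proof = prover(stmt)
        if proof is not None:
            solved[name] = proof  # a real harness would kernel-check `proof` here
    return solved

# Illustrative stand-ins for statements exported from mathlib.
statements = {
    "nat.refl_example": "∀ (x : ℕ), x = x",
    "nat.add_comm": "∀ (a b : ℕ), a + b = b + a",
}
print(benchmark(statements, my_prover))  # {'nat.refl_example': 'rfl'}
```

The key design point is the one Kevin makes: as long as the kernel does the final checking, the external prover can be arbitrarily untrusted, which is what makes mathlib usable as a benchmark and training set.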


Last updated: May 09 2021 at 21:10 UTC