Zulip Chat Archive

Stream: Machine Learning for Theorem Proving

Topic: Comparison among Lean, Coq and HOL

venom Yu (Feb 20 2023 at 05:12):

Hi, I am a new member of the ML for Theorem Proving and I am surveying the related programming languages. Currently I've restricted my choices among Lean, Coq and HOL. Could anyone give me some suggestions on what language I should choose to start. I'm worried about the learning resources/difficulty, community, the difficulty to bind with python.

fzyzcjy (Feb 20 2023 at 05:25):

Also curious about the question! My two cents: If you want to play with https://github.com/facebookresearch/miniF2F, then cannot use Coq since it only supports Lean/Metamath/Isabelle/HolLight.

Zhangir Azerbayev (Feb 20 2023 at 05:48):

If you want to implement the full neural theorem-proving stack by yourself or with a small team, I'd learn metamath. The Metamath proof checker is small enough that you can implement it from scratch in Python, so you won't have to worry about interfacing with an external prover. Additionally, if you want to develop new algorithms for ml-based theorem proving, metamath is the place to build a minimal working example. However, metamath doesn't work well for autoformalization because the proofs are too low-level.

If you want to contribute to cutting-edge applied research in neural theorem proving and autoformalization, Isabelle/HOL and Lean are the way to go. There's a decently sized group of researchers at Google and Cambridge using Isabelle , and myself and Jia Deng at Princeton are interested in Lean. I don't have any personal experience trying to interface Isabelle and python, but from what I can tell it is easier than interfacing Lean and python.

If you want to build ML-based tools for interactive theorem proving (e.g chatbots, semantic search), I've found the Lean community very receptive to new technology. The VS-code tooling for Lean also makes front-end work easier than with other proof assistants. This will get even better with Lean 4.

Jia Deng's lab wrote a paper a few years ago that used Coq, but afaik there are no plans for follow up work that uses that language. The Google team used HOL Light in a 2019 paper, but afaik there are also no plans for follow up working in HOL light.

Jason Rute (Feb 20 2023 at 13:19):

Here are my own two cents on top of the excellent advice given already. Getting into this field isn't as easy as say playing with open AI's atari gym. It involves learning a lot of stuff. Here are some questions to consider, including your own:

How easy is it to learn?
Coq, Lean, Metamath, and Isabelle all have good learning resources and a good community which can answer your questions. HOL-Light isn't well used and there isn't much of a community to help I think.

Which have resources already for theorem proving, including binding with Python?
In my opinion, the most successful projects are where an expert in the language (especially the metaprogramming side of the language) partners with an expert in machine learning. Nonetheless, the fruit of those partnerships is a gym environment which lets you interact with the theorem prover and python, and if you can find a good gym, you may not needs such as partnership. There are some good tools out there. The best is probably @Albert Jiang's PISA interface to Isabelle. He is actively maintaining it, and it is what Google uses for research. But I haven't used it, so I don't know for sure. There also are good resources for Lean 3, including lean-gym. It is not maintained as much, but there are a number of people here with knowledge of Lean metaprogramming, and a number of people here who seem to be working on both Lean 3 and Lean 4. So you have a strong community. Lean 4 is going to make every part of this easier and it is going to let people built really cool applications (for those who know Lean metaprogramming). But the tooling isn't yet built. As for Coq, it presents an interesting and important challenge. Coq is spread out across projects and different versions. There is no central Coq library. But on the other hand, Coq is probably the most widely used theorem prover, especially for program verification. As @Zhangir Azerbayev said, Jia Deng and @Kaiyu Yang did some of the hard work of making a coq-gym, and others have built on top of it. It probably has the most papers which use it. However, it feels dated to me, and the reference neural prover uses tree LSTMs and a tactic AST which are far from state of the art here. But the Coq community is also large and diverse and I think there is a lot of interest in this area, so other tools may pop up. Google built a good HOL-Light interface a few years ago, but it is dated and no one uses it anymore. Finally, I totally agree with Zhangir about Metamath. It is very light weight and easy to make your own metamath clone in python. It will let you get up-to-speed really fast. And just also to mention, if you just want a toy theorem proving environment to play with ideas, there is INT by @Albert Jiang and others.

Do you have a particular method in mind?
All of the resources I've mentioned are tailored to the ideas that the author's had in mind. For example, HOList for HOL-Light and coq-gym are made for tree or graph networks so they proved s-expressions for the terms. The newer models are designed for language models and provide text. If you have a specific approach in mind, you will likely need to do some metaprogramming.

Do you want to do research or make a practical system?
Practical systems are harder to build. But they obviously provide more value to the community. I think Lean 4 widgets (worked on by @Wojciech Nawrocki) are the mostly likely option for building a really practical system that people actually use (and want to use). But I know others are also working on practical systems for Coq and Isabelle. Again, this is a good opportunity to partner with a metaprogramming expert.

venom Yu (Feb 20 2023 at 15:38):

Thanks for all the advice! They are all very valuable!

Last updated: May 02 2025 at 03:31 UTC