Zulip Chat Archive

Stream: Machine Learning for Theorem Proving

Topic: Looking for help with natural language reasoning task

Dongwei Jiang (Jul 24 2023 at 15:36):

I'm looking for someone skilled with Lean to join me in a project involving natural language reasoning tasks. The idea is first use GPT-4 to translate natural language reasoning tasks like ProofWriter to Lean code and subsequently prompt GPT-4 for theorem proving, or possibly train a model for this task (an example can be found here). Throughout the task, LeanDoJo is used to enforce restrictions on inference output. I'm hoping that we can utilize theorem-proving models and bring the rigidity of math into everyday reasoning this way.

Given the complex nature of the project, it is crucial to engage someone well-versed in Lean. I particularly need help in two areas:
1 Finding more links between theorem proving and natural language reasoning
2 Figuring out the best way to train a model that can solve these reasoning tasks

On a reciprocal note, I am keen on offering my experience in the field of deep learning. While my background mainly focuses on speech, I am currently transitioning towards large language models. By sharing our knowledge and skills, I think we can both learn more and potentially come up with new ideas.

Kaiyu Yang (Jul 24 2023 at 20:39):

Just a general thought: If you want to target datasets similar to ProofWriter, logic programming (Prolog, datalog, etc.) might be much more effective than Lean.

Dongwei Jiang (Jul 25 2023 at 00:13):

Kaiyu Yang said:

Just a general thought: If you want to target datasets similar to ProofWriter, logic programming (Prolog, datalog, etc.) might be much more effective than Lean.

I appreciate your response! Indeed, recent projects such as Logic-LM have employed Prolog to formalize datasets akin to ProofWriter and FOL grammar to formalize datasets similar to FOLIO. My inclination towards Lean, which could be subject to correction, is based on:

1 Its potent representational capability, enabling the encapsulation of most symbolic reasoning problems for a unified solution. Additionally, a vast amount of existing Lean code online has been incorporated into GPT-4's training data, which could enhance the accuracy of the formalization.
2 Lean has been frequently used in the field of Automated Theorem Proving (ATP). It has been applied to a wide range of theorem proving problems, generating a substantial amount of theorem proving/reasoning data. In addition, there are pre-existing studies and readily available pre-trained models for Lean that I intend to leverage.

Tyler Josephson ⚛️ (Jul 25 2023 at 01:28):

(deleted)

Kevin Buzzard (Jul 25 2023 at 06:14):

Lean has been frequently used in the field of automated theorem proving

Citation needed?

Dongwei Jiang (Jul 26 2023 at 14:28):

Kevin Buzzard said:

Lean has been frequently used in the field of automated theorem proving

Citation needed?

Thank you for your inquiry. Based on my readings, there have been several initiatives including Curriculum Learning, PACT, HyperTreeProof, LeanDoJo and others that have used Lean as their language for ATP.

Moreover, the generous amount of Lean data that is readily accessible and the highly practical tool LeanDoJo, which enables smooth interaction with Lean, offer considerable assistance for the research question I am exploring

Kevin Buzzard (Jul 26 2023 at 16:50):

Thanks for taking the time to respond! I am not really up to date in this area.

Jason Rute (Jul 26 2023 at 18:49):

I think what @Dongwei Jiang means is that Lean has been used (in addition to Metamath, Isabelle, and Coq) for machine learning projects that try to get AI to solve theorems automatically. Think the minif2f benchmark. I don’t think there is any practical applications yet of ATP using lean, and certainly not any of the above mentioned papers, which aren’t available to end users. I think most practical ATP is still in the realm of FOL and SAT/SMT solvers.

Jason Rute (Jul 26 2023 at 18:51):

(I should back track a bit. I think the tactics and automation built and used by the Lean community, like linarith, could be considered practical ATP, although practical for the purpose of writing Lean proofs.)

Min-Hsien Weng (Aug 04 2023 at 00:28):

I am new to Lean and I often find it hard to find what premises to use and whether they would lead to the proof. If a language model could help me find useful premises and explain their meaning to the proof goal, that would be very helpful!

Dongwei Jiang (Aug 06 2023 at 02:26):

Min-Hsien Weng said:

I am new to Lean and I often find it hard to find what premises to use and whether they would lead to the proof. If a language model could help me find useful premises and explain their meaning to the proof goal, that would be very helpful!

I'm also new to Lean and my personal recommendation is to try out the number game. My grasp of the tactics and premises has greatly improved since I started playing it. The game is expertly designed with progressively challenging levels, clear explanations, and a touch of witty humor that brought a smile to my face.

Min-Hsien Weng (Aug 06 2023 at 22:58):

Dongwei Jiang said:

I'm also new to Lean and my personal recommendation is to try out the number game.

I did enjoy the game and had a good time and learned a lot. Highly recommend it the number game

Last updated: May 02 2025 at 03:31 UTC