# Zulip Chat Archive

## Stream: Machine Learning for Theorem Proving

### Topic: Seeking Insights on Theorem Proving's Role in Enhancing LLM

#### Dongwei Jiang (May 12 2023 at 17:15):

Hello everyone! I'm fairly new here and haven't had the opportunity to use Lean yet, but I've recently developed a fascination with the application of large language models (LLMs) for reasoning tasks. I've been delving into several papers in the theorem proving domain, including GPT-F, Curriculum Learning, Hyper Tree Search, Autoformalization, Thor, DSP, and MiniF2F, to name a few.

I'm aware that some of the authors of these papers frequent this chat, so please bear with me as I grapple with some beginner questions. I would be extremely grateful if anyone could shed some light on the significance of theorem proving from their perspective. Specifically, I'm struggling with two questions:

1 How does theorem proving stack up against program synthesis, especially considering that the latter also necessitates reasoning skills and seems to have a broader range of applications?

2 Given that current LLMs are still grappling with understanding and solving elementary school-level problems, is theorem proving a necessary step if my aim is purely to enhance the reasoning abilities of LLMs? The learning curve to understand Lean/Isabelle feels quite steep, and I'm wondering if it's an essential investment.

#### Andrés Goens (May 12 2023 at 17:34):

there's a stream #Machine Learning for Theorem Proving , maybe your post will be more visible to that audience there?

#### Tyler Josephson ⚛️ (May 12 2023 at 21:55):

As a beginner myself in this area, here’s my perspective:

The papers you mentioned often use some configuration of LLM to suggest proofs to Lean, which then verifies their correctness.

In this way, LLMs are still operating as pattern recognition machines, and they learn to suggest good tactics because the corpus of proofs on which they were trained has patterns, and reproducing those patterns sometimes leads to valid proofs, and sometimes not. By having Lean checking the answers, the correctness of the guesses can be verified.

But you might also be interested in how LLMs answer in chain of thought prompting or when told to “think step by step.” This kind of reasoning is more emergent and flexible than Lean, but it isn’t as trustworthy, and cannot be checked automatically. Folks are still studying how and why this happens. These emergent reasoning abilities more often appear in the LLMs trained on executable code than those trained mainly on text.

To your questions:

I’m trying to learn more about program synthesis myself. At least one big difference is that if you do program synthesis in, for example, Python, the resulting code cannot be formally verified. Programs written in Lean can be proven correct.

Good question! Since LLMs don’t yet reason robustly, I don’t think anyone really knows “what should we do to get LLMs to reason” - it’s an active area of research. Theorem proving may play a role, or may not. One thing I am sure of - learning Lean is harder than learning the basics of ML. Learn Lean and you get a niche skill that will make you stand apart (to those who are interested in hiring people who know Lean). But I’d suggest you particularly ask yourself if you enjoy it before investing into it

PS: you might also like the papers and talks about “Neural Algorithmic Reasoning” by Petar Veličković. There’s a lot of interesting work on reasoning + NNs outside of Lean, and outside of LLMs, even.

#### Dongwei Jiang (May 13 2023 at 03:53):

Andrés Goens said:

there's a stream #Machine Learning for Theorem Proving , maybe your post will be more visible to that audience there?

Thanks for pointing this out! I'll move the discussion to the new stream

#### Dongwei Jiang (May 13 2023 at 04:12):

(deleted)

#### Dongwei Jiang (May 13 2023 at 04:35):

Tyler Josephson ⚛️ said:

As a beginner myself in this area, here’s my perspective:

The papers you mentioned often use some configuration of LLM to suggest proofs to Lean, which then verifies their correctness.

In this way, LLMs are still operating as pattern recognition machines, and they learn to suggest good tactics because the corpus of proofs on which they were trained has patterns, and reproducing those patterns sometimes leads to valid proofs, and sometimes not. By having Lean checking the answers, the correctness of the guesses can be verified.But you might also be interested in how LLMs answer in chain of thought prompting or when told to “think step by step.” This kind of reasoning is more emergent and flexible than Lean, but it isn’t as trustworthy, and cannot be checked automatically. Folks are still studying how and why this happens. These emergent reasoning abilities more often appear in the LLMs trained on executable code than those trained mainly on text.

To your questions:

I’m trying to learn more about program synthesis myself. At least one big difference is that if you do program synthesis in, for example, Python, the resulting code cannot be formally verified. Programs written in Lean can be proven correct.

Good question! Since LLMs don’t yet reason robustly, I don’t think anyone really knows “what should we do to get LLMs to reason” - it’s an active area of research. Theorem proving may play a role, or may not. One thing I am sure of - learning Lean is harder than learning the basics of ML. Learn Lean and you get a niche skill that will make you stand apart (to those who are interested in hiring people who know Lean). But I’d suggest you particularly ask yourself if you enjoy it before investing into itPS: you might also like the papers and talks about “Neural Algorithmic Reasoning” by Petar Veličković. There’s a lot of interesting work on reasoning + NNs outside of Lean, and outside of LLMs, even.

Thank you so much for your detailed and informative response! It's quite enlightening to hear from someone also navigating through this complex field.

1 I must admit, I hadn't fully considered the extent of reasoning involved in this area, especially beyond the realms of neural networks. I'll be sure to read the paper you mentioned.

2 The point you made about the verifiability of theorem proving as compared to program synthesis was particularly eye-opening.

3 I will certainly be trying out Lean :)

#### Notification Bot (May 13 2023 at 04:35):

Dongwei Jiang has marked this topic as resolved.

#### Notification Bot (May 13 2023 at 10:57):

Eric Wieser has marked this topic as unresolved.

#### Notification Bot (May 13 2023 at 10:57):

This topic was moved here from #new members > ✔ Seeking Insights on Theorem Proving's Role in Enhancing... by Eric Wieser.

#### Dongwei Jiang (May 20 2023 at 16:23):

I've been pondering over our past discussion regarding the role of theorem proving in Language Learning Models (LLMs) research. After brainstorming with some friends, I've refined my thoughts. I believe theorem proving can help LLM on three fronts: understanding, reasoning, and planning.

1 The understanding challenge refers to the capability of LLMs to translate natural language problems into mathematical equations. Take this example: "Rachel has 17 apples. She gives 9 to Sarah. How many apples does Rachel have now?" which should translate to "17 - 9 = ?". In a similar but more complex vein, Autoformulation with LLMs is working on this kind of translation.

2 The reasoning challenge involves the LLM's ability to function within a rule-based system like solving equations or proving theorems. It's important to note that equation solving at a grade-school level isn't as interesting because the rules there are pretty straightforward. Theorem proving, on the other hand, is an interesting case as it requires strict adherence to logic and definitions.

3 The planning challenge, as discussed by Bubeck in his GPT-4 talk, concerns the LLM's ability to backtrack and execute efficient search strategies. The idea is to have a backup plan or to plan so effectively that it rarely fails, which also applies to theorem proving.

I should note that I'm not 100% confident in the accuracy of what I've said. My use of definitions or terms might be off, as I'm still learning about LLMs and Lean. I'm open to feedback and more than willing to learn.

Last updated: Dec 20 2023 at 11:08 UTC