Zulip Chat Archive

Stream: Machine Learning for Theorem Proving

Topic: ICLR 2025 Submissions


Jason Rute (Oct 05 2024 at 02:25):

There are so many formal theorem proving submissions to this year's ICLR. (Submissions to ICLR are public.) If you search for keywords like "theorem proving", "autoformalization", "lean", "isabelle" (and maybe others) you will see a lot. I think I count 12 papers for just the keyword "Lean" (that seem to be about the Lean theorem prover) and 1 for "Isabelle". (None for "Coq". :sad: ) Most are new to me. I'll try to make a list of all the theorem proving papers soon.

fzyzcjy (Oct 05 2024 at 02:31):

Wondering where we can find the PDFs? It seems there are no buttons for that... Googling the titles for an arXiv preprint also fails (for many papers)

Jason Rute (Oct 05 2024 at 10:35):

fzyzcjy said:

Wondering where we can find the PDFs? It seems there are no buttons for that... Googling the titles for an arXiv preprint also fails (for many papers)

Oh, I might have jumped the gun, but I would guess that the PDFs will be available on OpenReview in about a week or so.

fzyzcjy (Oct 05 2024 at 10:36):

Thanks!

Sid (Oct 07 2024 at 03:50):

A subset of the relevant papers that I compiled is below -

Synthetic Theorem Generation in Lean - https://openreview.net/forum?id=EeDSMy5Ruj - tl;dr - forward reasoning to generate theorems

Formal Theorem Proving by Rewarding LLMs to Decompose Proofs Hierarchically - https://openreview.net/forum?id=D23JcXiUwf - tl;dr - RL-based training algorithm that encourages the model to decompose a theorem into lemmas, prove the lemmas, and then prove the theorem by using the lemmas

ImProver: Agent-Based Automated Proof Optimization - https://openreview.net/forum?id=dWsdJAXjQD - tl;dr - proof rewriting to make proofs shorter, more modular, more readable

Alchemy: Amplifying Theorem-Proving Capability Through Symbolic Mutation - https://openreview.net/forum?id=7NL74jUiMg - tl;dr - sounds like forward reasoning on existing theorems to generate new ones? Not clear to me tho

CARTS: Advancing Neural Theorem Proving with Diversified Tactic Calibration and Bias-Resistant Tree Search - https://openreview.net/forum?id=VQwI055flA

FormalAlign: Automated Alignment Evaluation for Autoformalization - https://openreview.net/forum?id=B5RrIFMqbe

3D-Prover: Diversity Driven Theorem Proving With Determinantal Point Processes - https://openreview.net/forum?id=7gGVDrqVaz - tl;dr - DPP to subselect for a diverse set of tactics to reduce branching factor

Collaborative Theorem Proving with Large Language Models: Enhancing Formal Proofs with ProofRefiner - https://openreview.net/forum?id=y9xNQZjUJM

LeanAgent: Lifelong Learning for Formal Theorem Proving - https://openreview.net/forum?id=Uo4EHT4ZZ8

SubgoalXL: Subgoal-Based Expert Learning for Theorem Proving - https://arxiv.org/pdf/2408.11172

Herald: A Natural Language Annotated Lean 4 Dataset - https://openreview.net/forum?id=Se6MgCtRhz

Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search - https://openreview.net/forum?id=I4YAIwrsXa - This is the DeepSeek Prover 1.5 paper I believe

Process-Driven Autoformalization in Lean 4 - https://openreview.net/forum?id=k8KsI84Ds7 - tl;dr Lean compilation feedback to enhance autoformalization

Lean-STaR: Learning to Interleave Thinking and Proving - https://arxiv.org/abs/2407.10040 - tl;dr interleave natural language and Lean

Lean-ing on Quality: How High-Quality Data Beats Diverse Multilingual Data in AutoFormalization - https://openreview.net/forum?id=Qdp7hlenr6

Luigi Massacci (Oct 07 2024 at 12:10):

Hopefully without abusing your kindness, since you already went through all this trouble: do you know if any of the above are actually usable (without too much trouble) for someone actually working with Lean?

Adam Topaz (Oct 07 2024 at 12:19):

I don't really know how openreview works. Is it possible to see the papers at this point?

Jason Rute (Oct 07 2024 at 12:23):

@Adam Topaz see below. My guess is they will be available in a few days.

Jason Rute said:

Oh, I might have jumped the gun, but I would guess that the PDFs will be available on OpenReview in about a week or so.



Adam Topaz (Oct 07 2024 at 12:23):

Thanks!

Jason Rute (Oct 07 2024 at 12:53):

Luigi Massacci said:

Hopefully without abusing your kindness, since you already went through all this trouble: do you know if any of the above are actually usable (without too much trouble) for someone actually working with Lean?

It is difficult to tell since they are just abstracts at the moment (and when the PDFs become visible in about a week, they will still be anonymous, without, say, GitHub links). Moreover, usability hasn't been a concern of most researchers. There are no incentives for it. Frankly, even the current "usable" models aren't very good, and few (if any) people use them. But, having said that, I have three caveats:

  1. Some trained LLMs like DeepSeek-Prover v1.5 are publicly available on Hugging Face, so it is possible to code something up to use one locally from within Lean. (But it may not be useful, since it was optimized for competition math problems.)
  2. Some papers use existing LLM APIs. Even if not public, they could be reimplemented with minimal work. (The ImProver paper may be an interesting one, but again I can't tell from the abstract.)
  3. The many autoformalization papers this year may open up a lot of possibilities for useful applications. It may be that someone on one of those papers made something useful, or if not, that it would be worth reimplementing their ideas.
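On caveat 1, a minimal sketch of what "coding something up" might look like with the `transformers` library. This is a hypothetical illustration, not from any of the papers: the model id `deepseek-ai/DeepSeek-Prover-V1.5-RL` and the whole-proof prompt format are assumptions, so check the model card before relying on either.

```python
# Hypothetical sketch: query a locally downloaded prover LLM for a Lean 4 proof.
# The model id and prompt format below are assumptions, not taken from the paper.

def format_prover_prompt(theorem_statement: str) -> str:
    """Build a whole-proof-generation prompt from a Lean 4 theorem statement."""
    return (
        "Complete the following Lean 4 proof:\n\n"
        f"```lean4\n{theorem_statement}\n```\n"
    )

def generate_proof(theorem_statement: str,
                   model_id: str = "deepseek-ai/DeepSeek-Prover-V1.5-RL") -> str:
    """Generate a candidate proof. Requires `pip install transformers torch`
    and downloads several GB of weights on first use."""
    from transformers import AutoModelForCausalLM, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    inputs = tokenizer(format_prover_prompt(theorem_statement),
                       return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=512)
    # Decode only the newly generated tokens (strip the prompt).
    return tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)

if __name__ == "__main__":
    print(format_prover_prompt(
        "theorem add_comm' (a b : Nat) : a + b = b + a := by sorry"))
```

Any candidate proof would still need to be checked by Lean itself; the model output is just a suggestion.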

Sid (Oct 07 2024 at 19:04):

Luigi Massacci said:

Hopefully without abusing your kindness, since you already went through all this trouble: do you know if any of the above are actually usable (without too much trouble) for someone actually working with Lean?

To add to Jason Rute's excellent answer, InternLM2-Step-Prover is another model I've found fairly usable. But ya, until the papers' text is released it's hard to judge. Some of them are already on arXiv; of those, the Lean-STaR paper is one the DeepSeek team took some ideas from, so it might be a good one to check out. The DeepSeek Prover 1 and 1.5 papers are very good reads as well. Lots of good ideas.

Luigi Massacci (Oct 07 2024 at 19:48):

Jason Rute said:

It is difficult to tell since they are just abstracts at the moment (and when the PDFs become visible in about a week, they will still be anonymous, without, say, GitHub links).

My apologies, I typed my message from my phone, so I didn't look carefully at the website and assumed it was some kind of preprint setup from the name.

Moreover, usability hasn't been a concern of most researchers. There are no incentives for it.

I’m sadly aware of that (you explained really well why that is the case in the past somewhere around here) but hope is the last to die…

As for the rest, anything beginning with "reimplement" doesn't qualify as "usable" : (

Incidentally, @Kim Morrison, I seem to remember the FRO was working on ML integration, or did I hallucinate that? Arguably, providing some kind of more plug-and-play pipeline is exactly the kind of important but not incentivised by the system™ development where the FRO excels.

Zheng Yuan (Oct 15 2024 at 00:55):

Paper PDFs are visible now

RexWang (Oct 18 2024 at 02:18):

After searching for terms like "Lean", "Isabelle", "miniF2F", "ProofNet", and "autoformal", I found 20 papers related to formal proofs. All of these papers use Lean, except for 4-StepProof, 6-SubgoalXL, and 9-ProD-RL, which use Isabelle. (None for Coq.)

Here's the list:

Index Abbreviation Source
1 LeanAgent LeanAgent: Lifelong Learning for Formal Theorem Proving
2 ImProver ImProver: Agent-Based Automated Proof Optimization
3 Herald Herald: A Natural Language Annotated Lean 4 Dataset
4 StepProof StepProof: Step-by-step verification of natural language mathematical proofs
5 Lean-STaR Lean-STaR: Learning to Interleave Thinking and Proving
6 SubgoalXL SubgoalXL: Subgoal-based Expert Learning for Theorem Proving
7 FormL4 Process-Driven Autoformalization in Lean 4
8 miniCTX miniCTX: Neural Theorem Proving with (Long-)Contexts
9 ProD-RL Formal Theorem Proving by Rewarding LLMs to Decompose Proofs Hierarchically
10 ProofRefiner Collaborative Theorem Proving with Large Language Models: Enhancing Formal Proofs with ProofRefiner
11 Lean-ing Lean-ing on Quality: How High-Quality Data Beats Diverse Multilingual Data in AutoFormalization
12 Con-NF Rethinking and improving autoformalization: towards a faithful metric and a Dependency Retrieval-based approach
13 FormalAlign FormalAlign: Automated Alignment Evaluation for Autoformalization
14 CARTS CARTS: Advancing Neural Theorem Proving with Diversified Tactic Calibration and Bias-Resistant Tree Search
15 3D-Prover 3D-Prover: Diversity Driven Theorem Proving With Determinantal Point Processes
16 SynLean Synthetic Theorem Generation in Lean
17 RMaxTS Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search
18 Alchemy Alchemy: Amplifying Theorem-Proving Capability Through Symbolic Mutation
19 ZIP-FIT ZIP-FIT: Embedding-Free Data Selection via Compression-Based Alignment
20 LIPS Proving Olympiad Inequalities by Synergizing LLMs and Symbolic Reasoning

And some other works:

  1. FOL Reasoning:

  2. SAT:

  3. Automatic Geometry Problem Solving (GPS):

Kaiyu Yang (Oct 19 2024 at 17:51):

@RexWang That's really useful. Thanks! FYI, ProofRefiner is plagiarizing LeanCopilot and has been desk rejected.

Kevin Buzzard (Oct 19 2024 at 18:28):

Indeed the abstracts of those two papers are surprisingly similar.

fzyzcjy (Oct 21 2024 at 06:03):

A longer list of papers related to AI for mathematics in the ICLR 2025 submissions: https://github.com/fzyzcjy/ai_math_paper_list. In total there are ~150 papers, since many of them are not about formal methods. I mainly made it for my own needs, and am posting it here since it might be useful for someone.

Quinn (Oct 28 2024 at 23:27):

I link to this thread in this month's Guaranteed Safe AI Newsletter: https://gsai.substack.com/p/october-2024-progress-in-guaranteed


Last updated: May 02 2025 at 03:31 UTC