Zulip Chat Archive

Stream: Machine Learning for Theorem Proving

Topic: itp-interface and proof-wala

Amitayush Thakur (Feb 10 2025 at 23:55):

PROOFWALA: Multilingual Proof Data Synthesis and Theorem-Proving

The PROOFWALA framework, along with our itp-interface, provides a robust and scalable environment for automated theorem proving in Lean 4 and Coq. Whether you're searching for proofs using large language models (LLMs) or generating proof datasets, our framework is designed to simplify and accelerate the entire workflow.

What Does the Framework Offer?

itp-interface Module
Implemented in Python, the itp-interface (available at GitHub) acts as a unified proof environment for both Lean 4 and Coq. Here's what makes it stand out:

Cross-ITP Compatibility: Write your automation code once—it works seamlessly with both Lean 4 and Coq.
Parallel Tactic Execution: Supports parallel proof step execution across multiple instances, improving efficiency during proof searches.
Efficient Memory Management: Avoids known pitfalls like memory bloating while using Lean 4 REPL, ensuring reliable large-scale proof searches.
Bug Prevention: Fixes issues related to incorrect proofs being accepted in Lean 4 REPL proof mode (see Lean issue).
Flexible and Scalable: Easily run parallel proof tasks across multiple nodes using Ray clusters.
Proof data extraction: Parallel and scalable proof-state and step pair data from Coq and Lean GitHub repositories.
Simple Setup:

pip install itp-interface
install-itp-interface

Note: No need to manually install Lean 4 or other dependencies.

Proof Step Generation and Search
The proof-wala tool (PyPI link, GitHub Link) integrates with itp-interface to provide:

Multilingual Proof Search: Supports parallel, multilingual proof searches for both Lean 4 and Coq.
Proof Tree Visualization: Annotate and analyze proof trees generated during searches, making it suitable for reinforcement learning (RL)-style training.
End-to-End Automation: Automates the entire proof discovery process, from theorem input to proof generation and annotation.

Demonstration:

We used our framework to train and evaluate models on datasets sourced from several formal repositories, including Mathlib (Lean), CompCert (Coq), MathComp (Coq) , and GeoCoq (Coq). The combined multilingual dataset contains over 450,000 proof-step pairs from around 80,000 theorems, totaling 270M tokens.
The multilingual models trained using PROOFWALA and ITP interface framework have demonstrated consistent performance improvements in proof generation as compared to the monolinguals models.
Checkout our full paper about how multilingual proofstep training helps here: https://arxiv.org/abs/2502.04671

This work builds on contributions from libraries like Lean 4 REPL and Coq Serapy. We are grateful to our co-contributors @George Tsoukalas , Greg Durrett, Swarat Chaudhuri, and the theorem-proving community for their valuable feedback and support.

Explore our framework and tools to enhance your research and automation in formal reasoning. Feedback is always welcome!

Last updated: May 02 2025 at 03:31 UTC