Zulip Chat Archive

Stream: Machine Learning for Theorem Proving

Topic: miniCTX


Lasse Blaauwbroek (Aug 08 2024 at 04:31):

miniCTX just came out. The central premise of this work is that the lemmas directly preceding the theorem currently being proved are crucial context for proving agents. If I understand correctly, their approach is to predict the next tactic based not only on the current proof state but also on the entire file's source code up to the current position. They fine-tune an LLM on such a dataset. This results in substantial improvements on theorems that have many dependencies.
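
To make the setup concrete, here is a rough sketch of that approach as I understand it; the prompt layout and names are my own illustration, not the paper's code:

```python
# Rough illustration (not the authors' code): condition next-tactic prediction
# on the file prefix preceding the theorem as well as on the current proof
# state. The prompt layout below is an assumption.

def build_prompt(file_prefix: str, goal_state: str) -> str:
    """Concatenate the preceding file source with the current goal state."""
    return (
        "/- File context up to the current theorem -/\n"
        f"{file_prefix}\n"
        "/- Current proof state -/\n"
        f"{goal_state}\n"
        "/- Next tactic -/\n"
    )

prefix = "import Mathlib\n\nlemma helper (n : Nat) : n + 0 = n := by simp\n"
state = "n : Nat\n⊢ n + 0 + 0 = n"
print(build_prompt(prefix, state))
# The resulting string is fed to the fine-tuned LLM; the decoded completion is
# the candidate next tactic (e.g. `simp [helper]`).
```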

I can't help but shamelessly compare this work to Graph2Tac, and more generally to the entire premise behind Tactician (work by me and my co-authors). There, the idea is to do online updates of models with information collected from recent lemmas and proofs. This is shown to be extremely effective, to the point where super-simple ML techniques can significantly outperform all other techniques simply because they can include recent knowledge in their predictions.
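
To make the contrast concrete, here is a toy sketch of the online idea (it is neither Graph2Tac nor Tactician, just about the simplest possible instance): a nearest-neighbour predictor whose example store grows as new proofs in the current session are observed, with no retraining.

```python
# Toy sketch of online learning: store (proof-state, tactic) pairs seen so far
# and predict by nearest neighbour over bag-of-token features (a simplification).
from collections import Counter
from typing import Optional

def featurize(state: str) -> Counter:
    """Bag-of-tokens features of a pretty-printed proof state."""
    return Counter(state.split())

class OnlineKNN:
    def __init__(self):
        self.examples = []  # grows as new proofs are observed, no retraining

    def observe(self, state: str, tactic: str) -> None:
        """Online update: record a (state, tactic) pair from a finished proof."""
        self.examples.append((featurize(state), tactic))

    def predict(self, state: str) -> Optional[str]:
        """Return the tactic attached to the most similar previously seen state."""
        q = featurize(state)
        best = max(self.examples, key=lambda ex: sum((q & ex[0]).values()), default=None)
        return best[1] if best else None

knn = OnlineKNN()
knn.observe("n : Nat ⊢ n + 0 = n", "simp")
print(knn.predict("m : Nat ⊢ m + 0 = m"))  # -> "simp"
```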

All evidence seems to point towards online capabilities being extremely important for theorem proving. The only remaining question is finding the cheapest and most effective way to incorporate this online information!

Lasse Blaauwbroek (Aug 08 2024 at 04:43):

Additional note: The miniCTX paper seems to be a bit short on recent relevant related work, not only my own but also that of others. CC @Sean Welleck

Jiewen Hu (Oct 16 2024 at 20:10):

Hi everyone,

@Thomas Zhu , @Sean Welleck , and I are excited to announce the release of the new version of miniCTX (https://cmu-l3.github.io/minictx/, https://www.arxiv.org/abs/2408.03350), a context-rich benchmark for evaluating neural theorem proving in realistic scenarios. miniCTX contains theorems sourced from real Lean projects and textbooks, each associated with in-file context (the preceding code in its file) as well as cross-file context represented by premises. Models are tasked with proving a theorem given access to code from the theorem's repository, which contains context needed for the proof.
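
As a simplified illustration of what one entry carries (the field names below are abbreviated for illustration rather than the exact keys in the released files):

```python
# Simplified illustration of a benchmark entry; field names are illustrative,
# not the exact schema of the released data files.
example_entry = {
    "theorem_statement": "theorem my_thm (n : Nat) : n + 0 = n",
    "in_file_context": "import Mathlib\n\nlemma helper (n : Nat) : ... := by ...\n",
    "cross_file_premises": ["Nat.add_zero"],  # premises defined in other modules
    "source_repository": "PFR",
}
# A model is given the statement together with this context (and access to the
# repository's code) and must produce a complete Lean 4 proof.
```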

Along with miniCTX, we offer robust tools for data extraction, automated annotation, and interaction with Lean: NTP-Toolkit is an automated tool designed to extract training and evaluation data from Lean repositories with minimal manual effort. The toolkit transforms Lean 4 repositories into next-tactic or full proof examples, enabling seamless fine-tuning and evaluation of language models. 

Finally, we have an evaluation repo, minictx-eval (based on the Lean REPL), that should make it easy to evaluate models for tactic-based proof search, full proof generation, or other strategies.
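
For full proof generation, the evaluation loop essentially amounts to sending a candidate proof to the Lean REPL and checking whether Lean reports any errors. A rough sketch (not the actual minictx-eval code; the REPL invocation and I/O protocol below are simplified assumptions):

```python
# Rough sketch: send a candidate theorem + proof to the Lean REPL and see
# whether Lean reports any errors. The REPL invocation and the exact I/O
# protocol are simplified assumptions.
import json
import subprocess

def check_proof(lean_code: str, repl_cmd=("lake", "exe", "repl")) -> bool:
    """Return True if the Lean REPL accepts `lean_code` without errors."""
    proc = subprocess.Popen(
        repl_cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
    )
    out, _ = proc.communicate(json.dumps({"cmd": lean_code}) + "\n\n")
    reply = json.loads(out)  # assumes one JSON reply for the single command
    return not any(m.get("severity") == "error" for m in reply.get("messages", []))

# check_proof("theorem t : 1 + 1 = 2 := by norm_num")
```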

We updated miniCTX based on feedback from the community, and in this new version we include both in-file context and cross-file premise context. The updated NTP-Toolkit now extracts imported modules with premises and tracks the number of both in-file and cross-file premises used. We’ve also expanded miniCTX with new splits beyond traditional math proofs, opening up new possibilities for formal verification and other domains.

We believe that the current version of miniCTX captures the complete context of a problem in realistic scenarios, thus serving as a more comprehensive benchmark compared to traditional ones. We also pay significant attention to automation in the toolkit, so users can test new repositories easily, and we can update the benchmark with new problems to avoid potential data contamination.

Thomas Zhu (Oct 16 2024 at 20:54):

In detail, the miniCTX benchmark contains problems from real-world projects including PFR, PrimeNumberTheorem+, HepLean, and SciLean, as well as textbook problems from HTPI. We found simple methods that drastically improve performance on real-world problems (19.5% to 35.9%). Meanwhile, existing competition-style benchmarks cannot capture this performance boost (miniF2F: 32.8% to 33.6%), suggesting a significant gap in current benchmarks' ability to assess real-world theorem proving.

Thomas Zhu (Oct 16 2024 at 20:56):

@Lasse Blaauwbroek Also, thank you for the suggestions! We agree Graph2Tac is very relevant work on online learning, and we have incorporated it in the related work section.
Our ntp-toolkit is also based on Kim Morrison's lean-training-data, and we are indebted to the authors of the open-source Lean projects included in our benchmark.

Thomas Zhu (Apr 23 2025 at 01:47):

We will present miniCTX as an Oral presentation at ICLR in Singapore tomorrow! Please come to Oral Session 1A, 4/24 11:06am if you are interested!

Alok Singh (Apr 23 2025 at 05:07):

Thomas Zhu said:

We will present miniCTX as an Oral presentation at ICLR in Singapore tomorrow! Please come to Oral Session 1A, 4/24 11:06am if you are interested!

Please record if possible :pleading_face:

Thomas Zhu (Apr 24 2025 at 01:26):

Alok Singh said:

Thomas Zhu said:

We will present miniCTX as an Oral presentation at ICLR in Singapore tomorrow! Please come to Oral Session 1A, 4/24 11:06am if you are interested!

Please record if possible :pleading_face:

Unfortunately this is unlikely (perhaps unless you have the virtual pass for ICLR). We will share our slides later. But the content of the oral talk will be a strict subset of the content of our paper: https://www.arxiv.org/pdf/2408.03350


Last updated: May 02 2025 at 03:31 UTC