Zulip Chat Archive

Stream: Machine Learning for Theorem Proving

Topic: Neural Theorem Proving for Verification Conditions

Qiyuan Xu (Jan 28 2026 at 16:03):

We present a benchmark for proving Verification Conditions (VCs), a key bottleneck in automated program verification. This work is available on GitHub, and its associated paper Neural Theorem Proving for Verification Conditions: A Real-World Benchmark has been accepted at ICLR'26.

In recent years, Neural Theorem Proving (NTP) has achieved remarkable success in mathematical competitions. However, ITP is not merely a toy for competitions; traditionally, two applications exist: formalized mathematics and program verification. Regarding the former, pioneers have made encouraging progress (e.g., Josef Urban's 130k lines of autoformalization). Regarding the latter, although some benchmarks for Lean exist, they largely overlook half a century of research in programming language theory on Verification Condition Generation (VCG) and other reasoning techniques --- they perform program verification directly over semantics rather than employing program logics, weakest precondition calculi, and other reasoning techniques. To address this gap, we propose the first cross-platform benchmark built on industrial program verifiers (Frama-C and Why3) and drawn from real-world program verification projects (e.g., the Linux kernel, the Contiki OS kernel). It supports Lean, Isabelle, and Rocq. We hope this work will enable NTP research to build upon the five-decade legacy of program verification and create practical value in real-world software verification.

Last updated: Feb 28 2026 at 14:05 UTC