Zulip Chat Archive

Stream: general

Topic: formal verification of code written in "mainstream" langs?

Kevin Buzzard (Dec 09 2025 at 10:29):

I am a complete amateur here but someone was talking to me about it: what would it take to be able to "formally verify code in languages such as C++ and Python"? I recall hearing that one issue is that these languages are not even completely formally specified, but I would imagine that for a big chunk of the language this would not really be an issue (or am I wrong)?. To give a dumb example: say I wrote some python code which summed the numbers from 1 to n (imagine I just used an LLM to write such code and posted it here, if you want to see a concrete example). What kind of stuff does one need in order to make some kind of claim of the form "this Lean code formally verifies that on input n this python code will output n(n+1)/2"? Sorry to ask such a basic question!

Shreyas Srinivas (Dec 09 2025 at 10:31):

So there are currently toolchains and projects to go between lean and rust

Shreyas Srinivas (Dec 09 2025 at 10:31):

There’s Aeneas and Peregrine for example.

Shreyas Srinivas (Dec 09 2025 at 10:31):

Basically what you do is define an intermediate lambda calculus that captures a substantial formally specified subset of the language.

Shreyas Srinivas (Dec 09 2025 at 10:32):

A model of the language if you prefer to think of it that way.

Shreyas Srinivas (Dec 09 2025 at 10:33):

And then you work with this model in Lean. There are proofs of correctness for the translation between this model and the targeted subset of rust.

Shreyas Srinivas (Dec 09 2025 at 10:33):

It’s model theory all the way.

Shreyas Srinivas (Dec 09 2025 at 10:34):

You define models of models of models of models… and prove that some properties are preserved between each of these translations. Technically the rust program is a "refinement" of its specification in the model (in the same sense that mathematicians say that a relation refines another relation).

Shreyas Srinivas (Dec 09 2025 at 10:36):

Then when you prove that the lean version of the python code is correctly doing something you implicitly get that the python code itself is correctly doing something (Disclaimer : I’m oversimplifying)

suhr (Dec 09 2025 at 11:18):

And the problem with C++ is that it's too big and there's no small language you can translate it into.

The problem with Python is different: they can't even get typing right, why would you expect to have tools for formal verification?

suhr (Dec 09 2025 at 11:27):

But there are tools for C, Ada, Rust and even Java.

suhr (Dec 09 2025 at 11:37):

By the way, see https://github.com/hwayne/lets-prove-leftpad

TongKe Xue (Dec 09 2025 at 11:38):

I think another problem with Python is the ability to override all the default ops, so it's not even clear what a piece of python code does, unless you know the runtime env / what has been been overriden.

From this perspective, Haskell is probably best, with OCaml 2nd best in terms of formal verification. But maybe 'mainstream' rules them out.

TongKe Xue (Dec 09 2025 at 11:40):

Something else I have been thinking about that might be easier to verify:

If you built a compiler from APL/J/K that compiled to Cuda, that might be quite doable.

Both sides are: tensor based, fairly direct memory model, not much "dynamic magic"; straight forward code

suhr (Dec 09 2025 at 11:54):

https://futhark-lang.org/

Markus de Medeiros (Dec 09 2025 at 12:16):

This is one thing the Iris logic does well. If you scroll through the publications you'll see program logics targeting C, Rust, OCaml, WebAssembly or Go to name a few. The purpose of these Iris-style tools (so-called "program logics") is to provide custom proof rules that make verification tasks in these more complicated languages tractable.

Like you and Shreyas both bring up, these tools verify properties of formal models of programming languages rather than working from a verified compiler. End-to-end projects are more rare, the VST project builds a program logic of the CompCert verified C compiler, there is a verified Stan compiler as well, not sure if anything builds off of CakeML. These whole-language projects are super cool and in some applications necessary, but there are lots of other cases where miscompilation bugs are mostly orthogonal to the thing you're trying to prove (like in your example, kinda), so a whole-langauge project is not practically necessary.

Markus de Medeiros (Dec 09 2025 at 12:19):

(Also there is an ongoing port of Iris to Lean in #iris-lean)

Reuven Peleg (Dec 09 2025 at 13:31):

The way formally verified code works in Verus (subset of Rust), Dafny and C verifiers is quite different than the LEAN approach. Instead of interacting theorem proving it relies on translating the requirements into SMT expressions and using a solver such as Z3 to solve it. That means less manual work, as the programmer only need to specify hints (such as loop invariants) with the price of runtime and harder troubleshooting in case verification fails.
Would be interesting to try to use LEAN for this, I guess it will require some sort of translation from the target language to LEAN expressions somehow? Could be nice adventure to try

Shreyas Srinivas (Dec 09 2025 at 13:32):

Yes that's aeneas and peregrine (port of Hax)

Shreyas Srinivas (Dec 09 2025 at 13:36):

As an aside : I think Kevin knows about iris-lean. It has been around for a while. In winter 2023-24, Kevin was about to give a talk about ITPs to a general audience and wanted to mention Iris among the successes. He wanted to get a concrete idea of what Iris could do and posted here on Zulip. Fortunately I was about to attend Derek (Dreyer)'s lecture. I asked him and reported back .

suhr (Dec 09 2025 at 13:47):

Also https://plv.mpi-sws.org/refinedrust/

suhr (Dec 09 2025 at 13:51):

And Creusot translates Rust to Why3, where you can use both SMT solvers and Rocq (but unfortunately not Lean).

Niels Voss (Dec 09 2025 at 21:14):

Some notes about this:

The formal proof is only useful to the extent that the compiler is bug-free and that the formal spec for your mainstream language matches what the compiler actually does. There's still value in this, because typically compilers have fewer bugs than most code, but it is a limitation. One interesting spin on this is that the WIP metamath C programming language (as part of the MM0 project) is intended to write verified proofs, but lacks a formal spec; rather, the compiler produces proofs about a model of x86 assembly as it is run so that the compiler no longer needs to be trusted. This is similar to how lean tactics need not be trusted.
Formally verifying code written in external languages has the potential to be more powerful than formally verifying Lean code. In Lean, you can only prove properties about functions that hold up to function extensionality. So for example you can never prove that a Lean function takes up less than 100 MB memory or has a O(n^2) time complexity because these are properties about the program, not the function the program represents. A way around this is to build a model of Lean within Lean, although I suspect this is very complicated. See https://proofassistants.stackexchange.com/q/2018/6046
After foundational issues have been resolved, the typical way this works in practice is to use Hoare logic or its extension, separation logic. This has been implemented in Lean for reasoning about Lean itself: #Program verification > Monadic program logic: Request for feedback, but not for reasoning about arbitrary programs.

Floris van Doorn (Dec 10 2025 at 12:59):

To chime in what others have said: it depends on how much you want to trust.
If you only trust Lean and its kernel, you're going to have a very hard time, since you'd have to first fully verify a compiler of your language (which is possible, but a tremendous amount of work: https://compcert.org/ ) and even then you have to trust that the hardware you're running it on correctly does the low-level instructions according to their specification.

On the other hand, if you're willing to trust a lot, you have to give a specification of (a fraction of) the language you're using, and then show that the specification shows that your code produces the right output. The hard part here is writing a good enough specification.
If you're ok with writing Lean code, then it's not so hard to prove a piece of code correct, since you don't need to give a specification of the language yourself.

Last updated: Dec 20 2025 at 21:32 UTC