Zulip Chat Archive

Stream: Program verification

Topic: Collecting small program verification examples


Markus Himmel (Apr 30 2025 at 09:41):

I recently came across the human-eval-verus project which collects hand-written formally verified solutions to the HumanEval benchmark (which was originally intended as an AI benchmark, but for the purposes of this project it's just a collection of simple programming problems) using the Verus Rust verification system. I thought it might be fun to have something similar for Lean, so I created human-eval-lean which aims to collect verified Lean solutions to the problems.

So far I have imported all of the problem statements, solved one of the easiest problems and created an issue for every problem to make it easy to discuss and claim issues.

I thought that some members of this stream might find it fun to poke at the problems, see how far we can get, and gradually converge to a set of good solutions in idiomatic Lean. Having multiple solutions for some problems showing the various styles that Lean allows would also be great. Even just documenting in the issues which parts of the problem are difficult to solve/verify using Lean could be very useful for improving Lean as a verification platform.

If that sounds interesting or fun, I'm looking forward to PRs and discussions!

(Just to be clear: this effort is not connected to the Lean FRO, this is just me in my free time :))

Shreyas Srinivas (Apr 30 2025 at 09:45):

I can try if I find a bit of free time

Shreyas Srinivas (Apr 30 2025 at 09:45):

But where is the task list? Is there a checklist of completed tasks in the repo?

Markus Himmel (Apr 30 2025 at 09:50):

The idea is to close the corresponding issue once a problem is solved, so outstanding tasks are just open issues. I might add all of the issues to a GitHub project so that it's possible to view the list in a more compact way.

Shreyas Srinivas (Apr 30 2025 at 11:38):

You could borrow the system we used for equational theories

GasStationManager (Apr 30 2025 at 13:48):

Cool! Related: a few months ago I wrote a few simple programming problems with verified Lean solutions at CodeProofBenchmark. The problems were all in the range of easy to trivial, but I thought some of the proofs were less trivial, as least to me.

Question: would you consider AI-written solutions, as long as the specifications are created or verified by humans? I don't know how the current LLMs will do, but it'd be fun to try.


Last updated: May 02 2025 at 03:31 UTC