Zulip Chat Archive
Stream: Program verification
Topic: Collecting small program verification examples
Markus Himmel (Apr 30 2025 at 09:41):
I recently came across the human-eval-verus project, which collects hand-written, formally verified solutions to the HumanEval benchmark using the Verus verification system for Rust. (HumanEval was originally intended as an AI benchmark, but for the purposes of this project it's just a collection of simple programming problems.) I thought it might be fun to have something similar for Lean, so I created human-eval-lean, which aims to collect verified Lean solutions to the problems.
So far I have imported all of the problem statements, solved one of the easiest problems and created an issue for every problem to make it easy to discuss and claim issues.
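To give a rough idea of the shape of a solution (this is just an illustrative sketch, not the actual code in the repo), one of HumanEval's trivial problems, adding two numbers, might be solved and verified like this:

```lean
/-- Sketch of a solution to a trivial HumanEval-style problem: add two integers. -/
def add (x y : Int) : Int :=
  x + y

/-- The specification: `add` really computes the sum. Here the proof is
    definitional, but for harder problems the theorem carries real content. -/
theorem add_spec (x y : Int) : add x y = x + y :=
  rfl
```

The interesting part for most problems is choosing a good specification and finding an idiomatic implementation whose proof stays manageable.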
I thought that some members of this stream might find it fun to poke at the problems, see how far we can get, and gradually converge to a set of good solutions in idiomatic Lean. Having multiple solutions for some problems showing the various styles that Lean allows would also be great. Even just documenting in the issues which parts of the problem are difficult to solve/verify using Lean could be very useful for improving Lean as a verification platform.
If that sounds interesting or fun, I'm looking forward to PRs and discussions!
(Just to be clear: this effort is not connected to the Lean FRO, this is just me in my free time :))
Shreyas Srinivas (Apr 30 2025 at 09:45):
I can try if I find a bit of free time
Shreyas Srinivas (Apr 30 2025 at 09:45):
But where is the task list? Is there a checklist of completed tasks in the repo?
Markus Himmel (Apr 30 2025 at 09:50):
The idea is to close the corresponding issue once a problem is solved, so outstanding tasks are just open issues. I might add all of the issues to a GitHub project so that it's possible to view the list in a more compact way.
Shreyas Srinivas (Apr 30 2025 at 11:38):
You could borrow the system we used for equational theories
GasStationManager (Apr 30 2025 at 13:48):
Cool! Related: a few months ago I wrote a few simple programming problems with verified Lean solutions at CodeProofBenchmark. The problems were all in the range of easy to trivial, but I thought some of the proofs were less trivial, at least to me.
Question: would you consider AI-written solutions, as long as the specifications are created or verified by humans? I don't know how the current LLMs will do, but it'd be fun to try.
Markus Himmel (May 05 2025 at 14:56):
GasStationManager said:
Question: would you consider AI-written solutions, as long as the specifications are created or verified by humans? I don't know how the current LLMs will do, but it'd be fun to try.
Sure, feel free to use whatever tools are useful to you when creating solutions.
Bas Spitters (May 06 2025 at 13:40):
There has been some criticism of humaneval as a target.
I'm not an expert, but a quick search turned up some alternatives. Hopefully people with more expertise can add to this.
https://github.com/CoderEval/CoderEval
https://github.com/evalplus/evalplus?tab=readme-ov-file
https://blog.continue.dev/an-introduction-to-code-llm-benchmarks-for-software-engineers/
https://huggingface.co/blog/leaderboard-bigcodebench
Apparently, this one already provides unit tests for Lean (and Rocq and OCaml ...)
https://github.com/nuprl/MultiPL-E?tab=readme-ov-file
Markus Himmel (May 06 2025 at 14:50):
Bas Spitters said:
There has been some criticism of humaneval as a target
As I understand it (correct me if I'm wrong), this criticism is aimed at HumanEval when used to evaluate LLMs. Since human-eval-lean isn't about that, and focuses on evaluating Lean as a programming language and verification platform instead, I think it's not a problem to use it for this project.
Last updated: Dec 20 2025 at 21:32 UTC