Zulip Chat Archive

Stream: new members

Topic: Finished my first lean mini-project

ssar (Mar 02 2025 at 21:59):

A few weeks ago I implemented Project Euler Problem 10 in lean and started to proof correctness of the solution. A few days ago I finally completed the proof of correctness.

The problem statement is to calculate
def S (n: Nat) := ∑ p ∈ Nat.primesBelow n, p
--#eval S 2000000 -- Does not work

This does not work out of the box (segfault for 500,000 < n), so I wrote a more optimized solution. It constructs an Array with arr[i] = i.minFac using a modified version of the Sieve of Eratosthenes and uses this array to calculate S. The modified function works on my machine for n <= 1,350,000,000

For anyone interested, the code and proof is here:
https://github.com/ssar/euler_lean/blob/main/Problems/P10.lean

A few things I learned along the way

The lean community is super helpful. Thanks!
While the implementation of the algorithm is easy, proving correctness was challenging for me. There were many small hiccups along the way where I spent time on proving things that seem trivial.
I tried to separate the algorithm implementation from the proof of correctness. But this doesn't fully work as array access and termination of recursive functions require mixing some proofs into the program logic.
I was expecting that doing work with lean would enable me to "read" lean proofs. Unfortunately I can't even read my own proof. I guess this is also partially doe to my proof being badly structured.
Most importantly for me as this is purely a hobby project to learn about programming and proving in Lean: I had fun doing this :)

I'll probably do some followup project, but I'm currently undecided what exactly. Candidates are:

Improve tactics used for array access to achieve better separation of code and proof. get_elem_tactic_trivial seems easy to extend and I already had some success creating a custom extension that just does simp[*]; omega
Implement some more complex algorithm
Try to make some statement about runtime behavior of the program. I found some previous discussion about complexity theory here and it seems there is no framework available yet. A problem might be that the definitions in lean don't match what gets executed. But it should be possible to define a simplified framework that just allows making statements about upper bounds by defining axioms for basic operations. Maybe even a statement about the number of array accesses would be good enough in this case.
Try to make some statement about memory consumption of the program.

Ruben Van de Velde (Mar 02 2025 at 22:10):

By "reading" lean proofs, do you mean just the text or stepping through them in the editor while looking at the infoview?

Ruben Van de Velde (Mar 02 2025 at 22:11):

Because I don't know anyone who can do the former with proofs of any complexity :)

Rado Kirov (Mar 02 2025 at 22:46):

Thanks for sharing! Looks like the proof of correctness is 5x the length of the algorithm. I am new to verified algorithms, is that what is roughly expected/normal for basic algorithm verification?

"3. Try to make some statement about runtime behavior of the program." - this sound very interesting

Adomas Baliuka (Mar 02 2025 at 22:55):

Statements about runtime behavior are not preserved by equality (two functions that compute the same thing are equal even if one is much slower). So to make statements about runtime, the objects being reasoned about cannot be vanilla Lean definitions but must be defined in terms of a computational model. People are doing this but it seems like a huge departure from the "standard" way Lean definitions and proofs work. So it's not necessarily something that can be just done by expanding on an existing Lean project.

ssar (Mar 03 2025 at 20:38):

Ruben Van de Velde said:

By "reading" lean proofs, do you mean just the text or stepping through them in the editor while looking at the infoview?

Yeah, seems following proof from just the text is close to impossible (e.g. following what new subgoals get introduced by a tactic like rw). Good to know that's not just me. I guess there should be proof styles that are easier to follow.
Even with infoview is still quite hard to understand how I ended with in the current goals.

ssar (Mar 03 2025 at 21:04):

Adomas Baliuka said:

Statements about runtime behavior are not preserved by equality (two functions that compute the same thing are equal even if one is much slower). So to make statements about runtime, the objects being reasoned about cannot be vanilla Lean definitions but must be defined in terms of a computational model. People are doing this but it seems like a huge departure from the "standard" way Lean definitions and proofs work. So it's not necessarily something that can be just done by expanding on an existing Lean project.

Ah, right. I proved what the content of my arrays is at every point of the algorithm, so not even a statement about number of array accesses would work.
Reading https://lean-lang.org/theorem_proving_in_lean4/axioms_and_computation.html#function-extensionality
I wonder if It would still be possible to still make statements about lean programs if we take care to not use funext (i.e. the Quot.sound axiom)?

ssar (Mar 03 2025 at 21:09):

Rado Kirov said:

Thanks for sharing! Looks like the proof of correctness is 5x the length of the algorithm. I am new to verified algorithms, is that what is roughly expected/normal for basic algorithm verification?

"3. Try to make some statement about runtime behavior of the program." - this sound very interesting

I'm also new to this, so no idea what's expected. It's sure possible to shorten my proof by a lot. The amount of time I invested on the proof of correctness was definitely more than 10x of the time I used for implementing the algorithm.

Last updated: May 02 2025 at 03:31 UTC