Zulip Chat Archive

Stream: Machine Learning for Theorem Proving

Topic: Artificial Algorithms

GasStationManager (Jul 30 2025 at 18:16):

I am starting a repo of verified algorithms in Lean, implemented and proved by AIs. Primary goal of the repo is to serve as a place for sharing examples, recipes, and experiences on AI-assisted coding and verification of code. Secondary goal is to be potentially useful as building blocks.

Contributions are welcome and appreciated! (Already got one from @Huỳnh Trần Khanh). PRs welcome. Just do your favorite algorithm! Rule is: human-verified specification, machine-checked proofs. Please document your process.

Shreyas Srinivas (Jul 30 2025 at 18:43):

Is it necessary to use AI assistance?

GasStationManager (Jul 30 2025 at 19:06):

Shreyas Srinivas said:

Is it necessary to use AI assistance?

This repo is pretty AI-focused, but I am also open to human-implemented verified algorithms if they can be used by AIs: E.g. if you implement a verified algorithm, and then demonstrate that an AI can be taught to use your algorithm to solve a task, that counts too.

GasStationManager (Jul 30 2025 at 22:39):

Or, demonstrations of algorithmic and/or proving techniques that could be generalized by AIs to solve other tasks. E.g. here I manually did an example of memoization with correctness proof, and then prompted Sonnet to solve something else with memoization. Could something similar be done for, eg greedy methods?

Shreyas Srinivas (Jul 31 2025 at 09:14):

do these models "remember" something if I name it. For example, if I explain memoization and ask it to assign the name "memoization" to the technique, can it recall what I explained?

(deleted) (Jul 31 2025 at 09:16):

you can tell Claude to take notes in a file

(deleted) (Jul 31 2025 at 09:18):

this repo is in a very early stage

(deleted) (Jul 31 2025 at 09:18):

(deleted) (Jul 31 2025 at 09:19):

I have my own full time job :eyes: and GasStationManager also works full time. I probably know more about proving the correctness of algorithms than the average person

(deleted) (Jul 31 2025 at 09:23):

speaking of greedy methods

(deleted) (Jul 31 2025 at 09:24):

every greedy algorithm is its own beast. I'm not even sure if the base knowledge of existing LLMs can handle greedy algorithms

(deleted) (Jul 31 2025 at 09:32):

@GasStationManager I feel like you must video record the process of using AI to prove algorithms. Because looks like you keep achieving amazing results while I achieved very little with generic LLMs.

(deleted) (Jul 31 2025 at 09:36):

I believe it's possible to achieve amazing feats with LLMs. But work must be done to create the perfect environment

GasStationManager (Jul 31 2025 at 11:56):

@Huỳnh Trần Khanh those would currently be extremely long videos :<
I had to upgrade to the $100 Max plan for Claude. Even then it took multiple sessions over multiple days to do one of the divide-and-conquer algorithms.

(I do think proving divide-and-conquer algorithms can be sometimes trickier, as you usually need some additional insight as to why the algorithm works; usually taking advantage of some underlying mathematical structure, e.g. transitivity of < for binary search and quickselect, and associativity of addition for max subarray sum.)

Part of the motivation for doing these exercises is to identify the pain points, and perhaps improving the tooling. One observation is that as the source code gets long, the LLMs will struggle to keep it in their context window. One thing I found sometimes helpful is to (ask the AI to) factor out lemmas and then send them to be proved separately (one can imagine automating this in an agentic framework). But the AI may later forget those lemmas existed. It would be helpful to provide the AI with a "summary overview" of the file to keep in its context. The crude version would be to ask the AI to include a #check for each lemma / theorem it created.

GasStationManager (Jul 31 2025 at 12:15):

@Shreyas Srinivas The memory capabilities of current AIs are not very mature yet. You could ask Claude to take notes but may want to inspect the notes afterwards to make sure. Alternatively, once this repo becomes more comprehensive, could get it indexed by a search tool e.g. LeanExplore, so that when the AI search for "memoization", it will retrieve the example.

GasStationManager (Aug 17 2025 at 22:39):

Here's one that might be of interest to others: I was trying to formalize the value iteration algorithm for Markov Decision Processes (and hoping to make the AIs do most of the work).

I wanted the code to be runnable, while still possible to prove properties about, so chose to do the implementation in rational numbers (Rat). We were able to prove that the map contracts; but to prove convergence, traditionally the proof will need the Banach fixed point theorem, which would only work for Real numbers (and hence noncomputable). We ended up going with the following plan: prove convergence using a (noncomputable) definition of the value iteration algorithm with Reals, then have a computable definition of the algorithm with rationals, then prove that the rational version and the Real version give equal results, when given a rational input.

Here's the proof. Much of it done in Claude Code. It took quite a while, and was helped somewhat by the recent release of Opus 4.1 and GPT 5 which made progress on individual lemmas.

GasStationManager (Aug 17 2025 at 22:43):

This does make me more hopeful that some of the more "numerical" algorithms could be runnable and verifiable at the same time; e.g. if we want to use gradient descent, we could have a computable version and an equivalent Real version, and use theory from optlib to prove optimality.

(deleted) (Aug 18 2025 at 05:01):

https://www.asciinema.org/

(deleted) (Aug 18 2025 at 05:01):

Consider using this so I can learn your tricks

(deleted) (Aug 18 2025 at 05:03):

Or you can just export the Claude Code conversation

Geoffrey Irving (Aug 18 2025 at 11:32):

GasStationManager said:

I wanted the code to be runnable, while still possible to prove properties about, so chose to do the implementation in rational numbers (Rat).

As a style point, I think it’s better work over the combination of the reals and floats rather than try for an intermediate. I.e., state algorithms that can work over either, prove theorems over reals, then use that as informal evidence that a similar property has some chance of being true over floats. The intermediate thing of proving theorems about rationals is not that interesting, as the theorem is that there is an exponentially slower version of the algorithm you care about and the slow version has some property; one might as replace exponential with infinite.

(deleted) (Aug 18 2025 at 11:52):

Yeah. I think proving the algorithm works with IEEE 754 floats would be interesting

(deleted) (Aug 18 2025 at 11:52):

Hmm... Do we have an IEEE 754 library for Lean?

Alex Meiburg (Aug 18 2025 at 15:40):

Not really, no:
https://leanprover.zulipchat.com/#narrow/channel/270676-lean4/topic/IEEE.20754.20Float/near/395456271

https://leanprover.zulipchat.com/#narrow/channel/113489-new-members/topic/Polymorphic.20Round.20Function/near/509046509

https://leanprover.zulipchat.com/#narrow/channel/113489-new-members/topic/Proofs.20about.20floats.20and.20verifiable.20software/near/427956048

Alex Meiburg (Aug 18 2025 at 15:40):

It's one of those that everyone says we should do, but no one has, because it's painful and the rewards (the theorems you can prove) are few :)

GasStationManager (Aug 18 2025 at 15:40):

@Geoffrey Irving it is a good point; this is a proof of concept, I do want to eventually do this with floating point numbers, but be able to formally prove an error bound between the float and the Real version. Was meaning to ask you, would it be possible to use the floating number of the interval library to prove guarantees about algorithms using floats?

Matteo Cipollina (Aug 18 2025 at 15:50):

an early stage version with the prerequisites and some core API and theorems for Floats are here,
https://github.com/mkaratarakis/HopfieldNet/blob/master/HopfieldNet/floats/Flocqv2.lean

(deleted) (Aug 18 2025 at 16:00):

Alex Meiburg said:

It's one of those that everyone says we should do, but no one has, because it's painful and the rewards (the theorems you can prove) are few :)

It's 2025 and the field of formally verifying software is still small. We can't confidently say anything yet.

(deleted) (Aug 18 2025 at 16:01):

We don't even have a repository of formalized Codeforces problems.

(deleted) (Aug 18 2025 at 16:03):

I tried to make one, then my professor made me tweak the compiler so it emits blockchain code—as he believed there was no value in such a repository, and blockchain technology was more useful, and then the repository now serves as a general demonstration of how software verification can be done.

(deleted) (Aug 18 2025 at 16:05):

:eyes: maybe this repository could achieve what I couldn't achieve

Geoffrey Irving (Aug 18 2025 at 16:17):

GasStationManager said:

Geoffrey Irving it is a good point; this is a proof of concept, I do want to eventually do this with floating point numbers, but be able to formally prove an error bound between the float and the Real version. Was meaning to ask you, would it be possible to use the floating number of the interval library to prove guarantees about algorithms using floats?

This is misinterpreting what I said: I’m not claiming you should prove something over floats, but rather than you should replace your Rat proofs with Real.

To your question, though: unfortunately the interval library doesn’t help in its current form, as it doesn’t establish tightness of the bounds.

GasStationManager (Aug 18 2025 at 16:43):

@Geoffrey Irving Much of the proof is over Reals; the Rat part is just to show equivalence between the Rat and Real versions of the algorithm. To make this work for floats, one just need to replace the latter part.

And the bounds do not need to be tight. E.g. we can start with something like for a: float and b: float, the difference between (a+b) and (a:Real)+(b:Real) is bounded by some c:float. (Note that the value iteration algorithn only uses +, * and max). And with numerical algorithms, we expect the quality of the bounds will depend on some kind of condition number on the inputs.

Geoffrey Irving (Aug 18 2025 at 16:44):

Right, and my claim is that the Rat part is providing no value and can be deleted. :)

(deleted) (Aug 18 2025 at 16:45):

I think using Rat is just an arbitrary decision. Not good, but not worse than Real.

Geoffrey Irving (Aug 18 2025 at 16:45):

To clarify, it’s not just that the interval library doesn’t show the bounds are tight: all the theorems allow them to be arbitrarily loose. In fact they are pretty tight, but this is checked only via unit test.

Geoffrey Irving (Aug 18 2025 at 16:46):

Huỳnh Trần Khanh said:

I think using Rat is just an arbitrary decision. Not good, but not worse than Real.

It will be much worse as soon as you take an exp. :)

GasStationManager (Aug 18 2025 at 17:08):

I think the value of the Rat part is to go from a noncomputable algorithm over the Reals to a computable algorithm. Granted that it would be exponential and not something that we want to run in practice. And that's why I am very interested in making a float version with error bounds. Will check out the work linked by @Matteo Cipollina as well.

Bas Spitters (Aug 19 2025 at 07:51):

@GasStationManager yes, combining classical and computable reals is feasible, we did it in the corn/math-classes library some in Rocq, some 20years ago.
https://c-corn.github.io/pub.html
https://math-classes.github.io/ (this also introduced the way type classes are now used in Lean)

Specifically, in this paper:
Cezary Kaliszyk and Russell O'Connor. Computing with Classical Real Numbers. Journal of Formalized Reasoning, Vol. 2, No. 1, 2009, Pages 27--39
Ideally, one would want proof transfer in the style of HoTT or Trocq (@Cyril Cohen ), but the full automation of this is still in progress.

One gets reasonable computation speed by using "dyadic" rationals (akin to floats, but simpler): the denominators are factors of 2.
Computer certified efficient exact reals in Coq, Robbert Krebbers and Bas Spitters. In Proceedings of Calculemus/MKM 2011, volume 6824 of LNAI, pages 90–106, 2011.
http://arxiv.org/abs/1105.2751

There is some interest in implementing these in Lean (@Bhavik Mehta ).

@Geoffrey Irving There is an enormous amount of work, and a book, on formalizing floats in the Rocq ecosystem:
https://guillaume.melquiond.fr/
https://pages.saclay.inria.fr/sylvie.boldo/

The capla language is one of Guillaume's latest projects:
https://fresco.gitlabpages.inria.fr/capla/language/

GasStationManager (Aug 19 2025 at 22:36):

@Bas Spitters Thanks for the helpful pointers!
I am interested in practical computation that can also be verified, so would be looking particularly into floats and dyadic rationals. Do you know of papers / examples of formalizing numerical algorithms, perhaps built on top of the float libary in rocq?

Kim Morrison (Aug 19 2025 at 22:50):

@GasStationManager, I have a preliminary implementation of dyadic rationals at lean#9993. If you'd like to take a look at that and push proofs for any of the sorries that would be lovely! (Or anyone else, of course. I've just spun up a claude session that is working on it, but I have low expectations that it will make any progress without close supervision.)

(deleted) (Aug 19 2025 at 22:52):

How do I push

(deleted) (Aug 19 2025 at 22:52):

I want to try

Kim Morrison (Aug 19 2025 at 22:55):

You can try just by pulling, you only need to push if you make progress. :-)

(deleted) (Aug 19 2025 at 22:58):

Looks like it means I'll just post my code here when I make progress. Given that there's already a force push.

(deleted) (Aug 19 2025 at 23:02):

Let's give Claude Code a chance. Maybe clever prompting really helps. Maybe not. Let's see. If Claude Code can't do it I can

GasStationManager (Aug 19 2025 at 23:18):

Here's the first theorem (Opus 4.1, Claude desktop with LeanTool and LeanExplore):

theorem trailingZeros_two_mul_add_one (i : Int) : Int.trailingZeros (2 * i + 1) = 0 := by
  unfold trailingZeros trailingZeros.aux
  -- Key fact: (2 * i + 1) % 2 = 1
  have h : (2 * i + 1) % 2 = 1 := by
    rw [Int.add_emod]
    simp [Int.mul_emod_right]
  -- Now we need to handle the match on k
  split
  · -- Case k = 0
    rfl
  · -- Case k = succ n for some n
    simp [h]

GasStationManager (Aug 19 2025 at 23:19):

It is having some trouble with the second theorem theorem trailingZeros_two_mul (i : Int) : Int.trailingZeros (2 * i) = Int.trailingZeros i + 1 := sorry, as it doesn't know what to do when i=0.

Should the statement have a special case for i=0?

Aaron Liu (Aug 19 2025 at 23:38):

yes definitely

Aaron Liu (Aug 19 2025 at 23:38):

take a hypothesis that i ≠ 0 (and then make i implicit)

Kim Morrison (Aug 20 2025 at 00:17):

Thanks. I'd golf that to

theorem trailingZeros_two_mul_add_one (i : Int) : Int.trailingZeros (2 * i + 1) = 0 := by
  unfold trailingZeros trailingZeros.aux
  have h : (2 * i + 1) % 2 = 1 := by simp
  split <;> simp_all

I've commited that, as well as my Claude sessions which filled in 3/4 sorries in toRat. (It didn't do much before returning control to me, and I was elsewhere.)

Bas Spitters (Aug 20 2025 at 08:40):

@GasStationManager There are a number of options for numerical computation. We use Cauchy sequences/nets of dyadics/floats here:

Computer certified efficient exact reals in Coq, Robbert Krebbers and Bas Spitters. In Proceedings of Calculemus/MKM 2011, volume 6824 of LNAI, pages 90–106, 2011. [arXiv]

It's fairly efficient, see the timings in the paper (and correct for Moore's law... ;-) ).

Interval computation is faster. For that one would look at coq-interval by Guillaume or the PhD-thesis of Fabian Immler in Isabelle.

Cauchy sequences compute the error backwards (epsilon-delta), one can also use forward error propagation. Just use interval computation, if the resulting error is too big, restart with a small interval. This is implemented in IRRAM for instance. This leads to realizability approach to real computation.
This approach is followed here:
https://arxiv.org/abs/2202.00891
https://arxiv.org/abs/2410.13508

For floats there is the verinum project: https://verinum.org/

Benedikt Peterseim (Aug 20 2025 at 16:42):

Since Markov Decision Processes and floating point numbers were mentioned, I think this paper by some colleagues might be interesting, even though they use Isabelle: https://arxiv.org/abs/2501.10127

GasStationManager (Sep 19 2025 at 23:36):

I did a bit of further explorations on Value Iteration and computation. Recall that a drawback of the exact value iteration implementation with rationals above is that computation cost (in time and space) could blow up.

In this file, I asked Codex CLI to formalize an approximate variant of value iteration with the simplest rounding scheme: do the Bellman operator computation in exact Rational arithmetic, then round to the nearest integer. Then repeat. This ends up working sufficiently well for value iteration (and other contraction maps): the rounding errors accumulate but gets squashed by the discount factor, so we end up with a constant error bound that depends on the discount factor. In practice you would pre- scale up the reward function by an appropriate amount so that this is acceptable. (Or equivalently round to the nearest integer multiple of some h.)

This was done mainly in Codex CLI by GPT-5 (med and high), with two lemmas solved separately by Sonnet 4. My first time trying Codex CLI on a larger Lean proof, and I must say I liked the experience. OpenAI's models have always been strong at informal reasoning, now their Lean ability seems to be catching up to Claude.

Will continue exploring as I learn more about the various numerical approached mentioned earlier this thread. For value iteration, much of the current proof could be adapted, e.g. if we instead use dyadics / or a future float library, as long as one can prove a constant error bound for one step of the Bellman iteration implementation, the rest of the proof can be reused to show a constant error bound on value iteration.

Michail Karatarakis (Sep 22 2025 at 16:58):

Regarding computable reals, in CoRN there is a notion for the reals called Real Number Structure (RNS) which defines any instance of the reals as a complete ordered Archimedean field. AFAIK the history is roughly as follows : Russel had a version of the constructible reals fast and he tried to prove that they form an RNS. However, it was easier to instead build a bijection between his reals and an existing model of the reals in CoRN by Niqui using a theorem that all real number structures are isomorphic. Then Robbert and Bas optimized Russel's code using typeclasses resulting in another implementation called faster . Conceptually, one can vary the base field - for example, by working over machine integers instead of integers. However, the "faster" implementation was not shown to be an RNS.

Together with @Matteo Cipollina , we have been working on constructible reals and on porting CoRN's RNS and fast. In addition, Russel provides a Haskell implementation at the end of his paper, which we have already ported into Lean. @Bas Spitters , I heard from Freek that you've been working on constructible reals too. There is also @Alex Meiburg 's ComputableReals that follows a different approach.

This raises the question: do we want the notion of RNS in mathlib too? Presumably we would need the bijection between the computable and noncomputable reals. Maybe the work of defining this and showing that it is an isomorphism is similar to showing that all RNSs are isomorphic?

Aaron Liu (Sep 22 2025 at 18:51):

Michail Karatarakis said:

This raises the question: do we want the notion of RNS in mathlib too? Presumably we would need the bijection between the computable and noncomputable reals. Maybe the work of defining this and showing that it is an isomorphism is similar to showing that all RNSs are isomorphic?

Is this similar to docs#ConditionallyCompleteLinearOrderedField

Bas Spitters (Sep 23 2025 at 08:41):

@Michail Karatarakis Thanks for the summary. That's roughly correct. For context, Russell did his PhD with me.
Then we did developed math-classes, type classes for maths, with Eelis (http://dx.doi.org/10.1017/S0960129511000119), which is the basis for type classes usage in Lean.
We used this work with Robbert to make a faster and more modular implementation of the reals in math-classes .
https://c-corn.github.io/pub.html

Great you're porting the reals to Lean!
Why did you choose to not to use the faster, type class based implementation?

One more bit of information, the classical reals in the Rocq stdlib are now based on the constructive reals. Classical logic is only added in the end.

Since you asked, I'm not porting the reals myself (but I'm very happy you are!). I was discussing with @Bhavik Mehta (and others) that it would facilitate some of the issues they were having in their proofs.

I'd be happy to discuss more...

Bas Spitters (Sep 23 2025 at 09:26):

#general > Discussion: ComputableReal

Last updated: Feb 28 2026 at 14:05 UTC