Zulip Chat Archive

Stream: Machine Learning for Theorem Proving

Topic: Theorem Proving for Machine Learning


GasStationManager (Jan 06 2026 at 03:28):

(Sorry, couldn't resist the title)

I am interested in (auto)formalizations of the theoretical foundations of machine learning / AI. And I'm hoping other AI/ML researchers on this channel are interested in potential collaborations.

I have created a GitHub repo. AI-generated / AI-assisted contributions are explicitly allowed and encouraged, with the following quality assurance:

  • Formal theorem statements are vetted by human experts.
  • Proofs are checked by secure verifiers like Comparator & SafeVerify. (So far I have a script based on SafeVerify; I am very interested in getting Comparator set up as well.)

Among its first contributions is my formalization of the classical result of the convergence of stochastic gradient descent. This includes a formalization of the Robbins-Siegmund theorem from stochastic approximation theory, which could be of independent interest.
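For readers who (like me) hadn't seen it before: a common textbook form of the Robbins-Siegmund theorem (paraphrased from the stochastic approximation literature; not necessarily the exact statement in the repo) is the following almost-supermartingale convergence result:

```latex
% Robbins–Siegmund, standard form. Let V_n, a_n, b_n, c_n be nonnegative
% random variables adapted to a filtration F_n, satisfying
%   E[V_{n+1} | F_n] <= (1 + a_n) V_n + b_n - c_n   for all n,
% with  sum_n a_n < oo  and  sum_n b_n < oo  almost surely. Then V_n
% converges a.s. to a finite limit and  sum_n c_n < oo  almost surely.
\mathbb{E}[V_{n+1} \mid \mathcal{F}_n] \le (1 + a_n)\,V_n + b_n - c_n
\quad\Longrightarrow\quad
V_n \xrightarrow{\text{a.s.}} V_\infty < \infty
\ \text{ and }\ \sum_n c_n < \infty \ \text{a.s.}
```

In SGD convergence proofs, V_n is typically the squared distance to the optimum (or the suboptimality gap), and the conclusion that the c_n are summable is what forces the gradient norms to vanish along the iterates.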

I think this could be a model for effective collaboration between subject-area experts and experts in Lean autoformalization. The subject-area experts only need to know enough Lean to verify that the Lean theorem statements are faithful translations of the original informal theorems. For example, you are welcome to contribute formalizations of theorem statements that you'd like to see formally proved.

Comments and suggestions appreciated!

Jason Rute (Jan 06 2026 at 04:13):

You might be interested in this Rocq project by my previous colleagues at IBM Research: https://github.com/IBM/FormalML

GasStationManager (Jan 06 2026 at 06:04):

Thanks for the pointer!

On the Lean side, there's also the recent formalization of convergence of Q-Learning: https://github.com/ShangtongZhang/rl-theory-in-lean

And of a generalization error bound via Rademacher complexity: https://github.com/auto-res/lean-rademacher

These are by experts in the respective areas, and they appear to have taken significant effort to formalize. In the companion paper of the Q-Learning result, the author mentions that LLMs were used to help with Mathlib research and to close small lemmas. I think AI could offer more in these formalization efforts; e.g., I was able to prove the convergence of stochastic gradient descent as a non-expert in the underlying theory (I had not even heard of the Robbins-Siegmund theorem beforehand).

Jason Rute (Jan 06 2026 at 12:58):

cc @Avi Shinnar

Eric Vergo (Jan 07 2026 at 06:13):

Love what’s going on here; I have not heard of Robbins-Siegmund either and will read about it. 

This is something I vibe-coded a little while ago; I updated it today after seeing this post: https://e-vergo.github.io/LEAN_mnist/

Not sure it's exactly what you're looking for, but it's a working MLP that trains on the classic MNIST handwritten dataset, written entirely in Lean 4 (using SciLean). It achieves ~93% accuracy on the full 60K training set using a 784->128->10 dense MLP, which is pretty run of the mill.
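For anyone who hasn't opened the repo, the architecture is the standard one. A minimal pure-Python sketch of the same 784->128->10 forward pass (random weights for illustration; this is not the Lean/SciLean implementation) looks like:

```python
import math
import random

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def forward(x, W1, b1, W2, b2):
    """Dense 784 -> 128 -> 10 MLP: ReLU hidden layer, softmax output."""
    h = [max(0.0, v + b) for v, b in zip(matvec(W1, x), b1)]   # hidden layer
    logits = [v + b for v, b in zip(matvec(W2, h), b2)]        # output layer
    m = max(logits)                                            # stabilize softmax
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

random.seed(0)
x = [random.gauss(0, 1) for _ in range(784)]                   # a fake "MNIST image"
W1 = [[random.gauss(0, 0.01) for _ in range(784)] for _ in range(128)]
b1 = [0.0] * 128
W2 = [[random.gauss(0, 0.01) for _ in range(128)] for _ in range(10)]
b2 = [0.0] * 10
p = forward(x, W1, b1, W2, b2)
print(len(p), sum(p))  # 10 class probabilities summing to 1
```

The interesting part of Eric's project is of course not this forward pass, but what can be proved about it.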

It also includes some verification work: 3 gradient-correctness theorems resting on 7 axioms. The MLP really does work, but the proofs aren't about the actual floating-point implementation. Because of the Float/Real gap in Lean 4 (no Flocq equivalent), the proofs are about an abstract version over the reals.
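For contrast with those formal gradient-correctness theorems: the usual empirical stand-in is a finite-difference gradient check, which only samples agreement at a point rather than proving it. A tiny sketch on a 1-D logistic loss (a made-up example, not code from the repo):

```python
import math

def loss(w, x, y):
    """Logistic loss of a single scalar example (x, y) under weight w."""
    return math.log(1 + math.exp(-y * w * x))

def grad(w, x, y):
    """Analytic gradient d(loss)/dw."""
    return -y * x / (1 + math.exp(y * w * x))

def fd_grad(f, w, eps=1e-6):
    """Central finite-difference approximation of f'(w)."""
    return (f(w + eps) - f(w - eps)) / (2 * eps)

w, x, y = 0.3, 1.7, 1.0
g = grad(w, x, y)
g_fd = fd_grad(lambda v: loss(v, x, y), w)
print(abs(g - g_fd))  # agreement up to discretization and rounding error
```

A formal proof over the reals upgrades this pointwise spot check to a statement about all inputs, which is exactly what the Float/Real gap then leaves open for the floating-point code.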

The goal was to start exploring the entire training pipeline for proof/verification opportunities. The idea is, rather than focus on the neural net, to treat the entire training process as the machine being studied, which includes the NN being trained. With that framing, "What parts of the machine can we prove things about?" was a motivating question. Data loading, backprop, results logging, everything. It's rough around the edges, but if you and/or others are interested I can keep plugging away. As it stands, I haven't done any verification work outside of the NN itself, but that's on the to-do list. The reality is that this is a lot of work and I'm not sure whether it has any value.

Also, I tried integrating ArtificialTheorems for the SGD convergence proofs, but hit a version mismatch (SciLean locks to Lean 4.20.1, ArtificialTheorems needs 4.23.0). Once SciLean upgrades, those convergence theorems could replace 3-4 of my axioms (I think). 

GitHub: https://github.com/e-vergo/LEAN_mnist

GasStationManager (Jan 08 2026 at 23:23):

Thanks @Eric Vergo for sharing this! I'll take a closer look.

I think formal verification of end-to-end properties of an NN system would be very useful, and especially important to AI safety; that is why I am interested in this direction. But it is still an open problem and an active area of research, including from the mechanistic interpretability angle; we currently don't know how to do this in a scalable way.

My current goal with this repo is that, by (auto)formalizing foundational ML theory, we build tools that will be useful for answering these types of questions. We don't need complete coverage of undergraduate AI/ML (any curriculum we fix now would likely be obsolete in a year anyway). But I do think some ML theorems will stay relevant and important.

Some items on my wish list include:

  • universal representation theorems for deep neural nets
  • generalization theory (there's the Rademacher complexity paper mentioned above, but there's more to do)
  • implicit regularization (how training via SGD can achieve generalization)
  • RL theory
  • Bayesian learning; and perhaps, building on top of that, Singular Learning Theory.

I think looking at an MLP with the MNIST dataset is a great way to get a sense of the different components of an ML system, how they fit together, and to what extent they can be formally verified.

GasStationManager (Jan 08 2026 at 23:41):

I think the backprop work is interesting. Ultimately, what we want is an auto-diff library with formal guarantees. Elsewhere on Zulip there is an announcement of JaxProof, which looks quite interesting.

But at some point we do need to formally deal with floating point and how they affect accuracy. My intuition is that a good theoretical treatment should take into account the optimization process. SGD is an algorithm that appears to be extremely robust to noise. A recent trend is to do large-scale training using low-precision number representations, while still achieving good accuracy. To what extent can this be formalized?
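As a toy illustration of that robustness claim (purely illustrative, not a formal result): run SGD on a simple quadratic with gradients crudely quantized to a coarse grid, and it still settles near the optimum.

```python
import random

def round_to_grid(g, step=0.05):
    """Crude 'low-precision' stand-in: round the gradient to a fixed grid."""
    return round(g / step) * step

random.seed(0)
w, w_star = 5.0, 0.0                  # start far from the optimum of (w - w*)^2
for t in range(1, 2001):
    lr = 1.0 / t                      # Robbins–Monro step sizes: sum lr = oo, sum lr^2 < oo
    noise = random.gauss(0, 0.1)      # stochastic gradient noise
    g = 2 * (w - w_star) + noise      # noisy gradient of (w - w*)^2
    w -= lr * round_to_grid(g)        # update with the quantized gradient
print(abs(w - w_star))                # small despite quantization and noise
```

The quantization error here is just bounded additive noise, which the Robbins-Monro step sizes average out; a formal treatment would presumably need to fold the rounding error model into the same martingale argument.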

Eric Vergo (Jan 09 2026 at 00:08):

GasStationManager said:

But at some point we do need to formally deal with floating point and how they affect accuracy. My intuition is that a good theoretical treatment should take into account the optimization process. SGD is an algorithm that appears to be extremely robust to noise. A recent trend is to do large-scale training using low-precision number representations, while still achieving good accuracy. To what extent can this be formalized?

Yes, I've had thoughts in this space as well. In the limit we could study binary neural nets, which may lend themselves to being modeled in convenient ways.

Jason Rute (Feb 09 2026 at 12:54):

This looks like another work in this topic: https://x.com/fanghui_sgra/status/2020773004113010918?s=46


Last updated: Feb 28 2026 at 14:05 UTC