Zulip Chat Archive

Stream: general

Topic: formalization of three proofs of de Finetti's theorem


Cameron Freer (Jan 20 2026 at 15:35):

I've recently finished a formalization of de Finetti's theorem for infinite exchangeable sequences on standard Borel spaces. Specifically, this project implements all three proofs from the beginning of Kallenberg's 2005 book Probabilistic Symmetries and Invariance Principles.

It involves ~43,500 lines of Lean over 113 files and 667 theorems/lemmas. The project took a little under 3 months, with extensive use of various Claude and GPT models along with the Claude skill for Lean that I've been developing (see also the earlier zulip discussion).

The three proofs each make substantial use of mathlib, though significant gaps had to be filled in each case:

  1. ViaMartingale: Reverse martingale convergence (Aldous's approach). Gap: reverse martingale infrastructure
  2. ViaL2: Elementary L² bounds (lightest dependencies). Gap: elementary detailed L² calculations for contractability
  3. ViaKoopman: Mean Ergodic Theorem. Gap: machinery for applying mean ergodic theorem in this setting

Next I plan to upstream some portions to mathlib. I'm aware that some aspects of the code can probably make better use of existing mathlib infrastructure, and I've started discussions with some probability maintainers about how to improve the code. There may also be some opportunities to make use of machinery developed in the recent Brownian motion project.

Any comments or suggestions are very welcome -- thanks!

Jason Rute (Jan 20 2026 at 16:06):

Cameron given how critical statistics is for science, machine learning, finance, decision making, etc, but also how much statistics relies on deep math (like this theorem), what are your thoughts on formally verified statistics? For example could we get to a point where we are making statistical inferences in a formalized way such that the statistical assumptions we are assuming easily pop out of the formalization? And if so, would that have value or would the ideal assumptions needed to apply the statistical tools be too strong and unrealistic to tell us anything useful, or so obvious that we don’t need formalization to keep track of them.

Peter Pfaffelhuber (Jan 20 2026 at 22:47):

In my opinion, (mathematical) statistics is clearly missing in Mathlib, but many foundations are already implemented. As an example, a sufficient statistics could be defined right away.

Lua Viana Reis (Jan 21 2026 at 00:00):

I'd love to see a fourth proof via Markov categories! Their definition is already in Mathlib. I started writing the ergodic decomposition via them, but stopped to focus on my defense.


Last updated: Feb 28 2026 at 14:05 UTC