Zulip Chat Archive

Stream: mathlib4

Topic: Status of the Central limit theorem

Josha Dekker (Dec 24 2023 at 12:45):

I couldn’t find a thread on this, but I’m on mobile, so I might’ve missed this. What is the current status on the central limit theorem in Mathlib?

Is this work in progress or is this wide open?
In any case, are there important components (likely) missing?

Yaël Dillies (Dec 24 2023 at 14:01):

This is a question for @Jason KY.

Rémy Degenne (Dec 24 2023 at 14:14):

The main missing piece for the usual proof of that theorem is the definition of the characteristic function, and proofs of its properties.
I wrote a definition (and basically nothing else) a while ago here: https://github.com/leanprover-community/mathlib4/compare/RD_char_fun .
I you want to work on it, feel free to take this definition (or discard it).

Josha Dekker (Dec 24 2023 at 14:38):

Okay, I’ll see if I can get around to that at some stage. I haven’t yet invested too much time into the measure theory parts of Mathlib, I’ll probably turn my attention there once I finish working on Lindelöf spaces.

Josha Dekker (Dec 24 2023 at 21:26):

I think that a feasible approach would be adapting some of the first chapter of Applebaum’s Lévy Processes and Stochastic Calculus. This provides the basic tools to work with characteristic functions (incl eg Lévy’s continuity theorem) and would also lay some more foundations for Lévy process theory at some point. If you’re aware of a better source, please let me know! I’ll try to adapt everything to more general measurable spaces where possible, of course! I’ll work on this somewhere in the new year, can’t say for certain yet when!

Josha Dekker (Dec 28 2023 at 09:10):

@Rémy Degenne your file looks very good, I'll add some more basic API and make a PR (linearity, Lévy's continuity theorem etc.). Shall I include you in the copyright statement? You provided all the essential pieces!

Rémy Degenne (Dec 28 2023 at 14:25):

@Josha Dekker I wanted to try setting up a blueprint to learn how it works, and your question about the CLT gave me a good reason: over the last two days I created a project with a blueprint for a proof of CLT using characteristic functions. See https://remydegenne.github.io/CLT/index.html for the blueprint, https://github.com/RemyDegenne/CLT for the github repository and https://remydegenne.github.io/CLT/dep_graph_document.html for the dependency graph. I have added my characteristic function code to that repository.

If anyone wants to contribute to the CLT proof, feel free to fork the repository and then open a PR!
The blueprint is very much WIP: it is not yet very detailed and many proofs are missing in the latex. I'll continue working on it, and PRs for that aspect of the project are also welcome. The home page of the blueprint lists a few helpful books.

The proof goes through Prokhorov's theorem, and I think that Kalle Kytölä is working on that. @Kalle Kytölä perhaps you'd like to expand the blueprint with details about your preferred way of proving that theorem?

The central limit theorem is one of the most embarrassing gaps in our probability library: let's close it!

Josha Dekker (Dec 28 2023 at 14:34):

Thanks, I’ll take a look later! I’ll probably try to work on building some more API for characteristic functions, something like Lévy’s continuity criterion is useful in multiple results!

Josha Dekker (Dec 28 2023 at 22:31):

We might consider a more general CLT like the Lindeberg-Feller CLT as a follow up, which also is proven through characteristic functions and doesn’t require the observations to be iid (which probably makes the approach a bit longer).

Josha Dekker (Dec 29 2023 at 08:11):

I'll take a shot at lemma 4.6!

Josha Dekker (Jan 16 2024 at 15:01):

(I'm not working on that at this point. I was first trying to work on convolution of measures for this, but hit a bump as my proficiency with integrals in Mathlib is too lacking)

Kalle Kytölä (Feb 10 2024 at 21:21):

Sorry for the very late reply!

Rémy Degenne said:

The proof goes through Prokhorov's theorem, and I think that Kalle Kytölä is working on that. Kalle Kytölä perhaps you'd like to expand the blueprint with details about your preferred way of proving that theorem?

Yes, I indeed have a preferred way to do the Prokhorov's theorem :grinning_face_with_smiling_eyes: :innocent:. I checked your blueprint, and it matches my preferred way --- I want to do it via Riesz-Markov-Kakutani. ~~Is there a way I can contribute to the blueprint?~~ [EDIT: I see now you said how to do that.] I don't have an awful lot to add, though... The plan is to use Riesz-Markov-Kakutani on compact spaces $X$ to identify the space of probability measures on $X$ with a closed subset of the unit ball of the weak dual of continuous functions, and get its compactness from Banach-Alaoglu. This gives Prokhorov's theorem on compact spaces $X$ . Then on general Polish spaces Prokhorov's theorem reduces to the case of compact spaces by omitting an $\varepsilon>0$ of total mass of tight families.

(I'd also want metrizability of the convergence in distribution, which is what I am doing in #10406 etc. This gets rid of any barriers between compactness and sequential compactness, for example.)

(Riesz-Markov-Kakutani on compact spaces has been proved by Jesse Reimann some years ago, but it is not in Mathlib. The PRs of the main content have essentially not been started because of stupid issues regarding the handling subtractions of bounded continuous NNReal-valued functions etc. The plan forward involved a bit of a refactor, but I have not had time to get started with it...)

Rémy Degenne said:

The central limit theorem is one of the most embarrassing gaps in our probability library: let's close it!

Yes!

If the bottleneck is Prokhorov, then there is, of course, a much simpler way for $\R$ via cumulative distribution functions ("Helly's selection theorem"). I have been intending to PR the cdf characterization of convergence in distribution for some time... I will try to do that soon (it is almost trivial with the portmanteau theorem, which exists in a sufficient form in Mathlib already), so that if someone wants to take the route of Helly's selection, that should not hold back the Central Limit Theorem.

(I suppose indeed the other needed component for CLT is inversion of characteristic functions, i.e., suitable Fourier theory. In my mind there might be one tricky lemma that is often proven by complex contour integrals, although ways around that seem to exist. Anyway, someone else knows about the status of that better than I do.)

Rémy Degenne (Apr 12 2025 at 06:56):

Central limit theorem update

There has been some work happening over at https://github.com/RemyDegenne/CLT and we now have a proof of the central limit theorem for real random variables, if we assume Prokhorov's theorem.

Jakob Stiefel has recently added to Mathlib a proof that characteristic functions separate measures, which is a key ingredient.
Using that, we were able to prove the Lévy convergence theorem (weak convergence of measures equivalent to convergence of characteristic functions), assuming Prokhorov's theorem.

@Thomas Zhu then proved the CLT from there, which needed facts about characteristic functions, independence and Gaussian random variables.

Prokhorov's theorem is being worked on by Arav Bhattacharyya , and once that's done our proof will be complete.
You might spot another blue node in the graph leading to the CLT corresponding to a version of Taylor's theorem, but there is an open Mathlib PR for that.

We are making PRs to Mathlib with the material that does not depend on Prokhorov.

Sébastien Gouëzel (Apr 12 2025 at 06:59):

Regarding Prokhorov, I've just sent the Riesz-Markov-Kakutani theorem of #12290 to bors. This should be an important ingredient for Prokhorov.

Rémy Degenne (Apr 12 2025 at 07:00):

Yes, if I recall correctly Arav is assuming RMK and going from there.

Sébastien Gouëzel (Apr 12 2025 at 07:02):

Could you add a link to the blueprint on the readme of the project? Otherwise, there is no direct way to find the blueprint when one is on the github page.

Rémy Degenne (Apr 12 2025 at 07:02):

There is a link in the "About" section on the right of the github page, but I'll add it to the readme as well.

Sébastien Gouëzel (Apr 12 2025 at 07:03):

Ah yes, thanks!

Rémy Degenne (Apr 12 2025 at 07:05):

And I need to restore the docs too. For now the "LEAN" links in the blueprint point nowhere, so it's a bit hard to find the results in the code.

Rémy Degenne (Apr 12 2025 at 08:30):

In the dependency graph I used a purple border color to denote things that are already in Mathlib, but I don't know how to add an entry in the legend for that color. Does anyone know?

Patrick Massot (Apr 12 2025 at 16:39):

It looks like I forgot that case in the function that creates the legend.

Patrick Massot (Apr 12 2025 at 16:42):

I just pushed a fix. Is your blueprint generated using the master branch of leanblueprint or do you take it from pypi (hence need a release)?

Yaël Dillies (Apr 12 2025 at 16:58):

Rémy Degenne said:

In the dependency graph I used a purple border color to denote things that are already in Mathlib

:idea: New blueprint color discovered

Rémy Degenne (Apr 12 2025 at 17:46):

Patrick Massot said:

I just pushed a fix. Is your blueprint generated using the master branch of leanblueprint or do you take it from pypi (hence need a release)?

My CI gets leanblueprint with pip install leanblueprint, so I guess I will have to wait for a release. That's not a big issue anyway. It can wait.

Rémy Degenne (Apr 12 2025 at 17:49):

Yaël Dillies said:

:idea: New blueprint color discovered

The default color for \mathlibok is a slightly darker green than the \leanok color. I can't easily tell them apart, so I replaced it with purple.

Thomas Zhu (Apr 12 2025 at 18:15):

You can use pip install git+https://github.com/PatrickMassot/leanblueprint.git@master for installing from the newest github version (or pip install --upgrade --force-reinstall --no-deps git+https://github.com/PatrickMassot/leanblueprint.git@master to force override an existing install)

Arav Bhattacharyya (Apr 12 2025 at 18:29):

Regarding Prokhorov- the easy direction is pretty much done. The other direction (requiring RMK) I will start soon. I’m having to take a little break from working properly on this right now as I have exams over May, but intend to get it done over June. I don’t want to hold up the CLT going in mathlib so if anyone wants to get it done sooner I don’t mind!

Patrick Massot (Apr 12 2025 at 18:38):

Rémy Degenne said:

My CI gets leanblueprint with pip install leanblueprint, so I guess I will have to wait for a release. That's not a big issue anyway. It can wait.

Releases are cheap. Could you try it locally using master? I didn’t test my fix at all…

Rémy Degenne (Apr 12 2025 at 18:42):

I tested in local: I have a new legend, but it says

The statement of this result is in Mathlib border
    this is in Mathlib

So there is "The statement of this result is in Mathlib" where I would expect the color. But I suspect it's my fault. I changed the color with

\graphcolor{mathlib}{purple}{The statement of this result is in Mathlib}

Patrick Massot (Apr 12 2025 at 18:43):

Ah yes, the laster argument is meant to be a description of the border.

Patrick Massot (Apr 12 2025 at 18:43):

So Purple would be fine.

Patrick Massot (Apr 12 2025 at 18:44):

It looks redundant with the previous argument, but the previous argument could a html color description such as #ff10aa

Rémy Degenne (Apr 12 2025 at 18:44):

Thanks! It works perfectly in local now.

Rémy Degenne (Apr 12 2025 at 18:48):

Arav Bhattacharyya said:

Regarding Prokhorov- the easy direction is pretty much done. The other direction (requiring RMK) I will start soon. I’m having to take a little break from working properly on this right now as I have exams over May, but intend to get it done over June. I don’t want to hold up the CLT going in mathlib so if anyone wants to get it done sooner I don’t mind!

I am not aware of any formalization project that desperately needs the CLT right now, so take all the time you need!
If at some point your university project is done but there is still something left to do and you want help, ping me.

Yaël Dillies (Apr 12 2025 at 22:58):

Rémy Degenne said:

Yaël Dillies said:

:idea: New blueprint color discovered

The default color for \mathlibok is a slightly darker green than the \leanok color. I can't easily tell them apart, so I replaced it with purple.

:light_bulb: New blueprint macro discovered

Patrick Massot (Apr 13 2025 at 10:25):

Note this macro is documented in the leanblueprint repo README. So maybe you could discover more things by reading that file.

Patrick Massot (Apr 13 2025 at 10:52):

I just released a new version on pypi with the fix.

Last updated: May 02 2025 at 03:31 UTC