Zulip Chat Archive

Stream: mathlib4

Topic: Hoeffding's lemma


Kei TSUKAMOTO (Oct 07 2024 at 07:22):

I would like to contribute to mathlib by formalizing Hoeffding's lemma. Could anyone give me access to the mathlib repository? My GitHub account is tukamilano.

Rémy Degenne (Oct 07 2024 at 07:26):

Invite sent!

Kei TSUKAMOTO (Oct 07 2024 at 07:27):

Thank you so much!

Rémy Degenne (Oct 07 2024 at 07:35):

Do you have plans for things you want to formalize, beyond that lemma?

Kei TSUKAMOTO (Oct 07 2024 at 09:10):

Several of us are planning to formalize the theorems in the textbook "Foundations of Machine Learning", aiming to formalize binary classification bounds based on Rademacher complexity and bounds for linear prediction.
https://cs.nyu.edu/~mohri/mlbook/

The proof of the maximal inequality will be completed once I deal with an issue relating to "sup", and some of us are trying to formalize McDiarmid's inequality.

We haven't fully figured out how to define Rademacher complexity yet.

Yaël Dillies (Oct 07 2024 at 09:17):

Very nice! Keep me updated and please request my review on your PRs. I don't do machine learning but I use a lot of the same machinery.

Rémy Degenne (Oct 07 2024 at 09:25):

About Hoeffding's lemma: its usual proof is in two steps, first proving that the cumulant generating function (cgf) of a random variable on [0,1] is at most the cgf of a Bernoulli distribution with the same mean, then that a Bernoulli is sub-Gaussian. The first step is useful in itself, so please make sure to have it as an independent lemma! (For example, it is used in Theorem D.3 of the book you linked.)
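
For reference, the standard statement being targeted, as a sketch rather than the exact mathlib formulation: if $X \in [a, b]$ almost surely and $\mathbb{E}[X] = 0$, then for all $t \in \mathbb{R}$,

$$\log \mathbb{E}\left[e^{tX}\right] \le \frac{t^2 (b - a)^2}{8}.$$

The two steps mentioned above correspond to $\operatorname{cgf}_X(t) \le \log(1 - p + p e^t)$ for $X \in [0, 1]$ with mean $p$, followed by the elementary bound $\log(1 - p + p e^t) - p t \le t^2 / 8$.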

Kei TSUKAMOTO (Oct 07 2024 at 09:31):

I didn't follow the elementary proof of Hoeffding's lemma in the text I linked, but instead used the cumulant-based approach that appears in other materials, so that point is covered.

Kei TSUKAMOTO (Oct 17 2024 at 12:55):

As preparation for the formalization of Hoeffding's lemma, I added a formalization of Popoviciu's inequality on variances to the Probability/Variance.lean file. Once that branch is merged into the mathlib repository, I'm going to submit a pull request for the proof of Hoeffding's lemma itself.
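
For the record, Popoviciu's inequality in its usual form (a sketch; the PR may phrase it differently): if $a \le X \le b$ almost surely, then

$$\operatorname{Var}[X] \le \frac{(b - a)^2}{4}.$$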

Kei TSUKAMOTO (Oct 17 2024 at 12:57):

https://github.com/leanprover-community/mathlib4/pull/17868

Rémy Degenne (Oct 17 2024 at 13:11):

Great! I'll have a look.

Yaël Dillies (Oct 17 2024 at 13:15):

On it too :search:

Kei TSUKAMOTO (Oct 22 2024 at 22:39):

I have just submitted a pull request for Hoeffding's lemma.
https://github.com/leanprover-community/mathlib4/pull/18091

Yaël Dillies (Oct 22 2024 at 22:46):

I have just been reading a pull request for Hoeffding's lemma :eyes:

Shreyas Srinivas (Oct 22 2024 at 22:50):

Kei TSUKAMOTO said:

Several of us are planning to formalize the theorems in the textbook "Foundations of Machine Learning", aiming to formalize binary classification bounds based on Rademacher complexity and bounds for linear prediction.
https://cs.nyu.edu/~mohri/mlbook/

It's a nice book to formalise. We had a seminar on it last year. But also do check out "Understanding Machine Learning" by Shai Ben-David and Shai Shalev-Shwartz.

Kei TSUKAMOTO (Oct 22 2024 at 22:52):

Thank you for telling me. I'll have a look

Shreyas Srinivas (Oct 22 2024 at 23:20):

Feel free to request my reviews as well. I am not an official reviewer but I do know most of the book

Kevin Buzzard (Oct 23 2024 at 05:48):

Anyone is welcome to review mathlib PRs!

Rémy Degenne (Oct 23 2024 at 07:56):

I left some preliminary comments on the PR (mainly: you should formulate your results with the definitions mgf and cgf that are in the Moments file), but I wanted to discuss here a longer-term concern, assuming you want to formalize things like Hoeffding's inequality.

I had a PR for mathlib3 (https://github.com/leanprover-community/mathlib3/pull/15141) with Hoeffding's inequality for sub-Gaussian random variables (which you can combine with Hoeffding's lemma to get the inequality for bounded r.v.), and Sébastien Gouëzel rightly pointed out that it was a bit sad not to prove the Azuma-Hoeffding inequality at the same time. It has the same proof, but works for martingales and not just independent random variables.
There is however a catch: for Azuma-Hoeffding you don't have an a.e. bounded hypothesis; instead, you need random variables that are a.e. bounded conditionally on some sigma-algebra (or conditionally sub-Gaussian random variables, not just sub-Gaussian). That issue is often not even mentioned in books, because the conditional definition is a natural extension of the non-conditional one, but it matters when formalizing!
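
As a reminder of the definitions involved, in their usual textbook form (a sketch, not the exact formulation of the mathlib3 PR): a centered random variable $X$ is sub-Gaussian with variance proxy $c$ when $\log \mathbb{E}[e^{tX}] \le c t^2 / 2$ for all $t \in \mathbb{R}$, and the Hoeffding / Azuma-Hoeffding inequalities then bound the tail of a sum $S_n = \sum_{i=1}^n X_i$ of independent, respectively martingale-difference, such variables:

$$\mathbb{P}(S_n \ge \varepsilon) \le \exp\left(-\frac{\varepsilon^2}{2 \sum_{i=1}^n c_i}\right).$$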

What this means in practice when implementing those inequalities is that none of the lemmas we write about a.e. bounded random variables will apply directly to the conditional versions. We could duplicate everything and have both non-conditional and conditional statements, with duplicated or almost-duplicated proofs. That is not very appealing and not good for maintainability.

There was the same issue for independence versus conditional independence. The solution I adopted was to introduce a new notion of independence with respect to a kernel and a measure ProbabilityTheory.Kernel.Indep which has both non-conditional and conditional definitions as special cases. All proofs are done for that, and then we have two definitions ProbabilityTheory.Indep and ProbabilityTheory.CondIndep that are special cases of the kernel definition. So we have 3 definitions, but only one proof: the proofs for the two special cases are only one-liners. The kernel proof is essentially the conditional proof, but writing it with a kernel instead of a conditional expectation makes the expressions simpler and easier to manipulate in Lean.

The solution transfers directly to the issue we have with concentration inequalities: instead of writing lemmas for a.e. bounded random variables (∀ᵐ ω ∂μ, X ω ∈ Set.Icc a b), we could write lemmas for ∀ᵐ ω ∂μ, ∀ᵐ w ∂(κ ω), X w ∈ Set.Icc a b. Then for κ a constant kernel we get the unconditional hypothesis, and for κ equal to ProbabilityTheory.condexpKernel we get conditionally bounded.

I am not saying that you should do all of that in the current PR or even that you should be the one refactoring mgf and cgf in that way. But I wanted to caution that there is an issue that is easily overlooked when reading paper math, and describe the solution I came up with.
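
To make the kernel idea concrete, here is a minimal Lean sketch of what such a kernel-parameterized hypothesis could look like; the name `AEBoundedKernel` is hypothetical, used only for illustration, and is not mathlib API:

```lean
import Mathlib.Probability.Kernel.Basic

open MeasureTheory ProbabilityTheory

variable {Ω : Type*} [MeasurableSpace Ω]

/-- Hypothetical definition, for illustration only: `X` lies in `[a, b]` almost everywhere
with respect to the kernel `κ` and the measure `μ`. -/
def AEBoundedKernel (κ : Kernel Ω Ω) (μ : Measure Ω) (X : Ω → ℝ) (a b : ℝ) : Prop :=
  ∀ᵐ ω ∂μ, ∀ᵐ w ∂(κ ω), X w ∈ Set.Icc a b

-- Specializing `κ` to a constant kernel recovers the usual a.e. bounded hypothesis;
-- taking `κ = condexpKernel μ m` would instead express "bounded conditionally on `m`".
example {μ : Measure Ω} {X : Ω → ℝ} {a b : ℝ} (hX : ∀ᵐ ω ∂μ, X ω ∈ Set.Icc a b) :
    AEBoundedKernel (Kernel.const Ω μ) μ X a b := by
  unfold AEBoundedKernel
  filter_upwards with ω
  simpa [Kernel.const_apply] using hX
```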

Yaël Dillies (Oct 23 2024 at 09:09):

Actually, this makes sense. Should we refactor Popoviciu's inequality to use a.e. boundedness with respect to a kernel?

Rémy Degenne (Oct 23 2024 at 12:09):

If we want to also get conditional concentration inequalities without duplicating the proofs, then yes, that's what I am suggesting.
If we proceed in the way I described above more generally, we should give the same treatment to mgf and cgf, variance and a bunch of other things. Basically, we should do everything for kernels and then specialize.

I probably won't have time to do any of that refactoring myself in the next months.
Also I know that the kernel formulation is a bit obscure so I don't really want to demand in reviews that people write their PRs in that framework. It might put off people (especially new contributors). Better to have something for the non-conditional case that we can eventually refactor than to have nothing.

Kei TSUKAMOTO (Oct 24 2024 at 10:26):

Could you recommend any reference material used in the formalization of probability theory? In particular, I'd like to fully understand kernels in mathlib. My impression is that probability theory in mathlib is formalized with kernels as a starting point and departs from the order of presentation in standard probability textbooks, so there are some parts that are difficult to understand just by reading the Mathlib files.

Ruben Van de Velde (Oct 24 2024 at 10:36):

Have you read https://leanprover-community.github.io/blog/posts/basic-probability-in-mathlib/ ?

Kexing Ying (Oct 24 2024 at 10:44):

Foundations of Modern Probability by Kallenberg was the main reference for martingales and, if I am not mistaken, also for kernels.

Rémy Degenne (Oct 24 2024 at 10:49):

That link does not contain much about Markov kernels specifically; it's more of a broad overview.
My go-to book for that subject is Foundations of Modern Probability: Chapters 3 and 8 in particular for kernels, in the 3rd edition.

Rémy Degenne (Oct 24 2024 at 11:05):

"Independence with respect to a kernel and a measure" as in docs#ProbabilityTheory.Kernel.Indep is not something you will find in any book. I made it up when I needed something convenient to prove properties of both conditional and non-conditional independence. It is very weakly related to (or vaguely inspired by) the transitional conditional independence ideas in https://arxiv.org/abs/2104.11547 .
Likewise, the other kernel definitions I am proposing, like "sub-Gaussian with respect to a kernel and a measure", are not described in any book or paper.

Shreyas Srinivas (Oct 24 2024 at 12:41):

On the discrete side, "Probability and Computing" by Eli Upfal and Michael Mitzenmacher is a good reference. Quite aside from algorithms, it has a lot of probability theory.

Kei TSUKAMOTO (Oct 24 2024 at 15:59):

Thank you. For now, I will read Chapters 3 and 8 of Foundations of Modern Probability.

Kei TSUKAMOTO (Oct 29 2024 at 09:34):

@Kexing Ying

Have you ever considered formalizing the interchange of integration and differentiation under the condition that the family of functions is uniformly integrable? In the proof I'm currently using the interchange that follows from the dominated convergence theorem, but I am wondering about generalizing the statement below.

https://github.com/leanprover-community/mathlib4/pull/18091#discussion_r1814475153
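
For context, the DCT-based interchange has the following textbook shape, under suitable measurability assumptions (a sketch, not the exact mathlib statement): if $t \mapsto f(t, \omega)$ is differentiable and there is an integrable $g$ with $|\partial_t f(t, \omega)| \le g(\omega)$ for all $t$ in a neighbourhood, then

$$\frac{d}{dt} \int_\Omega f(t, \omega) \, \mathrm{d}\mu(\omega) = \int_\Omega \frac{\partial f}{\partial t}(t, \omega) \, \mathrm{d}\mu(\omega),$$

and the question is whether the domination hypothesis can be weakened to a uniform integrability condition.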

Kexing Ying (Oct 29 2024 at 09:45):

I've not thought about it but I would assume it would be the same as DCT since we do have the Vitali convergence theorem.

Kei TSUKAMOTO (Dec 12 2024 at 12:12):

@Rémy Degenne
I haven't been working on this formalization for a while, but I'd like to resume. I have a question about this comment: how much generality is needed for this to be adopted into mathlib? It isn't clear to me which integrability condition would be acceptable. Could you give me an example, if you have one?

Kei TSUKAMOTO (Dec 18 2024 at 05:34):

I split off the part of the Hoeffding's lemma PR up to integrable_expt:
https://github.com/leanprover-community/mathlib4/pull/20017

Waiting for reviews!

Kei TSUKAMOTO (Jan 03 2025 at 14:18):

If anyone has time, I'd like to request a review. It's a small pull request (which I'm aiming to use for Massart's lemma).
https://github.com/leanprover-community/mathlib4/pull/20437

Kei TSUKAMOTO (Mar 09 2025 at 04:11):

I have finished revising Hoeffding's lemma to use the existing definitions of the moment (and cumulant) generating functions.

Sorry for not responding sooner.
https://github.com/leanprover-community/mathlib4/pull/18091


Last updated: May 02 2025 at 03:31 UTC