mathlib3 documentation

probability.strong_law

The strong law of large numbers

THIS FILE IS SYNCHRONIZED WITH MATHLIB4. Any changes to this file require a corresponding PR to mathlib4.

We prove the strong law of large numbers, in probability_theory.strong_law_ae: if X n is a sequence of independent identically distributed integrable real-valued random variables, then (∑ i in range n, X i) / n converges almost surely to 𝔼[X 0]. We give here the strong version, due to Etemadi, that only requires pairwise independence.

This file also contains the Lᵖ version of the strong law of large numbers, provided by probability_theory.strong_law_Lp, which shows that (∑ i in range n, X i) / n converges in Lᵖ to 𝔼[X 0] for 1 ≤ p < ∞, provided that the X n are pairwise independent, identically distributed, and in Lᵖ.

Implementation

We follow the proof by Etemadi (Etemadi, An elementary proof of the strong law of large numbers), which goes as follows.

It suffices to prove the result for nonnegative X, as one can prove the general result by splitting a general X into its positive part and negative part. Consider Xₙ a sequence of nonnegative integrable identically distributed pairwise independent random variables. Let Yₙ be the truncation of Xₙ up to n. We claim that

Almost surely, Xₙ = Yₙ for all but finitely many indices. Indeed, ∑ ℙ (Xₙ ≠ Yₙ) is bounded by 1 + 𝔼[X] (see sum_prob_mem_Ioc_le and tsum_prob_mem_Ioi_lt_top).
Let c > 1. Along the sequence n = c ^ k, (∑_{i=0}^{n-1} Yᵢ - 𝔼[Yᵢ])/n converges almost surely to 0. This follows from a variance control, as

  ∑_k ℙ (|∑_{i=0}^{c^k - 1} Yᵢ - 𝔼[Yᵢ]| > c^k ε)
    ≤ ∑_k (c^k ε)^{-2} ∑_{i=0}^{c^k - 1} Var[Yᵢ]    (by Chebyshev's inequality and pairwise independence)
    ≤ ∑_i (C/i^2) Var[Yᵢ]                           (as ∑_{c^k > i} 1/(c^k)^2 ≤ C/i^2)
    ≤ ∑_i (C/i^2) 𝔼[Yᵢ^2]
    ≤ 2C 𝔼[X]                                       (see `sum_variance_truncation_le`)

As 𝔼[Yᵢ] converges to 𝔼[X], it follows from the two previous items and Cesaro that, along the sequence n = c^k, one has (∑_{i=0}^{n-1} Xᵢ) / n → 𝔼[X] almost surely.
To generalize it to all indices, we use the fact that ∑_{i=0}^{n-1} Xᵢ is nondecreasing in n (as the Xᵢ are nonnegative) and that, if c is close enough to 1, the gap between c^k and c^(k+1) is small.
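Two steps of the outline above can be spelled out a little further. The following is only a sketch: it ignores integer parts (the formal statements work with ⌊c ^ k⌋₊), assumes i ≥ 1, and introduces the names k_0 (the smallest k with c^k > i) and S_n (the partial sum ∑_{i=0}^{n-1} Xᵢ) purely for illustration.

```latex
% Geometric series behind the bound \sum_{c^k > i} 1/(c^k)^2 \le C/i^2
% (i \ge 1, c > 1, k_0 = smallest k with c^{k_0} > i):
\[
  \sum_{k \,:\, c^k > i} \frac{1}{(c^k)^2}
  = \frac{c^{-2k_0}}{1 - c^{-2}}
  \le \frac{1}{1 - c^{-2}} \cdot \frac{1}{i^2}.
\]
% Sandwich for c^k \le n < c^{k+1}, using that S_n is nondecreasing in n:
\[
  \frac{1}{c} \cdot \frac{S_{c^k}}{c^k}
  \le \frac{S_n}{n}
  \le c \cdot \frac{S_{c^{k+1}}}{c^{k+1}},
  \qquad\text{hence}\qquad
  \frac{\mathbb{E}[X]}{c}
  \le \liminf_n \frac{S_n}{n}
  \le \limsup_n \frac{S_n}{n}
  \le c \, \mathbb{E}[X].
\]
```

Letting c decrease to 1 along a countable sequence (so that the exceptional null sets can be collected into one) then gives S_n / n → 𝔼[X] almost surely. With this estimate, the constant C can be taken to be ε^{-2} / (1 - c^{-2}).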

Prerequisites on truncations

noncomputable def probability_theory.truncation {α : Type u_1} (f : α → ℝ) (A : ℝ) :

α → ℝ

Truncating a real-valued function to the interval (-A, A].

Equations

probability_theory.truncation f A = (set.Ioc (-A) A).indicator id ∘ f

theorem measure_theory.ae_strongly_measurable.truncation {α : Type u_1} {m : measurable_space α} {μ : measure_theory.measure α} {f : α → ℝ} (hf : measure_theory.ae_strongly_measurable f μ) {A : ℝ} :

measure_theory.ae_strongly_measurable (probability_theory.truncation f A) μ

theorem probability_theory.abs_truncation_le_bound {α : Type u_1} (f : α → ℝ) (A : ℝ) (x : α) :

|probability_theory.truncation f A x| ≤ |A|

@[simp]

theorem probability_theory.truncation_zero {α : Type u_1} (f : α → ℝ) :

probability_theory.truncation f 0 = 0

theorem probability_theory.abs_truncation_le_abs_self {α : Type u_1} (f : α → ℝ) (A : ℝ) (x : α) :

|probability_theory.truncation f A x| ≤ |f x|

theorem probability_theory.truncation_eq_self {α : Type u_1} {f : α → ℝ} {A : ℝ} {x : α} (h : |f x| < A) :

probability_theory.truncation f A x = f x
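
As a quick illustration of how the definition and the pointwise lemmas above might be used, here is a minimal Lean 3 sketch. It assumes a late mathlib3 checkout in which `probability.strong_law` can be imported, and it has not been compiled here; the level `5` is an arbitrary choice for the example.

```lean
import probability.strong_law

open probability_theory

-- By definition, `truncation f A` keeps the values of `f` in `(-A, A]`
-- and replaces all other values by `0`.
example {α : Type*} (f : α → ℝ) (A : ℝ) :
  truncation f A = (set.Ioc (-A) A).indicator id ∘ f := rfl

-- The truncation is bounded by the truncation level...
example {α : Type*} (f : α → ℝ) (A : ℝ) (x : α) :
  |truncation f A x| ≤ |A| :=
abs_truncation_le_bound f A x

-- ...and by the original function.
example {α : Type*} (f : α → ℝ) (A : ℝ) (x : α) :
  |truncation f A x| ≤ |f x| :=
abs_truncation_le_abs_self f A x

-- Strictly inside the window, truncating changes nothing.
example {α : Type*} (f : α → ℝ) (x : α) (h : |f x| < 5) :
  truncation f 5 x = f x :=
truncation_eq_self h
```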

theorem probability_theory.truncation_eq_of_nonneg {α : Type u_1} {f : α → ℝ} {A : ℝ} (h : ∀ (x : α), 0 ≤ f x) :

probability_theory.truncation f A = (set.Ioc 0 A).indicator id ∘ f

theorem probability_theory.truncation_nonneg {α : Type u_1} {f : α → ℝ} (A : ℝ) {x : α} (h : 0 ≤ f x) :

0 ≤ probability_theory.truncation f A x
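
For nonnegative functions, the two lemmas above say that the truncation window can be taken to be `(0, A]` and that truncating preserves nonnegativity. A minimal Lean 3 sketch of their use, under the same assumptions as before (late mathlib3, not compiled here):

```lean
import probability.strong_law

open probability_theory

-- For a nonnegative function, only the upper cut-off at `A` matters.
example {α : Type*} {f : α → ℝ} {A : ℝ} (h : ∀ x, 0 ≤ f x) :
  truncation f A = (set.Ioc 0 A).indicator id ∘ f :=
truncation_eq_of_nonneg h

-- Truncation preserves nonnegativity pointwise; note that `A` is explicit.
example {α : Type*} {f : α → ℝ} (A : ℝ) (x : α) (h : ∀ y, 0 ≤ f y) :
  0 ≤ truncation f A x :=
truncation_nonneg A (h x)
```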

theorem measure_theory.ae_strongly_measurable.mem_ℒp_truncation {α : Type u_1} {m : measurable_space α} {μ : measure_theory.measure α} {f : α → ℝ} [measure_theory.is_finite_measure μ] (hf : measure_theory.ae_strongly_measurable f μ) {A : ℝ} {p : ennreal} :

measure_theory.mem_ℒp (probability_theory.truncation f A) p μ

theorem measure_theory.ae_strongly_measurable.integrable_truncation {α : Type u_1} {m : measurable_space α} {μ : measure_theory.measure α} {f : α → ℝ} [measure_theory.is_finite_measure μ] (hf : measure_theory.ae_strongly_measurable f μ) {A : ℝ} :

measure_theory.integrable (probability_theory.truncation f A) μ
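
Because a truncation is bounded, it is integrable (and in every ℒp) for a finite measure without any integrability assumption on `f` itself. The sketch below shows how these two lemmas might be called with dot notation; the level `3` and the exponent `2` are arbitrary choices for the example, and the snippet has not been compiled here.

```lean
import probability.strong_law

open probability_theory measure_theory

-- Only a.e. strong measurability of `f` is needed, not integrability.
example {α : Type*} {m : measurable_space α} {μ : measure α} [is_finite_measure μ]
  {f : α → ℝ} (hf : ae_strongly_measurable f μ) :
  integrable (truncation f 3) μ :=
hf.integrable_truncation

-- The same holds in ℒp for any exponent, here `p = 2`.
example {α : Type*} {m : measurable_space α} {μ : measure α} [is_finite_measure μ]
  {f : α → ℝ} (hf : ae_strongly_measurable f μ) :
  mem_ℒp (truncation f 3) 2 μ :=
hf.mem_ℒp_truncation
```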

theorem probability_theory.moment_truncation_eq_interval_integral {α : Type u_1} {m : measurable_space α} {μ : measure_theory.measure α} {f : α → ℝ} (hf : measure_theory.ae_strongly_measurable f μ) {A : ℝ} (hA : 0 ≤ A) {n : ℕ} (hn : n ≠ 0) :

∫ (x : α), probability_theory.truncation f A x ^ n ∂μ = ∫ (y : ℝ) in -A..A, y ^ n ∂measure_theory.measure.map f μ

theorem probability_theory.moment_truncation_eq_interval_integral_of_nonneg {α : Type u_1} {m : measurable_space α} {μ : measure_theory.measure α} {f : α → ℝ} (hf : measure_theory.ae_strongly_measurable f μ) {A : ℝ} {n : ℕ} (hn : n ≠ 0) (h'f : 0 ≤ f) :

∫ (x : α), probability_theory.truncation f A x ^ n ∂μ = ∫ (y : ℝ) in 0..A, y ^ n ∂measure_theory.measure.map f μ

theorem probability_theory.integral_truncation_eq_interval_integral {α : Type u_1} {m : measurable_space α} {μ : measure_theory.measure α} {f : α → ℝ} (hf : measure_theory.ae_strongly_measurable f μ) {A : ℝ} (hA : 0 ≤ A) :

∫ (x : α), probability_theory.truncation f A x ∂μ = ∫ (y : ℝ) in -A..A, y ∂measure_theory.measure.map f μ

theorem probability_theory.integral_truncation_eq_interval_integral_of_nonneg {α : Type u_1} {m : measurable_space α} {μ : measure_theory.measure α} {f : α → ℝ} (hf : measure_theory.ae_strongly_measurable f μ) {A : ℝ} (h'f : 0 ≤ f) :

∫ (x : α), probability_theory.truncation f A x ∂μ = ∫ (y : ℝ) in 0..A, y ∂measure_theory.measure.map f μ

theorem probability_theory.integral_truncation_le_integral_of_nonneg {α : Type u_1} {m : measurable_space α} {μ : measure_theory.measure α} {f : α → ℝ} (hf : measure_theory.integrable f μ) (h'f : 0 ≤ f) {A : ℝ} :

∫ (x : α), probability_theory.truncation f A x ∂μ ≤ ∫ (x : α), f x ∂μ

theorem probability_theory.tendsto_integral_truncation {α : Type u_1} {m : measurable_space α} {μ : measure_theory.measure α} {f : α → ℝ} (hf : measure_theory.integrable f μ) :

filter.tendsto (λ (A : ℝ), ∫ (x : α), probability_theory.truncation f A x ∂μ) filter.at_top (nhds (∫ (x : α), f x ∂μ))

If a function is integrable, then the integral of its truncated versions converges to the integral of the whole function.
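
The statement above is phrased for real truncation levels `A`. In the strong law the levels are natural numbers, so one would typically compose with the cast `ℕ → ℝ`. A possible sketch follows; it assumes that the lemma `tendsto_coe_nat_at_top_at_top` is available under that name in the mathlib3 version at hand, and it has not been compiled here.

```lean
import probability.strong_law

open probability_theory measure_theory filter
open_locale topology

-- Restrict the convergence along `A → ∞` to integer levels `n → ∞`
-- by composing with the cast from `ℕ` to `ℝ`.
example {α : Type*} {m : measurable_space α} {μ : measure α} {f : α → ℝ}
  (hf : integrable f μ) :
  tendsto (λ n : ℕ, ∫ x, truncation f n x ∂μ) at_top (𝓝 (∫ x, f x ∂μ)) :=
(tendsto_integral_truncation hf).comp tendsto_coe_nat_at_top_at_top
```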

theorem probability_theory.ident_distrib.truncation {α : Type u_1} {m : measurable_space α} {μ : measure_theory.measure α} {β : Type u_2} [measurable_space β] {ν : measure_theory.measure β} {f : α → ℝ} {g : β → ℝ} (h : probability_theory.ident_distrib f g μ ν) {A : ℝ} :

probability_theory.ident_distrib (probability_theory.truncation f A) (probability_theory.truncation g A) μ ν
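
One typical consequence is that truncations of identically distributed variables have the same expectation. The sketch below combines the lemma above with `probability_theory.ident_distrib.integral_eq` from `probability.ident_distrib`; it has not been compiled here.

```lean
import probability.strong_law

open probability_theory measure_theory

-- Identically distributed functions have truncations with equal integrals.
example {α β : Type*} [measurable_space α] [measurable_space β]
  {μ : measure α} {ν : measure β} {f : α → ℝ} {g : β → ℝ}
  (h : ident_distrib f g μ ν) (A : ℝ) :
  ∫ x, truncation f A x ∂μ = ∫ y, truncation g A y ∂ν :=
h.truncation.integral_eq
```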

theorem probability_theory.sum_prob_mem_Ioc_le {Ω : Type u_1} [measure_theory.measure_space Ω] [measure_theory.is_probability_measure measure_theory.measure_space.volume] {X : Ω → ℝ} (hint : measure_theory.integrable X measure_theory.measure_space.volume) (hnonneg : 0 ≤ X) {K N : ℕ} (hKN : K ≤ N) :

(finset.range K).sum (λ (j : ℕ), ⇑measure_theory.measure_space.volume {ω : Ω | X ω ∈ set.Ioc ↑j ↑N}) ≤ ennreal.of_real ((∫ (a : Ω), X a) + 1)

theorem probability_theory.tsum_prob_mem_Ioi_lt_top {Ω : Type u_1} [measure_theory.measure_space Ω] [measure_theory.is_probability_measure measure_theory.measure_space.volume] {X : Ω → ℝ} (hint : measure_theory.integrable X measure_theory.measure_space.volume) (hnonneg : 0 ≤ X) :

∑' (j : ℕ), ⇑measure_theory.measure_space.volume {ω : Ω | X ω ∈ set.Ioi ↑j} < ⊤

theorem probability_theory.sum_variance_truncation_le {Ω : Type u_1} [measure_theory.measure_space Ω] [measure_theory.is_probability_measure measure_theory.measure_space.volume] {X : Ω → ℝ} (hint : measure_theory.integrable X measure_theory.measure_space.volume) (hnonneg : 0 ≤ X) (K : ℕ) :

(finset.range K).sum (λ (j : ℕ), (↑j ^ 2)⁻¹ * ∫ (a : Ω), (probability_theory.truncation X ↑j ^ 2) a) ≤ 2 * ∫ (a : Ω), X a

theorem probability_theory.strong_law_aux1 {Ω : Type u_1} [measure_theory.measure_space Ω] [measure_theory.is_probability_measure measure_theory.measure_space.volume] (X : ℕ → Ω → ℝ) (hint : measure_theory.integrable (X 0) measure_theory.measure_space.volume) (hindep : pairwise (λ (i j : ℕ), probability_theory.indep_fun (X i) (X j) measure_theory.measure_space.volume)) (hident : ∀ (i : ℕ), probability_theory.ident_distrib (X i) (X 0) measure_theory.measure_space.volume measure_theory.measure_space.volume) (hnonneg : ∀ (i : ℕ) (ω : Ω), 0 ≤ X i ω) {c : ℝ} (c_one : 1 < c) {ε : ℝ} (εpos : 0 < ε) :

∀ᵐ (ω : Ω), ∀ᶠ (n : ℕ) in filter.at_top, |(finset.range ⌊c ^ n⌋₊).sum (λ (i : ℕ), probability_theory.truncation (X i) ↑i ω) - ∫ (a : Ω), (finset.range ⌊c ^ n⌋₊).sum (λ (i : ℕ), probability_theory.truncation (X i) ↑i) a| < ε * ↑⌊c ^ n⌋₊

theorem probability_theory.strong_law_aux2 {Ω : Type u_1} [measure_theory.measure_space Ω] [measure_theory.is_probability_measure measure_theory.measure_space.volume] (X : ℕ → Ω → ℝ) (hint : measure_theory.integrable (X 0) measure_theory.measure_space.volume) (hindep : pairwise (λ (i j : ℕ), probability_theory.indep_fun (X i) (X j) measure_theory.measure_space.volume)) (hident : ∀ (i : ℕ), probability_theory.ident_distrib (X i) (X 0) measure_theory.measure_space.volume measure_theory.measure_space.volume) (hnonneg : ∀ (i : ℕ) (ω : Ω), 0 ≤ X i ω) {c : ℝ} (c_one : 1 < c) :

∀ᵐ (ω : Ω), (λ (n : ℕ), (finset.range ⌊c ^ n⌋₊).sum (λ (i : ℕ), probability_theory.truncation (X i) ↑i ω) - ∫ (a : Ω), (finset.range ⌊c ^ n⌋₊).sum (λ (i : ℕ), probability_theory.truncation (X i) ↑i) a) =o[filter.at_top] λ (n : ℕ), ↑⌊c ^ n⌋₊

theorem probability_theory.strong_law_aux3 {Ω : Type u_1} [measure_theory.measure_space Ω] [measure_theory.is_probability_measure measure_theory.measure_space.volume] (X : ℕ → Ω → ℝ) (hint : measure_theory.integrable (X 0) measure_theory.measure_space.volume) (hident : ∀ (i : ℕ), probability_theory.ident_distrib (X i) (X 0) measure_theory.measure_space.volume measure_theory.measure_space.volume) :

(λ (n : ℕ), (∫ (a : Ω), (finset.range n).sum (λ (i : ℕ), probability_theory.truncation (X i) ↑i) a) - ↑n * ∫ (a : Ω), X 0 a) =o[filter.at_top] coe

The expectation of the truncated version of Xᵢ behaves asymptotically like the whole expectation. This follows from the convergence of the truncated expectations (see tendsto_integral_truncation) and Cesàro averaging.

theorem probability_theory.strong_law_aux4 {Ω : Type u_1} [measure_theory.measure_space Ω] [measure_theory.is_probability_measure measure_theory.measure_space.volume] (X : ℕ → Ω → ℝ) (hint : measure_theory.integrable (X 0) measure_theory.measure_space.volume) (hindep : pairwise (λ (i j : ℕ), probability_theory.indep_fun (X i) (X j) measure_theory.measure_space.volume)) (hident : ∀ (i : ℕ), probability_theory.ident_distrib (X i) (X 0) measure_theory.measure_space.volume measure_theory.measure_space.volume) (hnonneg : ∀ (i : ℕ) (ω : Ω), 0 ≤ X i ω) {c : ℝ} (c_one : 1 < c) :

∀ᵐ (ω : Ω), (λ (n : ℕ), (finset.range ⌊c ^ n⌋₊).sum (λ (i : ℕ), probability_theory.truncation (X i) ↑i ω) - ↑⌊c ^ n⌋₊ * ∫ (a : Ω), X 0 a) =o[filter.at_top] λ (n : ℕ), ↑⌊c ^ n⌋₊

theorem probability_theory.strong_law_aux5 {Ω : Type u_1} [measure_theory.measure_space Ω] [measure_theory.is_probability_measure measure_theory.measure_space.volume] (X : ℕ → Ω → ℝ) (hint : measure_theory.integrable (X 0) measure_theory.measure_space.volume) (hident : ∀ (i : ℕ), probability_theory.ident_distrib (X i) (X 0) measure_theory.measure_space.volume measure_theory.measure_space.volume) (hnonneg : ∀ (i : ℕ) (ω : Ω), 0 ≤ X i ω) :

∀ᵐ (ω : Ω), (λ (n : ℕ), (finset.range n).sum (λ (i : ℕ), probability_theory.truncation (X i) ↑i ω) - (finset.range n).sum (λ (i : ℕ), X i ω)) =o[filter.at_top] λ (n : ℕ), ↑n

The truncated and non-truncated versions of Xᵢ have the same asymptotic behavior, as they almost surely coincide at all but finitely many steps. This follows from a probability computation and Borel-Cantelli.
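
The probability computation mentioned here can be sketched as follows, writing Y_n for the truncation of X_n at level n. This is an informal outline rather than the formal proof. It uses that, for nonnegative X_n, one has X_n ≠ Y_n exactly when X_n > n, and that ∑_{n ≥ 1} 1_{X_0 ≥ n} = ⌊X_0⌋ ≤ X_0.

```latex
\[
  \sum_{n \ge 0} \mathbb{P}(X_n \ne Y_n)
  = \sum_{n \ge 0} \mathbb{P}(X_n > n)
  = \sum_{n \ge 0} \mathbb{P}(X_0 > n)
  \le 1 + \sum_{n \ge 1} \mathbb{P}(X_0 \ge n)
  \le 1 + \mathbb{E}[X_0]
  < \infty .
\]
```

By the first Borel-Cantelli lemma, almost surely X_n = Y_n for all but finitely many n, so the difference of the two partial sums is eventually constant and therefore o(n). No independence is needed for this step, which matches the absence of an independence hypothesis in the statement.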

theorem probability_theory.strong_law_aux6 {Ω : Type u_1} [measure_theory.measure_space Ω] [measure_theory.is_probability_measure measure_theory.measure_space.volume] (X : ℕ → Ω → ℝ) (hint : measure_theory.integrable (X 0) measure_theory.measure_space.volume) (hindep : pairwise (λ (i j : ℕ), probability_theory.indep_fun (X i) (X j) measure_theory.measure_space.volume)) (hident : ∀ (i : ℕ), probability_theory.ident_distrib (X i) (X 0) measure_theory.measure_space.volume measure_theory.measure_space.volume) (hnonneg : ∀ (i : ℕ) (ω : Ω), 0 ≤ X i ω) {c : ℝ} (c_one : 1 < c) :

∀ᵐ (ω : Ω), filter.tendsto (λ (n : ℕ), (finset.range ⌊c ^ n⌋₊).sum (λ (i : ℕ), X i ω) / ↑⌊c ^ n⌋₊) filter.at_top (nhds (∫ (a : Ω), X 0 a))

theorem probability_theory.strong_law_aux7 {Ω : Type u_1} [measure_theory.measure_space Ω] [measure_theory.is_probability_measure measure_theory.measure_space.volume] (X : ℕ → Ω → ℝ) (hint : measure_theory.integrable (X 0) measure_theory.measure_space.volume) (hindep : pairwise (λ (i j : ℕ), probability_theory.indep_fun (X i) (X j) measure_theory.measure_space.volume)) (hident : ∀ (i : ℕ), probability_theory.ident_distrib (X i) (X 0) measure_theory.measure_space.volume measure_theory.measure_space.volume) (hnonneg : ∀ (i : ℕ) (ω : Ω), 0 ≤ X i ω) :

∀ᵐ (ω : Ω), filter.tendsto (λ (n : ℕ), (finset.range n).sum (λ (i : ℕ), X i ω) / ↑n) filter.at_top (nhds (∫ (a : Ω), X 0 a))

Xᵢ satisfies the strong law of large numbers along all integers. This follows from the corresponding fact along the sequences c^n, and the fact that any integer can be sandwiched between c^n and c^(n+1) with comparably small error if c is close enough to 1 (which is formalized in tendsto_div_of_monotone_of_tendsto_div_floor_pow).

theorem probability_theory.strong_law_ae {Ω : Type u_1} [measure_theory.measure_space Ω] [measure_theory.is_probability_measure measure_theory.measure_space.volume] (X : ℕ → Ω → ℝ) (hint : measure_theory.integrable (X 0) measure_theory.measure_space.volume) (hindep : pairwise (λ (i j : ℕ), probability_theory.indep_fun (X i) (X j) measure_theory.measure_space.volume)) (hident : ∀ (i : ℕ), probability_theory.ident_distrib (X i) (X 0) measure_theory.measure_space.volume measure_theory.measure_space.volume) :

∀ᵐ (ω : Ω), filter.tendsto (λ (n : ℕ), (finset.range n).sum (λ (i : ℕ), X i ω) / ↑n) filter.at_top (nhds (∫ (a : Ω), X 0 a))

Strong law of large numbers, almost sure version: if X n is a sequence of independent identically distributed integrable real-valued random variables, then (∑ i in range n, X i) / n converges almost surely to 𝔼[X 0]. We give here the strong version, due to Etemadi, that only requires pairwise independence.
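
A minimal sketch of how the theorem might be invoked, with the hypotheses written in the unfolded form shown in the signature above (explicit `volume` arguments, and `∫ a, X 0 a` in place of `𝔼[X 0]`). It assumes a late mathlib3 checkout and has not been compiled here.

```lean
import probability.strong_law

open probability_theory measure_theory filter
open_locale big_operators topology

-- Empirical averages of a pairwise independent, identically distributed,
-- integrable sequence converge almost surely to the common expectation.
example {Ω : Type*} [measure_space Ω] [is_probability_measure (volume : measure Ω)]
  (X : ℕ → Ω → ℝ) (hint : integrable (X 0) volume)
  (hindep : pairwise (λ i j, indep_fun (X i) (X j) volume))
  (hident : ∀ i, ident_distrib (X i) (X 0) volume volume) :
  ∀ᵐ ω, tendsto (λ n : ℕ, (∑ i in finset.range n, X i ω) / n) at_top
    (𝓝 (∫ a, X 0 a)) :=
strong_law_ae X hint hindep hident
```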

theorem probability_theory.strong_law_Lp {Ω : Type u_1} [measure_theory.measure_space Ω] [measure_theory.is_probability_measure measure_theory.measure_space.volume] {p : ennreal} (hp : 1 ≤ p) (hp' : p ≠ ⊤) (X : ℕ → Ω → ℝ) (hℒp : measure_theory.mem_ℒp (X 0) p measure_theory.measure_space.volume) (hindep : pairwise (λ (i j : ℕ), probability_theory.indep_fun (X i) (X j) measure_theory.measure_space.volume)) (hident : ∀ (i : ℕ), probability_theory.ident_distrib (X i) (X 0) measure_theory.measure_space.volume measure_theory.measure_space.volume) :

filter.tendsto (λ (n : ℕ), measure_theory.snorm (λ (ω : Ω), (finset.range n).sum (λ (i : ℕ), X i ω) / ↑n - ∫ (a : Ω), X 0 a) p measure_theory.measure_space.volume) filter.at_top (nhds 0)

Strong law of large numbers, Lᵖ version: if X n is a sequence of independent identically distributed real-valued random variables in Lᵖ with 1 ≤ p < ∞, then (∑ i in range n, X i) / n converges in Lᵖ to 𝔼[X 0].
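
As an illustration, specializing to p = 1 gives convergence in L¹ from plain integrability. The sketch below converts the integrability hypothesis with `measure_theory.mem_ℒp_one_iff_integrable`; it assumes a late mathlib3 checkout and has not been compiled here.

```lean
import probability.strong_law

open probability_theory measure_theory filter
open_locale big_operators topology

-- L¹ convergence of the empirical averages, obtained from the Lᵖ version
-- with `p = 1` (so `1 ≤ p` is `le_rfl` and `p ≠ ⊤` is `ennreal.one_ne_top`).
example {Ω : Type*} [measure_space Ω] [is_probability_measure (volume : measure Ω)]
  (X : ℕ → Ω → ℝ) (hint : integrable (X 0) volume)
  (hindep : pairwise (λ i j, indep_fun (X i) (X j) volume))
  (hident : ∀ i, ident_distrib (X i) (X 0) volume volume) :
  tendsto
    (λ n : ℕ, snorm (λ ω, (∑ i in finset.range n, X i ω) / n - ∫ a, X 0 a) 1 volume)
    at_top (𝓝 0) :=
strong_law_Lp le_rfl ennreal.one_ne_top X
  (mem_ℒp_one_iff_integrable.2 hint) hindep hident
```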