Mathlib.Probability.StrongLaw

The strong law of large numbers

We prove the strong law of large numbers, in ProbabilityTheory.strong_law_ae: If X n is a sequence of independent identically distributed integrable random variables, then ∑ i ∈ range n, X i / n converges almost surely to 𝔼[X 0]. We give here the strong version, due to Etemadi, that only requires pairwise independence.

This file also contains the Lᵖ version of the strong law of large numbers, ProbabilityTheory.strong_law_Lp, which shows that ∑ i ∈ range n, X i / n converges in Lᵖ to 𝔼[X 0] provided the X n are independent, identically distributed, and in Lᵖ.

Implementation

The main point is to prove the result for real-valued random variables, as the general case of Banach-space-valued random variables follows from this case and approximation by simple functions. The real version is given in ProbabilityTheory.strong_law_ae_real.

We follow the proof by Etemadi [Etemadi, An elementary proof of the strong law of large numbers], which goes as follows.

It suffices to prove the result for nonnegative X, as one can prove the general result by splitting a general X into its positive part and negative part. Let Xₙ be a sequence of nonnegative integrable identically distributed pairwise independent random variables, and let Yₙ be the truncation of Xₙ up to n. We claim the following:

Almost surely, Xₙ = Yₙ for all but finitely many indices. Indeed, ∑ₙ ℙ (Xₙ ≠ Yₙ) is bounded by 1 + 𝔼[X] (see sum_prob_mem_Ioc_le and tsum_prob_mem_Ioi_lt_top).
Let c > 1. Along the sequence n = c ^ k, the average (∑_{i=0}^{n-1} Yᵢ - 𝔼[Yᵢ])/n converges almost surely to 0. This follows from a variance control, as

  ∑_k ℙ (|∑_{i=0}^{c^k - 1} Yᵢ - 𝔼[Yᵢ]| > c^k ε)
    ≤ ∑_k (c^k ε)^{-2} ∑_{i=0}^{c^k - 1} Var[Yᵢ]    (by Markov's inequality)
    ≤ ∑_i (C/i^2) Var[Yᵢ]                           (as ∑_{c^k > i} 1/(c^k)^2 ≤ C/i^2)
    ≤ ∑_i (C/i^2) 𝔼[Yᵢ^2]
    ≤ 2C 𝔼[X]                                       (see `sum_variance_truncation_le`)

As 𝔼[Yᵢ] converges to 𝔼[X], it follows from the two previous items and Cesàro that, along the sequence n = c^k, one has (∑_{i=0}^{n-1} Xᵢ) / n → 𝔼[X] almost surely.
To generalize it to all indices, we use the fact that ∑_{i=0}^{n-1} Xᵢ is nondecreasing in n and that, if c is close enough to 1, the gap between c^k and c^(k+1) is small.
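
The first claim rests on a pointwise fact: a nonnegative value strictly below the level n is unchanged by truncation at level n, so Xₙ and Yₙ can only differ where Xₙ ≥ n. A minimal Lean sketch of this step, using the lemma truncation_eq_self documented further down this page (the names Ω, X, n, ω, h0, hn are illustrative choices, not taken from the file):

```lean
import Mathlib.Probability.StrongLaw

open ProbabilityTheory

-- If a nonnegative value `X ω` lies strictly below the level `n`,
-- truncating at level `n` leaves it unchanged.
example {Ω : Type*} (X : Ω → ℝ) (n : ℕ) (ω : Ω) (h0 : 0 ≤ X ω) (hn : X ω < n) :
    truncation X n ω = X ω :=
  truncation_eq_self (by rwa [abs_of_nonneg h0])
```

Summing the probabilities of the exceptional events {Xₙ ≥ n} is what sum_prob_mem_Ioc_le and tsum_prob_mem_Ioi_lt_top control.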

Prerequisites on truncations

def ProbabilityTheory.truncation {α : Type u_1} (f : α → ℝ) (A : ℝ) :

α → ℝ

Truncating a real-valued function to the interval (-A, A].

Equations

ProbabilityTheory.truncation f A = (Set.Ioc (-A) A).indicator id ∘ f

theorem MeasureTheory.AEStronglyMeasurable.truncation {α : Type u_1} {m : MeasurableSpace α} {μ : Measure α} {f : α → ℝ} (hf : AEStronglyMeasurable f μ) {A : ℝ} :

AEStronglyMeasurable (ProbabilityTheory.truncation f A) μ

theorem ProbabilityTheory.abs_truncation_le_bound {α : Type u_1} (f : α → ℝ) (A : ℝ) (x : α) :

|truncation f A x| ≤ |A|

@[simp]

theorem ProbabilityTheory.truncation_zero {α : Type u_1} (f : α → ℝ) :

truncation f 0 = 0

theorem ProbabilityTheory.abs_truncation_le_abs_self {α : Type u_1} (f : α → ℝ) (A : ℝ) (x : α) :

|truncation f A x| ≤ |f x|

theorem ProbabilityTheory.truncation_eq_self {α : Type u_1} {f : α → ℝ} {A : ℝ} {x : α} (h : |f x| < A) :

truncation f A x = f x

theorem ProbabilityTheory.truncation_eq_of_nonneg {α : Type u_1} {f : α → ℝ} {A : ℝ} (h : ∀ (x : α), 0 ≤ f x) :

truncation f A = (Set.Ioc 0 A).indicator id ∘ f

theorem ProbabilityTheory.truncation_nonneg {α : Type u_1} {f : α → ℝ} (A : ℝ) {x : α} (h : 0 ≤ f x) :

0 ≤ truncation f A x
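
As a quick illustration of how the elementary truncation lemmas above combine, here is a small Lean sketch. It only uses the statements listed on this page; the hypothesis names hA and h are illustrative.

```lean
import Mathlib.Probability.StrongLaw

open ProbabilityTheory

-- For a nonnegative level `A`, the bound `|A|` simplifies to `A`.
example {α : Type*} (f : α → ℝ) (A : ℝ) (hA : 0 ≤ A) (x : α) :
    |truncation f A x| ≤ A :=
  (abs_truncation_le_bound f A x).trans_eq (abs_of_nonneg hA)

-- Truncation keeps nonnegative values nonnegative and never increases
-- the absolute value.
example {α : Type*} (f : α → ℝ) (A : ℝ) (x : α) (h : 0 ≤ f x) :
    0 ≤ truncation f A x ∧ |truncation f A x| ≤ |f x| :=
  ⟨truncation_nonneg A h, abs_truncation_le_abs_self f A x⟩
```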

theorem MeasureTheory.AEStronglyMeasurable.memLp_truncation {α : Type u_1} {m : MeasurableSpace α} {μ : Measure α} {f : α → ℝ} [IsFiniteMeasure μ] (hf : AEStronglyMeasurable f μ) {A : ℝ} {p : ENNReal} :

MemLp (ProbabilityTheory.truncation f A) p μ

theorem MeasureTheory.AEStronglyMeasurable.integrable_truncation {α : Type u_1} {m : MeasurableSpace α} {μ : Measure α} {f : α → ℝ} [IsFiniteMeasure μ] (hf : AEStronglyMeasurable f μ) {A : ℝ} :

Integrable (ProbabilityTheory.truncation f A) μ

theorem ProbabilityTheory.moment_truncation_eq_intervalIntegral {α : Type u_1} {m : MeasurableSpace α} {μ : MeasureTheory.Measure α} {f : α → ℝ} (hf : MeasureTheory.AEStronglyMeasurable f μ) {A : ℝ} (hA : 0 ≤ A) {n : ℕ} (hn : n ≠ 0) :

∫ (x : α), truncation f A x ^ n ∂μ = ∫ (y : ℝ) in -A..A, y ^ n ∂MeasureTheory.Measure.map f μ

theorem ProbabilityTheory.moment_truncation_eq_intervalIntegral_of_nonneg {α : Type u_1} {m : MeasurableSpace α} {μ : MeasureTheory.Measure α} {f : α → ℝ} (hf : MeasureTheory.AEStronglyMeasurable f μ) {A : ℝ} {n : ℕ} (hn : n ≠ 0) (h'f : 0 ≤ f) :

∫ (x : α), truncation f A x ^ n ∂μ = ∫ (y : ℝ) in 0..A, y ^ n ∂MeasureTheory.Measure.map f μ

theorem ProbabilityTheory.integral_truncation_eq_intervalIntegral {α : Type u_1} {m : MeasurableSpace α} {μ : MeasureTheory.Measure α} {f : α → ℝ} (hf : MeasureTheory.AEStronglyMeasurable f μ) {A : ℝ} (hA : 0 ≤ A) :

∫ (x : α), truncation f A x ∂μ = ∫ (y : ℝ) in -A..A, y ∂MeasureTheory.Measure.map f μ

theorem ProbabilityTheory.integral_truncation_eq_intervalIntegral_of_nonneg {α : Type u_1} {m : MeasurableSpace α} {μ : MeasureTheory.Measure α} {f : α → ℝ} (hf : MeasureTheory.AEStronglyMeasurable f μ) {A : ℝ} (h'f : 0 ≤ f) :

∫ (x : α), truncation f A x ∂μ = ∫ (y : ℝ) in 0..A, y ∂MeasureTheory.Measure.map f μ

theorem ProbabilityTheory.integral_truncation_le_integral_of_nonneg {α : Type u_1} {m : MeasurableSpace α} {μ : MeasureTheory.Measure α} {f : α → ℝ} (hf : MeasureTheory.Integrable f μ) (h'f : 0 ≤ f) {A : ℝ} :

∫ (x : α), truncation f A x ∂μ ≤ ∫ (x : α), f x ∂μ

theorem ProbabilityTheory.tendsto_integral_truncation {α : Type u_1} {m : MeasurableSpace α} {μ : MeasureTheory.Measure α} {f : α → ℝ} (hf : MeasureTheory.Integrable f μ) :

Filter.Tendsto (fun (A : ℝ) => ∫ (x : α), truncation f A x ∂μ) Filter.atTop (nhds (∫ (x : α), f x ∂μ))

If a function is integrable, then the integral of its truncated versions converges to the integral of the whole function.
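
In the proof of the strong law, the truncation level runs over natural numbers rather than real numbers. One way to specialize the statement, sketched here under the assumption that the Mathlib lemma tendsto_natCast_atTop_atTop is available under that name, is to compose with the cast ℕ → ℝ:

```lean
import Mathlib.Probability.StrongLaw

open MeasureTheory ProbabilityTheory

-- Restrict the real truncation level `A` to natural numbers `n` by composing
-- with the cast `ℕ → ℝ`, which tends to infinity.
example {α : Type*} {m : MeasurableSpace α} {μ : Measure α} {f : α → ℝ}
    (hf : Integrable f μ) :
    Filter.Tendsto (fun n : ℕ => ∫ x, truncation f (n : ℝ) x ∂μ) Filter.atTop
      (nhds (∫ x, f x ∂μ)) :=
  (tendsto_integral_truncation hf).comp tendsto_natCast_atTop_atTop
```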

theorem ProbabilityTheory.IdentDistrib.truncation {α : Type u_1} {m : MeasurableSpace α} {μ : MeasureTheory.Measure α} {β : Type u_2} [MeasurableSpace β] {ν : MeasureTheory.Measure β} {f : α → ℝ} {g : β → ℝ} (h : IdentDistrib f g μ ν) {A : ℝ} :

IdentDistrib (ProbabilityTheory.truncation f A) (ProbabilityTheory.truncation g A) μ ν

theorem ProbabilityTheory.sum_prob_mem_Ioc_le {Ω : Type u_1} [MeasureTheory.MeasureSpace Ω] [MeasureTheory.IsProbabilityMeasure MeasureTheory.volume] {X : Ω → ℝ} (hint : MeasureTheory.Integrable X MeasureTheory.volume) (hnonneg : 0 ≤ X) {K N : ℕ} (hKN : K ≤ N) :

∑ j ∈ Finset.range K, MeasureTheory.volume {ω : Ω | X ω ∈ Set.Ioc ↑j ↑N} ≤ ENNReal.ofReal ((∫ (a : Ω), X a) + 1)

theorem ProbabilityTheory.tsum_prob_mem_Ioi_lt_top {Ω : Type u_1} [MeasureTheory.MeasureSpace Ω] [MeasureTheory.IsProbabilityMeasure MeasureTheory.volume] {X : Ω → ℝ} (hint : MeasureTheory.Integrable X MeasureTheory.volume) (hnonneg : 0 ≤ X) :

∑' (j : ℕ), MeasureTheory.volume {ω : Ω | X ω ∈ Set.Ioi ↑j} < ⊤

theorem ProbabilityTheory.sum_variance_truncation_le {Ω : Type u_1} [MeasureTheory.MeasureSpace Ω] [MeasureTheory.IsProbabilityMeasure MeasureTheory.volume] {X : Ω → ℝ} (hint : MeasureTheory.Integrable X MeasureTheory.volume) (hnonneg : 0 ≤ X) (K : ℕ) :

∑ j ∈ Finset.range K, (↑j ^ 2)⁻¹ * ∫ (a : Ω), (truncation X ↑j ^ 2) a ≤ 2 * ∫ (a : Ω), X a

Proof of the strong law of large numbers (almost sure version, assuming only pairwise independence) for nonnegative random variables, following Etemadi's proof.

theorem ProbabilityTheory.strong_law_aux1 {Ω : Type u_1} [MeasureTheory.MeasureSpace Ω] [MeasureTheory.IsProbabilityMeasure MeasureTheory.volume] (X : ℕ → Ω → ℝ) (hint : MeasureTheory.Integrable (X 0) MeasureTheory.volume) (hindep : Pairwise (Function.onFun (fun (f g : Ω → ℝ) => IndepFun f g MeasureTheory.volume) X)) (hident : ∀ (i : ℕ), IdentDistrib (X i) (X 0) MeasureTheory.volume MeasureTheory.volume) (hnonneg : ∀ (i : ℕ) (ω : Ω), 0 ≤ X i ω) {c : ℝ} (c_one : 1 < c) {ε : ℝ} (εpos : 0 < ε) :

∀ᵐ (ω : Ω), ∀ᶠ (n : ℕ) in Filter.atTop, |∑ i ∈ Finset.range ⌊c ^ n⌋₊, truncation (X i) (↑i) ω - ∫ (a : Ω), (∑ i ∈ Finset.range ⌊c ^ n⌋₊, truncation (X i) ↑i) a| < ε * ↑⌊c ^ n⌋₊

The truncation of Xᵢ up to i satisfies the strong law of large numbers (with respect to the truncated expectation) along the sequence c^n, for any c > 1, up to a given ε > 0. This follows from a variance control.

theorem ProbabilityTheory.strong_law_aux2 {Ω : Type u_1} [MeasureTheory.MeasureSpace Ω] [MeasureTheory.IsProbabilityMeasure MeasureTheory.volume] (X : ℕ → Ω → ℝ) (hint : MeasureTheory.Integrable (X 0) MeasureTheory.volume) (hindep : Pairwise (Function.onFun (fun (f g : Ω → ℝ) => IndepFun f g MeasureTheory.volume) X)) (hident : ∀ (i : ℕ), IdentDistrib (X i) (X 0) MeasureTheory.volume MeasureTheory.volume) (hnonneg : ∀ (i : ℕ) (ω : Ω), 0 ≤ X i ω) {c : ℝ} (c_one : 1 < c) :

∀ᵐ (ω : Ω), (fun (n : ℕ) => ∑ i ∈ Finset.range ⌊c ^ n⌋₊, truncation (X i) (↑i) ω - ∫ (a : Ω), (∑ i ∈ Finset.range ⌊c ^ n⌋₊, truncation (X i) ↑i) a) =o[Filter.atTop] fun (n : ℕ) => ↑⌊c ^ n⌋₊

The truncation of Xᵢ up to i satisfies the strong law of large numbers (with respect to the truncated expectation) along the sequence c^n, for any c > 1. This follows from strong_law_aux1 by varying ε.

theorem ProbabilityTheory.strong_law_aux3 {Ω : Type u_1} [MeasureTheory.MeasureSpace Ω] [MeasureTheory.IsProbabilityMeasure MeasureTheory.volume] (X : ℕ → Ω → ℝ) (hint : MeasureTheory.Integrable (X 0) MeasureTheory.volume) (hident : ∀ (i : ℕ), IdentDistrib (X i) (X 0) MeasureTheory.volume MeasureTheory.volume) :

(fun (n : ℕ) => (∫ (a : Ω), (∑ i ∈ Finset.range n, truncation (X i) ↑i) a) - ↑n * ∫ (a : Ω), X 0 a) =o[Filter.atTop] Nat.cast

The expectation of the truncated version of Xᵢ behaves asymptotically like the whole expectation. This follows from the convergence of the truncated expectations (see tendsto_integral_truncation) and Cesàro averaging.

theorem ProbabilityTheory.strong_law_aux4 {Ω : Type u_1} [MeasureTheory.MeasureSpace Ω] [MeasureTheory.IsProbabilityMeasure MeasureTheory.volume] (X : ℕ → Ω → ℝ) (hint : MeasureTheory.Integrable (X 0) MeasureTheory.volume) (hindep : Pairwise (Function.onFun (fun (f g : Ω → ℝ) => IndepFun f g MeasureTheory.volume) X)) (hident : ∀ (i : ℕ), IdentDistrib (X i) (X 0) MeasureTheory.volume MeasureTheory.volume) (hnonneg : ∀ (i : ℕ) (ω : Ω), 0 ≤ X i ω) {c : ℝ} (c_one : 1 < c) :

∀ᵐ (ω : Ω), (fun (n : ℕ) => ∑ i ∈ Finset.range ⌊c ^ n⌋₊, truncation (X i) (↑i) ω - ↑⌊c ^ n⌋₊ * ∫ (a : Ω), X 0 a) =o[Filter.atTop] fun (n : ℕ) => ↑⌊c ^ n⌋₊

The truncation of Xᵢ up to i satisfies the strong law of large numbers (with respect to the original expectation) along the sequence c^n, for any c > 1. This follows from the version for the truncated expectation (strong_law_aux2), and the fact that the truncated and the original expectations have the same asymptotic behavior (strong_law_aux3).

theorem ProbabilityTheory.strong_law_aux5 {Ω : Type u_1} [MeasureTheory.MeasureSpace Ω] [MeasureTheory.IsProbabilityMeasure MeasureTheory.volume] (X : ℕ → Ω → ℝ) (hint : MeasureTheory.Integrable (X 0) MeasureTheory.volume) (hident : ∀ (i : ℕ), IdentDistrib (X i) (X 0) MeasureTheory.volume MeasureTheory.volume) (hnonneg : ∀ (i : ℕ) (ω : Ω), 0 ≤ X i ω) :

∀ᵐ (ω : Ω), (fun (n : ℕ) => ∑ i ∈ Finset.range n, truncation (X i) (↑i) ω - ∑ i ∈ Finset.range n, X i ω) =o[Filter.atTop] fun (n : ℕ) => ↑n

The truncated and non-truncated versions of Xᵢ have the same asymptotic behavior, as they almost surely coincide at all but finitely many steps. This follows from the summability of the probabilities that Xᵢ differs from its truncation up to i (see tsum_prob_mem_Ioi_lt_top) and Borel-Cantelli.

theorem ProbabilityTheory.strong_law_aux6 {Ω : Type u_1} [MeasureTheory.MeasureSpace Ω] [MeasureTheory.IsProbabilityMeasure MeasureTheory.volume] (X : ℕ → Ω → ℝ) (hint : MeasureTheory.Integrable (X 0) MeasureTheory.volume) (hindep : Pairwise (Function.onFun (fun (f g : Ω → ℝ) => IndepFun f g MeasureTheory.volume) X)) (hident : ∀ (i : ℕ), IdentDistrib (X i) (X 0) MeasureTheory.volume MeasureTheory.volume) (hnonneg : ∀ (i : ℕ) (ω : Ω), 0 ≤ X i ω) {c : ℝ} (c_one : 1 < c) :

∀ᵐ (ω : Ω), Filter.Tendsto (fun (n : ℕ) => (∑ i ∈ Finset.range ⌊c ^ n⌋₊, X i ω) / ↑⌊c ^ n⌋₊) Filter.atTop (nhds (∫ (a : Ω), X 0 a))

Xᵢ satisfies the strong law of large numbers along the sequence c^n, for any c > 1. This follows from the version for the truncated Xᵢ, and the fact that Xᵢ and its truncated version have the same asymptotic behavior.

theorem ProbabilityTheory.strong_law_aux7 {Ω : Type u_1} [MeasureTheory.MeasureSpace Ω] [MeasureTheory.IsProbabilityMeasure MeasureTheory.volume] (X : ℕ → Ω → ℝ) (hint : MeasureTheory.Integrable (X 0) MeasureTheory.volume) (hindep : Pairwise (Function.onFun (fun (f g : Ω → ℝ) => IndepFun f g MeasureTheory.volume) X)) (hident : ∀ (i : ℕ), IdentDistrib (X i) (X 0) MeasureTheory.volume MeasureTheory.volume) (hnonneg : ∀ (i : ℕ) (ω : Ω), 0 ≤ X i ω) :

∀ᵐ (ω : Ω), Filter.Tendsto (fun (n : ℕ) => (∑ i ∈ Finset.range n, X i ω) / ↑n) Filter.atTop (nhds (∫ (a : Ω), X 0 a))

Xᵢ satisfies the strong law of large numbers along all integers. This follows from the corresponding fact along the sequences c^n, and the fact that any integer can be sandwiched between c^n and c^(n+1) with comparably small error if c is close enough to 1 (which is formalized in tendsto_div_of_monotone_of_tendsto_div_floor_pow).

theorem ProbabilityTheory.strong_law_ae_real {Ω : Type u_2} {m : MeasurableSpace Ω} {μ : MeasureTheory.Measure Ω} (X : ℕ → Ω → ℝ) (hint : MeasureTheory.Integrable (X 0) μ) (hindep : Pairwise (Function.onFun (fun (x1 x2 : Ω → ℝ) => IndepFun x1 x2 μ) X)) (hident : ∀ (i : ℕ), IdentDistrib (X i) (X 0) μ μ) :

∀ᵐ (ω : Ω) ∂μ, Filter.Tendsto (fun (n : ℕ) => (∑ i ∈ Finset.range n, X i ω) / ↑n) Filter.atTop (nhds (∫ (x : Ω), X 0 x ∂μ))

Strong law of large numbers, almost sure version: if X n is a sequence of independent identically distributed integrable real-valued random variables, then ∑ i ∈ range n, X i / n converges almost surely to 𝔼[X 0]. We give here the strong version, due to Etemadi, that only requires pairwise independence. Superseded by strong_law_ae, which works for random variables taking values in any Banach space.
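
A possible way to call this theorem, sketched in Lean. The pairwise independence hypothesis is written here in unfolded form (∀ i j, i ≠ j → ...), which is an illustrative choice rather than the form used in the file; it is converted to the Pairwise hypothesis by a lambda.

```lean
import Mathlib.Probability.StrongLaw

open MeasureTheory ProbabilityTheory

-- Empirical means of pairwise independent, identically distributed,
-- integrable real random variables converge almost surely to the mean.
example {Ω : Type*} {m : MeasurableSpace Ω} {μ : Measure Ω} (X : ℕ → Ω → ℝ)
    (hint : Integrable (X 0) μ)
    (hindep : ∀ i j, i ≠ j → IndepFun (X i) (X j) μ)
    (hident : ∀ i, IdentDistrib (X i) (X 0) μ μ) :
    ∀ᵐ ω ∂μ, Filter.Tendsto (fun n : ℕ => (∑ i ∈ Finset.range n, X i ω) / n)
      Filter.atTop (nhds (∫ x, X 0 x ∂μ)) :=
  strong_law_ae_real X hint (fun i j hij => hindep i j hij) hident
```

For random variables with values in a general Banach space, the same pattern should apply with strong_law_ae, where the average is written n⁻¹ • ∑ i ∈ range n, X i.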

theorem ProbabilityTheory.strong_law_ae_simpleFunc_comp {Ω : Type u_1} {mΩ : MeasurableSpace Ω} {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {E : Type u_2} [NormedAddCommGroup E] [NormedSpace ℝ E] [CompleteSpace E] [MeasurableSpace E] (X : ℕ → Ω → E) (h' : Measurable (X 0)) (hindep : Pairwise (Function.onFun (fun (x1 x2 : Ω → E) => IndepFun x1 x2 μ) X)) (hident : ∀ (i : ℕ), IdentDistrib (X i) (X 0) μ μ) (φ : MeasureTheory.SimpleFunc E E) :

∀ᵐ (ω : Ω) ∂μ, Filter.Tendsto (fun (n : ℕ) => (↑n)⁻¹ • ∑ i ∈ Finset.range n, φ (X i ω)) Filter.atTop (nhds (∫ (x : Ω), (⇑φ ∘ X 0) x ∂μ))

Preliminary lemma for the strong law of large numbers for vector-valued random variables: the composition of the random variables with a simple function satisfies the strong law of large numbers.

theorem ProbabilityTheory.strong_law_ae_of_measurable {Ω : Type u_1} {mΩ : MeasurableSpace Ω} {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {E : Type u_2} [NormedAddCommGroup E] [NormedSpace ℝ E] [CompleteSpace E] [MeasurableSpace E] [BorelSpace E] (X : ℕ → Ω → E) (hint : MeasureTheory.Integrable (X 0) μ) (h' : MeasureTheory.StronglyMeasurable (X 0)) (hindep : Pairwise (Function.onFun (fun (x1 x2 : Ω → E) => IndepFun x1 x2 μ) X)) (hident : ∀ (i : ℕ), IdentDistrib (X i) (X 0) μ μ) :

∀ᵐ (ω : Ω) ∂μ, Filter.Tendsto (fun (n : ℕ) => (↑n)⁻¹ • ∑ i ∈ Finset.range n, X i ω) Filter.atTop (nhds (∫ (x : Ω), X 0 x ∂μ))

Preliminary lemma for the strong law of large numbers for vector-valued random variables, assuming measurability in addition to integrability. This is weakened to ae measurability in the full version ProbabilityTheory.strong_law_ae.

theorem ProbabilityTheory.strong_law_ae {Ω : Type u_1} {mΩ : MeasurableSpace Ω} {μ : MeasureTheory.Measure Ω} {E : Type u_2} [NormedAddCommGroup E] [NormedSpace ℝ E] [CompleteSpace E] [MeasurableSpace E] [BorelSpace E] (X : ℕ → Ω → E) (hint : MeasureTheory.Integrable (X 0) μ) (hindep : Pairwise (Function.onFun (fun (x1 x2 : Ω → E) => IndepFun x1 x2 μ) X)) (hident : ∀ (i : ℕ), IdentDistrib (X i) (X 0) μ μ) :

∀ᵐ (ω : Ω) ∂μ, Filter.Tendsto (fun (n : ℕ) => (↑n)⁻¹ • ∑ i ∈ Finset.range n, X i ω) Filter.atTop (nhds (∫ (x : Ω), X 0 x ∂μ))

Strong law of large numbers, almost sure version: if X n is a sequence of independent identically distributed integrable random variables taking values in a Banach space, then n⁻¹ • ∑ i ∈ range n, X i converges almost surely to 𝔼[X 0]. We give here the strong version, due to Etemadi, that only requires pairwise independence.

theorem ProbabilityTheory.strong_law_Lp {Ω : Type u_1} {mΩ : MeasurableSpace Ω} {μ : MeasureTheory.Measure Ω} {E : Type u_2} [NormedAddCommGroup E] [NormedSpace ℝ E] [CompleteSpace E] [MeasurableSpace E] [BorelSpace E] {p : ENNReal} (hp : 1 ≤ p) (hp' : p ≠ ⊤) (X : ℕ → Ω → E) (hℒp : MeasureTheory.MemLp (X 0) p μ) (hindep : Pairwise (Function.onFun (fun (x1 x2 : Ω → E) => IndepFun x1 x2 μ) X)) (hident : ∀ (i : ℕ), IdentDistrib (X i) (X 0) μ μ) :

Filter.Tendsto (fun (n : ℕ) => MeasureTheory.eLpNorm (fun (ω : Ω) => (↑n)⁻¹ • ∑ i ∈ Finset.range n, X i ω - ∫ (x : Ω), X 0 x ∂μ) p μ) Filter.atTop (nhds 0)

Strong law of large numbers, Lᵖ version: if X n is a sequence of independent identically distributed random variables in Lᵖ, then n⁻¹ • ∑ i ∈ range n, X i converges in Lᵖ to 𝔼[X 0].
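
As an illustration, the following Lean sketch specializes the Lᵖ version to p = 2 and real-valued random variables, giving convergence of the empirical means in L². The hypotheses mirror the signature shown above; the name h2 is illustrative, and one_le_two and ENNReal.two_ne_top are assumed to be the Mathlib names for the two side conditions on p.

```lean
import Mathlib.Probability.StrongLaw

open MeasureTheory ProbabilityTheory

-- L² convergence of the empirical means, obtained from `strong_law_Lp` with `p = 2`.
example {Ω : Type*} {mΩ : MeasurableSpace Ω} {μ : Measure Ω} (X : ℕ → Ω → ℝ)
    (h2 : MemLp (X 0) 2 μ)
    (hindep : Pairwise (Function.onFun (fun (f g : Ω → ℝ) => IndepFun f g μ) X))
    (hident : ∀ i, IdentDistrib (X i) (X 0) μ μ) :
    Filter.Tendsto
      (fun n : ℕ =>
        eLpNorm (fun ω => (n : ℝ)⁻¹ • (∑ i ∈ Finset.range n, X i ω) - (∫ x, X 0 x ∂μ)) 2 μ)
      Filter.atTop (nhds 0) :=
  strong_law_Lp (p := 2) one_le_two ENNReal.two_ne_top X h2 hindep hident
```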