Zulip Chat Archive
Stream: Is there code for X?
Topic: Bayesian statistics
Willem vanhulle (Jun 30 2025 at 18:51):
Has anyone written code for doing Bayesian inference or statistics more easily? I have just finished a short example but wonder if others have tried before?
Matteo Cipollina (Jul 22 2025 at 18:01):
Willem vanhulle ha scritto:
Has anyone written code for doing Bayesian inference or statistics more easily? I have just finished a short example but wonder if others have tried before?
If it can be useful to you, I have made a prototype some time ago, tentatively modeled on the section in Roberts, Casella 'Monte Carlo Statistical Methods'. Have some work in progress prototypes for accept/reject and other initial concepts in the book.
import Mathlib.MeasureTheory.Measure.ProbabilityMeasure
import Mathlib.Probability.Kernel.RadonNikodym
/-!
# MCMC Methods
1. `StatisticalModel`**: A statistical model `P(θ)` is defined as a `ProbabilityTheory.Kernel Θ α`.
This is the natural object for parameterized measures and allows defining Markov chains
via kernel composition (`k₁ ∘ₖ k₂`) and iteration (`k ^ n`).
2. A `BayesianModel` bundles the statistical model (the kernel) with a prior
measure, creating a self-contained object for Bayesian inference.
3. We prove that the likelihood function is measurable under `PolishSpace` on the sample space.
-/
namespace MonteCarlo
open MeasureTheory ProbabilityTheory Filter
open scoped ENNReal Topology
-- Section: Bayesian Model
-- We use `PolishSpace` for the sample space `α` as it guarantees the existence of regular
-- conditional distributions and that key functions (like the Radon-Nikodym derivative) are measurable.
variable {α Θ : Type*} [MeasurableSpace Θ] [TopologicalSpace Θ] [BorelSpace Θ]
{mα : MeasurableSpace α} [TopologicalSpace α] [PolishSpace α]
[MeasurableSpace.CountableOrCountablyGenerated Θ α]
/-- A **statistical model** is a Markov kernel from the parameter space `Θ` to the
sample space `α`. This defines a family of probability measures `θ ↦ P(·|θ)`. -/
def StatisticalModel (α Θ : Type*) [MeasurableSpace Θ] [MeasurableSpace α] :=
Kernel Θ α
/-- A **Bayesian model** combines a statistical model with a prior probability measure on the
parameter space. It represents the setup for Bayesian inference before any data is observed.
We require the prior to be a probability measure for most applications. -/
structure BayesianModel (α Θ : Type*) [MeasurableSpace α] [MeasurableSpace Θ] where
P : StatisticalModel α Θ
prior : Measure Θ
[is_prior_prob : IsProbabilityMeasure prior]
instance {model : BayesianModel α Θ} : IsProbabilityMeasure model.prior := model.is_prior_prob
-- Section: Likelihood and Marginal
variable {μ : Measure α} [SFinite μ]
/-- The **likelihood function** `L(θ|x)` for a given observation `x : α`.
This uses the kernel Radon-Nikodym derivative that gives joint measurability, which is
`μ`-a.e. equal to the pointwise `(P θ).rnDeriv μ x`. -/
noncomputable def likelihood (P : StatisticalModel α Θ) (μ : Measure α) (x : α) : Θ → ℝ≥0∞ :=
fun θ ↦ Kernel.rnDeriv P (Kernel.const Θ μ) θ x
/-- If the uncurried function `(a, b) ↦ f a b` is measurable, then for any fixed `b`,
the function `a ↦ f a b` is measurable. This allows deriving the measurability of
partially applied functions. -/
lemma measurable_of_uncurry_left {α β γ : Type*}
{mα : MeasurableSpace α} {mβ : MeasurableSpace β} {mγ : MeasurableSpace γ}
(f : α → β → γ) (hf : Measurable (Function.uncurry f)) (b : β) :
Measurable (fun a => f a b) := by
exact Measurable.of_uncurry_right hf
omit [TopologicalSpace Θ] [BorelSpace Θ] [TopologicalSpace α] [PolishSpace α] in
/-- The likelihood function, seen as a function of `θ`, is measurable. -/
lemma likelihood_measurable (P : StatisticalModel α Θ) (μ : Measure α) [SFinite μ] (x : α) :
Measurable (likelihood P μ x) := by
unfold likelihood
let F_uncurried := Kernel.measurable_rnDeriv P (Kernel.const Θ μ)
exact F_uncurried.comp (measurable_id.prodMk (measurable_const : Measurable (fun (_:Θ) ↦ x)))
/-- The *unnormalized posterior measure*, `L(θ|x) dπ(θ)`. This measure is not necessarily finite. -/
noncomputable def unnormalizedPosterior (model : BayesianModel α Θ) (μ : Measure α) (x : α) : Measure Θ :=
model.prior.withDensity (likelihood model.P μ x)
/-- The *marginal likelihood* (or evidence) `m(x)`, which is the normalizing constant of the posterior.
(RC Equation 1.3.1). It is the integral of the likelihood over the prior. -/
noncomputable def marginal (model : BayesianModel α Θ) (μ : Measure α) (x : α) : ℝ≥0∞ :=
∫⁻ θ, likelihood model.P μ x θ ∂model.prior
omit [TopologicalSpace α] [PolishSpace α] in
lemma lintegral_withDensity_eq_lintegral_mul {m : MeasurableSpace α} (μ : Measure α) {f g : α → ℝ≥0∞}
(hf : Measurable f) (hg : AEMeasurable g μ) :
∫⁻ a, g a ∂(μ.withDensity f) = ∫⁻ a, f a * g a ∂μ := by
have h_ac : μ.withDensity f ≪ μ := by exact withDensity_absolutelyContinuous μ f
have hg' : AEMeasurable g (μ.withDensity f) := AEMeasurable.mono_ac hg h_ac
exact MeasureTheory.lintegral_withDensity_eq_lintegral_mul₀' hf.aemeasurable hg'
lemma Measure.withDensity_absolutelyContinuous {α : Type*} {m : MeasurableSpace α} (μ : Measure α) (f : α → ℝ≥0∞) :
μ.withDensity f ≪ μ := by
refine Measure.AbsolutelyContinuous.mk fun s hs₁ hs₂ => ?_
rw [withDensity_apply _ hs₁]
exact setLIntegral_measure_zero s f hs₂
-- Section: Posterior Distribution and Estimators
lemma lintegral_univ {α : Type*} [MeasurableSpace α] (μ : Measure α) (f : α → ℝ≥0∞) :
∫⁻ x in Set.univ, f x ∂μ = ∫⁻ x, f x ∂μ := by
rw [← lintegral_indicator, Set.indicator_univ]
simp only [MeasurableSet.univ]
/-- The *posterior distribution* `π(θ|x)`, defined as a `ProbabilityMeasure`.
This definition returns `none` if the
normalizing constant (the marginal) is zero, infinite, or NaN. (RC Equation 1.3.1). -/
noncomputable def posterior (model : BayesianModel α Θ) (μ : Measure α) [SFinite μ] (x : α) :
Option (ProbabilityMeasure Θ) :=
let unnorm_post_measure := unnormalizedPosterior model μ x
let marg_likelihood := marginal model μ x
if h_marg : marg_likelihood ≠ 0 ∧ marg_likelihood ≠ ∞ then
let norm_post_measure := marg_likelihood⁻¹ • unnorm_post_measure
let h_is_prob : IsProbabilityMeasure norm_post_measure := by
constructor
rw [Measure.smul_apply]
have h1 : unnorm_post_measure = unnormalizedPosterior model μ x := rfl
rw [h1]
have h2 : unnormalizedPosterior model μ x = model.prior.withDensity (likelihood model.P μ x) := rfl
rw [h2]
rw [withDensity_apply (likelihood model.P μ x) MeasurableSet.univ]
rw [lintegral_univ]
exact ENNReal.inv_mul_cancel h_marg.1 h_marg.2
some ⟨norm_post_measure, h_is_prob⟩
else
none
omit [TopologicalSpace Θ] [BorelSpace Θ] [TopologicalSpace α] [PolishSpace α] in
lemma posterior_eq_some_iff (model : BayesianModel α Θ) (μ : Measure α) [SFinite μ] (x : α)
(p : ProbabilityMeasure Θ) :
posterior model μ x = some p ↔
(marginal model μ x).toReal ≠ 0 ∧
p.toMeasure = (marginal model μ x)⁻¹ • (unnormalizedPosterior model μ x) := by
let marg_likelihood := marginal model μ x
let unnorm_post_measure := unnormalizedPosterior model μ x
simp only [posterior, ENNReal.toReal_ne_zero]
by_cases h_marg_nz_finite : marg_likelihood ≠ 0 ∧ marg_likelihood ≠ ∞
· simp_all only [ne_eq, not_false_eq_true, and_self, ↓reduceDIte, Option.some.injEq, true_and, marg_likelihood]
obtain ⟨left, right⟩ := h_marg_nz_finite
obtain ⟨left_1, right_1⟩ := h_marg_nz_finite
apply Iff.intro
· intro a
subst a
simp_all only [ProbabilityMeasure.coe_mk]
· intro a
ext s a_1 : 1
simp_all only [ProbabilityMeasure.coe_mk, Measure.smul_apply, smul_eq_mul]
· simp_all only [ne_eq, not_and, Decidable.not_not, not_false_eq_true, ENNReal.top_ne_zero, ENNReal.inv_top,
zero_smul, Option.dite_none_right_eq_some, Option.some.injEq, isEmpty_Prop, not_true_eq_false, implies_true,
IsEmpty.exists_iff, false_iff, IsEmpty.forall_iff, marg_likelihood]
Matteo Cipollina (Jul 22 2025 at 18:01):
/-- A Bayes estimator for a function `h : Θ → E` under squared error loss is its posterior mean.
The function returns `none` if the posterior is not a well-defined probability measure
or if `h` is not integrable with respect to the posterior.
-/
noncomputable def bayesEstimator' {E : Type*}
[NormedAddCommGroup E] [NormedSpace ℝ E] [CompleteSpace E]
(model : BayesianModel α Θ) (μ : Measure α) [SFinite μ] (x : α) (h : Θ → E)
(h_integrable : ∀ (p : ProbabilityMeasure Θ),
(posterior model μ x) = some p → Integrable h p.toMeasure) :
Option E :=
match h_posterior_eq_some : posterior model μ x with
| none => none
| some post =>
have : Integrable h post.toMeasure := h_integrable post h_posterior_eq_some
some (MeasureTheory.integral post.toMeasure h)
/-- A **Bayes estimator** for a function `h : Θ → E` under squared error loss is its posterior mean.
This version is more realistic as it derives integrability from a condition
on the prior and likelihood, which are known inputs. It returns `none` if the posterior is not
a well-defined probability measure. -/
noncomputable def bayesEstimator {E : Type*} [NormedAddCommGroup E] [NormedSpace ℝ E] [CompleteSpace E]
(model : BayesianModel α Θ) (μ : Measure α) [SFinite μ] (x : α) (h : Θ → E)
(h_aestrongly_measurable : AEStronglyMeasurable h model.prior)
(h_fin_int : HasFiniteIntegral (fun θ ↦ ENNReal.ofReal ‖h θ‖ * likelihood model.P μ x θ) model.prior) :
Option E :=
match h_posterior_eq_some : posterior model μ x with
| none => none
| some post =>
have h_integrable_post : Integrable h (post : Measure Θ) := by
let marg := marginal model μ x
have h_marg_props : marg ≠ 0 ∧ marg ≠ ∞ := by
rw [posterior] at h_posterior_eq_some
split_ifs at h_posterior_eq_some with h_marg_cond
exact h_marg_cond
constructor
· let unnorm_post := unnormalizedPosterior model μ x
have h_ac_post_unnorm : (post : Measure Θ) ≪ unnorm_post := by
rw [((posterior_eq_some_iff _ _ _ _).mp h_posterior_eq_some).2]
exact Measure.smul_absolutelyContinuous
have h_ac_unnorm_prior : unnorm_post ≪ model.prior := by
dsimp [unnormalizedPosterior]
exact Measure.withDensity_absolutelyContinuous model.prior (likelihood model.P μ x)
exact h_aestrongly_measurable.mono_ac (Measure.AbsolutelyContinuous.trans h_ac_post_unnorm h_ac_unnorm_prior)
· have h_post_eq := ((posterior_eq_some_iff _ _ _ _).mp h_posterior_eq_some).2
rw [h_post_eq]
have h_marg_inv_fin : (marginal model μ x)⁻¹ ≠ ∞ := by
exact ENNReal.inv_ne_top.mpr h_marg_props.1
refine HasFiniteIntegral.smul_measure ?_ h_marg_inv_fin
dsimp [unnormalizedPosterior, HasFiniteIntegral]
conv_lhs =>
rw [lintegral_withDensity_eq_lintegral_mul _ (likelihood_measurable model.P μ x) h_aestrongly_measurable.enorm]
rw [lintegral_congr_ae (ae_of_all _ (fun θ ↦ by rw [mul_comm]; ))]
simp_all only [ofReal_norm, ne_eq, ENNReal.inv_eq_top, not_false_eq_true, marg]
obtain ⟨left, right⟩ := h_marg_props
exact h_fin_int
some (MeasureTheory.integral (post : Measure Θ) h)
end MonteCarlo
Rémy Degenne (Jul 22 2025 at 18:25):
We also have a posterior kernel in Mathlib: ProbabilityTheory.posterior
Willem vanhulle (Jul 23 2025 at 19:13):
Matteo Cipollina zei:
Willem vanhulle ha scritto:
Has anyone written code for doing Bayesian inference or statistics more easily? I have just finished a short example but wonder if others have tried before?
If it can be useful to you, I have made a prototype some time ago, tentatively modeled on the section in Roberts, Casella 'Monte Carlo Statistical Methods'. Have some work in progress prototypes for accept/reject and other initial concepts in the book.
import Mathlib.MeasureTheory.Measure.ProbabilityMeasure import Mathlib.Probability.Kernel.RadonNikodym /-! # MCMC Methods 1. `StatisticalModel`**: A statistical model `P(θ)` is defined as a `ProbabilityTheory.Kernel Θ α`. This is the natural object for parameterized measures and allows defining Markov chains via kernel composition (`k₁ ∘ₖ k₂`) and iteration (`k ^ n`). 2. A `BayesianModel` bundles the statistical model (the kernel) with a prior measure, creating a self-contained object for Bayesian inference. 3. We prove that the likelihood function is measurable under `PolishSpace` on the sample space. -/ namespace MonteCarlo open MeasureTheory ProbabilityTheory Filter open scoped ENNReal Topology -- Section: Bayesian Model -- We use `PolishSpace` for the sample space `α` as it guarantees the existence of regular -- conditional distributions and that key functions (like the Radon-Nikodym derivative) are measurable. variable {α Θ : Type*} [MeasurableSpace Θ] [TopologicalSpace Θ] [BorelSpace Θ] {mα : MeasurableSpace α} [TopologicalSpace α] [PolishSpace α] [MeasurableSpace.CountableOrCountablyGenerated Θ α] /-- A **statistical model** is a Markov kernel from the parameter space `Θ` to the sample space `α`. This defines a family of probability measures `θ ↦ P(·|θ)`. -/ def StatisticalModel (α Θ : Type*) [MeasurableSpace Θ] [MeasurableSpace α] := Kernel Θ α /-- A **Bayesian model** combines a statistical model with a prior probability measure on the parameter space. It represents the setup for Bayesian inference before any data is observed. We require the prior to be a probability measure for most applications. -/ structure BayesianModel (α Θ : Type*) [MeasurableSpace α] [MeasurableSpace Θ] where P : StatisticalModel α Θ prior : Measure Θ [is_prior_prob : IsProbabilityMeasure prior] instance {model : BayesianModel α Θ} : IsProbabilityMeasure model.prior := model.is_prior_prob -- Section: Likelihood and Marginal variable {μ : Measure α} [SFinite μ] /-- The **likelihood function** `L(θ|x)` for a given observation `x : α`. This uses the kernel Radon-Nikodym derivative that gives joint measurability, which is `μ`-a.e. equal to the pointwise `(P θ).rnDeriv μ x`. -/ noncomputable def likelihood (P : StatisticalModel α Θ) (μ : Measure α) (x : α) : Θ → ℝ≥0∞ := fun θ ↦ Kernel.rnDeriv P (Kernel.const Θ μ) θ x /-- If the uncurried function `(a, b) ↦ f a b` is measurable, then for any fixed `b`, the function `a ↦ f a b` is measurable. This allows deriving the measurability of partially applied functions. -/ lemma measurable_of_uncurry_left {α β γ : Type*} {mα : MeasurableSpace α} {mβ : MeasurableSpace β} {mγ : MeasurableSpace γ} (f : α → β → γ) (hf : Measurable (Function.uncurry f)) (b : β) : Measurable (fun a => f a b) := by exact Measurable.of_uncurry_right hf omit [TopologicalSpace Θ] [BorelSpace Θ] [TopologicalSpace α] [PolishSpace α] in /-- The likelihood function, seen as a function of `θ`, is measurable. -/ lemma likelihood_measurable (P : StatisticalModel α Θ) (μ : Measure α) [SFinite μ] (x : α) : Measurable (likelihood P μ x) := by unfold likelihood let F_uncurried := Kernel.measurable_rnDeriv P (Kernel.const Θ μ) exact F_uncurried.comp (measurable_id.prodMk (measurable_const : Measurable (fun (_:Θ) ↦ x))) /-- The *unnormalized posterior measure*, `L(θ|x) dπ(θ)`. This measure is not necessarily finite. -/ noncomputable def unnormalizedPosterior (model : BayesianModel α Θ) (μ : Measure α) (x : α) : Measure Θ := model.prior.withDensity (likelihood model.P μ x) /-- The *marginal likelihood* (or evidence) `m(x)`, which is the normalizing constant of the posterior. (RC Equation 1.3.1). It is the integral of the likelihood over the prior. -/ noncomputable def marginal (model : BayesianModel α Θ) (μ : Measure α) (x : α) : ℝ≥0∞ := ∫⁻ θ, likelihood model.P μ x θ ∂model.prior omit [TopologicalSpace α] [PolishSpace α] in lemma lintegral_withDensity_eq_lintegral_mul {m : MeasurableSpace α} (μ : Measure α) {f g : α → ℝ≥0∞} (hf : Measurable f) (hg : AEMeasurable g μ) : ∫⁻ a, g a ∂(μ.withDensity f) = ∫⁻ a, f a * g a ∂μ := by have h_ac : μ.withDensity f ≪ μ := by exact withDensity_absolutelyContinuous μ f have hg' : AEMeasurable g (μ.withDensity f) := AEMeasurable.mono_ac hg h_ac exact MeasureTheory.lintegral_withDensity_eq_lintegral_mul₀' hf.aemeasurable hg' lemma Measure.withDensity_absolutelyContinuous {α : Type*} {m : MeasurableSpace α} (μ : Measure α) (f : α → ℝ≥0∞) : μ.withDensity f ≪ μ := by refine Measure.AbsolutelyContinuous.mk fun s hs₁ hs₂ => ?_ rw [withDensity_apply _ hs₁] exact setLIntegral_measure_zero s f hs₂ -- Section: Posterior Distribution and Estimators lemma lintegral_univ {α : Type*} [MeasurableSpace α] (μ : Measure α) (f : α → ℝ≥0∞) : ∫⁻ x in Set.univ, f x ∂μ = ∫⁻ x, f x ∂μ := by rw [← lintegral_indicator, Set.indicator_univ] simp only [MeasurableSet.univ] /-- The *posterior distribution* `π(θ|x)`, defined as a `ProbabilityMeasure`. This definition returns `none` if the normalizing constant (the marginal) is zero, infinite, or NaN. (RC Equation 1.3.1). -/ noncomputable def posterior (model : BayesianModel α Θ) (μ : Measure α) [SFinite μ] (x : α) : Option (ProbabilityMeasure Θ) := let unnorm_post_measure := unnormalizedPosterior model μ x let marg_likelihood := marginal model μ x if h_marg : marg_likelihood ≠ 0 ∧ marg_likelihood ≠ ∞ then let norm_post_measure := marg_likelihood⁻¹ • unnorm_post_measure let h_is_prob : IsProbabilityMeasure norm_post_measure := by constructor rw [Measure.smul_apply] have h1 : unnorm_post_measure = unnormalizedPosterior model μ x := rfl rw [h1] have h2 : unnormalizedPosterior model μ x = model.prior.withDensity (likelihood model.P μ x) := rfl rw [h2] rw [withDensity_apply (likelihood model.P μ x) MeasurableSet.univ] rw [lintegral_univ] exact ENNReal.inv_mul_cancel h_marg.1 h_marg.2 some ⟨norm_post_measure, h_is_prob⟩ else none omit [TopologicalSpace Θ] [BorelSpace Θ] [TopologicalSpace α] [PolishSpace α] in lemma posterior_eq_some_iff (model : BayesianModel α Θ) (μ : Measure α) [SFinite μ] (x : α) (p : ProbabilityMeasure Θ) : posterior model μ x = some p ↔ (marginal model μ x).toReal ≠ 0 ∧ p.toMeasure = (marginal model μ x)⁻¹ • (unnormalizedPosterior model μ x) := by let marg_likelihood := marginal model μ x let unnorm_post_measure := unnormalizedPosterior model μ x simp only [posterior, ENNReal.toReal_ne_zero] by_cases h_marg_nz_finite : marg_likelihood ≠ 0 ∧ marg_likelihood ≠ ∞ · simp_all only [ne_eq, not_false_eq_true, and_self, ↓reduceDIte, Option.some.injEq, true_and, marg_likelihood] obtain ⟨left, right⟩ := h_marg_nz_finite obtain ⟨left_1, right_1⟩ := h_marg_nz_finite apply Iff.intro · intro a subst a simp_all only [ProbabilityMeasure.coe_mk] · intro a ext s a_1 : 1 simp_all only [ProbabilityMeasure.coe_mk, Measure.smul_apply, smul_eq_mul] · simp_all only [ne_eq, not_and, Decidable.not_not, not_false_eq_true, ENNReal.top_ne_zero, ENNReal.inv_top, zero_smul, Option.dite_none_right_eq_some, Option.some.injEq, isEmpty_Prop, not_true_eq_false, implies_true, IsEmpty.exists_iff, false_iff, IsEmpty.forall_iff, marg_likelihood]
Great thanks! I will have a look at it. Have you read this master thesis?
https://escholarship.org/uc/item/8hb1w6js
Formalizing the Beginnings of Bayesian Probability Theory in the Lean Theorem Prover
It seems to talk about foundations but not that many concrete examples of inference.
Last updated: Dec 20 2025 at 21:32 UTC