Zulip Chat Archive

Stream: Is there code for X?

Topic: Bayesian statistics


Willem vanhulle (Jun 30 2025 at 18:51):

Has anyone written code for doing Bayesian inference or statistics more easily? I have just finished a short example but wonder if others have tried before?

Matteo Cipollina (Jul 22 2025 at 18:01):

Willem vanhulle wrote:

Has anyone written code for doing Bayesian inference or statistics more easily? I have just finished a short example but wonder if others have tried before?

If it is useful to you, I made a prototype some time ago, tentatively modeled on a section of Robert and Casella's 'Monte Carlo Statistical Methods'. I also have some work-in-progress prototypes for accept/reject and other early concepts from the book.

import Mathlib.MeasureTheory.Measure.ProbabilityMeasure
import Mathlib.Probability.Kernel.RadonNikodym

/-!
# MCMC Methods

1.  `StatisticalModel`: A statistical model `P(θ)` is defined as a `ProbabilityTheory.Kernel Θ α`.
    This is the natural object for parameterized measures and allows defining Markov chains
    via kernel composition (`k₁ ∘ₖ k₂`) and iteration (`k ^ n`).
2.  A `BayesianModel` bundles the statistical model (the kernel) with a prior
    measure, creating a self-contained object for Bayesian inference.
3.  We prove that the likelihood function is measurable under a `PolishSpace` assumption on the sample space.
-/

namespace MonteCarlo

open MeasureTheory ProbabilityTheory Filter
open scoped ENNReal Topology

-- Section: Bayesian Model

-- We use `PolishSpace` for the sample space `α` as it guarantees the existence of regular
-- conditional distributions and that key functions (like the Radon-Nikodym derivative) are measurable.
variable {α Θ : Type*} [MeasurableSpace Θ] [TopologicalSpace Θ] [BorelSpace Θ]
  {mα : MeasurableSpace α} [TopologicalSpace α] [PolishSpace α]
  [MeasurableSpace.CountableOrCountablyGenerated Θ α]

/-- A **statistical model** is a Markov kernel from the parameter space `Θ` to the
sample space `α`. This defines a family of probability measures `θ ↦ P(·|θ)`. -/
def StatisticalModel (α Θ : Type*) [MeasurableSpace Θ] [MeasurableSpace α] :=
  Kernel Θ α
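
-- Since `StatisticalModel` is definitionally a `Kernel`, Mathlib's kernel operations
-- apply to it directly. A hedged sketch of the composition `k₁ ∘ₖ k₂` mentioned in
-- the module docstring, for two models on the parameter space:
example (k₁ k₂ : StatisticalModel Θ Θ) : StatisticalModel Θ Θ := k₁ ∘ₖ k₂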

/-- A **Bayesian model** combines a statistical model with a prior probability measure on the
parameter space. It represents the setup for Bayesian inference before any data is observed.
We require the prior to be a probability measure for most applications. -/
structure BayesianModel (α Θ : Type*) [MeasurableSpace α] [MeasurableSpace Θ] where
  P : StatisticalModel α Θ
  prior : Measure Θ
  [is_prior_prob : IsProbabilityMeasure prior]

instance {model : BayesianModel α Θ} : IsProbabilityMeasure model.prior := model.is_prior_prob
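
-- A toy construction (a sketch; the measures `ν` and `π` are hypothetical inputs,
-- not part of the development above): a `BayesianModel` whose statistical model
-- ignores the parameter entirely, via the constant kernel `Kernel.const Θ ν`.
example (ν : Measure α) [IsProbabilityMeasure ν] (π : Measure Θ) [IsProbabilityMeasure π] :
    BayesianModel α Θ :=
  { P := Kernel.const Θ ν, prior := π }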

-- Section: Likelihood and Marginal

variable {μ : Measure α} [SFinite μ]

/-- The **likelihood function** `L(θ|x)` for a given observation `x : α`.
This uses the kernel Radon-Nikodym derivative that gives joint measurability, which is
`μ`-a.e. equal to the pointwise `(P θ).rnDeriv μ x`. -/
noncomputable def likelihood (P : StatisticalModel α Θ) (μ : Measure α) (x : α) : Θ → ℝ≥0∞ :=
  fun θ ↦ Kernel.rnDeriv P (Kernel.const Θ μ) θ x

/-- If the uncurried function `(a, b) ↦ f a b` is measurable, then for any fixed `b`,
the function `a ↦ f a b` is measurable. This allows deriving the measurability of
partially applied functions. -/
lemma measurable_of_uncurry_left {α β γ : Type*}
    {mα : MeasurableSpace α} {mβ : MeasurableSpace β} {mγ : MeasurableSpace γ}
    (f : α → β → γ) (hf : Measurable (Function.uncurry f)) (b : β) :
    Measurable (fun a => f a b) := by
  exact Measurable.of_uncurry_right hf

omit [TopologicalSpace Θ] [BorelSpace Θ] [TopologicalSpace α] [PolishSpace α] in
/-- The likelihood function, seen as a function of `θ`, is measurable. -/
lemma likelihood_measurable (P : StatisticalModel α Θ) (μ : Measure α) [SFinite μ] (x : α) :
    Measurable (likelihood P μ x) := by
  unfold likelihood
  let F_uncurried := Kernel.measurable_rnDeriv P (Kernel.const Θ μ)
  exact F_uncurried.comp (measurable_id.prodMk (measurable_const : Measurable (fun (_ : Θ) ↦ x)))

/-- The *unnormalized posterior measure*, `L(θ|x) dπ(θ)`. This measure is not necessarily finite. -/
noncomputable def unnormalizedPosterior (model : BayesianModel α Θ) (μ : Measure α) (x : α) : Measure Θ :=
  model.prior.withDensity (likelihood model.P μ x)

/-- The *marginal likelihood* (or evidence) `m(x)`, which is the normalizing constant of the posterior.
(RC Equation 1.3.1). It is the integral of the likelihood over the prior. -/
noncomputable def marginal (model : BayesianModel α Θ) (μ : Measure α) (x : α) : ℝ≥0∞ :=
  ∫⁻ θ, likelihood model.P μ x θ ∂model.prior
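
-- A hedged sanity check (assuming Mathlib's `withDensity_apply` and `setLIntegral_univ`
-- keep their current signatures): the total mass of the unnormalized posterior is
-- exactly the marginal likelihood `m(x)`.
example (model : BayesianModel α Θ) (x : α) :
    unnormalizedPosterior model μ x Set.univ = marginal model μ x := by
  unfold unnormalizedPosterior marginal
  rw [withDensity_apply _ MeasurableSet.univ, setLIntegral_univ]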

omit [TopologicalSpace α] [PolishSpace α] in
lemma lintegral_withDensity_eq_lintegral_mul {m : MeasurableSpace α} (μ : Measure α) {f g : α → ℝ≥0∞}
    (hf : Measurable f) (hg : AEMeasurable g μ) :
    ∫⁻ a, g a ∂(μ.withDensity f) = ∫⁻ a, f a * g a ∂μ := by
  have h_ac : μ.withDensity f ≪ μ := withDensity_absolutelyContinuous μ f
  have hg' : AEMeasurable g (μ.withDensity f) := AEMeasurable.mono_ac hg h_ac
  exact MeasureTheory.lintegral_withDensity_eq_lintegral_mul₀' hf.aemeasurable hg'

lemma Measure.withDensity_absolutelyContinuous {α : Type*} {m : MeasurableSpace α} (μ : Measure α) (f : α → ℝ≥0∞) :
    μ.withDensity f ≪ μ := by
  refine Measure.AbsolutelyContinuous.mk fun s hs₁ hs₂ => ?_
  rw [withDensity_apply _ hs₁]
  exact setLIntegral_measure_zero s f hs₂

-- Section: Posterior Distribution and Estimators

lemma lintegral_univ {α : Type*} [MeasurableSpace α] (μ : Measure α) (f : α → ℝ≥0∞) :
    ∫⁻ x in Set.univ, f x ∂μ = ∫⁻ x, f x ∂μ := by
  rw [← lintegral_indicator, Set.indicator_univ]
  simp only [MeasurableSet.univ]

/-- The *posterior distribution* `π(θ|x)`, defined as a `ProbabilityMeasure`.
This definition returns `none` if the normalizing constant (the marginal) is zero or
infinite. (RC Equation 1.3.1). -/
noncomputable def posterior (model : BayesianModel α Θ) (μ : Measure α) [SFinite μ] (x : α) :
    Option (ProbabilityMeasure Θ) :=
  let unnorm_post_measure := unnormalizedPosterior model μ x
  let marg_likelihood := marginal model μ x
  if h_marg : marg_likelihood ≠ 0 ∧ marg_likelihood ≠ ∞ then
    let norm_post_measure := marg_likelihood⁻¹ • unnorm_post_measure
    have h_is_prob : IsProbabilityMeasure norm_post_measure := by
      constructor
      rw [Measure.smul_apply]
      have h1 : unnorm_post_measure = unnormalizedPosterior model μ x := rfl
      rw [h1]
      have h2 : unnormalizedPosterior model μ x = model.prior.withDensity (likelihood model.P μ x) := rfl
      rw [h2]
      rw [withDensity_apply (likelihood model.P μ x) MeasurableSet.univ]
      rw [lintegral_univ]
      exact ENNReal.inv_mul_cancel h_marg.1 h_marg.2
    some ⟨norm_post_measure, h_is_prob⟩
  else
    none

omit [TopologicalSpace Θ] [BorelSpace Θ] [TopologicalSpace α] [PolishSpace α] in
lemma posterior_eq_some_iff (model : BayesianModel α Θ) (μ : Measure α) [SFinite μ] (x : α)
    (p : ProbabilityMeasure Θ) :
    posterior model μ x = some p ↔
      (marginal model μ x).toReal ≠ 0 ∧
      p.toMeasure = (marginal model μ x)⁻¹ • (unnormalizedPosterior model μ x) := by
  let marg_likelihood := marginal model μ x
  let unnorm_post_measure := unnormalizedPosterior model μ x
  simp only [posterior, ENNReal.toReal_ne_zero]
  by_cases h_marg_nz_finite : marg_likelihood ≠ 0 ∧ marg_likelihood ≠ ∞
  · simp_all only [ne_eq, not_false_eq_true, and_self, reduceDIte, Option.some.injEq, true_and, marg_likelihood]
    obtain ⟨left, right⟩ := h_marg_nz_finite
    apply Iff.intro
    · intro a
      subst a
      simp_all only [ProbabilityMeasure.coe_mk]
    · intro a
      ext s a_1 : 1
      simp_all only [ProbabilityMeasure.coe_mk, Measure.smul_apply, smul_eq_mul]
  · simp_all only [ne_eq, not_and, Decidable.not_not, not_false_eq_true, ENNReal.top_ne_zero, ENNReal.inv_top,
      zero_smul, Option.dite_none_right_eq_some, Option.some.injEq, isEmpty_Prop, not_true_eq_false, implies_true,
      IsEmpty.exists_iff, false_iff, IsEmpty.forall_iff, marg_likelihood]

Matteo Cipollina (Jul 22 2025 at 18:01):

/-- A Bayes estimator for a function `h : Θ → E` under squared error loss is its posterior mean.
The function returns `none` if the posterior is not a well-defined probability measure
or if `h` is not integrable with respect to the posterior.
-/
noncomputable def bayesEstimator' {E : Type*}
  [NormedAddCommGroup E] [NormedSpace ℝ E] [CompleteSpace E]
  (model : BayesianModel α Θ) (μ : Measure α) [SFinite μ] (x : α) (h : Θ → E)
  (h_integrable : ∀ (p : ProbabilityMeasure Θ),
    (posterior model μ x) = some p → Integrable h p.toMeasure) :
  Option E :=
match h_posterior_eq_some : posterior model μ x with
| none => none
| some post =>
  have : Integrable h post.toMeasure := h_integrable post h_posterior_eq_some
  some (MeasureTheory.integral post.toMeasure h)
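
-- Usage sketch: `bayesEstimator'` is consumed like any `Option`-valued definition.
-- Here `model`, `x`, `h`, and the integrability hypothesis `hint` are hypothetical
-- inputs introduced only to show the call shape.
example {E : Type*} [NormedAddCommGroup E] [NormedSpace ℝ E] [CompleteSpace E]
    (model : BayesianModel α Θ) (x : α) (h : Θ → E)
    (hint : ∀ p : ProbabilityMeasure Θ,
      posterior model μ x = some p → Integrable h p.toMeasure) :
    Option E :=
  bayesEstimator' model μ x h hint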

/-- A **Bayes estimator** for a function `h : Θ → E` under squared error loss is its posterior mean.
This version is more realistic as it derives integrability from a condition
on the prior and likelihood, which are known inputs. It returns `none` if the posterior is not
a well-defined probability measure. -/
noncomputable def bayesEstimator {E : Type*} [NormedAddCommGroup E] [NormedSpace ℝ E] [CompleteSpace E]
    (model : BayesianModel α Θ) (μ : Measure α) [SFinite μ] (x : α) (h : Θ → E)
    (h_aestrongly_measurable : AEStronglyMeasurable h model.prior)
    (h_fin_int : HasFiniteIntegral (fun θ ↦ ENNReal.ofReal ‖h θ‖ * likelihood model.P μ x θ) model.prior) :
    Option E :=
  match h_posterior_eq_some : posterior model μ x with
  | none => none
  | some post =>
    have h_integrable_post : Integrable h (post : Measure Θ) := by
      let marg := marginal model μ x
      have h_marg_props : marg ≠ 0 ∧ marg ≠ ∞ := by
        rw [posterior] at h_posterior_eq_some
        split_ifs at h_posterior_eq_some with h_marg_cond
        exact h_marg_cond
      constructor
      · let unnorm_post := unnormalizedPosterior model μ x
        have h_ac_post_unnorm : (post : Measure Θ) ≪ unnorm_post := by
          rw [((posterior_eq_some_iff _ _ _ _).mp h_posterior_eq_some).2]
          exact Measure.smul_absolutelyContinuous
        have h_ac_unnorm_prior : unnorm_post ≪ model.prior := by
          dsimp [unnormalizedPosterior]
          exact Measure.withDensity_absolutelyContinuous model.prior (likelihood model.P μ x)
        exact h_aestrongly_measurable.mono_ac (Measure.AbsolutelyContinuous.trans h_ac_post_unnorm h_ac_unnorm_prior)
      · have h_post_eq := ((posterior_eq_some_iff _ _ _ _).mp h_posterior_eq_some).2
        rw [h_post_eq]
        have h_marg_inv_fin : (marginal model μ x)⁻¹ ≠ ∞ :=
          ENNReal.inv_ne_top.mpr h_marg_props.1
        refine HasFiniteIntegral.smul_measure ?_ h_marg_inv_fin
        dsimp [unnormalizedPosterior, HasFiniteIntegral]
        conv_lhs =>
          rw [lintegral_withDensity_eq_lintegral_mul _ (likelihood_measurable model.P μ x) h_aestrongly_measurable.enorm]
        rw [lintegral_congr_ae (ae_of_all _ (fun θ ↦ by rw [mul_comm]))]
        simp_all only [ofReal_norm, ne_eq, ENNReal.inv_eq_top, not_false_eq_true, marg]
        obtain ⟨left, right⟩ := h_marg_props
        exact h_fin_int
    some (MeasureTheory.integral (post : Measure Θ) h)

end MonteCarlo

Rémy Degenne (Jul 22 2025 at 18:25):

We also have a posterior kernel in Mathlib: ProbabilityTheory.posterior
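
For reference, the declaration can be inspected with a `#check` (a minimal sketch; it assumes a blanket import of Mathlib rather than the precise module name):

import Mathlib

-- Prints the signature of the posterior kernel mentioned above.
#check @ProbabilityTheory.posterior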

Willem vanhulle (Jul 23 2025 at 19:13):

Matteo Cipollina said:

Willem vanhulle wrote:

Has anyone written code for doing Bayesian inference or statistics more easily? I have just finished a short example but wonder if others have tried before?

If it is useful to you, I made a prototype some time ago, tentatively modeled on a section of Robert and Casella's 'Monte Carlo Statistical Methods'. I also have some work-in-progress prototypes for accept/reject and other early concepts from the book.

Great, thanks! I will have a look at it. Have you read this master's thesis?
https://escholarship.org/uc/item/8hb1w6js
Formalizing the Beginnings of Bayesian Probability Theory in the Lean Theorem Prover

It seems to cover the foundations but not many concrete examples of inference.


Last updated: Dec 20 2025 at 21:32 UTC