Zulip Chat Archive
Stream: mathlib4
Topic: Discrete Probability Proposal
David Ledvinka (Sep 26 2025 at 22:50):
I have been thinking lately about how to formulate discrete probability in mathlib so that working with concrete distributions (with parameters) is not too painful, without sacrificing generality or failing to fit into the broader probability theory API. After trying a few approaches, I have found that a linear combination of Dirac distributions is the nicest representation to work with. Thus I'd propose the following format for concrete discrete distributions in mathlib (either defined in this way, or with lemmas giving this representation where possible). This would replace the current PMF approach:
Say we want to define a discrete distribution X supported on the points 0, …, n − 1, as a distribution on both Fin n and ℝ. (The approach works more generally, but this is just easier for explanation.)
1. We have a "PMF": X_PMF_Real : Fin n → ℝ.
2. We then define the "PMF" on ENNReal by composing the real PMF with ENNReal.ofReal.

   Note that up until this point this mirrors how the probability distributions with pdfs with respect to Lebesgue measure are currently defined in mathlib. However, I found that using the PMF to define the coefficients of the Dirac distributions is easier to work with in practice than mirroring the rest of the approach by using count.withDensity. Also note that the point of steps 1 and 2 is to make it easier to compute with the Bochner integral. If you want to define a distribution that can take the value infinity, you can skip step 1, define a PMF directly into ENNReal, and compute with the Lebesgue integral just fine.

3. Define the measure on Fin n as the linear combination of Diracs with coefficients given by the PMF.
4. Define the measure on ℝ as the pushforward along the injection of Fin n into ℝ (which is shown to be equivalent to summing the Diracs of the pushed-forward points).
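In symbols (my shorthand, with p : Fin n → ℝ the PMF from step 1, and writing i also for its image in ℝ), the measure from steps 3 and 4 and the resulting integral identity are
$$\mu \;=\; \sum_{i=0}^{n-1} \operatorname{ofReal}(p\,i)\,\delta_i, \qquad \int_{\mathbb{R}} g \, d\mu \;=\; \sum_{i=0}^{n-1} p\,i \cdot g(i).$$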
With this approach and just a little API, one can easily reduce calculations about these distributions to discrete sums. Here is a "prototype" (not intended to be directly PRed to mathlib, just testing the idea):
David Ledvinka (Sep 26 2025 at 22:50):
import Mathlib
open MeasureTheory ProbabilityTheory Measure Function Complex
open scoped ENNReal NNReal
/- PR #29959 ----------------------/
section Existence
variable {𝓧 : Type*} {m𝓧 : MeasurableSpace 𝓧} {μ : Measure 𝓧}
universe u v
lemma exists_hasLaw_indepFun {ι : Type v} (𝓧 : ι → Type u)
{m𝓧 : ∀ i, MeasurableSpace (𝓧 i)} (μ : (i : ι) → Measure (𝓧 i))
[hμ : ∀ i, IsProbabilityMeasure (μ i)] :
∃ Ω : Type (max u v), ∃ _ : MeasurableSpace Ω, ∃ P : Measure Ω, ∃ X : (i : ι) → Ω → (𝓧 i),
(∀ i, HasLaw (X i) (μ i) P) ∧ (iIndepFun X P) := by
use Π i, (𝓧 i), .pi, infinitePi μ, fun i ↦ Function.eval i
refine ⟨fun i ↦ MeasurePreserving.hasLaw (measurePreserving_eval_infinitePi _ _), ?_⟩
rw [iIndepFun_iff_map_fun_eq_infinitePi_map (by fun_prop), map_id']
congr
funext i
exact ((measurePreserving_eval_infinitePi μ i).map_eq).symm
lemma exists_iid (ι : Type v) {𝓧 : Type u} {m𝓧 : MeasurableSpace 𝓧}
(μ : Measure 𝓧) [IsProbabilityMeasure μ] :
∃ Ω : Type (max u v), ∃ _ : MeasurableSpace Ω, ∃ P : Measure Ω, ∃ X : ι → Ω → 𝓧,
(∀ i, HasLaw (X i) μ P) ∧ (iIndepFun X P) :=
exists_hasLaw_indepFun (fun _ ↦ 𝓧) (fun _ ↦ μ)
end Existence
section charFun
variable {E : Type*} [MeasurableSpace E] {μ ν : Measure E} {t : E}
[NormedAddCommGroup E] [InnerProductSpace ℝ E] [BorelSpace E] [SecondCountableTopology E]
/- From CLT Project (not my code) -/
lemma charFun_map_sum_pi_const (μ : Measure E) [IsFiniteMeasure μ] (n : ℕ) (t : E) :
charFun ((Measure.pi fun (_ : Fin n) ↦ μ).map fun x ↦ ∑ i, x i) t = charFun μ t ^ n := by
induction n with
| zero => simp [Measure.map_const, charFun_apply]
| succ n ih =>
rw [pow_succ', ← ih, ← charFun_conv]
congr 1
have h := (measurePreserving_piFinSuccAbove (fun (_ : Fin (n + 1)) ↦ μ) 0).map_eq
nth_rw 2 [← μ.map_id]
rw [Measure.conv, Measure.map_prod_map, ← h, Measure.map_map, Measure.map_map]
· congr 1 with x
apply Fin.sum_univ_succ
all_goals { fun_prop }
variable {Ω : Type*} (n : ℕ) {mΩ : MeasurableSpace Ω} {P : Measure Ω} {X : Fin n → Ω → E}
/- Corollary -/
lemma ProbabilityTheory.iIndepFun.iid_charFun (hn : 1 ≤ n) {μ : Measure E}
[hμ : IsProbabilityMeasure μ] (hX : ∀ i, HasLaw (X i) μ P) (hXindep : iIndepFun X P) (t : E) :
charFun (P.map (∑ i, X i)) t = charFun μ t ^ n := by
have : IsProbabilityMeasure P :=
((hX ⟨0, hn⟩).isProbabilityMeasure_iff).mp hμ
rw [← charFun_map_sum_pi_const]
congr
rw [iIndepFun_iff_map_fun_eq_pi_map (by fun_prop)] at hXindep
conv in μ => rw [← (hX _).map_eq]
rw [← hXindep, AEMeasurable.map_map_of_aemeasurable (by fun_prop) (by fun_prop)]
congr
ext _
simp
end charFun
/- Discrete Probability API -------/
section DiscreteMeasure
variable {Ω : Type*} {mΩ : MeasurableSpace Ω} {P : Measure Ω}
variable {α : Type*} [mα : MeasurableSpace α]
variable {β : Type*} [mβ : MeasurableSpace β]
noncomputable def Function.to_discrete_measure (f : α → ℝ≥0∞) : Measure α :=
sum (fun a ↦ f a • (dirac a))
lemma Function.to_discrete_measure_isProbabilityMeasure {f : α → ℝ≥0∞} (hf : HasSum f 1) :
IsProbabilityMeasure f.to_discrete_measure :=
⟨by simp [Function.to_discrete_measure, hf.tsum_eq]⟩
-- Optimize Measurability Assumptions
lemma Function.to_discrete_measure_map_eq (f : α → ℝ≥0∞) {φ : α → β} (hφ : Measurable φ) :
f.to_discrete_measure.map φ = sum (fun a ↦ f a • (dirac (φ a))) := by
simp [Function.to_discrete_measure, MeasureTheory.Measure.map_sum hφ.aemeasurable,
Measure.map_smul, map_dirac hφ]
section Fintype
variable {α : Type*} [Fintype α]
variable {E : Type*} [NormedAddCommGroup E]
-- Optimize Measurability Assumptions
theorem integrable_linear_combination_dirac_fintype [MeasurableSingletonClass β]
(f : α → ℝ) (φ : α → β) {g : β → E}
(hg : AEStronglyMeasurable g (sum (fun a ↦ (ENNReal.ofReal ∘ f) a • (dirac (φ a))))) :
Integrable g (sum (fun a ↦ (ENNReal.ofReal ∘ f) a • (dirac (φ a)))) := by
refine ⟨hg, ?_⟩
simp [HasFiniteIntegral]
finiteness
-- Optimize Measurability Assumptions
theorem integral_linear_combination_dirac_fintype [MeasurableSingletonClass β]
[NormedSpace ℝ E] [CompleteSpace E]
{f : α → ℝ} (hf : 0 ≤ f) {φ : α → β} {g : β → E}
(hg : AEStronglyMeasurable g (sum (fun a ↦ (ENNReal.ofReal ∘ f) a • (dirac (φ a))))) :
∫ b : β, g b ∂ sum (fun a ↦ (ENNReal.ofReal ∘ f) a • (dirac (φ a)))
= ∑ a : α, f a • g (φ a) := by
rw [integral_sum_measure (integrable_linear_combination_dirac_fintype f φ hg)]
simp [tsum_fintype, fun x ↦ ENNReal.toReal_ofReal (hf x)]
end Fintype
end DiscreteMeasure
David Ledvinka (Sep 26 2025 at 22:50):
namespace ProbabilityTheory
section Bernoulli
/- Bernoulli Measure -/
def bernoulli_PMF_Real (p : ℝ) (i : Fin 2) : ℝ := if i = 1 then p else 1 - p
def bernoulli_PMF_Real_nonneg {p : ℝ} (hp₀ : 0 ≤ p) (hp₁ : p ≤ 1) : 0 ≤ bernoulli_PMF_Real p :=
fun i ↦ by fin_cases i <;> simpa [bernoulli_PMF_Real]
def bernoulli_PMF (p : ℝ) : (Fin 2) → ℝ≥0∞ := ENNReal.ofReal ∘ (bernoulli_PMF_Real p)
noncomputable def fin_bernoulli (p : ℝ) : Measure (Fin 2) :=
(bernoulli_PMF p).to_discrete_measure
lemma HasSum_bernoulli_PMF_one {p : ℝ} (hp₀ : 0 ≤ p) (hp₁ : p ≤ 1) :
HasSum (bernoulli_PMF p) 1 := by
convert hasSum_fintype (bernoulli_PMF p)
have : 1 = ENNReal.ofReal (1 - p) + ENNReal.ofReal p := by
rw [← ENNReal.ofReal_add (by bound) hp₀]
simp
simp [bernoulli_PMF, bernoulli_PMF_Real, this]
theorem isProbabilityMeasure_fin_bernoulli {p : ℝ} (hp₀ : 0 ≤ p) (hp₁ : p ≤ 1) :
IsProbabilityMeasure (fin_bernoulli p) :=
(bernoulli_PMF p).to_discrete_measure_isProbabilityMeasure (HasSum_bernoulli_PMF_one hp₀ hp₁)
noncomputable def real_bernoulli (p : ℝ) : Measure ℝ :=
(fin_bernoulli p).map (↑)
theorem real_bernoulli_def (p : ℝ) :
real_bernoulli p = sum (fun i ↦ (bernoulli_PMF p i) • dirac (i : ℝ)) := by
unfold real_bernoulli fin_bernoulli
rw [(bernoulli_PMF p).to_discrete_measure_map_eq (by fun_prop)]
theorem isProbabilityMeasure_real_bernoulli {p : ℝ} (hp₀ : 0 ≤ p) (hp₁ : p ≤ 1) :
IsProbabilityMeasure (real_bernoulli p) :=
have := isProbabilityMeasure_fin_bernoulli hp₀ hp₁
isProbabilityMeasure_map (by fun_prop (maxTransitionDepth := 2))
theorem real_bernoulli_charFun_eq {p : ℝ} (hp₀ : 0 ≤ p) (hp₁ : p ≤ 1) (t : ℝ) :
charFun (real_bernoulli p) t = (1 - p) + p * exp (t * I) := by
rw [charFun_apply, real_bernoulli_def, bernoulli_PMF,
integral_linear_combination_dirac_fintype (bernoulli_PMF_Real_nonneg hp₀ hp₁) (by fun_prop)]
simp [bernoulli_PMF_Real]
/- Bernoulli Random Variables -/
variable {Ω : Type*} {mΩ : MeasurableSpace Ω} {P : Measure Ω} {X : Ω → ℝ} {p : ℝ}
theorem HasLaw.real_bernoulli_ae_zero_or_one (hX : HasLaw X (real_bernoulli p) P) :
∀ᵐ ω ∂P, X ω = 0 ∨ X ω = 1 := by
change P (X ⁻¹' {0, 1}ᶜ) = 0
rw [← Measure.map_apply₀ hX.aemeasurable (by simp), hX.map_eq,
← lintegral_indicator_one (by measurability), real_bernoulli_def]
simp
theorem HasLaw.real_bernoulli_memLp (hp₀ : 0 ≤ p) (hp₁ : p ≤ 1)
(hX : HasLaw X (real_bernoulli p) P) (q : ℝ≥0∞) :
MemLp X q P := by
have : IsProbabilityMeasure P :=
hX.isProbabilityMeasure_iff.mp (isProbabilityMeasure_real_bernoulli hp₀ hp₁)
apply MemLp.of_bound (by fun_prop (maxTransitionDepth := 2)) 1
filter_upwards [hX.real_bernoulli_ae_zero_or_one] with ω
rintro (h | h) <;> simp [h]
theorem HasLaw.real_bernoulli_integrable (hp₀ : 0 ≤ p) (hp₁ : p ≤ 1)
(hX : HasLaw X (real_bernoulli p) P) :
Integrable X P :=
memLp_one_iff_integrable.mp (hX.real_bernoulli_memLp hp₀ hp₁ 1)
theorem HasLaw.real_bernoulli_moment_eq_p (hp₀ : 0 ≤ p) (hp₁ : p ≤ 1) {n : ℕ} (hn : 1 ≤ n)
(hX : HasLaw X (real_bernoulli p) P) :
moment X n P = p := by
unfold moment
conv in (X ^ n) => change (· ^ n) ∘ X
rw [hX.integral_comp (by fun_prop), real_bernoulli_def, bernoulli_PMF,
integral_linear_combination_dirac_fintype (bernoulli_PMF_Real_nonneg hp₀ hp₁) (by fun_prop)]
simp [bernoulli_PMF_Real]
grind
theorem HasLaw.real_bernoulli_mean (hp₀ : 0 ≤ p) (hp₁ : p ≤ 1)
(hX : HasLaw X (real_bernoulli p) P) :
P[X] = p := by
rw [← moment_one, hX.real_bernoulli_moment_eq_p hp₀ hp₁ (by rfl)]
theorem HasLaw.real_bernoulli_variance (hp₀ : 0 ≤ p) (hp₁ : p ≤ 1)
(hX : HasLaw X (real_bernoulli p) P) :
Var[X; P] = p * (1 - p) := by
have : IsProbabilityMeasure P :=
hX.isProbabilityMeasure_iff.mp (isProbabilityMeasure_real_bernoulli hp₀ hp₁)
rw [variance_eq_sub (hX.real_bernoulli_memLp hp₀ hp₁ 2), ← moment_def,
hX.real_bernoulli_moment_eq_p hp₀ hp₁ (by bound), hX.real_bernoulli_mean hp₀ hp₁]
ring
end Bernoulli
section Binomial
/- Binomial Measure -/
def binomial_PMF_Real (p : ℝ) (n : ℕ) (i : Fin (n + 1)) : ℝ :=
p ^ (i : ℕ) * (1 - p) ^ (n - i) * (n.choose i)
def binomial_PMF_Real_nonneg {p : ℝ} (n : ℕ) (hp₀ : 0 ≤ p) (hp₁ : p ≤ 1) :
0 ≤ binomial_PMF_Real p n := by
intro x
simp [binomial_PMF_Real]
bound
def binomial_PMF (p : ℝ) (n : ℕ) : (Fin (n + 1)) → ℝ≥0∞ := ENNReal.ofReal ∘ (binomial_PMF_Real p n)
noncomputable def fin_binomial (p : ℝ) (n : ℕ) : Measure (Fin (n + 1)) :=
(binomial_PMF p n).to_discrete_measure
lemma HasSum_binomial_PMF_one {p : ℝ} (n : ℕ) (hp₀ : 0 ≤ p) (hp₁ : p ≤ 1) :
HasSum (binomial_PMF p n) 1 := by
convert hasSum_fintype (binomial_PMF p n)
simp only [binomial_PMF, binomial_PMF_Real, comp_apply]
let f (x : ℕ) : ℝ := p ^ x * (1 - p) ^ (n - x) * ↑(n.choose x)
rw [← ENNReal.ofReal_sum_of_nonneg (by bound), Fin.sum_univ_eq_sum_range (f := f), ← add_pow]
simp
theorem isProbabilityMeasure_fin_binomial {p : ℝ} (n : ℕ) (hp₀ : 0 ≤ p) (hp₁ : p ≤ 1) :
IsProbabilityMeasure (fin_binomial p n) :=
(binomial_PMF p n).to_discrete_measure_isProbabilityMeasure (HasSum_binomial_PMF_one n hp₀ hp₁)
noncomputable def real_binomial (p : ℝ) (n : ℕ) : Measure ℝ :=
(fin_binomial p n).map (↑)
theorem real_binomial_def (p : ℝ) (n : ℕ) :
real_binomial p n = sum (fun i ↦ (binomial_PMF p n i) • dirac (i : ℝ)) := by
unfold real_binomial fin_binomial
rw [(binomial_PMF p n).to_discrete_measure_map_eq (by fun_prop)]
theorem isProbabilityMeasure_real_binomial {p : ℝ} (n : ℕ) (hp₀ : 0 ≤ p) (hp₁ : p ≤ 1) :
IsProbabilityMeasure (real_binomial p n) :=
have := isProbabilityMeasure_fin_binomial n hp₀ hp₁
isProbabilityMeasure_map (by fun_prop (maxTransitionDepth := 2))
theorem real_binomial_charFun_eq {p : ℝ} (n : ℕ) (hp₀ : 0 ≤ p) (hp₁ : p ≤ 1) (t : ℝ) :
charFun (real_binomial p n) t = ((1 - p) + p * exp (t * I)) ^ n := by
rw [charFun_apply, real_binomial_def, binomial_PMF,
integral_linear_combination_dirac_fintype (binomial_PMF_Real_nonneg n hp₀ hp₁) (by fun_prop)]
simp [binomial_PMF_Real, add_comm, add_pow, ← Fin.sum_univ_eq_sum_range]
congr
ext i
rw [mul_pow, ← exp_nat_mul, ← mul_assoc, mul_comm (i : ℂ)]
ring
/- Bernoulli Binomial Connection -/
variable {Ω : Type*} {mΩ : MeasurableSpace Ω} {P : Measure Ω}
theorem bernoulli_eq_binomial_one {p : ℝ} :
real_bernoulli p = real_binomial p 1 := by
rw [real_bernoulli_def, real_binomial_def, bernoulli_PMF, binomial_PMF]
congr; ext _; congr; ext i
fin_cases i <;> simp [bernoulli_PMF_Real, binomial_PMF_Real]
theorem iIndepFun.sum_bernoulli {p : ℝ} {n : ℕ} (hp₀ : 0 ≤ p) (hp₁ : p ≤ 1) (hn : 1 ≤ n)
{X : Fin n → Ω → ℝ} (hX : ∀ i, HasLaw (X i) (real_bernoulli p) P) (hXindep : iIndepFun X P) :
HasLaw (∑ i, X i) (real_binomial p n) P where
map_eq := by
have := isProbabilityMeasure_real_bernoulli hp₀ hp₁
have := isProbabilityMeasure_real_binomial n hp₀ hp₁
have : IsProbabilityMeasure P :=
(hX ⟨0, hn⟩).isProbabilityMeasure_iff.mp (isProbabilityMeasure_real_bernoulli hp₀ hp₁)
apply Measure.ext_of_charFun
ext t
rw [hXindep.iid_charFun n hn hX, real_bernoulli_charFun_eq hp₀ hp₁ t,
real_binomial_charFun_eq n hp₀ hp₁ t]
theorem HasLaw.binomial_integral {p : ℝ} {n : ℕ} (hp₀ : 0 ≤ p) (hp₁ : p ≤ 1) (hn : 1 ≤ n)
{X : Ω → ℝ} (hX : HasLaw X (real_binomial p n) P) : P[X] = n * p := by
have := isProbabilityMeasure_real_bernoulli hp₀ hp₁
obtain ⟨Ω, _, P, Y, hY, hIndep⟩ := exists_iid (Fin n) (real_bernoulli p)
rw [hX.integral_eq, ← (hIndep.sum_bernoulli hp₀ hp₁ hn hY).integral_eq]
simp only [Finset.sum_apply]
rw [integral_finset_sum _ (fun i _ ↦ (hY i).real_bernoulli_integrable hp₀ hp₁)]
conv => enter [1, 2, i]; rw [(hY i).real_bernoulli_mean hp₀ hp₁]
simp
end Binomial
end ProbabilityTheory
David Ledvinka (Sep 26 2025 at 22:50):
I haven't tested it with countable distributions yet (I should) but I believe it should all work similarly. Would love to hear any feedback on this approach!
Matteo Cipollina (Sep 27 2025 at 09:58):
Nice idea, I can partly vouch for it, since I've recently used the sum-of-Diracs representation to define measures and kernels from matrices in this file: toKernel. My goal there was much more specific (bridging matrix stationarity with kernel invariance), but I found it was the most effective way I could think of to connect the linear algebra / computational view with mathlib's measure theory API.
Peter Pfaffelhuber (Oct 22 2025 at 07:50):
I have also been thinking about discrete probability (only using PMFs, and struggling with them a bit until now). Some thoughts:
- Your real_bernoulli is a Measure ℝ. The underlying sigma-algebra is therefore the Borel sigma-algebra on ℝ, right? This shouldn't be the case, since it should be a Measure ℝ on the full power set of ℝ.
- A related question might be whether PMFs should be defined as they currently are, as some α → ℝ≥0∞ which sums to 1, or as special (probability) measures, e.g. making use of Function.to_discrete_measure.
- I like your Function.to_discrete_measure, since it can be used for point 2. However, at least when it gives rise to a probability measure, I think it should be a special case of something more general. You can view it as picking the probability measure dirac a with probability f a, and evaluating the expected outcome. More generally, if you have a random probability measure K, this is its expectation E[K]. In your case, K is dirac a with probability f a.
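Spelled out as I read it (my notation): if $K$ is a random probability measure on a probability space $(\Omega, P)$, its expectation is the measure
$$\mathbb{E}[K](S) \;=\; \int_\Omega K_\omega(S)\, \mathrm{d}P(\omega),$$
and Function.to_discrete_measure f (for $f$ summing to 1) is the special case $K = \delta_Y$ with $P(Y = a) = f(a)$, i.e.
$$\mathbb{E}[K] \;=\; \sum_a f(a)\,\delta_a.$$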
David Ledvinka (Oct 22 2025 at 09:05):
- It's not clear to me that you actually want this; to me it's a practical question of which will give you more of a hassle. Note that with the way measures are defined in Lean you can apply them to any set, whether it is part of the sigma-algebra or not (a measure is built from an outer measure on the power set). I think using the full power set will give you more of a hassle, but that is the kind of thing best determined by using the definitions and trying to prove things. I don't think any of the things I proved tests this claim either way.
- Do we need a special case for PMFs (summing to 1)? At the very least I think the API should be general until specialization is absolutely necessary. It's not clear to me that this is needed on top of IsProbabilityMeasure and appropriate API.
- The main motivation of my work here was to investigate how to make doing concrete problems in discrete probability as painless as possible, without sacrificing generality, while integrating well with the abstract API. Here I am very much exploiting the sum-of-measures API and the dirac API, so that simp can do much of the work of simplifying discrete problems in ways that wouldn't work in the general case (two of the Mathlib lemmas this relies on are sketched just below). Though it's possible a different definition of these distributions (such as using your more general definition, or withDensity) would be preferable, and this could just be API.
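For instance, two of the Mathlib lemmas this style leans on (statements reproduced from memory, so treat the exact form as approximate):
import Mathlib
open MeasureTheory

-- A sum of measures evaluates as a `tsum` on measurable sets...
example {ι α : Type*} [MeasurableSpace α] (μ : ι → Measure α) {s : Set α}
    (hs : MeasurableSet s) : Measure.sum μ s = ∑' i, μ i s :=
  Measure.sum_apply μ hs

-- ...and a Dirac measure evaluates as an indicator.
example {α : Type*} [MeasurableSpace α] (a : α) {s : Set α} (hs : MeasurableSet s) :
    Measure.dirac a s = s.indicator 1 a :=
  Measure.dirac_apply' a hs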
I am definitely unsure about a lot of things here, and I think the best way to find the right answers is to actually try to prove things and see what works and what doesn't. On that note, I would be curious to hear what you were/are working on and what the struggles were, if you are willing to share at some point.
Peter Pfaffelhuber (Oct 22 2025 at 20:15):
Ok, these are all fair points, and I am equally unsure about many things:
- In retrospect, learning discrete probability is like learning the measure-theoretic perspective, except that one never has to worry about measurability. The reason is that the power set can be used as the sigma-algebra on the basic space, which was the reason for my suggestion.
- My more general question was whether PMFs should rather be defined as special measures, or be defined by themselves but give rise to measures (as currently implemented).
- Maybe the best first step would be to determine which requirements the API for PMFs should satisfy. I can see: (i) defining PMFs should be easy; (ii) integration with measure theory, including integration (using simp), should be straightforward; (iii) weak convergence might also be inherited from measure theory. And (iv), which I am currently playing around with, is that PMFs are a lawful monad and therefore allow for do-notation, which I would not like to lose.
David Ledvinka (Oct 22 2025 at 20:52):
I guess to clarify: my ideas here are less about developing an API for PMFs and more about developing a template for defining and proving things about concrete probability distributions. My concern about developing the PMF API is that most things you would want to prove about PMFs abstractly hold more generally (for densities, densities that sum to 1, maybe bounded ones, etc.). I think the strongest counterpoint in favour of PMF is definitely the fact that they allow do-notation, as you mention. However, I worry that using this in mathlib will lead to things that are not expressed at the proper generality for mathlib. Of course we could keep this as a leaf API.
As an example of both this kind of thing and an answer to 1 (assuming by basic space you mean the sample space): you don't want to state any theorems in mathlib under the assumption that the sigma-algebra on the sample space is the power set, even if everything you consider is discrete probability, because then they can't be combined with more general probability. For example, if I assumed Ω is finite when proving the formula for the expectation of a Bernoulli random variable, I couldn't then use this fact when considering a setting with a mix of continuous random variables and Bernoulli random variables on the same sample space.
Note that the category of measurable spaces with their probability measures also forms a monad (the Giry monad), but it can't be made to work with Lean do-notation because of the need to choose a measurable space. However, a long-term goal of mine would be to investigate whether we could create a custom do-notation that works, so that you could express algorithms like MCMC with do-notation in full generality. I think in theory this should be possible, but it might be harder to translate the API for proving things about do-notation.
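For concreteness: Mathlib does already have the Giry-monad operations as plain functions on Measure (e.g. Measure.bind below); what it cannot easily provide is the Monad/do-notation packaging, because Measure α carries a MeasurableSpace α instance.
import Mathlib
open MeasureTheory

-- The Giry-monad `bind` on measures, as it exists in Mathlib.
example {α β : Type*} [MeasurableSpace α] [MeasurableSpace β]
    (μ : Measure α) (κ : α → Measure β) : Measure β :=
  μ.bind κ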
David Ledvinka (Oct 22 2025 at 20:57):
I think, given my uncertainty, my current proposal would not be to deprecate PMF, but instead just to redefine the current definitions of each of the concrete distributions in mathlib with a new template (either resembling an improved version of mine, or something completely different), and then one could prove that this is equivalent to defining them with a PMF.
To me a test of whether it is a good template is how easy it is to prove basic facts about the distribution. Ideally trivial things should be trivial.
Note that currently very few facts are proven about the concrete distributions in mathlib outside of the Gaussian distribution.
Peter Pfaffelhuber (Oct 24 2025 at 21:26):
I agree that easy things should be easy and that one has to try proving things about concrete distributions. My "Level 0" example is the following (it works well with PMFs, but various things would need to be introduced to do it with sums of Diracs at the moment):
open PMF
-- Assumed ambient context (not shown in this snippet):
variable {p : ENNReal} (h : p ≤ 1)
lemma bernoulli_not : (bernoulli p h).map not = bernoulli (1-p) tsub_le_self := by
simp only [PMF.map]
ext x
cases' x <;>
simp [tsum_bool, ENNReal.sub_sub_cancel one_ne_top, h]
David Ledvinka (Oct 24 2025 at 22:58):
Even without adding any API on top of what I have above, it's not so bad... There definitely should be some API for this, though.
variable {p : ℝ}
def not' : Fin 2 → Fin 2
| 0 => 1
| 1 => 0
example : (fin_bernoulli p).map not' = (fin_bernoulli (1 - p)) := by
simp [fin_bernoulli, to_discrete_measure, bernoulli_PMF, bernoulli_PMF_Real]
rw [Measure.map_add _ _ (by fun_prop)]
repeat rw [Measure.map_smul]
repeat rw [Measure.map_dirac (by fun_prop)]
simp [not', add_comm]
I think probably you could design a simproc to do stuff like this automatically.
But one main point to emphasize is that to do most of the interesting things using the general mathlib API, you will need to work in ℝ or some Banach space, so you can do things like integrate. Thus it's important that you can easily "pull back" computations from the measure defined on ℝ to the measure defined on the discrete space. This is particularly where I found the sum of Diracs to work best (as opposed to using withDensity, which I initially tried). I want it to be easy to turn the expectation of a discrete random variable on ℝ (or of a function of that random variable) into a discrete sum.
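For example, with the prototype above (an untested sketch on my part, reusing real_bernoulli_def and integral_linear_combination_dirac_fintype exactly as in the charFun computation):
example {p : ℝ} (hp₀ : 0 ≤ p) (hp₁ : p ≤ 1) {g : ℝ → ℝ} (hg : Measurable g) :
    ∫ x, g x ∂(real_bernoulli p) = (1 - p) * g 0 + p * g 1 := by
  rw [real_bernoulli_def, bernoulli_PMF,
    integral_linear_combination_dirac_fintype (bernoulli_PMF_Real_nonneg hp₀ hp₁)
      hg.aestronglyMeasurable]
  simp [bernoulli_PMF_Real]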
Note that if you can find a way to do all of this with PMF for free then it's fine, but if you have to develop any API for it you will surely lose generality, since it won't apply to arbitrary discrete measures, such as the uniform measure on the natural numbers or the Dirac delta function (as a measure). One would still want to do computations with these measures in some cases, and then one would have to define a separate API for them.
I think if you can drop the "sums to one" part and develop a nice API for "mass functions" that can easily do the kind of things I did above (for example, compute characteristic functions), this could be better. Of course you lose the monad, but no matter what the definitions are, we can prove an equivalence and translate to PMF if we want to use it.
Aaron Liu (Oct 24 2025 at 23:53):
Your not' is docs#Fin.rev
David Ledvinka (Oct 24 2025 at 23:53):
As a simple example, the following is also relatively easy
variable {Ω : Type*} [MeasurableSpace Ω] {P : Measure Ω} {X : Ω → ℝ} (hX : HasLaw X (real_bernoulli p) P)
example : HasLaw (1 - X) (real_bernoulli (1 - p)) P where
map_eq := by
rw [Measure.ext_iff_lintegral]
intro f hf
rw [lintegral_map' (by fun_prop) (by fun_prop)]
conv => enter [1, 2, a]; change (f ∘ (fun x ↦ 1 - x)) (X a)
rw [hX.lintegral_comp (by fun_prop)]
simp [real_bernoulli_def, bernoulli_PMF, bernoulli_PMF_Real, add_comm]
EDIT:
Also we can use this strategy to give a shorter proof of the above:
example : (fin_bernoulli p).map not' = (fin_bernoulli (1 - p)) := by
refine Measure.ext_of_lintegral _ (fun f hf ↦ ?_)
rw [lintegral_map (by fun_prop) (by fun_prop)]
simp [fin_bernoulli, to_discrete_measure, not', bernoulli_PMF, bernoulli_PMF_Real, add_comm]
Aaron Liu (Oct 24 2025 at 23:54):
David Ledvinka said:
Note that if you can find a way to do all of this with PMF for free then it's fine, but if you have to develop any API for it you will surely lose generality, since it won't apply to arbitrary discrete measures, such as the uniform measure on the natural numbers or the Dirac delta function (as a measure). One would still want to do computations with these measures in some cases, and then one would have to define a separate API for them.
What is the uniform measure on the natural numbers?
David Ledvinka (Oct 24 2025 at 23:58):
I meant the counting measure (or a multiple of it).
Peter Pfaffelhuber (Oct 28 2025 at 21:45):
In the meantime, I agree that working with sums of Diracs works pretty well. I pushed a bit further on my idea that the resulting measure should be defined on the power set (rather than assuming that the underlying type has a measurable structure). Here are some partial results:
import Mathlib
open MeasureTheory ProbabilityTheory Measure Function
open scoped ENNReal NNReal
section PMFassumsofDiracs
variable {α β γ : Type*}
example (s : Set β) (b : β): MeasurableSet[(OuterMeasure.dirac b).caratheodory] s := by
simp only [OuterMeasure.dirac_caratheodory, MeasurableSpace.measurableSet_top]
instance topMeasurableSpace : MeasurableSpace β := ⊤
-- Given `f : α → ℝ≥0∞` and `g : α → β`, this is the measure (on `⊤`, i.e. the power set of `β`),
-- which adds mass `f a` to `g a`.
noncomputable def Function.to_discrete_measure (f : α → ℝ≥0∞) (g : α → β) : @Measure β ⊤ :=
sum (fun a ↦ f a • (OuterMeasure.dirac (g a)).toMeasure
((OuterMeasure.dirac_caratheodory (g a)).symm ▸ le_top))
noncomputable def Function.to_discrete_measure_ofReal (f : α → ℝ) (g : α → β): @Measure β ⊤ :=
Function.to_discrete_measure (ENNReal.ofReal ∘ f) g
lemma to_discrete_measure_apply' (f : α → ℝ≥0∞) (g : α → β) (s : Set β) : f.to_discrete_measure g s =
∑' a, (f a) * s.indicator (fun _ ↦ 1) (g a) := by
simp only [to_discrete_measure, MeasurableSpace.measurableSet_top, sum_apply, smul_apply,
toMeasure_apply, OuterMeasure.dirac_apply, smul_eq_mul]
lemma to_discrete_measure_apply (f : α → ℝ≥0∞) (g : α → β) (s : Set β) :
f.to_discrete_measure g s = ∑' a, (f a) * s.indicator 1 (g a) := by
rw [to_discrete_measure_apply']
rfl
lemma to_discrete_measure_ofReal_apply (f : α → ℝ) (g : α → β) (s : Set β) :
f.to_discrete_measure_ofReal g s = ∑' a, (ENNReal.ofReal (f a)) * s.indicator (fun _ ↦ 1) (g a) := by
rw [to_discrete_measure_ofReal]
exact to_discrete_measure_apply (ENNReal.ofReal ∘ f) g s
lemma indicator_eq (f : α → ℝ≥0∞) (s : Set α) {a : α} : f a * s.indicator 1 a = s.indicator f a := by
simp [Set.indicator]
lemma to_discrete_measure_apply_id (f : α → ℝ≥0∞) (s : Set α) : f.to_discrete_measure id s = ∑' a, s.indicator f a := by
rw [to_discrete_measure_apply]
exact tsum_congr (fun _ ↦ indicator_eq f s)
lemma to_discrete_measure_apply_id_singleton (f : α → ℝ≥0∞) (u : α) :
f.to_discrete_measure id {u} = f u := by
rw [to_discrete_measure_apply_id, ← tsum_subtype, tsum_singleton]
theorem to_discrete_measure_eq_iff {f₁ f₂ : α → ℝ≥0∞} : f₁ = f₂ ↔
f₁.to_discrete_measure id = f₂.to_discrete_measure id := by
refine ⟨fun h ↦ by rw [h], ?_⟩
rw [← not_imp_not]
intro h
change f₁ ≠ f₂ at h
rw [ne_iff] at h
obtain ⟨a, ha⟩ := h
change _ ≠ _
rw [DFunLike.ne_iff]
use {a}
simp [to_discrete_measure_apply_id_singleton, ha]
theorem Function.to_discrete_measure_map (f : α → ℝ≥0∞) (g : α → β) (h : β → γ) :
(f.to_discrete_measure g).map h = f.to_discrete_measure (h ∘ g) := by
ext s
rw [map_apply (by fun_prop) (by measurability)]
rw [to_discrete_measure_apply']
rw [to_discrete_measure_apply']
apply tsum_congr
intro b
congr
end PMFassumsofDiracs
Wrenna Robson (Oct 29 2025 at 12:54):
I do think some kind of accessible API for discrete probability would be good.
Wrenna Robson (Oct 29 2025 at 12:56):
It might be worth distinguishing between finite pmfs and discrete ones but I'm not sure.
Wrenna Robson (Oct 29 2025 at 12:57):
The advantage of considering finite PMFs is that you can then use Finsets and the like.
Wrenna Robson (Oct 29 2025 at 12:58):
OTOH a lot of natural discrete random variables take values in Nat.
Wrenna Robson (Oct 29 2025 at 13:08):
In general you want a theory of f : α → ℝ for encodable α, with f a nonnegative and ≤ 1, which you can extend to f' : Option α → ℝ by setting f' none = 0 and f' (some a) = f a, such that the sum over Nat of f' (decode n) is 1.
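A minimal Lean sketch of that extension (hypothetical names, just to fix ideas; extendOption and IsPMFOn are not Mathlib names):
import Mathlib

-- Extend `f` by zero to `Option α` (hypothetical helper).
def extendOption {α : Type*} (f : α → ℝ) : Option α → ℝ
  | none => 0
  | some a => f a

-- The normalization condition, summing over ℕ via the encoding.
def IsPMFOn {α : Type*} [Encodable α] (f : α → ℝ) : Prop :=
  HasSum (fun n : ℕ ↦ extendOption f (Encodable.decode n)) 1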
Wrenna Robson (Oct 29 2025 at 13:08):
Or something like that.
Wrenna Robson (Oct 29 2025 at 13:09):
But when you have Fintype α it is just a lot easier to talk about.
Wrenna Robson (Oct 29 2025 at 13:10):
I think this is mostly PMF, or close to it, but it isn't very ergonomic, especially in the finite case.
Wrenna Robson (Oct 29 2025 at 13:11):
I've experimented a bit with this before, but for subprobability distributions (where the sum is ≤ 1 rather than = 1). I think I have a whole treatment somewhere. The monad stuff is nice, and you really want it for certain theoretical applications.
Wrenna Robson (Oct 29 2025 at 13:12):
At some point it would be good to be able to have probabilistic Hoare logic :)
Peter Pfaffelhuber (Nov 09 2025 at 15:38):
After some more hours of experience with discrete measures (as in the code snippet above), the best definition so far is
-- (Context assumed from the earlier snippet: `variable {α : Type*}`, `open MeasureTheory Measure`, `open scoped ENNReal`.)
noncomputable def discreteMeasure (f : α → ℝ≥0∞) : @Measure α ⊤ :=
  sum (fun a ↦ (f a) • (@Measure.dirac α ⊤ a))
together with
structure DiscreteMeasure (α : Type*) where
  weight : α → ℝ≥0∞

-- (`DiscreteProbabilityMeasure`, with weights summing to 1, is presumably defined elsewhere in the file.)
noncomputable def toDiscreteMeasure (μ : DiscreteProbabilityMeasure α) : DiscreteMeasure α :=
  ⟨μ.weight⟩

noncomputable def DiscreteMeasuretoMeasure (μ : DiscreteMeasure α) : @Measure α ⊤ :=
  discreteMeasure μ.weight
That way I have access to all the results from MeasureTheory and can still compute decently with the weight. Another thing that I like is that I am still confident I will be able to show that DiscreteMeasure is a LawfulMonad, i.e. I will have do-notation. Would that be a way to go?
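A sketch of how the monad structure could look (untested, mirroring what PMF does on the underlying weights):
noncomputable instance : Monad DiscreteMeasure where
  pure a := ⟨Set.indicator {a} 1⟩
  bind μ f := ⟨fun b ↦ ∑' a, μ.weight a * (f a).weight b⟩
-- Proving `LawfulMonad` (and compatibility with `DiscreteMeasuretoMeasure`) would then be the real work.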
David Ledvinka (Nov 09 2025 at 16:03):
I think this is fine but I am basically just skeptical that we should develop much API for DiscreteMeasure (in Mathlib) since my hunch is that most things that could be proved about it should be proved at a higher level of generality (depending on that thing).
I'm curious what you would like do-notation for? Is it something that belongs in mathlib or outside of it? As I mentioned above, in the long term I think it would be better to get a custom do-notation to work for all probability measures (using the Giry monad), even though it might take a lot more work.
Regardless of my reservations, I think if we start proving (medium-trivial and above) things, it will become clearer whether these are good definitions or not.
Peter Pfaffelhuber (Nov 09 2025 at 20:25):
Here is an example of do-notation that I like to use. (Since PMF is a LawfulMonad, it allows for such notation.)
lemma PMF.bernoulli_not (p : ℝ≥0) (h : p ≤ 1) : (
do
let X ← PMF.bernoulli p h
return (not X))
=
PMF.bernoulli (1 - p) tsub_le_self := by
simp only [LawfulMonad.bind_pure_comp]
ext x
change map not (PMF.bernoulli p h) x = (PMF.bernoulli (1 - p) _ ) x
rw [PMF.map_apply]
by_cases h' : x <;> rw [tsum_bool] <;> simp [h']
rw [ENNReal.sub_sub_cancel (one_ne_top)]
simp only [coe_le_one_iff, h]
As you see, in the do block, it looks as if I am drawing the random variable X from bernoulli p h, so it describes what happens within the random experiment. On the formal side, only PMFs appear on either side of the statement.
Peter Pfaffelhuber (Nov 09 2025 at 20:29):
David Ledvinka said:
Regardless of my reservations, I think if we start proving (medium-trivial and above) things, it will become clearer whether these are good definitions or not.
I totally agree on this point. Do you have any specific medium trivial things to try?
By the way, one nice thing about the DiscreteMeasure from above is that it is (not sigma-, but completely) additive:
-- (Assuming a coercion that lets a `DiscreteMeasure` be applied to sets, defined elsewhere in my file.)
lemma m_iUnion {δ : Type*} (μ : DiscreteMeasure α) (s : δ → Set α)
    (hs : Pairwise (Disjoint on s)) : μ (⋃ d, s d) =
    ∑' (d : δ), μ (s d) := by
  sorry
David Ledvinka (Nov 09 2025 at 20:48):
I think a good starting point is basically picking any discrete distribution and trying to prove all the things on its Wikipedia page (I did a very small bit of this for Bernoulli and Binomial above).
Yuval Filmus (Nov 16 2025 at 06:24):
Another suggestion is to prove easy properties of random graphs, such as the threshold for the appearance of a triangle. This involves several concepts: a product measure, the union bound, and Chebyshev's inequality.
In combinatorics there is much use of discrete probability, so it would be nice to streamline it.
Last updated: Dec 20 2025 at 21:32 UTC