Zulip Chat Archive

Stream: Is there code for X?

Topic: Statistical distance for `PMF`s?


Quang Dao (Jun 22 2025 at 21:20):

There seems to be no definition of statistical distance / total variation distance for PMF?

I only work with PMF on finite types, in which case the definition is trivial, though I'm sure there are complications in the general case.

Bolton Bailey (Jun 22 2025 at 22:39):

Something like

import Mathlib

theorem PMF.dist_summable {α : Type u_1} (X Y : PMF α) :
    Summable fun i => dist (X i).toReal (Y i).toReal := by
  sorry

noncomputable instance StatisticalDistance {α} : PseudoMetricSpace (PMF α) where
  dist X Y := (1 / 2) * ∑' i, dist ((X i).toReal) ((Y i).toReal)
  dist_self X := by
    simp
  dist_comm X Y := by
    congr
    ext i
    exact PseudoMetricSpace.dist_comm (X i).toReal (Y i).toReal
  dist_triangle X Y Z := by
    rw [<-mul_add]
    rw [<-Summable.tsum_add]
    simp
    refine tsum_le_tsum ?_ ?_ ?_
    intro i
    exact dist_triangle (X i).toReal (Y i).toReal (Z i).toReal
    exact PMF.dist_summable X Z
    apply Summable.add
    exact PMF.dist_summable X Y
    exact PMF.dist_summable Y Z
    exact PMF.dist_summable X Y
    exact PMF.dist_summable Y Z

(IDK why a PMF returns ENNReal rather than NNReal, I guess because we sum over it often enough that we would like the junk value for the sum to be top rather than 0?)

Quang Dao (Jun 23 2025 at 05:12):

Thanks! I guess for PMF over infinite types, it's ok for there to be infinity on some values, as long as their measure is zero?

Aaron Liu (Jun 23 2025 at 10:06):

These aren't measure spaces here, what do you mean by measure?

Kexing Ying (Jun 23 2025 at 12:32):

We have total variation for signed measures (there are discussions for general TV for vector measures
here). So you can convert the PMFs to measures first and get the TV that way

Thomas Zhu (Jun 24 2025 at 04:55):

I think there are a lot of statistical distances in @Rémy Degenne 's testing-lower-bounds, including total variation distance of two measures

Rémy Degenne (Jun 24 2025 at 10:51):

This is the repository in question, which is joint work with @Lorenzo Luccioli : https://github.com/RemyDegenne/testing-lower-bounds/
Sadly, that repository in a weird state: there are many results on the master branch, but we realized that the choice of types was not ideal (Real vs ENNReal for domain or codomain of some functions). There is a partial refactor on the branch ennreal but we did not finish it.
I added the definition of the Kullback-Leibler divergence to Mathlib (following the choice of types in the ennreal branch) and we PRed many auxiliary lemmas, but most of the new definitions are stuck in the repository until we gather the courage to update and PR them.


Last updated: Dec 20 2025 at 21:32 UTC