Zulip Chat Archive

Stream: Polynomial Freiman-Ruzsa conjecture

Topic: How to define uniform distribution?


Terence Tao (Nov 17 2023 at 21:57):

Given

import Mathlib

variable {Ω S : Type*} [MeasurableSpace Ω] [MeasurableSpace S] [MeasurableSingletonClass S] (H : Finset S) (X : Ω  S) (μ : Measure Ω)

I would like to define the assertion that X has the uniform distribution on H. So basically
μ.map X {x} = μ Set.univ / (Finset.card H) when x ∈ H and μ.map X {x} = 0 otherwise. Is there a slick way to define this? In practice H will be non-empty, X measurable, and μ a probability measure, but for various technical reasons it would be nice to have a definition that did not require these hypotheses.

Yaël Dillies (Nov 17 2023 at 22:06):

docs#MeasureTheory.pdf.IsUniform exists. I think that might be verbatim what you want?

Yaël Dillies (Nov 17 2023 at 22:07):

In particular, it does not need singletons (or finite sets) to be measurable, does not assume H is nonempty and X is any function.

Jason KY. (Nov 17 2023 at 22:39):

Beware that I dont know how useable the definition of pdf is in that file. I wrote that definition of pdf to work with abstract pdfs and it works fine for that purpose. I don't think anyone has used it for specific distributions yet (e.g. the Gaussian distribution file does not import that file at all). So for finsets it might be easier to use docs#PMF.uniformOfFinset instead

Kalle Kytölä (Nov 17 2023 at 22:40):

@Jason KY. was faster than me :racecar:

I agree that using the Mathib canonical spelling is the right thing to do, and Yaël located the canonical spelling. (Although Jason says maybe it is not so canonical after all... But I think it should be.)

However, (as Jason said) it is likely that there is API missing for uniform distributions on finite sets, so there is some work to tie the canonical spelling to a useful spelling for the PFR project. I can try to fill in some API (unless others are quicker as they usually are).

Kalle Kytölä (Nov 17 2023 at 22:40):

I have also always been at least slightly unconvinced about the canonical spelling of distributions involving the random variable itself rather than just the law of the random variable. In particular, I don't understand why the type of IsUniform is essentially

(Ω  E)  (Set E)  (Measure Ω)  (Measure E)  Prop

instead of the simpler

(Set E)  (Measure E)  (Measure E)  Prop

With the latter spelling, one could

  • speak of uniformity of a measure ν (which currently requires inserting an artificial identity random variable, IsUniform id s ν μ)
  • spell the uniformity of the distribution of a random variable X : Ω → E as IsUniform s (ℙ.map X) μ (for comparison, the current spelling is IsUniform X s ℙ μ).

Kalle Kytölä (Nov 17 2023 at 22:41):

The same design goes for other distributions than uniform. I'd like to be able to say that a measure is gaussian or that a measure is exponential --- not just that a random variable is gaussian or exponential.

Jason KY. (Nov 17 2023 at 22:49):

Yeah, it is probably better to define uniform as a prop on measure rather than a random variable and then asserting the pushforward measure is uniform. Although in that case, it might be harder if you want to work simultaniously with specific pdfs and abstract pdf with the same API

Kalle Kytölä (Nov 17 2023 at 22:50):

...or in fact even better, I guess the uniformity of the law of X : Ω → E would be the Prop

ℙ.map X = uniform s μ

where (uniform : Set E → Measure E) is just an explicit construction of the uniform measure on s : Set E (w.r.t. reference measure μ := by volume_tac).

Similarly X gaussian distributed could be ℙ.map X = gaussian mean var, where gaussian explicitly constructs the gaussian measure, etc.

Jason KY. (Nov 17 2023 at 22:51):

I think it will be more difficult to prove properties about how transforming a random variable affects the pdf. E.g. adding random variables corresponds to convolution

Jason KY. (Nov 17 2023 at 22:52):

But I agree that the spelling of a specific distribution should be a prop on measures rather than with pdfs

Kalle Kytölä (Nov 17 2023 at 22:53):

If there would be a lemma (h : Indep X Y) : (X + Y).map ℙ = convolution (X.map ℙ) (Y.map ℙ), I'm not sure how much more work it would be.

Jason KY. (Nov 17 2023 at 22:54):

My point is that is not a property about pdfs but about the law of the random variables.

Jason KY. (Nov 17 2023 at 22:55):

In any case, I think we should scrap pdf.IsUniform and do what was done for the Gaussian instead

Kalle Kytölä (Nov 17 2023 at 22:57):

docs#ProbabilityTheory.gaussianReal

Patrick Massot (Nov 17 2023 at 22:58):

I think @Claus Clausen is working on this.

Jason KY. (Nov 17 2023 at 22:59):

#8330

Kalle Kytölä (Nov 17 2023 at 22:59):

On changing the spelling of distributions of random variables to be in terms of their laws?

Terence Tao (Nov 17 2023 at 23:01):

Hmm. I think for the short term I will punt on this by defining

def ProbabilityTheory.isUniform (H : Finset S) (X : Ω  S) (μ : Measure Ω) : Prop := sorry

and try to formalize other things that use that definition, and then maybe it will be clearer how to connect this to existing Mathlib methods. Thanks for the input!

Kalle Kytölä (Nov 17 2023 at 23:06):

I agree that the approach taken in #8330 to e.g. the exponential distribution looks good!

For docs#MeasureTheory.pdf.IsUniform, I think the only change I'd like to see is to "generalize" in the sense that it applies to any measure instead of just a measure that is the push-forward of another measure. And then one needs to add API for special cases such as uniform measures on finite sets.

Kalle Kytölä (Nov 17 2023 at 23:22):

Btw, using docs#PMF.uniformOfFinset, as Jason suggests, probably has an advantage of simplifying the (implementation of the) compactness argument in this project. And it maybe requires less to add also otherwise to make it usable here.

(But in Mathlib, I think we should tie the general Radon-Nikodym density-based approach written by Jason to the probability mass function based one.)

Terence Tao (Nov 18 2023 at 00:02):

Hmm, I'm encountering a weird issue with the requirement H : Finset S that H be a finset in my proposed definition of uniform distribution. Later in PFR we need to consider a uniform distribution on a subgroup H: AddSubgroup G of a finite abelian group {G : Type*} [AddCommGroup G] [Fintype G]. However I can't seem to easily coerce H to a Finset; H.carrier lies in Set G, but not Finset G, even though H is obviously finite, being a subset of G. One way around it is to define isUniform for H: Set S rather than H: Finset S but make isUniform false if H is infinite. Otherwise I have to figure out the coercion. What would be the best way to proceed here? Is this one of these Classical.choice issues?

Adam Topaz (Nov 18 2023 at 00:04):

one option is to use docs#Set.Finite as opposed to Finset. That's a predicate on sets.

Adam Topaz (Nov 18 2023 at 00:05):

There is a "good enough" API to go back and forth between sets satisfying this predicate and Finsets

Adam Topaz (Nov 18 2023 at 00:06):

And I'm sure there is a lemma saying that any subset of a finite type is finite in this sense of finiteness.

Terence Tao (Nov 18 2023 at 00:09):

OK that seems to work since there is a casting operator from Finset S to Set S. As long as Set.Finite H is part of the definition of isUniform everything seems to work (at least on the level of being able to state things). Thanks!

Adam Topaz (Nov 18 2023 at 00:10):

If you want to stick with finsets, you could do this:

import Mathlib

noncomputable
example (G : Type*) [AddCommGroup G] [Fintype G] (H : AddSubgroup G) : Finset G :=
  (H : Set G).toFinite.toFinset

Terence Tao (Nov 18 2023 at 00:18):

OK. The point is that there is a canonical certificate that H is finite, but not a canonical enumeration of H, which is why one cannot automatically make H a Finset without doing something noncomputable?

Terence Tao (Nov 18 2023 at 00:19):

But wait, G is a Fintype so it has a canonical enumeration already, so there doesn't seem to be any constructive obstruction to making H an element of Finset G.

Terence Tao (Nov 18 2023 at 00:24):

In any case, working with sets H that are not necessarily assumed to be finite when defining uniform distribution seems consistent with the philosophy already used in the project, where random variables are not necessarily assumed to be measurable, and the measure is not necessarily assumed to be a probability measure. It's weird to think in this way by including all the degenerate, useless, and/or nonsensical cases but it does seem to avoid a lot of annoying coercion.

Adam Topaz (Nov 18 2023 at 00:28):

I thought we had a computable version of the code above given enough decidability instances, but I can’t seem to find it…


Last updated: Dec 20 2023 at 11:08 UTC