Zulip Chat Archive

Stream: mathlib4

Topic: Uniform distribution


Josha Dekker (Jan 05 2024 at 15:04):

I've noticed that Probability/Distributions/Uniform (the continuous one!) does not yet exist. According to https://leanprover-community.github.io/undergrad_todo.html this is still missing and part of the undergrad curriculum, so I'd like to add this with contents similar to Probability/Distributions/Gamma (although a bit shorter because the proofs are easier), but I wanted to check that nobody was working on this yet? Note that Probability/ProbabilityMassFunction/Uniform deals with the discrete uniform distribution.

Josha Dekker (Jan 05 2024 at 15:09):

also, the linked overview thinks that exponential distribution and Gaussian distributions are missing, shall I update this at some point after the continuous uniform distribution is in there as well?

Thomas Zhu (Jan 06 2024 at 14:51):

There was #8330, though I don't know why it is closed

Josha Dekker (Jan 06 2024 at 14:54):

I see that the thread points to a file that contains the density for the uniform distribution, I’ll see what’s in there!

Thomas Zhu (Jan 06 2024 at 15:00):

Also just my opinion, but I would prefer to have a uniform distribution defined over any subset of finite measure in any measure space (like (μ s)⁻¹ • μ.restrict s), not just intervals in the reals. This would also make sense for discrete spaces or union of intervals, etc. There is something like it in docs#MeasureTheory.pdf.IsUniform, but it should definitely live in its own file and not in Probability/Density

Josha Dekker (Jan 06 2024 at 15:07):

I agree, that should probably be the starting point, from which the rest (and the standard definition over the reals) can be derived. I’ll put some thought in this soon!

Yaël Dillies (Jan 06 2024 at 15:41):

We have some stuff in github.com/teorth/pfr

Josha Dekker (Jan 06 2024 at 15:46):

I’ll wait until everything from PFR has been added then! I’ll focus on refactoring the exponential distribution for a while, and adding some other basic distributions!

Enrico Borba (Jan 26 2024 at 22:50):

Yaël Dillies said:

We have some stuff in github.com/teorth/pfr

The definition of IsUniform in the PFR repo is only useful for distributions that are uniform over a finite set no?

Yaël Dillies (Jan 26 2024 at 23:45):

What's a uniform distribution over an infinite set?

Enrico Borba (Jan 27 2024 at 01:33):

Uniform over [0,1] ?

Enrico Borba (Jan 27 2024 at 01:36):

I was trying to use the definition from the PFR repo to define a random variable in the reals uniform over [0,1] and couldn't understand how the uniformity condition (equal measure on individual elements of [0,1]) would make much sense

Enrico Borba (Jan 27 2024 at 01:44):

Maybe I'm missing something obvious...

Terence Tao (Jan 27 2024 at 01:51):

Yeah, the PFR IsUniform is intended for discrete use only. I guess the general definition of IsUniform would be that the measure is equal to c • μ.restrict s for some ENNReal c, and then some auxiliary lemmas could be written to show that this is equivalent to the PFR definition in the case where s is finite.

Enrico Borba (Jan 27 2024 at 02:18):

docs#MeasureTheory.pdf.IsUniform is defined nearly in that way

Josha Dekker (Jan 27 2024 at 07:27):

We’ll probably want something similar to Probability/Distributions/Gamma for UNIF(a,b), the continuous uniform distribution, which can probably largely be obtained by appealing to the definition in the linked docs (I’m on mobile, so can’t type backticks).

I think that we may want to move everything on the uniform distribution to that file, using sections for the discrete and continuous versions, and the special versions on the naturals and real line respectively.

if this is a shared sentiment, feel free to do this, otherwise I’m happy to set this up!

Josha Dekker (Jan 27 2024 at 07:27):

I’ll probably be able to take a shot at it in about a week, so let me know what you want!

Yaël Dillies (Jan 27 2024 at 13:37):

Enrico Borba said:

Uniform over [0,1] ?

:see_no_evil: All sets are discrete to me

Josha Dekker (Jan 29 2024 at 19:43):

I'll start a branch on this, not sure how much I can contribute in the following days but I'll see!

Josha Dekker (Jan 29 2024 at 20:38):

As the continuous and discrete uniform probability distributions are in some sense just two sides of the same coin, I propose that I start by creating a new file Probability/Distributions/Uniform which extracts the relevant material from Probability/Density and ProbabilityMassFunction/Uniform and puts it in one more unified format. Is this a shared sentiment? In that case, I'll make a PR and do this. I'll then make a new PR after that one merges to add whatever is missing, to keep the PRs distinct and relatively small.

Yaël Dillies (Jan 29 2024 at 21:18):

Looks reasonable at first sight

Josha Dekker (Jan 29 2024 at 21:40):

(deleted)

Josha Dekker (Jan 29 2024 at 21:44):

Great, I'll think about it a bit more and try to get some work in on this tomorrow. I think that definitions such as uniformOfFinset (a PMF) would perhaps be nice to derive from IsUniform, rather than being stated directly.

Enrico Borba (Jan 29 2024 at 22:04):

I've been working with the IsUniform predicate definition recently so I'm definitely interested in taking a look at this PR. Can you post it here when it's ready?

Josha Dekker (Jan 29 2024 at 22:13):

Yes, I haven’t done anything yet (tried something, then realized I want more generality, so I’ll start over). So, if you’re interested in doing this, that is fine with me as well! Otherwise I’ll work on it slowly over the coming days!

Enrico Borba (Jan 29 2024 at 22:33):

I'll post here if I decide to take a stab at it!

Josha Dekker (Jan 29 2024 at 22:35):

Okay!

Josha Dekker (Jan 29 2024 at 22:36):

I’ll do the same so that it doesn’t cross!

Josha Dekker (Jan 31 2024 at 10:06):

I think that I know how to unite the two, I’ll make a start and make a PR. Basically I think we need a new definition that strips away the random variable X from IsUniform (to directly interpret things as probability measures), which should then make it easier to derive the PMF results directly from this same definition. Note also that the files in Probability/Distributions start working from a PDF, so they never needed a random variable to be specified in the first place, which I think is useful as it is more minimalistic. In a similar vein, I think starting from one measure (mu) may be a bit more intuitive here.

Josha Dekker (Jan 31 2024 at 11:53):

This is now #10141. I've basically made a new definition IsUniformVolume which does not require a random variable and probability space to be specified. I then moved everything related to IsUniform from Probability/Density to Probability/Distributions/Uniform and generalised the proofs in terms of IsUniformVolume. After showing that we can always find a probability space $\Omega$ and a random variable X to move from IsUniformVolume to IsUniform it should be relatively easy to move Probability/ProbabilityMassFunction/Uniform over to this file and refactor it accordingly, such that the PMFs there arise from specifying a uniform distribution, rather than defining a PMF directly. (If people think that is a good idea, of course!)

Thomas Zhu (Jan 31 2024 at 11:55):

I think there is no need for random variables at all in the definition. I wrote the current definition of MeasureTheory.pdf.IsUniform as a refactor from Jason Ying's original code, which was a very long time ago and probably only for a specific purpose, but I think there is no need at all to single out "pdfs of uniformly distributed variables". So you shouldn't feel compelled to follow / preserve everything from either of the two files. A good definition should just be what uniform distributions are (maybe c smul mu.restrict s, mu being a reference measure), then possibly proving this is equivalent to RN derivative being constant ae on s (hence the pdf; or conversely use RN derivative to define the measure), then rvs being uniform is just their pushforward distribution being uniform.

Josha Dekker (Jan 31 2024 at 12:00):

Thank you, a fresh approach based on a reference measure was the philosophy that I was going for in defining IsUniformVolume. Perhaps I should name that one IsUniform instead and rename the old IsUniform to IsUniformRandomVariable? You are right that I should probably define the PDF directly in terms of the measure and a reference measure rather than via the random variable way, let me work that out. PMF should then arise immediately as a special case.

Josha Dekker (Jan 31 2024 at 12:02):

I think that for the other distributions in Probability/Distributions/. it is more natural to start from a PDF/PMF and work up from there, but for the uniform distribution I think the correct level of generality is to start from something like IsUniformVolume and work from there.

Thomas Zhu (Jan 31 2024 at 12:02):

Ah I saw #10141 just as I was sending the previous message. I totally agree

Josha Dekker (Jan 31 2024 at 12:03):

Great! I'll work on this a bit more during my commute later today! What is your thought on renaming?

Thomas Zhu (Jan 31 2024 at 12:05):

I think I would expect the name IsUniform to refer to a distribution and not a random variable

Thomas Zhu (Jan 31 2024 at 12:10):

although, since the definition of uniform IsUniform m is basically m = ..., perhaps we can just define a uniformDistribution : Measure E := ..., rather than making it a Prop?

Josha Dekker (Jan 31 2024 at 12:22):

Something like

def UniformVolume (s : Set E) (μ : Measure E := by volume_tac) : Measure E := (μ s)⁻¹  μ.restrict s

and then work directly from there? I agree that IsUniformVolume is perhaps a bit unnecessary. I'll add a prop IsUniformVolume then to check if a given measure is such a UniformVolume.

Thomas Zhu (Jan 31 2024 at 12:28):

Yes (although it should be camelCase uniformVolume). Then I don't think IsUniformVolume m would be necessary because it would just be m = uniformVolume, and instead of showing IsUniformVolume m -> properties of m you can just show properties of uniformVolume

Josha Dekker (Jan 31 2024 at 12:48):

I've modified the PR. The way pdf and HasPDF and friends have been defined, I need to consider a random variable X at some point, so I'll probably show how to create an IsUniform instance when we have a uniformVolume, which then allows one to pass around between the two.

Josha Dekker (Jan 31 2024 at 13:44):

I've added that now.

Josha Dekker (Jan 31 2024 at 14:08):

I've added a few small things, I'm not entirely sure how to proceed at this point, any feedback/suggestions are welcome! Otherwise I'll take another stab at it later, I have to focus on my research now!

Josha Dekker (Jan 31 2024 at 17:33):

I've added a few things and made some modifications based on suggestions that were kindly provided. I could use another pair of eyes on #10141, especially for two things:

  • The last few lemmas require @..., which I don't like and feels like it should not be necessary. Probably I'm doing some implicit/explicit arguments wrong, which is causing this.
  • See the last comment in the file, I need to move from uniformPDF to the function used for the PMF there, but I can't seem to get that to work
    I probably won't have time to work on this the coming days, so feel free to take a shot at it!

Josha Dekker (Jan 31 2024 at 17:36):

Any other comments/style suggestions are more than welcome as well!

Josha Dekker (Feb 12 2024 at 09:57):

I was recommended to split this up into two PRs, in which the first one just moves the relevant existing definitions and results to a new file, this was #10451, ready for review which is merged!

#10456 refactors IsUniform in terms of the new uniformMeasure, which allows us to prove the main properties without requiring a choice of random variable and probability space and is awaiting review!

Josha Dekker (Feb 12 2024 at 13:19):

I've decided to put the refactoring of the related PMF... results as a To Do in the file, as this lowers the amount of reviewing necessary for the current PR, and I'm not entirely sure how to proceed on refactoring PMF.


Last updated: May 02 2025 at 03:31 UTC