# Zulip Chat Archive

## Stream: new members

### Topic: How do I tell Lean that a subclass is a kind of the supercla

#### Lars Ericson (Dec 17 2020 at 16:57):

I'm working on this:

```
import measure_theory.lebesgue_measure
open measure_theory
noncomputable theory
class probability_space (α : Type*) extends measure_space α :=
(is_probability_measure: probability_measure volume)
class random_variable (α β : Type*) :=
(domain : probability_space α )
(codomain : measurable_space β )
(outcome : α → β)
(is_measurable_outcome : @measurable α β domain codomain outcome)
```

I am trying to say that a random variable is a measurable function from a probability space to a measurable space. Even though the function is declared α → β and I have proofs that `domain`

is a `probability_space`

and `codomain`

is a `measurable_space`

, I still guess that I need to add a proof that `outcome`

is a measurable function. Or is it obvious? I'm unsure.

Anyway, if I do have to do the extra step of proving that `outcome`

is `measurable`

, I need to get this type right:

```
@measurable α β domain codomain outcome
```

However, it is complaining

```
type mismatch at application
measurable
term
domain
has type
probability_space α
but is expected to have type
measurable_space α
```

even though

```
class probability_space (α : Type*) extends measure_space
```

and

```
class measure_space (α : Type*) extends measurable_space α
```

How do I tell Lean that a subclass is a kind of the superclass?

#### Adam Topaz (Dec 17 2020 at 17:04):

I would use something approximately like this:

```
import measure_theory.lebesgue_measure
open measure_theory
noncomputable theory
class probability_space (α : Type*) extends measure_space α :=
(is_probability_measure: probability_measure volume)
structure random_variable (α β : Type*) [probability_space α] [measurable_space β] :=
(outcome : α → β)
(is_measurable_outcome : measurable outcome)
```

#### Lars Ericson (Dec 17 2020 at 17:06):

Thanks! Why `structure`

and not `class`

in this case? `random_variable`

is a class of function, it seems possible people might want to create instances.

#### Reid Barton (Dec 17 2020 at 17:06):

This is more sensible yes, but also in the original formulation you can just write `is_measurable_outcome : measurable outcome`

--if you don't try to do Lean's work for it, it will do it correctly.

#### Adam Topaz (Dec 17 2020 at 17:08):

Oh it looks like `measurable`

is not a class, so I should use regular parens. (fixed code)

#### Adam Topaz (Dec 17 2020 at 17:10):

Does mathlib have bundled measurable functions?

#### Lars Ericson (Dec 17 2020 at 17:51):

Please help me take it a little further. To the random variable I want to add an inverse. So if `X(ω)`

is a random variable, then

`inverse(y) = {ω : X(ω) = y}`

. I tried this:

```
class random_variable (α β : Type*) [probability_space α] [measurable_space β] :=
(outcome : α → β)
(is_measurable_outcome : measurable outcome)
(inverse : β → set α := { ω : α // ∃ y : β, y = outcome ω }) // ERROR
```

I got this error message, which I don't know how to fix:

```
invalid let-expression, term
{ω // ∃ (y : β), y = outcome ω}
has type
Type ? : Type (max 1 (?+1))
but is expected to have type
β → set α : Type (max ? ?)
```

From there I want to do this, to say that the distribution space is a probability space induced by `X`

with probability measure `volumeX : β→ [0,1]`

such that `volumeX = volume ∘ (X^-1)`

is the pushforward measure and `volumeX`

is called the distribution of `X`

:

```
instance distribution_space (α β : Type*) (V: random_variable α β) (ΩAP: probability_space α ): probability_space β :=
(volume : ΩAP.volume ∘ V.generalize_inverse)
(is_probability_measure := sorry)
```

I'm not sure how to get this all right in Lean, but the ideas are there.

"pushforward" sounds categorical. If there were a category `Prob`

there may be some categorical way to express what I'm trying to do here. I don't know category theory though. For now I'm just trying to get this part right.

#### Lars Ericson (Dec 17 2020 at 19:07):

OK I found something in the guts of `measure_space`

but it doesn't show up in docs#pushforward for in Lean which is all category theory related:

```
/-- The pushforward of a measure. It is defined to be `0` if `f` is not a measurable function. -/
def map (f : α → β) : measure α →ₗ[ennreal] measure β :=
if hf : measurable f then
lift_linear (outer_measure.map f) $ λ μ s hs t,
le_to_outer_measure_caratheodory μ _ (hf hs) (f ⁻¹' t)
else 0
```

#### Adam Topaz (Dec 17 2020 at 20:20):

```
import measure_theory.lebesgue_measure
open measure_theory
noncomputable theory
class probability_space (α : Type*) extends measure_space α :=
(is_probability_measure : probability_measure volume)
structure random_variable (α β : Type*) [probability_space α] [measurable_space β] :=
(outcome : α → β)
(is_measurable_outcome : measurable outcome)
variables {α β : Type*} [probability_space α] [measurable_space β]
def induced (X : random_variable α β) : probability_space β :=
{ volume := measure.map X.outcome volume,
is_probability_measure := begin
constructor,
rw measure_theory.measure.map_apply X.is_measurable_outcome is_measurable.univ,
apply probability_space.is_probability_measure.measure_univ,
end }
```

One thing to note, as we discussed somewhere before, one should really make an API for `probability_space`

so that the proof in `is_probability_measure`

becomes essentially a one-liner.

#### Adam Topaz (Dec 17 2020 at 20:21):

The structure `random_variable`

needs an API as well.

#### Adam Topaz (Dec 17 2020 at 20:21):

But I hope the code above helps you get started :)

#### Lars Ericson (Dec 17 2020 at 20:52):

Thank you @Adam Topaz that is extremely helpful. I have been looking at the API of `measure.map`

:

```
import measure_theory.measure_space
open measure_theory
variables α β : Type*
variable MS1 : measurable_space α
variable MS2 : measurable_space β
variable μ_α : α → β
variable m_α : measure α
#check @measure_theory.measure.map α β MS1 MS2 μ_α m_α
#check measure_theory.measure.map μ_α m_α
```

This is the pushforward measure in Wikipedia: https://en.wikipedia.org/wiki/Pushforward_measure

There is also a `comap`

in `measure_space`

:

```
/-- Pullback of a `measure`. If `f` sends each `measurable` set to a `measurable` set, then for each
measurable set `s` we have `comap f μ s = μ (f '' s)`. -/
def comap (f : α → β) : measure β →ₗ[ennreal] measure α :=
```

What are the aesthetics of calling these `map`

and `comap`

versus `pushforward`

and `pullback`

? This would provide a conceptual link to various pushforward definitions in `Top`

and `category_theory`

: docs#pushforward. I'm just asking, pure curiosity, not a criticism.

#### Lars Ericson (Dec 17 2020 at 23:24):

Here is the last bit I want to get to before thinking about the overall API. Please take a look at Definition 7:

Screenshot-from-2020-12-17-18-23-57.png

#### Lars Ericson (Dec 17 2020 at 23:25):

Now take a look at section 1.1 B:

Screenshot-from-2020-12-17-18-24-10.png

#### Adam Topaz (Dec 17 2020 at 23:26):

What's the question? What is it precisely you want to get?

#### Yakov Pechersky (Dec 17 2020 at 23:30):

Are you asking about inverses? You could have:

```
class random_variable (α β : Type*) [probability_space α] [measurable_space β] :=
(outcome : α → β)
(is_measurable_outcome : measurable outcome)
(surj: function.surjective outcome)
```

and for the inverse use

docs#function.surj_inv

#### Lars Ericson (Dec 17 2020 at 23:30):

Here it comes, trying to type faster. Here's the first part:

So Mark Dean defines a notion of a distribution and a distribution function (when the random variable is into ℝ). Kloeden&co give an expression for the distribution function: `FX(x) = PX((-∞ ,x)) = P({ω ∈ Ω : X(ω) ≤ x})`

.

They are both assuming that the random variable is into ℝ. I don't see any reason for this, because this: `X(ω) ≤ x`

only requires that there be a preorder ≤ on the type β that the preorder is into.

#### Adam Topaz (Dec 17 2020 at 23:30):

Oh so you want the distribution then?

#### Lars Ericson (Dec 17 2020 at 23:31):

No, we have the `distribution`

, I want to define the `distribution_function`

in case `β`

has a preorder in this sketch:

```
import measure_theory.lebesgue_measure
import measure_theory.measure_space
open measure_theory
noncomputable theory
class probability_space (α : Type*) extends measure_space α :=
(is_probability_measure : probability_measure volume)
def probability_space.distribution (α : Type*) [ps: probability_space α] :=
ps.volume
def probability_space.distribution_function (X : random_variable α β ) [ps: probability_space β ] [preorder β] :=
sorry
structure random_variable (α β : Type*) [probability_space α] [measurable_space β] :=
(outcome : α → β)
(is_measurable_outcome : measurable outcome)
variables {α β : Type} [probability_space α] [measurable_space β]
instance distribution_space (X : random_variable α β) : probability_space β :=
{ volume := measure.map X.outcome volume,
is_probability_measure := begin
constructor,
rw measure_theory.measure.map_apply X.is_measurable_outcome is_measurable.univ,
apply probability_space.is_probability_measure.measure_univ,
end }
instance {α} {p : α → Prop} [m : measure_space α] : measure_space (subtype p) :=
{ volume := measure.comap (coe : _ → α) volume }
theorem subtype.volume_apply {α} {p : α → Prop} [measure_space α]
(hp : is_measurable {x | p x}) {s : set (subtype p)} (hs : is_measurable s) :
volume s = volume ((coe : _ → α) '' s) :=
measure.comap_apply _ subtype.coe_injective (λ _, is_measurable.subtype_image hp) _ hs
namespace Steinhaus
instance P : probability_measure (volume : measure (set.Icc (0 : ℝ) 1)) :=
{ measure_univ := begin
refine (subtype.volume_apply is_measurable_Icc is_measurable.univ).trans _,
suffices : volume (set.Icc (0 : ℝ) 1) = 1, {simpa},
rw [real.volume_Icc], simp
end }
instance ΩAP : probability_space (set.Icc (0 : ℝ) 1) :=
{ is_probability_measure := Steinhaus.P }
#check Steinhaus.ΩAP
#check Steinhaus.ΩAP.volume -- volume : measure ↥(set.Icc 0 1)
end Steinhaus
```

#### Lars Ericson (Dec 17 2020 at 23:32):

So we have probability space, random variable, distribution space induced by random variable on probability space, and the final icing on the cake is to show that if I have a preorder on the measurable space that the random variable is into, then I have a distribution function.

#### Lars Ericson (Dec 17 2020 at 23:33):

Why this is at all interesting is because everybody in the world only defines `distribution_function`

when the random variable is into `ℝ`

. Nobody seems to care about the case where it is into a measurable space with a preorder. I just think it's an interesting generalization.

#### Lars Ericson (Dec 17 2020 at 23:34):

But it's only possible for me to talk about whether it is interesting or definable in the presence of something like Lean.

#### Mario Carneiro (Dec 17 2020 at 23:35):

@Yakov Pechersky There aren't any inverses involved in those snippets. That's a `preimage`

you see

#### Lars Ericson (Dec 17 2020 at 23:37):

What I'm working on is this generalization of the distribution function in the Kloeden snippet, which Dean defines only if the RV is into the reals:

```
def probability_space.distribution_function (X : random_variable α β ) (y: β) \[ps: probability_space β ] [preorder β] :=
ps.volume({ω ∈ α : X(ω) ≤ y})
```

#### Yakov Pechersky (Dec 17 2020 at 23:37):

Ah, misread. Gotcha. So the clean wya to state that is outcome.preimage?

#### Mario Carneiro (Dec 17 2020 at 23:37):

Well, the whole measure is `measure.comap`

I think

#### Mario Carneiro (Dec 17 2020 at 23:38):

actually `measure.map`

#### Lars Ericson (Dec 17 2020 at 23:40):

That's another question I had about terminology, why `measure_space`

uses `map`

and `comap`

but in the comments it uses `pushforward`

and `pullback`

which comments match terminology used in `Top`

and `category_theory`

.

#### Mario Carneiro (Dec 17 2020 at 23:40):

so there isn't really anything to do in definition 7. `measure.map x volume`

is a measure, check

#### Mario Carneiro (Dec 17 2020 at 23:40):

because there is a clash of terminology from multiple fields

#### Mario Carneiro (Dec 17 2020 at 23:41):

lean uses `map`

and `comap`

for regularity, but this gets instantiated in category theory and topology and algebra and measure theory and mathematicians in these areas all have their own ideas about what to call it

#### Lars Ericson (Dec 17 2020 at 23:41):

OK thanks. The above sketch works fine. @Adam Topaz worked out how to derive a distribution space from a random variable on a probability space:

```
instance distribution_space (X : random_variable α β) : probability_space β :=
{ volume := measure.map X.outcome volume,
is_probability_measure := begin
constructor,
rw measure_theory.measure.map_apply X.is_measurable_outcome is_measurable.univ,
apply probability_space.is_probability_measure.measure_univ,
end }
```

The only thing I have left to complete the functionality is generalized distribution function.

#### Mario Carneiro (Dec 17 2020 at 23:42):

I think you are missing a lemma there

#### Mario Carneiro (Dec 17 2020 at 23:42):

you just proved the `map`

of a probability measure is a probability measure

#### Mario Carneiro (Dec 17 2020 at 23:42):

that's a lemma

#### Adam Topaz (Dec 17 2020 at 23:43):

```
import measure_theory.lebesgue_measure
open measure_theory
noncomputable theory
class probability_space (α : Type*) extends measure_space α :=
(is_probability_measure : probability_measure volume)
namespace probability_space
lemma volume_univ {α : Type*} [probability_space α] : volume (set.univ : set α) = 1 :=
is_probability_measure.measure_univ
def prob {α : Type*} [probability_space α] {S : set α} (h : is_measurable S) : ennreal :=
volume S
end probability_space
structure random_variable (α β : Type*) [probability_space α] [measurable_space β] :=
(outcome : α → β)
(is_measurable_outcome : measurable outcome)
variables {α β : Type*} [probability_space α] [measurable_space β]
namespace random_variable
instance : has_coe_to_fun (random_variable α β) := ⟨_,outcome⟩
lemma measurable (X : random_variable α β) : measurable X := X.is_measurable_outcome
def induced (X : random_variable α β) : probability_space β :=
{ volume := measure.map X volume,
is_probability_measure := begin
constructor,
erw measure_theory.measure.map_apply X.measurable is_measurable.univ,
exact probability_space.volume_univ
end }
def distribution_function (X : random_variable α ℝ) : ℝ → ennreal := λ x,
@probability_space.prob _ X.induced {c | c ≤ x} is_measurable_Iic
end random_variable
```

#### Mario Carneiro (Dec 17 2020 at 23:43):

why is `prob`

an `ennreal`

?

#### Adam Topaz (Dec 17 2020 at 23:43):

Because I'm lazy

#### Adam Topaz (Dec 17 2020 at 23:44):

One should change it to `\R`

.

#### Adam Topaz (Dec 17 2020 at 23:44):

Same with `distribtion_function`

.

#### Lars Ericson (Dec 17 2020 at 23:44):

Well that's what I'm trying to generalize. I want to replace ℝ with β where β has a `preorder`

. What do you think?

#### Adam Topaz (Dec 17 2020 at 23:45):

Sure you could do that.

#### Mario Carneiro (Dec 17 2020 at 23:45):

I don't think that's a good idea, it's not general enough

#### Mario Carneiro (Dec 17 2020 at 23:46):

the sets `{c | c ≤ x}`

in a preorder don't generate all borel sets

#### Mario Carneiro (Dec 17 2020 at 23:46):

you will have to work with the preimages of general sets, at which point it's not clear what the CDF gets you

#### Lars Ericson (Dec 17 2020 at 23:47):

It would be a really wonderful solution to a Math StackExchange question I posed in August, i.e. exactly what kind of order does β have to have for this to go through in Lean.

#### Lars Ericson (Dec 17 2020 at 23:48):

For one thing you know by construction of the `random_variable`

that β is a `measurable_space`

.

#### Lars Ericson (Dec 18 2020 at 00:51):

Mario Carneiro said:

you just proved the

`map`

of a probability measure is a probability measure, that's a lemma

It could be that makes it a category, which is something to think about. I.e. if you were building a probability library from scratch would you want to make it as category theory-ish as possible or just express the main ideas on their own. There seems to be big interest in category theory here, but it seems that using it in practice like for GCD/Fibonacci example is quite hard. So then the question is whether the generalization that it provides is productive for particular applications, or is it just nice to think about on its own. The fact that there is map/comap and pullback/pushthrough in different parts of the library shows that there is not a unified view among all the mathlib contributors. Not that there should be, I'm just slowly learning the "style guide" and "make it a category" doesn't seem to be one of the style imperatives.

#### Lars Ericson (Dec 18 2020 at 00:59):

Regarding this expression for the `distribution_function`

, which is on single points in the codomain of the RV, as opposed to the `distribution`

, which is on subsets of the domain of the RV: `F(x) = P({ω ∈ Ω : X(ω) ≤ x})`

. We can see by inspection, since `P`

is a probability measure, that it will be in `[0,1]`

. If the codomain of the RV is some kind of partially ordered set, the distribution function might not behave in the same way as if it were a linearly ordered dense set. But the result will still be a CDF. I will Google "generalized distribution function" and see what I come up with. In any event, it seems well defined for any order relation `≤ `

.

#### Mario Carneiro (Dec 18 2020 at 01:03):

The fact that there is map/comap and pullback/pushthrough in different parts of the library shows that there is not a unified view among all the mathlib contributors

To clarify: `map`

and `comap`

are used consistently in lean names, while "pullback" and "pushforward" are used in comments to help connect names to things in the literature (which are all over the place, as one would expect)

#### Mario Carneiro (Dec 18 2020 at 01:06):

It could be that makes it a category, which is something to think about. I.e. if you were building a probability library from scratch would you want to make it as category theory-ish as possible or just express the main ideas on their own.

I would not suggest adding additional layers unless it gives you an advantage in proving theorems. Instantiating a category is a thing you can do, but it's not particularly helpful on its own. There are lots of things in mathlib that could potentially be instantiations of the category library, but generally it's just another layer on an existing theorem. I would say prove the theorem and add the instantiation if and when you need it for a theorem

#### Lars Ericson (Dec 18 2020 at 01:08):

OK so my main concern, all abstraction aside, is whether any `measurable_set`

`S`

with ≤ for which this expression: `F(x: S) = P({ω ∈ Ω : X(ω) ≤ x})`

makes sense is necessarily linearly ordered, dense and isomorphic to ℝ, or whether any `measurable_set`

with any ≤ defined on it will do.

#### Adam Topaz (Dec 18 2020 at 01:09):

Just to clarify, whenever you have a random variable `X`

with codomain `A`

, and some function from `A`

to the type of all measurable sets in `A`

, you can define something like a distribution function on `A`

. The question is whether this function determines the probability density on `A`

induced by `X`

, and this is certainly not true in general.

#### Mario Carneiro (Dec 18 2020 at 01:10):

I think instead of formalizing definitions, you should try to get to the actual theorems, because we can work backward from there to the abstractions that are useful for formalizing the theorem. So far, I don't see any value in the definition `random_variable`

, or `distribution`

, or `prob`

or `distribution_function`

, they are all just wrappers around other things. So let's get to theorems where you really *need* some probability vocabulary to prove the theorem effectively

#### Lars Ericson (Dec 18 2020 at 01:11):

Random variable, distribution, distribution function and probability space *are* probability vocabulary.

#### Mario Carneiro (Dec 18 2020 at 01:11):

yes I'm aware

#### Mario Carneiro (Dec 18 2020 at 01:11):

a wrapper doesn't help you prove theorems

#### Lars Ericson (Dec 18 2020 at 01:11):

Google gave me this, maybe it's of some use: https://www.ams.org/journals/tran/1957-084-02/S0002-9947-1957-0085326-5/S0002-9947-1957-0085326-5.pdf

#### Mario Carneiro (Dec 18 2020 at 01:13):

more words to express things you could already express are not helpful. The power of a definition is when it abbreviates a nontrivial combination of things, like "group" meaning "associative binary operation with a neutral element"

#### Mario Carneiro (Dec 18 2020 at 01:13):

the latter expression is more cumbersome than the former, so the definition pulls its weight

#### Mario Carneiro (Dec 18 2020 at 01:14):

there are also a bunch of theorems about groups that are not theorems about monoids or associative operations on their own

#### Lars Ericson (Dec 18 2020 at 01:14):

I'm not really trying to prove a theorem, I'm just trying to get the type classes straight to describe the usual vocabulary of probability theory. That's all. Because even though that vocabulary is just sugar for things that are already in`mathlib`

in one way or another, they are still challenging for me. I can't go to`mathlib`

and see the usual vocabulary of probability theory. It's a modest exercise.

#### Mario Carneiro (Dec 18 2020 at 01:15):

So what's the raison d'etre of probability vocabulary? What fact about probability measures is not true about general measures? I have no doubt that such theorems *exist*, but nothing interesting will happen until you start trying to prove one of them

#### Mario Carneiro (Dec 18 2020 at 01:18):

It is really important to find and at least start trying to prove one of these theorems because it guides the structure of definitions around the whole topic. Without it I see all these definitions and they are all useless cruft with negative utility

#### Lars Ericson (Dec 18 2020 at 01:21):

My longer range goal is to understand say the construction of the stochastic Taylor series approximation of the Ito or Stratonovich integral. I am not trying to prove any new theorems about this topic. I'm trying to build up the language used in a textbook that goes through that topic. The textbook doesn't use the measure-theoretic equivalent of `probability_space`

. It uses `probability_space`

. I don't think I should feel bad about, for my own purposes in reading that book, providing a definition of `probability_space`

.

Along the way I noticed that `F(x: S) = P({ω ∈ Ω : X(ω) ≤ x}`

could possibly be generalized. @Adam Topaz expresses this as whether the random variable `X`

"determines the probability density on A induced by X". It seems that "determines the probability density on A induced by X" is something that could be written in Lean and expressed as a condition that the order relation ≤ has to satisfy. So, for my own tastes and ambitions, I think it would be interesting to say formalize "determines the probability density on A induced by X" and see where that leads. It could lead to the requirement that the codomain of the random variable be isomorphic to the reals, which would be an interesting and natural explanation for why every text I can find restricts `distribution_function`

to random variables into the reals.

#### Mario Carneiro (Dec 18 2020 at 01:22):

The textbook doesn't use the measure-theoretic equivalent of probability_space. It uses probability_space. I don't think I should feel bad about, for my own purposes in reading that book, providing a definition of probability_space.

The point isn't to feel bad about it, it is to build the theory of probability spaces by looking at how it is *used*

#### Mario Carneiro (Dec 18 2020 at 01:23):

The question to answer isn't *what is a probability space*, it is *what do people do with probability spaces*

#### Mario Carneiro (Dec 18 2020 at 01:24):

the purpose of a library is to make it easier to write proofs in the area

#### Mario Carneiro (Dec 18 2020 at 01:24):

so you have to look at the proofs to understand how to help

#### Mario Carneiro (Dec 18 2020 at 01:27):

Regarding what you would need to make the exact definition `F(x: S) = P({ω ∈ Ω : X(ω) ≤ x})`

generalize, I would expect that you need the sets `{a | a < x}`

to generate the topology and hence the entire borel algebra. That is, you need an `orderable_topology`

on the codomain

#### Mario Carneiro (Dec 18 2020 at 01:29):

oops it's called docs#order_closed_topology now

#### Mario Carneiro (Dec 18 2020 at 01:30):

I think you will need to get to some really interesting theorems before the codomain starts to be constrained to look anything like the reals

#### Adam Topaz (Dec 18 2020 at 01:45):

Law of large numbers might be a nice goal.

#### Lars Ericson (Dec 18 2020 at 02:09):

In terms of "what do people do with probability spaces", and the book I am reading, I have some very simple applications that I want to proof-check in Lean:

- A one-size-fits-all probability space:([0,1], B([0,1]), [a,b] -> b-a). That's done. Cruft is to name this the Steinhaus space
- Some basic language: Random variable, distribution, distribution function
- Distribution function for a two-point RV with codomain {0,1} and probabilities p and 1-p
- Uniform distribution U[a,b]
- Exponential distribution
- Gaussian distribution

For each of these I want to:

- Start with Steinhaus as the domain of the random variable X,
- Define the codomain as either a finite measurable space for a continuous random variable or the appropriate continuous codomain which could be ereal or ennreal or a real interval
- Define the corresponding random variable X
- From X derive the distribution function FX

This is not the usual order. The book and most presentations tend to start with the distribution function. I just like doing it in this order because it appeals to me. I'm not proving theorems really, just trying to create language for a sequence that appeals to me for presenting these different discrete and continuous distributions in a consistent kind of top-down way. Also, I am annoyed that the book introduces `probability_space`

on page 3 and then drops it thereafter and just presents the `distribution_function`

for each distribution. I think if you define something you should use it, so I'm spending my time doing it in this direction. It's just a personal choice.

#### Mario Carneiro (Dec 18 2020 at 02:16):

well the main issue I have with that list is that every single one is a definition, there are no theorems

#### Mario Carneiro (Dec 18 2020 at 02:17):

you need at least some sanity checks on the definitions

#### Mario Carneiro (Dec 18 2020 at 02:20):

Perhaps your goal might be to have a way to prove that a real function (which is what the books will give in order to define a distribution) yields a probability distribution, that's not completely obvious from the definitions

#### Mario Carneiro (Dec 18 2020 at 02:22):

you can then use that to get the exponential distribution and gaussian distribution

#### Mario Carneiro (Dec 18 2020 at 02:23):

I think you already did the uniform distribution earlier

#### Lars Ericson (Dec 18 2020 at 03:07):

The interesting (to me) part, which I already did by hand, is when they give a distribution function, to back out what the random variable is that has that distribution function. It's not always easy to get it right. So my work is to take the RV that I backed out and prove that if you grind it through the Lean `distribution_function`

construction, that I get the same function. So that's a proof.

Regarding the generalized disrtibution function, I am trying to generalize

```
def distribution_function (X : random_variable α ℝ) : ℝ → ennreal := λ x,
```

to

```
def distribution_function (X : random_variable α β ) [preorder α] : β → (set.Icc (0 : ℝ) 1) := λ x,
```

Adam's distribution function full definition is

```
def distribution_function (X : random_variable α ℝ) : ℝ → ennreal := λ x,
@probability_space.prob _ X.induced {c | c ≤ x} is_measurable_Iic
```

I am trying to understand the application of `@probability_space.prob`

. I've got this:

```
import measure_theory.lebesgue_measure
open measure_theory
noncomputable theory
class probability_space (α : Type*) extends measure_space α :=
(is_probability_measure : probability_measure volume)
namespace probability_space
lemma volume_univ {α : Type*} [probability_space α] : volume (set.univ : set α) = 1 :=
is_probability_measure.measure_univ
def prob {α : Type*} [probability_space α] {S : set α} (h : is_measurable S) : ennreal :=
volume S
end probability_space
structure random_variable (α β : Type*) [probability_space α] [measurable_space β] :=
(outcome : α → β)
(is_measurable_outcome : measurable outcome)
variables {α β : Type*} [probability_space α] [measurable_space β]
namespace random_variable
instance : has_coe_to_fun (random_variable α β) := ⟨_,outcome⟩
lemma measurable (X : random_variable α β) : measurable X := X.is_measurable_outcome
def induced (X : random_variable α β) : probability_space β :=
{ volume := measure.map X volume,
is_probability_measure := begin
constructor,
erw measure_theory.measure.map_apply X.measurable is_measurable.univ,
exact probability_space.volume_univ
end }
variable MSα : measurable_space α
variable Pα : probability_space α
variable Pβ : probability_space β
variable Sα : set α
variable Sβ : set β
lemma SMα : is_measurable Sα := sorry
lemma SMβ : is_measurable Sβ := sorry
def Xαβ : random_variable α β := sorry
#check Xαβ.induced -- X.induced : probability_space β
#check @probability_space.prob α Pα Sα (SMα Sα )
#check @probability_space.prob β Pβ Sβ (SMβ Sβ) -- FAILS
def distribution_function (X : random_variable α β ) [preorder α] : β → (set.Icc (0 : ℝ) 1) := λ x, sorry
-- @probability_space.prob _ X.induced {c | c ≤ x} is_measurable_Iic
end random_variable
```

Even though the one before it works, this fails:

```
#check @probability_space.prob β Pβ Sβ (SMβ Sβ) -- FAILS
```

I don't fully understand why the type of `SMα`

is `Mα : ∀ (Sα : set ?M_1), is_measurable Sα`

, so I have to re-supply Sα when I use it even though it's declared

```
lemma SMα : is_measurable Sα := sorry
```

So

```
#check @probability_space.prob α Pα Sα (SMα Sα )
```

works with type

```
probability_space.prob _ : ennreal
```

But

```
#check @probability_space.prob β Pβ Sβ (SMβ Sβ)
```

gives

```
synthesized type class instance is not definitionally equal to expression inferred by typing rules, synthesized
_inst_2
inferred
measure_space.to_measurable_space
```

I am using @ to avoid this kind of mystery but it's not helping in this case. Any suggestions? This is a problem because in this line

```
@probability_space.prob _ X.induced {c | c ≤ x} is_measurable_Iic
```

the term `X.induced`

is going to have type `probability_space β`

. So I need to understand how to fill out each of the arguments to `@probability_space.prob`

but I can't even `#check`

it in the β case, it only works for α.

The other minor issue is that `probability_space.prob`

has return type `ennreal`

when I know that in the end it will be constrained to ` (set.Icc (0 : ℝ) 1)`

by `probability_measure volume`

. I'd rather just say it in the return type rather than `ennreal`

, but that doesn't go through.

#### Mario Carneiro (Dec 18 2020 at 03:14):

So my work is to take the RV that I backed out and prove that if you grind it through the Lean distribution_function construction, that I get the same function. So that's a proof.

Can you state this theorem?

#### Mario Carneiro (Dec 18 2020 at 03:15):

it's not entirely clear to me what you mean by it

#### Mario Carneiro (Dec 18 2020 at 03:17):

I'm not sure there is anything to be gained with the `#check`

s, all those variables are tripping you up. Just state a def or theorem and try to get the assumptions and the statement of the theorem right

#### Mario Carneiro (Dec 18 2020 at 03:18):

`lemma SMα`

is false, it asserts all sets are measurable

#### Mario Carneiro (Dec 18 2020 at 03:18):

`lemma SMβ`

is the same as `lemma SMα`

#### Lars Ericson (Dec 18 2020 at 05:03):

The point of the `#check`

s was simply to check the type of the arguments to the expression, I was trying to figure out what kind of thing to fill in the blank for each of the slots in `probability_space.prob`

. I know that it just applies the volume function, but in the explicit @ form there are a lot of things that need to be in place for the application to go through, and I was trying to understand their types.

The theorem to prove is as follows (for the type of random variables in the text, which is the usual type, not the generalized type I want):

- We are given, in the text
`F_X: ℝ → [0,1]`

, the`distribution_function`

for the`random_variable`

named`X: [0,1] → ℝ`

. - I construct, by inspection, a definition
`myX`

for the random variable`X`

. - There is a
`distribution`

,`μ_X : borel ℝ → [0,1]`

, which is implicit. - In the Steinhaus space we have uniform probability measure 'P: borel ℝ → [0,1]`.
- By construction (this is already proven out of the by the machinery in the cruft above), we have that
`μ_X = P ∘ X⁻¹`

**To prove**: that`myX = λ x, μ_X (set.Icc -∞ x)`

For example, consider the two point distribution:

Screenshot-from-2020-12-18-00-03-15.png

#### Lars Ericson (Dec 18 2020 at 05:21):

Let `x₁, x₂ ∈ ℝ`

and `p₁`

be the probability of drawing `x₁`

. Then we can define `X(ω) = if ω < p₁ then x₁ else x₂ `

or

```
def myX(ω: set.Icc (0 : ℝ) 1) := if ω < p₁ then x₁ else x₂
```

also, the text is telling us that

```
def FX (x: ℝ ) := if x<x₁ then 0 else if x₁ ≤ x < x₂ then p₁ else 1
```

and then the proof to achieve is

```
example : FX = λ x, (Steinhaus.volume ∘ myX.induced) (set.Icc ∞ x) := sorry
```

I find this interesting and a little bit tricky. It's not rocket science but it does seem to take some effort to get all the pieces in place. It's also not the generalized distribution function I was going on about, but it sketches out the path through the woods I am looking to take.

#### Mario Carneiro (Dec 18 2020 at 05:26):

I don't think `X(ω) = if ω < p₁ then x₁ else x₂`

is correct. What you need is for the probability that `X = x1`

to equal `p1`

, meaning that the event space is `bool`

or perhaps a general probability space on which you posit an event `A`

whose probability is `p1`

#### Lars Ericson (Dec 18 2020 at 14:45):

On the index space `[0,1]`

I am defining an event`A=[0,p1)`

. According to Kloeden et al's book, `I(A)(ω)= if ω∈A then x₁ else x₂`

is a random variable on this space, for `x₁=0`

and `x₂=1`

. Please let me know if I'm reading this wrong:

Screenshot-from-2020-12-18-09-43-54.png

#### Lars Ericson (Dec 18 2020 at 14:58):

Note instead of clumping the events in `[0,p1)`

I could have defined `A`

to be the union of a bunch of more randomly scattered subsets. That's not necessary for the definitions to work. There is no inherent "randomness" in `random_variable`

in that sense. A `random_variable`

is a fully determined function from a sample space to a measurable space.

#### Mario Carneiro (Dec 18 2020 at 16:44):

You don't need to define `A`

to be anything in particular, it can be a hypothesis

#### Mario Carneiro (Dec 18 2020 at 16:44):

it's more general this way anyway because you don't know what event space is needed in order to match all the other constraints in a downstream theorem

#### Lars Ericson (Dec 18 2020 at 17:37):

The Kloeden book brings in (briefly) the idea of probability space, so I want to demonstrate it when explaining things to myself. I can use `(Ω,A,P)= ([0,1],B([0,1]),[a,b]↦ b-a)`

as a "one sits fits all" probability space. Then I want to define a random variable on that space, obtain it's distribution `D`

, and then obtain the distribution function `F`

, which results in a new distribution (probability) space like `(S,Σ,D)=(R, B(R), D)`

with CDF `F`

. I want to do that in Lean for the 6 particular cases in the book. I'm not trying to make the most general Lean probability theory module, or prove old or new theorems. I'm just trying to demonstrate a little piece of mechanism, to myself, because it seems pretty to me. I don't expect anybody else to find it interesting. Think of it as me doing beginner-level gardening in my back yard. It's a hobby, not a calling.

On the task of describing stochastic integration at a Lean level of detail, Wilfrid Kendall has done a lot of work, 21 years ago, which seems easily adaptable. (Adapting old stuff to new contexts is also more hobby gardening, not necessarily of general interest.) Kendall worked in Reduce, then Axiom with fine details, then Mathematica. (Note, his .ps.gz files are really just .ps files, they aren't zipped.) His Axiom papers provide the most detail. Axiom has a much richer type system than Mathematica, so the Axiom implementation is an improvement on it's successor.

#### Mario Carneiro (Dec 18 2020 at 17:49):

I can use (Ω,A,P)= ([0,1],B([0,1]),[a,b]↦ b-a) as a "one sits fits all" probability space.

It's not, though, as soon as you have more than one variable. What if you have two {0,1} variables `X`

and `Y`

with probabilities `p`

and `q`

to be 1, which are independent? This model will not help there.

#### Lars Ericson (Dec 18 2020 at 18:52):

These 6 examples are all univariate. Steinhaus space is a good enough indexing space for univariate examples whether continuous or discrete. I'm not sure that what's in the sketch above (aside from my desire to generalize `distribution_function`

) excludes product measures. Baby steps!

#### Mario Carneiro (Dec 18 2020 at 19:00):

it's generally easier to not commit to particular example spaces if you can help it. Define a gaussian distributed RV over any event space

#### Mario Carneiro (Dec 18 2020 at 19:00):

you are doing lean on hard mode for some reason I don't understand

#### Mario Carneiro (Dec 18 2020 at 19:01):

The use of [0,1] is AFAICT entirely superfluous, and your proofs and definitions will be smaller and easier without it

#### Mario Carneiro (Dec 18 2020 at 19:03):

certainly the book you are reading from has made no such commitment to an event space

#### Lars Ericson (Dec 18 2020 at 19:40):

These are toy examples for exposition of a small point. Before I can generalize these toy examples I have to be able to do them first. I'm not there yet. I can't generalize something I don't know how to do in a smaller context.

In terms of a grander design, the above sketch has

```
def prob {α : Type*} [probability_space α] {S : set α} (h : is_measurable S) : ennreal :=
volume S
```

I would like to be able to say

```
def prob {α : Type*} [probability_space α] {S : set α} (h : is_measurable S) : (set.Icc (0 : ℝ) 1) :=
volume S
```

because probability measures by definition have as codomain `[0,1]`

. Here's something I need to prove, but I don't know how to say it correctly in Lean:

```
lemma volume_type_01 {α : Type*} [probability_space α] (X: set α) :
((volume X) : (set.Icc (0 : ℝ) 1)) := sorry
```

I can say

```
lemma volume_gt_0_lt_1 {α : Type*} [probability_space α] (X: set α) :
0 ≤ (volume X) ∧ (volume X) ≤ 1 := sorry
```

but that doesn't give me a way to replace `ennreal`

with `(set.Icc (0 : ℝ) 1)`

.

#### Adam Topaz (Dec 18 2020 at 19:43):

I suggest using docs#ennreal.to_real

#### Mario Carneiro (Dec 18 2020 at 19:57):

These are toy examples for exposition of a small point. Before I can generalize these toy examples I have to be able to do them first.

The general case is easier than the specific case. I know this is counterintuitive, but it's true. The toy examples are way harder than general theorems about all measure spaces or all probability spaces, because the contingent details of the example don't get in the way

#### Lars Ericson (Dec 18 2020 at 20:24):

OK. The other thing I want to prove, to get the generalized distribution function, is something I at least know how to state:

```
lemma is_measurable_measurable_with_le (xx: β) [measurable_space β] [has_le β] : is_measurable {c | c ≤ xx} :=
sorry
```

With this, the following #check works fine:

```
variable XX: random_variable α β
variable xx : β
variable Xlte : has_le β
#check @probability_space.prob β XX.induced {c | c ≤ xx} _ -- probability_space.prob ?M_1 : ennreal
```

However the exact same expression preceded by a `def f :=`

instead of a `#check`

results in a fail:

```
def generalized_distribution_function := @probability_space.prob β XX.induced {c | c ≤ xx} _ -- ERROR
```

The fail is:

```
failed to synthesize type class instance for
α : Type u_1,
β : Type u_2,
_inst_1 : probability_space α,
_inst_2 : measurable_space β,
XX : random_variable α β,
xx c : β
⊢ has_le β
```

where the whole file is:

```
import measure_theory.lebesgue_measure
open measure_theory
noncomputable theory
class probability_space (α : Type*) extends measure_space α :=
(is_probability_measure : probability_measure volume)
namespace probability_space
lemma volume_univ {α : Type*} [probability_space α] : volume (set.univ : set α) = 1 :=
is_probability_measure.measure_univ
def prob {α : Type*} [probability_space α] {S : set α} (h : is_measurable S) : ennreal :=
volume S
end probability_space
structure random_variable (α β : Type*) [probability_space α] [measurable_space β] :=
(outcome : α → β)
(is_measurable_outcome : measurable outcome)
variables {α β : Type*} [probability_space α] [measurable_space β]
namespace random_variable
instance : has_coe_to_fun (random_variable α β) := ⟨_,outcome⟩
lemma measurable (X : random_variable α β) : measurable X := X.is_measurable_outcome
def induced (X : random_variable α β) : probability_space β :=
{ volume := measure.map X volume,
is_probability_measure := begin
constructor,
erw measure_theory.measure.map_apply X.measurable is_measurable.univ,
exact probability_space.volume_univ
end }
def distribution_function (X : random_variable α ℝ) : ℝ → ennreal := λ x,
@probability_space.prob _ X.induced {c | c ≤ x} is_measurable_Iic
lemma is_measurable_measurable_with_le (xx: β) [measurable_space β] [has_le β] : is_measurable {c | c ≤ xx} :=
sorry
variable XX: random_variable α β
variable xx : β
variable Xlte : has_le β
#check @probability_space.prob β XX.induced {c | c ≤ xx} _ -- probability_space.prob ?M_1 : ennreal
def generalized_distribution_function := @probability_space.prob β XX.induced {c | c ≤ xx} _ -- ERROR
end random_variable
```

#### Kevin Buzzard (Dec 18 2020 at 20:27):

As you can see, your variable `Xlte`

is nowhere in your local context. `has_le`

is a typeclass, so you need to use square brackets. Note that `has_le`

is just an arbitrary relation, and whilst I know nothing about probability theory or measure theory, assuming `[has_le \beta]`

is a very weak statement.

#### Lars Ericson (Dec 19 2020 at 00:38):

Thanks @Kevin Buzzard I figured out how to state things in a way that typechecks. Not quite done yet, I want to replace `ennreal`

with real interval `[0,1]`

following @Adam Topaz 's hint. Here is how it is with `ennreal`

. There is just no `sorry`

. Instead I introduced a new structure `is_measurable_with_le`

to hold the proof that the set is measurable. It's a punt, but a more well-defined punt:

```
import measure_theory.lebesgue_measure
open measure_theory
noncomputable theory
class probability_space (α : Type*) extends measure_space α :=
(is_probability_measure : probability_measure volume)
namespace probability_space
lemma volume_univ {α : Type*} [probability_space α] : volume (set.univ : set α) = 1 :=
is_probability_measure.measure_univ
def prob {α : Type*} [probability_space α] {S : set α} (h : is_measurable S) : ennreal :=
volume S
end probability_space
structure random_variable (α β : Type*) [probability_space α] [measurable_space β] :=
(outcome : α → β)
(is_measurable_outcome : measurable outcome)
variables {α β : Type*} [probability_space α] [measurable_space β]
namespace random_variable
instance : has_coe_to_fun (random_variable α β) := ⟨_,outcome⟩
lemma measurable (X : random_variable α β) : measurable X := X.is_measurable_outcome
def induced (X : random_variable α β) : probability_space β :=
{ volume := measure.map X volume,
is_probability_measure := begin
constructor,
erw measure_theory.measure.map_apply X.measurable is_measurable.univ,
exact probability_space.volume_univ
end }
def distribution_function (X : random_variable α ℝ) : ℝ → ennreal := λ x,
@probability_space.prob _ X.induced {c | c ≤ x} is_measurable_Iic
structure is_measurable_with_le (β: Type*) (b: β) [measurable_space β] [preorder β] :=
(proof: is_measurable {c: β | c ≤ b})
def generalized_distribution_function (α β : Type*) [probability_space α] [mb: measurable_space β]
[po: preorder β] (X: random_variable α β) (b: β) [im: @is_measurable_with_le β b mb po] :=
@probability_space.prob β X.induced {c | c ≤ b} im.proof
end random_variable
```

#### Mario Carneiro (Dec 19 2020 at 01:46):

Here are some theorems. Try filling the sorries at the end.

```
import measure_theory.lebesgue_measure
open measure_theory
noncomputable theory
class probability_space (α : Type*) extends measure_space α :=
(is_probability_measure : probability_measure volume)
namespace probability_space
variables {α : Type*} [probability_space α]
lemma volume_univ : volume (set.univ : set α) = 1 :=
is_probability_measure.measure_univ
lemma volume_le_one (S : set α) : volume S ≤ 1 :=
by rw ← @volume_univ α; exact measure_mono (set.subset_univ _)
def prob (S : set α) : ℝ := (volume S).to_real
def nnprob (S : set α) : nnreal := (volume S).to_nnreal
theorem volume_eq_nnprob (S : set α) : volume S = nnprob S :=
(ennreal.coe_to_nnreal $ ne_top_of_le_ne_top (by rintro ⟨⟩) (volume_le_one S)).symm
theorem prob_eq_nnprob (S : set α) : prob S = nnprob S := rfl
theorem nnprob_le_one (S : set α) : nnprob S ≤ 1 :=
ennreal.coe_le_coe.1 $ by rw ← volume_eq_nnprob; exact volume_le_one S
theorem prob_le_one (S : set α) : prob S ≤ 1 := nnprob_le_one _
theorem zero_le_prob (S : set α) : 0 ≤ prob S := nnreal.zero_le_coe
theorem nnprob_mono {S T : set α} (h : S ⊆ T) : nnprob S ≤ nnprob T :=
ennreal.coe_le_coe.1 $ by simpa only [← volume_eq_nnprob] using measure_mono h
theorem prob_mono {S T : set α} (h : S ⊆ T) : prob S ≤ prob T :=
nnreal.coe_le_coe.2 $ nnprob_mono h
end probability_space
theorem measurable.map_probability_measure {α β} [probability_space α] [measurable_space β]
{f : α → β} (hf : measurable f) :
probability_measure (measure.map f volume) :=
⟨by rw measure_theory.measure.map_apply hf is_measurable.univ;
exact probability_space.volume_univ⟩
structure random_variable (α β : Type*) [probability_space α] [measurable_space β] :=
(outcome : α → β)
(is_measurable_outcome : measurable outcome)
variables {α β : Type*} [probability_space α] [measurable_space β]
namespace random_variable
instance : has_coe_to_fun (random_variable α β) := ⟨_,outcome⟩
variable (X : random_variable α β)
lemma measurable : measurable X := X.is_measurable_outcome
def induced (X : random_variable α β) : probability_space β :=
{ volume := measure.map X volume,
is_probability_measure := X.measurable.map_probability_measure }
variables [preorder β]
def CDF (x : β) : ℝ := @probability_space.prob _ X.induced {c | c ≤ x}
theorem zero_le_CDF (x : β) : 0 ≤ X.CDF x := sorry
theorem CDF_le_one (x : β) : X.CDF x ≤ 1 := sorry
theorem CDF_mono {x y : β} (h : x ≤ y) : X.CDF x ≤ X.CDF y := sorry
variables [topological_space β] [order_closed_topology β] [opens_measurable_space β]
example (b : β) : is_measurable {c | c ≤ b} := is_closed.is_measurable is_closed_Iic
end random_variable
```

#### Lars Ericson (Dec 19 2020 at 02:14):

Thank you @Mario Carneiro I will work on that.

#### Lars Ericson (Dec 19 2020 at 18:23):

I was able to prove `zero_le_CDF`

and `CDF_le_one`

. I am stuck on `CDF_mono`

. I have it reduced to a proposition of form A implies B where I know B. I have a lemma for A. I don't know how to introduce the results of A into the hypothesis set. That's a simple Lean thing but I don't know how to do it. Then if I had that I could finish. Also I have to prove A which also seems simple but I don't know how to get started. Where

`A`

is`{c : β | c ≤ x} ⊆ {c : β | c ≤ y}`

`B`

is`@probability_space.prob_mono β X.induced {c : β | c ≤ x} {c : β | c ≤ y} to hp`

Here is the sketch so far. The part where I'm stuck as at the bottom:

```
import measure_theory.lebesgue_measure
import tactic
noncomputable theory
open measure_theory
class probability_space (α : Type*) extends measure_space α :=
(is_probability_measure : probability_measure volume)
namespace probability_space
variables {α : Type*} [probability_space α]
lemma volume_univ : volume (set.univ : set α) = 1 :=
is_probability_measure.measure_univ
lemma volume_le_one (S : set α) : volume S ≤ 1 :=
by rw ← @volume_univ α; exact measure_mono (set.subset_univ _)
def prob (S : set α) : ℝ := (volume S).to_real
def nnprob (S : set α) : nnreal := (volume S).to_nnreal
theorem volume_eq_nnprob (S : set α) : volume S = nnprob S :=
(ennreal.coe_to_nnreal $ ne_top_of_le_ne_top (by rintro ⟨⟩) (volume_le_one S)).symm
theorem prob_eq_nnprob (S : set α) : prob S = nnprob S := rfl
theorem nnprob_le_one (S : set α) : nnprob S ≤ 1 :=
ennreal.coe_le_coe.1 $ by rw ← volume_eq_nnprob; exact volume_le_one S
theorem prob_le_one (S : set α) : prob S ≤ 1 := nnprob_le_one _
theorem zero_le_prob (S : set α) : 0 ≤ prob S := nnreal.zero_le_coe
theorem nnprob_mono {S T : set α} (h : S ⊆ T) : nnprob S ≤ nnprob T :=
ennreal.coe_le_coe.1 $ by simpa only [← volume_eq_nnprob] using measure_mono h
theorem prob_mono {S T : set α} (h : S ⊆ T) : prob S ≤ prob T :=
nnreal.coe_le_coe.2 $ nnprob_mono h
end probability_space
theorem measurable.map_probability_measure {α β} [probability_space α] [measurable_space β]
{f : α → β} (hf : measurable f) :
probability_measure (measure.map f volume) :=
⟨by rw measure_theory.measure.map_apply hf is_measurable.univ;
exact probability_space.volume_univ⟩
structure random_variable (α β : Type*) [probability_space α] [measurable_space β] :=
(outcome : α → β)
(is_measurable_outcome : measurable outcome)
variables {α β : Type*} [probability_space α] [measurable_space β]
namespace random_variable
instance : has_coe_to_fun (random_variable α β) := ⟨_,outcome⟩
variable (X : random_variable α β)
lemma measurable : measurable X := X.is_measurable_outcome
def induced (X : random_variable α β) : probability_space β :=
{ volume := measure.map X volume,
is_probability_measure := X.measurable.map_probability_measure }
variables [preorder β]
def CDF (x : β) : ℝ := @probability_space.prob _ X.induced {c | c ≤ x}
variables [topological_space β] [order_closed_topology β] [opens_measurable_space β]
example (b : β) : is_measurable {c | c ≤ b} := is_closed.is_measurable is_closed_Iic
lemma prob_eq_CDF (x : β) : X.CDF x = @probability_space.prob β X.induced {c | c ≤ x} := rfl
theorem zero_le_CDF (x : β) : 0 ≤ X.CDF x :=
begin
rw prob_eq_CDF,
exact (@probability_space.zero_le_prob β X.induced {c : β | c ≤ x}),
end
theorem CDF_le_one (x : β) : X.CDF x ≤ 1 :=
begin
rw prob_eq_CDF,
exact (@probability_space.prob_le_one β X.induced {c : β | c ≤ x}),
end
lemma factoid (x y : β ): {c : β | c ≤ x} ⊆ {c : β | c ≤ y} :=
begin
sorry -- STUCK
end
theorem CDF_mono {x y : β} (h : x ≤ y) : X.CDF x ≤ X.CDF y :=
begin
repeat { rw prob_eq_CDF },
-- introduce hp: factoid(x y) -- STUCK
-- apply @probability_space.prob_mono β X.induced {c : β | c ≤ x} {c : β | c ≤ y} to hp
sorry
end
end random_variable
```

#### Mario Carneiro (Dec 19 2020 at 18:31):

The `factoid`

is false, because you forgot the essential assumption `x <= y`

! Do you see why this theorem should be true mathematically?

#### Mario Carneiro (Dec 19 2020 at 18:37):

also you changed the theorems I wrote by moving them after the `variables [topological_space beta]...`

line. They were written before that line because they don't require any of that stuff to be proven

#### Mario Carneiro (Dec 19 2020 at 18:39):

In fact, I think that line is wrong; indeed there shouldn't be any measurable space structure on beta, because the theorems are all about `X.induced`

which uses only the measurable structure on alpha

#### Lars Ericson (Dec 19 2020 at 19:03):

I moved those up just to put the `sorry`

parts the bottom. I will move back down. How is this for `factoid`

:

```
lemma factoid [preorder β] (x y : β ) (h : x ≤ y) : {c : β | c ≤ x} ⊆ {c : β | c ≤ y} :=
begin
sorry
end
```

Given `factoid`

what's the proof language to introduce the result of a `lemma`

into the hypothesis set? If I have revised `factoid`

then I need that to apply here:

```
theorem CDF_mono {x y : β} (h : x ≤ y) : X.CDF x ≤ X.CDF y :=
begin
repeat { rw prob_eq_CDF },
-- introduce hp: factoid(x y)
-- apply @probability_space.prob_mono β X.induced {c : β | c ≤ x} {c : β | c ≤ y} to hp
sorry
end
```

The revised sketch is

```
import measure_theory.lebesgue_measure
import tactic
noncomputable theory
open measure_theory
class probability_space (α : Type*) extends measure_space α :=
(is_probability_measure : probability_measure volume)
namespace probability_space
variables {α : Type*} [probability_space α]
lemma volume_univ : volume (set.univ : set α) = 1 :=
is_probability_measure.measure_univ
lemma volume_le_one (S : set α) : volume S ≤ 1 :=
by rw ← @volume_univ α; exact measure_mono (set.subset_univ _)
def prob (S : set α) : ℝ := (volume S).to_real
def nnprob (S : set α) : nnreal := (volume S).to_nnreal
theorem volume_eq_nnprob (S : set α) : volume S = nnprob S :=
(ennreal.coe_to_nnreal $ ne_top_of_le_ne_top (by rintro ⟨⟩) (volume_le_one S)).symm
theorem prob_eq_nnprob (S : set α) : prob S = nnprob S := rfl
theorem nnprob_le_one (S : set α) : nnprob S ≤ 1 :=
ennreal.coe_le_coe.1 $ by rw ← volume_eq_nnprob; exact volume_le_one S
theorem prob_le_one (S : set α) : prob S ≤ 1 := nnprob_le_one _
theorem zero_le_prob (S : set α) : 0 ≤ prob S := nnreal.zero_le_coe
theorem nnprob_mono {S T : set α} (h : S ⊆ T) : nnprob S ≤ nnprob T :=
ennreal.coe_le_coe.1 $ by simpa only [← volume_eq_nnprob] using measure_mono h
theorem prob_mono {S T : set α} (h : S ⊆ T) : prob S ≤ prob T :=
nnreal.coe_le_coe.2 $ nnprob_mono h
end probability_space
theorem measurable.map_probability_measure {α β} [probability_space α] [measurable_space β]
{f : α → β} (hf : measurable f) :
probability_measure (measure.map f volume) :=
⟨by rw measure_theory.measure.map_apply hf is_measurable.univ;
exact probability_space.volume_univ⟩
structure random_variable (α β : Type*) [probability_space α] [measurable_space β] :=
(outcome : α → β)
(is_measurable_outcome : measurable outcome)
variables {α β : Type*} [probability_space α] [measurable_space β]
namespace random_variable
instance : has_coe_to_fun (random_variable α β) := ⟨_,outcome⟩
variable (X : random_variable α β)
lemma measurable : measurable X := X.is_measurable_outcome
def induced (X : random_variable α β) : probability_space β :=
{ volume := measure.map X volume,
is_probability_measure := X.measurable.map_probability_measure }
variables [preorder β]
def CDF (x : β) : ℝ := @probability_space.prob _ X.induced {c | c ≤ x}
lemma prob_eq_CDF (x : β) : X.CDF x = @probability_space.prob β X.induced {c | c ≤ x} := rfl
theorem zero_le_CDF (x : β) : 0 ≤ X.CDF x :=
begin
rw prob_eq_CDF,
exact (@probability_space.zero_le_prob β X.induced {c : β | c ≤ x}),
end
theorem CDF_le_one (x : β) : X.CDF x ≤ 1 :=
begin
rw prob_eq_CDF,
exact (@probability_space.prob_le_one β X.induced {c : β | c ≤ x}),
end
lemma factoid [preorder β] (x y : β ) (h : x ≤ y) : {c : β | c ≤ x} ⊆ {c : β | c ≤ y} :=
begin
sorry
end
theorem CDF_mono {x y : β} (h : x ≤ y) : X.CDF x ≤ X.CDF y :=
begin
repeat { rw prob_eq_CDF },
-- introduce hp: factoid(x y)
-- apply @probability_space.prob_mono β X.induced {c : β | c ≤ x} {c : β | c ≤ y} to hp
sorry
end
variables [topological_space β] [order_closed_topology β] [opens_measurable_space β]
example (b : β) : is_measurable {c | c ≤ b} := is_closed.is_measurable is_closed_Iic
end random_variable
```

#### Mario Carneiro (Dec 19 2020 at 19:32):

Have you read #tpil or played the natural number game? They introduce all the tools you need to write proofs. In this case, you need `have := factoid x y h`

.

#### Lars Ericson (Dec 19 2020 at 19:59):

Yes I did both but in their examples the `have`

is usually something like `have hp : p, from h.left`

so a labelled proposition from a tactic applied to the existing goal state. I didn't find any examples in #tpil of `have := `

with a term. In #tpil and the number game, lemmas and theorems are used into introduce equalities that get applied with `rw`

tactic or implications that get applied with `apply`

or `exact`

.

#### Lars Ericson (Dec 19 2020 at 19:59):

Anyway, DONE:

```
import measure_theory.lebesgue_measure
import tactic
noncomputable theory
open measure_theory
class probability_space (α : Type*) extends measure_space α :=
(is_probability_measure : probability_measure volume)
namespace probability_space
variables {α : Type*} [probability_space α]
lemma volume_univ : volume (set.univ : set α) = 1 :=
is_probability_measure.measure_univ
lemma volume_le_one (S : set α) : volume S ≤ 1 :=
by rw ← @volume_univ α; exact measure_mono (set.subset_univ _)
def prob (S : set α) : ℝ := (volume S).to_real
def nnprob (S : set α) : nnreal := (volume S).to_nnreal
theorem volume_eq_nnprob (S : set α) : volume S = nnprob S :=
(ennreal.coe_to_nnreal $ ne_top_of_le_ne_top (by rintro ⟨⟩) (volume_le_one S)).symm
theorem prob_eq_nnprob (S : set α) : prob S = nnprob S := rfl
theorem nnprob_le_one (S : set α) : nnprob S ≤ 1 :=
ennreal.coe_le_coe.1 $ by rw ← volume_eq_nnprob; exact volume_le_one S
theorem prob_le_one (S : set α) : prob S ≤ 1 := nnprob_le_one _
theorem zero_le_prob (S : set α) : 0 ≤ prob S := nnreal.zero_le_coe
theorem nnprob_mono {S T : set α} (h : S ⊆ T) : nnprob S ≤ nnprob T :=
ennreal.coe_le_coe.1 $ by simpa only [← volume_eq_nnprob] using measure_mono h
theorem prob_mono {S T : set α} (h : S ⊆ T) : prob S ≤ prob T :=
nnreal.coe_le_coe.2 $ nnprob_mono h
end probability_space
theorem measurable.map_probability_measure {α β} [probability_space α] [measurable_space β]
{f : α → β} (hf : measurable f) :
probability_measure (measure.map f volume) :=
⟨by rw measure_theory.measure.map_apply hf is_measurable.univ;
exact probability_space.volume_univ⟩
structure random_variable (α β : Type*) [probability_space α] [measurable_space β] :=
(outcome : α → β)
(is_measurable_outcome : measurable outcome)
variables {α β : Type*} [probability_space α] [measurable_space β]
namespace random_variable
instance : has_coe_to_fun (random_variable α β) := ⟨_,outcome⟩
variable (X : random_variable α β)
lemma measurable : measurable X := X.is_measurable_outcome
def induced (X : random_variable α β) : probability_space β :=
{ volume := measure.map X volume,
is_probability_measure := X.measurable.map_probability_measure }
variables [preorder β]
def CDF (x : β) : ℝ := @probability_space.prob _ X.induced {c | c ≤ x}
lemma prob_eq_CDF (x : β) : X.CDF x = @probability_space.prob β X.induced {c | c ≤ x} := rfl
theorem zero_le_CDF (x : β) : 0 ≤ X.CDF x :=
begin
rw prob_eq_CDF,
exact (@probability_space.zero_le_prob β X.induced {c : β | c ≤ x}),
end
theorem CDF_le_one (x : β) : X.CDF x ≤ 1 :=
begin
rw prob_eq_CDF,
exact (@probability_space.prob_le_one β X.induced {c : β | c ≤ x}),
end
lemma factoid [preorder β] (x y : β ) (h : x ≤ y) : {c : β | c ≤ x} ⊆ {c : β | c ≤ y} :=
set.Iic_subset_Iic.mpr h
theorem CDF_mono {x y : β} (h : x ≤ y) : X.CDF x ≤ X.CDF y :=
begin
repeat { rw prob_eq_CDF },
have h1 := factoid x y h,
have h2 := @probability_space.prob_mono β X.induced {c : β | c ≤ x} {c : β | c ≤ y},
exact h2 h1,
end
variables [topological_space β] [order_closed_topology β] [opens_measurable_space β]
example (b : β) : is_measurable {c | c ≤ b} := is_closed.is_measurable is_closed_Iic
end random_variable
```

#### Mario Carneiro (Dec 19 2020 at 20:21):

In your example `h.left`

is a term

#### Mario Carneiro (Dec 19 2020 at 20:22):

Note that `have :=`

(which is part of the less abbreviated `have h : t := term`

) is the tactic mode version; the term mode version is `have h : t, from term,`

.

#### Mario Carneiro (Dec 19 2020 at 20:26):

It looks like `have`

is introduced in https://wwwf.imperial.ac.uk/~buzzard/xena/natural_number_game/?world=5&level=3

#### Lars Ericson (Dec 19 2020 at 20:36):

Thanks you are correct! In terms of my original generalized `distribution_function`

goal, the above can be trimmed down to the assumption that a random variable is from a probability space α to a measurable space β which is a preorder. `CDF`

is from β to ℝ. Ideally it should be to `set.Icc (0: ℝ) 1`

. I guess the additional theorems will help me prove that and restate `def CDF (x : β) : ℝ `

as `def CDF (x : β) : set.Icc (0: ℝ) 1`

? The example at the end shows how to prove that `{c | c ≤ b}`

is measurable for a type which is a topological space, an order closed topology and a measurable space. Bu those don't seem to be conditions required by the `factoid`

, which only needs `preorder`

. Are all these properties needed for anything or just measurable space with preorder?

#### Mario Carneiro (Dec 19 2020 at 20:55):

Ideally it should be to set.Icc (0: ℝ) 1.

As you can see, we have theorems that it is between zero and one. This is generally preferable up until you actually need to put it into a structure that expects `set.Icc (0: ℝ) 1`

, which you don't currently have

#### Mario Carneiro (Dec 19 2020 at 20:57):

The example at the end shows how to prove that {c | c ≤ b} is measurable for a type which is a topological space, an order closed topology and a measurable space. Bu those don't seem to be conditions required by the factoid, which only needs preorder. Are all these properties needed for anything or just measurable space with preorder?

I put it as an `example`

to show that it is derivable, but so far nothing actually needs those assumptions. You haven't done anything that requires `{c | c ≤ b}`

to be measurable yet, but if you do those are the assumptions you will need (at that point, and not before)

#### Lars Ericson (Dec 19 2020 at 21:00):

Can you help me with the mechanics of this:

```
def distribution_function (x: β) : set.Icc (0: ℝ) 1 := (CDF x).to_set_Icc_01
```

It seems I would need to define

```
def nat.to_set_Icc_01 (x: ℝ) := sorry
```

but that doesn't use the 0/1 theorems and doesn't seem right.

#### Mario Carneiro (Dec 19 2020 at 21:02):

```
def CDF_in_01 (x: β) : set.Icc (0 : ℝ) 1 := ⟨X.CDF x, X.zero_le_CDF x, X.CDF_le_one x⟩
```

#### Mario Carneiro (Dec 19 2020 at 21:02):

to reiterate, don't use this unless you need it

#### Lars Ericson (Dec 19 2020 at 21:03):

It's in the nature of CDF, what would be the aesthetic or practical reason not to constrain the codomain in this way?

#### Mario Carneiro (Dec 19 2020 at 21:04):

because `real`

has operations on it, like addition, while `set.Icc 0 1`

doesn't

#### Lars Ericson (Dec 19 2020 at 21:05):

Anyway isn't it exciting to know that CDF can be generalized to any measurable space with a preorder? It's maybe a useless fact but I've never seen it in any presentation of CDF.

#### Mario Carneiro (Dec 19 2020 at 21:06):

Sure, this kind of "trivial generalization" comes up all the time in mathlib. It makes you consider questions the original authors hadn't thought about, like what closure properties are needed to prove that the CDF is left-continuous?

#### Mario Carneiro (Dec 19 2020 at 21:07):

You just read what the authors say, delete all the assumptions, and then just add the assumptions if they become necessary and not before

#### Mario Carneiro (Dec 19 2020 at 21:08):

A nonzero proportion of the time you never add some of them back

#### Mario Carneiro (Dec 19 2020 at 21:08):

In fact mathlib has a linter which will do this for you

#### Lars Ericson (Dec 19 2020 at 21:12):

If you don't mind my quoting your proof above, I can now close my old Stack Exchange question with a definitive answer.

```
```

#### Mario Carneiro (Dec 19 2020 at 21:17):

It seems that you got an answer in that question already:

CDFs are defined only for $\mathbb{R}$ (even for $\mathbb{R}^n$) and are used for convenience. The push-forward is of a measure $\mu$ by a Random variable $X:(\Omega,\mathscr{F},\mu)\rightarrow(S,\Sigma)$, $\mu_X(A):=\mu(X\in A)$ is the object of interest and it is as general as it can be, since $(S,\Sigma)$ has no structure other than being a measurable space.

#### Mario Carneiro (Dec 19 2020 at 21:18):

This is another way to say that in a more general context we stop talking about the CDF because the pushforward measure is more general

#### Mario Carneiro (Dec 19 2020 at 21:19):

A CDF is only useful to the extent that it determines the pushforward measure

#### Lars Ericson (Dec 19 2020 at 21:27):

CDF is defined on points in `S`

. The convention is that only the case `S=R`

has any use. What we showed is that if `S`

is any measurable space with a preorder, `CDF`

can still be computed and give back a number in `[0,1]`

. So I can define a preorder on a domain like binary expression trees of depth at most `n`

and compute the `CDF`

of a tree in the space of all binary trees depth at most `n`

. I can say that and say that it is interesting to me because it is. This has nothing to do with using `CDF`

to determine the pushforward measure. So that comment doesn't make any sense to me. I got the result I was looking for, and it doesn't seem to be relevant to that comment. That or it's not possible to put a preorder on binary trees, and continuing on in the vein, maybe the claim is that any measurable set with preorder is isomorphic to `ℝ`

. Is that the case?

#### Mario Carneiro (Dec 19 2020 at 21:28):

Yes you can compute a number, but the question is what does that number mean? It's not true for a general preorder that giving the CDF of the random variable uniquely determines the distribution of the variable

#### Mario Carneiro (Dec 19 2020 at 21:29):

maybe the claim is that any measurable set with preorder is isomorphic to ℝ. Is that the case?

This is certainly not the case

#### Mario Carneiro (Dec 19 2020 at 21:31):

you have a function called CDF, but that doesn't on its own mean much. Here's another function called CDF:

```
def CDF' (X : random_variable α β) (x : β) : ℝ := 0
```

This one doesn't even need a preorder structure on beta, so it's a clear improvement. It's also monotone, between 0 and 1, and right continuous

#### Lars Ericson (Dec 19 2020 at 21:36):

Except our CDF is proven to be a `probability_measure`

on a `measurable_space`

with `preorder`

, by construction. I'll take that. It's not 0.

#### Mario Carneiro (Dec 19 2020 at 21:36):

Our CDF is no such thing

#### Mario Carneiro (Dec 19 2020 at 21:36):

it's a real function

#### Mario Carneiro (Dec 19 2020 at 21:36):

I can put all those assumptions on `CDF'`

too

#### Mario Carneiro (Dec 19 2020 at 21:37):

Obviously `CDF'`

is a silly function, but the point is that we haven't shown anything about `CDF`

that makes it meaningfully different from `CDF'`

#### Mario Carneiro (Dec 19 2020 at 21:38):

And my claim is that the real theorem of interest that will distinguish the two doesn't hold on arbitrary preorders

#### Lars Ericson (Dec 19 2020 at 21:38):

Obtained by running `prob`

, where `prob`

was supplied as a `probability_measure`

when building the initial probability space, and then running it through `induced`

on the `random_variable`

. As long as you have a meaningful and correct `probability_space`

to begin with, and a meaningful and correct `random_variable`

, the `CDF`

will be meaningful and correct and derived from the original `probability_space`

and the supplied `random_variable`

. There is nothing trivial about it at all.

#### Mario Carneiro (Dec 19 2020 at 21:39):

How do you know it is meaningful and correct?

#### Mario Carneiro (Dec 19 2020 at 21:39):

That's what theorems are for

#### Mario Carneiro (Dec 19 2020 at 21:39):

and we don't have any

#### Mario Carneiro (Dec 19 2020 at 21:39):

except the obvious ones that are true also for `CDF'`

#### Lars Ericson (Dec 19 2020 at 21:40):

If I start out with say the Steinhaus space and the definition of the random variable for the 2 point-distribution and crank it through I will get the distribution and distribution function for the 2-point random variable. What's missing?

#### Mario Carneiro (Dec 19 2020 at 21:41):

well that's over `real`

#### Mario Carneiro (Dec 19 2020 at 21:41):

I thought you wanted arbitrary preorders?

#### Lars Ericson (Dec 19 2020 at 21:42):

As another example we could compute distributions of binary trees of depth `n`

. This is something people actually think about. Why can't I have a CDF of those?

#### Mario Carneiro (Dec 19 2020 at 21:42):

The main theorem here is that two random variables with the same `CDF`

are the same random variable

#### Mario Carneiro (Dec 19 2020 at 21:42):

that's not true for `CDF'`

#### Mario Carneiro (Dec 19 2020 at 21:42):

and it's not true for `CDF`

on some preorders

#### Mario Carneiro (Dec 19 2020 at 21:43):

Really, I think it's not worth thinking about the CDF in the more unusual codomains. Just call `X.induced`

CDF if it makes you feel better

#### Lars Ericson (Dec 19 2020 at 21:46):

So what you proved at the bottom required that `S`

be a topological space, an order closed topology and an opens measurable space. I don't need those assumptions for `def CDF`

. If I delete `variables [preorder β]`

, then `def CDF`

crashes. It doesn't rely however on being topological space, order closed topology and opens measurable space. Are you saying that I will crash if I try to define a random variable into binary trees of length N with a preorder, because of something I'm not understanding?

#### Mario Carneiro (Dec 19 2020 at 21:46):

In measure theory, a distribution is no more or less than a measure. It takes all subsets of X as its input, and it doesn't need to be boiled down to a function on X

#### Mario Carneiro (Dec 19 2020 at 21:49):

I don't need those assumptions for def CDF. If I delete variables [preorder β], then def CDF crashes. It doesn't rely however on being topological space, order closed topology and opens measurable space. Are you saying that I will crash if I try to define a random variable into binary trees of length N with a preorder, because of something I'm not understanding?

This is why you need to prove theorems about your definitions. It's perfectly possible that a definition can be *made* in some tremendous generality but it doesn't *make sense* without some more assumptions. In fact `def CDF`

works using only `has_le beta`

, it doesn't even need `preorder`

, but it certainly doesn't *make sense* in that generality; we already have a theorem that will detect this nonsense, namely `CDF_mono`

. WIthout that theorem you might well think that there are no problems with that definition

#### Mario Carneiro (Dec 19 2020 at 21:49):

But you haven't tried to prove anything more complicated about `CDF`

yet, and when you do you will find that `preorder beta`

isn't good enough either

#### Mario Carneiro (Dec 19 2020 at 21:51):

It all comes back to the thing I mentioned earlier: how do people *use* the definition?

#### Mario Carneiro (Dec 19 2020 at 21:51):

making definitions doesn't tell you any deep truths about it, because you can very easily write nonsense without noticing

#### Mario Carneiro (Dec 19 2020 at 21:52):

but once you try to use the definition it all falls down

#### Lars Ericson (Dec 19 2020 at 21:56):

Which brings me back to trees. A long time ago I helped a mathematician counting trees to write a paper making some probabiliistic arguments about operations on trees. Conceivably, while doing that, in a practical sense, I might want to define a preorder on trees and compute the percentage of trees in the entire space of trees of size `n`

which were "less than" a single particular tree, or an equivalence class of trees. So that would be the CDF of a single tree. It might be boring nonsense to almost everybody in the world, but for the two people in that room, it might be helpful. So then the question, how much of this will fall down if my trees have a preorder but do not constitute a type which is a topological space, order closed topology and opens measurable space? All of it, some of it or none of it? And if it is all of it, can we go further and say that there is a 1-1 onto map between the reals and any such space?

#### Reid Barton (Dec 19 2020 at 22:06):

Even in this hypothetical scenario, you could just write down "the probability that a random tree is 'less than' the particular tree in question" and not bother with this CDF terminology stuff.

#### Lars Ericson (Dec 20 2020 at 15:37):

Yes, I could write "the probability that a random tree is 'less than' the particular tree in question" as `CDF T`

after defining a preorder on trees, following the development of the generalized distribution function above, which establishes that I can do so in a well-typed way. This is very helpful for me. Thanks!

#### Lars Ericson (Dec 20 2020 at 16:36):

But I think the answer to the question of why distribution function is only defined by the literature for R is in this comment by Oliver Diaz in the Wikipedia article:

"If (𝑌,𝑑) is s Borel space (a Polish space for instance) then it is measurable isomorphic to (0,1) and one can define an order in 𝑌. The isomorphism however is not explicit, it is based on constructions that use the axiom of choice."

"Measurable isomorphic to (0,1)" is what I was looking for, it gives motivation for only talking about R, in the sense that talking about `Y`

would be redundant under the isomorphism.

#### Mario Carneiro (Dec 20 2020 at 17:27):

I don't think that is true at all

#### Mario Carneiro (Dec 20 2020 at 17:28):

Y can easily be something like `bool`

with the preorder formulation

#### Mario Carneiro (Dec 20 2020 at 17:30):

you need a lot more assumptions to make it look like real. Borel won't cut it, Polish might but it's not clear what the reason for such an assumption is

Last updated: May 15 2021 at 02:11 UTC