Zulip Chat Archive

Stream: new members

Topic: How do I tell Lean that a subclass is a kind of the supercla

Lars Ericson (Dec 17 2020 at 16:57):

I'm working on this:

import measure_theory.lebesgue_measure

open measure_theory

noncomputable theory

class probability_space (α : Type*) extends measure_space α :=
(is_probability_measure:  probability_measure volume)

class random_variable (α β : Type*) :=
(domain : probability_space α )
(codomain : measurable_space β )
(outcome : α → β)
(is_measurable_outcome : @measurable α  β domain codomain outcome)

I am trying to say that a random variable is a measurable function from a probability space to a measurable space. Even though the function is declared α → β and I have proofs that domain is a probability_space and codomain is a measurable_space, I still guess that I need to add a proof that outcome is a measurable function. Or is it obvious? I'm unsure.

Anyway, if I do have to do the extra step of proving that outcome is measurable, I need to get this type right:

@measurable α  β domain codomain outcome

However, it is complaining

type mismatch at application
  measurable
term
  domain
has type
  probability_space α
but is expected to have type
  measurable_space α

even though

class probability_space (α : Type*) extends measure_space

and

class measure_space (α : Type*) extends measurable_space α

How do I tell Lean that a subclass is a kind of the superclass?

Adam Topaz (Dec 17 2020 at 17:04):

I would use something approximately like this:

import measure_theory.lebesgue_measure

open measure_theory

noncomputable theory

class probability_space (α : Type*) extends measure_space α :=
(is_probability_measure:  probability_measure volume)

structure random_variable (α β : Type*) [probability_space α] [measurable_space β] :=
(outcome : α → β)
(is_measurable_outcome : measurable outcome)

Lars Ericson (Dec 17 2020 at 17:06):

Thanks! Why structure and not class in this case? random_variable is a class of function, it seems possible people might want to create instances.

Reid Barton (Dec 17 2020 at 17:06):

This is more sensible yes, but also in the original formulation you can just write is_measurable_outcome : measurable outcome--if you don't try to do Lean's work for it, it will do it correctly.

Adam Topaz (Dec 17 2020 at 17:08):

Oh it looks like measurable is not a class, so I should use regular parens. (fixed code)

Adam Topaz (Dec 17 2020 at 17:10):

Does mathlib have bundled measurable functions?

Lars Ericson (Dec 17 2020 at 17:51):

Please help me take it a little further. To the random variable I want to add an inverse. So if X(ω) is a random variable, then
inverse(y) = {ω : X(ω) = y}. I tried this:

class random_variable (α β : Type*) [probability_space α] [measurable_space β] :=
(outcome : α → β)
(is_measurable_outcome : measurable outcome)
(inverse : β → set α := { ω : α // ∃ y : β, y = outcome ω }) // ERROR

I got this error message, which I don't know how to fix:

invalid let-expression, term
  {ω // ∃ (y : β), y = outcome ω}
has type
  Type ? : Type (max 1 (?+1))
but is expected to have type
  β → set α : Type (max ? ?)

From there I want to do this, to say that the distribution space is a probability space induced by X with probability measure volumeX : β→ [0,1] such that volumeX = volume ∘ (X^-1) is the pushforward measure and volumeX is called the distribution of X:

instance distribution_space (α β : Type*) (V: random_variable α β) (ΩAP: probability_space α ):  probability_space β :=
(volume : ΩAP.volume ∘ V.generalize_inverse)
(is_probability_measure := sorry)

I'm not sure how to get this all right in Lean, but the ideas are there.

"pushforward" sounds categorical. If there were a category Prob there may be some categorical way to express what I'm trying to do here. I don't know category theory though. For now I'm just trying to get this part right.

Lars Ericson (Dec 17 2020 at 19:07):

OK I found something in the guts of measure_space but it doesn't show up in docs#pushforward for in Lean which is all category theory related:

/-- The pushforward of a measure. It is defined to be `0` if `f` is not a measurable function. -/
def map (f : α → β) : measure α →ₗ[ennreal] measure β :=
if hf : measurable f then
  lift_linear (outer_measure.map f) $ λ μ s hs t,
    le_to_outer_measure_caratheodory μ _ (hf hs) (f ⁻¹' t)
else 0

Adam Topaz (Dec 17 2020 at 20:20):

import measure_theory.lebesgue_measure

open measure_theory

noncomputable theory

class probability_space (α : Type*) extends measure_space α :=
(is_probability_measure :  probability_measure volume)

structure random_variable (α β : Type*) [probability_space α] [measurable_space β] :=
(outcome : α → β)
(is_measurable_outcome : measurable outcome)

variables {α β : Type*} [probability_space α] [measurable_space β]

def induced (X : random_variable α β) : probability_space β :=
{ volume := measure.map X.outcome volume,
  is_probability_measure := begin
    constructor,
    rw measure_theory.measure.map_apply X.is_measurable_outcome is_measurable.univ,
    apply probability_space.is_probability_measure.measure_univ,
  end }

One thing to note, as we discussed somewhere before, one should really make an API for probability_space so that the proof in is_probability_measure becomes essentially a one-liner.

Adam Topaz (Dec 17 2020 at 20:21):

The structure random_variable needs an API as well.

Adam Topaz (Dec 17 2020 at 20:21):

But I hope the code above helps you get started :)

Lars Ericson (Dec 17 2020 at 20:52):

Thank you @Adam Topaz that is extremely helpful. I have been looking at the API of measure.map:

import measure_theory.measure_space

open measure_theory

variables α β : Type*
variable MS1 : measurable_space α
variable MS2 : measurable_space β
variable μ_α : α → β
variable m_α : measure α

#check @measure_theory.measure.map α β MS1 MS2 μ_α m_α
#check measure_theory.measure.map μ_α m_α

This is the pushforward measure in Wikipedia: https://en.wikipedia.org/wiki/Pushforward_measure

There is also a comap in measure_space:

/-- Pullback of a `measure`. If `f` sends each `measurable` set to a `measurable` set, then for each
measurable set `s` we have `comap f μ s = μ (f '' s)`. -/
def comap (f : α → β) : measure β →ₗ[ennreal] measure α :=

What are the aesthetics of calling these map and comap versus pushforward and pullback? This would provide a conceptual link to various pushforward definitions in Topand category_theory: docs#pushforward. I'm just asking, pure curiosity, not a criticism.

Lars Ericson (Dec 17 2020 at 23:24):

Here is the last bit I want to get to before thinking about the overall API. Please take a look at Definition 7:

Screenshot-from-2020-12-17-18-23-57.png

Lars Ericson (Dec 17 2020 at 23:25):

Now take a look at section 1.1 B:
Screenshot-from-2020-12-17-18-24-10.png

Adam Topaz (Dec 17 2020 at 23:26):

What's the question? What is it precisely you want to get?

Yakov Pechersky (Dec 17 2020 at 23:30):

Are you asking about inverses? You could have:

class random_variable (α β : Type*) [probability_space α] [measurable_space β] :=
(outcome : α → β)
(is_measurable_outcome : measurable outcome)
(surj: function.surjective outcome)

and for the inverse use
docs#function.surj_inv

Lars Ericson (Dec 17 2020 at 23:30):

Here it comes, trying to type faster. Here's the first part:

So Mark Dean defines a notion of a distribution and a distribution function (when the random variable is into ℝ). Kloeden&co give an expression for the distribution function: FX(x) = PX((-∞ ,x)) = P({ω ∈ Ω : X(ω) ≤ x}).

They are both assuming that the random variable is into ℝ. I don't see any reason for this, because this: X(ω) ≤ x only requires that there be a preorder ≤ on the type β that the preorder is into.

Adam Topaz (Dec 17 2020 at 23:30):

Oh so you want the distribution then?

Lars Ericson (Dec 17 2020 at 23:31):

No, we have the distribution, I want to define the distribution_function in case β has a preorder in this sketch:

import measure_theory.lebesgue_measure
import measure_theory.measure_space

open measure_theory
noncomputable theory

class probability_space (α : Type*) extends measure_space α :=
(is_probability_measure :  probability_measure volume)

def probability_space.distribution (α : Type*) [ps: probability_space α] :=
  ps.volume

def probability_space.distribution_function  (X : random_variable α β ) [ps: probability_space β ]  [preorder β] :=
  sorry

structure random_variable (α β : Type*) [probability_space α] [measurable_space β] :=
(outcome : α → β)
(is_measurable_outcome : measurable outcome)

variables {α β : Type} [probability_space α] [measurable_space β]

instance distribution_space  (X : random_variable α β) : probability_space β :=
{ volume := measure.map X.outcome volume,
  is_probability_measure := begin
    constructor,
    rw measure_theory.measure.map_apply X.is_measurable_outcome is_measurable.univ,
    apply probability_space.is_probability_measure.measure_univ,
  end }

instance {α} {p : α → Prop} [m : measure_space α] : measure_space (subtype p) :=
{ volume := measure.comap (coe : _ → α) volume }

theorem subtype.volume_apply {α} {p : α → Prop} [measure_space α]
  (hp : is_measurable {x | p x}) {s : set (subtype p)} (hs : is_measurable s) :
  volume s = volume ((coe : _ → α) '' s) :=
measure.comap_apply _ subtype.coe_injective (λ _, is_measurable.subtype_image hp) _ hs

namespace Steinhaus

instance P : probability_measure (volume : measure (set.Icc (0 : ℝ) 1)) :=
{ measure_univ := begin
    refine (subtype.volume_apply is_measurable_Icc is_measurable.univ).trans _,
    suffices : volume (set.Icc (0 : ℝ) 1) = 1, {simpa},
    rw [real.volume_Icc], simp
  end }

instance ΩAP : probability_space (set.Icc (0 : ℝ) 1) :=
{ is_probability_measure := Steinhaus.P }

#check Steinhaus.ΩAP
#check Steinhaus.ΩAP.volume -- volume : measure ↥(set.Icc 0 1)

end Steinhaus

Lars Ericson (Dec 17 2020 at 23:32):

So we have probability space, random variable, distribution space induced by random variable on probability space, and the final icing on the cake is to show that if I have a preorder on the measurable space that the random variable is into, then I have a distribution function.

Lars Ericson (Dec 17 2020 at 23:33):

Why this is at all interesting is because everybody in the world only defines distribution_function when the random variable is into ℝ. Nobody seems to care about the case where it is into a measurable space with a preorder. I just think it's an interesting generalization.

Lars Ericson (Dec 17 2020 at 23:34):

But it's only possible for me to talk about whether it is interesting or definable in the presence of something like Lean.

Mario Carneiro (Dec 17 2020 at 23:35):

@Yakov Pechersky There aren't any inverses involved in those snippets. That's a preimage you see

Lars Ericson (Dec 17 2020 at 23:37):

What I'm working on is this generalization of the distribution function in the Kloeden snippet, which Dean defines only if the RV is into the reals:

def probability_space.distribution_function  (X : random_variable α β ) (y: β)  \[ps: probability_space β ]  [preorder β] :=
  ps.volume({ω ∈ α : X(ω) ≤ y})

Yakov Pechersky (Dec 17 2020 at 23:37):

Ah, misread. Gotcha. So the clean wya to state that is outcome.preimage?

Mario Carneiro (Dec 17 2020 at 23:37):

Well, the whole measure is measure.comap I think

Mario Carneiro (Dec 17 2020 at 23:38):

actually measure.map

Lars Ericson (Dec 17 2020 at 23:40):

That's another question I had about terminology, why measure_space uses map and comap but in the comments it uses pushforward and pullback which comments match terminology used in Top and category_theory.

Mario Carneiro (Dec 17 2020 at 23:40):

so there isn't really anything to do in definition 7. measure.map x volume is a measure, check

Mario Carneiro (Dec 17 2020 at 23:40):

because there is a clash of terminology from multiple fields

Mario Carneiro (Dec 17 2020 at 23:41):

lean uses map and comap for regularity, but this gets instantiated in category theory and topology and algebra and measure theory and mathematicians in these areas all have their own ideas about what to call it

Lars Ericson (Dec 17 2020 at 23:41):

OK thanks. The above sketch works fine. @Adam Topaz worked out how to derive a distribution space from a random variable on a probability space:

instance distribution_space  (X : random_variable α β) : probability_space β :=
{ volume := measure.map X.outcome volume,
  is_probability_measure := begin
    constructor,
    rw measure_theory.measure.map_apply X.is_measurable_outcome is_measurable.univ,
    apply probability_space.is_probability_measure.measure_univ,
  end }

The only thing I have left to complete the functionality is generalized distribution function.

Mario Carneiro (Dec 17 2020 at 23:42):

I think you are missing a lemma there

Mario Carneiro (Dec 17 2020 at 23:42):

you just proved the map of a probability measure is a probability measure

Mario Carneiro (Dec 17 2020 at 23:42):

that's a lemma

Adam Topaz (Dec 17 2020 at 23:43):

import measure_theory.lebesgue_measure

open measure_theory

noncomputable theory

class probability_space (α : Type*) extends measure_space α :=
(is_probability_measure :  probability_measure volume)

namespace probability_space

lemma volume_univ {α : Type*} [probability_space α] : volume (set.univ : set α) = 1 :=
  is_probability_measure.measure_univ

def prob {α : Type*} [probability_space α] {S : set α} (h : is_measurable S) : ennreal :=
  volume S

end probability_space

structure random_variable (α β : Type*) [probability_space α] [measurable_space β] :=
(outcome : α → β)
(is_measurable_outcome : measurable outcome)

variables {α β : Type*} [probability_space α] [measurable_space β]

namespace random_variable

instance : has_coe_to_fun (random_variable α β) := ⟨_,outcome⟩

lemma measurable (X : random_variable α β) : measurable X := X.is_measurable_outcome

def induced (X : random_variable α β) : probability_space β :=
{ volume := measure.map X volume,
  is_probability_measure := begin
    constructor,
    erw measure_theory.measure.map_apply X.measurable is_measurable.univ,
    exact probability_space.volume_univ
  end }

def distribution_function (X : random_variable α ℝ) : ℝ → ennreal := λ x,
  @probability_space.prob _ X.induced {c | c ≤ x} is_measurable_Iic

end random_variable

Mario Carneiro (Dec 17 2020 at 23:43):

why is prob an ennreal?

Adam Topaz (Dec 17 2020 at 23:43):

Because I'm lazy

Adam Topaz (Dec 17 2020 at 23:44):

One should change it to \R.

Adam Topaz (Dec 17 2020 at 23:44):

Same with distribtion_function.

Lars Ericson (Dec 17 2020 at 23:44):

Well that's what I'm trying to generalize. I want to replace ℝ with β where β has a preorder. What do you think?

Adam Topaz (Dec 17 2020 at 23:45):

Sure you could do that.

Mario Carneiro (Dec 17 2020 at 23:45):

I don't think that's a good idea, it's not general enough

Mario Carneiro (Dec 17 2020 at 23:46):

the sets {c | c ≤ x} in a preorder don't generate all borel sets

Mario Carneiro (Dec 17 2020 at 23:46):

you will have to work with the preimages of general sets, at which point it's not clear what the CDF gets you

Lars Ericson (Dec 17 2020 at 23:47):

It would be a really wonderful solution to a Math StackExchange question I posed in August, i.e. exactly what kind of order does β have to have for this to go through in Lean.

Lars Ericson (Dec 17 2020 at 23:48):

For one thing you know by construction of the random_variable that β is a measurable_space.

Lars Ericson (Dec 18 2020 at 00:51):

Mario Carneiro said:

you just proved the map of a probability measure is a probability measure, that's a lemma

It could be that makes it a category, which is something to think about. I.e. if you were building a probability library from scratch would you want to make it as category theory-ish as possible or just express the main ideas on their own. There seems to be big interest in category theory here, but it seems that using it in practice like for GCD/Fibonacci example is quite hard. So then the question is whether the generalization that it provides is productive for particular applications, or is it just nice to think about on its own. The fact that there is map/comap and pullback/pushthrough in different parts of the library shows that there is not a unified view among all the mathlib contributors. Not that there should be, I'm just slowly learning the "style guide" and "make it a category" doesn't seem to be one of the style imperatives.

Lars Ericson (Dec 18 2020 at 00:59):

Regarding this expression for the distribution_function, which is on single points in the codomain of the RV, as opposed to the distribution, which is on subsets of the domain of the RV: F(x) = P({ω ∈ Ω : X(ω) ≤ x}). We can see by inspection, since P is a probability measure, that it will be in [0,1]. If the codomain of the RV is some kind of partially ordered set, the distribution function might not behave in the same way as if it were a linearly ordered dense set. But the result will still be a CDF. I will Google "generalized distribution function" and see what I come up with. In any event, it seems well defined for any order relation ≤ .

Mario Carneiro (Dec 18 2020 at 01:03):

The fact that there is map/comap and pullback/pushthrough in different parts of the library shows that there is not a unified view among all the mathlib contributors

To clarify: map and comap are used consistently in lean names, while "pullback" and "pushforward" are used in comments to help connect names to things in the literature (which are all over the place, as one would expect)

Mario Carneiro (Dec 18 2020 at 01:06):

It could be that makes it a category, which is something to think about. I.e. if you were building a probability library from scratch would you want to make it as category theory-ish as possible or just express the main ideas on their own.

I would not suggest adding additional layers unless it gives you an advantage in proving theorems. Instantiating a category is a thing you can do, but it's not particularly helpful on its own. There are lots of things in mathlib that could potentially be instantiations of the category library, but generally it's just another layer on an existing theorem. I would say prove the theorem and add the instantiation if and when you need it for a theorem

Lars Ericson (Dec 18 2020 at 01:08):

OK so my main concern, all abstraction aside, is whether any measurable_set S with ≤ for which this expression: F(x: S) = P({ω ∈ Ω : X(ω) ≤ x}) makes sense is necessarily linearly ordered, dense and isomorphic to ℝ, or whether any measurable_set with any ≤ defined on it will do.

Adam Topaz (Dec 18 2020 at 01:09):

Just to clarify, whenever you have a random variable X with codomain A, and some function from A to the type of all measurable sets in A, you can define something like a distribution function on A. The question is whether this function determines the probability density on A induced by X, and this is certainly not true in general.

Mario Carneiro (Dec 18 2020 at 01:10):

I think instead of formalizing definitions, you should try to get to the actual theorems, because we can work backward from there to the abstractions that are useful for formalizing the theorem. So far, I don't see any value in the definition random_variable, or distribution, or prob or distribution_function, they are all just wrappers around other things. So let's get to theorems where you really need some probability vocabulary to prove the theorem effectively

Lars Ericson (Dec 18 2020 at 01:11):

Random variable, distribution, distribution function and probability space are probability vocabulary.

Mario Carneiro (Dec 18 2020 at 01:11):

yes I'm aware

Mario Carneiro (Dec 18 2020 at 01:11):

a wrapper doesn't help you prove theorems

Lars Ericson (Dec 18 2020 at 01:11):

Google gave me this, maybe it's of some use: https://www.ams.org/journals/tran/1957-084-02/S0002-9947-1957-0085326-5/S0002-9947-1957-0085326-5.pdf

Mario Carneiro (Dec 18 2020 at 01:13):

more words to express things you could already express are not helpful. The power of a definition is when it abbreviates a nontrivial combination of things, like "group" meaning "associative binary operation with a neutral element"

Mario Carneiro (Dec 18 2020 at 01:13):

the latter expression is more cumbersome than the former, so the definition pulls its weight

Mario Carneiro (Dec 18 2020 at 01:14):

there are also a bunch of theorems about groups that are not theorems about monoids or associative operations on their own

Lars Ericson (Dec 18 2020 at 01:14):

I'm not really trying to prove a theorem, I'm just trying to get the type classes straight to describe the usual vocabulary of probability theory. That's all. Because even though that vocabulary is just sugar for things that are already inmathlib in one way or another, they are still challenging for me. I can't go tomathlib and see the usual vocabulary of probability theory. It's a modest exercise.

Mario Carneiro (Dec 18 2020 at 01:15):

So what's the raison d'etre of probability vocabulary? What fact about probability measures is not true about general measures? I have no doubt that such theorems exist, but nothing interesting will happen until you start trying to prove one of them

Mario Carneiro (Dec 18 2020 at 01:18):

It is really important to find and at least start trying to prove one of these theorems because it guides the structure of definitions around the whole topic. Without it I see all these definitions and they are all useless cruft with negative utility

Lars Ericson (Dec 18 2020 at 01:21):

My longer range goal is to understand say the construction of the stochastic Taylor series approximation of the Ito or Stratonovich integral. I am not trying to prove any new theorems about this topic. I'm trying to build up the language used in a textbook that goes through that topic. The textbook doesn't use the measure-theoretic equivalent of probability_space. It uses probability_space. I don't think I should feel bad about, for my own purposes in reading that book, providing a definition of probability_space.

Along the way I noticed that F(x: S) = P({ω ∈ Ω : X(ω) ≤ x} could possibly be generalized. @Adam Topaz expresses this as whether the random variable X "determines the probability density on A induced by X". It seems that "determines the probability density on A induced by X" is something that could be written in Lean and expressed as a condition that the order relation ≤ has to satisfy. So, for my own tastes and ambitions, I think it would be interesting to say formalize "determines the probability density on A induced by X" and see where that leads. It could lead to the requirement that the codomain of the random variable be isomorphic to the reals, which would be an interesting and natural explanation for why every text I can find restricts distribution_function to random variables into the reals.

Mario Carneiro (Dec 18 2020 at 01:22):

The textbook doesn't use the measure-theoretic equivalent of probability_space. It uses probability_space. I don't think I should feel bad about, for my own purposes in reading that book, providing a definition of probability_space.

The point isn't to feel bad about it, it is to build the theory of probability spaces by looking at how it is used

Mario Carneiro (Dec 18 2020 at 01:23):

The question to answer isn't what is a probability space, it is what do people do with probability spaces

Mario Carneiro (Dec 18 2020 at 01:24):

the purpose of a library is to make it easier to write proofs in the area

Mario Carneiro (Dec 18 2020 at 01:24):

so you have to look at the proofs to understand how to help

Mario Carneiro (Dec 18 2020 at 01:27):

Regarding what you would need to make the exact definition F(x: S) = P({ω ∈ Ω : X(ω) ≤ x}) generalize, I would expect that you need the sets {a | a < x} to generate the topology and hence the entire borel algebra. That is, you need an orderable_topology on the codomain

Mario Carneiro (Dec 18 2020 at 01:29):

oops it's called docs#order_closed_topology now

Mario Carneiro (Dec 18 2020 at 01:30):

I think you will need to get to some really interesting theorems before the codomain starts to be constrained to look anything like the reals

Adam Topaz (Dec 18 2020 at 01:45):

Law of large numbers might be a nice goal.

Lars Ericson (Dec 18 2020 at 02:09):

In terms of "what do people do with probability spaces", and the book I am reading, I have some very simple applications that I want to proof-check in Lean:

A one-size-fits-all probability space:([0,1], B([0,1]), [a,b] -> b-a). That's done. Cruft is to name this the Steinhaus space
Some basic language: Random variable, distribution, distribution function
Distribution function for a two-point RV with codomain {0,1} and probabilities p and 1-p
Uniform distribution U[a,b]
Exponential distribution
Gaussian distribution

For each of these I want to:

Start with Steinhaus as the domain of the random variable X,
Define the codomain as either a finite measurable space for a continuous random variable or the appropriate continuous codomain which could be ereal or ennreal or a real interval
Define the corresponding random variable X
From X derive the distribution function FX

This is not the usual order. The book and most presentations tend to start with the distribution function. I just like doing it in this order because it appeals to me. I'm not proving theorems really, just trying to create language for a sequence that appeals to me for presenting these different discrete and continuous distributions in a consistent kind of top-down way. Also, I am annoyed that the book introduces probability_space on page 3 and then drops it thereafter and just presents the distribution_function for each distribution. I think if you define something you should use it, so I'm spending my time doing it in this direction. It's just a personal choice.

Mario Carneiro (Dec 18 2020 at 02:16):

well the main issue I have with that list is that every single one is a definition, there are no theorems

Mario Carneiro (Dec 18 2020 at 02:17):

you need at least some sanity checks on the definitions

Mario Carneiro (Dec 18 2020 at 02:20):

Perhaps your goal might be to have a way to prove that a real function (which is what the books will give in order to define a distribution) yields a probability distribution, that's not completely obvious from the definitions

Mario Carneiro (Dec 18 2020 at 02:22):

you can then use that to get the exponential distribution and gaussian distribution

Mario Carneiro (Dec 18 2020 at 02:23):

I think you already did the uniform distribution earlier

Lars Ericson (Dec 18 2020 at 03:07):

The interesting (to me) part, which I already did by hand, is when they give a distribution function, to back out what the random variable is that has that distribution function. It's not always easy to get it right. So my work is to take the RV that I backed out and prove that if you grind it through the Lean distribution_function construction, that I get the same function. So that's a proof.

Regarding the generalized disrtibution function, I am trying to generalize

def distribution_function (X : random_variable α ℝ) : ℝ → ennreal := λ x,

def distribution_function (X : random_variable α β ) [preorder α] : β → (set.Icc (0 : ℝ) 1) := λ x,

Adam's distribution function full definition is

def distribution_function (X : random_variable α ℝ) : ℝ → ennreal := λ x,
  @probability_space.prob _ X.induced {c | c ≤ x} is_measurable_Iic

I am trying to understand the application of @probability_space.prob. I've got this:

import measure_theory.lebesgue_measure

open measure_theory

noncomputable theory

class probability_space (α : Type*) extends measure_space α :=
(is_probability_measure :  probability_measure volume)

namespace probability_space

lemma volume_univ {α : Type*} [probability_space α] : volume (set.univ : set α) = 1 :=
  is_probability_measure.measure_univ

def prob {α : Type*} [probability_space α] {S : set α} (h : is_measurable S) : ennreal :=
  volume S

end probability_space

structure random_variable (α β : Type*) [probability_space α] [measurable_space β] :=
(outcome : α → β)
(is_measurable_outcome : measurable outcome)

variables {α β : Type*} [probability_space α] [measurable_space β]

namespace random_variable

instance : has_coe_to_fun (random_variable α β) := ⟨_,outcome⟩

lemma measurable (X : random_variable α β) : measurable X := X.is_measurable_outcome

def induced (X : random_variable α β) : probability_space β :=
{ volume := measure.map X volume,
  is_probability_measure := begin
    constructor,
    erw measure_theory.measure.map_apply X.measurable is_measurable.univ,
    exact probability_space.volume_univ
  end }

variable MSα : measurable_space α
variable Pα : probability_space α
variable Pβ : probability_space  β
variable Sα : set α
variable Sβ : set β

lemma SMα : is_measurable Sα := sorry
lemma SMβ : is_measurable Sβ := sorry
def Xαβ : random_variable α β := sorry

#check Xαβ.induced -- X.induced : probability_space β

#check @probability_space.prob α Pα Sα (SMα Sα )
#check @probability_space.prob β Pβ Sβ (SMβ Sβ) -- FAILS

def distribution_function (X : random_variable α β ) [preorder α] : β → (set.Icc (0 : ℝ) 1) := λ x, sorry
--  @probability_space.prob _ X.induced {c | c ≤ x} is_measurable_Iic

end random_variable

Even though the one before it works, this fails:

#check @probability_space.prob β Pβ Sβ (SMβ Sβ) -- FAILS

I don't fully understand why the type of SMα is Mα : ∀ (Sα : set ?M_1), is_measurable Sα, so I have to re-supply Sα when I use it even though it's declared

lemma SMα : is_measurable Sα := sorry

#check @probability_space.prob α Pα Sα (SMα Sα )

works with type

probability_space.prob _ : ennreal

But

#check @probability_space.prob β Pβ Sβ (SMβ Sβ)

gives

synthesized type class instance is not definitionally equal to expression inferred by typing rules, synthesized
  _inst_2
inferred
  measure_space.to_measurable_space

I am using @ to avoid this kind of mystery but it's not helping in this case. Any suggestions? This is a problem because in this line

@probability_space.prob _ X.induced {c | c ≤ x} is_measurable_Iic

the term X.induced is going to have type probability_space β. So I need to understand how to fill out each of the arguments to @probability_space.prob but I can't even #check it in the β case, it only works for α.

The other minor issue is that probability_space.prob has return type ennreal when I know that in the end it will be constrained to (set.Icc (0 : ℝ) 1) by probability_measure volume. I'd rather just say it in the return type rather than ennreal, but that doesn't go through.

Mario Carneiro (Dec 18 2020 at 03:14):

So my work is to take the RV that I backed out and prove that if you grind it through the Lean distribution_function construction, that I get the same function. So that's a proof.

Can you state this theorem?

Mario Carneiro (Dec 18 2020 at 03:15):

it's not entirely clear to me what you mean by it

Mario Carneiro (Dec 18 2020 at 03:17):

I'm not sure there is anything to be gained with the #checks, all those variables are tripping you up. Just state a def or theorem and try to get the assumptions and the statement of the theorem right

Mario Carneiro (Dec 18 2020 at 03:18):

lemma SMα is false, it asserts all sets are measurable

Mario Carneiro (Dec 18 2020 at 03:18):

lemma SMβ is the same as lemma SMα

Lars Ericson (Dec 18 2020 at 05:03):

The point of the #checks was simply to check the type of the arguments to the expression, I was trying to figure out what kind of thing to fill in the blank for each of the slots in probability_space.prob. I know that it just applies the volume function, but in the explicit @ form there are a lot of things that need to be in place for the application to go through, and I was trying to understand their types.

The theorem to prove is as follows (for the type of random variables in the text, which is the usual type, not the generalized type I want):

We are given, in the text F_X: ℝ → [0,1] , the distribution_function for the random_variable named X: [0,1] → ℝ.
I construct, by inspection, a definition myX for the random variableX.
There is a distribution, μ_X : borel ℝ → [0,1], which is implicit.
In the Steinhaus space we have uniform probability measure 'P: borel ℝ → [0,1]`.
By construction (this is already proven out of the by the machinery in the cruft above), we have that μ_X = P ∘ X⁻¹
To prove: that myX = λ x, μ_X (set.Icc -∞ x)

For example, consider the two point distribution:
Screenshot-from-2020-12-18-00-03-15.png

Lars Ericson (Dec 18 2020 at 05:21):

Let x₁, x₂ ∈ ℝ and p₁ be the probability of drawing x₁. Then we can define X(ω) = if ω < p₁ then x₁ else x₂ or

def myX(ω: set.Icc (0 : ℝ) 1) := if ω < p₁ then x₁ else x₂

also, the text is telling us that

def FX (x: ℝ ) := if x<x₁ then 0 else if x₁ ≤ x < x₂ then p₁ else 1

and then the proof to achieve is

example : FX = λ x, (Steinhaus.volume ∘ myX.induced) (set.Icc ∞ x) := sorry

I find this interesting and a little bit tricky. It's not rocket science but it does seem to take some effort to get all the pieces in place. It's also not the generalized distribution function I was going on about, but it sketches out the path through the woods I am looking to take.

Mario Carneiro (Dec 18 2020 at 05:26):

I don't think X(ω) = if ω < p₁ then x₁ else x₂ is correct. What you need is for the probability that X = x1 to equal p1, meaning that the event space is bool or perhaps a general probability space on which you posit an event A whose probability is p1

Lars Ericson (Dec 18 2020 at 14:45):

On the index space [0,1] I am defining an eventA=[0,p1). According to Kloeden et al's book, I(A)(ω)= if ω∈A then x₁ else x₂ is a random variable on this space, for x₁=0 and x₂=1. Please let me know if I'm reading this wrong:
Screenshot-from-2020-12-18-09-43-54.png

Lars Ericson (Dec 18 2020 at 14:58):

Note instead of clumping the events in [0,p1) I could have defined A to be the union of a bunch of more randomly scattered subsets. That's not necessary for the definitions to work. There is no inherent "randomness" in random_variable in that sense. A random_variable is a fully determined function from a sample space to a measurable space.

Mario Carneiro (Dec 18 2020 at 16:44):

You don't need to define A to be anything in particular, it can be a hypothesis

Mario Carneiro (Dec 18 2020 at 16:44):

it's more general this way anyway because you don't know what event space is needed in order to match all the other constraints in a downstream theorem

Lars Ericson (Dec 18 2020 at 17:37):

The Kloeden book brings in (briefly) the idea of probability space, so I want to demonstrate it when explaining things to myself. I can use (Ω,A,P)= ([0,1],B([0,1]),[a,b]↦ b-a) as a "one sits fits all" probability space. Then I want to define a random variable on that space, obtain it's distribution D, and then obtain the distribution function F, which results in a new distribution (probability) space like (S,Σ,D)=(R, B(R), D) with CDF F. I want to do that in Lean for the 6 particular cases in the book. I'm not trying to make the most general Lean probability theory module, or prove old or new theorems. I'm just trying to demonstrate a little piece of mechanism, to myself, because it seems pretty to me. I don't expect anybody else to find it interesting. Think of it as me doing beginner-level gardening in my back yard. It's a hobby, not a calling.

On the task of describing stochastic integration at a Lean level of detail, Wilfrid Kendall has done a lot of work, 21 years ago, which seems easily adaptable. (Adapting old stuff to new contexts is also more hobby gardening, not necessarily of general interest.) Kendall worked in Reduce, then Axiom with fine details, then Mathematica. (Note, his .ps.gz files are really just .ps files, they aren't zipped.) His Axiom papers provide the most detail. Axiom has a much richer type system than Mathematica, so the Axiom implementation is an improvement on it's successor.

Mario Carneiro (Dec 18 2020 at 17:49):

I can use (Ω,A,P)= ([0,1],B([0,1]),[a,b]↦ b-a) as a "one sits fits all" probability space.

It's not, though, as soon as you have more than one variable. What if you have two {0,1} variables X and Y with probabilities p and q to be 1, which are independent? This model will not help there.

Lars Ericson (Dec 18 2020 at 18:52):

These 6 examples are all univariate. Steinhaus space is a good enough indexing space for univariate examples whether continuous or discrete. I'm not sure that what's in the sketch above (aside from my desire to generalize distribution_function) excludes product measures. Baby steps!

Mario Carneiro (Dec 18 2020 at 19:00):

it's generally easier to not commit to particular example spaces if you can help it. Define a gaussian distributed RV over any event space

Mario Carneiro (Dec 18 2020 at 19:00):

you are doing lean on hard mode for some reason I don't understand

Mario Carneiro (Dec 18 2020 at 19:01):

The use of [0,1] is AFAICT entirely superfluous, and your proofs and definitions will be smaller and easier without it

Mario Carneiro (Dec 18 2020 at 19:03):

certainly the book you are reading from has made no such commitment to an event space

Lars Ericson (Dec 18 2020 at 19:40):

These are toy examples for exposition of a small point. Before I can generalize these toy examples I have to be able to do them first. I'm not there yet. I can't generalize something I don't know how to do in a smaller context.

In terms of a grander design, the above sketch has

def prob {α : Type*} [probability_space α] {S : set α} (h : is_measurable S) : ennreal :=
  volume S

I would like to be able to say

def prob {α : Type*} [probability_space α] {S : set α} (h : is_measurable S) : (set.Icc (0 : ℝ) 1) :=
  volume S

because probability measures by definition have as codomain [0,1]. Here's something I need to prove, but I don't know how to say it correctly in Lean:

lemma volume_type_01 {α : Type*}  [probability_space α] (X: set α) :
  ((volume X) : (set.Icc (0 : ℝ) 1)) := sorry

I can say

lemma volume_gt_0_lt_1 {α : Type*}  [probability_space α] (X: set α) :
  0 ≤ (volume X) ∧ (volume X) ≤ 1 := sorry

but that doesn't give me a way to replace ennreal with (set.Icc (0 : ℝ) 1).

Adam Topaz (Dec 18 2020 at 19:43):

I suggest using docs#ennreal.to_real

Mario Carneiro (Dec 18 2020 at 19:57):

These are toy examples for exposition of a small point. Before I can generalize these toy examples I have to be able to do them first.

The general case is easier than the specific case. I know this is counterintuitive, but it's true. The toy examples are way harder than general theorems about all measure spaces or all probability spaces, because the contingent details of the example don't get in the way

Lars Ericson (Dec 18 2020 at 20:24):

OK. The other thing I want to prove, to get the generalized distribution function, is something I at least know how to state:

lemma is_measurable_measurable_with_le (xx: β) [measurable_space β] [has_le β] : is_measurable {c | c ≤ xx} :=
  sorry

With this, the following #check works fine:

variable XX: random_variable α β
variable xx : β
variable Xlte : has_le β

#check @probability_space.prob β XX.induced {c | c ≤ xx} _  -- probability_space.prob ?M_1 : ennreal

However the exact same expression preceded by a def f := instead of a #check results in a fail:

def generalized_distribution_function := @probability_space.prob β XX.induced {c | c ≤ xx} _ -- ERROR

The fail is:

failed to synthesize type class instance for
α : Type u_1,
β : Type u_2,
_inst_1 : probability_space α,
_inst_2 : measurable_space β,
XX : random_variable α β,
xx c : β
⊢ has_le β

where the whole file is:

import measure_theory.lebesgue_measure
open measure_theory
noncomputable theory

class probability_space (α : Type*) extends measure_space α :=
(is_probability_measure :  probability_measure volume)

namespace probability_space

lemma volume_univ {α : Type*} [probability_space α] : volume (set.univ : set α) = 1 :=
  is_probability_measure.measure_univ

def prob {α : Type*} [probability_space α] {S : set α} (h : is_measurable S) : ennreal :=
  volume S

end probability_space

structure random_variable (α β : Type*) [probability_space α] [measurable_space β] :=
(outcome : α → β)
(is_measurable_outcome : measurable outcome)

variables {α β : Type*} [probability_space α] [measurable_space β]

namespace random_variable
instance : has_coe_to_fun (random_variable α β) := ⟨_,outcome⟩
lemma measurable (X : random_variable α β) : measurable X := X.is_measurable_outcome

def induced (X : random_variable α β) : probability_space β :=
{ volume := measure.map X volume,
  is_probability_measure := begin
    constructor,
    erw measure_theory.measure.map_apply X.measurable is_measurable.univ,
    exact probability_space.volume_univ
  end }

def distribution_function (X : random_variable α ℝ) : ℝ → ennreal := λ x,
  @probability_space.prob _ X.induced {c | c ≤ x} is_measurable_Iic

lemma is_measurable_measurable_with_le (xx: β) [measurable_space β] [has_le β] : is_measurable {c | c ≤ xx} :=
  sorry

variable XX: random_variable α β
variable xx : β
variable Xlte : has_le β

#check @probability_space.prob β XX.induced {c | c ≤ xx} _  -- probability_space.prob ?M_1 : ennreal

def generalized_distribution_function := @probability_space.prob β XX.induced {c | c ≤ xx} _ -- ERROR

end random_variable

Kevin Buzzard (Dec 18 2020 at 20:27):

As you can see, your variable Xlte is nowhere in your local context. has_le is a typeclass, so you need to use square brackets. Note that has_le is just an arbitrary relation, and whilst I know nothing about probability theory or measure theory, assuming [has_le \beta] is a very weak statement.

Lars Ericson (Dec 19 2020 at 00:38):

Thanks @Kevin Buzzard I figured out how to state things in a way that typechecks. Not quite done yet, I want to replace ennreal with real interval [0,1] following @Adam Topaz 's hint. Here is how it is with ennreal. There is just no sorry. Instead I introduced a new structure is_measurable_with_le to hold the proof that the set is measurable. It's a punt, but a more well-defined punt:

import measure_theory.lebesgue_measure
open measure_theory
noncomputable theory

class probability_space (α : Type*) extends measure_space α :=
(is_probability_measure :  probability_measure volume)

namespace probability_space

lemma volume_univ {α : Type*} [probability_space α] : volume (set.univ : set α) = 1 :=
  is_probability_measure.measure_univ

def prob {α : Type*} [probability_space α] {S : set α} (h : is_measurable S) : ennreal :=
  volume S

end probability_space

structure random_variable (α β : Type*) [probability_space α] [measurable_space β] :=
(outcome : α → β)
(is_measurable_outcome : measurable outcome)

variables {α β : Type*} [probability_space α] [measurable_space β]

namespace random_variable
instance : has_coe_to_fun (random_variable α β) := ⟨_,outcome⟩
lemma measurable (X : random_variable α β) : measurable X := X.is_measurable_outcome

def induced (X : random_variable α β) : probability_space β :=
{ volume := measure.map X volume,
  is_probability_measure := begin
    constructor,
    erw measure_theory.measure.map_apply X.measurable is_measurable.univ,
    exact probability_space.volume_univ
  end }

def distribution_function (X : random_variable α ℝ) : ℝ → ennreal := λ x,
  @probability_space.prob _ X.induced {c | c ≤ x} is_measurable_Iic

structure is_measurable_with_le (β: Type*) (b: β) [measurable_space β] [preorder β] :=
 (proof: is_measurable {c: β | c ≤ b})

def generalized_distribution_function (α β : Type*) [probability_space α] [mb: measurable_space β]
          [po: preorder β] (X: random_variable α β) (b: β) [im: @is_measurable_with_le β b mb po]  :=
    @probability_space.prob β X.induced {c | c ≤ b} im.proof

end random_variable

Mario Carneiro (Dec 19 2020 at 01:46):

Here are some theorems. Try filling the sorries at the end.

import measure_theory.lebesgue_measure
open measure_theory
noncomputable theory

class probability_space (α : Type*) extends measure_space α :=
(is_probability_measure :  probability_measure volume)

namespace probability_space
variables {α : Type*} [probability_space α]

lemma volume_univ : volume (set.univ : set α) = 1 :=
is_probability_measure.measure_univ

lemma volume_le_one (S : set α) : volume S ≤ 1 :=
by rw ← @volume_univ α; exact measure_mono (set.subset_univ _)

def prob (S : set α) : ℝ := (volume S).to_real

def nnprob (S : set α) : nnreal := (volume S).to_nnreal

theorem volume_eq_nnprob (S : set α) : volume S = nnprob S :=
(ennreal.coe_to_nnreal $ ne_top_of_le_ne_top (by rintro ⟨⟩) (volume_le_one S)).symm

theorem prob_eq_nnprob (S : set α) : prob S = nnprob S := rfl

theorem nnprob_le_one (S : set α) : nnprob S ≤ 1 :=
ennreal.coe_le_coe.1 $ by rw ← volume_eq_nnprob; exact volume_le_one S

theorem prob_le_one (S : set α) : prob S ≤ 1 := nnprob_le_one _

theorem zero_le_prob (S : set α) : 0 ≤ prob S := nnreal.zero_le_coe

theorem nnprob_mono {S T : set α} (h : S ⊆ T) : nnprob S ≤ nnprob T :=
ennreal.coe_le_coe.1 $ by simpa only [← volume_eq_nnprob] using measure_mono h

theorem prob_mono {S T : set α} (h : S ⊆ T) : prob S ≤ prob T :=
nnreal.coe_le_coe.2 $ nnprob_mono h

end probability_space

theorem measurable.map_probability_measure {α β} [probability_space α] [measurable_space β]
  {f : α → β} (hf : measurable f) :
  probability_measure (measure.map f volume) :=
⟨by rw measure_theory.measure.map_apply hf is_measurable.univ;
    exact probability_space.volume_univ⟩

structure random_variable (α β : Type*) [probability_space α] [measurable_space β] :=
(outcome : α → β)
(is_measurable_outcome : measurable outcome)

variables {α β : Type*} [probability_space α] [measurable_space β]

namespace random_variable
instance : has_coe_to_fun (random_variable α β) := ⟨_,outcome⟩
variable (X : random_variable α β)
lemma measurable : measurable X := X.is_measurable_outcome

def induced (X : random_variable α β) : probability_space β :=
{ volume := measure.map X volume,
  is_probability_measure := X.measurable.map_probability_measure }

variables [preorder β]

def CDF (x : β) : ℝ := @probability_space.prob _ X.induced {c | c ≤ x}

theorem zero_le_CDF (x : β) : 0 ≤ X.CDF x := sorry
theorem CDF_le_one (x : β) : X.CDF x ≤ 1 := sorry
theorem CDF_mono {x y : β} (h : x ≤ y) : X.CDF x ≤ X.CDF y := sorry

variables [topological_space β] [order_closed_topology β] [opens_measurable_space β]

example (b : β) : is_measurable {c | c ≤ b} := is_closed.is_measurable is_closed_Iic

end random_variable

Lars Ericson (Dec 19 2020 at 02:14):

Thank you @Mario Carneiro I will work on that.

Lars Ericson (Dec 19 2020 at 18:23):

I was able to prove zero_le_CDF and CDF_le_one. I am stuck on CDF_mono. I have it reduced to a proposition of form A implies B where I know B. I have a lemma for A. I don't know how to introduce the results of A into the hypothesis set. That's a simple Lean thing but I don't know how to do it. Then if I had that I could finish. Also I have to prove A which also seems simple but I don't know how to get started. Where

A is{c : β | c ≤ x} ⊆ {c : β | c ≤ y}
B is @probability_space.prob_mono β X.induced {c : β | c ≤ x} {c : β | c ≤ y} to hp

Here is the sketch so far. The part where I'm stuck as at the bottom:

import measure_theory.lebesgue_measure
import tactic

noncomputable theory

open measure_theory

class probability_space (α : Type*) extends measure_space α :=
(is_probability_measure :  probability_measure volume)

namespace probability_space
variables {α : Type*} [probability_space α]

lemma volume_univ : volume (set.univ : set α) = 1 :=
is_probability_measure.measure_univ

lemma volume_le_one (S : set α) : volume S ≤ 1 :=
by rw ← @volume_univ α; exact measure_mono (set.subset_univ _)

def prob (S : set α) : ℝ := (volume S).to_real

def nnprob (S : set α) : nnreal := (volume S).to_nnreal

theorem volume_eq_nnprob (S : set α) : volume S = nnprob S :=
(ennreal.coe_to_nnreal $ ne_top_of_le_ne_top (by rintro ⟨⟩) (volume_le_one S)).symm

theorem prob_eq_nnprob (S : set α) : prob S = nnprob S := rfl

theorem nnprob_le_one (S : set α) : nnprob S ≤ 1 :=
ennreal.coe_le_coe.1 $ by rw ← volume_eq_nnprob; exact volume_le_one S

theorem prob_le_one (S : set α) : prob S ≤ 1 := nnprob_le_one _

theorem zero_le_prob (S : set α) : 0 ≤ prob S := nnreal.zero_le_coe

theorem nnprob_mono {S T : set α} (h : S ⊆ T) : nnprob S ≤ nnprob T :=
ennreal.coe_le_coe.1 $ by simpa only [← volume_eq_nnprob] using measure_mono h

theorem prob_mono {S T : set α} (h : S ⊆ T) : prob S ≤ prob T :=
nnreal.coe_le_coe.2 $ nnprob_mono h

end probability_space

theorem measurable.map_probability_measure {α β} [probability_space α] [measurable_space β]
  {f : α → β} (hf : measurable f) :
  probability_measure (measure.map f volume) :=
⟨by rw measure_theory.measure.map_apply hf is_measurable.univ;
    exact probability_space.volume_univ⟩

structure random_variable (α β : Type*) [probability_space α] [measurable_space β] :=
(outcome : α → β)
(is_measurable_outcome : measurable outcome)

variables {α β : Type*} [probability_space α] [measurable_space β]

namespace random_variable
instance : has_coe_to_fun (random_variable α β) := ⟨_,outcome⟩
variable (X : random_variable α β)
lemma measurable : measurable X := X.is_measurable_outcome

def induced (X : random_variable α β) : probability_space β :=
{ volume := measure.map X volume,
  is_probability_measure := X.measurable.map_probability_measure }

variables [preorder β]

def CDF (x : β) : ℝ := @probability_space.prob _ X.induced {c | c ≤ x}

variables [topological_space β] [order_closed_topology β] [opens_measurable_space β]

example (b : β) : is_measurable {c | c ≤ b} := is_closed.is_measurable is_closed_Iic

lemma prob_eq_CDF (x : β) : X.CDF x = @probability_space.prob β X.induced {c | c ≤ x} := rfl

theorem zero_le_CDF (x : β) : 0 ≤ X.CDF x :=
begin
  rw prob_eq_CDF,
  exact (@probability_space.zero_le_prob β X.induced {c : β | c ≤ x}),
end

theorem CDF_le_one (x : β) : X.CDF x ≤ 1 :=
begin
  rw prob_eq_CDF,
  exact (@probability_space.prob_le_one β X.induced {c : β | c ≤ x}),
end

lemma factoid (x y : β ): {c : β | c ≤ x} ⊆ {c : β | c ≤ y} :=
begin
  sorry   -- STUCK
end

theorem CDF_mono {x y : β} (h : x ≤ y) : X.CDF x ≤ X.CDF y :=
begin
  repeat { rw prob_eq_CDF },
  -- introduce hp: factoid(x y)  -- STUCK
  -- apply @probability_space.prob_mono β X.induced {c : β | c ≤ x} {c : β | c ≤ y}  to hp
  sorry
end

end random_variable

Mario Carneiro (Dec 19 2020 at 18:31):

The factoid is false, because you forgot the essential assumption x <= y! Do you see why this theorem should be true mathematically?

Mario Carneiro (Dec 19 2020 at 18:37):

also you changed the theorems I wrote by moving them after the variables [topological_space beta]... line. They were written before that line because they don't require any of that stuff to be proven

Mario Carneiro (Dec 19 2020 at 18:39):

In fact, I think that line is wrong; indeed there shouldn't be any measurable space structure on beta, because the theorems are all about X.induced which uses only the measurable structure on alpha

Lars Ericson (Dec 19 2020 at 19:03):

I moved those up just to put the sorry parts the bottom. I will move back down. How is this for factoid:

lemma factoid [preorder β] (x y : β ) (h : x ≤ y) : {c : β | c ≤ x} ⊆ {c : β | c ≤ y} :=
begin
  sorry
end

Given factoid what's the proof language to introduce the result of a lemma into the hypothesis set? If I have revised factoid then I need that to apply here:

theorem CDF_mono {x y : β} (h : x ≤ y) : X.CDF x ≤ X.CDF y :=
begin
  repeat { rw prob_eq_CDF },
  -- introduce hp: factoid(x y)
  -- apply @probability_space.prob_mono β X.induced {c : β | c ≤ x} {c : β | c ≤ y}  to hp
  sorry
end

The revised sketch is

import measure_theory.lebesgue_measure
import tactic

noncomputable theory

open measure_theory

class probability_space (α : Type*) extends measure_space α :=
(is_probability_measure :  probability_measure volume)

namespace probability_space
variables {α : Type*} [probability_space α]

lemma volume_univ : volume (set.univ : set α) = 1 :=
is_probability_measure.measure_univ

lemma volume_le_one (S : set α) : volume S ≤ 1 :=
by rw ← @volume_univ α; exact measure_mono (set.subset_univ _)

def prob (S : set α) : ℝ := (volume S).to_real

def nnprob (S : set α) : nnreal := (volume S).to_nnreal

theorem volume_eq_nnprob (S : set α) : volume S = nnprob S :=
(ennreal.coe_to_nnreal $ ne_top_of_le_ne_top (by rintro ⟨⟩) (volume_le_one S)).symm

theorem prob_eq_nnprob (S : set α) : prob S = nnprob S := rfl

theorem nnprob_le_one (S : set α) : nnprob S ≤ 1 :=
ennreal.coe_le_coe.1 $ by rw ← volume_eq_nnprob; exact volume_le_one S

theorem prob_le_one (S : set α) : prob S ≤ 1 := nnprob_le_one _

theorem zero_le_prob (S : set α) : 0 ≤ prob S := nnreal.zero_le_coe

theorem nnprob_mono {S T : set α} (h : S ⊆ T) : nnprob S ≤ nnprob T :=
ennreal.coe_le_coe.1 $ by simpa only [← volume_eq_nnprob] using measure_mono h

theorem prob_mono {S T : set α} (h : S ⊆ T) : prob S ≤ prob T :=
nnreal.coe_le_coe.2 $ nnprob_mono h

end probability_space

theorem measurable.map_probability_measure {α β} [probability_space α] [measurable_space β]
  {f : α → β} (hf : measurable f) :
  probability_measure (measure.map f volume) :=
⟨by rw measure_theory.measure.map_apply hf is_measurable.univ;
    exact probability_space.volume_univ⟩

structure random_variable (α β : Type*) [probability_space α] [measurable_space β] :=
(outcome : α → β)
(is_measurable_outcome : measurable outcome)

variables {α β : Type*} [probability_space α] [measurable_space β]

namespace random_variable
instance : has_coe_to_fun (random_variable α β) := ⟨_,outcome⟩
variable (X : random_variable α β)
lemma measurable : measurable X := X.is_measurable_outcome

def induced (X : random_variable α β) : probability_space β :=
{ volume := measure.map X volume,
  is_probability_measure := X.measurable.map_probability_measure }

variables [preorder β]

def CDF (x : β) : ℝ := @probability_space.prob _ X.induced {c | c ≤ x}

lemma prob_eq_CDF (x : β) : X.CDF x = @probability_space.prob β X.induced {c | c ≤ x} := rfl

theorem zero_le_CDF (x : β) : 0 ≤ X.CDF x :=
begin
  rw prob_eq_CDF,
  exact (@probability_space.zero_le_prob β X.induced {c : β | c ≤ x}),
end

theorem CDF_le_one (x : β) : X.CDF x ≤ 1 :=
begin
  rw prob_eq_CDF,
  exact (@probability_space.prob_le_one β X.induced {c : β | c ≤ x}),
end

lemma factoid [preorder β] (x y : β ) (h : x ≤ y) : {c : β | c ≤ x} ⊆ {c : β | c ≤ y} :=
begin
  sorry
end

theorem CDF_mono {x y : β} (h : x ≤ y) : X.CDF x ≤ X.CDF y :=
begin
  repeat { rw prob_eq_CDF },
  -- introduce hp: factoid(x y)
  -- apply @probability_space.prob_mono β X.induced {c : β | c ≤ x} {c : β | c ≤ y}  to hp
  sorry
end

variables [topological_space β] [order_closed_topology β] [opens_measurable_space β]

example (b : β) : is_measurable {c | c ≤ b} := is_closed.is_measurable is_closed_Iic

end random_variable

Mario Carneiro (Dec 19 2020 at 19:32):

Have you read #tpil or played the natural number game? They introduce all the tools you need to write proofs. In this case, you need have := factoid x y h.

Lars Ericson (Dec 19 2020 at 19:59):

Yes I did both but in their examples the have is usually something like have hp : p, from h.left so a labelled proposition from a tactic applied to the existing goal state. I didn't find any examples in #tpil of have := with a term. In #tpil and the number game, lemmas and theorems are used into introduce equalities that get applied with rw tactic or implications that get applied with apply or exact.

Lars Ericson (Dec 19 2020 at 19:59):

Anyway, DONE:

import measure_theory.lebesgue_measure
import tactic

noncomputable theory

open measure_theory

class probability_space (α : Type*) extends measure_space α :=
(is_probability_measure :  probability_measure volume)

namespace probability_space
variables {α : Type*} [probability_space α]

lemma volume_univ : volume (set.univ : set α) = 1 :=
is_probability_measure.measure_univ

lemma volume_le_one (S : set α) : volume S ≤ 1 :=
by rw ← @volume_univ α; exact measure_mono (set.subset_univ _)

def prob (S : set α) : ℝ := (volume S).to_real

def nnprob (S : set α) : nnreal := (volume S).to_nnreal

theorem volume_eq_nnprob (S : set α) : volume S = nnprob S :=
(ennreal.coe_to_nnreal $ ne_top_of_le_ne_top (by rintro ⟨⟩) (volume_le_one S)).symm

theorem prob_eq_nnprob (S : set α) : prob S = nnprob S := rfl

theorem nnprob_le_one (S : set α) : nnprob S ≤ 1 :=
ennreal.coe_le_coe.1 $ by rw ← volume_eq_nnprob; exact volume_le_one S

theorem prob_le_one (S : set α) : prob S ≤ 1 := nnprob_le_one _

theorem zero_le_prob (S : set α) : 0 ≤ prob S := nnreal.zero_le_coe

theorem nnprob_mono {S T : set α} (h : S ⊆ T) : nnprob S ≤ nnprob T :=
ennreal.coe_le_coe.1 $ by simpa only [← volume_eq_nnprob] using measure_mono h

theorem prob_mono {S T : set α} (h : S ⊆ T) : prob S ≤ prob T :=
nnreal.coe_le_coe.2 $ nnprob_mono h

end probability_space

theorem measurable.map_probability_measure {α β} [probability_space α] [measurable_space β]
  {f : α → β} (hf : measurable f) :
  probability_measure (measure.map f volume) :=
⟨by rw measure_theory.measure.map_apply hf is_measurable.univ;
    exact probability_space.volume_univ⟩

structure random_variable (α β : Type*) [probability_space α] [measurable_space β] :=
(outcome : α → β)
(is_measurable_outcome : measurable outcome)

variables {α β : Type*} [probability_space α] [measurable_space β]

namespace random_variable
instance : has_coe_to_fun (random_variable α β) := ⟨_,outcome⟩
variable (X : random_variable α β)
lemma measurable : measurable X := X.is_measurable_outcome

def induced (X : random_variable α β) : probability_space β :=
{ volume := measure.map X volume,
  is_probability_measure := X.measurable.map_probability_measure }

variables [preorder β]

def CDF (x : β) : ℝ := @probability_space.prob _ X.induced {c | c ≤ x}

lemma prob_eq_CDF (x : β) : X.CDF x = @probability_space.prob β X.induced {c | c ≤ x} := rfl

theorem zero_le_CDF (x : β) : 0 ≤ X.CDF x :=
begin
  rw prob_eq_CDF,
  exact (@probability_space.zero_le_prob β X.induced {c : β | c ≤ x}),
end

theorem CDF_le_one (x : β) : X.CDF x ≤ 1 :=
begin
  rw prob_eq_CDF,
  exact (@probability_space.prob_le_one β X.induced {c : β | c ≤ x}),
end

lemma factoid [preorder β] (x y : β ) (h : x ≤ y) : {c : β | c ≤ x} ⊆ {c : β | c ≤ y} :=
  set.Iic_subset_Iic.mpr h

theorem CDF_mono {x y : β} (h : x ≤ y) : X.CDF x ≤ X.CDF y :=
begin
  repeat { rw prob_eq_CDF },
  have h1 := factoid x y h,
  have h2 := @probability_space.prob_mono β X.induced {c : β | c ≤ x} {c : β | c ≤ y},
  exact h2 h1,
end

variables [topological_space β] [order_closed_topology β] [opens_measurable_space β]

example (b : β) : is_measurable {c | c ≤ b} := is_closed.is_measurable is_closed_Iic

end random_variable

Mario Carneiro (Dec 19 2020 at 20:21):

In your example h.left is a term

Mario Carneiro (Dec 19 2020 at 20:22):

Note that have := (which is part of the less abbreviated have h : t := term) is the tactic mode version; the term mode version is have h : t, from term,.

Mario Carneiro (Dec 19 2020 at 20:26):

It looks like have is introduced in https://wwwf.imperial.ac.uk/~buzzard/xena/natural_number_game/?world=5&level=3

Lars Ericson (Dec 19 2020 at 20:36):

Thanks you are correct! In terms of my original generalized distribution_function goal, the above can be trimmed down to the assumption that a random variable is from a probability space α to a measurable space β which is a preorder. CDF is from β to ℝ. Ideally it should be to set.Icc (0: ℝ) 1. I guess the additional theorems will help me prove that and restate def CDF (x : β) : ℝ as def CDF (x : β) : set.Icc (0: ℝ) 1? The example at the end shows how to prove that {c | c ≤ b} is measurable for a type which is a topological space, an order closed topology and a measurable space. Bu those don't seem to be conditions required by the factoid, which only needs preorder. Are all these properties needed for anything or just measurable space with preorder?

Mario Carneiro (Dec 19 2020 at 20:55):

Ideally it should be to set.Icc (0: ℝ) 1.

As you can see, we have theorems that it is between zero and one. This is generally preferable up until you actually need to put it into a structure that expects set.Icc (0: ℝ) 1, which you don't currently have

Mario Carneiro (Dec 19 2020 at 20:57):

The example at the end shows how to prove that {c | c ≤ b} is measurable for a type which is a topological space, an order closed topology and a measurable space. Bu those don't seem to be conditions required by the factoid, which only needs preorder. Are all these properties needed for anything or just measurable space with preorder?

I put it as an example to show that it is derivable, but so far nothing actually needs those assumptions. You haven't done anything that requires {c | c ≤ b} to be measurable yet, but if you do those are the assumptions you will need (at that point, and not before)

Lars Ericson (Dec 19 2020 at 21:00):

Can you help me with the mechanics of this:

def distribution_function (x: β) :  set.Icc (0: ℝ) 1 := (CDF x).to_set_Icc_01

It seems I would need to define

def nat.to_set_Icc_01 (x: ℝ) := sorry

but that doesn't use the 0/1 theorems and doesn't seem right.

Mario Carneiro (Dec 19 2020 at 21:02):

def CDF_in_01 (x: β) : set.Icc (0 : ℝ) 1 := ⟨X.CDF x, X.zero_le_CDF x, X.CDF_le_one x⟩

Mario Carneiro (Dec 19 2020 at 21:02):

to reiterate, don't use this unless you need it

Lars Ericson (Dec 19 2020 at 21:03):

It's in the nature of CDF, what would be the aesthetic or practical reason not to constrain the codomain in this way?

Mario Carneiro (Dec 19 2020 at 21:04):

because real has operations on it, like addition, while set.Icc 0 1 doesn't

Lars Ericson (Dec 19 2020 at 21:05):

Anyway isn't it exciting to know that CDF can be generalized to any measurable space with a preorder? It's maybe a useless fact but I've never seen it in any presentation of CDF.

Mario Carneiro (Dec 19 2020 at 21:06):

Sure, this kind of "trivial generalization" comes up all the time in mathlib. It makes you consider questions the original authors hadn't thought about, like what closure properties are needed to prove that the CDF is left-continuous?

Mario Carneiro (Dec 19 2020 at 21:07):

You just read what the authors say, delete all the assumptions, and then just add the assumptions if they become necessary and not before

Mario Carneiro (Dec 19 2020 at 21:08):

A nonzero proportion of the time you never add some of them back

Mario Carneiro (Dec 19 2020 at 21:08):

In fact mathlib has a linter which will do this for you

Lars Ericson (Dec 19 2020 at 21:12):

If you don't mind my quoting your proof above, I can now close my old Stack Exchange question with a definitive answer.

Mario Carneiro (Dec 19 2020 at 21:17):

It seems that you got an answer in that question already:

CDFs are defined only for $\mathbb{R}$ (even for $\mathbb{R}^n$ ) and are used for convenience. The push-forward is of a measure $\mu$ by a Random variable $X:(\Omega,\mathscr{F},\mu)\rightarrow(S,\Sigma)$ , $\mu_X(A):=\mu(X\in A)$ is the object of interest and it is as general as it can be, since $(S,\Sigma)$ has no structure other than being a measurable space.

Mario Carneiro (Dec 19 2020 at 21:18):

This is another way to say that in a more general context we stop talking about the CDF because the pushforward measure is more general

Mario Carneiro (Dec 19 2020 at 21:19):

A CDF is only useful to the extent that it determines the pushforward measure

Lars Ericson (Dec 19 2020 at 21:27):

CDF is defined on points in S. The convention is that only the case S=R has any use. What we showed is that if S is any measurable space with a preorder, CDF can still be computed and give back a number in [0,1]. So I can define a preorder on a domain like binary expression trees of depth at most n and compute the CDF of a tree in the space of all binary trees depth at most n. I can say that and say that it is interesting to me because it is. This has nothing to do with using CDF to determine the pushforward measure. So that comment doesn't make any sense to me. I got the result I was looking for, and it doesn't seem to be relevant to that comment. That or it's not possible to put a preorder on binary trees, and continuing on in the vein, maybe the claim is that any measurable set with preorder is isomorphic to ℝ. Is that the case?

Mario Carneiro (Dec 19 2020 at 21:28):

Yes you can compute a number, but the question is what does that number mean? It's not true for a general preorder that giving the CDF of the random variable uniquely determines the distribution of the variable

Mario Carneiro (Dec 19 2020 at 21:29):

maybe the claim is that any measurable set with preorder is isomorphic to ℝ. Is that the case?

This is certainly not the case

Mario Carneiro (Dec 19 2020 at 21:31):

you have a function called CDF, but that doesn't on its own mean much. Here's another function called CDF:

def CDF' (X : random_variable α β) (x : β) : ℝ := 0

This one doesn't even need a preorder structure on beta, so it's a clear improvement. It's also monotone, between 0 and 1, and right continuous

Lars Ericson (Dec 19 2020 at 21:36):

Except our CDF is proven to be a probability_measure on a measurable_space with preorder, by construction. I'll take that. It's not 0.

Mario Carneiro (Dec 19 2020 at 21:36):

Our CDF is no such thing

Mario Carneiro (Dec 19 2020 at 21:36):

it's a real function

Mario Carneiro (Dec 19 2020 at 21:36):

I can put all those assumptions on CDF' too

Mario Carneiro (Dec 19 2020 at 21:37):

Obviously CDF' is a silly function, but the point is that we haven't shown anything about CDF that makes it meaningfully different from CDF'

Mario Carneiro (Dec 19 2020 at 21:38):

And my claim is that the real theorem of interest that will distinguish the two doesn't hold on arbitrary preorders

Lars Ericson (Dec 19 2020 at 21:38):

Obtained by running prob, where prob was supplied as a probability_measure when building the initial probability space, and then running it through induced on the random_variable. As long as you have a meaningful and correct probability_space to begin with, and a meaningful and correct random_variable, the CDF will be meaningful and correct and derived from the original probability_space and the supplied random_variable. There is nothing trivial about it at all.

Mario Carneiro (Dec 19 2020 at 21:39):

How do you know it is meaningful and correct?

Mario Carneiro (Dec 19 2020 at 21:39):

That's what theorems are for

Mario Carneiro (Dec 19 2020 at 21:39):

and we don't have any

Mario Carneiro (Dec 19 2020 at 21:39):

except the obvious ones that are true also for CDF'

Lars Ericson (Dec 19 2020 at 21:40):

If I start out with say the Steinhaus space and the definition of the random variable for the 2 point-distribution and crank it through I will get the distribution and distribution function for the 2-point random variable. What's missing?

Mario Carneiro (Dec 19 2020 at 21:41):

well that's over real

Mario Carneiro (Dec 19 2020 at 21:41):

I thought you wanted arbitrary preorders?

Lars Ericson (Dec 19 2020 at 21:42):

As another example we could compute distributions of binary trees of depth n. This is something people actually think about. Why can't I have a CDF of those?

Mario Carneiro (Dec 19 2020 at 21:42):

The main theorem here is that two random variables with the same CDF are the same random variable

Mario Carneiro (Dec 19 2020 at 21:42):

that's not true for CDF'

Mario Carneiro (Dec 19 2020 at 21:42):

and it's not true for CDF on some preorders

Mario Carneiro (Dec 19 2020 at 21:43):

Really, I think it's not worth thinking about the CDF in the more unusual codomains. Just call X.induced CDF if it makes you feel better

Lars Ericson (Dec 19 2020 at 21:46):

So what you proved at the bottom required that S be a topological space, an order closed topology and an opens measurable space. I don't need those assumptions for def CDF. If I delete variables [preorder β], then def CDF crashes. It doesn't rely however on being topological space, order closed topology and opens measurable space. Are you saying that I will crash if I try to define a random variable into binary trees of length N with a preorder, because of something I'm not understanding?

Mario Carneiro (Dec 19 2020 at 21:46):

In measure theory, a distribution is no more or less than a measure. It takes all subsets of X as its input, and it doesn't need to be boiled down to a function on X

Mario Carneiro (Dec 19 2020 at 21:49):

I don't need those assumptions for def CDF. If I delete variables [preorder β], then def CDF crashes. It doesn't rely however on being topological space, order closed topology and opens measurable space. Are you saying that I will crash if I try to define a random variable into binary trees of length N with a preorder, because of something I'm not understanding?

This is why you need to prove theorems about your definitions. It's perfectly possible that a definition can be made in some tremendous generality but it doesn't make sense without some more assumptions. In fact def CDF works using only has_le beta, it doesn't even need preorder, but it certainly doesn't make sense in that generality; we already have a theorem that will detect this nonsense, namely CDF_mono. WIthout that theorem you might well think that there are no problems with that definition

Mario Carneiro (Dec 19 2020 at 21:49):

But you haven't tried to prove anything more complicated about CDF yet, and when you do you will find that preorder beta isn't good enough either

Mario Carneiro (Dec 19 2020 at 21:51):

It all comes back to the thing I mentioned earlier: how do people use the definition?

Mario Carneiro (Dec 19 2020 at 21:51):

making definitions doesn't tell you any deep truths about it, because you can very easily write nonsense without noticing

Mario Carneiro (Dec 19 2020 at 21:52):

but once you try to use the definition it all falls down

Lars Ericson (Dec 19 2020 at 21:56):

Which brings me back to trees. A long time ago I helped a mathematician counting trees to write a paper making some probabiliistic arguments about operations on trees. Conceivably, while doing that, in a practical sense, I might want to define a preorder on trees and compute the percentage of trees in the entire space of trees of size n which were "less than" a single particular tree, or an equivalence class of trees. So that would be the CDF of a single tree. It might be boring nonsense to almost everybody in the world, but for the two people in that room, it might be helpful. So then the question, how much of this will fall down if my trees have a preorder but do not constitute a type which is a topological space, order closed topology and opens measurable space? All of it, some of it or none of it? And if it is all of it, can we go further and say that there is a 1-1 onto map between the reals and any such space?

Reid Barton (Dec 19 2020 at 22:06):

Even in this hypothetical scenario, you could just write down "the probability that a random tree is 'less than' the particular tree in question" and not bother with this CDF terminology stuff.

Lars Ericson (Dec 20 2020 at 15:37):

Yes, I could write "the probability that a random tree is 'less than' the particular tree in question" as CDF T after defining a preorder on trees, following the development of the generalized distribution function above, which establishes that I can do so in a well-typed way. This is very helpful for me. Thanks!

Lars Ericson (Dec 20 2020 at 16:36):

But I think the answer to the question of why distribution function is only defined by the literature for R is in this comment by Oliver Diaz in the Wikipedia article:

"If (𝑌,𝑑) is s Borel space (a Polish space for instance) then it is measurable isomorphic to (0,1) and one can define an order in 𝑌. The isomorphism however is not explicit, it is based on constructions that use the axiom of choice."

"Measurable isomorphic to (0,1)" is what I was looking for, it gives motivation for only talking about R, in the sense that talking about Y would be redundant under the isomorphism.

Mario Carneiro (Dec 20 2020 at 17:27):

I don't think that is true at all

Mario Carneiro (Dec 20 2020 at 17:28):

Y can easily be something like bool with the preorder formulation

Mario Carneiro (Dec 20 2020 at 17:30):

you need a lot more assumptions to make it look like real. Borel won't cut it, Polish might but it's not clear what the reason for such an assumption is

Last updated: May 02 2025 at 03:31 UTC