Zulip Chat Archive
Stream: new members
Topic: How do I tell Lean that a subclass is a kind of the supercla
Lars Ericson (Dec 17 2020 at 16:57):
I'm working on this:
import measure_theory.lebesgue_measure
open measure_theory
noncomputable theory
class probability_space (α : Type*) extends measure_space α :=
(is_probability_measure: probability_measure volume)
class random_variable (α β : Type*) :=
(domain : probability_space α )
(codomain : measurable_space β )
(outcome : α → β)
(is_measurable_outcome : @measurable α β domain codomain outcome)
I am trying to say that a random variable is a measurable function from a probability space to a measurable space. Even though the function is declared α → β and I have proofs that domain
is a probability_space
and codomain
is a measurable_space
, I still guess that I need to add a proof that outcome
is a measurable function. Or is it obvious? I'm unsure.
Anyway, if I do have to do the extra step of proving that outcome
is measurable
, I need to get this type right:
@measurable α β domain codomain outcome
However, it is complaining
type mismatch at application
measurable
term
domain
has type
probability_space α
but is expected to have type
measurable_space α
even though
class probability_space (α : Type*) extends measure_space
and
class measure_space (α : Type*) extends measurable_space α
How do I tell Lean that a subclass is a kind of the superclass?
Adam Topaz (Dec 17 2020 at 17:04):
I would use something approximately like this:
import measure_theory.lebesgue_measure
open measure_theory
noncomputable theory
class probability_space (α : Type*) extends measure_space α :=
(is_probability_measure: probability_measure volume)
structure random_variable (α β : Type*) [probability_space α] [measurable_space β] :=
(outcome : α → β)
(is_measurable_outcome : measurable outcome)
Lars Ericson (Dec 17 2020 at 17:06):
Thanks! Why structure
and not class
in this case? random_variable
is a class of function, it seems possible people might want to create instances.
Reid Barton (Dec 17 2020 at 17:06):
This is more sensible yes, but also in the original formulation you can just write is_measurable_outcome : measurable outcome
--if you don't try to do Lean's work for it, it will do it correctly.
Adam Topaz (Dec 17 2020 at 17:08):
Oh it looks like measurable
is not a class, so I should use regular parens. (fixed code)
Adam Topaz (Dec 17 2020 at 17:10):
Does mathlib have bundled measurable functions?
Lars Ericson (Dec 17 2020 at 17:51):
Please help me take it a little further. To the random variable I want to add an inverse. So if X(ω)
is a random variable, then
inverse(y) = {ω : X(ω) = y}
. I tried this:
class random_variable (α β : Type*) [probability_space α] [measurable_space β] :=
(outcome : α → β)
(is_measurable_outcome : measurable outcome)
(inverse : β → set α := { ω : α // ∃ y : β, y = outcome ω }) // ERROR
I got this error message, which I don't know how to fix:
invalid let-expression, term
{ω // ∃ (y : β), y = outcome ω}
has type
Type ? : Type (max 1 (?+1))
but is expected to have type
β → set α : Type (max ? ?)
From there I want to do this, to say that the distribution space is a probability space induced by X
with probability measure volumeX : β→ [0,1]
such that volumeX = volume ∘ (X^-1)
is the pushforward measure and volumeX
is called the distribution of X
:
instance distribution_space (α β : Type*) (V: random_variable α β) (ΩAP: probability_space α ): probability_space β :=
(volume : ΩAP.volume ∘ V.generalize_inverse)
(is_probability_measure := sorry)
I'm not sure how to get this all right in Lean, but the ideas are there.
"pushforward" sounds categorical. If there were a category Prob
there may be some categorical way to express what I'm trying to do here. I don't know category theory though. For now I'm just trying to get this part right.
Lars Ericson (Dec 17 2020 at 19:07):
OK I found something in the guts of measure_space
but it doesn't show up in docs#pushforward for in Lean which is all category theory related:
/-- The pushforward of a measure. It is defined to be `0` if `f` is not a measurable function. -/
def map (f : α → β) : measure α →ₗ[ennreal] measure β :=
if hf : measurable f then
lift_linear (outer_measure.map f) $ λ μ s hs t,
le_to_outer_measure_caratheodory μ _ (hf hs) (f ⁻¹' t)
else 0
Adam Topaz (Dec 17 2020 at 20:20):
import measure_theory.lebesgue_measure
open measure_theory
noncomputable theory
class probability_space (α : Type*) extends measure_space α :=
(is_probability_measure : probability_measure volume)
structure random_variable (α β : Type*) [probability_space α] [measurable_space β] :=
(outcome : α → β)
(is_measurable_outcome : measurable outcome)
variables {α β : Type*} [probability_space α] [measurable_space β]
def induced (X : random_variable α β) : probability_space β :=
{ volume := measure.map X.outcome volume,
is_probability_measure := begin
constructor,
rw measure_theory.measure.map_apply X.is_measurable_outcome is_measurable.univ,
apply probability_space.is_probability_measure.measure_univ,
end }
One thing to note, as we discussed somewhere before, one should really make an API for probability_space
so that the proof in is_probability_measure
becomes essentially a one-liner.
Adam Topaz (Dec 17 2020 at 20:21):
The structure random_variable
needs an API as well.
Adam Topaz (Dec 17 2020 at 20:21):
But I hope the code above helps you get started :)
Lars Ericson (Dec 17 2020 at 20:52):
Thank you @Adam Topaz that is extremely helpful. I have been looking at the API of measure.map
:
import measure_theory.measure_space
open measure_theory
variables α β : Type*
variable MS1 : measurable_space α
variable MS2 : measurable_space β
variable μ_α : α → β
variable m_α : measure α
#check @measure_theory.measure.map α β MS1 MS2 μ_α m_α
#check measure_theory.measure.map μ_α m_α
This is the pushforward measure in Wikipedia: https://en.wikipedia.org/wiki/Pushforward_measure
There is also a comap
in measure_space
:
/-- Pullback of a `measure`. If `f` sends each `measurable` set to a `measurable` set, then for each
measurable set `s` we have `comap f μ s = μ (f '' s)`. -/
def comap (f : α → β) : measure β →ₗ[ennreal] measure α :=
What are the aesthetics of calling these map
and comap
versus pushforward
and pullback
? This would provide a conceptual link to various pushforward definitions in Top
and category_theory
: docs#pushforward. I'm just asking, pure curiosity, not a criticism.
Lars Ericson (Dec 17 2020 at 23:24):
Here is the last bit I want to get to before thinking about the overall API. Please take a look at Definition 7:
Screenshot-from-2020-12-17-18-23-57.png
Lars Ericson (Dec 17 2020 at 23:25):
Now take a look at section 1.1 B:
Screenshot-from-2020-12-17-18-24-10.png
Adam Topaz (Dec 17 2020 at 23:26):
What's the question? What is it precisely you want to get?
Yakov Pechersky (Dec 17 2020 at 23:30):
Are you asking about inverses? You could have:
class random_variable (α β : Type*) [probability_space α] [measurable_space β] :=
(outcome : α → β)
(is_measurable_outcome : measurable outcome)
(surj: function.surjective outcome)
and for the inverse use
docs#function.surj_inv
Lars Ericson (Dec 17 2020 at 23:30):
Here it comes, trying to type faster. Here's the first part:
So Mark Dean defines a notion of a distribution and a distribution function (when the random variable is into ℝ). Kloeden&co give an expression for the distribution function: FX(x) = PX((-∞ ,x)) = P({ω ∈ Ω : X(ω) ≤ x})
.
They are both assuming that the random variable is into ℝ. I don't see any reason for this, because this: X(ω) ≤ x
only requires that there be a preorder ≤ on the type β that the preorder is into.
Adam Topaz (Dec 17 2020 at 23:30):
Oh so you want the distribution then?
Lars Ericson (Dec 17 2020 at 23:31):
No, we have the distribution
, I want to define the distribution_function
in case β
has a preorder in this sketch:
import measure_theory.lebesgue_measure
import measure_theory.measure_space
open measure_theory
noncomputable theory
class probability_space (α : Type*) extends measure_space α :=
(is_probability_measure : probability_measure volume)
def probability_space.distribution (α : Type*) [ps: probability_space α] :=
ps.volume
def probability_space.distribution_function (X : random_variable α β ) [ps: probability_space β ] [preorder β] :=
sorry
structure random_variable (α β : Type*) [probability_space α] [measurable_space β] :=
(outcome : α → β)
(is_measurable_outcome : measurable outcome)
variables {α β : Type} [probability_space α] [measurable_space β]
instance distribution_space (X : random_variable α β) : probability_space β :=
{ volume := measure.map X.outcome volume,
is_probability_measure := begin
constructor,
rw measure_theory.measure.map_apply X.is_measurable_outcome is_measurable.univ,
apply probability_space.is_probability_measure.measure_univ,
end }
instance {α} {p : α → Prop} [m : measure_space α] : measure_space (subtype p) :=
{ volume := measure.comap (coe : _ → α) volume }
theorem subtype.volume_apply {α} {p : α → Prop} [measure_space α]
(hp : is_measurable {x | p x}) {s : set (subtype p)} (hs : is_measurable s) :
volume s = volume ((coe : _ → α) '' s) :=
measure.comap_apply _ subtype.coe_injective (λ _, is_measurable.subtype_image hp) _ hs
namespace Steinhaus
instance P : probability_measure (volume : measure (set.Icc (0 : ℝ) 1)) :=
{ measure_univ := begin
refine (subtype.volume_apply is_measurable_Icc is_measurable.univ).trans _,
suffices : volume (set.Icc (0 : ℝ) 1) = 1, {simpa},
rw [real.volume_Icc], simp
end }
instance ΩAP : probability_space (set.Icc (0 : ℝ) 1) :=
{ is_probability_measure := Steinhaus.P }
#check Steinhaus.ΩAP
#check Steinhaus.ΩAP.volume -- volume : measure ↥(set.Icc 0 1)
end Steinhaus
Lars Ericson (Dec 17 2020 at 23:32):
So we have probability space, random variable, distribution space induced by random variable on probability space, and the final icing on the cake is to show that if I have a preorder on the measurable space that the random variable is into, then I have a distribution function.
Lars Ericson (Dec 17 2020 at 23:33):
Why this is at all interesting is because everybody in the world only defines distribution_function
when the random variable is into ℝ
. Nobody seems to care about the case where it is into a measurable space with a preorder. I just think it's an interesting generalization.
Lars Ericson (Dec 17 2020 at 23:34):
But it's only possible for me to talk about whether it is interesting or definable in the presence of something like Lean.
Mario Carneiro (Dec 17 2020 at 23:35):
@Yakov Pechersky There aren't any inverses involved in those snippets. That's a preimage
you see
Lars Ericson (Dec 17 2020 at 23:37):
What I'm working on is this generalization of the distribution function in the Kloeden snippet, which Dean defines only if the RV is into the reals:
def probability_space.distribution_function (X : random_variable α β ) (y: β) \[ps: probability_space β ] [preorder β] :=
ps.volume({ω ∈ α : X(ω) ≤ y})
Yakov Pechersky (Dec 17 2020 at 23:37):
Ah, misread. Gotcha. So the clean wya to state that is outcome.preimage?
Mario Carneiro (Dec 17 2020 at 23:37):
Well, the whole measure is measure.comap
I think
Mario Carneiro (Dec 17 2020 at 23:38):
actually measure.map
Lars Ericson (Dec 17 2020 at 23:40):
That's another question I had about terminology, why measure_space
uses map
and comap
but in the comments it uses pushforward
and pullback
which comments match terminology used in Top
and category_theory
.
Mario Carneiro (Dec 17 2020 at 23:40):
so there isn't really anything to do in definition 7. measure.map x volume
is a measure, check
Mario Carneiro (Dec 17 2020 at 23:40):
because there is a clash of terminology from multiple fields
Mario Carneiro (Dec 17 2020 at 23:41):
lean uses map
and comap
for regularity, but this gets instantiated in category theory and topology and algebra and measure theory and mathematicians in these areas all have their own ideas about what to call it
Lars Ericson (Dec 17 2020 at 23:41):
OK thanks. The above sketch works fine. @Adam Topaz worked out how to derive a distribution space from a random variable on a probability space:
instance distribution_space (X : random_variable α β) : probability_space β :=
{ volume := measure.map X.outcome volume,
is_probability_measure := begin
constructor,
rw measure_theory.measure.map_apply X.is_measurable_outcome is_measurable.univ,
apply probability_space.is_probability_measure.measure_univ,
end }
The only thing I have left to complete the functionality is generalized distribution function.
Mario Carneiro (Dec 17 2020 at 23:42):
I think you are missing a lemma there
Mario Carneiro (Dec 17 2020 at 23:42):
you just proved the map
of a probability measure is a probability measure
Mario Carneiro (Dec 17 2020 at 23:42):
that's a lemma
Adam Topaz (Dec 17 2020 at 23:43):
import measure_theory.lebesgue_measure
open measure_theory
noncomputable theory
class probability_space (α : Type*) extends measure_space α :=
(is_probability_measure : probability_measure volume)
namespace probability_space
lemma volume_univ {α : Type*} [probability_space α] : volume (set.univ : set α) = 1 :=
is_probability_measure.measure_univ
def prob {α : Type*} [probability_space α] {S : set α} (h : is_measurable S) : ennreal :=
volume S
end probability_space
structure random_variable (α β : Type*) [probability_space α] [measurable_space β] :=
(outcome : α → β)
(is_measurable_outcome : measurable outcome)
variables {α β : Type*} [probability_space α] [measurable_space β]
namespace random_variable
instance : has_coe_to_fun (random_variable α β) := ⟨_,outcome⟩
lemma measurable (X : random_variable α β) : measurable X := X.is_measurable_outcome
def induced (X : random_variable α β) : probability_space β :=
{ volume := measure.map X volume,
is_probability_measure := begin
constructor,
erw measure_theory.measure.map_apply X.measurable is_measurable.univ,
exact probability_space.volume_univ
end }
def distribution_function (X : random_variable α ℝ) : ℝ → ennreal := λ x,
@probability_space.prob _ X.induced {c | c ≤ x} is_measurable_Iic
end random_variable
Mario Carneiro (Dec 17 2020 at 23:43):
why is prob
an ennreal
?
Adam Topaz (Dec 17 2020 at 23:43):
Because I'm lazy
Adam Topaz (Dec 17 2020 at 23:44):
One should change it to \R
.
Adam Topaz (Dec 17 2020 at 23:44):
Same with distribtion_function
.
Lars Ericson (Dec 17 2020 at 23:44):
Well that's what I'm trying to generalize. I want to replace ℝ with β where β has a preorder
. What do you think?
Adam Topaz (Dec 17 2020 at 23:45):
Sure you could do that.
Mario Carneiro (Dec 17 2020 at 23:45):
I don't think that's a good idea, it's not general enough
Mario Carneiro (Dec 17 2020 at 23:46):
the sets {c | c ≤ x}
in a preorder don't generate all borel sets
Mario Carneiro (Dec 17 2020 at 23:46):
you will have to work with the preimages of general sets, at which point it's not clear what the CDF gets you
Lars Ericson (Dec 17 2020 at 23:47):
It would be a really wonderful solution to a Math StackExchange question I posed in August, i.e. exactly what kind of order does β have to have for this to go through in Lean.
Lars Ericson (Dec 17 2020 at 23:48):
For one thing you know by construction of the random_variable
that β is a measurable_space
.
Lars Ericson (Dec 18 2020 at 00:51):
Mario Carneiro said:
you just proved the
map
of a probability measure is a probability measure, that's a lemma
It could be that makes it a category, which is something to think about. I.e. if you were building a probability library from scratch would you want to make it as category theory-ish as possible or just express the main ideas on their own. There seems to be big interest in category theory here, but it seems that using it in practice like for GCD/Fibonacci example is quite hard. So then the question is whether the generalization that it provides is productive for particular applications, or is it just nice to think about on its own. The fact that there is map/comap and pullback/pushthrough in different parts of the library shows that there is not a unified view among all the mathlib contributors. Not that there should be, I'm just slowly learning the "style guide" and "make it a category" doesn't seem to be one of the style imperatives.
Lars Ericson (Dec 18 2020 at 00:59):
Regarding this expression for the distribution_function
, which is on single points in the codomain of the RV, as opposed to the distribution
, which is on subsets of the domain of the RV: F(x) = P({ω ∈ Ω : X(ω) ≤ x})
. We can see by inspection, since P
is a probability measure, that it will be in [0,1]
. If the codomain of the RV is some kind of partially ordered set, the distribution function might not behave in the same way as if it were a linearly ordered dense set. But the result will still be a CDF. I will Google "generalized distribution function" and see what I come up with. In any event, it seems well defined for any order relation ≤
.
Mario Carneiro (Dec 18 2020 at 01:03):
The fact that there is map/comap and pullback/pushthrough in different parts of the library shows that there is not a unified view among all the mathlib contributors
To clarify: map
and comap
are used consistently in lean names, while "pullback" and "pushforward" are used in comments to help connect names to things in the literature (which are all over the place, as one would expect)
Mario Carneiro (Dec 18 2020 at 01:06):
It could be that makes it a category, which is something to think about. I.e. if you were building a probability library from scratch would you want to make it as category theory-ish as possible or just express the main ideas on their own.
I would not suggest adding additional layers unless it gives you an advantage in proving theorems. Instantiating a category is a thing you can do, but it's not particularly helpful on its own. There are lots of things in mathlib that could potentially be instantiations of the category library, but generally it's just another layer on an existing theorem. I would say prove the theorem and add the instantiation if and when you need it for a theorem
Lars Ericson (Dec 18 2020 at 01:08):
OK so my main concern, all abstraction aside, is whether any measurable_set
S
with ≤ for which this expression: F(x: S) = P({ω ∈ Ω : X(ω) ≤ x})
makes sense is necessarily linearly ordered, dense and isomorphic to ℝ, or whether any measurable_set
with any ≤ defined on it will do.
Adam Topaz (Dec 18 2020 at 01:09):
Just to clarify, whenever you have a random variable X
with codomain A
, and some function from A
to the type of all measurable sets in A
, you can define something like a distribution function on A
. The question is whether this function determines the probability density on A
induced by X
, and this is certainly not true in general.
Mario Carneiro (Dec 18 2020 at 01:10):
I think instead of formalizing definitions, you should try to get to the actual theorems, because we can work backward from there to the abstractions that are useful for formalizing the theorem. So far, I don't see any value in the definition random_variable
, or distribution
, or prob
or distribution_function
, they are all just wrappers around other things. So let's get to theorems where you really need some probability vocabulary to prove the theorem effectively
Lars Ericson (Dec 18 2020 at 01:11):
Random variable, distribution, distribution function and probability space are probability vocabulary.
Mario Carneiro (Dec 18 2020 at 01:11):
yes I'm aware
Mario Carneiro (Dec 18 2020 at 01:11):
a wrapper doesn't help you prove theorems
Lars Ericson (Dec 18 2020 at 01:11):
Google gave me this, maybe it's of some use: https://www.ams.org/journals/tran/1957-084-02/S0002-9947-1957-0085326-5/S0002-9947-1957-0085326-5.pdf
Mario Carneiro (Dec 18 2020 at 01:13):
more words to express things you could already express are not helpful. The power of a definition is when it abbreviates a nontrivial combination of things, like "group" meaning "associative binary operation with a neutral element"
Mario Carneiro (Dec 18 2020 at 01:13):
the latter expression is more cumbersome than the former, so the definition pulls its weight
Mario Carneiro (Dec 18 2020 at 01:14):
there are also a bunch of theorems about groups that are not theorems about monoids or associative operations on their own
Lars Ericson (Dec 18 2020 at 01:14):
I'm not really trying to prove a theorem, I'm just trying to get the type classes straight to describe the usual vocabulary of probability theory. That's all. Because even though that vocabulary is just sugar for things that are already inmathlib
in one way or another, they are still challenging for me. I can't go tomathlib
and see the usual vocabulary of probability theory. It's a modest exercise.
Mario Carneiro (Dec 18 2020 at 01:15):
So what's the raison d'etre of probability vocabulary? What fact about probability measures is not true about general measures? I have no doubt that such theorems exist, but nothing interesting will happen until you start trying to prove one of them
Mario Carneiro (Dec 18 2020 at 01:18):
It is really important to find and at least start trying to prove one of these theorems because it guides the structure of definitions around the whole topic. Without it I see all these definitions and they are all useless cruft with negative utility
Lars Ericson (Dec 18 2020 at 01:21):
My longer range goal is to understand say the construction of the stochastic Taylor series approximation of the Ito or Stratonovich integral. I am not trying to prove any new theorems about this topic. I'm trying to build up the language used in a textbook that goes through that topic. The textbook doesn't use the measure-theoretic equivalent of probability_space
. It uses probability_space
. I don't think I should feel bad about, for my own purposes in reading that book, providing a definition of probability_space
.
Along the way I noticed that F(x: S) = P({ω ∈ Ω : X(ω) ≤ x}
could possibly be generalized. @Adam Topaz expresses this as whether the random variable X
"determines the probability density on A induced by X". It seems that "determines the probability density on A induced by X" is something that could be written in Lean and expressed as a condition that the order relation ≤ has to satisfy. So, for my own tastes and ambitions, I think it would be interesting to say formalize "determines the probability density on A induced by X" and see where that leads. It could lead to the requirement that the codomain of the random variable be isomorphic to the reals, which would be an interesting and natural explanation for why every text I can find restricts distribution_function
to random variables into the reals.
Mario Carneiro (Dec 18 2020 at 01:22):
The textbook doesn't use the measure-theoretic equivalent of probability_space. It uses probability_space. I don't think I should feel bad about, for my own purposes in reading that book, providing a definition of probability_space.
The point isn't to feel bad about it, it is to build the theory of probability spaces by looking at how it is used
Mario Carneiro (Dec 18 2020 at 01:23):
The question to answer isn't what is a probability space, it is what do people do with probability spaces
Mario Carneiro (Dec 18 2020 at 01:24):
the purpose of a library is to make it easier to write proofs in the area
Mario Carneiro (Dec 18 2020 at 01:24):
so you have to look at the proofs to understand how to help
Mario Carneiro (Dec 18 2020 at 01:27):
Regarding what you would need to make the exact definition F(x: S) = P({ω ∈ Ω : X(ω) ≤ x})
generalize, I would expect that you need the sets {a | a < x}
to generate the topology and hence the entire borel algebra. That is, you need an orderable_topology
on the codomain
Mario Carneiro (Dec 18 2020 at 01:29):
oops it's called docs#order_closed_topology now
Mario Carneiro (Dec 18 2020 at 01:30):
I think you will need to get to some really interesting theorems before the codomain starts to be constrained to look anything like the reals
Adam Topaz (Dec 18 2020 at 01:45):
Law of large numbers might be a nice goal.
Lars Ericson (Dec 18 2020 at 02:09):
In terms of "what do people do with probability spaces", and the book I am reading, I have some very simple applications that I want to proof-check in Lean:
- A one-size-fits-all probability space:([0,1], B([0,1]), [a,b] -> b-a). That's done. Cruft is to name this the Steinhaus space
- Some basic language: Random variable, distribution, distribution function
- Distribution function for a two-point RV with codomain {0,1} and probabilities p and 1-p
- Uniform distribution U[a,b]
- Exponential distribution
- Gaussian distribution
For each of these I want to:
- Start with Steinhaus as the domain of the random variable X,
- Define the codomain as either a finite measurable space for a continuous random variable or the appropriate continuous codomain which could be ereal or ennreal or a real interval
- Define the corresponding random variable X
- From X derive the distribution function FX
This is not the usual order. The book and most presentations tend to start with the distribution function. I just like doing it in this order because it appeals to me. I'm not proving theorems really, just trying to create language for a sequence that appeals to me for presenting these different discrete and continuous distributions in a consistent kind of top-down way. Also, I am annoyed that the book introduces probability_space
on page 3 and then drops it thereafter and just presents the distribution_function
for each distribution. I think if you define something you should use it, so I'm spending my time doing it in this direction. It's just a personal choice.
Mario Carneiro (Dec 18 2020 at 02:16):
well the main issue I have with that list is that every single one is a definition, there are no theorems
Mario Carneiro (Dec 18 2020 at 02:17):
you need at least some sanity checks on the definitions
Mario Carneiro (Dec 18 2020 at 02:20):
Perhaps your goal might be to have a way to prove that a real function (which is what the books will give in order to define a distribution) yields a probability distribution, that's not completely obvious from the definitions
Mario Carneiro (Dec 18 2020 at 02:22):
you can then use that to get the exponential distribution and gaussian distribution
Mario Carneiro (Dec 18 2020 at 02:23):
I think you already did the uniform distribution earlier
Lars Ericson (Dec 18 2020 at 03:07):
The interesting (to me) part, which I already did by hand, is when they give a distribution function, to back out what the random variable is that has that distribution function. It's not always easy to get it right. So my work is to take the RV that I backed out and prove that if you grind it through the Lean distribution_function
construction, that I get the same function. So that's a proof.
Regarding the generalized disrtibution function, I am trying to generalize
def distribution_function (X : random_variable α ℝ) : ℝ → ennreal := λ x,
to
def distribution_function (X : random_variable α β ) [preorder α] : β → (set.Icc (0 : ℝ) 1) := λ x,
Adam's distribution function full definition is
def distribution_function (X : random_variable α ℝ) : ℝ → ennreal := λ x,
@probability_space.prob _ X.induced {c | c ≤ x} is_measurable_Iic
I am trying to understand the application of @probability_space.prob
. I've got this:
import measure_theory.lebesgue_measure
open measure_theory
noncomputable theory
class probability_space (α : Type*) extends measure_space α :=
(is_probability_measure : probability_measure volume)
namespace probability_space
lemma volume_univ {α : Type*} [probability_space α] : volume (set.univ : set α) = 1 :=
is_probability_measure.measure_univ
def prob {α : Type*} [probability_space α] {S : set α} (h : is_measurable S) : ennreal :=
volume S
end probability_space
structure random_variable (α β : Type*) [probability_space α] [measurable_space β] :=
(outcome : α → β)
(is_measurable_outcome : measurable outcome)
variables {α β : Type*} [probability_space α] [measurable_space β]
namespace random_variable
instance : has_coe_to_fun (random_variable α β) := ⟨_,outcome⟩
lemma measurable (X : random_variable α β) : measurable X := X.is_measurable_outcome
def induced (X : random_variable α β) : probability_space β :=
{ volume := measure.map X volume,
is_probability_measure := begin
constructor,
erw measure_theory.measure.map_apply X.measurable is_measurable.univ,
exact probability_space.volume_univ
end }
variable MSα : measurable_space α
variable Pα : probability_space α
variable Pβ : probability_space β
variable Sα : set α
variable Sβ : set β
lemma SMα : is_measurable Sα := sorry
lemma SMβ : is_measurable Sβ := sorry
def Xαβ : random_variable α β := sorry
#check Xαβ.induced -- X.induced : probability_space β
#check @probability_space.prob α Pα Sα (SMα Sα )
#check @probability_space.prob β Pβ Sβ (SMβ Sβ) -- FAILS
def distribution_function (X : random_variable α β ) [preorder α] : β → (set.Icc (0 : ℝ) 1) := λ x, sorry
-- @probability_space.prob _ X.induced {c | c ≤ x} is_measurable_Iic
end random_variable
Even though the one before it works, this fails:
#check @probability_space.prob β Pβ Sβ (SMβ Sβ) -- FAILS
I don't fully understand why the type of SMα
is Mα : ∀ (Sα : set ?M_1), is_measurable Sα
, so I have to re-supply Sα when I use it even though it's declared
lemma SMα : is_measurable Sα := sorry
So
#check @probability_space.prob α Pα Sα (SMα Sα )
works with type
probability_space.prob _ : ennreal
But
#check @probability_space.prob β Pβ Sβ (SMβ Sβ)
gives
synthesized type class instance is not definitionally equal to expression inferred by typing rules, synthesized
_inst_2
inferred
measure_space.to_measurable_space
I am using @ to avoid this kind of mystery but it's not helping in this case. Any suggestions? This is a problem because in this line
@probability_space.prob _ X.induced {c | c ≤ x} is_measurable_Iic
the term X.induced
is going to have type probability_space β
. So I need to understand how to fill out each of the arguments to @probability_space.prob
but I can't even #check
it in the β case, it only works for α.
The other minor issue is that probability_space.prob
has return type ennreal
when I know that in the end it will be constrained to (set.Icc (0 : ℝ) 1)
by probability_measure volume
. I'd rather just say it in the return type rather than ennreal
, but that doesn't go through.
Mario Carneiro (Dec 18 2020 at 03:14):
So my work is to take the RV that I backed out and prove that if you grind it through the Lean distribution_function construction, that I get the same function. So that's a proof.
Can you state this theorem?
Mario Carneiro (Dec 18 2020 at 03:15):
it's not entirely clear to me what you mean by it
Mario Carneiro (Dec 18 2020 at 03:17):
I'm not sure there is anything to be gained with the #check
s, all those variables are tripping you up. Just state a def or theorem and try to get the assumptions and the statement of the theorem right
Mario Carneiro (Dec 18 2020 at 03:18):
lemma SMα
is false, it asserts all sets are measurable
Mario Carneiro (Dec 18 2020 at 03:18):
lemma SMβ
is the same as lemma SMα
Lars Ericson (Dec 18 2020 at 05:03):
The point of the #check
s was simply to check the type of the arguments to the expression, I was trying to figure out what kind of thing to fill in the blank for each of the slots in probability_space.prob
. I know that it just applies the volume function, but in the explicit @ form there are a lot of things that need to be in place for the application to go through, and I was trying to understand their types.
The theorem to prove is as follows (for the type of random variables in the text, which is the usual type, not the generalized type I want):
- We are given, in the text
F_X: ℝ → [0,1]
, thedistribution_function
for therandom_variable
namedX: [0,1] → ℝ
. - I construct, by inspection, a definition
myX
for the random variableX
. - There is a
distribution
,μ_X : borel ℝ → [0,1]
, which is implicit. - In the Steinhaus space we have uniform probability measure 'P: borel ℝ → [0,1]`.
- By construction (this is already proven out of the by the machinery in the cruft above), we have that
μ_X = P ∘ X⁻¹
- To prove: that
myX = λ x, μ_X (set.Icc -∞ x)
For example, consider the two point distribution:
Screenshot-from-2020-12-18-00-03-15.png
Lars Ericson (Dec 18 2020 at 05:21):
Let x₁, x₂ ∈ ℝ
and p₁
be the probability of drawing x₁
. Then we can define X(ω) = if ω < p₁ then x₁ else x₂
or
def myX(ω: set.Icc (0 : ℝ) 1) := if ω < p₁ then x₁ else x₂
also, the text is telling us that
def FX (x: ℝ ) := if x<x₁ then 0 else if x₁ ≤ x < x₂ then p₁ else 1
and then the proof to achieve is
example : FX = λ x, (Steinhaus.volume ∘ myX.induced) (set.Icc ∞ x) := sorry
I find this interesting and a little bit tricky. It's not rocket science but it does seem to take some effort to get all the pieces in place. It's also not the generalized distribution function I was going on about, but it sketches out the path through the woods I am looking to take.
Mario Carneiro (Dec 18 2020 at 05:26):
I don't think X(ω) = if ω < p₁ then x₁ else x₂
is correct. What you need is for the probability that X = x1
to equal p1
, meaning that the event space is bool
or perhaps a general probability space on which you posit an event A
whose probability is p1
Lars Ericson (Dec 18 2020 at 14:45):
On the index space [0,1]
I am defining an eventA=[0,p1)
. According to Kloeden et al's book, I(A)(ω)= if ω∈A then x₁ else x₂
is a random variable on this space, for x₁=0
and x₂=1
. Please let me know if I'm reading this wrong:
Screenshot-from-2020-12-18-09-43-54.png
Lars Ericson (Dec 18 2020 at 14:58):
Note instead of clumping the events in [0,p1)
I could have defined A
to be the union of a bunch of more randomly scattered subsets. That's not necessary for the definitions to work. There is no inherent "randomness" in random_variable
in that sense. A random_variable
is a fully determined function from a sample space to a measurable space.
Mario Carneiro (Dec 18 2020 at 16:44):
You don't need to define A
to be anything in particular, it can be a hypothesis
Mario Carneiro (Dec 18 2020 at 16:44):
it's more general this way anyway because you don't know what event space is needed in order to match all the other constraints in a downstream theorem
Lars Ericson (Dec 18 2020 at 17:37):
The Kloeden book brings in (briefly) the idea of probability space, so I want to demonstrate it when explaining things to myself. I can use (Ω,A,P)= ([0,1],B([0,1]),[a,b]↦ b-a)
as a "one sits fits all" probability space. Then I want to define a random variable on that space, obtain it's distribution D
, and then obtain the distribution function F
, which results in a new distribution (probability) space like (S,Σ,D)=(R, B(R), D)
with CDF F
. I want to do that in Lean for the 6 particular cases in the book. I'm not trying to make the most general Lean probability theory module, or prove old or new theorems. I'm just trying to demonstrate a little piece of mechanism, to myself, because it seems pretty to me. I don't expect anybody else to find it interesting. Think of it as me doing beginner-level gardening in my back yard. It's a hobby, not a calling.
On the task of describing stochastic integration at a Lean level of detail, Wilfrid Kendall has done a lot of work, 21 years ago, which seems easily adaptable. (Adapting old stuff to new contexts is also more hobby gardening, not necessarily of general interest.) Kendall worked in Reduce, then Axiom with fine details, then Mathematica. (Note, his .ps.gz files are really just .ps files, they aren't zipped.) His Axiom papers provide the most detail. Axiom has a much richer type system than Mathematica, so the Axiom implementation is an improvement on it's successor.
Mario Carneiro (Dec 18 2020 at 17:49):
I can use (Ω,A,P)= ([0,1],B([0,1]),[a,b]↦ b-a) as a "one sits fits all" probability space.
It's not, though, as soon as you have more than one variable. What if you have two {0,1} variables X
and Y
with probabilities p
and q
to be 1, which are independent? This model will not help there.
Lars Ericson (Dec 18 2020 at 18:52):
These 6 examples are all univariate. Steinhaus space is a good enough indexing space for univariate examples whether continuous or discrete. I'm not sure that what's in the sketch above (aside from my desire to generalize distribution_function
) excludes product measures. Baby steps!
Mario Carneiro (Dec 18 2020 at 19:00):
it's generally easier to not commit to particular example spaces if you can help it. Define a gaussian distributed RV over any event space
Mario Carneiro (Dec 18 2020 at 19:00):
you are doing lean on hard mode for some reason I don't understand
Mario Carneiro (Dec 18 2020 at 19:01):
The use of [0,1] is AFAICT entirely superfluous, and your proofs and definitions will be smaller and easier without it
Mario Carneiro (Dec 18 2020 at 19:03):
certainly the book you are reading from has made no such commitment to an event space
Lars Ericson (Dec 18 2020 at 19:40):
These are toy examples for exposition of a small point. Before I can generalize these toy examples I have to be able to do them first. I'm not there yet. I can't generalize something I don't know how to do in a smaller context.
In terms of a grander design, the above sketch has
def prob {α : Type*} [probability_space α] {S : set α} (h : is_measurable S) : ennreal :=
volume S
I would like to be able to say
def prob {α : Type*} [probability_space α] {S : set α} (h : is_measurable S) : (set.Icc (0 : ℝ) 1) :=
volume S
because probability measures by definition have as codomain [0,1]
. Here's something I need to prove, but I don't know how to say it correctly in Lean:
lemma volume_type_01 {α : Type*} [probability_space α] (X: set α) :
((volume X) : (set.Icc (0 : ℝ) 1)) := sorry
I can say
lemma volume_gt_0_lt_1 {α : Type*} [probability_space α] (X: set α) :
0 ≤ (volume X) ∧ (volume X) ≤ 1 := sorry
but that doesn't give me a way to replace ennreal
with (set.Icc (0 : ℝ) 1)
.
Adam Topaz (Dec 18 2020 at 19:43):
I suggest using docs#ennreal.to_real
Mario Carneiro (Dec 18 2020 at 19:57):
These are toy examples for exposition of a small point. Before I can generalize these toy examples I have to be able to do them first.
The general case is easier than the specific case. I know this is counterintuitive, but it's true. The toy examples are way harder than general theorems about all measure spaces or all probability spaces, because the contingent details of the example don't get in the way
Lars Ericson (Dec 18 2020 at 20:24):
OK. The other thing I want to prove, to get the generalized distribution function, is something I at least know how to state:
lemma is_measurable_measurable_with_le (xx: β) [measurable_space β] [has_le β] : is_measurable {c | c ≤ xx} :=
sorry
With this, the following #check works fine:
variable XX: random_variable α β
variable xx : β
variable Xlte : has_le β
#check @probability_space.prob β XX.induced {c | c ≤ xx} _ -- probability_space.prob ?M_1 : ennreal
However the exact same expression preceded by a def f :=
instead of a #check
results in a fail:
def generalized_distribution_function := @probability_space.prob β XX.induced {c | c ≤ xx} _ -- ERROR
The fail is:
failed to synthesize type class instance for
α : Type u_1,
β : Type u_2,
_inst_1 : probability_space α,
_inst_2 : measurable_space β,
XX : random_variable α β,
xx c : β
⊢ has_le β
where the whole file is:
import measure_theory.lebesgue_measure
open measure_theory
noncomputable theory
class probability_space (α : Type*) extends measure_space α :=
(is_probability_measure : probability_measure volume)
namespace probability_space
lemma volume_univ {α : Type*} [probability_space α] : volume (set.univ : set α) = 1 :=
is_probability_measure.measure_univ
def prob {α : Type*} [probability_space α] {S : set α} (h : is_measurable S) : ennreal :=
volume S
end probability_space
structure random_variable (α β : Type*) [probability_space α] [measurable_space β] :=
(outcome : α → β)
(is_measurable_outcome : measurable outcome)
variables {α β : Type*} [probability_space α] [measurable_space β]
namespace random_variable
instance : has_coe_to_fun (random_variable α β) := ⟨_,outcome⟩
lemma measurable (X : random_variable α β) : measurable X := X.is_measurable_outcome
def induced (X : random_variable α β) : probability_space β :=
{ volume := measure.map X volume,
is_probability_measure := begin
constructor,
erw measure_theory.measure.map_apply X.measurable is_measurable.univ,
exact probability_space.volume_univ
end }
def distribution_function (X : random_variable α ℝ) : ℝ → ennreal := λ x,
@probability_space.prob _ X.induced {c | c ≤ x} is_measurable_Iic
lemma is_measurable_measurable_with_le (xx: β) [measurable_space β] [has_le β] : is_measurable {c | c ≤ xx} :=
sorry
variable XX: random_variable α β
variable xx : β
variable Xlte : has_le β
#check @probability_space.prob β XX.induced {c | c ≤ xx} _ -- probability_space.prob ?M_1 : ennreal
def generalized_distribution_function := @probability_space.prob β XX.induced {c | c ≤ xx} _ -- ERROR
end random_variable
Kevin Buzzard (Dec 18 2020 at 20:27):
As you can see, your variable Xlte
is nowhere in your local context. has_le
is a typeclass, so you need to use square brackets. Note that has_le
is just an arbitrary relation, and whilst I know nothing about probability theory or measure theory, assuming [has_le \beta]
is a very weak statement.
Lars Ericson (Dec 19 2020 at 00:38):
Thanks @Kevin Buzzard I figured out how to state things in a way that typechecks. Not quite done yet, I want to replace ennreal
with real interval [0,1]
following @Adam Topaz 's hint. Here is how it is with ennreal
. There is just no sorry
. Instead I introduced a new structure is_measurable_with_le
to hold the proof that the set is measurable. It's a punt, but a more well-defined punt:
import measure_theory.lebesgue_measure
open measure_theory
noncomputable theory
class probability_space (α : Type*) extends measure_space α :=
(is_probability_measure : probability_measure volume)
namespace probability_space
lemma volume_univ {α : Type*} [probability_space α] : volume (set.univ : set α) = 1 :=
is_probability_measure.measure_univ
def prob {α : Type*} [probability_space α] {S : set α} (h : is_measurable S) : ennreal :=
volume S
end probability_space
structure random_variable (α β : Type*) [probability_space α] [measurable_space β] :=
(outcome : α → β)
(is_measurable_outcome : measurable outcome)
variables {α β : Type*} [probability_space α] [measurable_space β]
namespace random_variable
instance : has_coe_to_fun (random_variable α β) := ⟨_,outcome⟩
lemma measurable (X : random_variable α β) : measurable X := X.is_measurable_outcome
def induced (X : random_variable α β) : probability_space β :=
{ volume := measure.map X volume,
is_probability_measure := begin
constructor,
erw measure_theory.measure.map_apply X.measurable is_measurable.univ,
exact probability_space.volume_univ
end }
def distribution_function (X : random_variable α ℝ) : ℝ → ennreal := λ x,
@probability_space.prob _ X.induced {c | c ≤ x} is_measurable_Iic
structure is_measurable_with_le (β: Type*) (b: β) [measurable_space β] [preorder β] :=
(proof: is_measurable {c: β | c ≤ b})
def generalized_distribution_function (α β : Type*) [probability_space α] [mb: measurable_space β]
[po: preorder β] (X: random_variable α β) (b: β) [im: @is_measurable_with_le β b mb po] :=
@probability_space.prob β X.induced {c | c ≤ b} im.proof
end random_variable
Mario Carneiro (Dec 19 2020 at 01:46):
Here are some theorems. Try filling the sorries at the end.
import measure_theory.lebesgue_measure
open measure_theory
noncomputable theory
class probability_space (α : Type*) extends measure_space α :=
(is_probability_measure : probability_measure volume)
namespace probability_space
variables {α : Type*} [probability_space α]
lemma volume_univ : volume (set.univ : set α) = 1 :=
is_probability_measure.measure_univ
lemma volume_le_one (S : set α) : volume S ≤ 1 :=
by rw ← @volume_univ α; exact measure_mono (set.subset_univ _)
def prob (S : set α) : ℝ := (volume S).to_real
def nnprob (S : set α) : nnreal := (volume S).to_nnreal
theorem volume_eq_nnprob (S : set α) : volume S = nnprob S :=
(ennreal.coe_to_nnreal $ ne_top_of_le_ne_top (by rintro ⟨⟩) (volume_le_one S)).symm
theorem prob_eq_nnprob (S : set α) : prob S = nnprob S := rfl
theorem nnprob_le_one (S : set α) : nnprob S ≤ 1 :=
ennreal.coe_le_coe.1 $ by rw ← volume_eq_nnprob; exact volume_le_one S
theorem prob_le_one (S : set α) : prob S ≤ 1 := nnprob_le_one _
theorem zero_le_prob (S : set α) : 0 ≤ prob S := nnreal.zero_le_coe
theorem nnprob_mono {S T : set α} (h : S ⊆ T) : nnprob S ≤ nnprob T :=
ennreal.coe_le_coe.1 $ by simpa only [← volume_eq_nnprob] using measure_mono h
theorem prob_mono {S T : set α} (h : S ⊆ T) : prob S ≤ prob T :=
nnreal.coe_le_coe.2 $ nnprob_mono h
end probability_space
theorem measurable.map_probability_measure {α β} [probability_space α] [measurable_space β]
{f : α → β} (hf : measurable f) :
probability_measure (measure.map f volume) :=
⟨by rw measure_theory.measure.map_apply hf is_measurable.univ;
exact probability_space.volume_univ⟩
structure random_variable (α β : Type*) [probability_space α] [measurable_space β] :=
(outcome : α → β)
(is_measurable_outcome : measurable outcome)
variables {α β : Type*} [probability_space α] [measurable_space β]
namespace random_variable
instance : has_coe_to_fun (random_variable α β) := ⟨_,outcome⟩
variable (X : random_variable α β)
lemma measurable : measurable X := X.is_measurable_outcome
def induced (X : random_variable α β) : probability_space β :=
{ volume := measure.map X volume,
is_probability_measure := X.measurable.map_probability_measure }
variables [preorder β]
def CDF (x : β) : ℝ := @probability_space.prob _ X.induced {c | c ≤ x}
theorem zero_le_CDF (x : β) : 0 ≤ X.CDF x := sorry
theorem CDF_le_one (x : β) : X.CDF x ≤ 1 := sorry
theorem CDF_mono {x y : β} (h : x ≤ y) : X.CDF x ≤ X.CDF y := sorry
variables [topological_space β] [order_closed_topology β] [opens_measurable_space β]
example (b : β) : is_measurable {c | c ≤ b} := is_closed.is_measurable is_closed_Iic
end random_variable
Lars Ericson (Dec 19 2020 at 02:14):
Thank you @Mario Carneiro I will work on that.
Lars Ericson (Dec 19 2020 at 18:23):
I was able to prove zero_le_CDF
and CDF_le_one
. I am stuck on CDF_mono
. I have it reduced to a proposition of form A implies B where I know B. I have a lemma for A. I don't know how to introduce the results of A into the hypothesis set. That's a simple Lean thing but I don't know how to do it. Then if I had that I could finish. Also I have to prove A which also seems simple but I don't know how to get started. Where
A
is{c : β | c ≤ x} ⊆ {c : β | c ≤ y}
B
is@probability_space.prob_mono β X.induced {c : β | c ≤ x} {c : β | c ≤ y} to hp
Here is the sketch so far. The part where I'm stuck as at the bottom:
import measure_theory.lebesgue_measure
import tactic
noncomputable theory
open measure_theory
class probability_space (α : Type*) extends measure_space α :=
(is_probability_measure : probability_measure volume)
namespace probability_space
variables {α : Type*} [probability_space α]
lemma volume_univ : volume (set.univ : set α) = 1 :=
is_probability_measure.measure_univ
lemma volume_le_one (S : set α) : volume S ≤ 1 :=
by rw ← @volume_univ α; exact measure_mono (set.subset_univ _)
def prob (S : set α) : ℝ := (volume S).to_real
def nnprob (S : set α) : nnreal := (volume S).to_nnreal
theorem volume_eq_nnprob (S : set α) : volume S = nnprob S :=
(ennreal.coe_to_nnreal $ ne_top_of_le_ne_top (by rintro ⟨⟩) (volume_le_one S)).symm
theorem prob_eq_nnprob (S : set α) : prob S = nnprob S := rfl
theorem nnprob_le_one (S : set α) : nnprob S ≤ 1 :=
ennreal.coe_le_coe.1 $ by rw ← volume_eq_nnprob; exact volume_le_one S
theorem prob_le_one (S : set α) : prob S ≤ 1 := nnprob_le_one _
theorem zero_le_prob (S : set α) : 0 ≤ prob S := nnreal.zero_le_coe
theorem nnprob_mono {S T : set α} (h : S ⊆ T) : nnprob S ≤ nnprob T :=
ennreal.coe_le_coe.1 $ by simpa only [← volume_eq_nnprob] using measure_mono h
theorem prob_mono {S T : set α} (h : S ⊆ T) : prob S ≤ prob T :=
nnreal.coe_le_coe.2 $ nnprob_mono h
end probability_space
theorem measurable.map_probability_measure {α β} [probability_space α] [measurable_space β]
{f : α → β} (hf : measurable f) :
probability_measure (measure.map f volume) :=
⟨by rw measure_theory.measure.map_apply hf is_measurable.univ;
exact probability_space.volume_univ⟩
structure random_variable (α β : Type*) [probability_space α] [measurable_space β] :=
(outcome : α → β)
(is_measurable_outcome : measurable outcome)
variables {α β : Type*} [probability_space α] [measurable_space β]
namespace random_variable
instance : has_coe_to_fun (random_variable α β) := ⟨_,outcome⟩
variable (X : random_variable α β)
lemma measurable : measurable X := X.is_measurable_outcome
def induced (X : random_variable α β) : probability_space β :=
{ volume := measure.map X volume,
is_probability_measure := X.measurable.map_probability_measure }
variables [preorder β]
def CDF (x : β) : ℝ := @probability_space.prob _ X.induced {c | c ≤ x}
variables [topological_space β] [order_closed_topology β] [opens_measurable_space β]
example (b : β) : is_measurable {c | c ≤ b} := is_closed.is_measurable is_closed_Iic
lemma prob_eq_CDF (x : β) : X.CDF x = @probability_space.prob β X.induced {c | c ≤ x} := rfl
theorem zero_le_CDF (x : β) : 0 ≤ X.CDF x :=
begin
rw prob_eq_CDF,
exact (@probability_space.zero_le_prob β X.induced {c : β | c ≤ x}),
end
theorem CDF_le_one (x : β) : X.CDF x ≤ 1 :=
begin
rw prob_eq_CDF,
exact (@probability_space.prob_le_one β X.induced {c : β | c ≤ x}),
end
lemma factoid (x y : β ): {c : β | c ≤ x} ⊆ {c : β | c ≤ y} :=
begin
sorry -- STUCK
end
theorem CDF_mono {x y : β} (h : x ≤ y) : X.CDF x ≤ X.CDF y :=
begin
repeat { rw prob_eq_CDF },
-- introduce hp: factoid(x y) -- STUCK
-- apply @probability_space.prob_mono β X.induced {c : β | c ≤ x} {c : β | c ≤ y} to hp
sorry
end
end random_variable
Mario Carneiro (Dec 19 2020 at 18:31):
The factoid
is false, because you forgot the essential assumption x <= y
! Do you see why this theorem should be true mathematically?
Mario Carneiro (Dec 19 2020 at 18:37):
also you changed the theorems I wrote by moving them after the variables [topological_space beta]...
line. They were written before that line because they don't require any of that stuff to be proven
Mario Carneiro (Dec 19 2020 at 18:39):
In fact, I think that line is wrong; indeed there shouldn't be any measurable space structure on beta, because the theorems are all about X.induced
which uses only the measurable structure on alpha
Lars Ericson (Dec 19 2020 at 19:03):
I moved those up just to put the sorry
parts the bottom. I will move back down. How is this for factoid
:
lemma factoid [preorder β] (x y : β ) (h : x ≤ y) : {c : β | c ≤ x} ⊆ {c : β | c ≤ y} :=
begin
sorry
end
Given factoid
what's the proof language to introduce the result of a lemma
into the hypothesis set? If I have revised factoid
then I need that to apply here:
theorem CDF_mono {x y : β} (h : x ≤ y) : X.CDF x ≤ X.CDF y :=
begin
repeat { rw prob_eq_CDF },
-- introduce hp: factoid(x y)
-- apply @probability_space.prob_mono β X.induced {c : β | c ≤ x} {c : β | c ≤ y} to hp
sorry
end
The revised sketch is
import measure_theory.lebesgue_measure
import tactic
noncomputable theory
open measure_theory
class probability_space (α : Type*) extends measure_space α :=
(is_probability_measure : probability_measure volume)
namespace probability_space
variables {α : Type*} [probability_space α]
lemma volume_univ : volume (set.univ : set α) = 1 :=
is_probability_measure.measure_univ
lemma volume_le_one (S : set α) : volume S ≤ 1 :=
by rw ← @volume_univ α; exact measure_mono (set.subset_univ _)
def prob (S : set α) : ℝ := (volume S).to_real
def nnprob (S : set α) : nnreal := (volume S).to_nnreal
theorem volume_eq_nnprob (S : set α) : volume S = nnprob S :=
(ennreal.coe_to_nnreal $ ne_top_of_le_ne_top (by rintro ⟨⟩) (volume_le_one S)).symm
theorem prob_eq_nnprob (S : set α) : prob S = nnprob S := rfl
theorem nnprob_le_one (S : set α) : nnprob S ≤ 1 :=
ennreal.coe_le_coe.1 $ by rw ← volume_eq_nnprob; exact volume_le_one S
theorem prob_le_one (S : set α) : prob S ≤ 1 := nnprob_le_one _
theorem zero_le_prob (S : set α) : 0 ≤ prob S := nnreal.zero_le_coe
theorem nnprob_mono {S T : set α} (h : S ⊆ T) : nnprob S ≤ nnprob T :=
ennreal.coe_le_coe.1 $ by simpa only [← volume_eq_nnprob] using measure_mono h
theorem prob_mono {S T : set α} (h : S ⊆ T) : prob S ≤ prob T :=
nnreal.coe_le_coe.2 $ nnprob_mono h
end probability_space
theorem measurable.map_probability_measure {α β} [probability_space α] [measurable_space β]
{f : α → β} (hf : measurable f) :
probability_measure (measure.map f volume) :=
⟨by rw measure_theory.measure.map_apply hf is_measurable.univ;
exact probability_space.volume_univ⟩
structure random_variable (α β : Type*) [probability_space α] [measurable_space β] :=
(outcome : α → β)
(is_measurable_outcome : measurable outcome)
variables {α β : Type*} [probability_space α] [measurable_space β]
namespace random_variable
instance : has_coe_to_fun (random_variable α β) := ⟨_,outcome⟩
variable (X : random_variable α β)
lemma measurable : measurable X := X.is_measurable_outcome
def induced (X : random_variable α β) : probability_space β :=
{ volume := measure.map X volume,
is_probability_measure := X.measurable.map_probability_measure }
variables [preorder β]
def CDF (x : β) : ℝ := @probability_space.prob _ X.induced {c | c ≤ x}
lemma prob_eq_CDF (x : β) : X.CDF x = @probability_space.prob β X.induced {c | c ≤ x} := rfl
theorem zero_le_CDF (x : β) : 0 ≤ X.CDF x :=
begin
rw prob_eq_CDF,
exact (@probability_space.zero_le_prob β X.induced {c : β | c ≤ x}),
end
theorem CDF_le_one (x : β) : X.CDF x ≤ 1 :=
begin
rw prob_eq_CDF,
exact (@probability_space.prob_le_one β X.induced {c : β | c ≤ x}),
end
lemma factoid [preorder β] (x y : β ) (h : x ≤ y) : {c : β | c ≤ x} ⊆ {c : β | c ≤ y} :=
begin
sorry
end
theorem CDF_mono {x y : β} (h : x ≤ y) : X.CDF x ≤ X.CDF y :=
begin
repeat { rw prob_eq_CDF },
-- introduce hp: factoid(x y)
-- apply @probability_space.prob_mono β X.induced {c : β | c ≤ x} {c : β | c ≤ y} to hp
sorry
end
variables [topological_space β] [order_closed_topology β] [opens_measurable_space β]
example (b : β) : is_measurable {c | c ≤ b} := is_closed.is_measurable is_closed_Iic
end random_variable
Mario Carneiro (Dec 19 2020 at 19:32):
Have you read #tpil or played the natural number game? They introduce all the tools you need to write proofs. In this case, you need have := factoid x y h
.
Lars Ericson (Dec 19 2020 at 19:59):
Yes I did both but in their examples the have
is usually something like have hp : p, from h.left
so a labelled proposition from a tactic applied to the existing goal state. I didn't find any examples in #tpil of have :=
with a term. In #tpil and the number game, lemmas and theorems are used into introduce equalities that get applied with rw
tactic or implications that get applied with apply
or exact
.
Lars Ericson (Dec 19 2020 at 19:59):
Anyway, DONE:
import measure_theory.lebesgue_measure
import tactic
noncomputable theory
open measure_theory
class probability_space (α : Type*) extends measure_space α :=
(is_probability_measure : probability_measure volume)
namespace probability_space
variables {α : Type*} [probability_space α]
lemma volume_univ : volume (set.univ : set α) = 1 :=
is_probability_measure.measure_univ
lemma volume_le_one (S : set α) : volume S ≤ 1 :=
by rw ← @volume_univ α; exact measure_mono (set.subset_univ _)
def prob (S : set α) : ℝ := (volume S).to_real
def nnprob (S : set α) : nnreal := (volume S).to_nnreal
theorem volume_eq_nnprob (S : set α) : volume S = nnprob S :=
(ennreal.coe_to_nnreal $ ne_top_of_le_ne_top (by rintro ⟨⟩) (volume_le_one S)).symm
theorem prob_eq_nnprob (S : set α) : prob S = nnprob S := rfl
theorem nnprob_le_one (S : set α) : nnprob S ≤ 1 :=
ennreal.coe_le_coe.1 $ by rw ← volume_eq_nnprob; exact volume_le_one S
theorem prob_le_one (S : set α) : prob S ≤ 1 := nnprob_le_one _
theorem zero_le_prob (S : set α) : 0 ≤ prob S := nnreal.zero_le_coe
theorem nnprob_mono {S T : set α} (h : S ⊆ T) : nnprob S ≤ nnprob T :=
ennreal.coe_le_coe.1 $ by simpa only [← volume_eq_nnprob] using measure_mono h
theorem prob_mono {S T : set α} (h : S ⊆ T) : prob S ≤ prob T :=
nnreal.coe_le_coe.2 $ nnprob_mono h
end probability_space
theorem measurable.map_probability_measure {α β} [probability_space α] [measurable_space β]
{f : α → β} (hf : measurable f) :
probability_measure (measure.map f volume) :=
⟨by rw measure_theory.measure.map_apply hf is_measurable.univ;
exact probability_space.volume_univ⟩
structure random_variable (α β : Type*) [probability_space α] [measurable_space β] :=
(outcome : α → β)
(is_measurable_outcome : measurable outcome)
variables {α β : Type*} [probability_space α] [measurable_space β]
namespace random_variable
instance : has_coe_to_fun (random_variable α β) := ⟨_,outcome⟩
variable (X : random_variable α β)
lemma measurable : measurable X := X.is_measurable_outcome
def induced (X : random_variable α β) : probability_space β :=
{ volume := measure.map X volume,
is_probability_measure := X.measurable.map_probability_measure }
variables [preorder β]
def CDF (x : β) : ℝ := @probability_space.prob _ X.induced {c | c ≤ x}
lemma prob_eq_CDF (x : β) : X.CDF x = @probability_space.prob β X.induced {c | c ≤ x} := rfl
theorem zero_le_CDF (x : β) : 0 ≤ X.CDF x :=
begin
rw prob_eq_CDF,
exact (@probability_space.zero_le_prob β X.induced {c : β | c ≤ x}),
end
theorem CDF_le_one (x : β) : X.CDF x ≤ 1 :=
begin
rw prob_eq_CDF,
exact (@probability_space.prob_le_one β X.induced {c : β | c ≤ x}),
end
lemma factoid [preorder β] (x y : β ) (h : x ≤ y) : {c : β | c ≤ x} ⊆ {c : β | c ≤ y} :=
set.Iic_subset_Iic.mpr h
theorem CDF_mono {x y : β} (h : x ≤ y) : X.CDF x ≤ X.CDF y :=
begin
repeat { rw prob_eq_CDF },
have h1 := factoid x y h,
have h2 := @probability_space.prob_mono β X.induced {c : β | c ≤ x} {c : β | c ≤ y},
exact h2 h1,
end
variables [topological_space β] [order_closed_topology β] [opens_measurable_space β]
example (b : β) : is_measurable {c | c ≤ b} := is_closed.is_measurable is_closed_Iic
end random_variable
Mario Carneiro (Dec 19 2020 at 20:21):
In your example h.left
is a term
Mario Carneiro (Dec 19 2020 at 20:22):
Note that have :=
(which is part of the less abbreviated have h : t := term
) is the tactic mode version; the term mode version is have h : t, from term,
.
Mario Carneiro (Dec 19 2020 at 20:26):
It looks like have
is introduced in https://wwwf.imperial.ac.uk/~buzzard/xena/natural_number_game/?world=5&level=3
Lars Ericson (Dec 19 2020 at 20:36):
Thanks you are correct! In terms of my original generalized distribution_function
goal, the above can be trimmed down to the assumption that a random variable is from a probability space α to a measurable space β which is a preorder. CDF
is from β to ℝ. Ideally it should be to set.Icc (0: ℝ) 1
. I guess the additional theorems will help me prove that and restate def CDF (x : β) : ℝ
as def CDF (x : β) : set.Icc (0: ℝ) 1
? The example at the end shows how to prove that {c | c ≤ b}
is measurable for a type which is a topological space, an order closed topology and a measurable space. Bu those don't seem to be conditions required by the factoid
, which only needs preorder
. Are all these properties needed for anything or just measurable space with preorder?
Mario Carneiro (Dec 19 2020 at 20:55):
Ideally it should be to set.Icc (0: ℝ) 1.
As you can see, we have theorems that it is between zero and one. This is generally preferable up until you actually need to put it into a structure that expects set.Icc (0: ℝ) 1
, which you don't currently have
Mario Carneiro (Dec 19 2020 at 20:57):
The example at the end shows how to prove that {c | c ≤ b} is measurable for a type which is a topological space, an order closed topology and a measurable space. Bu those don't seem to be conditions required by the factoid, which only needs preorder. Are all these properties needed for anything or just measurable space with preorder?
I put it as an example
to show that it is derivable, but so far nothing actually needs those assumptions. You haven't done anything that requires {c | c ≤ b}
to be measurable yet, but if you do those are the assumptions you will need (at that point, and not before)
Lars Ericson (Dec 19 2020 at 21:00):
Can you help me with the mechanics of this:
def distribution_function (x: β) : set.Icc (0: ℝ) 1 := (CDF x).to_set_Icc_01
It seems I would need to define
def nat.to_set_Icc_01 (x: ℝ) := sorry
but that doesn't use the 0/1 theorems and doesn't seem right.
Mario Carneiro (Dec 19 2020 at 21:02):
def CDF_in_01 (x: β) : set.Icc (0 : ℝ) 1 := ⟨X.CDF x, X.zero_le_CDF x, X.CDF_le_one x⟩
Mario Carneiro (Dec 19 2020 at 21:02):
to reiterate, don't use this unless you need it
Lars Ericson (Dec 19 2020 at 21:03):
It's in the nature of CDF, what would be the aesthetic or practical reason not to constrain the codomain in this way?
Mario Carneiro (Dec 19 2020 at 21:04):
because real
has operations on it, like addition, while set.Icc 0 1
doesn't
Lars Ericson (Dec 19 2020 at 21:05):
Anyway isn't it exciting to know that CDF can be generalized to any measurable space with a preorder? It's maybe a useless fact but I've never seen it in any presentation of CDF.
Mario Carneiro (Dec 19 2020 at 21:06):
Sure, this kind of "trivial generalization" comes up all the time in mathlib. It makes you consider questions the original authors hadn't thought about, like what closure properties are needed to prove that the CDF is left-continuous?
Mario Carneiro (Dec 19 2020 at 21:07):
You just read what the authors say, delete all the assumptions, and then just add the assumptions if they become necessary and not before
Mario Carneiro (Dec 19 2020 at 21:08):
A nonzero proportion of the time you never add some of them back
Mario Carneiro (Dec 19 2020 at 21:08):
In fact mathlib has a linter which will do this for you
Lars Ericson (Dec 19 2020 at 21:12):
If you don't mind my quoting your proof above, I can now close my old Stack Exchange question with a definitive answer.
Mario Carneiro (Dec 19 2020 at 21:17):
It seems that you got an answer in that question already:
CDFs are defined only for (even for ) and are used for convenience. The push-forward is of a measure by a Random variable , is the object of interest and it is as general as it can be, since has no structure other than being a measurable space.
Mario Carneiro (Dec 19 2020 at 21:18):
This is another way to say that in a more general context we stop talking about the CDF because the pushforward measure is more general
Mario Carneiro (Dec 19 2020 at 21:19):
A CDF is only useful to the extent that it determines the pushforward measure
Lars Ericson (Dec 19 2020 at 21:27):
CDF is defined on points in S
. The convention is that only the case S=R
has any use. What we showed is that if S
is any measurable space with a preorder, CDF
can still be computed and give back a number in [0,1]
. So I can define a preorder on a domain like binary expression trees of depth at most n
and compute the CDF
of a tree in the space of all binary trees depth at most n
. I can say that and say that it is interesting to me because it is. This has nothing to do with using CDF
to determine the pushforward measure. So that comment doesn't make any sense to me. I got the result I was looking for, and it doesn't seem to be relevant to that comment. That or it's not possible to put a preorder on binary trees, and continuing on in the vein, maybe the claim is that any measurable set with preorder is isomorphic to ℝ
. Is that the case?
Mario Carneiro (Dec 19 2020 at 21:28):
Yes you can compute a number, but the question is what does that number mean? It's not true for a general preorder that giving the CDF of the random variable uniquely determines the distribution of the variable
Mario Carneiro (Dec 19 2020 at 21:29):
maybe the claim is that any measurable set with preorder is isomorphic to ℝ. Is that the case?
This is certainly not the case
Mario Carneiro (Dec 19 2020 at 21:31):
you have a function called CDF, but that doesn't on its own mean much. Here's another function called CDF:
def CDF' (X : random_variable α β) (x : β) : ℝ := 0
This one doesn't even need a preorder structure on beta, so it's a clear improvement. It's also monotone, between 0 and 1, and right continuous
Lars Ericson (Dec 19 2020 at 21:36):
Except our CDF is proven to be a probability_measure
on a measurable_space
with preorder
, by construction. I'll take that. It's not 0.
Mario Carneiro (Dec 19 2020 at 21:36):
Our CDF is no such thing
Mario Carneiro (Dec 19 2020 at 21:36):
it's a real function
Mario Carneiro (Dec 19 2020 at 21:36):
I can put all those assumptions on CDF'
too
Mario Carneiro (Dec 19 2020 at 21:37):
Obviously CDF'
is a silly function, but the point is that we haven't shown anything about CDF
that makes it meaningfully different from CDF'
Mario Carneiro (Dec 19 2020 at 21:38):
And my claim is that the real theorem of interest that will distinguish the two doesn't hold on arbitrary preorders
Lars Ericson (Dec 19 2020 at 21:38):
Obtained by running prob
, where prob
was supplied as a probability_measure
when building the initial probability space, and then running it through induced
on the random_variable
. As long as you have a meaningful and correct probability_space
to begin with, and a meaningful and correct random_variable
, the CDF
will be meaningful and correct and derived from the original probability_space
and the supplied random_variable
. There is nothing trivial about it at all.
Mario Carneiro (Dec 19 2020 at 21:39):
How do you know it is meaningful and correct?
Mario Carneiro (Dec 19 2020 at 21:39):
That's what theorems are for
Mario Carneiro (Dec 19 2020 at 21:39):
and we don't have any
Mario Carneiro (Dec 19 2020 at 21:39):
except the obvious ones that are true also for CDF'
Lars Ericson (Dec 19 2020 at 21:40):
If I start out with say the Steinhaus space and the definition of the random variable for the 2 point-distribution and crank it through I will get the distribution and distribution function for the 2-point random variable. What's missing?
Mario Carneiro (Dec 19 2020 at 21:41):
well that's over real
Mario Carneiro (Dec 19 2020 at 21:41):
I thought you wanted arbitrary preorders?
Lars Ericson (Dec 19 2020 at 21:42):
As another example we could compute distributions of binary trees of depth n
. This is something people actually think about. Why can't I have a CDF of those?
Mario Carneiro (Dec 19 2020 at 21:42):
The main theorem here is that two random variables with the same CDF
are the same random variable
Mario Carneiro (Dec 19 2020 at 21:42):
that's not true for CDF'
Mario Carneiro (Dec 19 2020 at 21:42):
and it's not true for CDF
on some preorders
Mario Carneiro (Dec 19 2020 at 21:43):
Really, I think it's not worth thinking about the CDF in the more unusual codomains. Just call X.induced
CDF if it makes you feel better
Lars Ericson (Dec 19 2020 at 21:46):
So what you proved at the bottom required that S
be a topological space, an order closed topology and an opens measurable space. I don't need those assumptions for def CDF
. If I delete variables [preorder β]
, then def CDF
crashes. It doesn't rely however on being topological space, order closed topology and opens measurable space. Are you saying that I will crash if I try to define a random variable into binary trees of length N with a preorder, because of something I'm not understanding?
Mario Carneiro (Dec 19 2020 at 21:46):
In measure theory, a distribution is no more or less than a measure. It takes all subsets of X as its input, and it doesn't need to be boiled down to a function on X
Mario Carneiro (Dec 19 2020 at 21:49):
I don't need those assumptions for def CDF. If I delete variables [preorder β], then def CDF crashes. It doesn't rely however on being topological space, order closed topology and opens measurable space. Are you saying that I will crash if I try to define a random variable into binary trees of length N with a preorder, because of something I'm not understanding?
This is why you need to prove theorems about your definitions. It's perfectly possible that a definition can be made in some tremendous generality but it doesn't make sense without some more assumptions. In fact def CDF
works using only has_le beta
, it doesn't even need preorder
, but it certainly doesn't make sense in that generality; we already have a theorem that will detect this nonsense, namely CDF_mono
. WIthout that theorem you might well think that there are no problems with that definition
Mario Carneiro (Dec 19 2020 at 21:49):
But you haven't tried to prove anything more complicated about CDF
yet, and when you do you will find that preorder beta
isn't good enough either
Mario Carneiro (Dec 19 2020 at 21:51):
It all comes back to the thing I mentioned earlier: how do people use the definition?
Mario Carneiro (Dec 19 2020 at 21:51):
making definitions doesn't tell you any deep truths about it, because you can very easily write nonsense without noticing
Mario Carneiro (Dec 19 2020 at 21:52):
but once you try to use the definition it all falls down
Lars Ericson (Dec 19 2020 at 21:56):
Which brings me back to trees. A long time ago I helped a mathematician counting trees to write a paper making some probabiliistic arguments about operations on trees. Conceivably, while doing that, in a practical sense, I might want to define a preorder on trees and compute the percentage of trees in the entire space of trees of size n
which were "less than" a single particular tree, or an equivalence class of trees. So that would be the CDF of a single tree. It might be boring nonsense to almost everybody in the world, but for the two people in that room, it might be helpful. So then the question, how much of this will fall down if my trees have a preorder but do not constitute a type which is a topological space, order closed topology and opens measurable space? All of it, some of it or none of it? And if it is all of it, can we go further and say that there is a 1-1 onto map between the reals and any such space?
Reid Barton (Dec 19 2020 at 22:06):
Even in this hypothetical scenario, you could just write down "the probability that a random tree is 'less than' the particular tree in question" and not bother with this CDF terminology stuff.
Lars Ericson (Dec 20 2020 at 15:37):
Yes, I could write "the probability that a random tree is 'less than' the particular tree in question" as CDF T
after defining a preorder on trees, following the development of the generalized distribution function above, which establishes that I can do so in a well-typed way. This is very helpful for me. Thanks!
Lars Ericson (Dec 20 2020 at 16:36):
But I think the answer to the question of why distribution function is only defined by the literature for R is in this comment by Oliver Diaz in the Wikipedia article:
"If (𝑌,𝑑) is s Borel space (a Polish space for instance) then it is measurable isomorphic to (0,1) and one can define an order in 𝑌. The isomorphism however is not explicit, it is based on constructions that use the axiom of choice."
"Measurable isomorphic to (0,1)" is what I was looking for, it gives motivation for only talking about R, in the sense that talking about Y
would be redundant under the isomorphism.
Mario Carneiro (Dec 20 2020 at 17:27):
I don't think that is true at all
Mario Carneiro (Dec 20 2020 at 17:28):
Y can easily be something like bool
with the preorder formulation
Mario Carneiro (Dec 20 2020 at 17:30):
you need a lot more assumptions to make it look like real. Borel won't cut it, Polish might but it's not clear what the reason for such an assumption is
Last updated: Dec 20 2023 at 11:08 UTC