Zulip Chat Archive
Stream: Is there code for X?
Topic: Should there be a ProbabilitySpace class?
Terence Tao (Nov 12 2023 at 22:07):
I'm starting a project involving probability theory and was looking around for the concept of a probability space. Mathlib supports docs#MeasureTheory.MeasureSpace and docs#MeasureTheory.ProbabilityMeasure but not, as far as I can tell, a probability space directly. I started coding my own basic theory for probability spaces at
import Mathlib
class ProbSpace (Ω : Type*) extends MeasurableSpace Ω where
  μ : MeasureTheory.ProbabilityMeasure Ω
def prob {Ω : Type*} [ProbSpace Ω] (E : Set Ω) := (‹ProbSpace Ω›.μ E)
notation:100 "P[ " E " ]" => prob E
lemma prob_le_one [ProbSpace Ω] (E : Set Ω) : P[ E ] ≤ 1 := by
  unfold prob
  exact MeasureTheory.ProbabilityMeasure.apply_le_one ‹ProbSpace Ω›.μ E
but I wanted to double-check before going further that I wasn't missing some existing support for probability spaces here. Also, I can't get the above code to work when defining ProbSpace instead as an extension of MeasureSpace,
import Mathlib
class ProbSpace (Ω : Type*) extends MeasureTheory.MeasureSpace Ω where
  totalProb : MeasureTheory.IsProbabilityMeasure ‹MeasureTheory.MeasureSpace Ω›.volume
def prob {Ω : Type*} [ProbSpace Ω] (E : Set Ω) := MeasureTheory.MeasureSpace.volume E
notation:100 "P[ " E " ]" => prob E
lemma prob_le_one [ProbSpace Ω] (E : Set Ω) : P[ E ] ≤ 1 := by
  unfold prob
  sorry
although it seems from browsing Mathlib that there isn't much support for MeasureSpace anyway, so this seems like no great loss.
Eric Wieser (Nov 12 2023 at 22:09):
I think the intent is that you write out [MeasureSpace Ω] [IsProbabilityMeasure (volume : Measure Ω)], though that's not necessarily an argument against the shorthand you're proposing
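(For concreteness, a minimal sketch of that spelling; the example and its proof below are illustrative only, not from the thread:)
import Mathlib
open MeasureTheory
-- illustrative only: the canonical `volume` spelling with the instance assumptions written out
variable {Ω : Type*} [MeasureSpace Ω] [IsProbabilityMeasure (volume : Measure Ω)]
example (E : Set Ω) : volume E ≤ 1 :=
  (measure_mono (Set.subset_univ E)).trans measure_univ.le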
Eric Wieser (Nov 12 2023 at 22:11):
I think in general there's almost nothing in mathlib about docs#MeasureTheory.MeasureSpace, because it's just a way of saying a measure is "canonical", which doesn't actually carry any mathematical properties.
Terence Tao (Nov 12 2023 at 22:18):
OK, in that case I will revert back to my original plan of defining ProbSpace as an extension of MeasurableSpace. I would still be interested in figuring out how to close the sorry in the second code block though - I had a lot of trouble accessing volume from a ProbSpace instance (when it was defined as an extension of MeasureSpace), which suggests to me that I don't understand how inheritance works.
Yaël Dillies (Nov 12 2023 at 22:23):
Here is the canonical way of working with a probability space in mathlib:
variable {Ω : Type*} [MeasurableSpace Ω] {μ : Measure Ω} [IsProbabilityMeasure μ]
Terence Tao (Nov 12 2023 at 22:25):
Doesn't this mean that all probabilistic notation (e.g., P[ ], E[ ], etc.) would have to explicitly involve μ?
Terence Tao (Nov 12 2023 at 22:27):
For my application I will be working with multiple probability spaces (there is a lot of conditioning that will happen), so I can't simply declare the data for a probability space as a global variable that all the notation implicitly uses.
Yaël Dillies (Nov 12 2023 at 22:27):
Yes, this is a conscious design decision: it is very common to have several measures hanging around, so one can't assume that the measure is canonical and provide notation for it.
Yaël Dillies (Nov 12 2023 at 22:28):
So this is not your symmetric polynomials project anymore, but something else?
Terence Tao (Nov 12 2023 at 22:28):
My situation is slightly different; I have multiple spaces hanging around, but each space will have a canonical measure attached to it.
I'm temporarily using the symmetric polynomials github to get started on this, but yes, this is a separate project.
Yaël Dillies (Nov 12 2023 at 22:29):
And will some of your probability spaces derive their canonical measure from other spaces?
Terence Tao (Nov 12 2023 at 22:30):
Yeah, usually by conditioning an existing probability space to some positive probability event.
Terence Tao (Nov 12 2023 at 22:30):
Or by taking Cartesian products of existing spaces in order to generate independent trials.
Terence Tao (Nov 12 2023 at 22:31):
For all these applications it seems logical to me to set up a ProbabilitySpace class and start defining some operations on it (conditioning, product, etc.).
Yaël Dillies (Nov 12 2023 at 22:32):
Product will work out fine. Conditioning will be a pain, however. You can't let typeclass inference fill this in for you because it requires the (non-typeclass-accessible) information that the event you're conditioning by has nonzero probability.
Terence Tao (Nov 12 2023 at 22:33):
Couldn't I make a definition that spits out some useless default probability space if the event has zero probability, but makes the correct construction if it has positive probability?
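(A minimal sketch of that junk-value pattern - the definition below is illustrative only; Mathlib's ProbabilityTheory.cond, mentioned next, is built along the same lines:)
import Mathlib
open MeasureTheory
-- Illustrative sketch only: normalize the restriction of μ to E. When μ E = 0 the restriction
-- is the zero measure, so the construction degenerates to the zero measure rather than failing.
noncomputable def condMeasure {Ω : Type*} [MeasurableSpace Ω] (μ : Measure Ω) (E : Set Ω) :
    Measure Ω :=
  (μ E)⁻¹ • μ.restrict E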
Terence Tao (Nov 12 2023 at 22:38):
I have to say, the Mathlib choice to make the probability measure explicit all the time goes against the mainstream practice of probabilists, which is to hide the probability measure as much as possible (unless one is in the situation of having multiple probability measures on a single space, none of which are canonical).
Yaël Dillies (Nov 12 2023 at 22:38):
Sure you can. You will run into subtle design decisions, however: what useless default probability space to pick? how to connect it to docs#ProbabilityTheory.cond ? And you will also encounter API writing.
Yaël Dillies (Nov 12 2023 at 22:39):
(incidentally, people were discussing your symmetric polynomials project, and it was pointed out that it dodged having to write API, so I'm interested to see how you figure this one out)
Terence Tao (Nov 12 2023 at 22:39):
API writing?
Terence Tao (Nov 12 2023 at 22:40):
Do you mean writing classes and their methods?
Yaël Dillies (Nov 12 2023 at 22:40):
It's computer science terminology, but what we mean here by "API" is the set of lemmas and auxiliary definitions that let you interact with a definition.
Terence Tao (Nov 12 2023 at 22:41):
OK. Well, yeah, that's what I'm trying to do right now with creating a ProbabilitySpace class.
Yaël Dillies (Nov 12 2023 at 22:41):
This is a very important concept for mathlib, and most of the lemmas in mathlib are API lemmas. The non-API lemmas are the "big theorems", which usually have names and take more than a few lines to prove.
Yaël Dillies (Nov 12 2023 at 22:44):
This is not the case with all theorem provers. E.g. Isabelle's strong automation means that APIs are less likely to be provided, since automation can always reprove them on the fly. But in mathlib it is a very structuring principle, and the quality of an API distinguishes very strongly "one-time projects" (e.g. your symmetric polynomial project) from "mathlib-ready code" (e.g. hopefully what you will soon write about probability spaces).
Terence Tao (Nov 12 2023 at 22:45):
I'm familiar with the concept of object-oriented programming. Is API writing synonymous with this (or a particular implementation of that philosophy)?
Yaël Dillies (Nov 12 2023 at 22:46):
Kind of? Most OOP has no use for lemmas, but if you replace "lemmas" with "methods", you get a pretty good parallel (except that of course Lean methods are defs, so they too should have some API lemmas written about them!).
Adam Topaz (Nov 12 2023 at 22:47):
FWIW, here's how I would prove your original sorry (including two API lemmas):
import Mathlib
open MeasureTheory
class ProbSpace (Ω : Type*) extends
    MeasureSpace Ω,
    IsProbabilityMeasure ‹MeasureSpace Ω›.volume
def prob {Ω : Type*} [ProbSpace Ω] (E : Set Ω) := MeasureSpace.volume E
notation:100 "P[ " E " ]" => prob E
lemma prob_univ (Ω : Type*) [ProbSpace Ω] : P[(⊤ : Set Ω)] = 1 :=
  IsProbabilityMeasure.measure_univ
lemma prob_mono {Ω : Type*} [ProbSpace Ω] {A B : Set Ω} (h : A ≤ B) : P[A] ≤ P[B] :=
  OuterMeasure.mono _ h
lemma prob_le_one [ProbSpace Ω] (E : Set Ω) : P[ E ] ≤ 1 := by
  rw [← prob_univ Ω]
  simp [prob_mono]
Eric Wieser (Nov 12 2023 at 22:47):
extends IsProbabilityMeasure ‹MeasureSpace Ω›.volume is a neat trick!
Kalle Kytölä (Nov 12 2023 at 22:48):
Terence Tao said:
Or by taking Cartesian products of existing spaces in order to generate independent trials.
A small warning: (as far as I know) we only have binary products of probability measures currently in Mathlib.
I suggested defining general products of probability measures as one of the desirable projects for new probability contributors. It should not be particularly hard compared to other things that have been done, just no-one has done it yet. Although I wouldn't immediately know what is the cleanest extension theorem that would give arbitrary infinite products of probability measures in Mathlib. (I suspect directly applicable extension theorems are missing, but haven't checked carefully.)
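(For reference, the binary case is already usable out of the box; the check below is illustrative, assuming the relevant product instance fires as expected:)
import Mathlib
open MeasureTheory
-- illustrative check: the product of two probability measures is again a probability measure
example {α β : Type*} [MeasurableSpace α] [MeasurableSpace β]
    (μ : Measure α) (ν : Measure β) [IsProbabilityMeasure μ] [IsProbabilityMeasure ν] :
    IsProbabilityMeasure (μ.prod ν) :=
  inferInstance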
Eric Wieser (Nov 12 2023 at 22:48):
... though actually it's unnecessary, you can just write volume by itself.
Yaël Dillies (Nov 12 2023 at 22:50):
All this said, the best answer I can provide to your titular question is: There should be a ProbabilitySpace typeclass iff someone can write an API for it that demonstrates it is easier to use than IsProbabilityMeasure.
Adam Topaz (Nov 12 2023 at 22:51):
Eric Wieser said:
... though actually it's unnecessary, you can just write volume by itself.
Ah right. that saves a few characters :)
Yaël Dillies (Nov 12 2023 at 22:51):
And now I should go to bed before I rightly learn about the Kolmogorov extension theorem in my lecture tomorrow :wink:
Yaël Dillies (Nov 12 2023 at 22:52):
(quite a shame I don't get to see your questions more often due to timezones)
Terence Tao (Nov 12 2023 at 22:55):
Yaël Dillies said:
All this said, the best answer I can provide to your titular question is: There should be a ProbabilitySpace typeclass iff someone can write an API for it that demonstrates it is easier to use than IsProbabilityMeasure.
OK. I can't imagine that it would be worse - it would be easy to access the probability measure from the ProbabilitySpace object, and any lemma using IsProbabilityMeasure should (I expect) be easily wrapped into a corresponding lemma for ProbabilitySpace. I'm mostly looking forward to all the measure bookkeeping being done automatically - there will be a lot of random variables associated to various spaces, each equipped with canonical measures, and I simply don't want to explicitly think about exactly what those measures are, and would rather focus on the random variables instead.
Terence Tao (Nov 12 2023 at 22:56):
I'll only be needing binary products (and conditional products, but I'll cross that bridge when I come to it), so I'll dodge the need to think about Kolmogorov extension.
Yaël Dillies (Nov 12 2023 at 23:00):
That's all well and good, but we don't want mathlib to acquire such an API only to notice that it all breaks down when people want to consider several probability measures on the same space. One has to accommodate more than just their application, because we want code to be reusable.
Yaël Dillies (Nov 12 2023 at 23:00):
But we can't know whether your code is reusable before you've written it. So I would say, go ahead!
Kalle Kytölä (Nov 12 2023 at 23:00):
Terence Tao said:
Doesn't this mean that all probabilistic notation (e.g., P[ ], E[ ], etc.) would have to explicitly involve μ?
This Mathlib file defines notations such as P[X] and 𝔼[X]. It uses MeasureSpace.volume directly, but is meant to be used in a probability theory context when one has one canonical probability measure (per space) at hand. I think you get the notation if you open scoped ProbabilityTheory, although I haven't used it.
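(For illustration - this is essentially the rfl example that appears further down the thread, assuming a canonical MeasureSpace instance is around:)
import Mathlib.Probability.Notation
open MeasureTheory ProbabilityTheory
-- illustrative: with a canonical volume around, 𝔼[X] is definitionally the integral against ℙ
variable {Ω : Type*} [MeasureSpace Ω]
example (X : Ω → ℝ) : 𝔼[X] = ∫ ω, X ω ∂ℙ := rfl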
Terence Tao (Nov 12 2023 at 23:02):
Adam Topaz said:
FWIW, here's how I would prove your original sorry (including two API lemmas):
Thanks! It's hurting my brain a bit to think about ProbSpace as both a MeasureSpace and a Prop, but that does shortcut things.
For the purpose of interfacing with existing Mathlib lemmas using IsProbabilityMeasure etc., how would one access the probability measure in your code explicitly? Things like Ω.volume etc. didn't seem to work for me, which in particular was preventing me from using existing methods like docs#MeasureTheory.ProbabilityMeasure.apply_le_one to close the argument in the second formulation. I would prefer not to duplicate large parts of the existing Mathlib and just borrow whatever has already been proven for probability measures whenever possible.
Kalle Kytölä (Nov 12 2023 at 23:04):
I think your original code was just missing an explicitly recorded instance of [IsProbabilityMeasure _] for your volume. So as soon as the instance is around, it should work. Here's a pedestrian version which does just that:
import Mathlib
open MeasureTheory
class ProbSpace (Ω : Type*) extends MeasureSpace Ω where
  totalProb : IsProbabilityMeasure ‹MeasureSpace Ω›.volume
def prob {Ω : Type*} [ProbSpace Ω] (E : Set Ω) := MeasureSpace.volume E
notation:100 "P[ " E " ]" => prob E
instance (Ω : Type*) [ProbSpace Ω] : IsProbabilityMeasure (volume : Measure Ω) where
  measure_univ := by
    have obs := @ProbSpace.totalProb Ω _
    exact measure_univ
lemma prob_le_one [ProbSpace Ω] (E : Set Ω) : P[ E ] ≤ 1 := by
  unfold prob
  exact? -- Works and gives `MeasureTheory.prob_le_one`
Terence Tao (Nov 12 2023 at 23:05):
Kalle Kytölä said:
Terence Tao said:
Doesn't this mean that all probabilistic notation (e.g., P[ ], E[ ], etc.) would have to explicitly involve μ?
This Mathlib file defines notations such as P[X] and 𝔼[X]. It uses MeasureSpace.volume directly, but is meant to be used in a probability theory context when one has one canonical probability measure (per space) at hand. I think you get the notation if you open scoped ProbabilityTheory, although I haven't used it.
Yeah, I saw that, but was surprised to see this notation not used at all in the rest of Mathlib (and in particular I did not have any existing use cases to learn how to use the notation), so was wondering if it was deprecated or something.
Yaël Dillies (Nov 12 2023 at 23:05):
Oh yeah, Kalle is raising a good point. Probabilistic notation is already defined. So you should be able to use [MeasureSpace Ω] [IsProbabilityMeasure (volume : Measure Ω)] wherever you would have used your [ProbSpace Ω].
Adam Topaz (Nov 12 2023 at 23:06):
Terence Tao said:
For the purpose of interfacing with existing Mathlib lemmas using IsProbabilityMeasure etc., how would one access the probability measure in your code explicitly?
Here's one approach:
import Mathlib
open MeasureTheory
class ProbSpace (Ω : Type*) extends
    MeasureSpace Ω,
    IsProbabilityMeasure volume
-- the other lemmas from the above have been omitted
def probMeasure (Ω : Type*) [ProbSpace Ω] : Measure Ω := volume
instance isProbabilityMeasureProbMeasure [ProbSpace Ω] :
    IsProbabilityMeasure (probMeasure Ω) :=
  ProbSpace.toIsProbabilityMeasure
Yaël Dillies (Nov 12 2023 at 23:07):
I don't think the notation is deprecated or anything. It's just not used much because people want to avoid assuming there is one canonical measure on a given space. But for the purpose of your project it is a very sensible solution.
Eric Wieser (Nov 12 2023 at 23:08):
Adam's example becomes even shorter with abbrev:
class ProbSpace (Ω : Type*) extends
    MeasureSpace Ω,
    IsProbabilityMeasure volume
abbrev probMeasure (Ω : Type*) [ProbSpace Ω] : Measure Ω := volume
which means lean can already find the instance:
variable (Ω : Type*) [ProbSpace Ω] in
#synth IsProbabilityMeasure (probMeasure Ω)
Eric Wieser (Nov 12 2023 at 23:09):
@Kalle Kytölä's instance isn't necessary with this spelling, as it is generated automatically thanks to being in the extends clause
Kalle Kytölä (Nov 12 2023 at 23:11):
Yes, I understand that it is not necessary in the later clever approach, but I believe the lack of the instance was why Terry didn't get access to the probability results automatically in the original question. (That's why I referred to my proposal as "pedestrian": it only recorded the necessary instance.)
Eric Wieser (Nov 12 2023 at 23:12):
Indeed, though if you know the instance is already around and Lean can't find it for some reason, usually there's a better solution than trying to reprove the instance "from scratch"
Eric Wieser (Nov 12 2023 at 23:14):
Going back a little; the problem with introducing such a ProbSpace class (into Mathlib) is that we'd end up with almost nothing stated about it, and whenever someone contributes something to mathlib about probability spaces, we would probably end up asking them to generalize it to probability measures instead. This is sort of the same fate that befell VectorSpace, which we eventually eliminated; it was great for teaching and for matching literature, but it provided no mathematical value from a Lean perspective over the Module that we already had.
Eric Wieser (Nov 12 2023 at 23:15):
(and even the teaching benefits of VectorSpace are marginal; presumably your students start to wonder why they can't find any results about vector spaces in mathlib, and they have to learn about Module anyway...)
Terence Tao (Nov 12 2023 at 23:18):
I feel like these sorts of decisions create barriers to entry for incoming mathematicians.
Terence Tao (Nov 12 2023 at 23:21):
With polymorphism I don't see why it's necessary to eliminate these sorts of extensions. I'd much rather work in say AddCommGroup rather than in some AddMonoid class carrying around additional axioms as needed, even if most of the tools I used actually come from that more primitive class.
Shreyas Srinivas (Nov 12 2023 at 23:27):
What are the drawbacks to adding some kind of proxy lemmas for special cases (vector spaces) which are internally proved by something like exact <general_case_lemma>? Is it build time and library size?
Kevin Buzzard (Nov 12 2023 at 23:28):
Terence Tao said:
I feel like these sorts of decisions create barriers to entry for incoming mathematicians.
There are currently many barriers. However there are fewer barriers than there were 6 years ago, when there was no documentation for any of the systems which was suitable for mathematicians, and Q&As were mostly being done with email mailing lists (so nothing in real time). Asking many questions on the Lean chatroom (first Gitter, now Zulip) seemed to be the only way into the subject at the time, and those of us who were around back then are still extremely grateful to the computer scientists who spent a lot of time answering our basic questions.
It's very helpful to hear suggestions about how the software can be made easier for mathematicians, because there certainly exist people who are not prepared to expose their ignorance on Zulip, or would rather learn by reading something than asking questions. For questions like this, I am hoping that they will go away over time as mathlib's coverage of basic notions gets better, and more worked examples appear online explaining how to do basic stuff. Mathematics In Lean #mil is an attempt to do this, but mathematics is a big subject and there are still plenty of gaps.
Eric Wieser (Nov 12 2023 at 23:31):
Terence Tao said:
With polymorphism I don't see why it's necessary to eliminate these sorts of extensions. I'd much rather work in say AddCommGroup rather than in some AddMonoid class carrying around additional axioms as needed, even if most of the tools I used actually come from that more primitive class.
I think the key distinction here is between what you as a user with a specific goal in mind should do, and what mathlib, which wants to work for as many users as possible, should do. For your own development, you should absolutely only focus on the generality you care about, especially if it makes your proofs easier or more readable. But if you end up contributing it to mathlib, it will slowly evolve to more general cases until you no longer recognize it as the code you wrote!
Terence Tao (Nov 12 2023 at 23:37):
Sure, but as long as the general version of the code is automatically inherited by whatever specific class extensions or instances one actually works with, this should be largely invisible to the end user. I can't imagine the process of refactoring the code to the more abstract class being all that time-consuming, compared with creating the initial version of the code in the first place.
Terence Tao (Nov 12 2023 at 23:39):
especially if one abstracts out the trivial lemmas early on and relies on them as much as possible, rather than using the specific implementation-dependent components of the class.
Kyle Miller (Nov 12 2023 at 23:40):
Regarding probability in mathlib, my understanding is that right now it is an open question how to design it in a way such that natural steps you'd take in a paper proof can be easily realized in Lean. I don't know the details very well, but for example being able to do things like say "now let X be an independent uniform random variable on {0,1}" and have everything in sight be automatically with respect to a sample space that's a cartesian product with the original one. The lack of material here is, as I understand it, simply from lack of development -- maybe you'll be the one to solve it for mathlib @Terence Tao :smile:
Regarding VectorSpace, this is what the definition would look like:
class VectorSpace (k V : Type _) [Field k] [AddCommGroup V] extends Module k V where
  -- no body
It's not good to have this as a class like this because then realizing you can generalize results gets epsilon more difficult, but I believe there's no harm in having this as an abbreviation. (I've made a mechanism where you could write aliases for the variable? command -- this command suggests a completed variable command with any missing typeclasses -- there's an example for VectorSpace in the test file but I never put it into mathlib proper.) In the long run, I think having more specialized versions of everything is a great idea, but in the short term I believe the abstract way mathlib is developed helps keep development costs down.
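(A sketch of that abbreviation variant, for illustration only; the instance binder still resolves because abbreviations are reducible:)
import Mathlib
-- illustrative only: a reducible synonym rather than a new class
abbrev VectorSpace (k V : Type*) [Field k] [AddCommGroup V] := Module k V
example {k V : Type*} [Field k] [AddCommGroup V] [VectorSpace k V] (a : k) (v : V) : V := a • v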
Just to mention "API" -- we tend to throw it around loosely around here. It's meant to refer to the definitions and theorems that users of a theory are meant to use (vs more internal material that actually unfolds definitions and breaks abstractions). "Application programming interface" is how CS people refer to the public interface of a library, and it's sort of fitting since it's applications of the theory.
Terence Tao (Nov 12 2023 at 23:52):
Kyle Miller said:
Regarding probability in mathlib, my understanding is that right now it is an open question how to design it in a way such that natural steps you'd take in a paper proof can be easily realized in Lean. I don't know the details very well, but for example being able to do things like say "now let X be an independent uniform random variable on {0,1}" and have everything in sight be automatically with respect to a sample space that's a cartesian product with the original one. The lack of material here is, as I understand it, simply from lack of development -- maybe you'll be the one to solve it for mathlib Terence Tao :smile:
Actually, I already had a brief discussion about this on the Zulip at stream:Is+there+code+for+X? topic:Working+with+big-O+notation+like+an+analyst near:397910850 [by the way - how does one properly cite prior threads in Zulip?]. In my ideal world, random variables would not be modeled by a single sample space, but on something resembling an element of a filter on the category of all sample spaces, and then independent copies of random variables could be introduced by restricting to a smaller element of this filter; one would almost never touch the sample space directly except for some foundational lemmas, and manipulate random variables almost exactly like how one already manipulates deterministic quantities. But that's a lengthy discussion for another time perhaps.
Kyle Miller (Nov 13 2023 at 00:11):
I'd like to see an approach like that work (let's use pro-objects instead of amateur-objects!). One question there is whether it can be designed such that mathlib users can work with it in simpler cases without realizing how sophisticated it is underneath.
(Re zulip, I don't know about proper, but I tend to write a link like you did and then work a link to the discussion into my sentence, unless you mean cite as in cite in a paper, in which case I don't know.)
Going back to generality in mathlib, one dream I have is that there will be textbook-like frontend libraries to mathlib that give a standard treatment of a subject in an appropriate level of generality, but then all the proofs are (ideally) corollaries of mathlib theorems. That way (1) you can use such frontend in a project or for teaching if it's sufficient for your needs and (2) you can learn how to get around mathlib by starting with math that you're more familiar with.
Terence Tao (Nov 13 2023 at 00:12):
Hmm, one drawback of going through MeasureSpace is that the probability currently has the type ENNReal rather than NNReal. I am vaguely aware of some automatic coercion in IsProbabilityMeasure that had previously magically converted the type to NNReal, but am not sure how to activate this now in the current version of the class.
Eric Wieser (Nov 13 2023 at 00:21):
(regarding citing zulip, you already have done the important part correctly; links to messages like yours are permalinks even if the thread is renamed, but links to threads always go to the last message, and can break)
Terence Tao (Nov 13 2023 at 00:22):
Ah, OK, the problem is that probMeasure now has the type of Measure Ω rather than ProbabilityMeasure Ω. Hmm, just need to find the right coercion operation, I guess.
Eric Wieser (Nov 13 2023 at 00:25):
I think you might have a better time with docs#MeasureTheory.ProbabilityMeasure after all
Eric Wieser (Nov 13 2023 at 00:27):
So perhaps:
import Mathlib
open MeasureTheory
class ProbSpace (Ω : Type*) extends MeasureSpace Ω, IsProbabilityMeasure volume
@[simps]
def prob (Ω : Type*) [ProbSpace Ω] : ProbabilityMeasure Ω := ⟨volume, inferInstance⟩
(simps here just generates the trivial lemma that prob.toMeasure = volume)
Adam Topaz (Nov 13 2023 at 00:28):
@Kyle Miller what is an amateur object? In any case, I think it’s worthwhile to keep the monadic approach in mind as well. Just in the last few days I had to generate some random things, and working monadically was extremely convenient, and do notation allows you to work with random variables as though they are deterministic. See here github.com/adamtopaz/lean_pdist
Kyle Miller (Nov 13 2023 at 00:29):
(It's just a silly joke -- the filter-like thing @Terence Tao was talking about sounded like some pro- construction, which of course is for professionals.)
Adam Topaz (Nov 13 2023 at 00:29):
Of course lean’s do notation won’t work as is for the Giry monad.
Terence Tao (Nov 13 2023 at 00:32):
Eric Wieser said:
So perhaps:
This is now giving the right types, but all the methods now break (I guess because ProbabilityMeasure doesn't inherit everything from Measure or something?):
import Mathlib
open MeasureTheory
class ProbSpace (Ω : Type*) extends
    MeasureSpace Ω,
    IsProbabilityMeasure volume
@[simps (config := .lemmasOnly)]
def probMeasure (Ω : Type*) [ProbSpace Ω] : ProbabilityMeasure Ω := ⟨volume, inferInstance⟩
def prob {Ω : Type*} [ProbSpace Ω] (E : Set Ω) := probMeasure Ω E
notation:100 "P[ " E " ]" => prob E
lemma prob_univ (Ω : Type*) [ProbSpace Ω] : P[(⊤ : Set Ω)] = 1 :=
  IsProbabilityMeasure.measure_univ -- no longer works
lemma prob_mono {Ω : Type*} [ProbSpace Ω] {A B : Set Ω} (h : A ≤ B) : P[A] ≤ P[B] :=
  OuterMeasure.mono _ h -- no longer works
lemma prob_le_one [ProbSpace Ω] (E : Set Ω) : P[ E ] ≤ 1 := by
  rw [← prob_univ Ω]
  simp [prob_mono]
lemma prob_le_one' [ProbSpace Ω] (E : Set Ω) : P[ E ] ≤ 1 := by
  unfold prob
  exact MeasureTheory.prob_le_one -- no longer works
lemma prob_eq [ProbSpace Ω] (E : Set Ω) : P[ E ] = probMeasure Ω E := by
  rfl
Terence Tao (Nov 13 2023 at 00:38):
Ah, it's just a matter of using the ProbabilityMeasure methods instead of the Measure methods. Now things seem to work properly.
import Mathlib
open MeasureTheory
class ProbSpace (Ω : Type*) extends
    MeasureSpace Ω,
    IsProbabilityMeasure volume
@[simps (config := .lemmasOnly)]
def probMeasure (Ω : Type*) [ProbSpace Ω] : ProbabilityMeasure Ω := ⟨volume, inferInstance⟩
def prob {Ω : Type*} [ProbSpace Ω] (E : Set Ω) := probMeasure Ω E
notation:100 "P[ " E " ]" => prob E
lemma prob_univ (Ω : Type*) [ProbSpace Ω] : P[(⊤ : Set Ω)] = 1 := by
  apply MeasureTheory.ProbabilityMeasure.coeFn_univ
lemma prob_mono {Ω : Type*} [ProbSpace Ω] {A B : Set Ω} (h : A ≤ B) : P[A] ≤ P[B] := by
  apply MeasureTheory.ProbabilityMeasure.apply_mono
  exact h
lemma prob_le_one [ProbSpace Ω] (E : Set Ω) : P[ E ] ≤ 1 := by
  apply MeasureTheory.ProbabilityMeasure.apply_le_one
lemma prob_eq [ProbSpace Ω] (E : Set Ω) : P[ E ] = probMeasure Ω E := by
  rfl
Terence Tao (Nov 13 2023 at 00:41):
OK, I think I can work with this class; it seems I can readily import anything I need from the existing ProbabilityMeasure methods.
Eric Wieser (Nov 13 2023 at 00:42):
I would strongly recommend dropping def prob, it's hiding from Lean that you're working with a probability measure
Terence Tao (Nov 13 2023 at 00:43):
I tried, but then I don't know how to get the P[ E ] notation to figure out what Ω is.
Kyle Miller (Nov 13 2023 at 00:43):
Just to show off how dot notation could be used here:
lemma prob_univ (Ω : Type*) [ProbSpace Ω] : P[(⊤ : Set Ω)] = 1 := (probMeasure Ω).coeFn_univ
lemma prob_mono {Ω : Type*} [ProbSpace Ω] {A B : Set Ω} (h : A ≤ B) : P[A] ≤ P[B] :=
  (probMeasure Ω).apply_mono h
lemma prob_le_one [ProbSpace Ω] (E : Set Ω) : P[ E ] ≤ 1 := (probMeasure Ω).apply_le_one E
lemma prob_eq [ProbSpace Ω] (E : Set Ω) : P[ E ] = probMeasure Ω E := rfl
Eric Wieser (Nov 13 2023 at 00:46):
Ok, there's something pretty nasty going on with ProbabilityMeasure:
import Mathlib
open MeasureTheory
variable (Ω : Type*) [MeasurableSpace Ω] (P : ProbabilityMeasure Ω) (x : Set Ω) in
#check P x -- (fun s => ENNReal.toNNReal (↑↑↑P s)) x : NNReal
I think you've found a rather sharp edge
Terence Tao (Nov 13 2023 at 00:48):
I actually want to have the probability return a NNReal rather than an ENNReal, even if it means that it is a bit harder to then access the more general measure theory lemmas that assume an ENNReal-valued measure.
Terence Tao (Nov 13 2023 at 00:51):
But I assume it is easier to automatically coerce a probability measure to a measure than the other way around, so I would think this isn't a problem?
Eric Wieser (Nov 13 2023 at 00:58):
My comment above is a remark that something got broken during the port from Lean 3 to Lean 4 with ProbabilityMeasure; compare docs#MeasureTheory.ProbabilityMeasure.coeFn_univ (which is a mess of fun and arrows) with docs3#measure_theory.probability_measure.coe_fn_univ (which is easy to read). I've made a new thread to discuss that issue.
Terence Tao (Nov 13 2023 at 01:29):
OK. Well as long as the existing methods don't break when cleaning them up, it should be fine on my end; I can still read the documentation even though there are quite a few coercion arrows.
For what it's worth, I've started a new project at https://github.com/teorth/pfr to start working with probability spaces (my immediate goal is to start proving some of the Shannon entropy inequalities; I don't plan to comprehensively develop the theory of probability spaces, but perhaps some of the material I end up developing can end up part of such a larger goal).
Sebastien Gouezel (Nov 13 2023 at 07:03):
I'm arriving after the battle, but let me summarize how we do probability in mathlib currently. Typically, we would have
open scoped MeasureTheory ProbabilityTheory ENNReal NNReal
variable {Ω : Type*} [MeasureSpace Ω] [IsProbabilityMeasure (ℙ : Measure Ω)]
and then start working with ℙ. The drawback is that the probability of an event is an extended positive real, not a real. We should probably introduce a notation ℙᵣ for the real-valued version, but I don't think this has been done yet. The notations such as 𝔼 for the expectation are available. For an example, see docs#ProbabilityTheory.strong_law_ae, where you will see how we speak of independent identically distributed random variables also.
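(A hypothetical sketch of what such a real/NNReal-valued accessor could look like - the name and the ℙᵣ-style notation below are made up for illustration, not existing Mathlib:)
import Mathlib
open MeasureTheory
-- hypothetical: an NNReal-valued probability, truncating the ENNReal value of the measure
noncomputable def probNNReal {Ω : Type*} [MeasurableSpace Ω] (μ : Measure Ω) (s : Set Ω) :
    NNReal :=
  (μ s).toNNReal
notation:100 "ℙᵣ[" s "]" => probNNReal volume s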
Sebastien Gouezel (Nov 13 2023 at 07:05):
We also have a good deal on conditional expectations, with handy notations such as 𝔼[X|m] (see the file https://github.com/leanprover-community/mathlib4/blob/master/Mathlib/Probability/Notation.lean for typical notations we use), and martingales. See for instance docs#MeasureTheory.Submartingale.ae_tendsto_limitProcess for the almost sure convergence of L^1 submartingales.
Sebastien Gouezel (Nov 13 2023 at 07:08):
Using docs#MeasureTheory.ProbabilityMeasure is not the usual way to use a probability measure: it is rather for when you want to consider a bunch of probability measures on a given space, and speak of properties in the space of probability measures, like weak convergence.
Sebastien Gouezel (Nov 13 2023 at 07:10):
There are still big holes in the library. IMHO, the two main ones are:
- We don't have the Kolmogorov extension theorem (yet, although it is in the works), so there is no direct way to construct a sequence of iid random variables. On the other hand, nothing prevents you from assuming the existence of such a sequence and working from there, as in the strong law of large numbers above.
- We don't have any Markov chain theory.
We also miss Brownian motion, characteristic functions, stable laws, entropy, and so on.
Rémy Degenne (Nov 13 2023 at 07:38):
Let me add a short example of how to use the probability notations, and how mathlib deals with default measures on a measure space:
import Mathlib.Probability.Notation
open MeasureTheory ProbabilityTheory
open scoped ENNReal NNReal
variable {Ω : Type*} [MeasureSpace Ω] [IsProbabilityMeasure (ℙ : Measure Ω)]
-- ℙ is notation for the measure, with values in ℝ≥0∞. We don't have ℝ≥0 or ℝ valued versions (yet?)
example (A : Set Ω) (hA : MeasurableSet A) : ℙ Aᶜ = 1 - ℙ A := prob_compl_eq_one_sub hA
-- For any measure μ, `μ[X]` is notation for `∫ ω, X ω ∂μ`, hence the expectation can be obtained
-- by writing `ℙ[X]`. We also have the notation `𝔼[⬝]`, which is `∫ ω, X ω ∂ℙ`.
example (X : Ω → ℝ) : ℙ[X] = 𝔼[X] := rfl
example (X : Ω → ℝ) : 𝔼[X] = ∫ ω, X ω ∂ℙ := rfl
-- In the example below, `Integrable X` means the same as `Integrable X ℙ`. This is done by the
-- `volume_tac` tactic in the definition, which used ℙ if no measure is given:
-- `def Integrable {α} {_ : MeasurableSpace α} (f : α → β) (μ : Measure α := by volume_tac) `
example (X : Ω → ℝ) (hX : Integrable X) (h : ∀ ω, X ω ≤ 1) : 𝔼[X] ≤ 1 :=
  calc ℙ[X] ≤ 𝔼[(1 : Ω → ℝ)] := integral_mono hX (integrable_const _) h
    _ = 1 := by simp
-- Several other definitions can use the default measure ℙ when the measure is not provided:
example (X : Ω → ℝ) : ∫ ω, X ω = ∫ ω, X ω ∂ℙ := rfl
-- this is used a lot in the analysis files with integrals w.r.t. the Lebesgue measure.
-- In Mathlib, you will find almost sure equalities written like this: `X =ᵐ[μ] Y`. But we have
-- notation that does not need the measure if you want to use ℙ
example (X Y : Ω → ℝ) : X =ₐₛ Y ↔ X =ᵐ[ℙ] Y := Iff.rfl
The main way we avoid writing the measure everywhere is by adding volume_tac to definitions. You will probably encounter several definitions that should use it but don't: there is not a lot of code written about MeasureSpace, so the use of volume_tac is not very well tested.
Rémy Degenne (Nov 13 2023 at 07:40):
About the Kolmogorov extension theorem: we have a full proof with Peter Pfaffelhuber, but it still needs to be translated into PRs to mathlib. I'll be doing that soon (perhaps slowly).
Rémy Degenne (Nov 13 2023 at 07:44):
I am working on something related to what you are doing. On branch#RD_kl, I proved positivity of the Kullback-Leibler divergence and a Donsker-Varadhan duality formula. I see that we both proved some basic properties of x * log x (see the KullbackLeibler.lean file here: https://github.com/leanprover-community/mathlib4/compare/RD_kl ).
Terence Tao (Nov 13 2023 at 07:46):
Sebastien Gouezel said:
I'm arriving after the battle, but let me summarize how we do probability in mathlib currently. Typically, we would have
open scoped MeasureTheory ProbabilityTheory ENNReal NNReal
variable {Ω : Type*} [MeasureSpace Ω] [IsProbabilityMeasure (ℙ : Measure Ω)]
and then start working with ℙ.
Hmm, this doesn't look convenient for my application at all. Frequently I will be having a random variable X : Ω → S and a positive probability event E : Set Ω, and I will wish to create a new conditioned random variable (X|E) : E → S formed by conditioning X to E, which I would like to also endow with the structure of a probability space. So the sample space and the probability measure will change repeatedly, and this is not something I can handle as a global variable. (This is in order to be able to state and prove the entropy chain rule H[X,Y] = H[Y] + H[X|Y], where H[X|Y] is defined as ∑ y, P[ Y = y ] H[ (X|Y=y) ], and then go on to prove the Shannon entropy inequalities and thence on to my actual application.)
I think I'm still going to try to make the ProbabilitySpace class approach work for now. I still need to figure out how to artificially handle this conditioning when the event E has probability zero (or is not measurable at all), but apart from that technical issue I think I can see how it is going to work for my application. [Note that this sort of conditioning is not exactly the same as conditioning to a sub-sigma-algebra, which seems to be what is currently supported in Mathlib.]
Sebastien Gouezel (Nov 13 2023 at 07:58):
There shouldn't be any problem to have locally an instance with the following: if you have a MeasureSpace instance on a space, with [IsProbabilityMeasure (ℙ : Measure Ω)], then endow any subset with the normalized restricted probability measure, declare this as a measure space with its canonical measure, and register also automatically the fact that it is a probability measure. There is of course the issue of zero measure sets -- then use a Dirac mass or something arbitrary if it is nonempty. If it is empty, unfortunately, there is no way out.
Terence Tao (Nov 13 2023 at 08:04):
It's going to be really annoying if I have to submit a certificate of non-emptiness every time I want to invoke conditioning to an event. The other option I was thinking of was to work with measures that are either probability measures or zero (with conditioning defined to produce the zero measure if the event has zero probability, in particular if it is empty) and have some side lemmas to the effect that the conditioned measure is still a probability measure if the event conditioned to had positive probability. Not a great solution, but the best I can come up with so far.
Terence Tao (Nov 13 2023 at 08:10):
Hmm, maybe I should work with a FiniteMeasureSpace class with an additional isProbabilitySpace attribute that is preserved in "good" cases but can be lost in "bad" operations such as conditioning to the empty set. A bit unintuitive, but might be manageable.
Sebastien Gouezel (Nov 13 2023 at 08:10):
You can just define the induced measure as the restricted measure divided by ℙ E (so you get a MeasureSpace instance on the subtype), register an instance that it's always a finite measure, and register another instance saying that it's a probability measure under the assumption Fact (0 < ℙ E).
Sebastien Gouezel (Nov 13 2023 at 08:11):
We have already classes IsFiniteMeasure μ and IsProbabilityMeasure μ for this.
Terence Tao (Nov 13 2023 at 08:12):
Yeah, OK, I think I will try this. Good thing I only just got started on this project; this would have been much more of a pain if I had to do this refactoring a thousand lines of code into the project.
Terence Tao (Nov 13 2023 at 08:14):
Ah and now I don't have to futz around with completely artificial Dirac measures either (using a Classical choice function or something to select the location of the mass).
Rémy Degenne (Nov 13 2023 at 08:14):
Sebastien Gouezel said:
You can just define the induced measure as the restricted measure divided by ℙ E (so you get a MeasureSpace instance on the subtype), register an instance that it's always a finite measure, and register another instance saying that it's a probability measure under the assumption Fact (0 < ℙ E).
That normalized restricted measure can be obtained with docs#ProbabilityTheory.cond
Sebastien Gouezel (Nov 13 2023 at 08:15):
Except that cond is a measure on the whole space, not on the subtype.
Sebastien Gouezel (Nov 13 2023 at 08:23):
Here is a rough sketch of what it could look like:
import Mathlib.Probability.StrongLaw
import Mathlib.Probability.ConditionalProbability
attribute [-instance] MeasureTheory.Measure.Subtype.measureSpace
open scoped MeasureTheory ProbabilityTheory ENNReal NNReal
open MeasureTheory
noncomputable section
def Subtype.probSpace {α : Type*} [MeasureSpace α] {E : Set α} : MeasureSpace E :=
  { volume := Measure.comap Subtype.val (ProbabilityTheory.cond ℙ E : Measure α) }
attribute [local instance] Subtype.probSpace
variable {Ω : Type*} [MeasureSpace Ω]
instance {E : Set Ω} : IsFiniteMeasure (ℙ : Measure E) := sorry
instance {E : Set Ω} [IsFiniteMeasure (ℙ : Measure Ω)] [Fact (0 < ℙ E)] :
    IsProbabilityMeasure (ℙ : Measure E) := sorry
lemma foo (f : Ω → ℝ) (hf : Integrable f) (E : Set Ω) :
    Integrable (fun (x : E) ↦ f x : E → ℝ) := sorry
variable [IsProbabilityMeasure (ℙ : Measure Ω)]
...
Rémy Degenne (Nov 13 2023 at 08:26):
I am trying this approach but I get into issues with lemmas about that comap of Subtype.val asking for MeasurableSet E, which is not something the instance mechanism will find on its own.
Sebastien Gouezel (Nov 13 2023 at 08:28):
Right. One would need to add a [Fact (MeasurableSet E)] in addition to [Fact (0 < ℙ E)].
Rémy Degenne (Nov 13 2023 at 08:30):
Yes. I worry that the experience with this will not be very smooth, because we never use measurability as a Fact anywhere in the library, but I don't see another way.
Sebastien Gouezel (Nov 13 2023 at 08:30):
(Personally, to avoid dependent type hell, I would probably try to work on the original space as much as possible, with the families of measures cond E over varying E -- but only experimentation can tell what works best).
Terence Tao (Nov 13 2023 at 15:25):
OK, I think this is how I will start, using a sort of "ProbabilitySpacesWithZero" type, which is a MeasureSpace equipped with a raw finite measure volume, which may possibly vanish. The actual measure used in applications will be the normalized measure in which one divides volume by its total mass; this will usually be a probability measure, except when volume is zero, in which case one gets the zero measure instead. To condition to an event, one simply restricts the raw measure to the event; no need to perform additional normalization, and now conditioning to empty events works just fine. I got as far as making the barest definitions below; I am intrigued by the idea of making conditional instances of ProbabilityMeasure contingent on a nondegeneracy condition, but will try to work this out later. Tentatively I think a lot of the foundational theory I wanted to build for probability spaces will also work for probability spaces with zero, with only an occasional additional hypothesis of nondegeneracy needed in various places.
open MeasureTheory
/-- In this project, a ProbabilitySpace is modeled by a MeasureSpace with a finite volume measure that can potentially vanish. The probability measure is then the normalization of this measure, bearing in mind that it may be zero. Thus we also consider a measure space with the zero measure to be a (degenerate) example of a ProbabilitySpace. -/
class ProbabilitySpace (Ω : Type*) extends MeasureSpace Ω, IsFiniteMeasure volume
/-- The raw, unnormalized measure. Would only be directly used in foundational lemmas typically. --/
def ProbabilitySpace.rawMeasure (Ω : Type*) [ProbabilitySpace Ω] : Measure Ω := volume
/-- The raw measure interpreted as a finite measure. Again, only used in foundational lemmas. --/
@[simps (config := .lemmasOnly)]
def ProbabilitySpace.rawFiniteMeasure (Ω : Type*) [ProbabilitySpace Ω] : FiniteMeasure Ω := ⟨volume, inferInstance⟩
/-- The total mass of the raw measure. -/
def ProbabilitySpace.rawMass (Ω : Type*) [ProbabilitySpace Ω] : NNReal := (ProbabilitySpace.rawFiniteMeasure Ω) Set.univ
/-- The assertion that a probability space is nondegenerate. --/
def ProbabilitySpace.isNondeg (Ω : Type*) [ProbabilitySpace Ω] : Prop := 0 < ProbabilitySpace.rawMass Ω
/-- The normalized measure associated to a probability space -/
noncomputable def ProbabilitySpace.measure (Ω : Type*) [ProbabilitySpace Ω] : Measure Ω := (ProbabilitySpace.rawMass Ω)⁻¹ • volume
/-- The normalized finite measure associated to a probability space -/
noncomputable def ProbabilitySpace.finiteMeasure (Ω : Type*) [ProbabilitySpace Ω] : FiniteMeasure Ω := (ProbabilitySpace.rawMass Ω)⁻¹ • (ProbabilitySpace.rawFiniteMeasure Ω)
/-- prob Ω E is the probability of E in Ω. -/
noncomputable def ProbabilitySpace.prob {Ω : Type*} [ProbabilitySpace Ω] (E : Set Ω) := (ProbabilitySpace.finiteMeasure Ω) E
notation:100 "P[ " E " ]" => ProbabilitySpace.prob E
Sebastien Gouezel (Nov 13 2023 at 15:34):
There is no point in your class ProbabilitySpace: you could (should?) just write [MeasureSpace Ω] [IsFiniteMeasure (volume : Measure Ω)]. I guess you would prefer to have one single assumption instead of two for psychological reasons, but in practice it doesn't make any difference. These assumptions will typically be in a variable block at the top of your file, written there once and for all, so they won't clutter your statements.
In the same way, the standard way to invoke a real Banach space is [NormedAddCommGroup E] [NormedSpace ℝ E] [CompleteSpace E]. We could have introduced a shortcut [RealBanachSpace E], but in fact there is no point in doing so (although it took us collectively some time to become convinced of this).
Eric Rodriguez (Nov 13 2023 at 15:36):
I'm still not convinced of this - I think there should be a way to write [RealBanachSpace E] that indeed does expand to what we usually write. I think there was some progress to do this automatically via variables!/?, did @Kyle Miller talk about this recently?
Terence Tao (Nov 13 2023 at 15:49):
Sebastien Gouezel said:
There is no point in your class ProbabilitySpace: you could (should?) just write [MeasureSpace Ω] [IsFiniteMeasure (volume : Measure Ω)]. I guess you would prefer to have one single assumption instead of two for psychological reasons, but in practice it doesn't make any difference. These assumptions will typically be in a variable block at the top of your file, written there once and for all, so they won't clutter your statements.
I'm not going to be working with a single probability space and a single measure; there will be a lot of them. A typical thing that will happen for instance in my application is that I will start with two random variables X, Y on a probability space Ω with some raw measure volume, and I will start to need to consider the conditioned random variables (X|Y=y) defined on a conditioned probability space Y⁻¹'({y}) with the restricted volume measure (and later, I will be taking conditionally independent trials, and other various constructions). I can't just choose a single ambient sample space and set it as a global variable, and so with the orthodox approach I am going to encounter a lot of clutter in my notation. For instance, I will frequently be summing over y expressions such as P[ Y = y ] H[ X | Y = y ] that involve random variables on different probability spaces (I am vaguely worried about the "dependent type hell" alluded to earlier that this will entail, but I will cross that bridge when I come to it).
Sebastien Gouezel (Nov 13 2023 at 15:58):
The single variable line at the top, together with instances saying that the measure induced on a subtype is finite if the initial one is finite, should likely be enough for the setting you are describing.
This is probably clear to you, but let me emphasize this. Assume that at the top of your file you have a line
variable {Ω : Type*} [MeasureSpace Ω] [IsFiniteMeasure (volume : Measure Ω)]
then you prove a lemma foo about Ω, and then you want to prove another lemma bar which, in the course of the proof, will use foo but on a subtype of Ω. Then your lemma foo will apply perfectly well, even though it's not to the same Ω: once foo is proved, it will apply to any finite measure space you will ever encounter.
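(To illustrate with a toy foo - the lemma and example below are just an illustration, assuming the finite-measure instance is in the variable line:)
import Mathlib
open MeasureTheory
variable {Ω : Type*} [MeasureSpace Ω] [IsFiniteMeasure (volume : Measure Ω)]
-- a toy `foo`, stated once against the ambient canonical measure
lemma foo (E : Set Ω) : (volume : Measure Ω) E < ⊤ := measure_lt_top volume E
-- ...and it applies verbatim to any other finite measure space that shows up later
example {Ω' : Type*} [MeasureSpace Ω'] [IsFiniteMeasure (volume : Measure Ω')] (E : Set Ω') :
    (volume : Measure Ω') E < ⊤ := foo E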
Terence Tao (Nov 13 2023 at 16:10):
It's not the lemmas so much that I'm thinking about, but the notation, and in particular my desire to not label probabilistic notation such as probability P[ ] or Shannon entropy H[ ] with explicit measures.
Here is a specific use case I have in mind. At some point I will prove the Shannon inequality H[ X, Y ] ≤ H[ X ] + H[ Y ] whenever X : Ω → S, Y : Ω → T are two random variables (taking values in some finite sets S, T). From this inequality I want to deduce the conditional counterpart H[ X, Y | Z ] ≤ H[ X | Z ] + H[ Y | Z ], where Z : Ω → U is some further random variable. The natural way to do this is to expand H[ X, Y | Z ] = ∑ z : U, P[ Z = z ] * H[ X, Y | Z = z ], and similarly for H[ X | Z ] and H[ Y | Z ]. One then applies the unconditional Shannon inequality for the conditioned random variables (X | Z = z), (Y | Z = z) for each z : U, and then the conditional inequality will follow from gcongr. Note here it will be convenient to permit conditioning even if the event Z=z has zero measure.
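(Spelled out, the intended derivation is: H[ X, Y | Z ] = ∑ z : U, P[ Z = z ] * H[ X, Y | Z = z ] ≤ ∑ z : U, P[ Z = z ] * (H[ X | Z = z ] + H[ Y | Z = z ]) = H[ X | Z ] + H[ Y | Z ], using the unconditional inequality termwise in the middle step.)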
In expressions such as ∑ z : U, P[ Z = z ] * H[ X, Y | Z = z ], multiple probability spaces and probability measures are in play, and I would prefer to not have to explicitly note all this data in my notation. I believe that a ProbabilitySpace class will handle this nicely (for instance, the P[ ] notation already defined does not make explicit reference to the measure). I don't see how to easily express assertions such as H[ X, Y | Z ] = ∑ z : U, P[ Z = z ] * H[ X, Y | Z = z ] in a lemma just using global variables.
Sebastien Gouezel (Nov 13 2023 at 16:19):
The ProbabilitySpace class is exactly equivalent to the two assumptions MeasureSpace and IsFiniteMeasure volume. Whenever you expect that Lean will be able to infer ProbabilitySpace using typeclass inference, it would also be able to infer MeasureSpace and IsFiniteMeasure volume. I am not saying that the ProbabilitySpace approach is bad, I'm saying that it is completely equivalent to the [MeasureSpace ...] [IsFiniteMeasure ...] approach.
Terence Tao (Nov 13 2023 at 16:21):
How can I make the notation P[ E ] work without explicit reference to a probability measure if I am using the [MeasureSpace ...] [IsFiniteMeasure ...] approach with multiple probability spaces in play, even on the same line?
e.g. P[ E ∧ F ] = P[ F ] * P[ E | F ] where E | F should be interpreted as the event E restricted to the probability subspace F endowed with the conditional probability measure.
Sebastien Gouezel (Nov 13 2023 at 16:29):
When you write P [E] where E is a set in some type α, then Lean will understand that you want to use the canonical volume on α. It will then look for a MeasureSpace α instance. Either this one is available in the variables line, or it can be constructed from other instances. For example, if you have registered that a subtype of a MeasureSpace is also a MeasureSpace, then Lean will use this instance, and construct the measure you are looking for from the ambient measure.
Terence Tao (Nov 13 2023 at 16:33):
OK. I still think I prefer to package it conceptually as a single package, similar to how for instance [Group ...] is not commonly deconstructed into [DivInvMonoid ...] [mul_left_inv ...]. (Also I get to manipulate probabilities as NNReals rather than ENNReals, which lets me avoid some annoying casting issues.)
Eric Wieser (Nov 13 2023 at 16:34):
If you want NNReals, you can still define:
/-- The canonical measure `volume` as a probability measure -/
@[simps]
def prob (Ω : Type*) [MeasureSpace Ω] [IsFiniteMeasure (volume : Measure Ω)] :
    ProbabilityMeasure Ω := ⟨volume, inferInstance⟩
Eric Wieser (Nov 13 2023 at 16:34):
@Sebastien Gouezel, do you think mathlib should have this definition?
Sebastien Gouezel (Nov 13 2023 at 16:40):
No. ProbabilityMeasure Ω is for the case you want to study the set of all probability measures (with its topology for instance). If you just have one probability measure, it should be a normal measure to benefit from the huge API. What we should definitely have is a way (either notation or definition) to access the value of P s as a real number (or an NNReal), but this should be an API extension around Measure.
Eric Wieser (Nov 13 2023 at 16:49):
Could you PR a docstring for ProbabilityMeasure advising the use of IsProbabilityMeasure in those cases instead?
Kyle Miller (Nov 13 2023 at 17:52):
Eric Rodriguez said:
I think there was some progress to do this automatically via variables!/?, did Kyle Miller talk about this recently?
Unfortunately, the variable? mechanism is too simple here and fails because pretty printing isn't round tripping. (Yes, it needs to depend on pretty printing...) It seems to work if you set pp.analyze to true, but I wouldn't count on it, so I'll just say this doesn't work at the moment.
The issue is that IsFiniteMeasure (volume : Measure Ω) pretty prints as IsFiniteMeasure ℙ, without reference to Ω.
Eric Rodriguez (Nov 13 2023 at 19:10):
What if you don't open scoped ProbabilityTheory? [terribly unsatisfactory, I know]
Wrenna Robson (Nov 17 2023 at 16:11):
@Terence Tao I am interested in where your adventures in probability in Lean take you (I have run into some difficulties in it before - actually, at that time as it happens I was using your published set of notes on elementary probability theory as a handy guide, which were very useful!)
In cryptography the situation is even worse than in mainstream mathematics, because there people use probability all the time and they don't even pretend to think about the fact that there's a measure underneath. A nightmare when you try to formalise! But so it goes.
Terence Tao (Nov 17 2023 at 16:14):
Wrenna Robson said:
Terence Tao I am interested in where your adventures in probability in Lean take you (I have run into some difficulties in it before - actually, at that time as it happens I was using your published set of notes on elementary probability theory as a handy guide, which were very useful!)
In cryptography the situation is even worse than in mainstream mathematics, because there people use probability all the time and they don't even pretend to think about the fact that there's a measure underneath. A nightmare when you try to formalise! But so it goes.
In the end we used the standard Mathlib MeasurableSpace and Measure packages in our project; see https://leanprover.zulipchat.com/#narrow/stream/412902-Polynomial-Freiman-Ruzsa-conjecture/topic/mathlib-only.20approach.20to.20entropy . The one thing was that we permitted the underlying measure to not be a probability measure, but when defining probabilistic notions (such as entropy) we first normalized the measure to be a probability measure (or to be the zero measure, if the underlying measure had zero or infinite mass). This for instance made conditioning, even to an empty event, straightforward; one simply restricts the underlying measure. In a way this models "probability spaces with zero", but it seems a lot of probability theory survives if one allows for the possibility that the probability measure actually vanishes.
Wrenna Robson (Nov 17 2023 at 16:27):
Thanks, I'll give that a read. Interesting that it works out that way. In cryptography I've sometimes seen a use of sub-probability measures, which one doesn't encounter that often in a first study of probability but I imagine would slot in fairly effortlessly using that approach, as most of the time it wouldn't be relevant.
Terence Tao (Nov 17 2023 at 16:39):
Subprobability measures also occasionally show up in analysis, because the weak limit of a probability measure can be merely a subprobability measure if some of the mass "escapes to infinity".
Wrenna Robson (Nov 17 2023 at 16:48):
That makes sense. In a cryptography/algorithmic context, the escaped mass represents the fact that the probability that a program terminates at all is not one, so if your distribution is one of possible final states, you need to account for non-termination.
Bolton Bailey (Nov 21 2023 at 04:29):
When working in cryptography, using docs#PMF seems like a good option, since all the spaces you work over will be finite/discrete anyway, and you avoid needing to work with measures at all.
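(A tiny illustration of that option - the name below is just for the example:)
import Mathlib
-- illustrative: a fair coin as a PMF; measures only appear once one passes to `PMF.toMeasure`
noncomputable def coin : PMF Bool := PMF.uniformOfFintype Bool
#check (coin.toMeasure : MeasureTheory.Measure Bool)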
Wrenna Robson (Nov 21 2023 at 09:34):
See you'd think they'd be finite but alas.
Wrenna Robson (Nov 21 2023 at 09:34):
But yeah I think that's basically the way to go.
Wrenna Robson (Nov 21 2023 at 09:35):
The issue I find is that cryptographers are not very rigorous in their use of probability...
Wrenna Robson (Nov 21 2023 at 09:36):
Also you do get applications where you discretise continuous distributions.
Shreyas Srinivas (Nov 21 2023 at 09:46):
I think it might be worth having an API for finite discrete probability anyway. For most CS applications, people are going to be working with Fintypes, Finsets, etc., and it gets very tedious dealing with coercions and the like all the time.
Shreyas Srinivas (Nov 21 2023 at 09:51):
If nothing else, it will be a nice way to show CS people how to do their day-to-day proofs in Lean. Invoking measure theory is a bad idea for that use case.
Wrenna Robson (Nov 21 2023 at 10:07):
I agree, I've said this many many many times before.
Wrenna Robson (Nov 21 2023 at 10:07):
Every time I suggest it I get told it's a bad idea ;)
Wrenna Robson (Nov 21 2023 at 10:07):
I actually wrote most of one once
Wrenna Robson (Nov 21 2023 at 10:08):
To be fair one of the issues is that you do get little wrinkles which are hard to resolve.
Wrenna Robson (Nov 21 2023 at 10:09):
As I recall, I based it on Finsupp?
Wrenna Robson (Nov 21 2023 at 10:28):
Though the fact that that is noncomputable might make it annoying for many applications.
Shreyas Srinivas (Nov 21 2023 at 10:36):
I started writing one three days ago
Shreyas Srinivas (Nov 21 2023 at 10:36):
I want to have a discrete math library that picks and chooses what we want from mathlib
Shreyas Srinivas (Nov 21 2023 at 10:37):
I started with probability theory
Shreyas Srinivas (Nov 21 2023 at 10:38):
The biggest issues I have encountered so far come from coercions
Shreyas Srinivas (Nov 21 2023 at 10:42):
I guess I can share this "library" (it is too new, contains very little, and I am not too confident about my design choices) since it isn't connected to any of my ongoing work projects.
Rémy Degenne (Nov 21 2023 at 10:46):
Wrenna Robson said:
Every time I suggest it I get told it's a bad idea ;)
I think it would be a bad idea to develop the theory of probability twice in parallel, once for finite spaces and once in general.
However, it might be a good idea to have definitions specific to finite spaces that are more easily usable than the general ones, provided that whenever possible the proofs of their properties are done by applying the more general results.
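One possible shape for such a finite-specific definition (uniformOn is a made-up name; this is only a sketch of the pattern, not a worked-out API):
import Mathlib
open MeasureTheory
open scoped ENNReal

-- The uniform probability measure on a Fintype, packaged from the general counting
-- measure Measure.count rather than rebuilt from scratch. Facts about it can then be
-- derived from the existing counting-measure API instead of being reproved separately
-- for the finite case.
noncomputable def uniformOn (Ω : Type*) [Fintype Ω] [MeasurableSpace Ω] : Measure Ω :=
  (Fintype.card Ω : ℝ≥0∞)⁻¹ • Measure.count
For instance, lemmas along the lines of Measure.count_apply_finite should then give the expected "card of the event divided by card of Ω" formula without a bespoke finite development.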
Wrenna Robson (Nov 21 2023 at 10:47):
Yes, I agree it shouldn't exist in parallel, but I seem to recall some resistance to even that before.
Wrenna Robson (Nov 21 2023 at 10:47):
I totally agree with that approach.
Shreyas Srinivas (Nov 21 2023 at 10:47):
Rémy, the issue is that TCS culture is very different from research math culture. If I tell TCS folks that they can prove something but there is just this hitch with coercions and measurability, they'll lose interest.
Wrenna Robson (Nov 21 2023 at 10:48):
Yeah, as someone who sits awkwardly astride the two, I also agree with that.
Shreyas Srinivas (Nov 21 2023 at 10:48):
Also, we work a lot with interconnected horizontal abstractions rather than vertical abstractions.
Wrenna Robson (Nov 21 2023 at 10:50):
We do have an InformationTheory folder currently in Mathlib, but it really only contains my definitions for a Hamming space. In an ideal world where I had infinite money and time to organise us, that is where I would like to put the TCS-world stuff.
Wrenna Robson (Nov 21 2023 at 10:51):
What would probably be good in general is a work plan, to work out what abstractions and content we would need. But this is moving away from the topic of this thread.
Shreyas Srinivas (Nov 21 2023 at 10:53):
I find that in CS, the best way to start something is to not plan too much
Shreyas Srinivas (Nov 21 2023 at 10:53):
(which is a bit ironic)
Shreyas Srinivas (Nov 21 2023 at 11:25):
Here's the repository. I only started it this Saturday, so it is brand new, has very little, and the few proofs it has could do with polishing.
Wrenna Robson (Nov 21 2023 at 12:57):
Yes - I am not sure about some of your definitions. And if we can possibly build on the existing foundation, even if obscuring the measure-theoretic approach, that may be better.
I am not sure why your "prob" function is part of the Probability class, because surely it is implied by pmf. Or vice versa, if you like.
Shreyas Srinivas (Nov 21 2023 at 13:01):
Yes, that's true. I'll change that. My immediate goal is to get to a point where I can talk about concentration bounds easily. I am really looking forward to that inequality-rewriting tactic to make life simpler.
Shreyas Srinivas (Nov 24 2023 at 16:39):
Rémy Degenne said:
Wrenna Robson said:
Every time I suggest it I get told it's a bad idea ;)
I think it would be a bad idea to develop the theory of probability twice in parallel, once for finite spaces and once in general.
However, it might be a good idea to have definitions specific to finite spaces that are more easily usable than the general ones, provided that whenever possible the proofs of their properties are done by applying the more general results.
I think I found the perfect counterargument to this. A few years ago, in the linked message and a few preceding messages, Kevin was explaining how words like "constructivism" or "different types of equality" are complete anathema to working mathematicians. An analogous argument applies in the case of most theoretical CS folks for functional analysis / measure theory / infinite graphs, or anything more advanced than very basic abstract algebra. Although a fraction of the community working in ML theory is quite comfortable with measure-theoretic analysis and many of us have math backgrounds, just uttering those words could be enough to send most people in an algorithms or complexity theory group running away (jokes aside, they'll ask you to describe stuff in elementary ways). Most of the math we work with can be taught to students right out of high school.
Shreyas Srinivas (Nov 24 2023 at 16:40):
This is why I think a discrete probability class is not a bad idea, although I can see that it is not a great value-add to Mathlib.
Eric Wieser (Nov 24 2023 at 18:08):
Developing the same thing twice can be very expensive; here it could mean some combination of:
- every probability theory contributor now has to do twice as much work, resulting in fewer contributions
- probability theory users find gaps in the API twice as often
- maintainers have to manage increasingly diverging APIs
Shreyas Srinivas (Nov 24 2023 at 18:13):
I get why this is not going to work for Mathlib. Hence the proposal and initiation of a complementary library from the discrete math side that puts the needs of TCS people first but picks and chooses from Mathlib as required.
Eric Wieser (Nov 24 2023 at 18:14):
I think building a simplified interface that sits atop mathlib would be a very reasonable idea for a downstream project
Shreyas Srinivas (Nov 24 2023 at 18:15):
It is not about the technical aspect. I have learnt from first-hand experience that I cannot sell this to theoreticians unless they see that they can do proofs in their way.
Terence Tao (Nov 24 2023 at 18:42):
I like this idea, particularly if it comes with some example code that people not already expert in Lean can start playing with immediately.
Shreyas Srinivas (Nov 24 2023 at 19:18):
I outlined some heuristics for such a library in the README here: https://github.com/Shreyas4991/DiscreteMathLean
I like the idea of having such examples and will add it to the aims. I only started it last Saturday and have not spent more than 2 hours with it, so there isn't much content yet.
Terence Tao (Nov 24 2023 at 19:24):
Some of the methods in https://teorth.github.io/pfr/docs/PFR/Entropy/Measure.html could be useful for this library, e.g., pfr#integral_eq_sum and pfr#ae_iff_of_fintype
Eric Wieser (Nov 24 2023 at 20:42):
Those results probably belong in mathlib itself
Wrenna Robson (Nov 27 2023 at 08:07):
Yeah, I'm not really sure about a library that depends on Mathlib but is separate from it. I would much rather have discrete maths in Mathlib, which we can do, and this still doesn't stop theoreticians from doing "proofs in their way".
Peter Nelson (Dec 06 2023 at 16:05):
Clearly mathlib needs to eventually have a defined scope, but I would find it very disheartening if something like a usable API for discrete probability is considered to be outside it.
It would be telling everyone working in discrete probability (which is hardly a niche area) that 'mathlib is not for you', and forcing an API to exist in some other project, which is just kicking the 'parallel API' can down the road.
To me this would be like dropping linear algebra support for matrices in mathlib because everything is just a linear map.
Shreyas Srinivas (Dec 06 2023 at 16:25):
I think mathlib developers do want a discrete probability API. So what you point out is already not an issue.
They just want it to be a front end to the measure-theoretic version. It will take a lot of API development before a discrete math person can stay entirely in the discrete world with the API. So if a TCS (theoretical CS) person comes up with a purely discrete argument for a probability lemma, it is not clear whether
a) Mathlib will want that proof as opposed to a more general proof, or
b) the TCS person will want to invest effort into a more general proof that we don't need at all.
The same applies, as another example, to theorems about graphs where infinite graphs are involved.
Ruben Van de Velde (Dec 06 2023 at 16:25):
I think the thing mathlib would want to avoid is an API for discrete probability that is not connected to the rest of the library. I would hope that we could create a usable API that is connected
Peter Nelson (Dec 06 2023 at 16:29):
Thanks, those comments clarify things. Perhaps being lenient in accepting PRs where (a) might apply would mitigate some of the problems.
An existing example that springs to mind is Matrix.rank_transpose, which is clearly there in insufficient generality, but is better than nothing.
Eric Wieser (Dec 07 2023 at 14:48):
I think insufficiently general results are much less of a problem than insufficiently general definitions
Shreyas Srinivas (Dec 07 2023 at 15:03):
Eric Wieser said:
insufficiently general definitions
That's basically most of TCS if "sufficiently" is defined from a pure math perspective.
Peter Nelson (Dec 09 2023 at 11:48):
As well as a lot of discrete maths.
SimpleGraph is certainly an insufficiently general definition (the clue is in ‘Simple’!). There is already a good amount of stuff in the API (e.g. paths) that really should be defined in at least the multigraph setting, and generalizing isn’t just a matter of flipping a switch.
I think the genuine problem partly has nothing to do with formalization - it’s that if you go to a graph theory conference and ask everyone what a graph is, you’ll get answers that are not just cosmetically different, but inequivalent.
I don’t know if this problem is peculiar to combinatorics, but it clearly hasn’t been resolved. If there were an obvious way to define a combinatorial object that contains multigraphs and hypergraphs that didn’t have significant tradeoffs, I’d have tried to PR it.
Yaël Dillies (Dec 09 2023 at 11:53):
@Kyle Miller and I have been thinking about this specific problem quite a lot, and we have a solution for multigraphs (not including hypergraphs). It's currently stuck on small technicalities.
Wrenna Robson (Dec 09 2023 at 18:19):
I have had similar issues with defining linear codes.
Wrenna Robson (Dec 09 2023 at 18:19):
In theory this is very easy.
Wrenna Robson (Dec 09 2023 at 18:20):
In practice there are just a number of little technical hitches and differing definitions, so trying to find something that lies flat is very frustrating!