Zulip Chat Archive

Stream: Is there code for X?

Topic: probability


Jalex Stark (May 04 2020 at 16:00):

I remember some people talking about how it's hard to think of a good API for probability (with random variables and independence and all that) in the context of proving the existence of special graphs with the probabilistic method

Jalex Stark (May 04 2020 at 16:01):

do we have any nontrivial theorems involving an infinite sequence of random variables on the same space, say a central limit theorem?

Jalex Stark (May 04 2020 at 16:02):

If not, I think that "prove anything that looks like a central limit theorem, and then study it to understand what is needed from the probability API" is a good line of work

Johan Commelin (May 04 2020 at 16:37):

We have a whole bunch of stuff on measure spaces. But nothing specialised to probability. At some point there was a PR by @Koundinya Vajjha. I don't know what happened to it. Maybe it can be resurrected?

Jalex Stark (May 04 2020 at 16:40):

Yeah, I agree that the measure spaces are mature enough that it should be easy to do probability on top of

Koundinya Vajjha (May 04 2020 at 16:52):

Apologies. I do have a tiny probability theory library up in a different repo, but it's out of date and breaks with current mathlib. I keep putting it off but I promise to devote some time to it the next couple of weeks and will try to PR.

Patrick Massot (May 04 2020 at 16:53):

PR, PR, PR before it's too late. Every Lean code that is not PRed to mathlib is doomed to bit rot.

Jalex Stark (May 04 2020 at 17:01):

link to your repo?

Koundinya Vajjha (May 04 2020 at 17:05):

Jalex Stark said:

link to your repo?

https://github.com/jtristan/stump-learnable/tree/master/src/lib/attributed

David Wärn (May 04 2020 at 17:31):

This looks quite nice! I can see a pi-instance for measurable spaces, and binary products for measure spaces: is there a pi-instance for probability spaces somewhere?

Koundinya Vajjha (May 04 2020 at 17:55):

no, sorry. isn't that https://en.wikipedia.org/wiki/Ionescu-Tulcea_theorem?

Koundinya Vajjha (May 04 2020 at 17:56):

I think proving Ionescu Tulcea was originally what @Johannes Hölzl was aiming at

David Wärn (May 04 2020 at 18:42):

Hmm I haven't seen that. I'm thinking of something like question 14 here (countability is beside the point)
http://www.dpmms.cam.ac.uk/study/II/Probability%2BMeasure/ex2.pdf
It seems to me like you have all the ingredients for this construction, but it's maybe a bit technical

Johan Commelin (May 05 2020 at 04:33):

@Koundinya Vajjha Would you mind starting a github project over at https://github.com/leanprover-community/mathlib/projects with little stubs of what you think should be done?

Jalex Stark (May 15 2020 at 15:04):

I guess what I want from a probability interface is to just expose the "algebra of random variables". The fact that this can be modeled with measure spaces should be left on the inside

Sebastien Gouezel (May 15 2020 at 15:36):

Jalex Stark said:

I guess what I want from a probability interface is to just expose the "algebra of random variables". The fact that this can be modeled with measure spaces should be left on the inside

Many interesting theorems in probability really depend on the fact that you have a measure space, and that the notion of "almost surely" is well behaved. Think of the strong law of large numbers, or the martingale convergence theorem, for instance.

Jalex Stark (May 15 2020 at 15:45):

how much well behavedness do you need? isn't it enough to know that countable intersections of almost sure events are almost sure?

Mario Carneiro (May 15 2020 at 15:47):

isn't that basically a measure space?

Mario Carneiro (May 15 2020 at 15:48):

or at least a sigma algebra

Mario Carneiro (May 15 2020 at 15:49):

I don't know how you intend to say anything quantitative about probabilities themselves without a measure though

Reid Barton (May 15 2020 at 15:50):

I think the idea behind "algebra of random variables" is that you want to avoid committing to the idea that random variables are functions at all

Reid Barton (May 15 2020 at 15:51):

or at least, you want to avoid committing to the idea that they are functions on some specific measure space Ω\Omega

Mario Carneiro (May 15 2020 at 15:51):

but you can still ask probability questions about them?

Jalex Stark (May 15 2020 at 15:51):

yes

Mario Carneiro (May 15 2020 at 15:52):

then can't you reverse engineer a measure space from that?

Sebastien Gouezel (May 15 2020 at 15:52):

What really matters is not that they are functions, but their joint distribution, as a measure on RN\mathbb{R}^{\mathbb{N}} if you are considering real-valued random variables indexed by integers for instance.

Reid Barton (May 15 2020 at 15:52):

They still have expectations, for example. Except for the ones that don't, and that is where things start to get a bit tricky...

Jalex Stark (May 15 2020 at 15:53):

Mario Carneiro said:

then can't you reverse engineer a measure space from that?

yes, once you've given any finite amount of information about your variables, there's surely some "minimal" measure space + functions from it that models the things you've said

Jalex Stark (May 15 2020 at 15:53):

but you want to remain agnostic about whether you're using the minimal one or an arbitrary extension of it

Mario Carneiro (May 15 2020 at 15:54):

I was thinking of using all the possible questions you could ask about the variables

Mario Carneiro (May 15 2020 at 15:54):

(not finite)

Jalex Stark (May 15 2020 at 15:54):

sure, that will have a minimal model by a measure space

Jalex Stark (May 15 2020 at 15:54):

and we remain agnostic about whether or not we're adding more variables later

Mario Carneiro (May 15 2020 at 15:55):

that seems difficult to do in lean

Mario Carneiro (May 15 2020 at 15:55):

you can't just add elements to a type, not without rejiggering everything on top

Jalex Stark (May 15 2020 at 15:56):

i agree that it's hard

Jalex Stark (May 15 2020 at 15:56):

I don't think it's adding complexity though

Reid Barton (May 15 2020 at 15:57):

It's similar to how you might do a computation using "indeterminates", which you could think of as happening in a polynomial or power series ring, but then later you might introduce another indeterminate.

Jalex Stark (May 15 2020 at 15:57):

if you "do probability" with measure spaces then you're going to be dealing with these extensions of your measure space all the time

Mario Carneiro (May 15 2020 at 15:57):

if you have a measure space you already have all random variables that could ever exist, so adding a new one is trivial

Reid Barton (May 15 2020 at 15:58):

You only have the ones that exist on that space. The kind of thing you would want to add is a new variable that's independent from all of those (e.g., by forming a product measure space)

Jalex Stark (May 15 2020 at 15:58):

if I want to add a new one with the constraint that it's independent from all the ones I have talked about before, without extending the space, that's not trivial
I guess the way you'd do it is to extend to a product space and then apply an isomorphism of measure spaces between the original space and its extension

Jalex Stark (May 15 2020 at 16:00):

and the point is that it would be nice to have machinery that does that for you

Jalex Stark (May 15 2020 at 16:00):

you have to keep track of the notion of "random variables that are in scope for this discussion"

Jalex Stark (May 15 2020 at 16:00):

but you already have to do that when doing probability informally anyway

Mario Carneiro (May 15 2020 at 16:00):

Color me skeptical

Reid Barton (May 15 2020 at 16:00):

The funny thing is that type theory gets to have this built in

Jalex Stark (May 15 2020 at 16:01):

gets what built in?

Reid Barton (May 15 2020 at 16:01):

It basically took all the good ideas for itself but doesn't reexport them to the internal language

Reid Barton (May 15 2020 at 16:01):

having a variable context that can be extended by lambdas, etc

Jalex Stark (May 15 2020 at 16:02):

Mario Carneiro said:

Color me skeptical

I'm taking your comment to mean "the things you're describing sounds possible in principle, but I bet if I thought about it for a while I would find a different way to do probability that felt easier and/or more natural"

Mario Carneiro (May 15 2020 at 16:02):

You would need to first do it the hard way enough times for us to notice the pattern and abstract it into a tactic

Mario Carneiro (May 15 2020 at 16:03):

this is not the kind of design you are likely to get right on the first try

Mario Carneiro (May 15 2020 at 16:04):

I need to see some of these probability arguments where you are dynamically adding variables to give guidance about how best to model them

Sebastien Gouezel (May 15 2020 at 16:06):

As I said, this "minimal model" is exactly the joint distribution of all your random variables on RN\mathbb{R}^{\mathbb{N}} (i.e., the image of your probability on Ω\Omega under the map ω(X0(ω),X1(ω),)\omega \mapsto (X_0(\omega), X_1(\omega), \dotsc)), and your new random variable is just the nn-th coordinate.

Johan Commelin (May 15 2020 at 16:33):

At POPL I had an interesting discussion with Neel Krishnaswami, and he explained that he was working on some DSL in DTT that captured the semantics of probability theory in a way interesting to the working probability theorist. Maybe we should ask him to explain it in more detail. (I guess that most of my first sentence is nonsense on the nose.)

Jalex Stark (May 15 2020 at 16:55):

I think DTT is dependent type theory, what DSL?

Kevin Buzzard (May 15 2020 at 16:56):

Domain specific language?

Johan Commelin (May 15 2020 at 16:56):

domain specific language

Jalex Stark (May 15 2020 at 16:56):

cool, thanks

Kevin Buzzard (May 15 2020 at 16:56):

I put the question mark because I have no idea what this means ;-)

Jalex Stark (May 15 2020 at 16:58):

my guess is it means "programming language where you're always working inside of some specific monad"

Mario Carneiro (May 15 2020 at 16:58):

in this context it probably means that you can do things like Jalex suggested like create new random variables and put them together in a monad

Sebastien Gouezel (May 15 2020 at 16:59):

The Giry monad, right?

Mario Carneiro (May 15 2020 at 16:59):

I've been told there is some foundational issue with higher order functions in the Giry monad though

Mario Carneiro (May 15 2020 at 17:00):

which makes the denotation of "probabilistic programs" dubious

Koundinya Vajjha (May 16 2020 at 03:54):

The category of measurable spaces isn't Cartesian closed. (There isn't a canonical measurable space structure on the function type which makes the evaluation maps measurable). So the Giry monad can't eat higher order functions and so you can use it to denotate only a small uninteresting subset of probabilistic programming languages.

Koundinya Vajjha (May 16 2020 at 03:55):

But there have been other gadgets ("Quasi-Borel Spaces") which overcome this issue IIRC.

Alex J. Best (May 26 2020 at 00:48):

Koundinya Vajjha said:

Apologies. I do have a tiny probability theory library up in a different repo, but it's out of date and breaks with current mathlib. I keep putting it off but I promise to devote some time to it the next couple of weeks and will try to PR.

I've updated (almost all of?) this to a recent mathlib over at https://github.com/alexjbest/stump-learnable

Kenny Lau (May 26 2020 at 04:10):

everything not PR’d is lost in time — Ancient Chinese Proverb

Martin Zinkevich (Sep 18 2020 at 17:05):

Hi all,
I was working on probability and measure theory as well, and was pointed to this thread.

I looked over your code, and noticed you have the basic stuff, as well as advancements like the Giray Monad.

https://leanprover.zulipchat.com/#narrow/stream/116395-maths/topic/Radon-Nikodym.20Theorem

Specifically, see:

https://github.com/google/formal-ml/blob/master/src/formal_ml/probability_space.lean

There are a lot of design questions that it would be good to talk about:

  1. random variables/measurable functions/events/measurable sets as types?
    This allows for there to be semirings, et cetera. OTOH, more stuff to unfold. A key point was thinking about a measurable function
    as a function from a measurable space to another measurable space, and a random variable as a function from a probability space
    to a measurable space, such that expectation and probability of random variables and events could be explicit.

  2. Syntactic sugar? I was impressed that you could write Pr[A ∧ᵣ B]≤ Pr[A], and E[X + Y]=E[X] + E[Y]
    I think that this lift from the complexities and details of measure theory may be important for wider adoption.

  3. probability_space or probability_measure? This question is perhaps the most deep, as going both ways will split the code needlessly. Either is better than going both ways.

  4. independent variables/identical variables: these are very important to me as a ML researcher.

Anyway, we should figure out how to get more probability theory into mathlib.

Johan Commelin (Sep 18 2020 at 17:14):

The Giry monad is now in mathlib, btw

Adam Topaz (Sep 18 2020 at 17:56):

NB: https://github.com/leanprover-community/mathlib/blob/ae72826a13a0ce58bf75cc58c0c2219b567df2fa/src/measure_theory/giry_monad.lean#L16

Adam Topaz (Sep 18 2020 at 17:57):

I.e. mathlib's Giry monad has all measures.

Adam Topaz (Sep 18 2020 at 17:58):

Unless there is another Giry monad in mathlib that I don't know about @Johan Commelin ?

Reid Barton (Sep 18 2020 at 17:58):

I wouldn't be too surprised if this was just because it predated the existence of probability measures

Adam Topaz (Sep 18 2020 at 17:59):

Yeah I'm sure this is the case, but I don't think mathlib has the monadic structure on probability measures. I would be happy to be wrong.

Martin Zinkevich (Sep 18 2020 at 18:40):

Giry monads aside, what are the thoughts on probability spaces versus probability measures?

-Marty

Adam Topaz (Sep 18 2020 at 18:48):

Is a probability space the same thing as a measurable space endowed with a probability measure? If so, this is essentially a bundled vs. unbundled question.

Adam Topaz (Sep 18 2020 at 18:56):

What are the morphisms in the category of probability spaces?

Martin Zinkevich (Sep 18 2020 at 20:15):

Hmmm...I am not sure how to answer that (I don't understand category theory).

Here is my best effort to answer your question literally:

Okay, measurable spaces are over sets, and probability space is with respect to a measurable space. So these are our objects.

Consider probability space PA over A. If you have a function f:A -> B, then:
Define measurable space MB over B such that B' is measurable iff its preimage in A is measurable.
Define probability measure PB over B such that the measure of any measurable B' is the measure of its preimage in A.

So each function from any set A to any set B serves as a morphism. An identity function serves as an identity. Composition of functions
gives the composite morphism. This function would be a measurable function from MA to MB, but not necessarily the only one.

From a category perspective, this is a "subcategory" of all measure spaces. I don't think it has an initial object.

In machine learning, we normally think about a single, "undefined" probability space. You could think about this as an initial object (insofar as the objects we are concerned with, as opposed to in the category of all probability spaces). We often don't think deeply about the nature of this probability space. What is important to us is the random variables defined as function from that probability space to other more meaningful sets, such as the reals, booleans, integers, et cetera. In these domains, what is measurable or not matters. Having a single initial object in our minds allows us to talk about the independence of any two arbitrary random variables in our model, whereas if we define each random variable more directly on its most obvious space (e.g., define a Gaussian as a probability space on the reals, or a binomial as a probability space on the natural numbers), then I can't talk about how they relate. However, mapping back to a "local" probability space is useful if I want to talk about integration.

From a category perspective, this is a "subcategory" of all unsigned measure spaces.

Adam Topaz (Sep 18 2020 at 20:27):

Okay I think I understand. The objects in the category are pairs (X,μ)(X,\mu) where XX is a measurable space and μ\mu is a measure (which so happens to be a probability measure in this case). And a morphism of pairs (X,μ)(X,μ)(X,\mu) \to (X',\mu') is a measurable function f:XXf : X \to X' such that for all measurable subsets YY in XX', one has μ(f1(Y))=μ(Y)\mu(f^{-1}(Y)) = \mu'(Y). Is this the category? This is asking for an equality of measures/functions on the nose, which seems problematic, no? Wouldn't something like this docs#measure_theory.ae_eq_fun have better properties?

Adam Topaz (Sep 18 2020 at 20:35):

I'm not really sure what you mean by ""undefined" probability space"

Reid Barton (Sep 18 2020 at 20:40):

The earlier discussion on this same topic from around May 15 actually discussed this concept a bit.

Adam Topaz (Sep 18 2020 at 20:43):

I think it would be interesting to construct this category in lean, where morphisms extend functions are defined "amost everyone" and agree "amost everywhere", etc.

Reid Barton (Sep 18 2020 at 20:51):

This paper proposes what seems to me a sensible categorical framework for this style of probabilistic reasoning, though I don't know what the implications are for formalization:
https://drops.dagstuhl.de/opus/volltexte/2017/8051/pdf/LIPIcs-CALCO-2017-1.pdf

Adam Topaz (Sep 19 2020 at 02:12):

Looks interesting. I think Voevodsky had some notes on a categorical foundation of probability theory.

Martin Zinkevich (Sep 19 2020 at 02:17):

Okay, I am a little confused. Perhaps I should not have mentioned the Giry monad. :-)
My primary interest is not in using category theory, but in being able to talk about probability and expectation.
If no categories were added, I would be fine. As I said, I don't understand category theory. I am not opposed to what you are suggesting, I just don't understand it.

I do like the idea of thinking about real random variables as a ring and ennreal random variables as a semiring, but that does not require category theory. I am excited about writing things like:
E[X * Y] = E[X] * E[Y]
E[2 * X] = E[X] + E[X]
Pr[X<Y ∧ᵣ B]
And so forth, so I can do machine learning. If category theory helps me in some way (such as being related to how all the theorems seem impossibly short in lean), then I am all for it.

Am I missing something? I know category theory is useful in Haskell programming, but I don't yet understand enough about mathlib and Lean. Does designing with category theory help me avoid disasters in the future?
-Marty

Johan Commelin (Sep 19 2020 at 03:45):

@Martin Zinkevich I think you can safely ignore category theory for the time being. Can you explain more about what is needed to get the nice notation that you wrote about above?
At some point, I've heard Neel Krishnaswami talk about some interesting ideas for making probability theory "possible" in type theoretic theorem provers. The problem (as I understood from him, again, I'm not a probability theorist) is that typical proofs in the area don't fit nicely in the DTT paradigm. So you need some sort of DSL on top of it, in order to add extra dimensions(?) [i.e. new random variables] to you probability space whenever you need them half-way a proof.

Does this make sense? If someone can make sense of what I'm saying, then it might be worth contacting Neel (he doesn't seem to be on this Zulip) in order to get some good advice on design decisions.

Reid Barton (Sep 19 2020 at 10:58):

I think the problem is precisely that nobody really knows how to formalize the style of argument used in probability theory. Of course we can "lower" any specific such argument into a proof in which the random variables become measurable functions on specific probability spaces and then formalize it, but this leaves something to be desired. Trying to understand probabilistic arguments on their own terms is an active topic in applied category theory, which isn't meant to imply that the eventual solution necessarily involves adopting a bunch of category theory, but just that there seems to be some real conceptual problem to solve here.

Reid Barton (Sep 19 2020 at 11:04):

For those who already speak the lingo, though, it seems plausible that these probabilistic arguments are really happening in the sheaf semantics of the topos described in that paper, so that when we assert the existence of a "fresh" random variable ZZ which is independent of XX, YY, ..., that variable ZZ might only exist after we base change along a cover ΩΩ\Omega' \to \Omega and pull back XX, YY, ... to random variables on Ω\Omega'.

Reid Barton (Sep 19 2020 at 11:09):

In any case, it's probably a good idea to start out by defining "a random variable on sample space Ω\Omega valued in R\mathbb{R}", and defining expectation, proving linearity of expectation, etc., at least to the point where we can formalize the sorts of arguments that are supposedly awkward in this style, since it seems like the eventual underlying argument will have to be of this form anyways, and we'll learn more about these arguments by actually trying to formalize them.

Reid Barton (Sep 19 2020 at 11:10):

And maybe it turns out that the direct approach is adequate somehow (for example can we just pick Ω\Omega so large that for any countable collection of random variables on Ω\Omega, we can still find a new random variable which is independent of them?)

Kevin Buzzard (Sep 19 2020 at 11:15):

Can someone explain why this whole probability thing is even an issue? I thought measure theory solved the problem of making probability theory rigorous. What is missing from the standard measure theory approach? Why isn't there just some dictionary expection -> some integral, standard deviation -> some integral etc?

Reid Barton (Sep 19 2020 at 11:33):

I'll try to cook up an example, but at a high level the issue is sort of like: if we want to talk about the local behavior of a function at a point, why don't we just use εs and δs? What's the deal with filters and germs?

Kevin Buzzard (Sep 19 2020 at 11:36):

I see -- so the issue is really that there is some interface problem, rather than a mathematical one?

Reid Barton (Sep 19 2020 at 11:36):

So suppose I'm interested in random graphs on nn vertices, and I've got a real-valued function XX of a graph that I'd like to think of as a random variable, and I want to prove a statement about the expectation of XX.

Reid Barton (Sep 19 2020 at 11:37):

That's fine--we can define Ω\Omega to be the set of all graphs on nn vertices, and equip it with a probability measure μ\mu, and define XX as a function X:ΩRX : \Omega \to \mathbb{R}, and define the expectation as the integral of XX with respect to μ\mu--no problem so far.

Reid Barton (Sep 19 2020 at 11:38):

So, let's say we've managed to write down the statement of our proposition.

Reid Barton (Sep 19 2020 at 11:38):

Now maybe the proof, for whatever reason, starts like this:
Define a new random variable YY as follows: flip a coin, and if it's heads return XX; otherwise return 00.

Reid Barton (Sep 19 2020 at 11:39):

Where obviously the coin flip is meant to be independent of the random graph we're talking about

Reid Barton (Sep 19 2020 at 11:40):

So, now I guess we have a problem because there is literally no "random variable" Y:ΩRY : \Omega \to \mathbb{R} which has the correct relationship to XX (if I failed to set up the situation correctly, replace it with a better one :upside_down:).

Reid Barton (Sep 19 2020 at 11:42):

So, at this point we could apply what I guess would be the standard interpretation of this, and introduce a new sample space Ω=Ω×{heads,tails}\Omega' = \Omega \times \{\mathsf{heads}, \mathsf{tails}\} with the product measure, which comes with a measure-preserving projection π:ΩΩ\pi : \Omega' \to \Omega.

Reid Barton (Sep 19 2020 at 11:44):

Now, wherever we had XX before, we replace it by X:=Xπ:ΩRX' := X \circ \pi : \Omega' \to \mathbb{R}, and apply theorems that say things like (because π\pi is measure-preserving) the expectation of XX' is the same as the expectation of XX, and so on, and so it doesn't really matter whether we talk about XX or XX'.

Kevin Buzzard (Sep 19 2020 at 11:44):

Aah I see, you've said enough. Thanks.

Kevin Buzzard (Sep 19 2020 at 11:45):

In my brain, Y somehow operates in a different universe, but you have to take the product of these universes if you want to relate Y to stuff going on in another universe.

Reid Barton (Sep 19 2020 at 11:52):

This sheaf theory stuff is appealing to me because this "you might need to pull back everything in sight along a suitable map (cover) ΩΩ\Omega' \to \Omega, and then a thing will exist" is exactly what happens in the interpretation of the existential quantifier in the sheaf semantics, so you can use it to interpret the existential statement "there exists a random variable independent of XX with such-and-such distribution".

Reid Barton (Sep 19 2020 at 12:07):

So, to summarize, trying to find an Ω\Omega'''' large enough that the whole argument works is sort of like trying to find an ε\varepsilon'''' small enough that the whole argument works, except instead of an inequality ε<<ε\varepsilon'''' < \cdots < \varepsilon you have a map ΩΩ\Omega'''' \to \cdots \to \Omega to keep track of, which is what makes it into a category theory problem and not an order theory one.
(ε\varepsilon is really controlling a set (xε,x+ε)(x - \varepsilon, x + \varepsilon) around xx, so there are the same maps in the ε\varepsilon case too, but everything is pinned down as an open subset of some ambient space.)

Kevin Buzzard (Sep 19 2020 at 12:42):

Thanks for this very coherent explanation.

Reid Barton (Sep 19 2020 at 12:51):

You can find other analogies in algebra--when you "adjoin an indeterminate" YY (~ an independent random variable) to a ring RR then the map RR[Y]R \to R[Y] that you implicitly apply everywhere is really pulling back "functions" on Ω=SpecR\Omega = \mathrm{Spec}\, R along the cover Ω=SpecR[Y]Ω\Omega' = \mathrm{Spec}\, R[Y] \to \Omega.

Reid Barton (Sep 19 2020 at 12:56):

And routinely regarding elements of RR (say R=k[X]R = k[X]) as also being elements of R[Y]=k[X,Y]R[Y] = k[X, Y] is something we do in algebra, that can be similarly annoying for formalization.

Jacques Carette (Sep 19 2020 at 13:03):

Mathematicians routinely elide obvious canonical injections. What formalization reveals is that, at least using current technology, these tend to chain badly. What I mean is that inserting 1 of these is no problem, but in general situations when you have to insert 10 of them in a working environment with 100s of potential coercions (aka canonical injections) in scope, it all grinds to a halt.

Reid Barton (Sep 19 2020 at 13:11):

Right, and what maybe doesn't transfer between otherwise analogous domains is that perhaps in algebra, say, you pretty much only ever insert a single coercion RR[Y]R \to R[Y], while in probability theory you need to make a whole bunch of these extensions as you carry out some complex argument. (I'm not saying the situation is necessarily this way for these particular areas, though my guess would be that this is at least somewhat correct.)

Jacques Carette (Sep 19 2020 at 13:15):

But even in Algebra, you want to be careful about too much polymorphism. R[X] has properties that R[X,Y] does not have. So it would be a disaster if you use some term that is inferred to be in R[X] at one point, prove something using that extra property, then push it to R[X,Y] and have that property come along for free. This kind of laisser-faire leads to real bugs in computer algebra systems.

Kevin Buzzard (Sep 19 2020 at 14:02):

I am confident in the ability of the mathematicians using polynomials here that they are able to know when they are working in R[X] and when they are working in R[X,Y]. I absolutely agree that this informal "set-theoretic" world which mathematicians drop into when they are considering a collection of e.g. subrings of a ring is a very powerful intuition, but it cannot always be made to work, e.g. when you don't know how big your universe needs to be at the beginning of the argument (it might involve adjoining n indeterminates later, where n is a number which will only come out of a later calculation). If you don't know how big you want to be when you start, you're forced to build up rather than go down to subsets. The first time I discovered this I generated some huge thread in #maths called something like "algebra is not scaling for me". Since then I've understood the situation much better and am hoping that Kenny's is_scalar_tower tricks will help me in that particular situation.

Jacques Carette (Sep 19 2020 at 15:29):

This is the kind of knowledge that I'd love to find a way to capture. Right now, it is too ephemeral. Once the principled way of dealing with some fragile situation is found, especially when the principled way of doing things is really no harder than the 'sloppy' way, it needs to be highlighted. I already love it when I see lots of mathematicians here talk about API and the like. Of course, mathematicians have been using the equivalent of APIs for a long time (what's a Ring after all?), what isn't as well known is that "theories" as a whole have APIs too. And that not all APIs scale equally well. Bill Farmer and I have written about this in our papers High Level Theories from CICM 2008, and with Michael Kohlhase in Realms: A Structure for Consolidating Knowledge about Mathematical Theories at CICM 2014.

Patrick Massot (Sep 19 2020 at 15:31):

In principle there are formalization papers for disseminating this knowledge.

Jacques Carette (Sep 19 2020 at 15:40):

I agree, but I'm afraid that a lot of little gems would never end up being published. There is no good "Formalization Letters" kind of journal for 1-2 page papers. Both math and CS do themselves a disservice by forcing publishable chunks to be so large. [I hate slicing-and-dicing, don't get me wrong. A 20 page paper that has 1 page's worth of new material should never get published as is. If it were a <= 2 page paper, that's a different thing.]

Kevin Buzzard (Sep 19 2020 at 16:33):

We have Comptes Rendus but this is often for research announcements of larger stuff. I once disproved a conjecture of some fairly well-known guy, and the result could easily be described in just a few pages, so I wrote that up and sent it to Comptes Rendus and it got accepted.

Martin Zinkevich (Sep 19 2020 at 16:35):

To get back to Reid's example with a coin flip and a random variable.

In general, the approach used is that we assume that a single probability space exists, and then we define whatever functions necessary from it.

The coin flip could either be defined as an event or a random variable from P to bool.measurable_space.

C:P r bool.measurable_space
X:P r (borel real)

Then, one can define

Y=(indicator (C =r true)) * X

Here,C =r true is the event that C equals true. The indicator operator is a random variable that equals 1 if (C =r true), and zero otherwise. This is a very common tool we use in machine learning, and it is useful for things such as "how many examples are classified correctly?".
random_variable_independent_pair C X indicates that these two variables are independent.

So, here is an example of what we could write using probability_space.lean and real_random_variable.lean above:

lemma indicator_product {Ω:Type*} {P:probability_space Ω} {X Y:P r borel nnreal} {C:P r bool.measurable_space}:
  random_variable_independent_pair X C 
  Y = (indicator (rv_eq C (const_random_variable tt))) * X 
  E[Y] = Pr[rv_eq C (const_random_variable tt)] * E[X] := sorry

rv_eq C (const_random_variable tt) is basically (C =r true).

There are some caveats: random_variable_independent_pair technically requires two random variables of equal type right now, but there is no
reason that the codomains need to be equal. Obviously, a better equality event notation would be nice.

But, from a mathematical perspective, this is very close to what you are talking about.

Johan Commelin (Sep 19 2020 at 16:43):

@Martin Zinkevich pro tip: #backticks

Kenny Lau (Sep 19 2020 at 22:08):

@Reid Barton Is Giry monad a solution to the aforementioned problem? If not, what can solve it?

Adam Topaz (Sep 19 2020 at 22:32):

With the Giry monad the (measure associated Y) can be written with do notation approx like this

do a <- Bernoulli,
  b <- X,
  return a * b

I'm not really sure what problem you're referring to @Kenny Lau

Kenny Lau (Sep 19 2020 at 22:33):

Reid Barton said:

So, now I guess we have a problem because there is literally no "random variable" Y:ΩRY : \Omega \to \mathbb{R} which has the correct relationship to XX (if I failed to set up the situation correctly, replace it with a better one :upside_down:).

Kenny Lau (Sep 19 2020 at 22:33):

this problem

Adam Topaz (Sep 19 2020 at 22:33):

And expected values should be thought of as the structure map for an algebra of the Giry monad

Reid Barton (Sep 19 2020 at 23:01):

Well, I'm assuming the proof really wants to talk about random variables and not just their distributions, which is what the Giry monad is about.

Adam Topaz (Sep 19 2020 at 23:17):

From the point of view of the Giry monad, real valued random variables are "just" probability distributions on the reals which happen to be pushforwards of some distribution under some measurable map. The problem is the question of how to reason about such things...

Adam Topaz (Sep 19 2020 at 23:18):

BTW, I really like the sheaf theoretic point of view in the paper that @Reid Barton mentioned.

Lars Ericson (Nov 14 2020 at 16:40):

I am new to Lean, please forgive me if this question is naive or already has an answer.

Let Ω\Omega be a nonempty set, the sample space.

Let set F\mathcal F of subsets of Ω\Omega be a σ\sigma-algebra so that

  • ΩF\Omega \in \mathcal F
  • ΩFF\Omega \setminus F \in \mathcal F if FFF \in \mathcal F
  • n=1FnF\bigcup_{n=1}^{\infty} F_n \in \mathcal F if all FiFF_i \in \mathcal F

Let P:F[0,1]P: \mathcal F \rightarrow [0,1] be a probability measure so that

  • P(Ω)=1P(\Omega) = 1
  • P(ΩF)=1P(F)P(\Omega \setminus F) = 1-P(F)
  • P(n=1Fn)=n=1P(Fn)P(\bigcup_{n=1}^{\infty} F_n) = \sum_{n=1}^\infty P(F_n)

Let's call the triple S=(Ω,F,P)S = (\Omega, \mathcal F, P) a probability space. Let's say SS has type TT, S:TS: T. How do I express TT in Lean?

Kevin Buzzard (Nov 14 2020 at 17:08):

You could make a probability_space structure. I've never really engaged with the measure theory library in Lean but it would not surprise me at all if a lot of these definitions are already there, for example I'm pretty sure measure spaces or measurable spaces are there

Bryan Gin-ge Chen (Nov 14 2020 at 17:12):

The docs for measure_theory.measure_space in mathlib (and other files in measure_theory) might be helpful.

Adam Topaz (Nov 14 2020 at 17:29):

docs#measure_theory.probability_measure is in mathlib, so it would be easy to create such structures with current mathlib

Adam Topaz (Nov 14 2020 at 17:32):

I would take a look at how, for example, docs#category_theory.bundled is defined and try to mimic (or even use?) that

Lars Ericson (Nov 15 2020 at 15:19):

Thanks for helping me think about this. I am coming to Lean with the hopes of being able to resolve some questions I posted on StackExchange over the last year or so, namely (in rough order of generality/complexity):

Lean is the perfect fit for these questions. I'm a beginner to Lean so I will be working my way through the book and when I'm done with that I will try to transfer the above topics into a Lean setting.

These questions arose while I was reading KPS1994 which raises the notion of a probability space on page 3 and then quickly drops it in a constructive sense thereafter. If you read the comments in the StackExchange posts, most mathematicians don't seem to think it's really worth thinking about, i.e. that "probability space" is kind of a throwaway concept. I've been struggling with it for a year. I even went to the extent of trying to formalize the types in sympy. The result was very unsatisfying. Lean to the rescue! Probability space is actually a fairly slippery concept to pin down and almost everybody takes it for granted. I am looking forward to a satisfying solution in Lean.

Ultimately this will also involve having a proper representation of the real numbers as Dedekind cuts. I think this is already built in. A rigorous fundamental definition of the construction of reals is needed I think to provide a rigorous definition of things like probability measure. Some of that is already done in Lean but I think there is more to do.


Last updated: Dec 20 2023 at 11:08 UTC