Zulip Chat Archive

Stream: new members

Topic: Getting started contributing to probability theory

Rish Vaishnav (Jan 17 2022 at 17:20):

(not exactly new here but I'm somewhat new to mathlib so posting in this stream)

Hello! I'm a Master's student from UC San Diego studying with professor Sean Gao. I'm interested in helping out with the formalization of probability theory in the direction of Bayes Nets (and ultimately reinforcement learning theory), though I'm in no rush and willing to help out with any amount of groundwork needed beforehand. I wasn't a mathematics undergrad (more on the CS side), so I'm not technically a "mathematician", but as a grad student I'm certainly more interested in the formal side of things.

I'm pretty much up to speed with the existing formalizations in src/probability (as well as the measure theory it's built on). I see some very interesting work being done recently with martingales, though this seems fairly independent from what I'd like to formalize (CMIIW). I suppose I have two main questions:

  1. Are there any streams I should join so I can stay up to speed with what others are working on here?
  2. Are there any "primary references" that we're basing our formalizations on? If so, what are the next things we're planning to formalize? I already have a pretty good idea of where I want to start, though I'm not sure if I'd be duplicating some existing work done in src/measure_theory or elsewhere.

Rémy Degenne (Jan 17 2022 at 18:32):

Hi! The main work on probability currently is this project @Jason KY. and I are working on: https://github.com/leanprover-community/mathlib/projects/15 . This is about martingales, as you mentioned. The goal is to prove Doob's various theorems and inequalities for martingales. As a bonus, it will give us a law of large numbers. Personally, I would like to get to concentration inequalities (Hoeffding's inequality for example), which are one of the main simple tools I use in my usual work (I work on bandit theory, so not so far from reinforcement learning). That means adding facts about moments and moment generating functions, linking the independence file (which is not used much yet) and those new concepts... There is much to do!

You can join us on that project if you'd like, or if you are more interested in Bayesian results, you might want to start by implementing things related to conditional probability. We don't even have a definition for that (although it can be simply defined as a rescaling of measure.restrict). Bayes theorem would be a nice addition!

Rémy Degenne (Jan 17 2022 at 18:49):

And to answer the main questions:

  • there are no streams you need to join. There are not many of us working on this yet, so we did not feel the need for a separate stream.
  • we don't follow any book in particular. For the martingales project we picked a goal (Doob's maximal inequalities), then looked at various proofs to identify the other concepts we should implement first. For now we are implementing concepts so basic that any probability book on martingales has standard proofs for everything. Pick your favourite.

Rish Vaishnav (Jan 17 2022 at 18:58):

Thank you very much! I think I'll get started right away with conditional probability (definition, Bayes' theorem, and some basic independence results) to get my feet wet, and eventually circle back around to Martingales as the law of large numbers seems like quite a fundamental result.

Rish Vaishnav (Jan 19 2022 at 15:26):

BTW if any undergrads (or anyone else) wanted to work on this, do let me know (it seems like a pretty good place to start). I'm realizing I've got a bit more editor tweaking to do but I think I'll have something ready in the next couple days, and I'm just as happy to review as I am to contribute!

Kalle Kytölä (Jan 19 2022 at 19:36):

Hello! Great to hear of new developments in mathlib's probability!

As you already noticed, conditional expectations are there. :tada:

In my opinion, a rather complementary view on condition is (regular) conditional distributions, and my feeling would be that mathlib should eventually have these defined via disintegration / probability kernels (as in e.g. Kallenberg "Foundations of Modern Probability", Chapter 8), which works well in Polish spaces. I don't know if that was your plan already, and if it wasn't I'm not necessarily suggesting it as the first thing. Now I mainly wanted to gauge others' opinions about whether there is agreement on the mathlib-generality and approach in such topics. (Since probability didn't make it to Bourbaki, the mathlibists will have to make some decisions without a canonical precedent.)

Kalle Kytölä (Jan 19 2022 at 19:37):

The little that I know of Bayesian things, I nevertheless suspect that it is conditioning in the sense of conditional distributions rather than conditional expectations that is relevant; is that even correct? In that case I would suspect that whether the above mentioned generality is needed would depend on whether one works in a discrete setup or say in Polish spaces. In any case, I will be happy to see developments in these directions. Thanks in advance if you keep this Zulip posted about the developments!

Kalle Kytölä (Jan 19 2022 at 19:38):

(As for background: I'm interested in probability and formalization, but at least the first half of this semester looks like I'm not really doing almost any Lean apart from a few examples in an undergraduate metric spaces course. As soon as I have the chance again, I will try to take responsibility and finish the portmanteau theorem and basics of convergence in distribution that have only been approximately 10% PR'd so far.)

Rish Vaishnav (Jan 22 2022 at 00:44):

Alright I have some (very basic) first definitions/theorems for conditional probability up at: https://github.com/rish987/lean-bayes.

I may not be the best person at the moment to work on what Kalle has mentioned (though it certainly is interesting). This should at least be enough for me to get started on conditional independence next. In general I plan to formalize Probabilistic Reasoning in Intelligent Systems by Judea Pearl, filling in any extra needed proofs as I go.

Let me know if what I have so far is anything close to what we had in mind. In particular let me know your thoughts on the type class wrapper I added for measurable sets. It has helped me clean things up a bit since having to explicitly pass around measurability conditions is quite cumbersome (especially when those are already implied by more specific conditions), but I suppose the value of this will become clearer as I formalize more.

Not sure if a PR is a better place to discuss the specifics, but I can make one if we prefer that (Github username rish987). And also I am new to formalizing so don't hold back on any criticisms!

Rish Vaishnav (Jan 28 2022 at 04:11):

Update: I have some basic definitions of conditional independence ready, and have gotten started on random variables, with a definition of joint and marginal distributions and the marginalization principle.

I think I'll make a PR as soon I've covered the rest of the basics and feel that things have settled a bit. In the meantime let me know if you notice anything that looks way off track!

Kalle Kytölä (Jan 29 2022 at 23:22):

Great, having good ways to talk about these would be very nice!

I don't have much to say and I haven't had time to carefully look at your code. But I did want to say I like that your choices contain a concrete answer to a question I had a long while ago about how to talk about the laws of random variables. Your use of docs#measure_theory.measure.map is the natural thing in my opinion, too. In the thread of the same question, there was some discussion of getting the dot notation to work with these push-forwards of measures, but as far as I understand, no one has done that yet. If it can be made work, I think it might be an improvement to how we talk about the laws of random variables in mathlib in general.

In any case, great to see that you are pushing mathlib's probability theory to the directions that it will work for Bayesian things.

Rish Vaishnav (Feb 13 2022 at 05:09):

The proofs for independence and conditional independence of random variables are finally sorry-free. There are certainly more basics to cover, but this already feels like a very solid foundation. I've got some academic logistics to take care of for the next few weeks but I'll get started right away on cleaning this up and PRing it piece by piece! And as always let me know if you have any thoughts/suggestions.

Eric Wieser (Feb 13 2022 at 10:03):

What's >>[] notation for there?

Rish Vaishnav (Feb 13 2022 at 15:16):

That's just for converting from one pi_subtype set to another, for example set (Π i : A, β i) to set (Π i : A ∪ B, β i). When I PR this I think I'll reframe everything in pi_subtype.lean as an extension to the library for docs#subtype.restrict.

Yaël Dillies (Feb 13 2022 at 15:21):

Maybe you shouldn't use Π i : A, β i, then. This sounds like subtype hell to me.

Rish Vaishnav (Feb 13 2022 at 15:40):

Perhaps, haha I didn't know we had an expression for this. For me though this was the most natural thing. If I'm using the marginal distribution on A, I don't want to have to convert a marginal assignment set (Π i : A, β i) to be over the full index type set (Π i : ι, β i) every time, that feels like a different kind of hell. I only want to have to define things for the marginalized variables. I could use set (Π i ∈ A, β i), but that would be pretty much equivalent to using a subtype, right?

Yaël Dillies (Feb 13 2022 at 16:22):

Rish Vaishnav said:

I don't want to have to convert a marginal assignment set (Π i : A, β i) to be over the full index type set (Π i : ι, β i) every time, that feels like a different kind of hell.

Why not? Do you know about the concept of junk values?

Yaël Dillies (Feb 13 2022 at 17:07):

Btw, your line 132 could use docs#forall₂_congr and your lines 143-144 could use docs#forall₄_congr.

Yaël Dillies (Feb 13 2022 at 17:11):

So the idea is that partial functions are a pain to work with (you constantly have to prove that your argument to a partial function lies inside the domain), so instead we turn functions total by making them return a junk value, typically a / 0 = 0.

Yaël Dillies (Feb 13 2022 at 17:15):

This makes stuff much easier to handle, and some theorems end up even "truer". For example, (a + b)/c = a/c + b/c without needing c nonzero, or less trivially Rolle's theorem doesn't require the function to be differentiable! (because the derivative has junk value 0).

Yaël Dillies (Feb 13 2022 at 17:17):

Most of your code is actually about those subtyping issues. If you make your functions total, then you can fully get rid of probability_theory.pi_subtype and greatly simplify most proofs.

Reid Barton (Feb 13 2022 at 17:19):

I really don't think this is a good approach in this situation.

Reid Barton (Feb 13 2022 at 17:19):

Or in most situations.

Reid Barton (Feb 13 2022 at 17:20):

But here it doesn't even seem to be applicable--since we are talking about a set of functions on a subset, we can just replace it with the set of all total functions that restrict to a member of the set on that subset.

Rish Vaishnav (Feb 15 2022 at 16:24):

Yaël Dillies said:

Most of your code is actually about those subtyping issues. If you make your functions total, then you can fully get rid of probability_theory.pi_subtype and greatly simplify most proofs.

I'm not sure this would simplify the probability_theory proofs that much. For one thing it would certainly complicate the definition

def marginal (mv : set ι) : measure (Π i : mv, β i) := joint μ (pi_subtype mv f)

(which is just measure.map (λ a i, (pi_subtype mv f) i a) μ). Though of course we won't know unless we try it, unfortunately I'm a bit pressed for time for the next few weeks. FWIW it hasn't really felt like hell yet, and I think it could be good to have a solid library for working with pi subtypes. Going to start PRing pieces of this, perhaps that will be a better place to discuss.

Reid Barton said:

But here it doesn't even seem to be applicable--since we are talking about a set of functions on a subset, we can just replace it with the set of all total functions that restrict to a member of the set on that subset.

Not sure this is the best way either -- I'd imagine in many cases when using joint distributions, you're already working with a subtype of indices from some type. If we were to do this then we'd have to deal with curried pis for the joint distribution and uncurried pis for the marginal, which sounds confusing by itself.

Rish Vaishnav (Feb 27 2022 at 17:13):

I've got two PRs up recently (#12279 and #12343), though it looks like I can't set the awaiting-review label -- would someone mind giving me that permission, rish987 on Github. Thanks!

Mario Carneiro (Feb 27 2022 at 17:15):

@Rish Vaishnav Invite sent. You will need to resubmit your PRs; for technical reasons we can only accept PRs originating from branches in the mathlib repo itself

Mario Carneiro (Feb 27 2022 at 17:16):

that is, push your branch to mathlib, and then reopen the PR for merging the branch into master

Rish Vaishnav (Feb 27 2022 at 17:17):

Thank you! Will do.

Jireh Loreaux (Feb 27 2022 at 17:46):

Is the reason for only accepting mathlib branches is bors and CI?

Kevin Buzzard (Feb 27 2022 at 18:17):

Yes, but CI is a deal breaker given the current state of the repo

Bryan Gin-ge Chen (Feb 27 2022 at 18:18):

The reason is CI, and more specifically that we want our infrastructure to upload oleans for PR branches so that reviewers can more easily check out the code.

(Bors is totally fine with PRs from external repos).

Jireh Loreaux (Feb 27 2022 at 20:37):

I figured, just wanted to make sure I understand. Thanks.

Last updated: Dec 20 2023 at 11:08 UTC