Zulip Chat Archive

Stream: new members

Topic: Looking for how to contribute to mathlib


Torey Hilbert (Mar 11 2022 at 05:07):

Hello, I am a student in statistics and I'm interested in trying to contribute to mathlib for topics in either probability theory or theoretical statistics (be that Markov Chains / Stochastics stuff, or stuff about distributions) but I'm not what to do.

I found in src/measure_theory/probability_mass_function/constructions.lean that currently only three-ish distributions are present. Is a desireble first contribution adding a few more common distributions there, and if so, how would I decide which ones? Thank you so much!

Torey Hilbert (Mar 11 2022 at 05:09):

Also, it's unclear why it's in the measure theory folder when there's a separate folder for proabability, but I figure there's a reason

Johan Commelin (Mar 11 2022 at 06:03):

Welcome! The folder issue might very well be due to history...

Johan Commelin (Mar 11 2022 at 06:04):

There's currently stuff happening with martingales.

Johan Commelin (Mar 11 2022 at 06:04):

@Torey Hilbert Do you already have some lean code somewhere, that you would like to contribute?

Torey Hilbert (Mar 11 2022 at 06:06):

Not yet, the contributing to mathlib guide suggests asking the Zulip chat before and during writing the code so I figured I wouldn't embark on something if the pulse of the project implied it was a bad idea etc.

Torey Hilbert (Mar 11 2022 at 06:06):

Thank you for the response btw!

Johan Commelin (Mar 11 2022 at 06:09):

Yeah, discussing here early on is certainly a good idea!

Johan Commelin (Mar 11 2022 at 06:09):

How much experience do you have with Lean? Did you follow the tutorials? Or #nng, or something else?

Torey Hilbert (Mar 11 2022 at 07:10):

Ope sorry for the slow response. I don't have a ton of experience with Lean - I played the natural numbers game a while ago and I'm working through both the tutorials and the measure theory code directly.

Kevin Buzzard (Mar 11 2022 at 07:47):

Martin Hairer once told me that he couldn't take Lean seriously until it had an API for the Gaussian distribution

Johan Commelin (Mar 11 2022 at 07:52):

Which seems like a fair point (if you s/Lean/mathlib/)

Vincent Beffara (Mar 11 2022 at 08:15):

You could try to do e.g. the "hard" direction of Borel-Cantelli (see docs#measure_theory.measure_limsup_eq_zero for the "easy" direction) or start filling in something else in undergrad.yaml, say the laws of large numbers? It could be more motivating than adding the definitions of the usual distributions!

Rémy Degenne (Mar 11 2022 at 08:36):

Welcome! You may want to have a look at this topic: https://leanprover.zulipchat.com/#narrow/stream/113489-new-members/topic/Getting.20started.20contributing.20to.20probability.20theory
Since you mentioned markov chains: a useful and interesting direction would be to define markov kernels and conditional probability distributions (https://en.wikipedia.org/wiki/Regular_conditional_probability). The book "foundations of modern probability" by Kallenberg looks like a good reference for that.

Yaël Dillies (Mar 11 2022 at 08:48):

There's the binomial PMF in branch#hedetniemi.

Kalle Kytölä (Mar 11 2022 at 23:26):

I agree with @Rémy Degenne that kernels as in Kallenberg looks like a great approach to Markov processes (at least discrete time, but probably beyond).

We have had some discussion that @Jason KY. might do Markov processes in the summer (I'm very happy that he'll be with us at Aalto during the summer :tada:). So to coordinate who is working on what in this area is a good idea. To be clear, I will just be extremely happy when anyone puts Markov processes in mathlib!

Kalle Kytölä (Mar 11 2022 at 23:30):

I also agree with Vincent (how could I not).

But having various common distributions with more or less similar APIs added to mathlib would be very valuable! And it would not just be about giving the definitions of the distributions: probably one would want to prove a few properties about each distribution to stress test the implementation of the definition. (Say, provide some lemmas about characterizations, at least the cdf and perhaps characteristic function and more interestingly specific ones for each distribution, lemmas about properties such as expected value and variance, etc.). Also many distributions would probably be defined by specifying either their densities or characteristic functions (the latter of which admittedly don't exist yet), so they would serve as tests for parts of the existing library (integration, perhaps Radon-Nikodym theorem, perhaps the still-non-existent characteristic functions, ...). Depending on the background, all of this this might be quite nice!

Torey Hilbert (Mar 12 2022 at 04:44):

This is very helpful information! I'll try to work towards the Markov Kernels for now, but I'll probably need some guidance and won't hesitate to ask :+1:

Torey Hilbert (Mar 12 2022 at 04:45):

About the characteristic functions thing, a comment in one of the files mentions that we are waiting on Fourier Analysis results for that. Is it worth pursing just directly proving the relevant results we'd need about characteristic functions instead of waiting?

Kevin Buzzard (Mar 12 2022 at 09:13):

@Torey Hilbert for a recent overview of a related area you might like @Jason KY. 's London Learning Lean talk here.

Vincent Beffara (Mar 12 2022 at 09:20):

Perhaps while waiting for Fourier Analysis it would make sense to start with discrete distributions on the integers and to define the generating function instead.


Last updated: Dec 20 2023 at 11:08 UTC