Zulip Chat Archive
Stream: mathlib4
Topic: Undergrad looking for Probability Theory formalization ideas
huaizhangchu (Sep 28 2025 at 07:13):
Hi everyone,
I'm a final-year undergraduate student very interested in contributing to mathlib as part of my graduation project. My academic background is in Mathematics, with a strong focus on Probability Theory.
I've been exploring the Mathlib/Probability module and find it fascinating. I'm looking for a project idea that:
- Is a foundational proof or construction in probability theory.
- Is suitable in scope and difficulty for an undergraduate thesis (i.e., not cutting-edge research, but still a substantial piece of work).
- Is not yet fully covered or could be significantly extended within mathlib.
I've done some initial research and here are a few areas I've considered, based on what I believe might be gaps or areas for extension (though I might be mistaken!):
Specific Concentration Inequalities: Perhaps more direct proofs for specific simple cases of Hoeffding's or Chernoff Bounds, or other foundational inequalities beyond Markov/Chebyshev.
Alternative/Stronger WLLN/SLLN versions: For instance, Etemadi's Strong Law of Large Numbers, or WLLN in different Lp spaces, if not already covered comprehensively.
Basic Poisson Process constructions/properties: Formalizing the definition of a Poisson process and proving its fundamental properties (e.g., independent/stationary increments, exponential inter-arrival times).
Deeper aspects of convergence relations: Exploring more detailed proofs or equivalences between different modes of convergence (e.g., probability, a.e., Lp, in distribution), possibly involving Egorov's theorem in probability contexts.
Rao-Blackwell Theorem: Formalizing this important result in statistical inference, which relies heavily on conditional expectation.
I'm open to other suggestions as well! My goal is to make a meaningful and useful contribution to mathlib while learning more about formal verification.
Could anyone provide guidance on:
Which of these (or other) areas would be most suitable and impactful for an undergraduate project?
Are there existing mathlib TODO s or discussions related to these topics?
Where would be a good starting point or relevant mathlib files to investigate further for a chosen project?
Thank you for any guidance – I'm excited to learn and contribute to this amazing project!
Rémy Degenne (Sep 28 2025 at 07:37):
Hi! This is not really an answer to the question of what you should work on, but a few remarks about what's already there or not:
- Our SLLN is Etemadi's: ProbabilityTheory.strong_law_ae
- Egorov's theorem is here: MeasureTheory.tendstoUniformlyOn_of_ae_tendsto
- Rao-Blackwell would be nice, but note that we don't have a definition of sufficient statistic, so it would involve more work than just proving the theorem. I very recently started adding results about statistical estimation but there is almost nothing right now
- More concentration inequalities: yes, please! Bernstein inequality perhaps? I did not think about it much.
Kevin Buzzard (Sep 28 2025 at 09:51):
Egorov's theorem was also originally written as a final year undergraduate project :-) (but by someone who already had several years of Lean experience).
huaizhangchu (Sep 28 2025 at 10:50):
Rémy Degenne 說:
Hi! This is not really an answer to the question of what you should work on, but a few remarks about what's already there or not:
- Our SLLN is Etemadi's: ProbabilityTheory.strong_law_ae
- Egorov's theorem is here: MeasureTheory.tendstoUniformlyOn_of_ae_tendsto
- Rao-Blackwell would be nice, but note that we don't have a definition of sufficient statistic, so it would involve more work than just proving the theorem. I very recently started adding results about statistical estimation but there is almost nothing right now
- More concentration inequalities: yes, please! Bernstein inequality perhaps? I did not think about it much.
Hi Rémy Degenne,
Thank you so much for this incredibly helpful and clear feedback—this saves me so much time from duplicating work or going down paths that are already covered, and it’s really clarifying for narrowing down my project ideas!
I really appreciate you pointing out that Etemadi’s SLLN and Egorov’s theorem are already in mathlib—I’d totally missed those in my initial exploration, so that’s a key correction. That helps me cross those off my list and focus on gaps.
Your suggestion of the Bernstein inequality is really exciting—I’d mentioned concentration inequalities as a top interest, and this feels like a perfect fit. Could you share if there are any existing results in mathlib (e.g., Markov/Chebyshev bounds, moment generating function frameworks) that I could build on for this? Or are there specific files or lemmas I should dive into first to align with mathlib’s existing probability infrastructure?
Finally, if you or others in the community have thoughts on priority among these feasible options , that would be invaluable. And are there any TODOs, GitHub issues, or Zulip discussions related to these topics I should look up to get more context?
Thanks again for taking the time to share this—your input has already made my project planning so much more focused. I really appreciate the guidance!
huaizhangchu (Sep 28 2025 at 10:51):
Kevin Buzzard 說:
Egorov's theorem was also originally written as a final year undergraduate project :-) (but by someone who already had several years of Lean experience).
That’s fascinating to hear! It’s great to know these foundational formalizations do fit into undergrad projects—exactly the kind of sign I was hoping for. I totally get the Lean experience gap, though—right now I’m working through the Lean tutorials and poking around the probability theory files in mathlib to get used to how proofs are structured there. Hearing this just makes me more focused on putting in that groundwork first. Really appreciate you telling me that!
Last updated: Dec 20 2025 at 21:32 UTC