## Stream: Is there code for X?

### Topic: Birthday Problem

#### Hunter Monroe (Apr 13 2021 at 05:21):

The birthday theorem is one of the 100 theorems not in Lean. To include this in lean, would it be sufficient to show 365/365 x 364/365 x ... x 343/365 < 1/2? In other theorem provers, the theorem is stated as follows. So it takes as input the number of people s=23 and the number of days in the year t=365, and then generically gives a function of s and t, such that the inequality holds for those particular s and t.

HOL: |- !s t. s HAS_SIZE 23 /\ t HAS_SIZE 365
==> 2 * CARD {f | f IN (s -> t) /\
?x y. x IN s /\ y IN s /\ ~(x = y) /\ f(x) = f(y)}
>= CARD (s -> t)

Isabelle: assumes "card S = 23" "card T = 365"
shows "2 * card {f ∈ S→\<^sub>E S T. ¬ inj_on f S} ≥ card (S →\<^sub>E T)"

Mizar: let l := picks 23 (enumerate 365) in
2 * length (filter collision l) > length l.

#### Johan Commelin (Apr 13 2021 at 05:24):

Personally, I would want to see it phrased in terms of probability theory.

#### Johan Commelin (Apr 13 2021 at 05:24):

And mathlib doesn't have much in that direction, unfortunately.

#### Johan Commelin (Apr 13 2021 at 06:06):

@Hunter Monroe Actually, I don't think it really has to be probability theory. But it should be a statement that is recognizably the Birthday problem, before any maths has been done.

#### Johan Commelin (Apr 13 2021 at 06:14):

import data.multiset

namespace multiset

variables {α : Type*} [decidable_eq α]

def likely_collision (s : multiset (multiset α)) : Prop :=
2 * (s.filter nodup).card < s.card

theorem birthday : likely_collision ((multiset.range 365).powerset_len 23) :=
sorry

end multiset


#### Johan Commelin (Apr 13 2021 at 06:16):

Or maybe

def likely_collision (s : multiset (multiset α)) : Prop :=
(s.filter nodup).card < (s.filter $λ a, ¬ nodup a).card  #### Johan Commelin (Apr 13 2021 at 06:42): Hmm, it seems that multiset.powerset_len doesn't do what I thought it does. #### Johan Commelin (Apr 13 2021 at 06:59): I think this version is at least math-correct: import data.vector2 import data.fintype.card open fintype (card) variables (α : Type*) [decidable_eq α] [fintype α] def likely_collision (n : ℕ) : Prop := 2 * (finset.univ.filter$ λ l : vector α n, l.to_list.nodup).card < card (vector α n)

theorem birthday : likely_collision (fin 365) 23 :=
begin
unfold likely_collision,
rw [card_vector, fintype.card_fin],
sorry
end


#### Bhavik Mehta (Apr 13 2021 at 13:26):

@Matt Kempster has also been working on this, and I think he's made a reasonable amount of progress

#### Matt Kempster (Apr 13 2021 at 14:28):

Yes I have, should I post what I have so far? It's very WIP but I think it's on the right track. It's not probability theory based at all - just counting numbers of injective functions from fin 23 to fin 365 and dividing by all functions from fun 23 to fin 365. This is basically how Metamath does it

#### Joseph Myers (Apr 14 2021 at 00:07):

The value of fintype.card (α ↪ β) (for arbitrary [fintype α] [fintype β]) seems like it belongs in mathlib proper, with only the evaluation for fin 23 and fin 365 going in the mathlib archive. (The denominator is given by fintype.card_fun.)

#### Matt Kempster (Apr 14 2021 at 05:20):

Yep, very on the same page there. I'll push to a branch soon (tomorrow if I find time), but my work is pretty messy so far. Look forward to input and advice!

#### Hunter Monroe (Apr 15 2021 at 17:27):

We would just need to show (finset.filter (λ (l : vector (fin n) k), l.to_list.nodup) finset.univ).card = n* (left hand side but with n-1, k-1)

#### Yakov Pechersky (Apr 15 2021 at 17:28):

You can use list.of_fn to skip the vector

#### Yakov Pechersky (Apr 15 2021 at 17:31):

You can also probably use things about powerset to automatically get nodup subsets

#### Yakov Pechersky (Apr 15 2021 at 17:32):

docs#finset.card_powerset_len

#### Yakov Pechersky (Apr 15 2021 at 17:32):

docs#finset.powerset_len_eq_filter

#### Yakov Pechersky (Apr 15 2021 at 17:35):

You could generalize to any pi fintype, via something like docs#fintype.card_pi and docs#fintype.card_fun

#### Yakov Pechersky (Apr 15 2021 at 17:44):

Ah I see Joseph said this already

#### Eric Rodriguez (Apr 22 2021 at 16:57):

So I've got a nearly finished version, actually, and just thought I should search on Zulip to check if anyone else has done it, and saw this... Sorry Matt!

#### Mario Carneiro (Apr 22 2021 at 16:58):

well, if a bunch of people work on different problems eventually two people will work on the same problem :grinning:

#### Eric Rodriguez (Apr 22 2021 at 16:58):

One thing that I found is that norm_num is slow at evaluating stuff like 365!/(365-23)!, so I made this class to help; do people think it belongs in mathlib proper?

def reverse_fac : ℕ → ℕ → ℕ
| 0 0       := 1
| 0 (k + 1) := 1
| (n + 1) 0 := 1
| (n + 1) (k + 1) := (n + 1) * reverse_fac n k


#### Eric Rodriguez (Apr 22 2021 at 16:59):

(and basically one lemma guaranteeing that it truly is equal to n! / (n-k)!)

#### Mario Carneiro (Apr 22 2021 at 17:00):

Yeah, if you write 365!/(365-23)! that's what norm_num will evaluate

#### Mario Carneiro (Apr 22 2021 at 17:00):

if you want to evaluate the product of 23 terms instead of 365 you should use a function that evaluates that way

#### Eric Rodriguez (Apr 22 2021 at 17:02):

this is what reverse_fac does^^ just asking whether it's worth putting in mathlib or just keep it in the archive file

#### Mario Carneiro (Apr 22 2021 at 17:04):

isn't there already something like this?

#### Mario Carneiro (Apr 22 2021 at 17:04):

they are called rising and falling factorials IIRC

#### Mario Carneiro (Apr 22 2021 at 17:05):

Oh there is docs#pochhammer but it is only for polynomials

#### Eric Rodriguez (Apr 22 2021 at 17:05):

docs#ring_theory.polynomial.pochhammer, maybe? edit: sniped :b

#### Eric Rodriguez (Apr 22 2021 at 17:06):

i'll try retrofit it with the eval theorems it has there

#### Eric Rodriguez (Apr 22 2021 at 17:07):

because it ended up _not_ being trivial as expected and instead I ended up with this disaster:

theorem eval_reverse_fac : ∀ n k : ℕ, k ≤ n → n! / (n - k)! = reverse_fac n k
| 0 0       := by simp!
| 0 (k + 1) := by simp!
| (n + 1) 0 := λ _, by { rw [nat.sub_zero, nat.div_self], unfold reverse_fac, apply factorial_pos }
| (n + 1) (k + 1) := begin
intro h, replace h := le_of_succ_le_succ h,
simp only [reverse_fac, factorial, succ_eq_add_one, succ_sub_succ_eq_sub],
rw ←eval_reverse_fac _ _ h,

-- at this point, we have two slightly different divisions. we fix them both

apply mul_left_cancel' (factorial_ne_zero (n - k)),
obtain ⟨a, ha⟩ := factorial_dvd_factorial (sub_le n k),
rw nat.mul_div_cancel',
swap, {use a * (n + 1), rw ha, ring},
symmetry,
calc (n - k)! * ((n + 1) * (n! / (n - k)!)) = (n + 1) * ((n - k)! * (n! / (n - k)!)) : by ac_refl
... = (n + 1) * n! : by rw nat.mul_div_cancel' ⟨a, ha⟩
end

-- If I don't put the skip it errors?! (simp won't do this on its own btw)
@[simp] lemma rev_fac_zero (m : ℕ) : reverse_fac m 0 = 1 := by {cases m; simp!, skip}

@[simp] lemma zero_rev_fac (m : ℕ) : reverse_fac 0 m = 1 := by {cases m; simp!, skip}


#### Eric Rodriguez (Apr 22 2021 at 17:07):

but yeah factorials API isn't super well-developed

#### Mario Carneiro (Apr 22 2021 at 17:11):

Note that norm_num doesn't actually know how to compute factorials. It's just rewriting with the simp lemmas for factorial

#### Mario Carneiro (Apr 22 2021 at 17:12):

if you use simp lemmas for reverse_fac it will be just as good

#### Mario Carneiro (Apr 22 2021 at 17:13):

The reason your proof is a disaster is because you used - and / on nat

#### Mario Carneiro (Apr 22 2021 at 17:13):

you should rewrite the theorem to use + and *

#### Eric Rodriguez (Apr 22 2021 at 17:13):

ahh, so instead I could do it as (n + k)! = n! reverse_fac blah

#### Eric Rodriguez (Apr 22 2021 at 17:15):

is it worth expressing the card as a division still or as this function? just thinking as it's going in mathlib

#### Mario Carneiro (Apr 22 2021 at 17:19):

for the birthday problem itself you should use a division for presentation purposes

#### Mario Carneiro (Apr 22 2021 at 17:19):

but once you have proven the lemma with + and * it's not hard to postprocess it to the - and / version

#### Eric Rodriguez (Apr 22 2021 at 17:21):

I mean for the card_embedding function that will be kind of like docs#fintype.card_fun

#### Matt Kempster (Apr 23 2021 at 01:00):

Eric Rodriguez said:

So I've got a nearly finished version, actually, and just thought I should search on Zulip to check if anyone else has done it, and saw this... Sorry Matt!

No worries! I actually would say I was only ~30% done, and was starting to hit some major road blocks. I look forward to seeing your proof to compare & contrast with my skeleton!

#### Eric Rodriguez (Apr 25 2021 at 13:15):

It's done! The birthday problem has now been reduced to a call to norm_num :D Cleaning up the code, PR incoming soon :)

#### Eric Rodriguez (Apr 25 2021 at 14:24):

Hmm, how to make Lean give up on code versions that I've already overwritten? For example, say I'm editing a big file, and I add an import, it'll start screaming at me that data.finset.basi doesn't exist, whilst of course I had typed data.finset.basic and it was just sort of delayed and doing old code, before trying again

Restart vs code?

#### Eric Rodriguez (Apr 25 2021 at 14:35):

hmm, wasn't too helpful. I'll just let CI deal with it, I'm only copy-pasting stuff right now anyways

#### Eric Rodriguez (Apr 25 2021 at 14:36):

I changed stuff in embedding.lean and fintype.lean so I'm assuming it needs to recheck everything again, lol

#### Eric Rodriguez (Apr 25 2021 at 15:04):

There's still a couple things to be done, but all the code itself is there ready to review; the other things are basically admin

Last updated: May 16 2021 at 05:21 UTC