Zulip Chat Archive

Stream: Is there code for X?

Topic: Birthday Problem


Hunter Monroe (Apr 13 2021 at 05:21):

The birthday theorem is one of the 100 theorems not in Lean. To include this in lean, would it be sufficient to show 365/365 x 364/365 x ... x 343/365 < 1/2? In other theorem provers, the theorem is stated as follows. So it takes as input the number of people s=23 and the number of days in the year t=365, and then generically gives a function of s and t, such that the inequality holds for those particular s and t.

HOL: |- !s t. s HAS_SIZE 23 /\ t HAS_SIZE 365
==> 2 * CARD {f | f IN (s -> t) /\
?x y. x IN s /\ y IN s /\ ~(x = y) /\ f(x) = f(y)}
>= CARD (s -> t)

Isabelle: assumes "card S = 23" "card T = 365"
shows "2 * card {f ∈ S→\<^sub>E S T. ¬ inj_on f S} ≥ card (S →\<^sub>E T)"

Mizar: let l := picks 23 (enumerate 365) in
2 * length (filter collision l) > length l.

Johan Commelin (Apr 13 2021 at 05:24):

Personally, I would want to see it phrased in terms of probability theory.

Johan Commelin (Apr 13 2021 at 05:24):

And mathlib doesn't have much in that direction, unfortunately.

Johan Commelin (Apr 13 2021 at 06:06):

@Hunter Monroe Actually, I don't think it really has to be probability theory. But it should be a statement that is recognizably the Birthday problem, before any maths has been done.

Johan Commelin (Apr 13 2021 at 06:14):

import data.multiset

namespace multiset

variables {α : Type*} [decidable_eq α]

def likely_collision (s : multiset (multiset α)) : Prop :=
2 * (s.filter nodup).card < s.card

theorem birthday : likely_collision ((multiset.range 365).powerset_len 23) :=
sorry

end multiset

Johan Commelin (Apr 13 2021 at 06:16):

Or maybe

def likely_collision (s : multiset (multiset α)) : Prop :=
(s.filter nodup).card < (s.filter $ λ a, ¬ nodup a).card

Johan Commelin (Apr 13 2021 at 06:42):

Hmm, it seems that multiset.powerset_len doesn't do what I thought it does.

Johan Commelin (Apr 13 2021 at 06:59):

I think this version is at least math-correct:

import data.vector2
import data.fintype.card

open fintype (card)

variables (α : Type*) [decidable_eq α] [fintype α]

def likely_collision (n : ) : Prop :=
2 * (finset.univ.filter $ λ l : vector α n, l.to_list.nodup).card < card (vector α n)

theorem birthday : likely_collision (fin 365) 23 :=
begin
  unfold likely_collision,
  rw [card_vector, fintype.card_fin],
  sorry
end

Bhavik Mehta (Apr 13 2021 at 13:26):

@Matt Kempster has also been working on this, and I think he's made a reasonable amount of progress

Matt Kempster (Apr 13 2021 at 14:28):

Yes I have, should I post what I have so far? It's very WIP but I think it's on the right track. It's not probability theory based at all - just counting numbers of injective functions from fin 23 to fin 365 and dividing by all functions from fun 23 to fin 365. This is basically how Metamath does it

Joseph Myers (Apr 14 2021 at 00:07):

The value of fintype.card (α ↪ β) (for arbitrary [fintype α] [fintype β]) seems like it belongs in mathlib proper, with only the evaluation for fin 23 and fin 365 going in the mathlib archive. (The denominator is given by fintype.card_fun.)

Matt Kempster (Apr 14 2021 at 05:20):

Yep, very on the same page there. I'll push to a branch soon (tomorrow if I find time), but my work is pretty messy so far. Look forward to input and advice!

Hunter Monroe (Apr 15 2021 at 17:27):

We would just need to show (finset.filter (λ (l : vector (fin n) k), l.to_list.nodup) finset.univ).card = n* (left hand side but with n-1, k-1)

Yakov Pechersky (Apr 15 2021 at 17:28):

You can use list.of_fn to skip the vector

Yakov Pechersky (Apr 15 2021 at 17:31):

You can also probably use things about powerset to automatically get nodup subsets

Yakov Pechersky (Apr 15 2021 at 17:32):

docs#finset.card_powerset_len

Yakov Pechersky (Apr 15 2021 at 17:32):

docs#finset.powerset_len_eq_filter

Yakov Pechersky (Apr 15 2021 at 17:35):

You could generalize to any pi fintype, via something like docs#fintype.card_pi and docs#fintype.card_fun

Yakov Pechersky (Apr 15 2021 at 17:44):

Ah I see Joseph said this already

Eric Rodriguez (Apr 22 2021 at 16:57):

So I've got a nearly finished version, actually, and just thought I should search on Zulip to check if anyone else has done it, and saw this... Sorry Matt!

Mario Carneiro (Apr 22 2021 at 16:58):

well, if a bunch of people work on different problems eventually two people will work on the same problem :grinning:

Eric Rodriguez (Apr 22 2021 at 16:58):

One thing that I found is that norm_num is slow at evaluating stuff like 365!/(365-23)!, so I made this class to help; do people think it belongs in mathlib proper?

def reverse_fac :     
| 0 0       := 1
| 0 (k + 1) := 1
| (n + 1) 0 := 1
| (n + 1) (k + 1) := (n + 1) * reverse_fac n k

Eric Rodriguez (Apr 22 2021 at 16:59):

(and basically one lemma guaranteeing that it truly is equal to n! / (n-k)!)

Mario Carneiro (Apr 22 2021 at 17:00):

Yeah, if you write 365!/(365-23)! that's what norm_num will evaluate

Mario Carneiro (Apr 22 2021 at 17:00):

if you want to evaluate the product of 23 terms instead of 365 you should use a function that evaluates that way

Eric Rodriguez (Apr 22 2021 at 17:02):

this is what reverse_fac does^^ just asking whether it's worth putting in mathlib or just keep it in the archive file

Mario Carneiro (Apr 22 2021 at 17:04):

isn't there already something like this?

Mario Carneiro (Apr 22 2021 at 17:04):

they are called rising and falling factorials IIRC

Mario Carneiro (Apr 22 2021 at 17:05):

Oh there is docs#pochhammer but it is only for polynomials

Eric Rodriguez (Apr 22 2021 at 17:05):

docs#ring_theory.polynomial.pochhammer, maybe? edit: sniped :b

Eric Rodriguez (Apr 22 2021 at 17:06):

i'll try retrofit it with the eval theorems it has there

Eric Rodriguez (Apr 22 2021 at 17:07):

because it ended up _not_ being trivial as expected and instead I ended up with this disaster:

theorem eval_reverse_fac :  n k : , k  n  n! / (n - k)! = reverse_fac n k
| 0 0       := by simp!
| 0 (k + 1) := by simp!
| (n + 1) 0 := λ _, by { rw [nat.sub_zero, nat.div_self], unfold reverse_fac, apply factorial_pos }
| (n + 1) (k + 1) := begin
  intro h, replace h := le_of_succ_le_succ h,
  simp only [reverse_fac, factorial, succ_eq_add_one, succ_sub_succ_eq_sub],
  rw eval_reverse_fac _ _ h,

  -- at this point, we have two slightly different divisions. we fix them both

  apply mul_left_cancel' (factorial_ne_zero (n - k)),
  obtain a, ha := factorial_dvd_factorial (sub_le n k),
  rw nat.mul_div_cancel',
    swap, {use a * (n + 1), rw ha, ring},
  symmetry,
  calc (n - k)! * ((n + 1) * (n! / (n - k)!)) = (n + 1) * ((n - k)! * (n! / (n - k)!)) : by ac_refl
                                          ... = (n + 1) * n! : by rw nat.mul_div_cancel' a, ha
end

-- If I don't put the `skip` it errors?! (`simp` won't do this on its own btw)
@[simp] lemma rev_fac_zero (m : ) : reverse_fac m 0 = 1 := by {cases m; simp!, skip}

@[simp] lemma zero_rev_fac (m : ) : reverse_fac 0 m = 1 := by {cases m; simp!, skip}

Eric Rodriguez (Apr 22 2021 at 17:07):

but yeah factorials API isn't super well-developed

Mario Carneiro (Apr 22 2021 at 17:11):

Note that norm_num doesn't actually know how to compute factorials. It's just rewriting with the simp lemmas for factorial

Mario Carneiro (Apr 22 2021 at 17:12):

if you use simp lemmas for reverse_fac it will be just as good

Mario Carneiro (Apr 22 2021 at 17:13):

The reason your proof is a disaster is because you used - and / on nat

Mario Carneiro (Apr 22 2021 at 17:13):

you should rewrite the theorem to use + and *

Eric Rodriguez (Apr 22 2021 at 17:13):

ahh, so instead I could do it as (n + k)! = n! reverse_fac blah

Eric Rodriguez (Apr 22 2021 at 17:15):

is it worth expressing the card as a division still or as this function? just thinking as it's going in mathlib

Mario Carneiro (Apr 22 2021 at 17:19):

for the birthday problem itself you should use a division for presentation purposes

Mario Carneiro (Apr 22 2021 at 17:19):

but once you have proven the lemma with + and * it's not hard to postprocess it to the - and / version

Eric Rodriguez (Apr 22 2021 at 17:21):

I mean for the card_embedding function that will be kind of like docs#fintype.card_fun

Matt Kempster (Apr 23 2021 at 01:00):

Eric Rodriguez said:

So I've got a nearly finished version, actually, and just thought I should search on Zulip to check if anyone else has done it, and saw this... Sorry Matt!

No worries! I actually would say I was only ~30% done, and was starting to hit some major road blocks. I look forward to seeing your proof to compare & contrast with my skeleton!

Eric Rodriguez (Apr 25 2021 at 13:15):

It's done! The birthday problem has now been reduced to a call to norm_num :D Cleaning up the code, PR incoming soon :)

Eric Rodriguez (Apr 25 2021 at 14:24):

Hmm, how to make Lean give up on code versions that I've already overwritten? For example, say I'm editing a big file, and I add an import, it'll start screaming at me that data.finset.basi doesn't exist, whilst of course I had typed data.finset.basic and it was just sort of delayed and doing old code, before trying again

Eric Wieser (Apr 25 2021 at 14:27):

Restart vs code?

Eric Rodriguez (Apr 25 2021 at 14:35):

hmm, wasn't too helpful. I'll just let CI deal with it, I'm only copy-pasting stuff right now anyways

Eric Rodriguez (Apr 25 2021 at 14:36):

I changed stuff in embedding.lean and fintype.lean so I'm assuming it needs to recheck everything again, lol

Eric Rodriguez (Apr 25 2021 at 15:02):

#7363!

Eric Rodriguez (Apr 25 2021 at 15:04):

There's still a couple things to be done, but all the code itself is there ready to review; the other things are basically admin


Last updated: Dec 20 2023 at 11:08 UTC