A new take on the birthday problem
Vitalii Tymchyshyn and Andrii Khlevniuk posted a new paper here entitled "On the average number of birthdays in birthday paradox setup." This paper gives a different perspective on the birthday problem and a new proof.
In the usual formation of the birthday problem, one asks the probability that everyone in a group of size n has a different birthday, or one asks how big n needs to be for the probability to be approximately 1/2 that two people share a birthday. Tymchyshyn and Khlevniuk ask about the expected number of unique birthdays in a group of size n and show that it equals
(1 - In) / (1 - I)
where I = 364/365.
Variations on the birthday problem often come up in practice. For example, you often need to consider how likely it is that a hash function will map set of inputs to unique values. There's a rule of thumb that if there are N possible hash values, then there's a 50-50 chance of a collision if you hash aN values.
Let I = (N-1)/N and p = 1/N. If we take the first three terms in the Taylor series for (1 - p)n we see that the expected number of unique hash values after hashing n inputs is approximately
n - n(n - 1) p/2.
If we choose n = aN, then the expected number of unique hash values is approximately
n - 1/2,
which is consistent with having about a 0.5 probability of one collision.
Related posts