A new take on the birthday problem

John

from John D. Cook on 2019-12-28 21:45 (#4X5H0)

Vitalii Tymchyshyn and Andrii Khlevniuk posted a new paper here entitled "On the average number of birthdays in birthday paradox setup." This paper gives a different perspective on the birthday problem and a new proof.

In the usual formation of the birthday problem, one asks the probability that everyone in a group of size n has a different birthday, or one asks how big n needs to be for the probability to be approximately 1/2 that two people share a birthday. Tymchyshyn and Khlevniuk ask about the expected number of unique birthdays in a group of size n and show that it equals

(1 - Iⁿ) / (1 - I)

where I = 364/365.

Variations on the birthday problem often come up in practice. For example, you often need to consider how likely it is that a hash function will map set of inputs to unique values. There's a rule of thumb that if there are N possible hash values, then there's a 50-50 chance of a collision if you hash aN values.

Let I = (N-1)/N and p = 1/N. If we take the first three terms in the Taylor series for (1 - p)ⁿ we see that the expected number of unique hash values after hashing n inputs is approximately

n - n(n - 1) p/2.

If we choose n = aN, then the expected number of unique hash values is approximately

n - 1/2,

which is consistent with having about a 0.5 probability of one collision.

Source	RSS or Atom Feed
Feed Location	http://feeds.feedburner.com/TheEndeavour?format=xml
Feed Title	John D. Cook
Feed Link	https://www.johndcook.com/blog