The Simple Birthday Problem

Home

Java Applet Simulation of the simple birthday experiment


The Mathematical Model

Consider the birthday experiment of distributing k balls into the n cells. We will record the outcome of the experiment as the random vector

X = (X1, X2, ..., Xk)

where Xi is the number of the cell containing ball i.

Thus, the sample space is

S = {1, 2, ..., n}k.

Our basic modeling assumption is that the balls are independently and randomly placed in the cells. Mathematically this means that the basic outcome variables

X1, X2, ..., Xk

are independent and each is uniformly distributed on {1, 2, ..., n}.

Mathematical Exercise 1. Show that the modeling assumption is equivalent to X being uniformly distributed on S.

The Birthday Probability

The simple birthday problem is to compute the probability of the event

Bn,k: some cell contains 2 or more balls.

If we choose k people at random and note their birthdays (thus n = 365), then the birthday problem is to compute the probability that at least two people have the same birthday (this special case is the origin of the name).

Mathematical Exercise 2. Show that in terms of the basic outcome variables

Bn,k = {Xi = Xj for at least one distint pair of indices i, j}

The solution of the birthday problem is an easy exercise in combinatorial probability.

Mathematical Exercise 3. Use the multiplication rule of combinatorics and Exercise 1 to show that

Hint: The complementary event occurs if and only if the outcome vector X forms a permutation of size k from {1, 2, ..., n}

The fact that the probability is 1 for k > n is sometimes referred to as the pigeonhole principle: if more than n pigeons are placed into n holes then at least one hole has 2 or more pigeons.

Mathematical Exercise 4. Let n = 365 (the true birthday problem). Show that the probability that some cell contains 2 or more balls is

  1. 0.411 for k = 20
  2. 0.706 for k = 30
  3. 0.891 for k = 40
  4. 0.970 for k = 50
  5. 0.994 for k = 60

The Simulation

In the simulation of the birthday experiment, you can vary n and k with scroll bars. When you run the experiment, the basic varibles Xi and the cell counts of the occupied cells are shown on each update. The indicator variable for the birthday event is recorded on each update and the density function is shown in the distribution graph and table.

Simulation Exercise 5. In the birthday experiment, set n = 365. For k = 20, 30, 40, 50, and 60, run the experiment 1000 times each and compute the relative frequency of the event that some cell contains 2 or more balls. Compare the relative frequencies with the probabilities computed in Exercise 4.

In spite of its easy solution, the simple birthday problem is famous because, numerically, the probabilities can be a bit surprising. In the setting of Exercise 4, note that with a just 60 people, the event is almost certain! Mathematically, the rapid increase in the birthday probability in Exercise 3, as k increases, is due to the fact that the denominator grows much faster than the numerator.

Simulation Exercise 6. In the birthday experiment, set n = 365. Vary k and note graphically how the probability changes.

Simulation Exercise 7. Repeat Exercise 6 with n = 100.

Simulation Exercise 8. Repeat Exercise 6 with n = 200.

Recurrence

Mathematical Exercise 9. Let bn,k denote the probability of the complementary event, that no cell contains two or more balls. Prove the following recursion relation in two ways: first starting with the result in Exercise 3, and then by using a conditional probability argument.

Mathematical Exercise 10. Let n = 52 (corresponding to birth weeks). Use Exercise 9 to find the smallest value of k such that the probability that some cell contains 2 or more balls at least 1 / 2.

Simulation Exercise 11. Run the birthday experiment 1000 times with n = 52 and the value of k that you found in Exercise 10. Compare the relative frequency of the event that some cell contains 2 or more balls with the probability that you found in Exercise 10.


Ball and Cell Experiments

PreviousNext