The General Birthday Problem

Home

Java Applet Simulation of the general birthday experiment


Random Variables

Consider again the birthday experiment of distributing k balls randomly into n cells. We record the basic random vector

X = (X1, X2, ..., Xk)

where Xi is the number of the cell containing ball i. Mathematically, our modeling assumption is that X is uniformly distributed on the sample space

S = {1, 2, ..., n}k.

We are interested in two related random variables: the number of empty cells:

and the number of occupied cells:

Note that any cell is either empty or occupied so

Un,k + Vn,k = n.

Thus, once we have the probability distribution and moments of one variable, we can easily find them for the other variable.

Note also that the simple birthday event that at least one cell has multiple balls can be written as

{Vn,k < k} = {Un,k > n - k}

The Simulation

In the simulation of the general birthday experiment, the parameters n and k can be varied with scroll bars. When you run the simulation, the basic variables Xi and the cell counts of the occupied cells are shown on each update. The density function and moments of the number of occupied cells V is displayed in the distribution graph and in the distribution table. The value of V is recorded on each update in the record table.

Simulation Exercise 1. In the birthday experiment, set n = 100. Vary k and note the shape of the density graph and its position within the range. With k = 50, run the experiment in single-step mode a few times and observe the outcomes. Now run the experiment 1000 times with an update frequency of 10 and note the apparent convergence of the relative frequency distribution to the true distribution.

The Density Function

Now let N = {1, 2, ..., n} denote the set of cell indices and for j in N, consider the event that cell j is empty:

Let M be a subset of N with #(M) = m. Using the multiplication principle of combinatorics, it is easy to count the number of ball distributions that leave all the cells indexed by the elements of M empty:

Mathematical Exercise 2. Show that

Now the inclusion-exclusion rule of combinatorics can be used to count the number of ball distributions that leave at least one cell empty:

Mathematical Exercise 3. Show that

Once we have this, it's simple to count the number of ball distributions that leave no cell empty.

Mathematical Exercise 4. Show that

Now we can use a two-step procedure to generate all ball distributions that leave exactly j cells empty: first, choose the j cells that are to be empty. Then distribute the k balls into the remaining cells so that none are empty. Thus, we can use the multiplication principle of combinatorics to count the number of ball distributions that leave exactly j cells empty.

Mathematical Exercise 5. Show that

Finally, since our probability distribution P on the sample space is uniform, we can find the density function of the number of empty cells:

Mathematical Exercise 6. Show that

Also, we can easily find the density function of the number of occupied cells:

Mathematical Exercise 7. Show that

Simulation Exercise 8. In the birthday experiment, set n = 50. Vary k and note the shape of the density graph and its position within the range. With k = 50 run the experiment in single-step mode a few times and watch the outcomes. Now run the experiment 1000 times with an update frequency of 10 and note the apparent convergence of the relative frequency distribution to the true distribution.

The distribution of the number of empty cells can also be obtained by a recursion argument.

Mathematical Exercise 9. Let

Use probability arguments to show that

Simulation Exercise 10. In the birthday experiment, set n = 20. Vary k and note the shape of the density graph and its position within the range. Run the experiment in single-step mode a few times and watch the outcomes. Now run the experiment 1000 times with an update frequency of 10 and note the apparent convergence of the relative frequency distribution to the true distribution.

Moments

Now we will find the means and variances. The number of empty cells and the number of occupied cells are counting variables and hence can be written as sums of indicator variables. As we have seen in many other models, this representation is frequently the best for computing moments.

Let Ij = 1 if Aj occurs (cell j is empty) and Ij = 0 if Aj does not occur (cell j is occupied).

Note that the number of empty cells can be written as

Un,k = I1 + I2 + ··· + In.

Mathematical Exercise 11. Show that

Mathematical Exercise 12. Use the results of Exercise 11 to show that

Mathematical Exercise 13. Use the result of Exercise 12 to show that

Mathematical Exercise 14. Use the results of Exercise 13 and basic properties of variance to show that

Simulation Exercise 15. In the birthday experiment set n = 80. Vary k and note the shape of the density graph and the position and size of the mean-standard deviation bar. With k = 40 run the experiment in single-step mode a few times and watch the outcomes. Now run the experiment 1000 times with an update frequency of 10 and note the apparent convergence of the sample statistics to the distribution parameters.

Cell Counts

Mathematical Exercise 16. Let K = (K1, K2, ..., Kn) where Ki is the number of balls in cell i for i = 1, 2, ..., n. Show that K has the multinomial distribution:


Ball and Cell Experiments

PreviousNext