The Coupon Collector Problem

Home

Java Applet Simulation of the coupon collector experiment


Random Variables

We will now study the coupon collector experiment of distributing balls randomly into the n cells indefinitely. Thus, we have the random variables

X1, X2, X3, ... where Xi is the number of the cell containing ball i.

Our basic modeling assumption is that these random variables are independent and each is uniformly distributed on {1, 2, ..., n}. Our random variable of interest is the number of balls required until j cells are occupied. This variable can be written simply in terms of the number of occupied cells when k balls are distributed into n cells:

Wn,j = min{k: Vn,k = j}

The possible values of this variable are j, j + 1, j + 2, ....

The Simulation

In the coupon collector experiment, the total number of cells n and the number of cells to be occupied j can be varied with scroll bars. When you run the experiment, the cell counts of the occupied cells are shown in the large table. The density and moments of the number of balls required W is displayed in a table and in a graph. The value of W on each update is recorded in a table.

Simulation Exercise 1. In the coupon collector experiment, set n = 50 and vary j. Note the shape of and position of the density graph. With j = 20, run the experiment in single-step mode a few times and watch the outcomes. Finally, run the experiment 1000 times with an update frequency of 10 and note the apparent convergence of the relative frequency distribution to the true distribution.

The Density Function

Now let's find the distribution of the number of balls needed to occupy j cells. The results of the previous section will be very helpful

Mathematical Exercise 2. Argue that

Wn,j = k if and only if Vn,k-1 = j - 1 and Vn,k = j.

Mathematical Exercise 3. Use Exercise 2 and a conditional probability argument to show that

Simulation Exercise 4. In the coupon collector experiment, set n = 100 and vary j. Note the shape of and position of the density graph. With j = 50, run the experiment in single-step mode a few times and watch the outcomes. Finally, run the experiment 1000 times with an update frequency of 10 and note the apparent convergence of the relative frequency distribution to the true distribution.

Moments

We will now show that the number of balls needed to occupy j cells can be decomposed as a sum of j independent variables. This will provide some additional insight into the nature of the distribution and will make the computation of the mean and variance easy.

For i = 1, 2, ... n, let Yi denote the number of additional balls needed to go from i - 1 occupied cells to i occupied cells.

Mathematical Exercise 5. Argue that

  1. Y1, Y2, ..., Yn are independent.
  2. Yi has the geometric distribution with parameter pi = (n - i + 1) / n.
  3. Wn,j = Y1 + Y2 + ··· + Yn.

Exercise 5 shows in a clear way that, each time a new cell is occupied, it becomes harder for another new cell to become occupied.

Simulation Exercise 6. In the coupon collector experiment, set n = 30 and vary j. Note the shape of and position of the density graph. With j = 25, run the experiment in single-step mode a few times and watch the outcomes. Finally, run the experiment 1000 times with an update frequency of 10 and note the apparent convergence of the sample statistics to the distribution parameters.

Mathematical Exercise 7. Use the results of Exercise 5 to show that

Mathematical Exercise 8. Use the results of Exercise 5 to show that

Mathematical Exercise 9. Use Exercises 7 and 8 to explicitly compute the mean and variance of the number of balls needed to occupy all j = 10 cells when there are n = 10 cells.

Simulation Exercise 10. In the coupon collector experiment, set n = 10 and vary j. Note the shape of and position of the density graph. With j = 10, run the experiment in single-step mode a few times and watch the outcomes. Finally, run the experiment 1000 times with an update frequency of 10 and note the apparent convergence of the sample statistics to the distribution parameters.

Mathematical Exercise 11. Use the Result of Exercise 5 to find the moment generating function of the number of balls needed to occupy j cells.

Recurrence Relation

An alternate approach to the distribution of the number of balls needed to occupy j cells is via a recursion formula.

Mathematical Exercise 12. Let cn,j(k) = P(Wn,j = k) for k = j, j + 1, ...

Use a conditional probability argument to show that


Ball and Cell Experiments

Previous