The Coupon Collector Problem |
Simulation of the coupon collector experiment
We will now study the coupon collector experiment of distributing balls randomly into the n cells indefinitely. Thus, we have the random variables
X1, X2, X3, ... where Xi is the number of the cell containing ball i.
Our basic modeling assumption is that these random variables are independent and each is uniformly distributed on {1, 2, ..., n}. Our random variable of interest is the number of balls required until j cells are occupied. This variable can be written simply in terms of the number of occupied cells when k balls are distributed into n cells:
Wn,j = min{k: Vn,k = j}
The possible values of this variable are j, j + 1, j + 2, ....
In the coupon collector experiment, the total number of cells n and the number of cells to be occupied j can be varied with scroll bars. When you run the experiment, the cell counts of the occupied cells are shown in the large table. The density and moments of the number of balls required W is displayed in a table and in a graph. The value of W on each update is recorded in a table.
1. In the coupon collector experiment, set n = 50 and vary j. Note the shape of and position of the density graph. With j = 20, run the experiment in single-step mode a few times and watch the outcomes. Finally, run the experiment 1000 times with an update frequency of 10 and note the apparent convergence of the relative frequency distribution to the true distribution.
Now let's find the distribution of the number of balls needed to occupy j cells. The results of the previous section will be very helpful
2. Argue that
Wn,j = k if and only if Vn,k-1 = j - 1 and Vn,k = j.
3. Use Exercise 2 and a conditional probability argument to show that
4. In the coupon collector experiment, set n = 100 and vary j. Note the shape of and position of the density graph. With j = 50, run the experiment in single-step mode a few times and watch the outcomes. Finally, run the experiment 1000 times with an update frequency of 10 and note the apparent convergence of the relative frequency distribution to the true distribution.
We will now show that the number of balls needed to occupy j cells can be decomposed as a sum of j independent variables. This will provide some additional insight into the nature of the distribution and will make the computation of the mean and variance easy.
For i = 1, 2, ... n, let Yi denote the number of additional balls needed to go from i - 1 occupied cells to i occupied cells.
5. Argue that
Exercise 5 shows in a clear way that, each time a new cell is occupied, it becomes harder for another new cell to become occupied.
6. In the coupon collector experiment, set n = 30 and vary j. Note the shape of and position of the density graph. With j = 25, run the experiment in single-step mode a few times and watch the outcomes. Finally, run the experiment 1000 times with an update frequency of 10 and note the apparent convergence of the sample statistics to the distribution parameters.
7. Use the results of Exercise 5 to show that
8. Use the results of Exercise 5 to show that
9. Use Exercises 7 and 8 to explicitly compute the mean and variance of the number of balls needed to occupy all j = 10 cells when there are n = 10 cells.
10. In the coupon collector experiment, set n = 10 and vary j. Note the shape of and position of the density graph. With j = 10, run the experiment in single-step mode a few times and watch the outcomes. Finally, run the experiment 1000 times with an update frequency of 10 and note the apparent convergence of the sample statistics to the distribution parameters.
11. Use the Result of Exercise 5 to find the moment generating function of the number of balls needed to occupy j cells.
An alternate approach to the distribution of the number of balls needed to occupy j cells is via a recursion formula.
12. Let cn,j(k) = P(Wn,j = k) for k = j, j + 1, ...
Use a conditional probability argument to show that
Ball and Cell Experiments |