The Number of Red Balls


In the ball and urn experiment, let Y denote the random variable that gives the number of red balls in the sample.

Sampling with Replacement

Suppose first that the sampling is with replacement.

Mathematical Exercise 1. Show that the colors of successive balls drawn from the urn form a sequence of Bernoulli trials.

Mathematical Exercise 2. Use the result of Exercise 1 to show that Y has the binomial distribution with parameters n and p = R / N:

P(Y = k) = C(n, k) (R / N)^k (1 - R / N)^(n - k) for k = 0, 1, ..., n

In particular, the mean and variance are

E(Y) = n(R / N), var(Y) = n(R / N) (1 - R / N)

Simulation Exercise 3. In the urn experiment, select sampling with replacement and random variable Y. Vary the parameters and note the shape of the graph of the density function. Now let N = 50, R = 30, and n = 10 and run the experiment with an update frequency of 100. Watch the apparent convergence of the relative frequency function to the density function.
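For readers without access to the applet, the experiment in Exercise 3 can be approximated with a short script. This is only a minimal sketch: the function names, the seed, the number of runs, and the convention that balls 0 through R - 1 are the red ones are all assumptions made for illustration, not part of the original experiment.

```python
import random
from math import comb

def binomial_pmf(n, p, k):
    """P(Y = k) when Y is binomial with parameters n and p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def simulate_with_replacement(N, R, n, runs, seed=0):
    """Relative frequency of each value of Y over `runs` samples."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    counts = [0] * (n + 1)
    for _ in range(runs):
        # Assumed labeling: balls 0..R-1 are red. Each draw is uniform
        # over all N balls, so draws are independent (with replacement).
        y = sum(rng.randrange(N) < R for _ in range(n))
        counts[y] += 1
    return [c / runs for c in counts]

N, R, n = 50, 30, 10
freq = simulate_with_replacement(N, R, n, runs=10_000)
for k in range(n + 1):
    # value of Y, relative frequency, binomial density
    print(k, round(freq[k], 4), round(binomial_pmf(n, R / N, k), 4))
```

Increasing `runs` should bring the two columns closer together, which mirrors the apparent convergence seen in the applet.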

Sampling Without Replacement

Now suppose that the sampling is without replacement. To derive the probability density function of Y, we can consider the sample as an unordered subset (combination) of size n from the population of size N. Recall that these combinations are equally likely.

Mathematical Exercise 4. Show that

P(Y = k) = C(R, k) C(N - R, n - k) / C(N, n) for k = max{0, n - (N - R)}, ..., min{n, R}

This is known as the hypergeometric distribution with parameters N, R, and n. If we adopt the convention that C(m, j) = 0 for j > m, then the formula for the density function is correct for k = 0, 1, ..., n.
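The density function is easy to evaluate numerically. The sketch below uses Python's `math.comb`, which returns 0 when the lower index exceeds the upper one, so it follows the convention C(m, j) = 0 for j > m automatically. The function name `hypergeometric_pmf` is a hypothetical helper chosen for this illustration.

```python
from math import comb

def hypergeometric_pmf(N, R, n, k):
    """P(Y = k) for a sample of size n drawn without replacement."""
    # comb(m, j) is 0 when j > m, matching the convention in the text,
    # so the formula is valid for every k in 0, 1, ..., n.
    return comb(R, k) * comb(N - R, n - k) / comb(N, n)

N, R, n = 50, 30, 10
density = [hypergeometric_pmf(N, R, n, k) for k in range(n + 1)]
for k, value in enumerate(density):
    print(k, round(value, 4))
print("total:", sum(density))  # should be 1 up to rounding error
```

With a small urn such as N = 5, R = 2, n = 4, the function returns 0 for k = 0 and k = 3, the values that fall outside the range max{0, n - (N - R)}, ..., min{n, R}.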

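Mathematical Exercise 5. Show the following alternate form of the density function in two ways: combinatorially, by treating the outcome as a permutation of size n chosen from the population of N balls, and algebraically, starting from the result in Exercise 4:

P(Y = k) = C(n, k) (R)_k (N - R)_(n - k) / (N)_n for k = 0, 1, ..., n

where (m)_j = m(m - 1) ··· (m - j + 1) is the number of permutations of size j chosen from m objects.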
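Before attempting the proof, it can be reassuring to check numerically that the two forms agree. The sketch below takes the result of Exercise 5 to be the permutation (falling power) form of the density, which is an assumption about the intended formula, and uses `math.perm` for the falling powers. Both function names are hypothetical.

```python
from math import comb, perm, isclose

def pmf_combinations(N, R, n, k):
    """Density from Exercise 4: unordered samples."""
    return comb(R, k) * comb(N - R, n - k) / comb(N, n)

def pmf_permutations(N, R, n, k):
    """Assumed Exercise 5 form: ordered samples."""
    # perm(m, j) = m (m - 1) ... (m - j + 1), and is 0 when j > m.
    # comb(n, k) counts the positions of the k red balls in the ordered sample.
    return comb(n, k) * perm(R, k) * perm(N - R, n - k) / perm(N, n)

N, R, n = 50, 30, 10
agree = all(
    isclose(pmf_combinations(N, R, n, k), pmf_permutations(N, R, n, k))
    for k in range(n + 1)
)
print(agree)
```

A numerical match for one set of parameters is of course not a proof, but a mismatch would signal an error in the formula being tested.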

Simulation Exercise 6. In the ball and urn experiment, select sampling without replacement and random variable Y. Vary the parameters and note the shape of the graph of the density function. Now let N = 50, R = 30, and n = 10 and run the experiment with an update frequency of 100. Watch the apparent convergence of the relative frequency function to the density function.

Moments

Computing the mean and variance of Y directly from the hypergeometric distribution is difficult, so instead we will decompose Y into a sum of indicator variables:

Y = I1 + I2 + ··· + In

where Ij = 1 if the j'th ball is red and Ij = 0 if the j'th ball is green.

In the following exercises, a key fact is that the indicator variables are exchangeable: the joint distribution of any sequence of m distinct indicator variables is the same as that of any other sequence of m distinct indicator variables.

Mathematical Exercise 7. Show that for any j,

E(Ij) = R / N.

Mathematical Exercise 8. Use the result of Exercise 7 to show that

E(Y) = n(R / N)

Mathematical Exercise 9. Show that

var(Ij) = (R / N)(1 - R / N)

Mathematical Exercise 10. Use basic properties of covariance and Exercises 7 and 9 to show that for distinct j and k,

  1. cov(Ij, Ik) = -(R / N)(1 - R / N)[1 / (N - 1)]
  2. cor(Ij, Ik) = -1 / (N - 1)

Note from Exercise 10 that the event of a red ball on draw j and the event of a red ball on draw k are negatively correlated, but the correlation depends only on the population size and not on the number of red balls. Note also that the correlation is perfect if N = 2. Think about these results intuitively.

Simulation Exercise 11. In the urn experiment, set N = 50, R = 20, and n = 10. Now run the experiment 500 times, updating after each run. Compute the empirical correlation of the events of a red ball on draw 3 and a red ball on draw 7. Compare with the theoretical result in Exercise 10.
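The empirical correlation in Exercise 11 can also be computed with a script. The sketch below is one possible approach, not the applet's method: it codes red balls as 1 and green balls as 0 (an assumption), draws ordered samples with `random.sample`, and computes the sample correlation of the indicators for two draws. The function name and seed are hypothetical.

```python
import random

def empirical_correlation(N, R, n, runs, i, j, seed=0):
    """Sample correlation between the indicators of red on draws i and j."""
    rng = random.Random(seed)
    urn = [1] * R + [0] * (N - R)  # assumed coding: 1 = red, 0 = green
    xs, ys = [], []
    for _ in range(runs):
        sample = rng.sample(urn, n)  # ordered sample without replacement
        xs.append(sample[i - 1])     # draws are numbered from 1 in the text
        ys.append(sample[j - 1])
    mx = sum(xs) / runs
    my = sum(ys) / runs
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / runs
    vx = sum((x - mx) ** 2 for x in xs) / runs
    vy = sum((y - my) ** 2 for y in ys) / runs
    return cov / (vx * vy) ** 0.5

N, R, n = 50, 20, 10
print("empirical:", empirical_correlation(N, R, n, runs=500, i=3, j=7))
print("theoretical:", -1 / (N - 1))
```

With only 500 runs, the sampling error of the empirical correlation is roughly 1 / sqrt(500), or about 0.045, which is larger than the theoretical value of -1 / 49 in absolute terms. A close match should therefore not be expected until the number of runs is much larger.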

Mathematical Exercise 12. Use the results of Exercises 9 and 10 and basic properties of covariance to show that

var(Y) = n (R / N)(1 - R / N)(N - n) / (N - 1)

Mathematical Exercise 13. Compare the mean and variance of Y when the sampling is with replacement and when the sampling is without replacement. For which distribution is the variance smaller? Does the result seem reasonable?
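A numerical comparison can help guide the intuition for Exercise 13. The sketch below computes the mean and variance directly from the two density functions, rather than from closed-form expressions, and prints the ratio of the variances next to the factor (N - n) / (N - 1). The parameter values are the ones used in the simulation exercises; the helper name `moments` is hypothetical.

```python
from math import comb

def moments(pmf):
    """Mean and variance of a distribution on 0, 1, ..., len(pmf) - 1."""
    mean = sum(k * p for k, p in enumerate(pmf))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
    return mean, var

N, R, n = 50, 30, 10
p = R / N

# Density of Y when sampling with replacement (binomial).
binomial = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
# Density of Y when sampling without replacement (hypergeometric).
hypergeometric = [
    comb(R, k) * comb(N - R, n - k) / comb(N, n) for k in range(n + 1)
]

mean_b, var_b = moments(binomial)
mean_h, var_h = moments(hypergeometric)
print("with replacement:   ", mean_b, var_b)
print("without replacement:", mean_h, var_h)
print("variance ratio:", var_h / var_b, "vs (N - n) / (N - 1):", (N - n) / (N - 1))
```

Trying n = 1 and n = N in place of n = 10 shows the two extreme cases of the variance ratio.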

Convergence of the Hypergeometric Distribution to the Binomial

Suppose now that R depends on N and that

R / N → p as N → ∞

We will show that for fixed n, the hypergeometric distribution with parameters N, R, and n converges to the binomial distribution with parameters n and p. Intuitively, this means that if N is large compared to n, then sampling n items without replacement is not much different from sampling n items with replacement, and hence the hypergeometric distribution can be approximated by the binomial.

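Mathematical Exercise 14. Use the result of Exercise 5 to show that

P(Y = k) = C(n, k) [R(R - 1) ··· (R - k + 1)] [(N - R)(N - R - 1) ··· (N - R - n + k + 1)] / [N(N - 1) ··· (N - n + 1)]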

Mathematical Exercise 15. Complete the proof by showing that for fixed n and k,

P(Y = k) → C(n, k) p^k (1 - p)^(n - k) as N → ∞

Simulation Exercise 16. In the ball and urn experiment, set N = 100, n = 10, and R = 30. Run the simulation 1000 times, updating every 100 runs. Compare the relative frequency function, the hypergeometric density function, and the approximating binomial density function.
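The quality of the binomial approximation can also be examined without simulation. The following sketch compares the two density functions for the parameters of Exercise 16 and then repeats the comparison for larger urns with the same proportion of red balls. The larger values of N and the helper names are assumptions chosen for illustration.

```python
from math import comb

def hypergeometric_pmf(N, R, n, k):
    """Density of Y when sampling without replacement."""
    return comb(R, k) * comb(N - R, n - k) / comb(N, n)

def binomial_pmf(n, p, k):
    """Density of Y when sampling with replacement."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def max_difference(N, R, n):
    """Largest pointwise gap between the two densities over k = 0, ..., n."""
    p = R / N
    return max(
        abs(hypergeometric_pmf(N, R, n, k) - binomial_pmf(n, p, k))
        for k in range(n + 1)
    )

n = 10
for N in (100, 1000, 10000):
    R = 3 * N // 10  # keep R / N = 0.3 as N grows
    print(N, R, max_difference(N, R, n))
```

The printed gap should shrink as N grows with n held fixed, in line with the convergence result above.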

