The Number of Red Balls
In the ball and urn experiment, let Y denote the random variable that gives the number of red balls in the sample.
Suppose first that the sampling is with replacement.
1. Show that the colors of successive
balls drawn from the urn form a sequence of Bernoulli trials.
2. Use the result of Exercise 1 to
show that Y has the binomial distribution with parameters n
and p = R / N:
P(Y = k) = C(n, k) (R / N)^k (1 - R / N)^(n - k) for k = 0, 1, ..., n.
In particular, the mean and variance are
E(Y) = n(R / N), var(Y) = n(R / N) (1 - R / N)
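As a numerical check on the binomial formula and its moments, the following Python sketch evaluates the density with exact rational arithmetic and computes the mean and variance directly from it. The helper name `binomial_pmf` and the values N = 50, R = 30, n = 10 are illustrative assumptions, not part of the original exercise.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(k, n, p):
    """P(Y = k) for Y binomial with parameters n and p (hypothetical helper)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


N, R, n = 50, 30, 10
p = Fraction(R, N)  # probability of a red ball on each draw
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Mean and variance computed directly from the density
mean = sum(k * f for k, f in enumerate(pmf))
var = sum(k * k * f for k, f in enumerate(pmf)) - mean**2

print(sum(pmf), mean, var)  # 1 6 12/5
```

The printed mean 6 = 10(30 / 50) and variance 12/5 = 10(30 / 50)(1 - 30 / 50) agree with the formulas above.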
3. In the urn experiment, select sampling with
replacement and random variable Y. Vary the parameters and
note the shape of the graph of the density function. Now let N
= 50, R = 30, and n = 10 and run the experiment
with an update frequency of 100. Watch the apparent convergence
of the relative frequency
function to the density function.
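Outside the applet, the experiment of Exercise 3 can be mimicked with a short simulation. This Python sketch is only an approximation of the applet's behavior; the seed, the run count, and the labeling of balls 0 to R - 1 as red are arbitrary assumptions. It draws n balls with replacement many times and prints the relative frequency of each value of Y beside the binomial density.

```python
import random
from collections import Counter
from math import comb

N, R, n = 50, 30, 10
runs = 1000
rng = random.Random(1)  # fixed seed so the output is reproducible


def red_count_with_replacement():
    # Balls are labeled 0..N-1 and balls 0..R-1 are red; each draw is an
    # independent uniform choice, so every ball is replaced before the next draw.
    return sum(1 for _ in range(n) if rng.randrange(N) < R)


counts = Counter(red_count_with_replacement() for _ in range(runs))
p = R / N
for k in range(n + 1):
    rel_freq = counts[k] / runs
    density = comb(n, k) * p**k * (1 - p) ** (n - k)
    print(f"{k:2d}  {rel_freq:.3f}  {density:.3f}")
```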
Now suppose that the sampling is without replacement. To derive the probability density function of Y, we can consider the sample as an unordered subset (combination) of size n from the population of size N. Recall that these combinations are equally likely.
4. Show that
P(Y = k) = C(R, k) C(N - R, n - k) / C(N, n) for k = max{0, n - (N - R)}, ..., min{n, R}.
This is known as the hypergeometric distribution with parameters N, R, and n. If we adopt the convention that C(m, j) = 0 for j > m, then the formula for the density function is correct for k = 0, 1, ..., n.
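The hypergeometric density P(Y = k) = C(R, k) C(N - R, n - k) / C(N, n) translates almost directly into Python. In this sketch (the function name is a hypothetical choice), `math.comb(m, j)` already returns 0 when j > m, which is exactly the convention C(m, j) = 0 for j > m, so the function can be evaluated for every k = 0, 1, ..., n.

```python
from fractions import Fraction
from math import comb


def hypergeometric_pmf(k, N, R, n):
    """P(Y = k) when n balls are drawn without replacement from N balls, R of them red.

    math.comb(m, j) returns 0 when j > m, matching the convention C(m, j) = 0.
    """
    return Fraction(comb(R, k) * comb(N - R, n - k), comb(N, n))


N, R, n = 50, 30, 10
pmf = [hypergeometric_pmf(k, N, R, n) for k in range(n + 1)]
print(sum(pmf))  # 1
```

For example, with N = 10, R = 3, n = 5 the value k = 4 exceeds the number of red balls, and the function correctly returns 0.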
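5. Show the following alternate form of the hypergeometric density function:
P(Y = k) = C(n, k) (R)_k (N - R)_(n - k) / (N)_n for k = 0, 1, ..., n,
where (m)_j = m(m - 1)···(m - j + 1) denotes the falling power, the number of ordered samples of size j chosen from m objects.
Show the result combinatorially
by treating the outcome as a permutation
of size n chosen from the population of N balls.
Show the result algebraically, starting from the result in
Exercise 4.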
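The combination form and the ordered-sample form of the density can be compared exactly. In this Python sketch, `math.perm(m, j)` computes the falling power m(m - 1)···(m - j + 1) (and returns 0 when j > m); the function names and the test values N = 12, R = 5, n = 7 are illustrative assumptions.

```python
from fractions import Fraction
from math import comb, perm


def hypergeometric_pmf(k, N, R, n):
    # Unordered samples: C(R, k) C(N - R, n - k) / C(N, n)
    return Fraction(comb(R, k) * comb(N - R, n - k), comb(N, n))


def hypergeometric_pmf_ordered(k, N, R, n):
    # Ordered samples: C(n, k) (R)_k (N - R)_(n - k) / (N)_n,
    # where math.perm(m, j) is the falling power (m)_j
    return Fraction(comb(n, k) * perm(R, k) * perm(N - R, n - k), perm(N, n))


N, R, n = 12, 5, 7
agree = all(
    hypergeometric_pmf(k, N, R, n) == hypergeometric_pmf_ordered(k, N, R, n)
    for k in range(n + 1)
)
print(agree)  # True
```

The agreement is exact because perm(m, j) = j! C(m, j), so the factorials k!, (n - k)!, and n! cancel against C(n, k).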
6. In the ball and urn experiment, select
sampling without replacement and random variable Y. Vary
the parameters and note the shape of the graph of the density
function. Now let N = 50, R = 30, and n = 10
and run the experiment with an update frequency of 100. Watch the
apparent convergence of the relative frequency function to the
density function.
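A simulation analogous to Exercise 6 only needs to change how the sample is drawn. In this Python sketch (seed, run count, and the list-based population are assumptions for illustration), `random.Random.sample` chooses n distinct balls, which is sampling without replacement, and the relative frequencies are printed beside the hypergeometric density.

```python
import random
from collections import Counter
from math import comb

N, R, n = 50, 30, 10
runs = 1000
rng = random.Random(2)  # fixed seed for reproducibility

population = ["red"] * R + ["green"] * (N - R)


def red_count_without_replacement():
    # sample() returns n distinct balls, i.e. sampling without replacement
    return rng.sample(population, n).count("red")


counts = Counter(red_count_without_replacement() for _ in range(runs))
for k in range(n + 1):
    density = comb(R, k) * comb(N - R, n - k) / comb(N, n)
    print(f"{k:2d}  {counts[k] / runs:.3f}  {density:.3f}")
```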
Computing the mean and variance of Y directly from the hypergeometric distribution is difficult, so instead we will decompose Y into a sum of indicator variables:
Y = I1 + I2 + ··· + In
where Ij = 1 if the j'th ball is red and Ij = 0 if the j'th ball is green.
In the following problems, a key fact is that the joint distribution of any sequence of m distinct indicator variables is the same as that of any other such sequence (that is, the indicator variables are exchangeable).
7. Show that for any j,
E(Ij) = R / N.
8. Use the result of Exercise 7 to
show that
E(Y) = n(R / N)
9. Show that
var(Ij) = (R / N)(1 - R / N)
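For a small population, the claims of Exercises 7 and 9 can be confirmed by brute force. This Python sketch assumes that all ordered samples of n distinct balls are equally likely (sampling without replacement) and labels balls 0 to R - 1 as red; the values N = 6, R = 2, n = 4 are arbitrary. It enumerates every ordered sample and computes E(Ij) and var(Ij) for each draw j.

```python
from fractions import Fraction
from itertools import permutations

N, R, n = 6, 2, 4
# Balls 0..R-1 are red; every ordered sample of n distinct balls is equally likely.
samples = list(permutations(range(N), n))
total = len(samples)

for j in range(n):
    # The indicator of red on draw j + 1 has mean equal to the fraction
    # of samples with a red ball in that position.
    mean_j = Fraction(sum(s[j] < R for s in samples), total)
    # The indicator satisfies I^2 = I, so var(I) = E(I) - E(I)^2.
    var_j = mean_j - mean_j**2
    print(f"draw {j + 1}: E = {mean_j}, var = {var_j}")  # E = 1/3, var = 2/9 on every draw
```

Every draw gives the same mean R / N = 1/3 and variance (1/3)(2/3) = 2/9, as exchangeability predicts.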
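10. Use basic properties of covariance and Exercises 7 and
9, together with E(Ij Ik) = P(draws j and k are both red) = R(R - 1) / [N(N - 1)], to show that for distinct j and k,
cov(Ij, Ik) = -(R / N)(1 - R / N) / (N - 1), cor(Ij, Ik) = -1 / (N - 1)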
Note from Exercise 10 that the events of a red ball on draw j and a red ball on draw k are negatively correlated, and that the correlation depends only on the population size N, not on the number of red balls R. Note also that the correlation is perfect (equal to -1) if N = 2. Think about these results intuitively.
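The claim that the correlation of two draws is -1 / (N - 1) regardless of R can be checked exactly for a small population. This Python sketch (the function name and the values N = 6, n = 3, draws 1 and 3 are illustrative assumptions) enumerates all equally likely ordered samples and computes the covariance and correlation of the two indicators.

```python
from fractions import Fraction
from itertools import permutations


def indicator_correlation(N, R, n, j, k):
    """Exact cov and cor of the red indicators on draws j and k (1-based), by enumeration."""
    samples = list(permutations(range(N), n))  # balls 0..R-1 are red
    total = len(samples)
    p_j = Fraction(sum(s[j - 1] < R for s in samples), total)
    p_k = Fraction(sum(s[k - 1] < R for s in samples), total)
    p_jk = Fraction(sum(s[j - 1] < R and s[k - 1] < R for s in samples), total)
    cov = p_jk - p_j * p_k
    cor = cov / (p_j * (1 - p_j))  # both indicators have the same variance
    return cov, cor


for R in (1, 2, 3, 4):
    cov, cor = indicator_correlation(N=6, R=R, n=3, j=1, k=3)
    print(R, cov, cor)  # cor is -1/5 = -1/(N - 1) for every R
```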
11. In the urn experiment, set N = 50, R
= 20, and n = 10. Now run the experiment 500 times,
updating after each run. Compute the empirical correlation of the
events of a red ball on draw 3 and a red ball on draw 7. Compare
with the theoretical result in Exercise 10.
12. Use the results of Exercises 9 and
10 and basic properties of covariance to show that
var(Y) = n(R / N)(1 - R / N)(N - n) / (N - 1)
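The indicator-variable result var(Y) = n(R / N)(1 - R / N)(N - n) / (N - 1) can be cross-checked against the hard way, computing the moments directly from the hypergeometric density. This Python sketch uses exact fractions; the function name and the values N = 50, R = 30, n = 10 are illustrative assumptions.

```python
from fractions import Fraction
from math import comb


def hypergeometric_moments(N, R, n):
    """Exact mean and variance of Y computed directly from the hypergeometric density."""
    pmf = [Fraction(comb(R, k) * comb(N - R, n - k), comb(N, n)) for k in range(n + 1)]
    mean = sum(k * f for k, f in enumerate(pmf))
    var = sum(k * k * f for k, f in enumerate(pmf)) - mean**2
    return mean, var


N, R, n = 50, 30, 10
p = Fraction(R, N)
mean, var = hypergeometric_moments(N, R, n)
print(mean == n * p)  # True
print(var == n * p * (1 - p) * Fraction(N - n, N - 1))  # True
```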
13. Compare the mean and variance of Y
when the sampling is with replacement and when the sampling
is without replacement. For which distribution is the variance
smaller? Does the result seem reasonable?
Suppose now that R depends on N and that R / N → p as N → ∞, where 0 ≤ p ≤ 1.
We will show that for fixed n, the hypergeometric distribution with parameters N, R, and n converges to the binomial distribution with parameters n and p. Intuitively, this means that if N is large compared to n, then sampling n items without replacement is not much different from sampling n items with replacement, and hence the hypergeometric distribution can be approximated by the binomial distribution.
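14. Use the result of Exercise 5 to
show that
P(Y = k) = C(n, k) [R / N][(R - 1) / (N - 1)] ··· [(R - k + 1) / (N - k + 1)] [(N - R) / (N - k)][(N - R - 1) / (N - k - 1)] ··· [(N - R - n + k + 1) / (N - n + 1)]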
15. Complete the proof by showing
that for fixed n and k,
P(Y = k) → C(n, k) p^k (1 - p)^(n - k) as N → ∞.
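The convergence can be watched numerically. This Python sketch fixes n = 10 and p = 0.3 (arbitrary illustrative values), takes R = pN for increasing N, and prints the largest absolute difference between the hypergeometric and binomial densities over k = 0, 1, ..., n.

```python
from math import comb


def hypergeometric_pmf(k, N, R, n):
    return comb(R, k) * comb(N - R, n - k) / comb(N, n)


def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.3
gaps = []
for N in (20, 100, 1000, 10000):
    R = round(p * N)  # chosen so that R / N equals p for these N
    gap = max(abs(hypergeometric_pmf(k, N, R, n) - binomial_pmf(k, n, p)) for k in range(n + 1))
    gaps.append(gap)
    print(f"N = {N:5d}  max |hypergeometric - binomial| = {gap:.5f}")
```

Python's integer division `/` on large integers is correctly rounded, so the large binomial coefficients for N = 10000 cause no overflow.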
16. In the Ball and Urn simulation set N
= 100, n = 10, and R = 30. Run the simulation 1000
times, updating every 100 runs. Compare the relative frequency
function, the hypergeometric density function, and the
approximating binomial density function.
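A rough stand-in for Exercise 16 outside the applet: this Python sketch (the seed and the 0/1 encoding of colors are assumptions) runs the experiment 1000 times with N = 100, R = 30, n = 10 and prints the relative frequency, the hypergeometric density, and the approximating binomial density side by side.

```python
import random
from collections import Counter
from math import comb

N, R, n = 100, 30, 10
runs = 1000
rng = random.Random(4)  # fixed seed for reproducibility
population = [1] * R + [0] * (N - R)  # 1 = red, 0 = green
p = R / N

# The sum of a sample of 0/1 labels is the number of red balls drawn
counts = Counter(sum(rng.sample(population, n)) for _ in range(runs))
print(" k  rel.freq  hypergeom  binomial")
for k in range(n + 1):
    hyper = comb(R, k) * comb(N - R, n - k) / comb(N, n)
    binom = comb(n, k) * p**k * (1 - p) ** (n - k)
    print(f"{k:2d}  {counts[k] / runs:8.3f}  {hyper:9.4f}  {binom:8.4f}")
```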