Estimation of R with N Known

Java Applet Simulation of the ball and urn experiment

The Estimation Problem

In the ball and urn experiment, suppose that the total number of balls N is known but that the total number of red balls R is unknown. We wish to estimate R by sampling n items and observing Y, the number of red balls in the sample. Recall that if the sampling is without replacement, Y has the hypergeometric distribution with parameters N, R, and n, while if the sampling is with replacement, Y has the binomial distribution with parameters n and p = R / N.

This type of problem could arise, for example, if we had a batch of N computer chips containing an unknown number R of defectives. It would be too costly and perhaps destructive to test all N chips, so we might instead select n chips at random and test those.

A simple estimator of R can be derived by hoping that the sample proportion of red balls is close to the population proportion of red balls. That is,

Y / n ≈ R / N, so R ≈ NY / n

The estimator NY / n has some nice statistical properties. For either sampling model, it is unbiased since the mean of the estimator is the parameter being estimated.
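The unbiasedness claim can be checked numerically by summing the estimator against the exact distribution of Y. The following is a minimal sketch; the parameter values N = 50, R = 20, n = 10 and the helper names are illustrative choices, not part of the original text.

```python
from math import comb

def hypergeometric_pmf(k, N, R, n):
    """P(Y = k) when n balls are drawn without replacement."""
    return comb(R, k) * comb(N - R, n - k) / comb(N, n)

def binomial_pmf(k, n, p):
    """P(Y = k) when n balls are drawn with replacement."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Illustrative parameter values (an assumption for this sketch).
N, R, n = 50, 20, 10

# Expected value of the estimator N*Y/n under each sampling model.
mean_without = sum((N * k / n) * hypergeometric_pmf(k, N, R, n) for k in range(n + 1))
mean_with = sum((N * k / n) * binomial_pmf(k, n, R / N) for k in range(n + 1))

print(mean_without, mean_with)  # both should agree with R up to rounding error
```

Changing R to any other value between 0 and N should leave both means equal to R, which is what unbiasedness asserts.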

Properties

$Mathematical Exercise$ 1. Use basic properties of expected value to show that

E(NY / n) = R

Since the estimator is unbiased, its variance measures the quality of the estimator in the mean square sense, because by definition,

var(NY / n) = E[(NY / n - R)^2]

$Mathematical Exercise$ 2. Show that if the sampling is with replacement, then

var(NY / n) = R(N - R) / n

$Mathematical Exercise$ 3. Show that if the sampling is without replacement, then

var(NY / n) = R(N - R)(N - n) / [n(N - 1)]

$Mathematical Exercise$ 4. Show that in either case, the variance of the estimator NY / n is a decreasing function of n for fixed N and R. Thus, the estimator improves as the sample size increases.

$Mathematical Exercise$ 5. For given parameter values, which sampling method gives the better estimator (as measured by the variance)?
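As a numerical companion to Exercises 2 through 5, the variance of NY / n can be computed directly from the distribution of Y and compared with the closed forms that follow from the standard binomial and hypergeometric variances, namely R(N - R) / n with replacement and R(N - R)(N - n) / [n(N - 1)] without. This is only a sketch; the function name, the `replace` flag, and the parameter values are assumptions made for illustration.

```python
from math import comb

def estimator_variance(N, R, n, replace):
    """Exact variance of N*Y/n, computed from the distribution of Y."""
    if replace:
        p = R / N
        pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    else:
        pmf = [comb(R, k) * comb(N - R, n - k) / comb(N, n) for k in range(n + 1)]
    # The estimator is unbiased, so its variance is the mean square error about R.
    return sum(prob * (N * k / n - R) ** 2 for k, prob in enumerate(pmf))

N, R = 50, 20  # illustrative values
for n in (5, 10, 20, 40):
    v_with = estimator_variance(N, R, n, replace=True)
    v_without = estimator_variance(N, R, n, replace=False)
    # Closed forms derived from the binomial and hypergeometric variances.
    f_with = R * (N - R) / n
    f_without = R * (N - R) * (N - n) / (n * (N - 1))
    print(n, round(v_with, 4), round(f_with, 4), round(v_without, 4), round(f_without, 4))
```

In each row the computed and closed-form values should agree, both columns should shrink as n grows, and the without-replacement variance should be the smaller of the two whenever n > 1, because the factor (N - n) / (N - 1) is then less than 1.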

6. In the urn experiment, select sampling without replacement and set N = 50, R = 20, and n = 10. Run the experiment 50 times, updating after each run. On each run, compute NY / n, the estimate of R. Now compute the square root of the average of the squares of the errors over the 50 runs.

7. Repeat Exercise 6, sampling with replacement. Compare the results.
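For readers without access to the applet, Exercises 6 and 7 can be imitated in a few lines of code. The sketch below is a stand-in for the urn experiment, not a reproduction of it; the function name `rmse_of_estimates` and the seed are hypothetical choices.

```python
import random

def rmse_of_estimates(N, R, n, runs, replace, rng):
    """Root mean square error of N*Y/n over repeated simulated samples."""
    urn = [1] * R + [0] * (N - R)  # 1 marks a red ball, 0 a green ball
    total_squared_error = 0.0
    for _ in range(runs):
        if replace:
            sample = rng.choices(urn, k=n)  # sampling with replacement
        else:
            sample = rng.sample(urn, n)     # sampling without replacement
        y = sum(sample)                     # number of red balls in the sample
        total_squared_error += (N * y / n - R) ** 2
    return (total_squared_error / runs) ** 0.5

rng = random.Random(2024)  # fixed seed so the run is repeatable
rmse_without = rmse_of_estimates(50, 20, 10, 50, replace=False, rng=rng)
rmse_with = rmse_of_estimates(50, 20, 10, 50, replace=True, rng=rng)
print(rmse_without, rmse_with)
```

With only 50 runs the two numbers fluctuate noticeably from seed to seed, so a single comparison may not favor either method; increasing `runs` makes the empirical values settle near the standard deviations of the estimator.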

$Mathematical Exercise$ 8. Show that if the sampling is with replacement, then Nk / n maximizes P(Y = k) as a function of R for fixed N and n. This means that NY / n is the maximum likelihood estimator of R.
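The maximum likelihood property can be illustrated by brute force: evaluate the binomial likelihood at every possible value of R and pick the largest. This is a sketch under assumed values N = 50, n = 10, k = 4, chosen so that Nk / n is an integer; the function name `likelihood` is hypothetical.

```python
from math import comb

def likelihood(R, N, n, k):
    """P(Y = k) under sampling with replacement, viewed as a function of R."""
    p = R / N
    return comb(n, k) * p**k * (1 - p)**(n - k)

N, n, k = 50, 10, 4  # illustrative values

# Search over every possible number of red balls in the urn.
best_R = max(range(N + 1), key=lambda R: likelihood(R, N, n, k))
print(best_R, N * k / n)
```

When Nk / n is not an integer, a search like this returns a neighboring integer, since R itself must be a whole number.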

When the sampling is with replacement, the problem of estimating R with N known is equivalent to the problem of estimating the proportion of red balls R / N.

Acceptance Sampling

Sometimes we are not interested in estimating R, but just in determining whether R meets or exceeds a critical value C. In particular, this situation arises in acceptance sampling: Suppose that the balls represent items, that the red balls represent defective items, and that the green balls represent good items. If the number of defective items R is at least C (the critical value), then we would like to reject the entire lot. However, testing the items is expensive and destructive, so we must test a random sample of n items (drawn without replacement, of course) and base our decision to accept or reject the lot on the number of defectives in the sample. Clearly, the only reasonable approach is to choose another critical value c and reject the lot if the number of defectives in the sample is at least c. In statistical terms, we have described a hypothesis test.

In the following problems, suppose that N = 100 and C = 10. Thus we would like to reject the lot of 100 items if it contains 10 or more defectives. Suppose that we can only afford to sample and test n = 10 items. We will study first the following test: Reject the lot if the number of defectives in the sample is at least 2.
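The probability that a test of this form rejects the lot is a hypergeometric tail probability, P(Y >= c). A minimal sketch of that calculation follows; the function name `reject_probability` is a hypothetical helper, and the two values of R are the ones used in the exercises below.

```python
from math import comb

def reject_probability(N, R, n, c):
    """P(Y >= c) when n items are drawn without replacement from a lot of N
    items containing R defectives."""
    favorable = sum(comb(R, k) * comb(N - R, n - k) for k in range(c, n + 1))
    return favorable / comb(N, n)

N, n, c = 100, 10, 2
for R in (15, 8):
    p_reject = reject_probability(N, R, n, c)
    print(f"R = {R}: P(reject) = {p_reject:.4f}, P(accept) = {1 - p_reject:.4f}")
```

Which of the two probabilities is the "correct decision" depends on whether R is at least C = 10: rejection is correct for R = 15 and acceptance is correct for R = 8. Passing a different value of c evaluates other tests of the same form.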

$Mathematical Exercise$ 9. Suppose that R = 15. Find the probability that we make the correct decision (reject) and the probability that we make the wrong decision (accept).

10. In the urn experiment, select sampling without replacement, and set N = 100, R = 15, n = 10. Run the experiment 1000 times, updating every 100 runs. Compute the relative frequency of rejections and compare with the true probability in Exercise 9.

$Mathematical Exercise$ 11. Suppose that R = 8. Find the probability that we make the correct decision (accept) and the probability that we make the wrong decision (reject).

12. In the urn experiment, select sampling without replacement, and set N = 100, R = 8, n = 10. Run the experiment 1000 times, updating every 100 runs. Compute the relative frequency of rejections and compare with the true probability in Exercise 11.

Suppose now that we change the test: Reject the lot if the number of defectives in the sample is at least 1.

$Mathematical Exercise$ 13. Suppose that R = 15. Find the probability that we make the correct decision (reject) and the probability that we make the wrong decision (accept).

14. In the urn experiment, select sampling without replacement, and set N = 100, R = 15, n = 10. Run the experiment 1000 times, updating every 100 runs. Compute the relative frequency of rejections and compare with the true probability in Exercise 13.

$Mathematical Exercise$ 15. Suppose that R = 8. Find the probability that we make the correct decision (accept) and the probability that we make the wrong decision (reject).

16. In the urn experiment, select sampling without replacement, and set N = 100, R = 8, n = 10. Run the experiment 1000 times, updating every 100 runs. Compute the relative frequency of rejections and compare with the true probability in Exercise 15.
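The simulation exercises in this section can also be approximated in code when the applet is unavailable. The sketch below draws 1000 samples for each combination of c and R used above and compares the relative frequency of rejection with the exact hypergeometric probability; the function names and the seed are assumptions for illustration only.

```python
import random
from math import comb

def exact_reject_probability(N, R, n, c):
    """Exact P(Y >= c) for sampling without replacement."""
    return sum(comb(R, k) * comb(N - R, n - k) for k in range(c, n + 1)) / comb(N, n)

def simulated_reject_frequency(N, R, n, c, runs, rng):
    """Relative frequency of rejection over repeated simulated samples."""
    lot = [1] * R + [0] * (N - R)  # 1 marks a defective item
    rejections = sum(1 for _ in range(runs) if sum(rng.sample(lot, n)) >= c)
    return rejections / runs

rng = random.Random(7)  # fixed seed so the run is repeatable
for c in (2, 1):
    for R in (15, 8):
        freq = simulated_reject_frequency(100, R, 10, c, 1000, rng)
        exact = exact_reject_probability(100, R, 10, c)
        print(f"c = {c}, R = {R}: simulated = {freq:.3f}, exact = {exact:.3f}")
```

Lowering c from 2 to 1 raises the rejection probability for every R, so the second test rejects bad lots more often at the cost of rejecting good lots more often as well.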