Deviations from the Assumptions

Java Applet Simulation of the mean estimation experiment

Interval Estimates of the Mean

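In the previous section, we saw that if the underlying distribution is normal with known standard deviation, then confidence bounds for the mean have the form

M + z \sigma / \sqrt{n}

where M is the sample mean, \sigma is the known standard deviation, n is the sample size, and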

z = ± z_{1 - a/2} for the two-sided 1 - a confidence interval,
z = -z_{1 - a} for the 1 - a confidence lower bound
z = z_{1 - a} for the 1 - a confidence upper bound

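The derivations follow easily from the fact that

Z = (M - \mu) / (\sigma / \sqrt{n})

has a standard normal distribution. Here M is the sample mean, \mu and \sigma are the mean and standard deviation of the distribution, and n is the sample size.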
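As a minimal sketch of how these bounds are computed, the following Python uses only the standard library. The function name mean_bounds_known_sd and its kind argument are illustrative choices, not part of the original text; the quantile z_{p} is taken from statistics.NormalDist.

```python
from math import sqrt
from statistics import NormalDist, fmean


def z_quantile(p):
    """Quantile of order p of the standard normal distribution."""
    return NormalDist().inv_cdf(p)


def mean_bounds_known_sd(sample, sigma, a, kind="two-sided"):
    """Confidence bounds of the form M + z * sigma / sqrt(n) at level 1 - a.

    sigma is the (assumed known) distribution standard deviation.
    Returns a (lower, upper) pair for kind="two-sided", otherwise a single bound.
    """
    n = len(sample)
    m = fmean(sample)              # sample mean M
    scale = sigma / sqrt(n)        # standard deviation of M
    if kind == "two-sided":
        z = z_quantile(1 - a / 2)
        return (m - z * scale, m + z * scale)
    if kind == "lower":
        return m - z_quantile(1 - a) * scale
    if kind == "upper":
        return m + z_quantile(1 - a) * scale
    raise ValueError("kind must be 'two-sided', 'lower', or 'upper'")
```

For example, mean_bounds_known_sd(data, 2.0, 0.10) would give the two-sided 0.90 interval for a list of observations data whose distribution has standard deviation 2.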

However, we also noted that the assumptions will frequently not be satisfied in real estimation problems. In this section we will see that the estimation procedure is remarkably robust; that is, the procedure works well even when the assumptions are violated.

Non-Normal Distributions

Suppose first that the underlying distribution is not normal. When n is relatively large, the distribution of Z will still be approximately standard normal by the central limit theorem. Thus, the confidence interval and confidence bounds should still be approximately valid.

1. In the mean estimation experiment, select the gamma distribution with parameters a = 1 and r = 1. Select two-sided intervals and confidence level 0.90. For each of the following sample sizes, run the experiment 1000 times with an update frequency of 10. Note how well the proportion of successful intervals approximates the theoretical confidence level.

n = 5.
n = 10.
n = 30.

2. In the mean estimation experiment, select the gamma distribution with shape parameter 5 and scale parameter 1. Select two-sided intervals and confidence level 0.90. For each of the following sample sizes, run the experiment 1000 times with an update frequency of 10. Note how well the proportion of successful intervals approximates the theoretical confidence level.

n = 5.
n = 10.
n = 30.

3. In the mean estimation experiment, select the Poisson distribution with mean 1. Select two-sided intervals and confidence level 0.90. For each of the following sample sizes, run the experiment 1000 times with an update frequency of 10. Note how well the proportion of successful intervals approximates the theoretical confidence level.

n = 5.
n = 10.
n = 30.

How large n must be for the estimation procedure to work well depends, of course, on the underlying distribution: the more this distribution deviates from normality, the larger n must be. Fortunately, convergence to normality in the central limit theorem is rapid, so, as you observed in the exercises, relatively small sample sizes (30 or more) suffice in most cases.
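For readers without access to the applet, the experiments above can be roughly imitated in Python. This is only a sketch under stated assumptions: the function coverage_known_sd and its draw argument are hypothetical names, the known standard deviation is supplied by hand, and the gamma sampler is the standard library's random.gammavariate(shape, scale).

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def coverage_known_sd(draw, mu, sigma, n, a=0.10, runs=1000, seed=0):
    """Estimate the proportion of two-sided intervals M +/- z * sigma / sqrt(n)
    that contain the true mean mu.

    draw(rng) must return one observation from the underlying distribution,
    whose true mean and standard deviation are mu and sigma.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - a / 2)
    half_width = z * sigma / sqrt(n)
    hits = 0
    for _ in range(runs):
        m = fmean([draw(rng) for _ in range(n)])
        if abs(m - mu) <= half_width:
            hits += 1
    return hits / runs


if __name__ == "__main__":
    # Gamma with shape 5 and scale 1: mean 5, standard deviation sqrt(5).
    for n in (5, 10, 30):
        c = coverage_known_sd(lambda rng: rng.gammavariate(5.0, 1.0),
                              mu=5.0, sigma=sqrt(5.0), n=n)
        print(n, c)
```

Other distributions can be tried by passing a different draw function together with the matching mean and standard deviation. As in the applet, each estimate is itself random, so proportions from 1000 runs will fluctuate by a percentage point or so around the true coverage.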

Unknown Standard Deviation

Suppose now that the standard deviation of the distribution is unknown. The natural modification is to use the sample standard deviation in the confidence intervals instead of the distribution standard deviation. That is, the confidence bounds would have the form

M + z S / \sqrt{n}

where M is the sample mean, S is the sample standard deviation, n is the sample size, and z is the appropriate quantile, as defined above for the two-sided interval, the lower bound, and the upper bound. At this point, we have no mathematical sense of how well this will work, even when the distribution is normal. The crucial point is that the confidence bounds now contain two random variables (the sample mean and the sample standard deviation).
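The modified two-sided interval might be computed as follows. This is a sketch, not the applet's code: the function name is hypothetical, and it assumes the sample standard deviation uses the n - 1 divisor, which is what statistics.stdev computes.

```python
from math import sqrt
from statistics import NormalDist, fmean, stdev


def two_sided_interval_with_s(sample, a):
    """Two-sided interval M +/- z_{1 - a/2} * S / sqrt(n).

    S is the sample standard deviation (n - 1 divisor, an assumption here),
    substituted for the unknown distribution standard deviation.
    """
    n = len(sample)
    m = fmean(sample)          # sample mean M
    s = stdev(sample)          # sample standard deviation S
    z = NormalDist().inv_cdf(1 - a / 2)
    half_width = z * s / sqrt(n)
    return (m - half_width, m + half_width)
```

Note that both the center and the width of this interval now vary from sample to sample, which is the source of the difficulty described above.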

The following exercises allow you to explore the issue empirically. First, select the normal distribution and set the options to Use S and Use z quantiles. This ensures that the simulation will construct the confidence bounds given above.

4. In the mean estimation experiment, select the normal distribution with mean 0 and standard deviation 0.5. Select two-sided intervals and confidence level 0.90. For each of the following sample sizes, run the experiment 1000 times with an update frequency of 10. Note how well the proportion of successful intervals approximates the theoretical confidence level.

n = 5.
n = 10.
n = 30.

5. In the mean estimation experiment, select the normal distribution with mean 0 and standard deviation 2. Select lower bound and confidence level 0.80. For each of the following sample sizes, run the experiment 1000 times with an update frequency of 10. Note how well the proportion of successful intervals approximates the theoretical confidence level.

n = 5.
n = 10.
n = 30.

6. In the mean estimation experiment, select the normal distribution with mean 1 and standard deviation 1.5. Select upper bound and confidence level 0.60. For each of the following sample sizes, run the experiment 1000 times with an update frequency of 10. Note how well the proportion of successful intervals approximates the theoretical confidence level.

n = 5.
n = 10.
n = 30.

In exercises 4, 5, and 6 you should have noticed that the estimation procedure works remarkably well for large samples, but that with small samples the true confidence level (as indicated by the proportion of successes) seems consistently smaller than the "advertised" confidence level. Equivalently, our confidence intervals are evidently too narrow. Thus, for small samples, it seems that we need to revise our procedure. We will do this in the section Estimating the Mean with Unknown Variance.
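The shortfall for small samples can also be checked with a short simulation. The sketch below mimics the setup of exercise 4 (normal distribution, S in place of the standard deviation, z quantiles); the function name coverage_with_s and the default seed are illustrative assumptions, and the sample standard deviation is again taken with the n - 1 divisor.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, stdev


def coverage_with_s(mu, sigma, n, a=0.10, runs=1000, seed=0):
    """Estimate the proportion of intervals M +/- z_{1 - a/2} * S / sqrt(n)
    that contain mu, for normal samples of size n."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - a / 2)
    hits = 0
    for _ in range(runs):
        x = [rng.gauss(mu, sigma) for _ in range(n)]
        m, s = fmean(x), stdev(x)
        if abs(m - mu) <= z * s / sqrt(n):
            hits += 1
    return hits / runs


if __name__ == "__main__":
    # Mean 0, standard deviation 0.5, confidence level 0.90.
    for n in (5, 10, 30):
        print(n, coverage_with_s(0.0, 0.5, n))
```

With enough runs, the estimated proportion for n = 5 should fall visibly below 0.90, while the value for n = 30 should be close to it, in line with the observation above.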