## Estimating the Mean with Known Variance

Simulation of the mean estimation experiment

In this section, we will assume that the underlying distribution is normal with mean and standard deviation denoted, as usual, by $\mu$ and $\sigma$, respectively.

We will construct confidence intervals for the mean assuming that the standard deviation is known. It is natural to start with the sample mean of the random sample $X_1, \ldots, X_n$,

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i,$$

since this statistic is an unbiased estimator of the distribution mean. The crucial fact that we will need is given in the following exercise:

**1.** Use properties of the normal distribution to show that the standard score of the sample mean $\bar{X}$,

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}},$$

has a standard normal distribution.
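A quick simulation can make the claim of Exercise 1 plausible before you prove it. The following is only a sketch using Python's standard library; the function name `standard_score` and the parameter values are illustrative assumptions, not part of the text.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

def standard_score(sample, mu, sigma):
    """Return (sample mean - mu) / (sigma / sqrt(n)) for the given sample."""
    n = len(sample)
    return (fmean(sample) - mu) / (sigma / sqrt(n))

# Illustrative parameter values (assumptions chosen for this demo).
mu, sigma, n, runs = 3.0, 2.0, 10, 20_000
rng = random.Random(1)  # fixed seed so the run is reproducible

scores = [
    standard_score([rng.gauss(mu, sigma) for _ in range(n)], mu, sigma)
    for _ in range(runs)
]

# If Z is standard normal, these two numbers should be close to 0 and 1.
print(round(fmean(scores), 2), round(pstdev(scores), 2))
```

Agreement of the empirical mean and standard deviation with 0 and 1 is consistent with the exercise but does not replace the proof.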

Now, for a number *p* in (0, 1), we will let $z_p$
denote the *p*th quantile of the standard normal
distribution, so that

$$P(Z < z_p) = p.$$

For selected values of *p*, values of these quantiles are
given in the last row of the table
of the *t* distribution. You could also find the
quantiles by using the table of
the standard normal distribution.
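If no table is at hand, the quantiles can also be computed numerically. The sketch below assumes Python 3.8 or later, where `statistics.NormalDist` provides `inv_cdf`; the helper name `z_quantile` is an illustrative choice.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def z_quantile(p):
    """Return z_p, the number with P(Z < z_p) = p, for 0 < p < 1."""
    return Z.inv_cdf(p)

# Quantiles that arise for the confidence levels used later in this section.
for p in (0.80, 0.90, 0.95, 0.975):
    print(p, round(z_quantile(p), 3))
```

By the symmetry of the standard normal density, `z_quantile(1 - p)` equals `-z_quantile(p)` up to rounding.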

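**2.** Use the result of Exercise 1 to show that, for $\alpha$ in (0, 1),

$$P\left(-z_{1-\alpha/2} < \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} < z_{1-\alpha/2}\right) = 1 - \alpha.$$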

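**3.** Show that the expression in Exercise 2 can equivalently be written as

$$P\left(\bar{X} - z_{1-\alpha/2} \frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z_{1-\alpha/2} \frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha.$$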

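From Exercise 3, it follows that

$$\left(\bar{X} - z_{1-\alpha/2} \frac{\sigma}{\sqrt{n}},\; \bar{X} + z_{1-\alpha/2} \frac{\sigma}{\sqrt{n}}\right)$$

is a $1 - \alpha$ confidence interval for the distribution
mean. Note that this interval is symmetric with respect to the
sample mean.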
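As a minimal sketch, the two-sided interval can be computed directly from a sample. The function name `z_interval` and the data values are hypothetical; the code assumes the sample is drawn from a normal distribution whose standard deviation is known.

```python
from math import sqrt
from statistics import NormalDist, fmean

def z_interval(sample, sigma, alpha):
    """Two-sided 1 - alpha confidence interval for the mean, sigma known."""
    n = len(sample)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # the quantile z_{1 - alpha/2}
    half_width = z * sigma / sqrt(n)
    center = fmean(sample)                   # the sample mean
    return center - half_width, center + half_width

# Hypothetical data, assumed to come from a normal distribution with sigma = 2.
data = [1.2, -0.7, 3.1, 0.4, 2.5, -1.8, 0.9, 1.6]
low, high = z_interval(data, sigma=2.0, alpha=0.10)
print(f"90% interval: ({low:.3f}, {high:.3f})")
```

The midpoint of the returned interval is always the sample mean, which is the symmetry noted above.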

**4.** Use a derivation similar to
Exercises 2 and 3 to show that a $1 - \alpha$ confidence lower
bound for the distribution mean is

$$\bar{X} - z_{1-\alpha} \frac{\sigma}{\sqrt{n}}.$$

**5.** Use a derivation similar to
Exercises 2 and 3 to show that a $1 - \alpha$ confidence upper
bound for the distribution mean is

$$\bar{X} + z_{1-\alpha} \frac{\sigma}{\sqrt{n}}.$$

Note that the assumption that $\sigma$ is known is critical because, by definition, the confidence bounds cannot contain unknown parameters.
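The one-sided bounds of Exercises 4 and 5 can be sketched in the same way. The names `lower_bound` and `upper_bound` and the data are hypothetical. Note that a one-sided bound uses the quantile of order $1 - \alpha$ rather than $1 - \alpha/2$, and that `sigma` must be supplied as a known number.

```python
from math import sqrt
from statistics import NormalDist, fmean

def lower_bound(sample, sigma, alpha):
    """1 - alpha confidence lower bound: mean - z_{1-alpha} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - alpha)
    return fmean(sample) - z * sigma / sqrt(len(sample))

def upper_bound(sample, sigma, alpha):
    """1 - alpha confidence upper bound: mean + z_{1-alpha} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - alpha)
    return fmean(sample) + z * sigma / sqrt(len(sample))

# Hypothetical data, assumed normal with known sigma = 2.
data = [1.2, -0.7, 3.1, 0.4, 2.5, -1.8, 0.9, 1.6]
print(round(lower_bound(data, 2.0, 0.10), 3))
print(round(upper_bound(data, 2.0, 0.10), 3))
```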

The applet for this page is a simulation of the random experiment of constructing a confidence interval for the mean of a distribution. You can choose among several two-parameter families of distributions.

The default distribution is the standard normal distribution.

The graph of the density function of the chosen distribution is displayed in the picture box on the left, and the mean is shown as a vertical blue line on the horizontal axis. The mean and standard deviation of the chosen distribution are recorded in the first table on the left.

You can choose among several different confidence levels with a list box: 0.60, 0.80, 0.90, and 0.95. You can choose among several sample sizes with a scroll bar: 5, 10, 15, 20, 25, and 30. You can choose whether to construct a two-sided confidence interval, a confidence lower bound, or a confidence upper bound from a list box.

When you run the simulation, a sample of the specified size is chosen from the given distribution. The sample values are recorded in the second table and the empirical density of the sample is shown in red in the graph on the left. The sample mean and standard deviation are shown in the first table, and when the variable is discrete, the empirical density is also recorded in the first table. The confidence interval is recorded in the third table and is shown graphically as a horizontal red bar in the first graph.

When the confidence interval contains the mean, we have been
successful in our estimate and when the confidence interval does
not contain the mean, we have failed. An indicator variable *I* keeps
track of our successes and failures. The value of *I* is
recorded on each update in the third table, and the empirical
distribution of *I* is shown in the last graph and last
table. By the very meaning of the confidence interval, the
relative frequency of successes should be close to the confidence
level after a large number of runs.
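The long-run behavior of the success indicator can also be imitated outside the applet. This is a rough sketch, not the applet's own code; the function name `coverage`, its parameters, and the seed are assumptions made for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def coverage(mu, sigma, n, level, runs, seed=0):
    """Relative frequency of two-sided intervals that contain the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * sigma / sqrt(n)
    successes = 0
    for _ in range(runs):
        m = fmean(rng.gauss(mu, sigma) for _ in range(n))
        # Indicator I = 1 exactly when the interval captures the true mean.
        successes += abs(m - mu) < half_width
    return successes / runs

# Normal distribution with mean 0 and standard deviation 2, n = 5, 90% level.
print(coverage(mu=0.0, sigma=2.0, n=5, level=0.90, runs=10_000))
```

The printed relative frequency should be near 0.90, with the agreement improving as `runs` grows.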

Make sure that *Use Sigma* and *Use z* are
selected. The density of the standard normal distribution
is shown in the second graph. The quantiles are recorded, and the
interval defined by the quantiles is shown as a blue bar below
the axis of the middle graph.

When you run the simulation, the value of the standard score *Z*
is recorded in the third table and shown as a vertical red
line in the middle graph. The event that the standard score falls
in the critical interval is equivalent to the event that the
confidence interval successfully captured the mean (and thus the
success indicator variable *I* takes the value 1).
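The equivalence of the two events can be checked run by run. The sketch below is an illustration under assumed parameter values; the function name `one_run` is hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

mu, sigma, n, alpha = 0.0, 2.0, 10, 0.10    # illustrative values
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
rng = random.Random(7)                      # fixed seed for reproducibility

def one_run():
    """Return (Z in critical interval, confidence interval contains mu)."""
    m = fmean(rng.gauss(mu, sigma) for _ in range(n))
    z = (m - mu) / (sigma / sqrt(n))        # standard score of the sample mean
    half_width = z_crit * sigma / sqrt(n)
    z_in_critical_interval = -z_crit < z < z_crit
    interval_captures_mean = m - half_width < mu < m + half_width
    return z_in_critical_interval, interval_captures_mean

results = [one_run() for _ in range(1000)]
# The two indicators should agree on every run.
print(all(a == b for a, b in results))
```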

The mean estimation experiment displays other information which will be explained as we develop the theory. First, however, let us experience some confidence intervals.

**6.** In the mean estimation experiment, select the
normal distribution with mean 0 and standard deviation 2, and
select two-sided intervals. For each of the following sample
sizes and confidence levels, run the experiment 1000 times with
an update frequency of 10. Note the size and location of the
confidence intervals and how well the proportion of successful
intervals approximates the theoretical confidence level.

- *n* = 5, 80%
- *n* = 5, 90%
- *n* = 10, 90%
- *n* = 30, 90%

**7.** In the mean estimation experiment, select
the normal distribution with mean 0 and standard deviation 2 and
select lower bound. For each of the following sample sizes and
confidence levels, run the experiment 1000 times with an update
frequency of 10. Note the size and location of the confidence
intervals and how well the proportion of successful intervals
approximates the theoretical confidence level.

- *n* = 5, 80%
- *n* = 5, 90%
- *n* = 10, 90%
- *n* = 30, 90%

**8.** In the mean estimation experiment, select
the normal distribution with mean 0 and standard deviation 0.5
and select upper bound. For each of the following sample sizes
and confidence levels, run the experiment 1000 times with an
update frequency of 10. Note the size and location of the
confidence intervals and how well the proportion of successful
intervals approximates the theoretical confidence level.

- *n* = 5, 80%
- *n* = 5, 90%
- *n* = 10, 90%
- *n* = 30, 90%

In Exercises 6, 7, and 8, you should have noticed the general behavior described in the following exercise:

**9.** Show that the size of the confidence interval
(the length of a two-sided interval, or the distance from a one-sided bound to the sample mean)

- (a) decreases as the sample size increases,
- (b) increases as the confidence level increases,
- (c) increases as the standard deviation increases.

In particular, Exercise 9.b shows that there is a tradeoff between the confidence level and the size of the confidence interval. If the sample size and standard deviation are fixed, we can decrease the size of our interval, and hence tighten our estimate, only at the expense of decreasing our confidence in the estimate. Conversely, we can increase our confidence in the estimate only at the expense of enlarging the size of our interval.
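The tradeoff can be seen numerically by tabulating the half-length of the two-sided interval, $z_{1-\alpha/2} \, \sigma / \sqrt{n}$, for a few settings. This is a small sketch; the function name `half_width` is an illustrative assumption, and the sample sizes and levels are taken from the applet's options.

```python
from math import sqrt
from statistics import NormalDist

def half_width(n, level, sigma):
    """Half the length of the two-sided interval: z_{1-alpha/2} * sigma / sqrt(n)."""
    alpha = 1 - level
    return NormalDist().inv_cdf(1 - alpha / 2) * sigma / sqrt(n)

# Fixed sigma = 2: vary the sample size and the confidence level.
for n in (5, 10, 30):
    for level in (0.80, 0.90, 0.95):
        print(n, level, round(half_width(n, level, sigma=2.0), 3))
```

Reading down a fixed level shows the interval shrinking as *n* grows; reading across a fixed *n* shows it widening as the level rises.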

Again, our two crucial assumptions are that the underlying distribution is normal with known standard deviation. You should question the legitimacy of both assumptions. First, of course, in real estimation problems, we are unlikely to know much about the underlying distribution, let alone whether or not it is normal.

Moreover, the assumption that the mean of the underlying distribution is unknown, but the variance is known, is usually artificial. However, this is not always the case. Suppose, for example, that we have a machine that is designed to produce parts of a specified length. Because of imperfections in the process, however, the true length of a part will be a random variable. The variance of the distribution may be due to inherent factors in the machine, which may remain fairly stable over time. In this case, the variance may be known from historical data to a high degree of accuracy. The mean, on the other hand, may be set by adjusting the machine and hence may change to an unknown value fairly frequently because of vibrations or other factors.

In any event, most of the rest of this chapter is devoted to the study of confidence intervals for the mean when our assumptions are relaxed.

## Interval Estimation