The Sample Variance


The Random Sample

As usual, we start with a basic experiment and a random variable X of interest. We denote the mean and variance of X by $\mu = E(X)$ and $\sigma^2 = \operatorname{var}(X)$.

We repeat the experiment n times to form a random sample of size n from the distribution of X:

$$(X_1, X_2, \ldots, X_n)$$

Recall that these are independent random variables, each with the distribution of X.

The Sample Variance

The sample mean
$$M_n = \frac{1}{n}\sum_{i=1}^{n} X_i$$
is the natural measure of the center of the random sample. In this section, we are interested in measuring the spread of the sample about the sample mean. By analogy with the variance of the distribution, we will start with the sum of the squares of the distances of the sample values to the sample mean:
$$\sum_{i=1}^{n} (X_i - M_n)^2$$

It might seem natural to average the squared deviations by dividing their sum by $n$. Instead, we will divide by whichever constant gives an unbiased estimator of the distribution variance.

Mathematical Exercise 1. Use basic algebra to show that
$$\sum_{i=1}^{n} (X_i - M_n)^2 = \sum_{i=1}^{n} (X_i - \mu)^2 - n (M_n - \mu)^2$$
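Exercise 1 asks for a proof, but the identity is easy to spot-check numerically. The minimal Python sketch below uses exact rational arithmetic from the standard `fractions` module so that rounding cannot hide a mismatch; the helper names `sum_sq_dev` and `check_identity`, the data values, and the choice $\mu = 7/2$ are hypothetical illustrations.

```python
from fractions import Fraction


def sum_sq_dev(xs, center):
    """Sum of squared deviations of the values xs about `center`."""
    return sum((x - center) ** 2 for x in xs)


def check_identity(xs, mu):
    """Return both sides of
    sum (x_i - m)^2 == sum (x_i - mu)^2 - n (m - mu)^2,
    computed exactly with Fractions (no floating-point rounding)."""
    xs = [Fraction(x) for x in xs]
    mu = Fraction(mu)
    n = len(xs)
    m = sum(xs) / n  # sample mean M_n
    lhs = sum_sq_dev(xs, m)
    rhs = sum_sq_dev(xs, mu) - n * (m - mu) ** 2
    return lhs, rhs


lhs, rhs = check_identity([2, 5, 3, 7, 4], mu=Fraction(7, 2))
print(lhs, rhs, lhs == rhs)  # prints: 74/5 74/5 True
```

The identity holds for any value of $\mu$, so changing `mu` in the call should never break the equality.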

Mathematical Exercise 2. Use the result in Exercise 1 and basic properties of expected value to show that
$$E\left[\sum_{i=1}^{n} (X_i - M_n)^2\right] = (n - 1)\sigma^2$$
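For a distribution with finitely many values, the expectation in Exercise 2 can be computed exactly by brute-force enumeration. This is an illustration, not a proof: the sketch assumes a Bernoulli distribution with $p = 2/5$ (chosen only because its variance $p(1 - p)$ is known in closed form), and the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def expected_sum_sq_dev_bernoulli(n, p):
    """Exact E[sum (X_i - M_n)^2] for n i.i.d. Bernoulli(p) variables.

    Enumerates all 2**n outcomes and weights each by its probability.
    """
    p = Fraction(p)
    total = Fraction(0)
    for outcome in product((0, 1), repeat=n):
        k = sum(outcome)  # number of successes in this outcome
        prob = p**k * (1 - p) ** (n - k)
        m = Fraction(k, n)  # sample mean M_n for this outcome
        total += prob * sum((x - m) ** 2 for x in outcome)
    return total


p = Fraction(2, 5)
sigma2 = p * (1 - p)  # Bernoulli variance, here 6/25
for n in (2, 3, 5):
    # The two columns should agree: E[sum of squared deviations] = (n - 1) sigma^2
    print(n, expected_sum_sq_dev_bernoulli(n, p), (n - 1) * sigma2)
```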

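From Exercise 2, the random variable
$$S_n^2 = \frac{1}{n - 1}\sum_{i=1}^{n} (X_i - M_n)^2$$
is an unbiased estimator of the distribution variance $\sigma^2$; it is called the sample variance. Its square root $S_n$ is the sample standard deviation. Note, however, that $S_n$ is generally not an unbiased estimator of the distribution standard deviation $\sigma$: because the square root function is concave, Jensen's inequality gives $E(S_n) \le \sigma$. Finally, when $n$ is large, it makes little practical difference whether we divide by $n$ or by $n - 1$, since the two estimators differ only by the factor $n/(n - 1)$, which tends to 1.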
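To make the distinction concrete, here is a minimal Python sketch (helper names hypothetical). In the standard library, `statistics.variance` and `statistics.stdev` divide by $n - 1$, while `statistics.pvariance` divides by $n$. The last part enumerates all Bernoulli samples of size 3 with $p = 0.4$ (an arbitrary choice) to show that $E(S_n)$ falls below $\sigma$ even though $E(S_n^2) = \sigma^2$.

```python
import math
import statistics
from itertools import product


def sample_variance(xs):
    """Unbiased sample variance S_n^2: divide by n - 1."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - 1)


def biased_variance(xs):
    """Alternative estimator: divide by n (too small on average by (n - 1)/n)."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / n


data = [2.0, 5.0, 3.0, 7.0, 4.0]
print(sample_variance(data), statistics.variance(data))   # both divide by n - 1
print(biased_variance(data), statistics.pvariance(data))  # both divide by n
print(math.sqrt(sample_variance(data)), statistics.stdev(data))


def expected_sample_sd_bernoulli(n, p):
    """Exact E(S_n) for n i.i.d. Bernoulli(p) variables, by enumeration."""
    total = 0.0
    for outcome in product((0, 1), repeat=n):
        k = sum(outcome)
        prob = p**k * (1 - p) ** (n - k)
        total += prob * math.sqrt(sample_variance(outcome))
    return total


p = 0.4
# E(S_3) versus sigma = sqrt(p(1 - p)): the first value is smaller.
print(expected_sample_sd_bernoulli(3, p), math.sqrt(p * (1 - p)))
```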

Mathematical Exercise 3. Use the (strong) law of large numbers to show that with probability 1,
$$S_n^2 \to \sigma^2 \text{ as } n \to \infty$$
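A quick simulation suggests (but does not prove) the convergence in Exercise 3. The Python sketch assumes a normal distribution with mean 2 and standard deviation 1, so $\sigma^2 = 1$, and recomputes $S_n^2$ on longer and longer prefixes of one simulated sequence; the seed and the sample sizes are arbitrary.

```python
import random


def sample_variance(xs):
    """Unbiased sample variance: squared deviations divided by n - 1."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - 1)


rng = random.Random(12345)  # fixed seed so the run is reproducible
mu, sigma = 2.0, 1.0        # assumed normal distribution; sigma^2 = 1
xs = [rng.gauss(mu, sigma) for _ in range(100_000)]

# S_n^2 on growing prefixes of the same sequence should settle near 1.
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, sample_variance(xs[:n]))
```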

Recall that in the simulation of the sample mean experiment, a random sample of size n is chosen from a basic distribution. The mean and standard deviation of the sample are recorded in the first table and the sample values themselves in the second table. In the left graph, the horizontal red bar is centered at the sample mean and extends one sample standard deviation on either side.

As you run the experiment, you generate a sample of sample means. The mean and standard deviation of this sample are recorded in the third table, and the values themselves in the last table. In the right graph, the horizontal red bar is centered at the mean of the sample of sample means and extends one standard deviation on either side.

Note again that the graph and tables on the left have the same mathematical structure as the graph and tables on the right.

4. In the simulation of the sample mean experiment, set the basic distribution as indicated below, and set the sample size to 10. Now run the experiment and make sure that you understand all of the information displayed in the tables and graphs. In particular, in the right graph, note the apparent convergence of the various statistics for the sample of sample means to the corresponding parameters of the distribution of the sample mean.

Bernoulli with p = 0.4.
Binomial with m = 5 and p = 0.7.
Poisson with mean 2.
Normal with mean 2 and standard deviation 1.
Gamma with parameters r = 1 and k = 2.
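The same experiment can also be reproduced without the applet. This Python sketch is not the applet's code: it assumes the sample size 10 from the exercise, covers only the normal and Bernoulli cases from the list above, and uses a hypothetical helper `sample_mean_experiment`. The mean and standard deviation of the simulated sample means should be close to $\mu$ and $\sigma/\sqrt{n}$.

```python
import math
import random
import statistics


def sample_mean_experiment(draw, n, runs, rng):
    """Repeat the sample mean experiment `runs` times.

    `draw(rng)` returns one value from the basic distribution; each run
    averages n such values. Returns the list of sample means.
    """
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(runs)]


rng = random.Random(2024)  # fixed seed for reproducibility
n, runs = 10, 20_000

# Normal with mean 2 and standard deviation 1: E(M) = 2, sd(M) = 1/sqrt(10).
means = sample_mean_experiment(lambda r: r.gauss(2.0, 1.0), n, runs, rng)
print(statistics.fmean(means), statistics.stdev(means), 1 / math.sqrt(n))

# Bernoulli with p = 0.4: E(M) = 0.4, sd(M) = sqrt(0.4 * 0.6 / 10).
bern = sample_mean_experiment(lambda r: 1.0 if r.random() < 0.4 else 0.0, n, runs, rng)
print(statistics.fmean(bern), statistics.stdev(bern), math.sqrt(0.4 * 0.6 / n))
```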

An Estimator of the Variance when the Mean is Known

In most realistic statistical problems, neither the mean nor the variance of the underlying distribution is known. However, even though it is a bit artificial, let us derive an estimator of the distribution variance assuming that the distribution mean $\mu$ is known. Let
$$W_n = \frac{1}{n}\sum_{i=1}^{n} (X_i - \mu)^2$$

Mathematical Exercise 5. Show that $W_n$ is the sample mean for a random sample of size n from the distribution of
$$(X - \mu)^2$$

Thus conclude that

$W_n$ is an unbiased estimator of the distribution variance $\sigma^2$.
$W_n \to \sigma^2$ as $n \to \infty$, with probability 1.
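A Python simulation sketch can illustrate both conclusions. It assumes a normal distribution with known mean $\mu = 2$ and $\sigma = 1$ (so $\sigma^2 = 1$), a sample size $n = 5$ for the unbiasedness check, and hypothetical helper names; the averages over many runs are estimates of $E(W_n)$ and $E(S_n^2)$, not exact values.

```python
import random
import statistics


def w_estimator(xs, mu):
    """W_n: mean squared deviation about the KNOWN distribution mean mu."""
    return sum((x - mu) ** 2 for x in xs) / len(xs)


def sample_variance(xs):
    """S_n^2: squared deviations about the sample mean, divided by n - 1."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - 1)


rng = random.Random(7)  # fixed seed for reproducibility
mu, sigma = 2.0, 1.0    # assumed normal distribution, so sigma^2 = 1
n, runs = 5, 20_000

w_vals, s_vals = [], []
for _ in range(runs):
    xs = [rng.gauss(mu, sigma) for _ in range(n)]
    w_vals.append(w_estimator(xs, mu))
    s_vals.append(sample_variance(xs))

# Unbiasedness: both long-run averages should be close to sigma^2 = 1.
print(statistics.fmean(w_vals), statistics.fmean(s_vals))

# Convergence: one large sample gives W_n close to sigma^2.
big = [rng.gauss(mu, sigma) for _ in range(100_000)]
print(w_estimator(big, mu))
```

Note that $W_n$ divides by $n$, not $n - 1$: no degree of freedom is spent estimating the mean, because $\mu$ is assumed known.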