The Sample Mean and the Law of Large Numbers

Java Applet Simulation of the sample mean experiment


The Sample Mean

Suppose that we have a basic experiment with a random variable X of interest. We will denote the mean and variance of X by

\[ \mu = E(X), \qquad \sigma^2 = \operatorname{var}(X) \]

These are parameters of the distribution.

Now suppose we perform n independent replications of the basic experiment. This defines a new, compound experiment with a sequence of n independent random variables, each with the same distribution as X:

\[ (X_1, X_2, \ldots, X_n) \]

Recall that in statistical terms, this sequence is a random sample of size n from the distribution of X. The sample mean, denoted here by M_n, is simply the average of the variables in the sample:

\[ M_n = \frac{1}{n} \sum_{i=1}^{n} X_i \]

The sample mean is a real-valued function of the random sample and thus is a statistic. Like any statistic, the sample mean is itself a random variable with a distribution, mean, and variance of its own. Many times, the distribution mean is unknown and the sample mean is used as an estimator of the distribution mean.
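The point that the sample mean is itself a random variable can be seen in a minimal Python sketch. The normal distribution with mean 5 and standard deviation 2, the sample size of 10, and the helper names `draw_sample` and `sample_mean` are all illustrative assumptions, not part of the simulation described on this page.

```python
import random


def sample_mean(sample):
    """Average of the values in the sample."""
    return sum(sample) / len(sample)


def draw_sample(n, rng, mu=5.0, sigma=2.0):
    """n independent draws from a normal distribution (illustrative choice)."""
    return [rng.gauss(mu, sigma) for _ in range(n)]


rng = random.Random(0)  # fixed seed so the runs are reproducible
for run in range(1, 4):
    sample = draw_sample(10, rng)
    print(f"run {run}: sample mean = {sample_mean(sample):.3f}")
```

Each run draws a fresh sample, so each run reports a different value of the sample mean, scattered around the distribution mean of 5.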

The Simulation

In the simulation of the sample mean experiment, the density function of the basic distribution is shown in blue in the left graph, and the mean and standard deviation of this distribution are recorded in the first table. The blue horizontal bar in the left graph is centered at the mean of the distribution and extends one standard deviation on either side. The type of distribution can be chosen from among several parametric families: binomial, Poisson, normal, and gamma. For discrete variables, the values of the density function are also recorded in the first table.

The density function of the sample mean is shown in blue in the right graph, and the mean and variance of this random variable are recorded in the third table. The blue horizontal bar in the right graph is centered at the mean of the sample mean and extends one standard deviation on either side. The sample size n can be varied with the scroll bar. When the basic variable is discrete, the sample mean is also discrete and the values of the density function are also shown in the third table.

When you run the simulation, a sample of size n is chosen from the basic distribution and the values are recorded in the second table. The horizontal red bar in the left graph is centered at the mean of the sample. The mean of the sample is recorded in the last table. As you run the sample mean experiment repeatedly, note that the sample means themselves form a random sample from the distribution of the sample mean. The horizontal red bar in the right graph is centered at the mean of the sample of sample means.

The simulation displays other information that we will discuss in other sections.

Properties of the Sample Mean

Simulation Exercise 1. Start with the default settings in the simulation of the sample mean experiment. Vary the sample size with the scroll bar and note the location, spread, and shape of the distribution of the sample mean. Repeat this for each choice of the basic distribution. Try to formulate some general conjectures.

Mathematical Exercise 1. Use basic properties of expected value to show that the sample mean M_n has expected value equal to the distribution mean \mu:

\[ E(M_n) = \mu \]

Mathematical Exercise 1 shows that the sample mean is an unbiased estimator of the distribution mean. Therefore, when the sample mean is used as an estimator of the distribution mean, its mean square error is simply its variance.

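Mathematical Exercise 2. Use basic properties of variance to show that the variance of the sample mean M_n is the distribution variance \sigma^2 divided by the sample size n:

\[ \operatorname{var}(M_n) = \frac{\sigma^2}{n} \]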

From Exercise 2, the variance of the sample mean is an increasing function of the distribution variance and a decreasing function of the sample size. Both of these make intuitive sense if we think of the sample mean as an estimator of the distribution mean.
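These two properties can be checked numerically. The sketch below is only an empirical illustration, not a proof. It assumes a normal basic distribution with mean 5 and standard deviation 2 (an arbitrary choice) and uses a hypothetical helper `simulate_sample_means`.

```python
import random
import statistics


def simulate_sample_means(n, runs, rng, mu=5.0, sigma=2.0):
    """Return `runs` independent sample means, each from a normal sample of size n."""
    means = []
    for _ in range(runs):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means


rng = random.Random(1)  # fixed seed for reproducibility
mu, sigma = 5.0, 2.0
for n in (4, 16, 64):
    means = simulate_sample_means(n, 5000, rng, mu, sigma)
    print(
        f"n = {n:2d}  mean of sample means = {statistics.fmean(means):.3f}  "
        f"variance of sample means = {statistics.pvariance(means):.4f}  "
        f"sigma^2 / n = {sigma**2 / n:.4f}"
    )
```

For each sample size, the average of the simulated sample means should be close to 5, and their variance should be close to the distribution variance divided by n.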

Simulation Exercise 3. Start with the default settings in the simulation of the sample mean experiment. Vary the sample size with the scroll bar. Note that the mean of the sample mean stays the same, but the standard deviation of the sample mean decreases (as we now know, in inverse proportion to the square root of the sample size). Repeat this for each choice of the basic distribution.
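Because the variance of the sample mean is the distribution variance divided by n, its standard deviation is the distribution standard deviation divided by the square root of n. A short sketch of this arithmetic follows. The value sigma = 2 and the function name `sd_of_sample_mean` are illustrative assumptions.

```python
import math


def sd_of_sample_mean(sigma, n):
    """Standard deviation of the mean of n independent variables, each with standard deviation sigma."""
    return sigma / math.sqrt(n)


sigma = 2.0  # illustrative distribution standard deviation
for n in (1, 4, 16, 64):
    print(f"n = {n:3d}   sd of sample mean = {sd_of_sample_mean(sigma, n):.3f}")
```

Each time the sample size is multiplied by 4, the standard deviation of the sample mean is halved: 2.000, 1.000, 0.500, 0.250.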

The Law of Large Numbers

By Exercise 2, the variance of the sample mean converges to 0 as n increases to infinity. This suggests that in some sense, the sample mean should converge to the distribution mean.

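Mathematical Exercise 4. Use Chebyshev's inequality to prove the weak law of large numbers: for every \epsilon > 0, the sample mean M_n and the distribution mean \mu satisfy

\[ P\left( |M_n - \mu| > \epsilon \right) \to 0 \text{ as } n \to \infty \]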

The weak law of large numbers states that the sample mean converges to the mean of the distribution in probability.
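Convergence in probability can be illustrated by comparing the Chebyshev bound, the distribution variance divided by n times epsilon squared, with the observed frequency of large deviations. The sketch below assumes a normal basic distribution with mean 5 and standard deviation 2 and a tolerance of 0.5. These values and the function names are hypothetical choices for illustration.

```python
import random


def chebyshev_bound(sigma, n, eps):
    """Chebyshev upper bound on P(|M_n - mu| >= eps), capped at 1."""
    return min(1.0, sigma**2 / (n * eps**2))


def empirical_tail(n, eps, runs, rng, mu=5.0, sigma=2.0):
    """Fraction of simulated sample means that miss mu by at least eps."""
    misses = 0
    for _ in range(runs):
        m = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if abs(m - mu) >= eps:
            misses += 1
    return misses / runs


rng = random.Random(2)  # fixed seed for reproducibility
for n in (10, 100, 1000):
    freq = empirical_tail(n, 0.5, 1000, rng)
    bound = chebyshev_bound(2.0, n, 0.5)
    print(f"n = {n:4d}   observed frequency = {freq:.3f}   Chebyshev bound = {bound:.3f}")
```

The bound is usually far from tight, but it tends to 0 as n grows, which is all the weak law requires.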

Actually a much stronger result is true: the strong law of large numbers states that the sample mean M_n converges to the distribution mean \mu with probability 1:

\[ P\left( M_n \to \mu \text{ as } n \to \infty \right) = 1 \]

The proof is beyond the scope of this text; see, for example, the book Probability and Measure by Patrick Billingsley.
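The strong law concerns a single infinite sequence of observations. A finite simulation cannot verify it, but following the running mean along one simulated sequence gives a feel for the statement. The sketch assumes a normal basic distribution with mean 5 and standard deviation 2. The generator name `running_means` is hypothetical.

```python
import random


def running_means(values):
    """Yield the successive means M_1, M_2, ... of the given sequence."""
    total = 0.0
    for k, x in enumerate(values, start=1):
        total += x
        yield total / k


rng = random.Random(3)  # fixed seed for reproducibility
mu, sigma = 5.0, 2.0
path = [rng.gauss(mu, sigma) for _ in range(100_000)]  # one sequence of observations
checkpoints = {10, 100, 1_000, 10_000, 100_000}
for k, m in enumerate(running_means(path), start=1):
    if k in checkpoints:
        print(f"n = {k:6d}   running mean = {m:.4f}   error = {abs(m - mu):.4f}")
```

Along this one sequence the running mean settles near 5, in contrast to the weak-law illustration, which looks at many independent samples of a fixed size.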

Simulation Exercise 5. Start with the default settings in the simulation of the sample mean experiment. Increase the sample size with the scroll bar and note how the distribution of the sample mean begins to resemble a point mass distribution. Repeat this for each choice of the basic distribution.

Simulation Exercise 6. Start with the default settings in the simulation of the sample mean experiment and with a sample size of your choice. Run the simulation with an update frequency of 10. In the right graph, watch how the center of the red horizontal bar (the mean of the sample of sample means) appears to converge to the center of the blue horizontal bar (the mean of the distribution of the sample mean).

