Relative Frequency and Empirical Distributions

Random samples and their sample means are ubiquitous in probability and statistics. In this section, we will see how sample means can be used to estimate probabilities, density functions, and distribution functions.

Relative Frequency

Suppose that we have a basic experiment with an event A of interest. Let p = P(A) denote the probability of A and let X denote the indicator variable of A that takes the value 1 if A occurs and takes the value 0 otherwise. Thus

P(X = 1) = p, P(X = 0) = 1 - p

The distribution of X is known as the Bernoulli distribution with parameter p.

Mathematical Exercise 1. Show that the mean and variance are given by

  1. E(X) = p
  2. var(X) = p(1 - p).

Now suppose that we repeat the basic experiment n times to form a random sample of size n from the distribution of X:

(X1, X2, ..., Xn)

Mathematical Exercise 2. Show that the sample mean is the relative frequency of the event; that is, the number of times the event occurred divided by the number of runs n. Thus, conclude that

  1. The relative frequency of the event is an unbiased estimator of the probability of the event.
  2. The relative frequency of the event converges to the probability of the event as n increases to infinity (the law of large numbers).
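
The convergence in Exercise 2 can be illustrated numerically. The following is a minimal sketch, not part of the original simulation: the function name relative_frequency, the seed, and the choice p = 0.3 are assumptions made for illustration only.

```python
import random

def relative_frequency(p, n, rng):
    """Relative frequency of an event with probability p in n independent runs."""
    # Each indicator is 1 with probability p and 0 otherwise (a Bernoulli trial).
    indicators = [1 if rng.random() < p else 0 for _ in range(n)]
    # The sample mean of the indicators is the relative frequency of the event.
    return sum(indicators) / n

rng = random.Random(2024)  # fixed seed (an arbitrary choice) so the run is reproducible
p = 0.3                    # hypothetical probability of the event
for n in (10, 100, 1000, 10000, 100000):
    print(n, relative_frequency(p, n, rng))
```

As n grows, the printed values should settle near p, with fluctuations on the order of the square root of p(1 - p) / n.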

Simulation Exercise 3. In the simulation of the sample mean experiment, make the basic distribution Bernoulli with parameter p = 0.5. We can pretend that the basic experiment is to toss a fair coin. The right graph shows the distribution of the relative frequency of heads in n tosses. Vary n with the scroll bar and note how this distribution changes. Repeat with p = 0.1, 0.3, 0.7, 0.9.

You can also see the convergence of the relative frequency of an event, as the experiment is repeated, to the probability of the event in

Empirical Density for a Discrete Variable

Suppose now that we have a discrete random variable X in our basic experiment that takes values in a (finite or countably infinite) set S. Let f denote the discrete density function of X so that

f(x) = P(X = x) for x in S

Now suppose that we repeat the basic experiment n times to form a random sample of size n from the distribution of X:

(X1, X2, ..., Xn)

The relative frequency function (or empirical density function) of the sample is given by

fn(x) = #{i in {1, 2, ..., n}: Xi = x} / n for x in S

Mathematical Exercise 4. Show that as a function of x, the relative frequency function satisfies the mathematical properties of a discrete density function.

Mathematical Exercise 5. In the context of Exercise 4, show that the sample mean is the mean of the relative frequency function.

Of course, the values of the relative frequency function are random variables.

Mathematical Exercise 6. Show that for each x, fn(x) is the sample mean for a random sample of size n from the distribution of the indicator variable 1(X = x). Thus, conclude that

  1. fn(x) is an unbiased estimator of f(x).
  2. fn(x) converges to f(x) as n increases to infinity.
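
The properties in Exercises 4 through 6 can be checked on simulated data. The sketch below takes the relative frequency function at x to be the proportion of sample values equal to x. The fair-die experiment, the function name relative_frequency_function, and the seed are assumptions chosen for illustration, not part of the original text.

```python
import random
from collections import Counter

def relative_frequency_function(sample):
    """Map each observed value x to the proportion of sample points equal to x."""
    n = len(sample)
    counts = Counter(sample)
    return {x: c / n for x, c in counts.items()}

rng = random.Random(7)  # arbitrary fixed seed
# Hypothetical basic experiment: roll a fair six-sided die, so f(x) = 1/6.
sample = [rng.randint(1, 6) for _ in range(6000)]
fn = relative_frequency_function(sample)

# Property of a discrete density: the values are nonnegative and sum to 1.
total = sum(fn.values())
# The mean of the relative frequency function equals the sample mean.
mean_of_fn = sum(x * fn[x] for x in fn)
sample_mean = sum(sample) / len(sample)

for x in sorted(fn):
    print(x, round(fn[x], 4), "true:", round(1 / 6, 4))
print(total, mean_of_fn, sample_mean)
```

Values that never occur in the sample are simply absent from the dictionary, which corresponds to a relative frequency of 0.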

Recall that in the simulation of the sample mean experiment, a random sample of size n is chosen from a basic distribution and the sample mean is computed. If the basic distribution is discrete, the relative frequency function of the sample is shown in red in the left graph and the values are recorded in the first table. As you run the experiment, you generate a random sample of sample means. The relative frequency function of this sample is shown in red in the right graph. Thus, note that there is perfect symmetry between the two graphs: both show the density function of a distribution and a relative frequency function for a sample chosen from that distribution.

Simulation Exercise 7. In the simulation of the sample mean experiment, make the basic distribution Poisson with mean 2 and set the sample size to 5. Run the simulation and note the graphs of the relative frequency functions. In the right graph, note the apparent convergence of the relative frequency function to the density function.
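
For readers without access to the simulation, the right graph of Exercise 7 can be approximated with a short script. This is a sketch under stated assumptions: the Poisson sampler uses Knuth's multiplication method (a substitute for whatever the simulation uses), and the names poisson_variate and poisson_density, the seed, and the number of runs are illustrative. The true density of the sample mean follows from the fact that a sum of n independent Poisson variables with mean 2 is Poisson with mean 2n.

```python
import math
import random
from collections import Counter

def poisson_variate(mean, rng):
    """Draw one Poisson value by Knuth's multiplication method (suitable for small means)."""
    threshold = math.exp(-mean)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

def poisson_density(mean, x):
    """Poisson density with the given mean, evaluated at the nonnegative integer x."""
    return math.exp(-mean) * mean ** x / math.factorial(x)

rng = random.Random(11)          # arbitrary fixed seed
mean, n, runs = 2.0, 5, 20000    # runs is an assumed number of repetitions

sample_means = []
for _ in range(runs):
    sample = [poisson_variate(mean, rng) for _ in range(n)]
    sample_means.append(sum(sample) / n)

# Relative frequency function of the sample of sample means.
freq = {m: c / runs for m, c in Counter(sample_means).items()}

# n times the sample mean is Poisson with mean n * mean, which gives
# the true density of the sample mean at the point total / n.
for total in range(0, 21):
    m = total / n
    print(m, round(freq.get(m, 0.0), 4), round(poisson_density(n * mean, total), 4))
```

Each printed row shows a possible value of the sample mean, its relative frequency over the runs, and the true density at that value.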

Empirical Density for a Continuous Variable

Suppose now that we have a continuous random variable X in our basic experiment and that X takes values in an interval S. Let f denote the density function of X. Suppose that we partition S into small subintervals:

Aj, with length hj and midpoint xj for j = 1, 2, ...

If hj is small, then by definition of density,

P(X in Aj) ≈ f(xj) hj

Now suppose that we repeat the basic experiment n times to form a random sample of size n from the distribution of X:

(X1, X2, ..., Xn)

An approximate empirical density function based on this sample and the given partition can be defined by

fn(xj) = #{i in {1, 2, ..., n}: Xi in Aj} / (n hj)

Mathematical Exercise 8. Show that for each j, fn(xj) is the sample mean for a random sample of size n from the distribution of the random variable that takes the value 1/hj when X is in Aj, and 0 otherwise. Thus, conclude that when n is large and hj is small, fn(xj) should be close to f(xj).

In the simulation of the sample mean experiment, empirical density functions are graphed when the basic distribution is continuous. The empirical densities are based on a partition of the displayed interval of the distribution into 100 subintervals of equal size.
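
The construction can be sketched in a few lines: count the sample points falling in each subinterval and divide by n times the subinterval length. The function name empirical_density, the displayed interval [0, 5), the sample size, and the seed are assumptions for illustration. Only the choice of 100 equal subintervals comes from the description above.

```python
import math
import random

def empirical_density(sample, a, b, m):
    """Return (midpoints, values) of the empirical density on m equal subintervals of [a, b)."""
    n = len(sample)
    h = (b - a) / m                      # common length of the subintervals
    counts = [0] * m
    for x in sample:
        if a <= x < b:                   # points outside [a, b) still count toward n
            j = min(int((x - a) / h), m - 1)
            counts[j] += 1
    midpoints = [a + (j + 0.5) * h for j in range(m)]
    values = [c / (n * h) for c in counts]
    return midpoints, values

rng = random.Random(5)  # arbitrary fixed seed
# Exponential distribution with parameter 1, so the true density is exp(-x).
sample = [rng.expovariate(1.0) for _ in range(200000)]
midpoints, values = empirical_density(sample, 0.0, 5.0, 100)

# Compare the empirical density with the true density at the first few midpoints.
for xj, v in list(zip(midpoints, values))[:5]:
    print(round(xj, 3), round(v, 4), round(math.exp(-xj), 4))
```

With a large sample and short subintervals, the second and third printed columns should be close, as Exercise 8 suggests.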

Simulation Exercise 9. In the simulation of the sample mean experiment, make the basic distribution exponential with parameter 1 and set the sample size to 5. Run the simulation with an update frequency of 100 and note the graphs of the empirical density functions. In the right graph, note the apparent convergence of the empirical density function to the true density function.

Simulation Exercise 10. In the simulation of the sample mean experiment, make the basic distribution normal with mean 0 and standard deviation 1 and set the sample size to 5. Run the simulation with an update frequency of 100 and note the graphs of the empirical density functions. In the right graph, note the apparent convergence of the empirical density function to the true density function.

