Estimation

Probability and Statistics

In a sense, inferential statistics is the dual of probability. In probability, we try to predict the outcome of a random experiment, assuming knowledge of the underlying model and its parameters. In statistics, by contrast, we observe the outcome of a random experiment and try to infer information about the underlying model and its parameters.

The techniques of inferential statistics have been enormously successful; these techniques are widely used in just about every subject that deals with quantification—the natural sciences, the social sciences, law, and medicine. On the other hand, statistics has a legalistic quality and a great deal of terminology that can make the subject a bit intimidating at first.

Parameters

Let us discuss some of the basic ideas. The term parameter refers to a non-random quantity in a model that, once chosen, remains constant. Almost all probability models are actually parametric families of models; that is, they are models governed by one or more parameters that can be adjusted to fit the random process being modeled.
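
For example, the normal family is governed by a mean and a standard deviation, while the binomial family is governed by the number of trials and the success probability. The following sketch (an illustration of my own, with arbitrarily chosen parameter values) draws from both families using NumPy; note that the parameters stay fixed across the random draws:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Normal family: governed by the parameters mu (mean) and sigma (standard deviation).
mu, sigma = 10.0, 2.0
normal_draws = rng.normal(loc=mu, scale=sigma, size=5)

# Binomial family: governed by the parameters n (number of trials) and p (success probability).
n_trials, p = 20, 0.3
binomial_draws = rng.binomial(n=n_trials, p=p, size=5)

# The draws are random, but the parameters stay fixed once chosen.
print(normal_draws)
print(binomial_draws)
```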

Mathematical Exercise 1. Identify the parameters in each of the following:

Statistics

Suppose that we have a random experiment whose outcome is a sequence of n random variables (i.e. a random vector):

(X1, X2, ..., Xn).

We run the experiment and observe the particular outcome:

(x1, x2, ..., xn).

These observed values are our data. Based on the data, we want to draw inferences about the underlying random experiment. We usually do this by computing statistics—functions of the data that we believe give useful information about the experiment or its parameters.

Technically, a statistic is an observable, real-valued function of the outcome variables of the random experiment:

W = h(X1, X2, ..., Xn)

The term observable means that the function should not contain any unknown parameters. After all, we need to be able to compute the value of the statistic from the data:

w = h(x1, x2, ..., xn)

The crucial point is that a statistic is a random variable and hence, like all random variables, it has a probability distribution, a mean, a variance, and so on. Ultimately, what we observe is a value of this random variable.
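
As an illustration (a sketch of my own, using a normal model with arbitrarily chosen parameters), take the statistic to be the sample mean. One run of the experiment produces a single observed value of the statistic, while repeating the entire experiment many times reveals the distribution of the statistic itself:

```python
import numpy as np

rng = np.random.default_rng(seed=1)
mu, sigma, n = 5.0, 2.0, 25   # arbitrary parameter values and sample size

def h(x):
    """The statistic: an observable, real-valued function of the data (here, the sample mean)."""
    return float(np.mean(x))

# One run of the experiment gives one observed value w = h(x1, ..., xn).
x = rng.normal(mu, sigma, size=n)
w = h(x)

# Repeating the whole experiment many times shows that W = h(X1, ..., Xn) is itself
# a random variable with its own distribution, mean, and variance.
replications = np.array([h(rng.normal(mu, sigma, size=n)) for _ in range(10_000)])
print(w, replications.mean(), replications.var())
```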

Estimators

Suppose that we are interested in an unknown parameter c of the model. A statistic W that is used to estimate the parameter is called, appropriately enough, an estimator of c. The error is the difference between the estimator and the parameter:

W - c.

The expected value of the error is known as the bias:

Bias(W) = E(W - c)

Mathematical Exercise 2. Use basic properties of expected value to show that

Bias(W) = E(W) - c

Thus, the estimator is said to be unbiased if the bias is 0 or, equivalently, if the expected value of the estimator is the parameter being estimated:

E(W) = c.
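
For instance, if c is the mean of the underlying distribution and W is the sample mean, then W is unbiased. The sketch below (my own example, with arbitrarily chosen values) checks this by simulation:

```python
import numpy as np

rng = np.random.default_rng(seed=2)
c, sigma, n = 3.0, 1.5, 10   # true mean c, standard deviation, sample size (arbitrary)

# Simulate many values of the estimator W = sample mean of a sample of size n.
estimates = rng.normal(c, sigma, size=(100_000, n)).mean(axis=1)

print(estimates.mean() - c)  # empirical Bias(W) = E(W) - c; close to 0
```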

The quality of the estimator is usually measured by computing the mean square error:

MSE(W) = E[(W - c)²]

Mathematical Exercise 3. Use basic properties of expected value and variance to show that

MSE(W) = var(W) + Bias²(W)

In particular, if the estimator is unbiased, then the mean square error of W is simply the variance of W.
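
Both facts can be checked by simulation. In the sketch below (my own example, with arbitrary values), the parameter is the variance of a normal distribution and the estimator divides by n instead of n - 1, so it is biased; the empirical mean square error agrees with the sum of the empirical variance and squared bias, up to simulation error:

```python
import numpy as np

rng = np.random.default_rng(seed=3)
sigma2, n, reps = 4.0, 10, 200_000   # true variance, sample size, replications (arbitrary)

samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
W = samples.var(axis=1, ddof=0)      # variance estimator that divides by n (biased)

mse = np.mean((W - sigma2) ** 2)     # empirical MSE(W) = E[(W - sigma2)^2]
var = W.var()                        # empirical var(W)
bias = W.mean() - sigma2             # empirical Bias(W) = E(W) - sigma2

print(mse, var + bias ** 2)          # the two agree up to simulation error
```

Because the bias is not 0 here, the mean square error exceeds the variance of the estimator.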

Random Samples

The most common and important special case of this statistical model occurs when we have a basic random experiment with a random variable X of interest. We create a compound experiment by performing n independent replications of the basic experiment. Thus, for the compound experiment, we have a sequence of n independent random variables, each with the same distribution as X:

(X1, X2, ..., Xn).

Indeed, we think of these variables as independent copies of X. The sequence is called a random sample of size n drawn from the distribution of X.
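
In simulation terms, a random sample is just n independent draws from one distribution. The following sketch (my own illustration; the exponential distribution and its scale are arbitrary choices for X) produces one realization of a random sample of size n:

```python
import numpy as np

rng = np.random.default_rng(seed=4)
n = 8

# n independent copies of X, where X is taken to have the exponential
# distribution with scale 2 (an arbitrary choice for illustration).
sample = rng.exponential(scale=2.0, size=n)
print(sample)   # one realization (x1, x2, ..., xn) of the random sample
```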

Examples

An estimator of pi can be derived from independent replications of Buffon's needle experiment. The Ball and Urn module contains a discussion of several estimation problems. In some cases, the estimators are based on random samples; in other cases, the data come from dependent variables.
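
The sketch below (my own implementation, not taken from the Buffon's needle module) illustrates the first example: a needle of length L is dropped on a floor ruled with parallel lines a distance D apart, with L ≤ D. Since the probability that the needle crosses a line is 2L/(pi D), the observed crossing frequency yields an estimator of pi:

```python
import numpy as np

rng = np.random.default_rng(seed=5)
L, D, n = 1.0, 1.0, 1_000_000   # needle length, line spacing (L <= D), number of drops

# Distance from the needle's midpoint to the nearest line, and the acute angle
# between the needle and the lines.  (Using np.pi to generate the angle keeps the
# sketch short; a "pure" estimator would generate the angle without it.)
y = rng.uniform(0.0, D / 2, size=n)
theta = rng.uniform(0.0, np.pi / 2, size=n)

crossings = np.count_nonzero(y <= (L / 2) * np.sin(theta))

# P(cross) = 2L / (pi * D), so pi is estimated by 2 * L * n / (D * crossings).
print(2 * L * n / (D * crossings))
```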

