STEPS Statistics Glossary

Basic Definitions

Statistical Inference

Experiment

Experimental (or Sampling) Unit

Population

Sample

Parameter

Statistic

Sampling Distribution

Estimate

Estimator

Estimation





Statistical Inference

Statistical Inference makes use of information from a sample to draw conclusions (inferences) about the population from which the sample was taken.



Experiment

An experiment is any process or study that results in the collection of data and whose outcome is not known in advance. In statistics, the term is usually restricted to situations in which the researcher has control over some of the conditions under which the experiment takes place.

Example
Before introducing a new drug treatment to reduce high blood pressure, the manufacturer carries out an experiment to compare the effectiveness of the new drug with that of one currently prescribed. Newly diagnosed subjects are recruited from a group of local general practices. Half of them are chosen at random to receive the new drug, the remainder receiving the present one. So, the researcher has control over the type of subject recruited and the way in which they are allocated to treatment.
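The random allocation described in this example can be sketched in a few lines of Python. This is only an illustration: the subject identifiers, the group labels and the `allocate` helper are hypothetical and are not part of the original study description.

```python
import random

def allocate(subjects, seed=None):
    """Randomly split subjects into two treatment groups of (near) equal size."""
    rng = random.Random(seed)      # a seed makes the allocation repeatable
    shuffled = list(subjects)      # copy, so the caller's list is not modified
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"new_drug": shuffled[:half], "current_drug": shuffled[half:]}

# Twenty hypothetical newly diagnosed subjects
subject_ids = [f"S{i:02d}" for i in range(1, 21)]
groups = allocate(subject_ids, seed=1)

print(groups["new_drug"])
print(groups["current_drug"])
```

Shuffling the whole list and cutting it in half gives every subject the same chance of receiving either drug, which is the sense in which half are "chosen at random".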



Experimental (or Sampling) Unit

A unit is a person, animal, plant or thing that is actually studied by a researcher; units are the basic objects upon which the study or experiment is carried out. Examples include a person, a monkey, a sample of soil, a pot of seedlings, a postcode area and a doctor's practice.



Population

A population is any entire collection of people, animals, plants or things from which we may collect data. It is the entire group we are interested in, which we wish to describe or draw conclusions about.

In order to make any generalisations about a population, a sample that is meant to be representative of the population is often studied. For each population there are many possible samples. A sample statistic gives information about a corresponding population parameter. For example, the sample mean for a set of data would give information about the overall population mean.

It is important that the investigator carefully and completely defines the population before collecting the sample, including a description of the members to be included.

Example
The population for a study of infant health might be all children born in the UK in the 1980s. The sample might be all babies born on 7th May in any of those years.



Sample

A sample is a group of units selected from a larger group (the population). By studying the sample it is hoped to draw valid conclusions about the larger group.

A sample is generally selected for study because the population is too large to study in its entirety. The sample should be representative of the general population. This is often best achieved by random sampling. Also, before collecting the sample, it is important that the researcher carefully and completely defines the population, including a description of the members to be included.

Example
The population for a study of infant health might be all children born in the UK in the 1980s. The sample might be all babies born on 7th May in any of those years.
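Random sampling, mentioned above as a common way of obtaining a representative sample, might be sketched as follows. The population of 10,000 ID numbers and the sample size of 100 are assumptions made purely for illustration.

```python
import random

# Hypothetical population: ID numbers standing in for 10,000 infants
population = list(range(1, 10_001))

rng = random.Random(2024)               # fixed seed so the run is repeatable
sample = rng.sample(population, 100)    # simple random sample without replacement

print(len(sample))
print(sorted(sample)[:5])
```

Here every unit in the population has the same chance of being selected, and no unit can be selected twice.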



Parameter

A parameter is a value, usually unknown (and which therefore has to be estimated), used to represent a certain population characteristic. For example, the population mean is a parameter that is often used to indicate the average value of a quantity.

Within a population, a parameter is a fixed value which does not vary. Each sample drawn from the population has its own value of any statistic that is used to estimate this parameter. For example, the mean of the data in a sample is used to give information about the overall mean in the population from which that sample was drawn.

Parameters are often assigned Greek letters (e.g. σ), whereas statistics are assigned Roman letters (e.g. s).
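The contrast between a fixed parameter and a varying statistic can be illustrated with a small simulation. The population below is artificial, and the variable names `sigma` and `s_values` are chosen only to mirror the notation above.

```python
import random
import statistics

rng = random.Random(3)

# Artificial population of 2,000 values (an assumption for illustration)
population = [rng.gauss(50, 10) for _ in range(2_000)]

# Parameter: the population standard deviation, one fixed number
sigma = statistics.pstdev(population)

# Statistic: the sample standard deviation, one value per sample
s_values = [statistics.stdev(rng.sample(population, 40)) for _ in range(3)]

print(sigma)
print(s_values)
```

In practice the whole population is rarely available, so the parameter cannot be computed directly like this and must be estimated from a sample.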



Statistic

A statistic is a quantity that is calculated from a sample of data. It is used to give information about unknown values in the corresponding population. For example, the average of the data in a sample is used to give information about the overall average in the population from which that sample was drawn.

It is possible to draw more than one sample from the same population and the value of a statistic will in general vary from sample to sample. For example, the average value in a sample is a statistic. The average values in more than one sample, drawn from the same population, will not necessarily be equal.
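As a rough illustration of this sample-to-sample variation, the sketch below draws five samples from one artificial population and prints the average of each. The population values and the sample size are assumptions, not data from any real study.

```python
import random
import statistics

rng = random.Random(42)

# Artificial population of 5,000 values between 0 and 100
population = [rng.uniform(0, 100) for _ in range(5_000)]

# Five separate samples of 50 units each, all from the same population
sample_means = [statistics.mean(rng.sample(population, 50)) for _ in range(5)]

for i, m in enumerate(sample_means, start=1):
    print(f"sample {i}: mean = {m:.2f}")
```

The five printed averages will generally differ from one another even though the population has not changed.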

Statistics are often assigned Roman letters (e.g. m and s), whereas the equivalent unknown values in the population (parameters) are assigned Greek letters (e.g. µ and σ).



Sampling Distribution

The sampling distribution describes probabilities associated with a statistic when a random sample is drawn from a population.

The sampling distribution is the probability distribution or probability density function of the statistic.

Derivation of the sampling distribution is the first step in calculating a confidence interval or carrying out a hypothesis test for a parameter.

Example
Suppose that x1, ..., xn are a simple random sample from a normally distributed population with expected value µ and known variance σ^2. Then the sample mean x_bar = (x1 + ... + xn)/n is a statistic used to give information about the population parameter µ; x_bar is normally distributed with expected value µ and variance σ^2/n.
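The result in this example can be checked approximately by simulation. The sketch below assumes particular values (µ = 10, standard deviation 2, n = 25) that are not taken from the glossary; it draws many samples and compares the mean and variance of the resulting sample means with µ and the population variance divided by n.

```python
import random
import statistics

mu, sigma, n = 10.0, 2.0, 25      # assumed population mean, sd and sample size
rng = random.Random(7)

# Draw 20,000 samples of size n and record the mean of each
means = [
    statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
    for _ in range(20_000)
]

print(statistics.fmean(means))      # should be close to mu
print(statistics.variance(means))   # should be close to sigma**2 / n
print(sigma**2 / n)                 # theoretical variance of the sample mean
```

A simulation of this kind only approximates the sampling distribution; the exact normal form comes from the mathematical derivation.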



Estimate

An estimate is an indication of the value of an unknown quantity based on observed data.

More formally, an estimate is the particular value of an estimator that is obtained from a particular sample of data and used to indicate the value of a parameter.

Example
Suppose the manager of a shop wanted to know the mean expenditure of customers in her shop in the last year. She could calculate the average expenditure of the hundreds (or perhaps thousands) of customers who bought goods in her shop, that is, the population mean. Instead, she could estimate this population mean by calculating the mean expenditure of a representative sample of customers. If this value was found to be £25, then £25 would be her estimate.



Estimator

An estimator is any quantity calculated from the sample data which is used to give information about an unknown quantity in the population. For example, the sample mean is an estimator of the population mean.

Estimators of population parameters are sometimes distinguished from the true value by placing a 'hat' symbol over the parameter. For example,
σ = true population standard deviation
σ_hat = population standard deviation as estimated from a sample

Example

The usual estimator of the population mean is
µ_hat = X_bar = (X1 + X2 + ... + Xn)/n
where n is the size of the sample and X1, X2, ..., Xn are the values observed in the sample.

If the value of the estimator in a particular sample is found to be 5, then 5 is the estimate of the population mean µ.
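The distinction between an estimator (a rule) and an estimate (the number that rule produces for one sample) can be made concrete in code. In this minimal sketch the function `sample_mean` plays the role of the estimator, and the five data values are made up so that the estimate comes out as 5.

```python
def sample_mean(values):
    """Estimator of the population mean: the sum of the values divided by n."""
    values = list(values)
    if not values:
        raise ValueError("the sample must contain at least one value")
    return sum(values) / len(values)

# One particular (made-up) sample
sample = [3, 5, 7, 4, 6]

# Applying the estimator to this sample gives the estimate
estimate = sample_mean(sample)
print(estimate)   # 5.0
```

A different sample passed to the same function would usually give a different estimate, while the estimator itself stays the same.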



Estimation

Estimation is the process by which sample data are used to indicate the value of an unknown quantity in a population.

Results of estimation can be expressed as a single value, known as a point estimate, or as a range of values, known as an interval estimate, of which the confidence interval is the most common form.
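Both forms of result can be computed in a short sketch. The code below assumes a normally distributed population whose standard deviation is known, so that a normal-based interval is appropriate; the data values, the assumed standard deviation of 6.0 and the helper `mean_confidence_interval` are all hypothetical.

```python
import math
import statistics

def mean_confidence_interval(values, sigma, level=0.95):
    """Point estimate and confidence interval for a mean, with sigma known."""
    n = len(values)
    point = statistics.fmean(values)
    # Standard normal quantile, about 1.96 for a 95% interval
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / math.sqrt(n)
    return point, (point - half_width, point + half_width)

# Made-up expenditures (in pounds) for a sample of five customers
expenditures = [22.0, 31.5, 18.0, 27.5, 26.0]
point, (low, high) = mean_confidence_interval(expenditures, sigma=6.0)

print(point)       # point estimate: 25.0
print(low, high)   # 95% confidence interval
```

When the population standard deviation is not known, it is usually estimated from the sample and the interval is based on the t distribution instead; that case is not covered by this sketch.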



