Variance
Recall that the expected value or mean of a random variable X gives the center of the distribution of X. The variance of X is a measure of the spread of the distribution about the mean and is defined by
var(X) = E{[X - E(X)]^2}
1. Suppose that X is a discrete random variable taking values in a subset S of R, with density function f. Use the change of variables theorem to show that
var(X) = Σ_{x in S} [x - E(X)]^2 f(x)
2. Suppose that X is a continuous random variable taking values in a subset S of R, with density function f. Use the change of variables theorem to show that
var(X) = ∫_S [x - E(X)]^2 f(x) dx
The standard deviation of X is the square root of the variance:
sd(X) = [var(X)]^(1/2)
It also measures dispersion about the mean but is in the same units as the variable X.
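As a small illustration of the definitions above, the following Python sketch computes the mean, variance, and standard deviation of a discrete random variable directly from its density function. The fair six-sided die and the helper names (mean, variance, standard_deviation) are assumptions chosen for the example and do not come from the text.

```python
import math
from fractions import Fraction


def mean(density):
    """E(X): sum of x * f(x) over the support."""
    return sum(x * p for x, p in density.items())


def variance(density):
    """var(X): sum of [x - E(X)]^2 * f(x) over the support."""
    mu = mean(density)
    return sum((x - mu) ** 2 * p for x, p in density.items())


def standard_deviation(density):
    """sd(X): square root of the variance, in the same units as X."""
    return math.sqrt(variance(density))


# Hypothetical example: a fair six-sided die, stored as {value: probability}.
die = {x: Fraction(1, 6) for x in range(1, 7)}

print(mean(die))                # 7/2
print(variance(die))            # 35/12
print(standard_deviation(die))  # square root of 35/12, as a float
```

Exact fractions are used for the mean and variance so that the results can be compared with closed-form answers without rounding error.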
The following exercises give some basic properties of variance, which in turn rely on basic properties of expected value:
3. Show that var(X) = E(X^2) - [E(X)]^2.
4. Show that var(X) ≥ 0.
5. Show that var(X) = 0 if and only if P(X = c) = 1 for some constant c.
6. Show that if a and b are constants then var(aX + b) = a^2 var(X).
7. Suppose that I is an indicator variable with
P(I = 1) = p, P(I = 0) = 1 - p
Show that var(I) = p(1 - p).
8. Suppose that X is uniformly distributed on {1, 2, ..., n}. Show that
var(X) = (n^2 - 1) / 12.
9. Suppose that X is uniformly distributed on the interval (a, b) where a < b. Show that
var(X) = (b - a)^2 / 12.
Note in particular that the variance depends only on the length of the interval, which is intuitively reasonable.
10. Suppose that X has the power distribution with parameter a > 1, which has probability density function
f(x) = (a - 1)x^(-a) for x > 1
Show that if a > 3,
var(X) = (a - 1) / [(a - 2)^2 (a - 3)]
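Several of the discrete results in the exercises above can be checked numerically before attempting a proof. The sketch below uses exact rational arithmetic to test the computational formula on the discrete uniform distribution, the standard value p(1 - p) for the variance of an indicator variable, and the scaling rule for aX + b. The particular values of n, p, a, and b and the helper names are arbitrary assumptions. A numerical check of this kind is evidence for a formula, not a proof of it.

```python
from fractions import Fraction


def expect(density, g=lambda x: x):
    """E[g(X)] for a discrete density given as {value: probability}."""
    return sum(g(x) * p for x, p in density.items())


def variance(density):
    """Computational formula: var(X) = E(X^2) - [E(X)]^2."""
    return expect(density, lambda x: x * x) - expect(density) ** 2


def uniform(n):
    """Density of the uniform distribution on {1, 2, ..., n}."""
    return {x: Fraction(1, n) for x in range(1, n + 1)}


def affine(density, a, b):
    """Density of aX + b; a must be nonzero so distinct values stay distinct."""
    return {a * x + b: p for x, p in density.items()}


# Arbitrary illustrative values (assumptions, not taken from the text).
n, p, a, b = 10, Fraction(3, 10), -4, 7

u = uniform(n)
uniform_ok = variance(u) == Fraction(n * n - 1, 12)
indicator_ok = variance({1: p, 0: 1 - p}) == p * (1 - p)
scaling_ok = variance(affine(u, a, b)) == a * a * variance(u)

print(uniform_ok, indicator_ok, scaling_ok)  # True True True
```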
11. Suppose that X is a real-valued random variable. Define
Z = [X - E(X)] / sd(X)
Show that Z has mean 0 and variance 1.
The random variable Z in Exercise 11 is sometimes called the standard score associated with X. Since X and its mean and standard deviation all have the same units, the standard score Z is dimensionless. It measures the directed distance from X to its mean in terms of standard deviations.
12. Marilyn vos Savant has an IQ of 228. Assuming that the distribution of IQ scores has mean 100 and standard deviation 15, find Marilyn's standard score.
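The standard score is a one-line computation. A minimal Python sketch follows, in which the function name standard_score is an assumption made for illustration, applied to the numbers given in the exercise.

```python
def standard_score(x, mean, sd):
    """Directed distance from x to the mean, in units of standard deviations."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd


z = standard_score(228, mean=100, sd=15)
print(round(z, 2))  # 8.53
```

The result is dimensionless: the score is roughly eight and a half standard deviations above the mean.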
Suppose that we want to approximate a random variable X with a single real number t, and we measure the quality of the approximation by the mean square error
MSE(t) = E[(X - t)^2]
(recall that this is the second moment of X about t).
13. Show that
MSE(t) = E(X^2) - 2t E(X) + t^2.
14. Show that MSE(t) is minimized when t = E(X) and that the minimum value is var(X).
The root mean square error is the square root of the mean square error:
RMSE(t) = [MSE(t)]^(1/2).
15. Show that RMSE(t) is minimized when t = E(X) and that the minimum value is sd(X).
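The minimization can be explored numerically. The sketch below evaluates MSE(t) on a grid of candidate values for a fair six-sided die and reports the minimizer. The die, the grid spacing of 1/4, and the function name mse are assumptions chosen for the example. A grid search only suggests where the minimum lies; the exercises ask for a proof.

```python
import math
from fractions import Fraction

# Hypothetical example distribution: a fair six-sided die.
die = {x: Fraction(1, 6) for x in range(1, 7)}


def mse(density, t):
    """MSE(t) = E[(X - t)^2] for a discrete density {value: probability}."""
    return sum((x - t) ** 2 * p for x, p in density.items())


mu = sum(x * p for x, p in die.items())  # E(X)
var = mse(die, mu)                       # MSE at the mean is var(X) by definition

# Candidate values t = 1, 1.25, 1.5, ..., 6 (the grid contains the mean 7/2).
grid = [Fraction(k, 4) for k in range(4, 25)]
best_t = min(grid, key=lambda t: mse(die, t))

print(best_t, mse(die, best_t))     # 7/2 35/12
print(math.sqrt(mse(die, best_t)))  # RMSE at the minimizer, equal to sd(X)
```

Since the square root is increasing, the same value of t minimizes both MSE and RMSE, which is why a single search serves for both.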
For more on this topic read the section on mean square error for frequency distributions. The section on mean absolute error for frequency distributions gives some insight on why mean square error is the best choice for measuring the error.
16. Use Markov's inequality to prove Chebyshev's inequality: for t > 0,
P[|X - E(X)| ≥ t] ≤ var(X) / t^2
17. Establish the following equivalent version of Chebyshev's inequality: for k > 0,
P[|X - E(X)| ≥ k sd(X)] ≤ 1 / k^2
18. Suppose that X is uniformly distributed on the interval (0, 6). Compute the true value and the Chebyshev bound for the probability that X is at least 2 standard deviations away from the mean.
19. Suppose that X has the power distribution with parameter a > 3:
f(x) = (a - 1)x^(-a) for x > 1
Compute the true value and the Chebyshev bound for the probability that X is at least 3 standard deviations away from the mean.
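The comparison between true tail probabilities and the Chebyshev bound 1 / k^2 can be sketched in Python as follows. The function names are assumptions, and so is the sample parameter a = 4 for the power distribution. The mean (a - 1) / (a - 2) and variance (a - 1) / [(a - 2)^2 (a - 3)] used in the code are worked out from the stated density rather than quoted from the text.

```python
import math


def chebyshev_bound(k):
    """Upper bound on P(|X - E(X)| >= k sd(X)) from Chebyshev's inequality."""
    return 1 / k ** 2


def uniform_tail(a, b, k):
    """Exact P(|X - mean| >= k sd) for X uniform on the interval (a, b)."""
    sd = (b - a) / math.sqrt(12)
    half_width = (b - a) / 2
    # The event is empty once k * sd exceeds the half-width of the interval.
    return max(0.0, 1 - k * sd / half_width)


def power_tail(a, k):
    """Exact P(|X - mean| >= k sd) for f(x) = (a - 1) x^(-a), x > 1, a > 3."""
    mean = (a - 1) / (a - 2)
    sd = math.sqrt((a - 1) / (a - 3)) / (a - 2)

    def cdf(x):
        # Distribution function obtained by integrating the density from 1 to x.
        return 0.0 if x <= 1 else 1 - x ** (-(a - 1))

    return cdf(mean - k * sd) + (1 - cdf(mean + k * sd))


print(uniform_tail(0, 6, 2), chebyshev_bound(2))  # 0.0 0.25
print(power_tail(4, 3), chebyshev_bound(3))       # true value lies well below 1/9
```

In both cases the true probability is far smaller than the bound. Chebyshev's inequality holds for every distribution with finite variance, so it is often loose for any particular one.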
The variance of a sum of random variables is best understood in terms of a related concept known as covariance.