Covariance and Correlation
Simulation of the bivariate uniform experiment
Suppose that X and Y are random variables for a random experiment. The covariance of X and Y is defined by
cov(X, Y) = E{[X - E(X)][Y - E(Y)]}
and (assuming the variances are positive) the correlation of X and Y is defined by
cor(X, Y) = cov(X, Y) / [sd(X) sd(Y)]
Note that the covariance and correlation always have the same sign (positive, negative, or 0). When the sign is positive, the variables are said to be positively correlated; when the sign is negative, the variables are said to be negatively correlated; and when the sign is 0, the variables are said to be uncorrelated.
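These definitions are easy to check numerically. The following sketch (assuming NumPy is available; the simulated pair and variable names are ours) estimates the covariance and correlation of a positively correlated pair:

```python
import numpy as np

rng = np.random.default_rng(seed=1)
x = rng.uniform(0.0, 1.0, size=100_000)
y = 2.0 * x + rng.normal(0.0, 0.1, size=100_000)  # Y increases with X on average

# Sample version of cov(X, Y) = E{[X - E(X)][Y - E(Y)]}
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
# Sample version of cor(X, Y) = cov(X, Y) / [sd(X) sd(Y)]
cor_xy = cov_xy / (x.std() * y.std())
print(cov_xy, cor_xy)  # covariance near 1/6, correlation near 1
```

Both quantities come out positive, as expected for a pair with a positive linear trend.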
In the bivariate uniform simulation, the random vector (X, Y) is uniformly distributed on one of three regions that can be selected from the list box: the square, the triangle, or the circle.
When you run the simulation, the values of (X, Y) are plotted in the scatterplot on the left. The other two graphs show the marginal densities of X and Y in blue and the corresponding empirical densities in red. The table on the left gives the values of X and Y. The middle tables give the mean, standard deviation, sample mean and sample standard deviation of X and of Y. The table on the right gives the correlation and sample correlation between X and Y.
1. In the bivariate uniform experiment, select the square in the list box. Run the simulation 2000 times, updating every 10 runs. Note the value of the correlation and the shape of the cloud of points in the scatterplot.
2. In the bivariate uniform experiment, select the triangle in the list box. Run the simulation 2000 times, updating every 10 runs. Note the value of the correlation and the shape of the cloud of points in the scatterplot.
3. In the bivariate uniform experiment, select the circle in the list box. Run the simulation 2000 times, updating every 10 runs. Note the value of the correlation and the shape of the cloud of points in the scatterplot.
As we will see, the scatterplot for positively correlated variables shows a linear trend with positive slope, while the scatterplot for negatively correlated variables shows a linear trend with negative slope. For uncorrelated variables, the scatterplot should look like an amorphous blob of points with no discernible linear trend. Evidently, then, correlation and covariance measure the linear dependence between X and Y, in some sense. We will return to this idea later.
The following exercises give some basic properties of covariance. The main tool that you will need is the fact that expected value is a linear operation.
4. Show that cov(X, Y) = E(XY) - E(X)E(Y).
5. Show that cov(X, Y) = cov(Y, X).
6. Show that cov(X, X) = var(X).
7. Show that cov(aX + bY, Z) = a cov(X, Z) + b cov(Y, Z).
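The identities in Exercises 4-7 hold exactly for the empirical distribution of a sample as well, so they can be sanity-checked numerically. A sketch (assuming NumPy; the helper cov and the sample data are ours):

```python
import numpy as np

rng = np.random.default_rng(seed=3)
x, y, z = rng.standard_normal((3, 200_000))
a, b = 2.0, -3.0

def cov(u, v):
    # sample covariance, matching the defining formula
    return np.mean((u - u.mean()) * (v - v.mean()))

# Exercise 4: cov(X, Y) = E(XY) - E(X)E(Y)
diff4 = cov(x, y) - (np.mean(x * y) - x.mean() * y.mean())
# Exercise 7: cov(aX + bY, Z) = a cov(X, Z) + b cov(Y, Z)
diff7 = cov(a * x + b * y, z) - (a * cov(x, z) + b * cov(y, z))
print(diff4, diff7)  # both are 0 up to floating-point rounding
```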
By Exercise 4, we see that X and Y are uncorrelated if and only if
E(XY) = E(X)E(Y)
In particular, if X and Y are independent, then they are uncorrelated. However, the converse fails with a passion, as the next exercise shows.
8. Suppose that X is uniformly distributed on the interval (-1, 1) and Y = X². Show that X and Y are uncorrelated even though Y depends functionally on X (the strongest form of dependence).
Exercise 8 gives additional insight into the fact that correlation measures only a specific type of dependence between X and Y, namely linear dependence.
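A quick simulation (a sketch assuming NumPy) makes the point of Exercise 8 concrete: the sample correlation between X and X² is near 0 even though Y is determined by X:

```python
import numpy as np

rng = np.random.default_rng(seed=4)
x = rng.uniform(-1.0, 1.0, size=200_000)
y = x ** 2  # Y is completely determined by X

# cov(X, Y) = E(X^3) - E(X)E(X^2) = 0, since odd moments of uniform(-1, 1) vanish
print(np.corrcoef(x, y)[0, 1])  # sample correlation close to 0
```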
9. Suppose that (X, Y) is uniformly distributed on the square
R = {(x, y): -6 < x < 6, -6 < y < 6}
Show that X and Y are independent and hence uncorrelated.
10. Suppose that (X, Y) is uniformly distributed on the circular region
R = {(x, y): x² + y² < 36}
Show that X and Y are dependent but still uncorrelated.
11. Suppose that (X, Y) is uniformly distributed on the triangular region
R = {(x, y): -6 < y < x < 6}
Show that cor(X, Y) = 1/2.
12. Repeat Exercises 1, 2, and 3 in light of Exercises 9, 10, and 11.
You will now show that the variance of a sum of variables is the sum of the pairwise covariances. Suppose that Xi, i in I is a collection of random variables for an experiment, where I is a finite index set.
13. Use Exercises 4-7 to show that
var(Σi Xi) = Σi Σj cov(Xi, Xj)
where the sums are over the index set I.
Exercise 13 can be very useful; it is used for example to compute the variance of the number of red balls in the ball and urn experiment.
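Exercise 13 can also be checked empirically, since the identity holds exactly for sample variances and covariances. A sketch assuming NumPy (np.cov with bias=True returns the matrix of all pairwise sample covariances, and the variables here are our own example):

```python
import numpy as np

rng = np.random.default_rng(seed=5)
x = rng.standard_normal((3, 300_000))
x[1] += 0.5 * x[0]  # make the coordinates dependent, so the cross terms matter

s = x.sum(axis=0)
var_sum = s.var()
# Matrix of pairwise sample covariances cov(X_i, X_j); the i = j entries are variances
cov_matrix = np.cov(x, bias=True)
print(var_sum, cov_matrix.sum())  # the two quantities agree
```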
14. Suppose that Xi, i in I are pairwise uncorrelated (this holds in particular if they are mutually independent). Show that
var(Σi Xi) = Σi var(Xi)
where the sums are over the index set I.
The covariance and correlation of two events A and B are defined to be the covariance and correlation, respectively, of their indicator random variables 1A and 1B.
15. Suppose that A and B are events for an experiment. Show that A and B are positively correlated, negatively correlated, or independent, respectively (as defined in the section on conditional probability) if and only if the indicator variables of A and B are positively correlated, negatively correlated, or uncorrelated, as defined in this section.
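By Exercise 4, the covariance of two events is cov(1A, 1B) = P(A and B) - P(A)P(B). A numerical illustration (a sketch assuming NumPy; the die events are our own example):

```python
import numpy as np

rng = np.random.default_rng(seed=6)
die = rng.integers(1, 7, size=200_000)  # fair six-sided die
a = (die <= 4).astype(float)            # indicator of A = {1, 2, 3, 4}
b = (die % 2 == 0).astype(float)        # indicator of B = {2, 4, 6}

# cov(A, B) = P(A and B) - P(A)P(B); here both terms equal 1/3,
# so A and B are independent and hence uncorrelated
cov_ab = np.mean(a * b) - a.mean() * b.mean()
print(cov_ab)  # close to 0
```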
The next sequence of exercises shows that covariance indeed measures linear dependence between X and Y.
16. Use Exercises 4-7 to show that for any t,
var(Y - tX) = var(Y) - 2t cov(X, Y) + t² var(X) ≥ 0
17. Compute the discriminant of the quadratic in t in Exercise 16 and show that
4[cov(X, Y)]² - 4 var(X) var(Y) ≤ 0
18. Show that the inequality in Exercise 17 is equivalent to
-sd(X) sd(Y) ≤ cov(X, Y) ≤ sd(X) sd(Y)
19. Show that the inequality in Exercise 18 is equivalent to
-1 ≤ cor(X, Y) ≤ 1
20. Show that equality holds in Exercises 17-19 if and only if there exists m such that
var(Y - mX) = 0
21. Show that the condition in Exercise 20 holds if and only if there exists b such that
P(Y = mX + b) = 1
22. In Exercise 21, show that m has the same sign as the correlation.
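Exercises 20-22 can be illustrated numerically: when Y is exactly a linear function of X, the sample correlation is +1 or -1 according to the sign of the slope. A sketch assuming NumPy (the slopes and intercept are our own example):

```python
import numpy as np

rng = np.random.default_rng(seed=7)
x = rng.uniform(0.0, 1.0, size=100_000)

results = {}
for m in (2.5, -0.7):  # one slope of each sign
    y = m * x + 1.0    # Y is exactly a linear function of X
    results[m] = np.corrcoef(x, y)[0, 1]
print(results)  # correlation +1 when m > 0 and -1 when m < 0
```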
23. Suppose that (X, Y) has probability density function f given by
f(x, y) = (x + y) / 4 for 0 < x < y < 2
Find cov(X, Y) and cor(X, Y).
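If you want to check your integrals for Exercise 23, the moments can be computed symbolically. A sketch assuming SymPy is available (the helper expect is ours):

```python
from sympy import symbols, integrate, sqrt

x, y = symbols("x y", positive=True)
f = (x + y) / 4  # density of Exercise 23, supported on 0 < x < y < 2

def expect(g):
    # E[g(X, Y)] as a double integral over the region 0 < x < y < 2
    return integrate(integrate(g * f, (x, 0, y)), (y, 0, 2))

ex, ey = expect(x), expect(y)
cov_xy = expect(x * y) - ex * ey
cor_xy = cov_xy / sqrt((expect(x ** 2) - ex ** 2) * (expect(y ** 2) - ey ** 2))
print(cov_xy, cor_xy)  # exact rational covariance and the corresponding correlation
```

The same pattern works for Exercises 24 and 25, after changing the density and the limits of integration.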
24. Suppose that (X, Y) has probability density function f given by
f(x, y) = (x + y) / 3 for 0 < x < 1, 0 < y < 2
Find cov(X, Y) and cor(X, Y).
25. Suppose that (X, Y) has probability density function f given by
f(x, y) = (3 / 2)x²y for 0 < x < 1, 0 < y < 2
Find cov(X, Y) and cor(X, Y).