Covariance and Correlation
Simulation of the bivariate uniform experiment
Suppose that X and Y are random variables for a random experiment. The covariance of X and Y is defined by
cov(X, Y) = E{[X - E(X)][Y - E(Y)]}
and (assuming the variances are positive) the correlation of X and Y is defined by
cor(X, Y) = cov(X, Y) / [sd(X) sd(Y)]
Note that the covariance and correlation always have the same sign (positive, negative, or 0). When the sign is positive, the variables are said to be positively correlated; when the sign is negative, the variables are said to be negatively correlated; and when the sign is 0, the variables are said to be uncorrelated.
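These definitions have direct empirical analogues, which the applet's sample statistics are based on. The following sketch (an illustration of ours, not part of the applet; the noisy linear relationship is an arbitrary choice) estimates covariance and correlation from simulated data:

```python
import random

def sample_cov(xs, ys):
    # empirical version of E{[X - E(X)][Y - E(Y)]}
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

def sample_cor(xs, ys):
    # covariance divided by the product of the standard deviations
    return sample_cov(xs, ys) / (sample_cov(xs, xs) * sample_cov(ys, ys)) ** 0.5

rng = random.Random(1)
xs = [rng.uniform(-1, 1) for _ in range(10000)]
ys = [x + 0.5 * rng.uniform(-1, 1) for x in xs]  # Y = X plus noise
print(sample_cor(xs, ys))  # positive, reflecting the upward linear trend
```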
In the bivariate uniform simulation, the random vector (X, Y) is uniformly distributed on one of three regions that can be selected from the list box: the square {(x, y): -6 < x < 6, -6 < y < 6}, the triangle {(x, y): -6 < y < x < 6}, or the circular region {(x, y): x² + y² < 36}.
When you run the simulation, the values of (X, Y) are plotted in the scatterplot on the left. The other two graphs show the marginal densities of X and Y in blue and the corresponding empirical densities in red. The table on the left gives the values of X and Y. The middle tables give the mean, standard deviation, sample mean and sample standard deviation of X and of Y. The table on the right gives the correlation and sample correlation between X and Y.
1. In the bivariate uniform experiment, select the square in the list box. Run the simulation 2000 times, updating every 10 runs. Note the value of the correlation and the shape of the cloud of points in the scatterplot.
2. In the bivariate uniform experiment, select the triangle in the list box. Run the simulation 2000 times, updating every 10 runs. Note the value of the correlation and the shape of the cloud of points in the scatterplot.
3. In the bivariate uniform experiment, select the circle in the list box. Run the simulation 2000 times, updating every 10 runs. Note the value of the correlation and the shape of the cloud of points in the scatterplot.
As we will see, the scatterplot for positively correlated variables shows a linear trend with positive slope, while the scatterplot for negatively correlated variables shows a linear trend with negative slope. For uncorrelated variables, the scatterplot should look like an amorphous blob of points with no discernible linear trend. Evidently, then, correlation and covariance measure the linear dependence between X and Y, in some sense. We will return to this idea later.
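A rough stand-in for the applet (a sketch of ours, assuming the three regions are those of Exercises 9-11) uses rejection sampling and reports the sample correlation for each region:

```python
import random

def sample_region(region, n, seed=0):
    """Rejection-sample n points (X, Y) uniformly from the named region."""
    rng = random.Random(seed)
    pts = []
    while len(pts) < n:
        x, y = rng.uniform(-6, 6), rng.uniform(-6, 6)
        if (region == "square"
                or (region == "triangle" and y < x)
                or (region == "circle" and x * x + y * y < 36)):
            pts.append((x, y))
    return pts

def corr(pts):
    # sample correlation of a list of (x, y) pairs
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    cxy = sum((x - mx) * (y - my) for x, y in pts) / n
    cxx = sum((x - mx) ** 2 for x, _ in pts) / n
    cyy = sum((y - my) ** 2 for _, y in pts) / n
    return cxy / (cxx * cyy) ** 0.5

for region in ("square", "triangle", "circle"):
    print(region, round(corr(sample_region(region, 20000)), 3))
# square and circle: near 0 (uncorrelated); triangle: near 1/2
```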
The following exercises give some basic properties of covariance. The main tool that you will need is the fact that expected value is a linear operation.
4. Show that cov(X, Y) = E(XY) - E(X)E(Y)
5. Show that cov(X, Y) = cov(Y, X).
6. Show that cov(X, X) = var(X).
7. Show that cov(aX + bY, Z) = a cov(X, Z) + b cov(Y, Z).
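Bilinearity (Exercise 7) holds exactly for the sample covariance as well, which makes for an easy numerical sanity check; in this sketch of ours, the constants a and b and the Gaussian samples are arbitrary choices:

```python
import random

rng = random.Random(4)
n = 1000
xs = [rng.gauss(0, 1) for _ in range(n)]
ys = [rng.gauss(0, 1) for _ in range(n)]
zs = [x + rng.gauss(0, 1) for x in xs]  # correlated with xs

def cov(u, v):
    # sample covariance with divisor n
    mu, mv = sum(u) / n, sum(v) / n
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / n

a, b = 2.0, -3.0
lhs = cov([a * x + b * y for x, y in zip(xs, ys)], zs)
rhs = a * cov(xs, zs) + b * cov(ys, zs)
print(lhs - rhs)  # zero up to floating-point rounding
```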
By Exercise 4, we see that X and Y are uncorrelated if and only if
E(XY) = E(X)E(Y)
In particular, if X and Y are independent, then they are uncorrelated. However, the converse fails with a passion, as the next exercise shows.
8. Suppose that X is uniformly distributed on the interval (-1, 1) and Y = X². Show that X and Y are uncorrelated even though Y depends functionally on X (the strongest form of dependence).
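Exercise 8 is easy to confirm by simulation; a minimal sketch:

```python
import random

rng = random.Random(2)
n = 100000
xs = [rng.uniform(-1, 1) for _ in range(n)]
ys = [x * x for x in xs]  # Y is a deterministic function of X

mx, my = sum(xs) / n, sum(ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
print(cov)  # near 0: E(X^3) = 0 = E(X) E(X^2) by symmetry
```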
Exercise 8 reinforces the point that correlation measures only a specific type of dependence between X and Y, namely linear dependence.
9. Suppose that (X, Y) is uniformly distributed on the square
R = {(x, y): -6 < x < 6, -6 < y < 6}
Show that X and Y are independent and hence uncorrelated.
10. Suppose that (X, Y) is uniformly distributed on the circular region
R = {(x, y): x² + y² < 36}
Show that X and Y are dependent but still uncorrelated.
11. Suppose that (X, Y) is uniformly distributed on the triangular region
R = {(x, y): -6 < y < x < 6}
Show that cor(X, Y) = 1/2.
12. Repeat Exercises 1, 2, and 3 in light of Exercises 9, 10, and 11.
You will now show that the variance of a sum of random variables is the sum of the pairwise covariances. Suppose that X_i, i in I is a collection of random variables for an experiment, where I is a finite index set.
13. Use Exercises 4-7 to show that
var(∑_{i in I} X_i) = ∑_{i in I} ∑_{j in I} cov(X_i, X_j)
Exercise 13 can be very useful; it is used for example to compute the variance of the number of red balls in the ball and urn experiment.
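The identity in Exercise 13 also holds exactly for sample variances and covariances, so it can be checked numerically; in this sketch of ours, setting X3 = X1 + X2 is just an arbitrary way to make the covariances nonzero:

```python
import random

rng = random.Random(3)
n = 10000
x1 = [rng.uniform(0, 1) for _ in range(n)]
x2 = [rng.uniform(0, 1) for _ in range(n)]
x3 = [a + b for a, b in zip(x1, x2)]  # correlated with x1 and x2
cols = [x1, x2, x3]

def cov(u, v):
    # sample covariance with divisor n
    mu, mv = sum(u) / n, sum(v) / n
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / n

s = [a + b + c for a, b, c in zip(x1, x2, x3)]
var_sum = cov(s, s)                                  # var of the sum
pairwise = sum(cov(u, v) for u in cols for v in cols)  # sum of all cov(X_i, X_j)
print(var_sum - pairwise)  # zero up to floating-point rounding
```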
14. Suppose that X_i, i in I are pairwise uncorrelated (this holds in particular if they are mutually independent). Show that
var(∑_{i in I} X_i) = ∑_{i in I} var(X_i)
The covariance and correlation of two events A and B are defined to be the covariance and correlation of their indicator random variables 1_A and 1_B.
15. Suppose that A and B are events for an experiment. Show that A and B are positively correlated, negatively correlated, or independent, respectively (as defined in the section on conditional probability) if and only if the indicator variables of A and B are positively correlated, negatively correlated, or uncorrelated, as defined in this section.
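Combining this definition with Exercise 4, cov(1_A, 1_B) = P(A and B) - P(A)P(B), since E(1_A 1_B) = P(A and B). A small worked example with a fair die (the events are our own choice):

```python
from fractions import Fraction

outcomes = range(1, 7)   # a fair die
p = Fraction(1, 6)       # probability of each outcome

A = {1, 2, 3}            # event: roll is at most 3
B = {1, 2}               # event: roll is at most 2

pA = sum(p for o in outcomes if o in A)
pB = sum(p for o in outcomes if o in B)
pAB = sum(p for o in outcomes if o in A and o in B)

# covariance of the indicator variables, via Exercise 4
cov = pAB - pA * pB
print(cov)  # 1/6 > 0, so A and B are positively correlated
```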
The next sequence of exercises shows that covariance does indeed measure the linear dependence between X and Y.
16. Use Exercises 5, 6, and 7 to show that for any t,
var(Y - tX) = var(Y) - 2t cov(X, Y) + t² var(X) ≥ 0
17. Compute the discriminant of the quadratic in Exercise 16 and show that
4[cov(X, Y)]² - 4 var(X) var(Y) ≤ 0
18. Show that the inequality in Exercise 17 is equivalent to
-sd(X) sd(Y) ≤ cov(X, Y) ≤ sd(X) sd(Y)
19. Show that the inequality in Exercise 18 is equivalent to
-1 ≤ cor(X, Y) ≤ 1
20. Show that equality holds in Exercises 17-19 if and only if there exists m such that
var(Y - mX) = 0
21. Show that the condition in Exercise 20 holds if and only if there exists b such that
P(Y = mX + b) = 1
22. In Exercise 21, show that m has the same sign as the correlation.
23. Suppose that (X, Y) has probability density function f given by
f(x, y) = (x + y) / 4 for 0 < x < y < 2.
Find cov(X, Y) and cor(X, Y).
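Hand computations like Exercise 23 can be cross-checked numerically. The sketch below (ours, with an arbitrary grid size) approximates the relevant double integrals with a midpoint Riemann sum; the small residual error comes from grid cells cut by the boundary x = y:

```python
n = 800
h = 2.0 / n  # grid spacing on [0, 2] x [0, 2]

def f(x, y):
    # the density of Exercise 23
    return (x + y) / 4 if 0 < x < y < 2 else 0.0

ex = ey = exy = total = 0.0
for i in range(n):
    for j in range(n):
        x, y = (i + 0.5) * h, (j + 0.5) * h
        w = f(x, y) * h * h       # probability mass of this grid cell
        total += w                # sanity check: should sum to about 1
        ex += x * w               # contribution to E(X)
        ey += y * w               # contribution to E(Y)
        exy += x * y * w          # contribution to E(XY)

cov = exy - ex * ey               # Exercise 4
print(total, cov)
```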
24. Suppose that (X, Y) has probability density function f given by
f(x, y) = (x + y) / 3 for 0 < x < 1, 0 < y < 2
Find cov(X, Y) and cor(X, Y).
25. Suppose that (X, Y) has probability density function f given by
f(x, y) = (3/2)x²y for 0 < x < 1, 0 < y < 2
Find cov(X, Y) and cor(X, Y).