Covariance and Correlation
Simulation of the bivariate uniform experiment
Suppose that X and Y are random variables for a random experiment. The covariance of X and Y is defined by
cov(X, Y) = E{[X - E(X)][Y - E(Y)]}
and (assuming the variances are positive) the correlation of X and Y is defined by
cor(X, Y) = cov(X, Y) / [sd(X) sd(Y)]
Note that the covariance and correlation always have the same sign (positive, negative, or 0). When the sign is positive, the variables are said to be positively correlated; when the sign is negative, the variables are said to be negatively correlated; and when the sign is 0, the variables are said to be uncorrelated.
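These definitions are easy to check numerically. The following sketch (assuming NumPy is available; the simulated pair and variable names are ours) estimates the covariance and correlation of a positively correlated pair:

```python
import numpy as np

rng = np.random.default_rng(seed=1)
x = rng.uniform(0.0, 1.0, size=100_000)
y = 2.0 * x + rng.normal(0.0, 0.1, size=100_000)  # Y increases with X on average

# Sample version of cov(X, Y) = E{[X - E(X)][Y - E(Y)]}
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
# Sample version of cor(X, Y) = cov(X, Y) / [sd(X) sd(Y)]
cor_xy = cov_xy / (x.std() * y.std())
print(cov_xy, cor_xy)  # covariance near 1/6, correlation near 1
```

Both quantities come out positive, as expected for a pair with a positive linear trend.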
In the bivariate uniform simulation, the random vector (X, Y) is uniformly distributed on one of three regions that can be selected from the list box: the square, the triangle, or the circle.
When you run the simulation, the values of (X, Y) are plotted in the scatterplot on the left. The other two graphs show the marginal densities of X and Y in blue and the corresponding empirical densities in red. The table on the left gives the values of X and Y. The middle tables give the mean, standard deviation, sample mean and sample standard deviation of X and of Y. The table on the right gives the correlation and sample correlation between X and Y.
1. In the bivariate uniform experiment, select the square in the list box. Run the simulation 2000 times, updating every 10 runs. Note the value of the correlation and the shape of the cloud of points in the scatterplot.
2. In the bivariate uniform experiment, select the triangle in the list box. Run the simulation 2000 times, updating every 10 runs. Note the value of the correlation and the shape of the cloud of points in the scatterplot.
3. In the bivariate uniform experiment, select the circle in the list box. Run the simulation 2000 times, updating every 10 runs. Note the value of the correlation and the shape of the cloud of points in the scatterplot.
As we will see, the scatterplot for positively correlated variables shows a linear trend with positive slope, while the scatterplot for negatively correlated variables shows a linear trend with negative slope. For uncorrelated variables, the scatterplot should look like an amorphous blob of points with no discernible linear trend. Evidently, then, correlation and covariance measure the linear dependence between X and Y, in some sense. We will return to this idea later.
The following exercises give some basic properties of covariance. The main tool that you will need is the fact that expected value is a linear operation.
4. Show that cov(X, Y) = E(XY) - E(X)E(Y).
5. Show that cov(X, Y) = cov(Y, X).
6. Show that cov(X, X) = var(X).
7. Show that cov(aX + bY, Z) = a cov(X, Z) + b cov(Y, Z).
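The identities in Exercises 4-7 hold exactly for the empirical distribution of a sample as well, so they can be sanity-checked numerically. A sketch (assuming NumPy; the helper cov and the sample data are ours):

```python
import numpy as np

rng = np.random.default_rng(seed=3)
x, y, z = rng.standard_normal((3, 200_000))
a, b = 2.0, -3.0

def cov(u, v):
    # sample covariance, matching the defining formula
    return np.mean((u - u.mean()) * (v - v.mean()))

# Exercise 4: cov(X, Y) = E(XY) - E(X)E(Y)
diff4 = cov(x, y) - (np.mean(x * y) - x.mean() * y.mean())
# Exercise 7: cov(aX + bY, Z) = a cov(X, Z) + b cov(Y, Z)
diff7 = cov(a * x + b * y, z) - (a * cov(x, z) + b * cov(y, z))
print(diff4, diff7)  # both are 0 up to floating-point rounding
```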
By Exercise 4, we see that X and Y are uncorrelated if and only if
E(XY) = E(X)E(Y)
In particular, if X and Y are independent, then they are uncorrelated. However, the converse fails with a passion, as the next exercise shows.
8. Suppose that X is uniformly distributed on the interval (-1, 1) and Y = X². Show that X and Y are uncorrelated even though Y depends functionally on X (the strongest form of dependence).
Exercise 8 gives additional insight into the fact that correlation measures only a specific type of dependence between X and Y, namely linear dependence.
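A quick simulation (a sketch assuming NumPy) makes the point of Exercise 8 concrete: the sample correlation between X and X² is near 0 even though Y is determined by X:

```python
import numpy as np

rng = np.random.default_rng(seed=4)
x = rng.uniform(-1.0, 1.0, size=200_000)
y = x ** 2  # Y is completely determined by X

# cov(X, Y) = E(X^3) - E(X)E(X^2) = 0, since odd moments of uniform(-1, 1) vanish
print(np.corrcoef(x, y)[0, 1])  # sample correlation close to 0
```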
9. Suppose that (X, Y) is uniformly distributed on the square
R = {(x, y): -6 < x < 6, -6 < y < 6}
Show that X and Y are independent and hence uncorrelated.
10. Suppose that (X, Y) is uniformly distributed on the circular region
R = {(x, y): x² + y² < 36}
Show that X and Y are dependent but still uncorrelated.
11. Suppose that (X, Y) is uniformly distributed on the triangular region
R = {(x, y): -6 < y < x < 6}
Show that cor(X, Y) = 1/2.
12. Repeat Exercises 1, 2, and 3 in light of Exercises 9, 10, and 11.
You will now show that the variance of a sum of variables is the sum of the pairwise covariances. Suppose that Xi, i in I is a collection of random variables for an experiment, where I is a finite index set.
13. Use Exercises 4-7 to show that
var(Σi Xi) = Σi Σj cov(Xi, Xj)
where the sums are over the index set I.
Exercise 13 can be very useful; it is used for example to compute the variance of the number of red balls in the ball and urn experiment.
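Exercise 13 can also be checked empirically, since the identity holds exactly for sample variances and covariances. A sketch assuming NumPy (np.cov with bias=True returns the matrix of all pairwise sample covariances, and the variables here are our own example):

```python
import numpy as np

rng = np.random.default_rng(seed=5)
x = rng.standard_normal((3, 300_000))
x[1] += 0.5 * x[0]  # make the coordinates dependent, so the cross terms matter

s = x.sum(axis=0)
var_sum = s.var()
# Matrix of pairwise sample covariances cov(X_i, X_j); the i = j entries are variances
cov_matrix = np.cov(x, bias=True)
print(var_sum, cov_matrix.sum())  # the two quantities agree
```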
14. Suppose that Xi, i in I are pairwise uncorrelated (this holds in particular if they are mutually independent). Show that
var(Σi Xi) = Σi var(Xi)
where the sums are over the index set I.
The covariance and correlation of two events A and B are defined to be the covariance and correlation, respectively, of their indicator random variables 1A and 1B.
15. Suppose that A and B are events for an experiment. Show that A and B are positively correlated, negatively correlated, or independent, respectively (as defined in the section on conditional probability) if and only if the indicator variables of A and B are positively correlated, negatively correlated, or uncorrelated, as defined in this section.
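By Exercise 4, the covariance of two events is cov(1A, 1B) = P(A and B) - P(A)P(B). A numerical illustration (a sketch assuming NumPy; the die events are our own example):

```python
import numpy as np

rng = np.random.default_rng(seed=6)
die = rng.integers(1, 7, size=200_000)  # fair six-sided die
a = (die <= 4).astype(float)            # indicator of A = {1, 2, 3, 4}
b = (die % 2 == 0).astype(float)        # indicator of B = {2, 4, 6}

# cov(A, B) = P(A and B) - P(A)P(B); here both terms equal 1/3,
# so A and B are independent and hence uncorrelated
cov_ab = np.mean(a * b) - a.mean() * b.mean()
print(cov_ab)  # close to 0
```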
The next sequence of exercises shows that covariance indeed measures linear dependence between X and Y.
16. Use Exercises 4-7 to show that for any t,
var(Y - tX) = var(Y) - 2t cov(X, Y) + t² var(X) ≥ 0
17. Compute the discriminant of the quadratic in t in Exercise 16 and show that
4[cov(X, Y)]² - 4 var(X) var(Y) ≤ 0
18. Show that the inequality in Exercise 17 is equivalent to
-sd(X) sd(Y) ≤ cov(X, Y) ≤ sd(X) sd(Y)
19. Show that the inequality in Exercise 18 is equivalent to
-1 ≤ cor(X, Y) ≤ 1
20. Show that equality holds in Exercises 17-19 if and only if there exists m such that
var(Y - mX) = 0
21. Show that the condition in Exercise 20 holds if and only if there exists b such that
P(Y = mX + b) = 1
22. In Exercise 21, show that m has the same sign as the correlation.
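Exercises 20-22 can be illustrated numerically: when Y is exactly a linear function of X, the sample correlation is +1 or -1 according to the sign of the slope. A sketch assuming NumPy (the slopes and intercept are our own example):

```python
import numpy as np

rng = np.random.default_rng(seed=7)
x = rng.uniform(0.0, 1.0, size=100_000)

results = {}
for m in (2.5, -0.7):  # one slope of each sign
    y = m * x + 1.0    # Y is exactly a linear function of X
    results[m] = np.corrcoef(x, y)[0, 1]
print(results)  # correlation +1 when m > 0 and -1 when m < 0
```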
23. Suppose that (X, Y) has probability density function f given by
f(x, y) = (x + y) / 4 for 0 < x < y < 2
Find cov(X, Y) and cor(X, Y).
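If you want to check your integrals for Exercise 23, the moments can be computed symbolically. A sketch assuming SymPy is available (the helper expect is ours):

```python
from sympy import symbols, integrate, sqrt

x, y = symbols("x y", positive=True)
f = (x + y) / 4  # density of Exercise 23, supported on 0 < x < y < 2

def expect(g):
    # E[g(X, Y)] as a double integral over the region 0 < x < y < 2
    return integrate(integrate(g * f, (x, 0, y)), (y, 0, 2))

ex, ey = expect(x), expect(y)
cov_xy = expect(x * y) - ex * ey
cor_xy = cov_xy / sqrt((expect(x ** 2) - ex ** 2) * (expect(y ** 2) - ey ** 2))
print(cov_xy, cor_xy)  # exact rational covariance and the corresponding correlation
```

The same pattern works for Exercises 24 and 25, after changing the density and the limits of integration.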
24. Suppose that (X, Y) has probability density function f given by
f(x, y) = (x + y) / 3 for 0 < x < 1, 0 < y < 2
Find cov(X, Y) and cor(X, Y).
25. Suppose that (X, Y) has probability density function f given by
f(x, y) = (3 / 2)x²y for 0 < x < 1, 0 < y < 2
Find cov(X, Y) and cor(X, Y).