
School of Mathematics and Statistics

UNSW Sydney

Introduction to Probability and Stochastic Processes

OPEN LEARNING
CHAPTER 4
Bivariate Distributions

2 / 93
Outline:
4.1 Joint Probability Mass Functions

4.2 Joint Probability Density Functions

4.3 Other Results for Joint Probability Mass Functions and Joint Probability Density Functions

4.4 Marginal Probability Mass Functions and Marginal Probability Density Functions

4.5 Conditional Probability Mass Functions and Conditional Density Functions

4.6 Conditional Expected Value and Conditional Variance

4.7 Independent Random Variables

4.8 Covariance and Correlation

4.9 Bivariate Normal Distribution


4.9.1 Visualisation of the Bivariate Normal Probability Density Function
4.10 Extension to n Random Variables

4.11 Supplementary Material

3 / 93
Observations are often taken in pairs, leading to bivariate observations
(X, Y ), i.e. observations of two variables measured on the same subjects.

For example, (height, weight) can be measured on people; other possibilities are (age, blood pressure), (gender, promotion) for employees, and (sales, price) for supermarket products, etc.

Often, we are interested in exploring the nature of the relationship between two variables that have been measured on the same set of subjects.

In this chapter, we develop notation for the study of the relationship between two variables, and explore some key concepts.

4 / 93
4.1 Joint Probability Mass Functions

5 / 93
Definition
If X and Y are discrete random variables, then the joint probability mass
function of X and Y is

P (X = x, Y = y) = P ((X = x) ∩ (Y = y)),

the probability that X = x and Y = y.

6 / 93
Why Study Joint Probabilities?

Recall that if two events A and B are dependent, then P (A ∩ B) ≠ P (A) · P (B).

In the context of two discrete random variables X and Y being dependent, we have

P (X = x, Y = y) ≠ P (X = x) · P (Y = y).

So when we want to calculate P (X = x, Y = y), or any joint probability involving both X and Y , we cannot find it using the probability mass functions of X and Y , which give us P (X = x) and P (Y = y).

We instead need to know the joint probability mass function

P (X = x, Y = y).

7 / 93
Example
Suppose that X and Y have joint probability mass function,
P (X = x, Y = y) as
y
-1 0 1
0 1/8 1/4 1/8
x 1 1/8 1/16 1/16
2 1/16 1/16 1/8

Here P (X = 0, Y = 0) = 1/4, etc.

Observe that ∑_{all x, y} P (X = x, Y = y) = 1/8 + 1/4 + 1/8 + 1/8 + 1/16 + 1/16 + 1/16 + 1/16 + 1/8 = 1.

The sum of all the entries of P (X = x, Y = y) is 1, so it is a legitimate probability mass function.

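As a quick numerical check, the table above can be stored and verified directly; a minimal sketch in Python (numpy assumed available):

import numpy as np

# Joint pmf P(X = x, Y = y); rows are x = 0, 1, 2 and columns are y = -1, 0, 1.
joint_pmf = np.array([
    [1/8,  1/4,  1/8 ],
    [1/8,  1/16, 1/16],
    [1/16, 1/16, 1/8 ],
])

# A legitimate joint pmf has non-negative entries that sum to 1.
assert np.all(joint_pmf >= 0)
print(joint_pmf.sum())   # 1.0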
8 / 93
Example
Suppose that X and Y have joint probability mass function,
P (X = x, Y = y) as

y
-1 0 1 P (X = x)
0 1/8 1/4 1/8 1/2
x 1 1/8 1/16 1/16 1/4
2 1/16 1/16 1/8 1/4
P (Y = y) 5/16 3/8 5/16 1
Note
by adding across the row for x = 0, we obtain the marginal probability mass function value
P (X = 0) = 1/8 + 1/4 + 1/8 = 1/2, and
by adding down the column for y = −1, we obtain
P (Y = −1) = 1/8 + 1/8 + 1/16 = 5/16.

9 / 93
Example
Suppose that X and Y have joint probability mass function, P (X = x, Y = y) as

y
-1 0 1 P (X = x)
0 1/8 1/4 1/8 1/2
x 1 1/8 1/16 1/16 1/4
2 1/16 1/16 1/8 1/4
P (Y = y) 5/16 3/8 5/16 1

Note that P (X = 0, Y = −1) = 1/8. However, P (X = 0) = 1/2 and P (Y = −1) = 5/16.

So we observe that

P (X = 0, Y = −1) = 1/8 ≠ P (X = 0) · P (Y = −1) = 5/32.

The random variables X and Y are not independent; we must consider the joint
probability mass function of X and Y .
10 / 93
Example
Let X be the number of successes in the first of two Bernoulli trials, each with a
probability of success, p, and let Y be the total number of successes in the first two
Bernoulli trials. Then

P (X = 1, Y = 1) = p(1 − p) ≠ P (X = 1) P (Y = 1) = p × 2p(1 − p) = 2p²(1 − p).


                          y
               0            1           2        P (X = x)
x   0      (1 − p)²     (1 − p) p       0          1 − p
    1          0        p (1 − p)       p²           p
P (Y = y)  (1 − p)²     2p(1 − p)       p²           1

11 / 93
Explanation of the previous slide
Think of the following: S = {HH, HT, TH, TT} and a head is a success. That is, P (H) = p.

When X = 0 ⟹ {TH, TT} and P ({TH, TT}) = (1 − p)p + (1 − p)² = 1 − p

When X = 1 ⟹ {HT, HH} and P ({HT, HH}) = p(1 − p) + p² = p

When Y = 0 ⟹ {TT} and P ({TT}) = (1 − p)²

When Y = 1 ⟹ {HT, TH} and P ({HT, TH}) = 2p(1 − p)

When Y = 2 ⟹ {HH} and P ({HH}) = p²

When X = 0, Y = 0 ⟹ {TH, TT} ∩ {TT} = {TT} and P ({TT}) = (1 − p)²

When X = 0, Y = 1 ⟹ {TH, TT} ∩ {HT, TH} = {TH} and P ({TH}) = (1 − p)p

When X = 0, Y = 2 ⟹ {TH, TT} ∩ {HH} = ∅ and P (∅) = 0

When X = 1, Y = 0 ⟹ {HT, HH} ∩ {TT} = ∅ and P (∅) = 0

When X = 1, Y = 1 ⟹ {HT, HH} ∩ {HT, TH} = {HT} and P ({HT}) = p(1 − p)

When X = 1, Y = 2 ⟹ {HT, HH} ∩ {HH} = {HH} and P ({HH}) = p²


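The same bookkeeping can be done by enumerating the four outcomes directly; a minimal sketch in Python, with p = 0.3 chosen purely for illustration:

from itertools import product

p = 0.3                              # illustrative success probability
outcome_prob = {'H': p, 'T': 1 - p}

joint = {}                           # joint[(x, y)] = P(X = x, Y = y)
for first, second in product('HT', repeat=2):
    prob = outcome_prob[first] * outcome_prob[second]
    x = 1 if first == 'H' else 0                # successes in the first trial
    y = x + (1 if second == 'H' else 0)         # successes in both trials
    joint[(x, y)] = joint.get((x, y), 0) + prob

print(joint)
# {(1, 2): 0.09, (1, 1): 0.21, (0, 1): 0.21, (0, 0): 0.49}
# e.g. P(X = 1, Y = 1) = p(1 - p) = 0.21, while P(X = 1) P(Y = 1) = 0.3 * 0.42 = 0.126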
12 / 93
4.2 Joint Probability Density
Functions

13 / 93
Definition
The joint probability density function of continuous random variables
X and Y is a bivariate function fX,Y satisfying

∫∫_A fX,Y (x, y) dx dy = P ( (X, Y ) ∈ A )

for any (measurable) subset A of R².

For two continuous random variables, X and Y , probabilities have the following
geometrical interpretation:
fX,Y is a surface over the plane R2
probabilities over subsets A ⊂ R2 correspond to the volume under fX,Y over A.

14 / 93
For example, if
fX,Y (x, y) = (3/4)(x² + x y),   for x, y ∈ (0, 1),

then the joint probability density function looks like this:

[Figure: surface plot of fX,Y (x, y) over the unit square.]

15 / 93
Example
Suppose (X, Y ) have the joint probability density function
fX,Y (x, y) = (3/4)(x² + x y),   for x, y ∈ (0, 1).

Find P (X < 1/2, Y < 2/3).

16 / 93
Example
First, what region of the (X, Y ) plane do we want to integrate over?

The red-shaded area in this graph corresponds to P (X < 1/2, Y < 2/3):

[Figure: the unit square in the (x, y) plane with the rectangle 0 < x < 1/2, 0 < y < 2/3 shaded in red.]

We want to integrate over this red-shaded region.

17 / 93
Example

Solution:

P (X < 1/2, Y < 2/3) = ∫_0^{1/2} ∫_0^{2/3} (3/4)(x² + x y) dy dx
                     = (3/4) ∫_0^{1/2} [x² y + x y²/2]_0^{2/3} dx
                     = (3/4) ∫_0^{1/2} (x² (2/3) + x (2/3)²/2) dx
                     = (1/2) ∫_0^{1/2} (x² + x/3) dx
                     = (1/2) [x³/3 + x²/6]_0^{1/2}
                     = (1/2) (1/(2³ × 3) + 1/(2² × 6))
                     = 1/24.
18 / 93
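The double integral can be checked numerically; a sketch using scipy.integrate.dblquad (which takes the integrand as f(y, x), the outer x-limits, then the inner y-limits):

from scipy import integrate

# Integrand f(y, x): dblquad expects the inner variable (y) first.
f = lambda y, x: 0.75 * (x**2 + x * y)

# x in (0, 1/2), y in (0, 2/3)
prob, err = integrate.dblquad(f, 0, 0.5, lambda x: 0, lambda x: 2/3)
print(prob, 1/24)   # both approximately 0.041666...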
Example
Suppose (X, Y ) have the joint probability density function

fX,Y (x, y) = 2(x + y),   0 < x < y, 0 < y < 1.

What is P (X < 1/3, Y < 1/2)?

19 / 93
Example
The shaded area in this graph corresponds to P (X < 1/3, Y < 1/2) with X < Y :

[Figure: the unit square in the (x, y) plane showing the triangular support 0 < x < y < 1, with the trapezoidal region 0 < x < min(y, 1/3), 0 < y < 1/2 shaded.]

This question is a little tricky because the limits for x are a function of y. The domain of this probability density function has a triangular shape, and the region over which we need to integrate fX,Y (x, y) is a trapezium, as shown by the shaded area in the figure.

20 / 93
Example
[Figure: the trapezoidal region of integration within the support 0 < x < y < 1.]

21 / 93
Example

Solution:
If we use horizontal strips, then the integral needs to be broken up into two terms, one corresponding to the triangular component and one to the rectangular component of the trapezium:

P (X < 1/3, Y < 1/2) = ∫_0^{1/3} ∫_0^y 2(x + y) dx dy + ∫_{1/3}^{1/2} ∫_0^{1/3} 2(x + y) dx dy
                     = 11/108.

If we use vertical strips, then the integral can be done in one piece as follows:

P (X < 1/3, Y < 1/2) = ∫_0^{1/3} ∫_x^{1/2} 2(x + y) dy dx
                     = 11/108.

22 / 93
4.3 Other Results for Joint Probability Mass Functions
and Joint Probability Density Functions

23 / 93
Many of the definitions and results for random variables, considered in
Chapter 2, generalise directly to the bivariate case.

Essentially, all the changes from Chapter 2 to this chapter are that instead of
doing a single summation or integral, we now do a double summation or
double integral because there are two variables under consideration.

24 / 93
Results
1 If X and Y are discrete random variables, then

∑_{all x} ∑_{all y} P (X = x, Y = y) = 1.

2 If X and Y are continuous random variables, then

∫_{−∞}^{∞} ∫_{−∞}^{∞} fX,Y (x, y) dx dy = 1.

25 / 93
Definition
The joint cumulative distribution function of X and Y is

FX,Y (x, y) = P (X ≤ x, Y ≤ y)
            = ∑_{u ≤ x} ∑_{v ≤ y} P (X = u, Y = v)          if (X, Y ) is discrete,
            = ∫_{−∞}^{y} ∫_{−∞}^{x} fX,Y (u, v) du dv        if (X, Y ) is continuous.

26 / 93
Example
Suppose X and Y have the joint probability density function

fX,Y (x, y) = x + y, 0 < x < 1, 0 < y < 1.

Find FX,Y (x, y) .

27 / 93
Example
First, what region of the (X, Y ) plane do we want to integrate over? The
red-shaded region corresponds to fX,Y (x, y) > 0.

[Figure: the unit square 0 < x < 1, 0 < y < 1 shaded in red, the region where fX,Y (x, y) > 0.]

28 / 93
Example

Solution:
For 0 < x < 1 and 0 < y < 1, we have
FX,Y (x, y) = ∫_{−∞}^{y} ∫_{−∞}^{x} fX,Y (u, v) du dv
            = ∫_0^y ∫_0^x (u + v) du dv
            = ∫_0^y [u²/2 + v u]_0^x dv
            = ∫_0^y (x²/2 + v x) dv
            = [x² v/2 + v² x/2]_0^y
            = (xy/2)(x + y).

For example, P (X < 1/2, Y < 4/5) = FX,Y (1/2, 4/5) = 13/50.

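A symbolic check of this cumulative distribution function; a sketch using sympy:

import sympy as sp

x, y, u, v = sp.symbols('x y u v', positive=True)

# F(x, y) = integral of (u + v) over 0 < u < x, 0 < v < y.
F = sp.integrate(u + v, (u, 0, x), (v, 0, y))
print(sp.simplify(F))                                        # x*y*(x + y)/2
print(F.subs({x: sp.Rational(1, 2), y: sp.Rational(4, 5)}))  # 13/50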
29 / 93
Results
If g is any function of X and Y , then the expected value of g(X, Y ) is

E[ g(X, Y ) ] = ∑_{all x} ∑_{all y} g(x, y) P (X = x, Y = y)          (discrete)
              = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) fX,Y (x, y) dx dy       (continuous)

30 / 93
Example
Suppose we are given the following for the joint probability mass function of
X and Y
y
0 1 2
0 0.1 0.2 0.2
x
1 0.2 0.2 a

where a is a constant.
1 Find a if P (X = x, Y = y) is a joint probability mass function.
2 Find FX,Y (1, 1)
3 Find E(XY ).

31 / 93
Example

Solution:
1 We find a from ∑_{all x} ∑_{all y} P (X = x, Y = y) = 1, which gives a = 0.1.

2

FX,Y (1, 1) = P (X ≤ 1, Y ≤ 1)
            = P (X = 0, Y = 0) + P (X = 0, Y = 1) + P (X = 1, Y = 0) + P (X = 1, Y = 1)
            = 0.1 + 0.2 + 0.2 + 0.2 = 0.7

3

E(XY ) = ∑_{all x} ∑_{all y} x y P (X = x, Y = y)
       = 0 × 0 × 0.1 + 0 × 1 × 0.2 + 0 × 2 × 0.2 + 1 × 0 × 0.2 + 1 × 1 × 0.2 + 1 × 2 × 0.1 = 0.4

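The three answers can also be verified numerically; a sketch with the completed table (a = 0.1):

import numpy as np

# Completed table; rows are x = 0, 1 and columns are y = 0, 1, 2.
joint_pmf = np.array([
    [0.1, 0.2, 0.2],
    [0.2, 0.2, 0.1],
])
x_vals = np.array([0, 1])
y_vals = np.array([0, 1, 2])

print(joint_pmf.sum())           # approximately 1.0, confirming a = 0.1
print(joint_pmf[:, :2].sum())    # F(1, 1) = 0.7 (cells with x <= 1 and y <= 1)
exy = (np.outer(x_vals, y_vals) * joint_pmf).sum()
print(exy)                       # E(XY) = 0.4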
32 / 93
4.4 Marginal Probability Mass Functions
and Marginal Probability Density Functions

33 / 93
Result
If X and Y are discrete random variables, then P (X = x) and P (Y = y) can
be calculated from P (X = x, Y = y) as follows:

P (X = x) = ∑_{all y} P (X = x, Y = y),

P (Y = y) = ∑_{all x} P (X = x, Y = y).

P (X = x) is sometimes referred to as the marginal probability mass function of X, or simply the marginal probability function of X.

34 / 93
Example
Suppose we are given the joint probability mass function of X and Y to be
y
0 1 2 P (X = x)
x 0 0.1 0.2 0.2 0.5
1 0.2 0.2 0.1 0.5
P (Y = y) 0.3 0.4 0.3 1

Here, we obtain, for example,

P (X = 0) = P (X = 0, Y = 0) + P (X = 0, Y = 1) + P (X = 0, Y = 2)
= 0.1 + 0.2 + 0.2 = 0.50

In general, P (X = x) = ∑_{all y} P (X = x, Y = y).

35 / 93
Result
If X and Y are continuous random variables, then fX (x) and fY (y) can be
calculated from fX,Y (x, y) as follows:

fX (x) = ∫_{−∞}^{∞} fX,Y (x, y) dy,

fY (y) = ∫_{−∞}^{∞} fX,Y (x, y) dx.

Here, fX (x) is sometimes referred to as the marginal probability density function of X.

36 / 93
Example
Suppose the joint probability density function of X and Y is given by

fX,Y (x, y) = x + y, 0 < x < 1 0 < y < 1.

Find fX (x) and fY (y).

37 / 93
Example
Suppose the joint probability density function of X and Y is given by

fX,Y (x, y) = x + y, 0 < x < 1 0 < y < 1.

Find fX (x) and fY (y).


Solution:
By the results on continuous marginals, we have

fX (x) = ∫_{−∞}^{∞} fX,Y (x, y) dy = ∫_0^1 (x + y) dy = [xy + y²/2]_0^1 = x + 1/2,   0 < x < 1.

Similarly, fY (y) = ∫_{−∞}^{∞} fX,Y (x, y) dx = y + 1/2,   0 < y < 1.

37 / 93
Example
Suppose the joint probability density function of X and Y is given by

fX,Y (x, y) = 2(x + y), 0<x<y 0 < y < 1.

Find fX (x) and fY (y).

38 / 93
Example
Suppose the joint probability density function of X and Y is given by

fX,Y (x, y) = 2(x + y), 0<x<y 0 < y < 1.

Find fX (x) and fY (y).


Solution:
By the results on continuous marginals, we have

fX (x) = ∫_{−∞}^{∞} fX,Y (x, y) dy = ∫_x^1 2(x + y) dy = [2xy + y²]_x^1 = 2x + 1 − 3x²,   0 < x < 1.

Similarly, fY (y) = ∫_{−∞}^{∞} fX,Y (x, y) dx = ∫_0^y 2(x + y) dx = 3y²,   0 < y < 1.

38 / 93
4.5 Conditional Probability Mass Functions and
Conditional Probability Density Functions

39 / 93
Definition
If X and Y are discrete random variables, the conditional probability
mass function of X given Y = y is

P (X = x | Y = y) = P (X = x, Y = y) / P (Y = y).

Similarly, P (Y = y | X = x) = P (X = x, Y = y) / P (X = x).

This is simply an application of the definition of conditional probability discussed in Chapter 1.

40 / 93
Definition
If X and Y are continuous random variables, the conditional probability
density function of X given Y = y is

fX|Y (x | Y = y) = fX,Y (x, y) / fY (y).

Similarly, fY|X (y | X = x) = fX,Y (x, y) / fX (x).

We often write fY|X (y|x) as shorthand notation for fY|X (y | X = x).

41 / 93
Example
Suppose we are given the joint probability mass function of X and Y to be
y
0 1 2 P (X = x)
x 0 0.1 0.2 0.2 0.5
1 0.2 0.2 0.1 0.5
P (Y = y) 0.3 0.4 0.3 1

Find P(X = x | Y = 2) and P(Y = y | X = 0)

42 / 93
Example

Solution:
First, we need to find P (X = x | Y = 2), for x = 0, 1:

P (X = 0 | Y = 2) = P (X = 0, Y = 2) / P (Y = 2) = 0.20/0.30 = 2/3

P (X = 1 | Y = 2) = P (X = 1, Y = 2) / P (Y = 2) = 0.10/0.30 = 1/3

In tabular form,

x                     0     1
P (X = x | Y = 2)    2/3   1/3

Similarly, for P (Y = y | X = 0), we have

y                     0     1     2
P (Y = y | X = 0)    1/5   2/5   2/5

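A numerical version of the same conditioning; a minimal sketch, with rows indexed by x = 0, 1 and columns by y = 0, 1, 2:

import numpy as np

joint_pmf = np.array([
    [0.1, 0.2, 0.2],
    [0.2, 0.2, 0.1],
])

p_y = joint_pmf.sum(axis=0)              # marginal of Y
p_x_given_y2 = joint_pmf[:, 2] / p_y[2]
print(p_x_given_y2)                      # [2/3, 1/3]

p_x = joint_pmf.sum(axis=1)              # marginal of X
p_y_given_x0 = joint_pmf[0, :] / p_x[0]
print(p_y_given_x0)                      # [0.2, 0.4, 0.4] = [1/5, 2/5, 2/5]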
43 / 93
Example
Suppose we are given the joint probability density function of X and Y to be

fX,Y (x, y) = x + y, 0 < x < 1, 0 < y < 1.

Find fX|Y (x|y) and fY|X (y|x).

44 / 93
Example

Solution:
First, we need to find the marginals of X and Y :

fX (x) = ∫_{−∞}^{∞} fX,Y (x, y) dy = ∫_0^1 (x + y) dy = x + 1/2,   0 < x < 1,

fY (y) = ∫_{−∞}^{∞} fX,Y (x, y) dx = ∫_0^1 (x + y) dx = y + 1/2,   0 < y < 1.

The conditional probability density functions are

fX|Y (x|y) = fX,Y (x, y) / fY (y) = (x + y) / (y + 1/2),   0 < x < 1,

fY|X (y|x) = fX,Y (x, y) / fX (x) = (x + y) / (x + 1/2),   0 < y < 1.

45 / 93
Let X and Y be continuous random variables.

For a given value of x, fY|X (y|x) is an ordinary probability density function and has the usual properties, such as:

Result
If X and Y are continuous random variables, then

P (a ≤ Y ≤ b | X = x) = ∫_a^b fY|X (y|x) dy.

Note that Y | X = x is a continuous random variable with density function fY|X (y|x).

46 / 93
Let X and Y be discrete random variables.

Result
If X and Y are discrete random variables, then

P (Y ∈ A | X = x) = ∑_{y ∈ A} P (Y = y | X = x).

47 / 93
4.6 Conditional Expected Value and Conditional
Variance

48 / 93
Definition
The conditional expected value of X given Y = y is

E(X | Y = y) = ∑_{all x} x P (X = x | Y = y)          (discrete)
             = ∫_{−∞}^{∞} x fX|Y (x|y) dx              (continuous)

Definition
The conditional expected value of Y given X = x is

E(Y | X = x) = ∑_{all y} y P (Y = y | X = x)          (discrete)
             = ∫_{−∞}^{∞} y fY|X (y|x) dy              (continuous)

Note that this is an application of the definition of E(X) from Chapter 2.

49 / 93
Example
Recall the example of conditional probability mass function

x                     0     1
P (X = x | Y = 2)    2/3   1/3

y                     0     1     2
P (Y = y | X = 0)    1/5   2/5   2/5

and joint probability mass function of X and Y to be


y
0 1 2 P (X = x)
x 0 0.1 0.2 0.2 0.5
1 0.2 0.2 0.1 0.5
P (Y = y) 0.3 0.4 0.3 1

Find E(X|Y = 2) and E(Y|X = 0).

50 / 93
Example

Solution:
E(X | Y = 2) = ∑_{all x} x P (X = x | Y = 2) = 0 × 2/3 + 1 × 1/3 = 1/3

E(Y | X = 0) = ∑_{all y} y P (Y = y | X = 0) = 0 × 1/5 + 1 × 2/5 + 2 × 2/5 = 6/5.

51 / 93
Example
Recall the example of conditional probability density function
fX|Y (x|y) = (x + y) / (y + 1/2),   0 < x < 1,

fY|X (y|x) = (x + y) / (x + 1/2),   0 < y < 1.

Find E(X|Y) and E(Y|X).

52 / 93
Example

Solution:
E(X | Y = y) = ∫_{−∞}^{∞} x fX|Y (x|y) dx
             = ∫_0^1 x (x + y)/(y + 1/2) dx
             = (1/(y + 1/2)) ∫_0^1 (x² + x y) dx
             = (1/(y + 1/2)) [x³/3 + x² y/2]_0^1
             = (1/(y + 1/2)) (1/3 + y/2),   0 < y < 1.

Note that E(X|Y ) is a random variable.

Similarly, E(Y | X = x) = (1/(x + 1/2)) (1/3 + x/2),   0 < x < 1.
53 / 93
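A symbolic check of this conditional expectation; a sketch using sympy:

import sympy as sp

x, y = sp.symbols('x y', positive=True)

f_x_given_y = (x + y) / (y + sp.Rational(1, 2))       # conditional density, 0 < x < 1
E_x_given_y = sp.integrate(x * f_x_given_y, (x, 0, 1))
print(sp.simplify(E_x_given_y))   # equivalent to (2 + 3y)/(3(2y + 1)), a function of y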
Definition
The conditional variance of X given Y = y is

Var(X | Y = y) = E(X² | Y = y) − [E(X | Y = y)]²,

where

E(X² | Y = y) = ∑_{all x} x² P (X = x | Y = y)          (discrete)
              = ∫_{−∞}^{∞} x² fX|Y (x|y) dx              (continuous)

Note that this definition is an application of the definition of Var(X) from Chapter 2.

54 / 93
Example
Recall the example of conditional probability mass function
x                     0     1
P (X = x | Y = 2)    2/3   1/3

y                     0     1     2
P (Y = y | X = 0)    1/5   2/5   2/5

Find Var(X|Y = 2) for this example.

55 / 93
Example

Solution:
Var(X | Y = 2) = E(X² | Y = 2) − [E(X | Y = 2)]².

Here

E(X² | Y = 2) = ∑_{all x} x² P (X = x | Y = 2) = 0² × 2/3 + 1² × 1/3 = 1/3.

Earlier, we saw that E(X | Y = 2) = 1/3.

So

Var(X | Y = 2) = E(X² | Y = 2) − [E(X | Y = 2)]² = 1/3 − (1/3)² = 2/9.

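Numerically, the conditional mean and variance given Y = 2 follow directly from the conditional probability mass function; a minimal sketch:

import numpy as np

x_vals = np.array([0, 1])
p_x_given_y2 = np.array([2/3, 1/3])      # P(X = x | Y = 2) from the earlier example

mean = (x_vals * p_x_given_y2).sum()               # E(X | Y = 2) = 1/3
second_moment = (x_vals**2 * p_x_given_y2).sum()
var = second_moment - mean**2                      # Var(X | Y = 2) = 2/9
print(mean, var)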
56 / 93
Example
Recall the example of conditional probability density function
fX|Y (x|y) = (x + y) / (y + 1/2),   0 < x < 1,

and the conditional expectation E(X | Y = y) computed for this example.

Find Var(X|Y) for this example.

57 / 93
Example

Solution:
Var(X | Y = y) = E(X² | Y = y) − [E(X | Y = y)]².

Here

E(X² | Y = y) = ∫_{−∞}^{∞} x² fX|Y (x|y) dx = ∫_0^1 x² (x + y)/(y + 1/2) dx
              = (1/(y + 1/2)) ∫_0^1 (x³ + x² y) dx = (1/(y + 1/2)) [x⁴/4 + x³ y/3]_0^1
              = (1/(y + 1/2)) (1/4 + y/3) = (3 + 4y) / (6(2y + 1)),   0 < y < 1.

Earlier, we saw that E(X | Y = y) = (2 + 3y) / (3(2y + 1)),   0 < y < 1.

So

Var(X | Y = y) = E(X² | Y = y) − [E(X | Y = y)]² = (3 + 4y)/(6(2y + 1)) − [(2 + 3y)/(3(2y + 1))]²,   0 < y < 1.

Note that Var(X|Y ) is a random variable.


58 / 93
4.7 Independent Random Variables

59 / 93
Definition
The random variables X and Y are independent if and only if

fX,Y (x, y) = fX (x) fY (y) for all x, y.

60 / 93
Result
The random variables X and Y are independent if and only if

fY|X (y|x) = fY (y)

or

fX|Y (x|y) = fX (x).

This result allows an interpretation that conforms with the everyday meaning of the word independence.

If X and Y are independent, then the probability structure of Y is unaffected by the knowledge that X takes on some value x (and vice versa).

61 / 93
Result
If random variables X and Y are independent, then

FX,Y (x, y) = FX (x) FY (y).

62 / 93
Example
Suppose the joint probability mass function of X and Y is given by
y
-1 0 1 P (X = x)
0 0.01 0.02 0.07 0.1
x 1 0.04 0.13 0.33 0.50
2 0.05 0.05 0.30 0.40
P (Y = y) 0.10 0.20 0.70 1

Are X and Y independent?

63 / 93
Example

Solution:
To show that X and Y are not independent, we need only find a single case of x and y such that P (X = x, Y = y) ≠ P (X = x) P (Y = y).

Consider the case when x = 1 and y = 1. We have P (X = 1, Y = 1) = 0.33, while P (X = 1) = 0.50 and P (Y = 1) = 0.70. So we see that

P (X = 1, Y = 1) = 0.33 ≠ P (X = 1) P (Y = 1) = 0.35.

Hence, X and Y are not independent.

64 / 93
Example
Suppose the joint probability mass function of X and Y is given by
P (X = x, Y = y) = p² (1 − p)^{x+y},   x = 0, 1, 2, . . . ,  y = 0, 1, 2, . . . ,  0 < p < 1.

Are X and Y independent?

65 / 93
Example

Solution:
First, we will find the marginals of X and Y .

P (X = x) = ∑_{all y} P (X = x, Y = y) = ∑_{y=0}^{∞} p² (1 − p)^{x+y}
          = p² (1 − p)^x ∑_{y=0}^{∞} (1 − p)^y = p² (1 − p)^x × 1/(1 − (1 − p))
          = p (1 − p)^x,   x = 0, 1, 2, . . . ,  0 < p < 1.

Similarly, P (Y = y) = p (1 − p)^y,   y = 0, 1, 2, . . . ,  0 < p < 1.

Therefore, for all x, y ≥ 0, we have

P (X = x) P (Y = y) = p (1 − p)^x × p (1 − p)^y = p² (1 − p)^{x+y} = P (X = x, Y = y).

Hence, X and Y are independent.

66 / 93
Example
Suppose the joint probability density function of X and Y is given by

fX,Y (x, y) = 6 x y²,   0 < x < 1, 0 < y < 1.

Are X and Y independent?

67 / 93
Example

Solution:
First, we find the marginals of X and Y . That is,

fX (x) = ∫_{−∞}^{∞} fX,Y (x, y) dy = ∫_0^1 6 x y² dy = 2x,   0 < x < 1,

fY (y) = ∫_{−∞}^{∞} fX,Y (x, y) dx = ∫_0^1 6 x y² dx = 3y²,   0 < y < 1.

For all x ∈ (0, 1) and y ∈ (0, 1), we have

fX (x) fY (y) = 2x · 3y² = 6 x y² = fX,Y (x, y).

Therefore X and Y are independent.


68 / 93
Example
Suppose the joint probability density function of X and Y is given by

fX,Y (x, y) = 10 x y²,   0 < x < y, 0 < y < 1.

Are X and Y independent?

69 / 93
Example

Solution:
First, we find the marginals of X and Y . That is,

fX (x) = ∫_{−∞}^{∞} fX,Y (x, y) dy = ∫_x^1 10 x y² dy = 10x(1 − x³)/3,   0 < x < 1,

and

fY (y) = ∫_{−∞}^{∞} fX,Y (x, y) dx = ∫_0^y 10 x y² dx = 5y⁴,   0 < y < 1.

Clearly

fX (x) fY (y) = (10x(1 − x³)/3) · 5y⁴ ≠ fX,Y (x, y) = 10 x y²,

for all 0 < x < y < 1.

Therefore X and Y are not independent.

70 / 93
Results
If X and Y are independent random variables, then

E(XY ) = E(X) E(Y ),

and more generally, for any functions g(X) and h(Y ),

E(g(X) h(Y )) = E(g(X)) E(h(Y )).

71 / 93
4.8 Covariance and Correlation

72 / 93
Definition
The covariance of X and Y is

Cov(X, Y ) = E[(X − µX)(Y − µY)],

where µX = E(X) and µY = E(Y ).

Cov(X, Y ) measures not only how X and Y vary about their means but also
how they vary together linearly.

Cov(X, Y) > 0 if X and Y are positively associated, i.e., if X is likely to be large when Y is large, and X is likely to be small when Y is small.

If X and Y are negatively associated, Cov(X, Y) < 0.

73 / 93
Results
1 Cov(X, X) = V ar(X).
2 Cov(X, Y ) = E(XY ) − µX µY , where E(X) = µX and E(Y ) = µY .

74 / 93
Results
1 Cov(X, X) = V ar(X).
2 Cov(X, Y ) = E(XY ) − µX µY , where E(X) = µX and E(Y ) = µY .

Proof.
1 By definition, Cov(X, X) = E[(X − µX)(X − µX)] = E[(X − µX)²] = Var(X).

2 By definition, Cov(X, Y ) = E[(X − µX)(Y − µY)] = E[XY − X µY − µX Y + µX µY]
  = E(XY ) − µY E(X) − µX E(Y ) + µX µY = E(XY ) − µX µY.

74 / 93
Result
If X and Y are independent, then Cov(X, Y ) = 0.

75 / 93
Result
If X and Y are independent, then Cov(X, Y ) = 0.
Proof.
We will consider the continuous case only.

E(XY ) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x y fX,Y (x, y) dx dy
       = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x y fX (x) fY (y) dx dy      (since X and Y are independent)
       = ∫_{−∞}^{∞} x fX (x) dx ∫_{−∞}^{∞} y fY (y) dy
       = E(X) E(Y ) = µX µY,

and hence Cov(X, Y ) = E(XY ) − µX µY = 0.

Independence implies that the covariance is zero. However, Cov(X, Y ) = 0 DOES NOT imply that X and Y are independent!
75 / 93
Results
1 V ar(X + Y ) = V ar(X) + V ar(Y ) + 2Cov(X, Y )

2 V ar(X + Y ) = V ar(X) + V ar(Y ) if X and Y are independent.

76 / 93
Definition
The correlation between X and Y is

Corr(X, Y ) = Cov(X, Y ) / √(Var(X) Var(Y )).

Corr(X, Y ) measures the strength of the linear association between X and Y .

Corr(X, Y ) is a number between -1 and +1 (i.e., Corr(X, Y ) ∈ [−1, +1].)

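For a joint probability mass function given as a table, both quantities can be computed directly; a sketch reusing the earlier table with rows x = 0, 1 and columns y = 0, 1, 2:

import numpy as np

joint_pmf = np.array([
    [0.1, 0.2, 0.2],
    [0.2, 0.2, 0.1],
])
x_vals = np.array([0, 1])
y_vals = np.array([0, 1, 2])

p_x = joint_pmf.sum(axis=1)
p_y = joint_pmf.sum(axis=0)
mu_x = (x_vals * p_x).sum()                             # E(X) = 0.5
mu_y = (y_vals * p_y).sum()                             # E(Y) = 1.0
e_xy = (np.outer(x_vals, y_vals) * joint_pmf).sum()     # E(XY) = 0.4

cov = e_xy - mu_x * mu_y                                # Cov(X, Y) = -0.1
var_x = (x_vals**2 * p_x).sum() - mu_x**2
var_y = (y_vals**2 * p_y).sum() - mu_y**2
corr = cov / np.sqrt(var_x * var_y)                     # approximately -0.258
print(cov, corr)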
77 / 93
Definition
If Corr(X, Y) = 0, then X and Y are said to be uncorrelated.

Independent random variables are uncorrelated, but uncorrelated variables are not necessarily independent.

For example, suppose X has a distribution which is symmetric about zero and Y = X². Then

E(XY ) = E(X³) = 0 and E(X) = 0.

So Cov(X, Y ) = E(XY ) − E(X) E(Y ) = E(X³) − E(X) E(Y ) = 0 − 0 × E(Y ) = 0 and
Corr(X, Y ) = 0, but since Y = X², X and Y are dependent.

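A quick simulation illustrates the distinction; a minimal sketch with X standard normal (symmetric about zero) and Y = X²:

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)   # symmetric about zero
y = x**2                             # completely determined by X, hence dependent

print(np.corrcoef(x, y)[0, 1])       # close to 0: uncorrelated despite dependence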
78 / 93
Results
1 −1 ≤ Corr(X, Y ) ≤ 1.

2 Corr(X, Y ) = −1 if and only if P (Y = a + bX) = 1 for some constants a and b such that b < 0.

3 Corr(X, Y ) = +1 if and only if P (Y = a + bX) = 1 for some constants a and b such that b > 0.

79 / 93
Proof.
➊ Let ρ = Corr(X, Y ). We know that, with σX² = Var(X) and σY² = Var(Y ),

0 ≤ Var(X/σX + Y/σY)
  = Var(X/σX) + Var(Y/σY) + 2 Cov(X/σX, Y/σY)
  = Var(X)/σX² + Var(Y )/σY² + 2ρ = 2(1 + ρ),

therefore ρ ≥ −1.

In addition, 0 ≤ Var(X/σX − Y/σY) = 2(1 − ρ), so ρ ≤ 1.

80 / 93
Proof.
 
➋ If ρ = −1, then Var(X/σX + Y/σY) = 2(1 + ρ) = 0. This means that X/σX + Y/σY is a
constant, i.e., P (X/σX + Y/σY = c) = 1 for some constant c. However,

X/σX + Y/σY = c  ⟺  Y = −(σY/σX) X + c σY.

So P (Y = a + bX) = 1 for some constants a = c σY and b = −σY/σX < 0.

➌ Similarly, for ρ = 1, P (Y = a + bX) = 1 for some constants a and b = σY/σX > 0.

81 / 93
4.9 Bivariate Normal Distribution

82 / 93
The most commonly used special type of bivariate distribution is the
bivariate normal.

We say that X and Y have the bivariate normal distribution if

fX,Y (x, y) = 1 / (2π σX σY √(1 − ρ²))
            × exp{ −1/(2(1 − ρ²)) [ ((x − µX)/σX)² − 2ρ ((x − µX)/σX)((y − µY)/σY) + ((y − µY)/σY)² ] },

−∞ < x < ∞, −∞ < y < ∞;  −∞ < µX < ∞, −∞ < µY < ∞,  σX > 0, σY > 0, −1 < ρ < 1.

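The density can be coded directly from this formula and cross-checked against scipy.stats.multivariate_normal; a sketch using illustrative parameter values (those of the next slides):

import numpy as np
from scipy.stats import multivariate_normal

def bivariate_normal_pdf(x, y, mu_x, mu_y, sigma_x, sigma_y, rho):
    # Evaluate the bivariate normal density at (x, y).
    zx = (x - mu_x) / sigma_x
    zy = (y - mu_y) / sigma_y
    norm_const = 2 * np.pi * sigma_x * sigma_y * np.sqrt(1 - rho**2)
    exponent = -(zx**2 - 2 * rho * zx * zy + zy**2) / (2 * (1 - rho**2))
    return np.exp(exponent) / norm_const

mu_x, mu_y, sigma_x, sigma_y, rho = 3, 7, 2, 5, 0.7
cov = [[sigma_x**2, rho * sigma_x * sigma_y],
       [rho * sigma_x * sigma_y, sigma_y**2]]

print(bivariate_normal_pdf(2.0, 6.0, mu_x, mu_y, sigma_x, sigma_y, rho))
print(multivariate_normal(mean=[mu_x, mu_y], cov=cov).pdf([2.0, 6.0]))  # should match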
83 / 93
4.9.1 Visualisation of the Bivariate Normal Probability
Density Function

84 / 93
The bivariate normal probability density is a bivariate function fX,Y (x, y),
with elliptical contours.

The figures in the next slide provide contour plots of the bivariate normal
probability density for

µX = 3, µY = 7, σX = 2, σY = 5

with ρ = Corr(X, Y ) taking four different values 0.3, 0.7, -0.7 and zero,
respectively.

85 / 93
[Figure: contour plots of the bivariate normal probability density for ρ = 0.3, 0.7, −0.7 and 0.]

86 / 93
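Contour plots of this kind can be reproduced with matplotlib and scipy; a sketch (not the original figure) using the parameters above:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal

mu = [3, 7]
sigma_x, sigma_y = 2, 5
x, y = np.meshgrid(np.linspace(-4, 10, 200), np.linspace(-10, 24, 200))
grid = np.dstack((x, y))

fig, axes = plt.subplots(2, 2, figsize=(8, 8))
for ax, rho in zip(axes.ravel(), [0.3, 0.7, -0.7, 0.0]):
    cov = [[sigma_x**2, rho * sigma_x * sigma_y],
           [rho * sigma_x * sigma_y, sigma_y**2]]
    density = multivariate_normal(mean=mu, cov=cov).pdf(grid)
    ax.contour(x, y, density)        # elliptical contours of the joint density
    ax.set_title(f'rho = {rho}')
plt.show()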
These, respectively, correspond to
moderate positive correlation between X and Y when Corr(X, Y ) = 0.3
strong positive correlation between X and Y when Corr(X, Y ) = 0.7
strong negative correlation between X and Y when Corr(X, Y ) = −0.7
X and Y are uncorrelated when Corr(X, Y ) = 0.

87 / 93
4.10 Extension to n Random Variables

88 / 93
All of the definitions and results in this chapter extend to the case of more
than two random variables.

For general cases of n random variables, we now give some of the most
fundamental results.

Definition
The joint probability mass function of X1 , X2 , . . . , Xn is

P (X1 = x1 , X2 = x2 , . . . , Xn = xn ).

89 / 93
Definition
The joint cumulative distribution function of X1 , X2 , . . . , Xn is

FX1 ,X2 ,...,Xn (x1 , x2 , . . . , xn ) = P (X1 ≤ x1 , X2 ≤ x2 , . . . , Xn ≤ xn ).

Definition
The joint probability density function of X1, X2, . . . , Xn is

fX1,X2,...,Xn (x1, x2, . . . , xn) = ∂ⁿ FX1,X2,...,Xn (x1, x2, . . . , xn) / (∂x1 ∂x2 · · · ∂xn).

90 / 93
Definition
We say that X1 , X2 , . . . , Xn are independent if
P (X1 = x1 , X2 = x2 , . . . , Xn = xn ) = P (X1 = x1 ) P (X2 = x2 ) · · · P (Xn = xn )

or

fX1 ,X2 ,...,Xn (x1 , x2 , . . . , xn ) = fX1 (x1 ) fX2 (x2 ) · · · fXn (xn )

or

FX1 ,X2 ,...,Xn (x1 , x2 , . . . , xn ) = FX1 (x1 ) FX2 (x2 ) · · · FXn (xn )

and, as a consequence,

E(X1 X2 · · · Xn ) = E(X1 ) E(X2 ) · · · E(Xn )

91 / 93
4.11 Supplementary Material

92 / 93
Supplementary Material - Definition of Expectation

Definition
The expected value of X is
E(X) = ∑_{all x} x P (X = x)              (discrete)
     = ∫_{−∞}^{∞} x fX (x) dx              (continuous)

93 / 93
