Chapter 4
Bivariate Distributions
Outline:
4.1 Joint Probability Mass Functions
4.3 Other Results for Joint Probability Mass Functions and Joint Probability Density Functions
4.4 Marginal Probability Mass Functions and Marginal Probability Density Functions
4.5 Conditional Probability Mass Functions and Conditional Probability Density Functions
4.6 Conditional Expected Value and Conditional Variance
4.8 Covariance and Correlation
4.9 Bivariate Normal Distribution
4.10 Extension to n Random Variables
4.11 Supplementary Material
Observations are often taken in pairs, leading to bivariate observations
(X, Y ), i.e. observations of two variables measured on the same subjects.
4.1 Joint Probability Mass Functions
Definition
If X and Y are discrete random variables, then the joint probability mass
function of X and Y is
P(X = x, Y = y) = P((X = x) ∩ (Y = y)).
Why Study Joint Probabilities?
Recall that if two events A and B are dependent, then P(A ∩ B) ≠ P(A) · P(B).
In the context of two dependent discrete random variables X and Y, we have
P(X = x, Y = y) ≠ P(X = x) · P(Y = y)
for at least one pair (x, y). Probabilities involving X and Y cannot then be recovered from the two marginal distributions alone, so we must work with the joint probability mass function P(X = x, Y = y).
Example
Suppose that X and Y have joint probability mass function P(X = x, Y = y) given by

                     y
              -1      0      1
         0    1/8     1/4    1/8
    x    1    1/8     1/16   1/16
         2    1/16    1/16   1/8

Observe that ∑_{all x, y} P(X = x, Y = y) = 1/8 + 1/4 + 1/8 + 1/8 + 1/16 + 1/16 + 1/16 + 1/16 + 1/8 = 1.
Example
Suppose that X and Y have joint probability mass function P(X = x, Y = y) given by

                     y
              -1      0      1      P(X = x)
         0    1/8     1/4    1/8    1/2
    x    1    1/8     1/16   1/16   1/4
         2    1/16    1/16   1/8    1/4
    P(Y = y)  5/16    3/8    5/16   1

Note that
by adding across the row for x = 0, we obtain the marginal probability
P(X = 0) = 1/8 + 1/4 + 1/8 = 1/2, and
by adding down the column for y = −1, we obtain the marginal probability
P(Y = −1) = 1/8 + 1/8 + 1/16 = 5/16.
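As a quick check of the arithmetic, the table can be stored as an array and the marginals recovered by summing across rows and down columns. This is a minimal sketch assuming Python with numpy (an illustrative choice, not part of the course notes):

```python
import numpy as np

# Joint pmf P(X = x, Y = y); rows are x = 0, 1, 2 and columns are y = -1, 0, 1.
joint = np.array([
    [1/8,  1/4,  1/8],
    [1/8,  1/16, 1/16],
    [1/16, 1/16, 1/8],
])

print(joint.sum())          # 1.0 -- the probabilities sum to one
print(joint.sum(axis=1))    # P(X = x): [0.5, 0.25, 0.25]
print(joint.sum(axis=0))    # P(Y = y): [0.3125, 0.375, 0.3125] = [5/16, 3/8, 5/16]
```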
Example
Suppose that X and Y have joint probability mass function P(X = x, Y = y) given by

                     y
              -1      0      1      P(X = x)
         0    1/8     1/4    1/8    1/2
    x    1    1/8     1/16   1/16   1/4
         2    1/16    1/16   1/8    1/4
    P(Y = y)  5/16    3/8    5/16   1

For instance, P(X = 0, Y = −1) = 1/8, whereas P(X = 0) P(Y = −1) = 1/2 × 5/16 = 5/32 ≠ 1/8.
So the random variables X and Y are not independent; to compute probabilities involving both of them we must use the joint probability mass function of X and Y.
Example
Let X be the number of successes in the first of two Bernoulli trials, each with a
probability of success, p, and let Y be the total number of successes in the first two
Bernoulli trials. Then the joint probability mass function of X and Y is

                        y
               0          1          2      P(X = x)
         0    (1 − p)²   (1 − p) p   0       1 − p
    x
         1    0          p (1 − p)   p²      p
    P(Y = y)  (1 − p)²   2p(1 − p)   p²      1
Explanation of the previous slide
Think of the sample space S = {HH, HT, TH, TT}, where a head is a success, so P(H) = p.
For example, P(X = 0, Y = 1) = P(TH) = (1 − p) p, and P(X = 1, Y = 0) = 0 because a success in the first trial already makes Y ≥ 1.
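A small sketch (assuming Python with sympy, which is not part of the notes) that builds this joint table directly from the four outcomes of the sample space:

```python
from sympy import symbols, simplify

p = symbols('p')

# Each outcome of two Bernoulli trials with its probability; H = success.
outcomes = {('H', 'H'): p*p, ('H', 'T'): p*(1 - p),
            ('T', 'H'): (1 - p)*p, ('T', 'T'): (1 - p)*(1 - p)}

# X = number of successes in the first trial, Y = total successes in both trials.
joint = {}
for (t1, t2), prob in outcomes.items():
    x = 1 if t1 == 'H' else 0
    y = x + (1 if t2 == 'H' else 0)
    joint[(x, y)] = joint.get((x, y), 0) + prob

for (x, y), prob in sorted(joint.items()):
    print(f"P(X={x}, Y={y}) =", simplify(prob))

# The probabilities over the whole table sum to 1.
print(simplify(sum(joint.values())))  # 1
```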
Definition
The joint probability density function of continuous random variables X and Y is a bivariate function fX,Y with the property that, for any subset A ⊆ R²,

    P((X, Y) ∈ A) = ∫∫_A fX,Y(x, y) dx dy.

For two continuous random variables, X and Y, probabilities have the following geometrical interpretation:
fX,Y is a surface over the plane R²
probabilities over subsets A ⊂ R² correspond to the volume under fX,Y over A.
For example, if

    fX,Y(x, y) = (3/4)(x² + x y),   for x, y ∈ (0, 1),

then the joint probability density function looks like this:
[Figure: surface plot of fX,Y(x, y) over the unit square.]
Example
Suppose (X, Y) have the joint probability density function

    fX,Y(x, y) = (3/4)(x² + x y),   for x, y ∈ (0, 1).

Find P(X < 1/2, Y < 2/3).
Example
First, what region of the (X, Y) plane do we want to integrate over?
[Figure: the rectangle 0 < x < 1/2, 0 < y < 2/3 inside the unit square.]
Example
Solution:
P(X < 1/2, Y < 2/3) = ∫_0^{1/2} ∫_0^{2/3} (3/4)(x² + x y) dy dx
                    = (3/4) ∫_0^{1/2} [x² y + x y²/2]_0^{2/3} dx
                    = (3/4) ∫_0^{1/2} [x² (2/3) + x (2/3)²/2] dx
                    = (1/2) ∫_0^{1/2} [x² + x/3] dx
                    = (1/2) [x³/3 + x²/6]_0^{1/2}
                    = (1/2) [1/(2³ × 3) + 1/(2² × 6)]
                    = 1/24.
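As a numerical sanity check (a sketch assuming Python with scipy, not part of the notes), the double integral can be evaluated directly:

```python
from scipy.integrate import dblquad

# Joint density f(x, y) = (3/4)(x^2 + x*y) on 0 < x, y < 1.
f = lambda y, x: 0.75 * (x**2 + x*y)   # dblquad integrates the inner variable (y) first

# P(X < 1/2, Y < 2/3): outer integral over x in (0, 1/2), inner over y in (0, 2/3).
prob, err = dblquad(f, 0, 0.5, 0, 2/3)
print(prob, 1/24)   # both approximately 0.041666...
```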
Example
Suppose (X, Y) have the joint probability density function

    fX,Y(x, y) = 2 (x + y),   for 0 < x < y < 1.

Find P(X < 1/3, Y < 1/2).
Example
The red shaded area in this graph corresponds to P(X < 1/3, Y < 1/2) with X < Y:
[Figure: the trapezoidal region bounded by x < y, x < 1/3 and y < 1/2 inside the unit square.]
This question is a little tricky because the limits for x are a function of y. The domain of this probability density function is the triangle 0 < x < y < 1, and the region over which we need to integrate fX,Y(x, y) is a trapezium, shown as the dark grey area in the figure.
Example
Solution:
If we use horizontal strips, the integral needs to be broken into two terms, one corresponding to the triangular component and one to the rectangular component of the trapezium:

P(X < 1/3, Y < 1/2) = ∫_0^{1/3} ∫_0^{y} 2 (x + y) dx dy + ∫_{1/3}^{1/2} ∫_0^{1/3} 2 (x + y) dx dy
                    = 11/108.

If we use vertical strips, the integral can be done in one piece:

P(X < 1/3, Y < 1/2) = ∫_0^{1/3} ∫_x^{1/2} 2 (x + y) dy dx
                    = 11/108.
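The two orders of integration can be checked exactly; a sketch assuming Python with sympy (not part of the notes):

```python
from sympy import symbols, integrate, Rational

x, y = symbols('x y')
f = 2*(x + y)   # joint density on 0 < x < y < 1

# Horizontal strips: triangle (0 < x < y, y < 1/3) plus rectangle (0 < x < 1/3, 1/3 < y < 1/2).
horiz = (integrate(f, (x, 0, y), (y, 0, Rational(1, 3)))
         + integrate(f, (x, 0, Rational(1, 3)), (y, Rational(1, 3), Rational(1, 2))))

# Vertical strips: x from 0 to 1/3, y from x to 1/2.
vert = integrate(f, (y, x, Rational(1, 2)), (x, 0, Rational(1, 3)))

print(horiz, vert)   # 11/108 11/108
```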
4.3 Other Results for Joint Probability Mass Functions
and Joint Probability Density Functions
Many of the definitions and results for random variables, considered in
Chapter 2, generalise directly to the bivariate case.
Essentially, all the changes from Chapter 2 to this chapter are that instead of
doing a single summation or integral, we now do a double summation or
double integral because there are two variables under consideration.
Results
1 If X and Y are discrete random variables, then

    ∑_{all x} ∑_{all y} P(X = x, Y = y) = 1.

2 If X and Y are continuous random variables, then

    ∫_{-∞}^{∞} ∫_{-∞}^{∞} fX,Y(x, y) dx dy = 1.
Definition
The joint cumulative distribution function of X and Y is

FX,Y(x, y) = P(X ≤ x, Y ≤ y)
           = ∑_{u ≤ x} ∑_{v ≤ y} P(X = u, Y = v)         if (X, Y) is discrete
           = ∫_{-∞}^{y} ∫_{-∞}^{x} fX,Y(u, v) du dv       if (X, Y) is continuous.
Example
Suppose X and Y have joint probability density function

    fX,Y(x, y) = x + y,   for 0 < x < 1, 0 < y < 1.

Find the joint cumulative distribution function FX,Y(x, y).
Example
First, what region of the (X, Y) plane do we want to integrate over? The red-shaded region corresponds to fX,Y(x, y) > 0.
[Figure: the unit square 0 < x < 1, 0 < y < 1.]
Example
Solution:
For 0 < x < 1 and 0 < y < 1, we have

FX,Y(x, y) = ∫_{-∞}^{y} ∫_{-∞}^{x} fX,Y(u, v) du dv
           = ∫_0^{y} ∫_0^{x} (u + v) du dv
           = ∫_0^{y} [u²/2 + v u]_0^{x} dv
           = ∫_0^{y} (x²/2 + v x) dv
           = [x² v/2 + v² x/2]_0^{y}
           = (x y / 2)(x + y).

For example, P(X < 1/2, Y < 4/5) = FX,Y(1/2, 4/5) = 13/50.
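A symbolic check of this derivation, as a sketch assuming Python with sympy (not part of the notes):

```python
from sympy import symbols, integrate, Rational, simplify

x, y, u, v = symbols('x y u v')

# Joint density f(u, v) = u + v on the unit square; integrate up to (x, y).
F = integrate(u + v, (u, 0, x), (v, 0, y))
print(simplify(F))                                      # x*y*(x + y)/2
print(F.subs({x: Rational(1, 2), y: Rational(4, 5)}))   # 13/50
```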
Results
If g is any function of X and Y, then the expectation of g(X, Y) is

E[g(X, Y)] = ∑_{all x} ∑_{all y} g(x, y) P(X = x, Y = y)           (discrete)
           = ∫_{-∞}^{∞} ∫_{-∞}^{∞} g(x, y) fX,Y(x, y) dx dy        (continuous)
Example
Suppose we are given the following for the joint probability mass function of X and Y

                  y
             0     1     2
        0    0.1   0.2   0.2
   x
        1    0.2   0.2   a

where a is a constant.
1 Find a if P(X = x, Y = y) is a joint probability mass function.
2 Find FX,Y(1, 1).
3 Find E(XY).
Example
Solution:
1 We find a from ∑_{all x} ∑_{all y} P(X = x, Y = y) = 1. Since 0.1 + 0.2 + 0.2 + 0.2 + 0.2 + a = 1, this gives a = 0.1.
2
FX,Y(1, 1) = P(X ≤ 1, Y ≤ 1)
           = P(X = 0, Y = 0) + P(X = 0, Y = 1) + P(X = 1, Y = 0) + P(X = 1, Y = 1)
           = 0.1 + 0.2 + 0.2 + 0.2 = 0.7.
3
E(XY) = ∑_{all x} ∑_{all y} x y P(X = x, Y = y)
      = 0 × 0 × 0.1 + 0 × 1 × 0.2 + 0 × 2 × 0.2 + 1 × 0 × 0.2 + 1 × 1 × 0.2 + 1 × 2 × 0.1
      = 0.4.
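These three answers are easy to confirm with a few lines of code; a sketch assuming Python with numpy (not part of the notes):

```python
import numpy as np

# Joint pmf with rows x = 0, 1 and columns y = 0, 1, 2; a makes the table sum to 1.
a = 1 - (0.1 + 0.2 + 0.2 + 0.2 + 0.2)
joint = np.array([[0.1, 0.2, 0.2],
                  [0.2, 0.2, a]])

x_vals = np.array([0, 1])
y_vals = np.array([0, 1, 2])

print(a)                                            # approximately 0.1 (floating point)
print(joint[:2, :2].sum())                          # F_{X,Y}(1, 1) = 0.7
print((np.outer(x_vals, y_vals) * joint).sum())     # E(XY) = 0.4
```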
4.4 Marginal Probability Mass Functions
and Marginal Probability Density Functions
Result
If X and Y are discrete random variables, then P(X = x) and P(Y = y) can be calculated from P(X = x, Y = y) as follows:

P(X = x) = ∑_{all y} P(X = x, Y = y),
P(Y = y) = ∑_{all x} P(X = x, Y = y).
Example
Suppose we are given the joint probability mass function of X and Y to be

                  y
             0     1     2     P(X = x)
   x    0    0.1   0.2   0.2   0.5
        1    0.2   0.2   0.1   0.5
   P(Y = y)  0.3   0.4   0.3   1

For example,
P(X = 0) = P(X = 0, Y = 0) + P(X = 0, Y = 1) + P(X = 0, Y = 2) = 0.1 + 0.2 + 0.2 = 0.5.

In general, P(X = x) = ∑_{all y} P(X = x, Y = y).
Result
If X and Y are continuous random variables, then fX(x) and fY(y) can be calculated from fX,Y(x, y) as follows:

fX(x) = ∫_{-∞}^{∞} fX,Y(x, y) dy,
fY(y) = ∫_{-∞}^{∞} fX,Y(x, y) dx.
Example
Suppose the joint probability density function of X and Y is given by

    fX,Y(x, y) = 2 (x + y),   for 0 < x < y < 1.

Find the marginal probability density functions fX(x) and fY(y).
Solution:

fX(x) = ∫_{-∞}^{∞} fX,Y(x, y) dy
      = ∫_x^{1} 2 (x + y) dy
      = [2 x y + y²]_x^{1}
      = 2x + 1 − 3x²,   0 < x < 1.

Similarly, fY(y) = ∫_{-∞}^{∞} fX,Y(x, y) dx = ∫_0^{y} 2 (x + y) dx = 3y²,   0 < y < 1.
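A quick symbolic check of both marginals, as a sketch assuming Python with sympy (not part of the notes):

```python
from sympy import symbols, integrate, expand

x, y = symbols('x y', positive=True)
f = 2*(x + y)   # joint density on 0 < x < y < 1

fX = integrate(f, (y, x, 1))   # integrate out y over (x, 1)
fY = integrate(f, (x, 0, y))   # integrate out x over (0, y)

print(expand(fX))   # -3*x**2 + 2*x + 1
print(expand(fY))   # 3*y**2

# Each marginal integrates to 1 over its own support.
print(integrate(fX, (x, 0, 1)), integrate(fY, (y, 0, 1)))   # 1 1
```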
4.5 Conditional Probability Mass Functions and
Conditional Probability Density Functions
Definition
If X and Y are discrete random variables, the conditional probability mass function of X given Y = y is

P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y).

Similarly, P(Y = y | X = x) = P(X = x, Y = y) / P(X = x).
Definition
If X and Y are continuous random variables, the conditional probability density function of X given Y = y is

fX|Y(x | y) = fX,Y(x, y) / fY(y).

Similarly, fY|X(y | x) = fX,Y(x, y) / fX(x).
Example
Suppose we are given the joint probability mass function of X and Y to be

                  y
             0     1     2     P(X = x)
   x    0    0.1   0.2   0.2   0.5
        1    0.2   0.2   0.1   0.5
   P(Y = y)  0.3   0.4   0.3   1

Find P(X = x | Y = 2) and P(Y = y | X = 0).
Example
Solution:
First, we need to find P(X = x | Y = 2), for x = 0, 1:

P(X = 0 | Y = 2) = P(X = 0, Y = 2) / P(Y = 2) = 0.20 / 0.30 = 2/3,
P(X = 1 | Y = 2) = P(X = 1, Y = 2) / P(Y = 2) = 0.10 / 0.30 = 1/3.

In tabular form,

   x                    0     1
   P(X = x | Y = 2)     2/3   1/3

Similarly,

   y                    0     1     2
   P(Y = y | X = 0)     1/5   2/5   2/5
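Conditioning just rescales one row or column of the table so that it sums to 1; a sketch assuming Python with numpy (not part of the notes):

```python
import numpy as np

# Joint pmf with rows x = 0, 1 and columns y = 0, 1, 2.
joint = np.array([[0.1, 0.2, 0.2],
                  [0.2, 0.2, 0.1]])

# P(X = x | Y = 2): take the y = 2 column and divide by P(Y = 2).
print(joint[:, 2] / joint[:, 2].sum())   # [2/3, 1/3]

# P(Y = y | X = 0): take the x = 0 row and divide by P(X = 0).
print(joint[0, :] / joint[0, :].sum())   # [0.2, 0.4, 0.4] = [1/5, 2/5, 2/5]
```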
Example
Suppose we are given the joint probability density function of X and Y to be

    fX,Y(x, y) = x + y,   for 0 < x < 1, 0 < y < 1.

Find the conditional probability density functions fX|Y(x | y) and fY|X(y | x).
Example
Solution:
First, we need to find the marginals of X and Y:

fX(x) = ∫_{-∞}^{∞} fX,Y(x, y) dy = ∫_0^{1} (x + y) dy = x + 1/2,   0 < x < 1,
fY(y) = ∫_{-∞}^{∞} fX,Y(x, y) dx = ∫_0^{1} (x + y) dx = y + 1/2,   0 < y < 1.

Hence

fX|Y(x | y) = fX,Y(x, y) / fY(y) = (x + y) / (y + 1/2),   0 < x < 1,
fY|X(y | x) = fX,Y(x, y) / fX(x) = (x + y) / (x + 1/2),   0 < y < 1.
Let X and Y be continuous random variables.
Result
If X and Y are continuous random variables, then

P(a ≤ Y ≤ b | X = x) = ∫_a^{b} fY|X(y | x) dy.
Let X and Y be discrete random variables.
Result
If X and Y are discrete random variables, then

P(Y ∈ A | X = x) = ∑_{y ∈ A} P(Y = y | X = x).
4.6 Conditional Expected Value and Conditional
Variance
Definition
The conditional expected value of X given Y = y is

E(X | Y = y) = ∑_{all x} x P(X = x | Y = y)       (discrete)
             = ∫_{-∞}^{∞} x fX|Y(x | y) dx         (continuous)

Definition
The conditional expected value of Y given X = x is

E(Y | X = x) = ∑_{all y} y P(Y = y | X = x)       (discrete)
             = ∫_{-∞}^{∞} y fY|X(y | x) dy         (continuous)
Example
Recall the example of conditional probability mass functions

   x                    0     1
   P(X = x | Y = 2)     2/3   1/3

   y                    0     1     2
   P(Y = y | X = 0)     1/5   2/5   2/5

Find E(X | Y = 2) and E(Y | X = 0).
Example
Solution:
E(X | Y = 2) = ∑_{all x} x P(X = x | Y = 2) = 0 × 2/3 + 1 × 1/3 = 1/3.

E(Y | X = 0) = ∑_{all y} y P(Y = y | X = 0) = 0 × 1/5 + 1 × 2/5 + 2 × 2/5 = 6/5.
Example
Recall the example of conditional probability density functions

fX|Y(x | y) = (x + y) / (y + 1/2),   0 < x < 1,
fY|X(y | x) = (x + y) / (x + 1/2),   0 < y < 1.

Find E(X | Y = y).
Example
Solution:

E(X | Y = y) = ∫_{-∞}^{∞} x fX|Y(x | y) dx
             = ∫_0^{1} x (x + y)/(y + 1/2) dx
             = 1/(y + 1/2) ∫_0^{1} (x² + x y) dx
             = 1/(y + 1/2) [x³/3 + x² y/2]_0^{1}
             = 1/(y + 1/2) (1/3 + y/2)
             = (2 + 3y) / (3 (2y + 1)),   0 < y < 1.

Definition
The conditional variance of X given Y = y is

Var(X | Y = y) = E(X² | Y = y) − [E(X | Y = y)]².

Note that this definition is an application of the definition of Var(X) from Chapter 2.
Example
Recall the example of conditional probability mass functions

   x                    0     1
   P(X = x | Y = 2)     2/3   1/3

   y                    0     1     2
   P(Y = y | X = 0)     1/5   2/5   2/5

Find Var(X | Y = 2).
Example
Solution:

Var(X | Y = 2) = E(X² | Y = 2) − [E(X | Y = 2)]².

Here
E(X² | Y = 2) = ∑_{all x} x² P(X = x | Y = 2) = 0² × 2/3 + 1² × 1/3 = 1/3.

Earlier, we saw that E(X | Y = 2) = 1/3. So

Var(X | Y = 2) = E(X² | Y = 2) − [E(X | Y = 2)]² = 1/3 − (1/3)² = 2/9.
Example
Recall the example of conditional probability density function

fX|Y(x | y) = (x + y) / (y + 1/2),   0 < x < 1,

and the conditional expectation for this example,

E(X | Y = y) = (2 + 3y) / (3 (2y + 1)),   0 < y < 1.

Find Var(X | Y = y).
Example
Solution:

Var(X | Y = y) = E(X² | Y = y) − [E(X | Y = y)]².

Here

E(X² | Y = y) = ∫_{-∞}^{∞} x² fX|Y(x | y) dx
              = ∫_0^{1} x² (x + y)/(y + 1/2) dx
              = 1/(y + 1/2) [x⁴/4 + x³ y/3]_0^{1}
              = 1/(y + 1/2) (1/4 + y/3)
              = (3 + 4y) / (6 (2y + 1)),   0 < y < 1.

Earlier, we saw that E(X | Y = y) = (2 + 3y) / (3 (2y + 1)), 0 < y < 1. So

Var(X | Y = y) = E(X² | Y = y) − [E(X | Y = y)]²
               = (3 + 4y) / (6 (2y + 1)) − [(2 + 3y) / (3 (2y + 1))]²,   0 < y < 1.
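Both conditional moments can be verified symbolically; a sketch assuming Python with sympy (not part of the notes):

```python
from sympy import symbols, integrate, simplify, Rational

x, y = symbols('x y', positive=True)

f_cond = (x + y) / (y + Rational(1, 2))       # f_{X|Y}(x | y) on 0 < x < 1

E_X  = integrate(x * f_cond, (x, 0, 1))       # E(X | Y = y)
E_X2 = integrate(x**2 * f_cond, (x, 0, 1))    # E(X^2 | Y = y)

print(simplify(E_X))            # (3*y + 2)/(6*y + 3), i.e. (2 + 3y)/(3(2y + 1))
print(simplify(E_X2))           # (4*y + 3)/(12*y + 6), i.e. (3 + 4y)/(6(2y + 1))
print(simplify(E_X2 - E_X**2))  # the conditional variance Var(X | Y = y)
```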
Definition
The random variables X and Y are independent if and only if

P(X = x, Y = y) = P(X = x) P(Y = y)   for all x and y     (discrete)

or

fX,Y(x, y) = fX(x) fY(y)   for all x and y     (continuous).
Result
The random variables X and Y are independent if and only if

fY|X(y | x) = fY(y)

or

fX|Y(x | y) = fX(x).

This result allows an interpretation that conforms with the everyday meaning of the word independence: knowing the value of one variable does not change the distribution of the other.
Result
If random variables X and Y are independent, then
Example
Suppose the joint probability mass function of X and Y is given by

                  y
             -1     0      1      P(X = x)
        0    0.01   0.02   0.07   0.10
   x    1    0.04   0.13   0.33   0.50
        2    0.05   0.05   0.30   0.40
   P(Y = y)  0.10   0.20   0.70   1

Are X and Y independent?
Example
Solution:
To show that X and Y are not independent, we need only find a single pair of x and y such that P(X = x, Y = y) ≠ P(X = x) P(Y = y). For example,

P(X = 1, Y = 1) = 0.33 ≠ P(X = 1) P(Y = 1) = 0.50 × 0.70 = 0.35.
Example
Suppose the joint probability mass function of X and Y is given by

P(X = x, Y = y) = p² (1 − p)^(x+y),   x = 0, 1, 2, . . . ,  y = 0, 1, 2, . . . ,  0 < p < 1.

Are X and Y independent?
Example
Solution:
First, we will find the marginals of X and Y.

P(X = x) = ∑_{all y} P(X = x, Y = y) = ∑_{y=0}^{∞} p² (1 − p)^(x+y)
         = p² (1 − p)^x ∑_{y=0}^{∞} (1 − p)^y
         = p² (1 − p)^x × 1/(1 − (1 − p))
         = p (1 − p)^x,   x = 0, 1, 2, . . . ,  0 < p < 1.

By symmetry, P(Y = y) = p (1 − p)^y, y = 0, 1, 2, . . . . Hence

P(X = x) P(Y = y) = p (1 − p)^x × p (1 − p)^y = p² (1 − p)^(x+y) = P(X = x, Y = y)

for all x and y, so X and Y are independent.
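The geometric-series step can be checked symbolically for any fixed p; a sketch assuming Python with sympy (not part of the notes), with p = 3/10 chosen purely for illustration:

```python
from sympy import Rational, Sum, symbols, oo, simplify

x, y = symbols('x y', integer=True, nonnegative=True)
p = Rational(3, 10)   # any fixed value with 0 < p < 1 works for this check

joint = p**2 * (1 - p)**(x + y)

# Marginal of X: sum the joint pmf over y = 0, 1, 2, ...
marginal_X = Sum(joint, (y, 0, oo)).doit()
print(simplify(marginal_X))   # p*(1 - p)**x with p = 3/10

# The product of the marginals reproduces the joint pmf, so X and Y are independent.
marginal_Y = p * (1 - p)**y
print((marginal_X * marginal_Y).equals(joint))   # True
```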
Example
Suppose the joint probability density function of X and Y is given by

    fX,Y(x, y) = 6 x y²,   for 0 < x < 1, 0 < y < 1.

Are X and Y independent?
Example
Solution:
First, we find the marginals of X and Y. That is,

fX(x) = ∫_{-∞}^{∞} fX,Y(x, y) dy = ∫_0^{1} 6 x y² dy = 2x,   0 < x < 1,
fY(y) = ∫_{-∞}^{∞} fX,Y(x, y) dx = ∫_0^{1} 6 x y² dx = 3y²,   0 < y < 1.

Since fX(x) fY(y) = 2x × 3y² = 6 x y² = fX,Y(x, y) for all 0 < x < 1 and 0 < y < 1, the random variables X and Y are independent.
Example
Suppose the joint probability density function of X and Y is given by

    fX,Y(x, y) = 10 x y²,   for 0 < x < y < 1.

Are X and Y independent?

Solution:
First, we find the marginals of X and Y. That is,

fX(x) = ∫_{-∞}^{∞} fX,Y(x, y) dy = ∫_x^{1} 10 x y² dy = 10x (1 − x³)/3,   0 < x < 1,

and

fY(y) = ∫_{-∞}^{∞} fX,Y(x, y) dx = ∫_0^{y} 10 x y² dx = 5y⁴,   0 < y < 1.

Clearly

fX(x) fY(y) = 10x (1 − x³)/3 × 5y⁴ ≠ fX,Y(x, y) = 10 x y²

on 0 < x < y < 1, so X and Y are not independent.
Results
If X and Y are independent random variables, then
4.8 Covariance and Correlation
Definition
The covariance of X and Y is

Cov(X, Y) = E[(X − µX)(Y − µY)],

where µX = E(X) and µY = E(Y). Cov(X, Y) measures not only how X and Y vary about their means but also how they vary together linearly.
Results
1 Cov(X, X) = Var(X).
2 Cov(X, Y) = E(XY) − µX µY, where E(X) = µX and E(Y) = µY.

Proof.
1 By definition, Cov(X, X) = E[(X − µX)(X − µX)] = E[(X − µX)²] = Var(X).
2 By definition,
Cov(X, Y) = E[(X − µX)(Y − µY)] = E[XY − X µY − µX Y + µX µY]
          = E(XY) − µY E(X) − µX E(Y) + µX µY = E(XY) − µX µY.
Result
If X and Y are independent, then Cov(X, Y ) = 0.
Proof.
We will consider the continuous case only.

E(XY) = ∫_{-∞}^{∞} ∫_{-∞}^{∞} x y fX,Y(x, y) dx dy
      = ∫_{-∞}^{∞} ∫_{-∞}^{∞} x y fX(x) fY(y) dx dy       (since X and Y are independent)
      = ∫_{-∞}^{∞} x fX(x) dx  ∫_{-∞}^{∞} y fY(y) dy
      = E(X) E(Y) = µX µY.

Hence Cov(X, Y) = E(XY) − µX µY = 0.
Definition
The correlation between X and Y is

Corr(X, Y) = Cov(X, Y) / √(Var(X) Var(Y)).
Definition
If Corr(X, Y) = 0, then X and Y are said to be uncorrelated.

Uncorrelated random variables need not be independent. For example, suppose X has a distribution symmetric about 0, so that E(X) = E(X³) = 0, and let Y = X². Then

Cov(X, Y) = E(XY) − E(X) E(Y) = E(X³) − E(X) E(Y) = 0 − 0 × E(Y) = 0,

and Corr(X, Y) = 0, but since Y = X², the variables X and Y are dependent.
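A tiny numerical illustration of this (a sketch assuming Python with numpy; the choice of X uniform on {−1, 0, 1} is mine, and any distribution symmetric about 0 would do):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.choice([-1, 0, 1], size=100_000)   # symmetric about 0, so E(X) = E(X^3) = 0
y = x**2                                   # Y is a deterministic function of X: clearly dependent

# Sample covariance and correlation are close to 0 even though Y is determined by X.
print(np.cov(x, y)[0, 1])        # approximately 0
print(np.corrcoef(x, y)[0, 1])   # approximately 0
```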
Results
1 −1 ≤ Corr(X, Y) ≤ 1.
2 If |Corr(X, Y)| = 1, then Y is a linear function of X.
Proof.
➊ Let ρ = Corr(X, Y). We know that, with σX² = Var(X) and σY² = Var(Y),

0 ≤ Var(X/σX + Y/σY)
  = Var(X/σX) + Var(Y/σY) + 2 Cov(X/σX, Y/σY)
  = Var(X)/σX² + Var(Y)/σY² + 2 ρ = 2 (1 + ρ),

therefore ρ ≥ −1.
In addition, 0 ≤ Var(X/σX − Y/σY) = 2 (1 − ρ), so ρ ≤ 1.
Proof.
➋ If ρ = −1, then Var(X/σX + Y/σY) = 2 (1 + ρ) = 0. This means that X/σX + Y/σY is a constant, i.e. P(X/σX + Y/σY = c) = 1 for some constant c. However,

X/σX + Y/σY = c  ⟺  Y = −(σY/σX) X + c σY,

so Y is a (decreasing) linear function of X. A similar argument with ρ = 1 shows that Y is an increasing linear function of X.
4.9 Bivariate Normal Distribution
The most commonly used special type of bivariate distribution is the
bivariate normal.
4.9.1 Visualisation of the Bivariate Normal Probability
Density Function
The bivariate normal probability density is a bivariate function fX,Y (x, y),
with elliptical contours.
The figures in the next slide provide contour plots of the bivariate normal
probability density for
µX = 3,  µY = 7,  σX = 2,  σY = 5,

with ρ = Corr(X, Y) taking the four values 0.3, 0.7, −0.7 and 0, respectively.

[Figure: four contour plots of the bivariate normal density for ρ = 0.3, 0.7, −0.7 and 0.]
These, respectively, correspond to
moderate positive correlation between X and Y when Corr(X, Y ) = 0.3
strong positive correlation between X and Y when Corr(X, Y ) = 0.7
strong negative correlation between X and Y when Corr(X, Y ) = −0.7
X and Y are uncorrelated when Corr(X, Y ) = 0.
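A short simulation sketch (assuming Python with numpy, not part of the notes) that draws from a bivariate normal with these parameters and recovers the correlation empirically:

```python
import numpy as np

mu = np.array([3.0, 7.0])          # mu_X, mu_Y
sigma_x, sigma_y, rho = 2.0, 5.0, 0.7

# Covariance matrix built from the standard deviations and the correlation.
cov = np.array([[sigma_x**2,              rho * sigma_x * sigma_y],
                [rho * sigma_x * sigma_y, sigma_y**2]])

rng = np.random.default_rng(0)
sample = rng.multivariate_normal(mu, cov, size=100_000)

print(sample.mean(axis=0))            # approximately [3, 7]
print(np.corrcoef(sample.T)[0, 1])    # approximately 0.7
```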
4.10 Extension to n Random Variables
All of the definitions and results in this chapter extend to the case of more
than two random variables.
For general cases of n random variables, we now give some of the most
fundamental results.
Definition
The joint probability mass function of X1 , X2 , . . . , Xn is
P (X1 = x1 , X2 = x2 , . . . , Xn = xn ).
Definition
The joint cumulative distribution function of X1, X2, . . . , Xn is

FX1,X2,...,Xn(x1, x2, . . . , xn) = P(X1 ≤ x1, X2 ≤ x2, . . . , Xn ≤ xn).

Definition
The joint probability density function of X1, X2, . . . , Xn is

fX1,X2,...,Xn(x1, x2, . . . , xn) = ∂ⁿ FX1,X2,...,Xn(x1, x2, . . . , xn) / (∂x1 ∂x2 · · · ∂xn).
Definition
We say that X1 , X2 , . . . , Xn are independent if
P (X1 = x1 , X2 = x2 , . . . , Xn = xn ) = P (X1 = x1 ) P (X2 = x2 ) · · · P (Xn = xn )
or
fX1 ,X2 ,...,Xn (x1 , x2 , . . . , xn ) = fX1 (x1 ) fX2 (x2 ) · · · fXn (xn )
or
FX1 ,X2 ,...,Xn (x1 , x2 , . . . , xn ) = FX1 (x1 ) FX2 (x2 ) · · · FXn (xn )
4.11 Supplementary Material
Supplementary Material - Definition of Expectation
Definition
The expected value of X is

E(X) = ∑_{all x} x P(X = x)          (discrete)
     = ∫_{-∞}^{∞} x fX(x) dx          (continuous)