
10510134: Probability Theory and Mathematical Statistics Fall 2023

Recitation 5

5.1 Joint, Marginal, and Conditional Distributions

5.1.1 Review

Definition 5.1 (Joint CDF). The joint CDF of r.v.s X and Y is the function FX,Y below:

FX,Y (x, y) = P (X ≤ x, Y ≤ y).

• Just like the univariate CDFs, a valid joint CDF also has to satisfy several basic properties: for
any (x, y) ∈ R2 ,

– FX,Y (−∞, y) = FX,Y (x, −∞) = 0; FX,Y (∞, ∞) = 1.

– FX,Y (x, y) is non-decreasing in both x and y.

– FX,Y (x, y) is right continuous in both x and y.

Definition 5.2 (Joint PMF). The joint PMF of discrete r.v.s X and Y is the function pX,Y given by

pX,Y (x, y) = P (X = x, Y = y).

The support of (X, Y ) is Supp(X, Y ) = {(x, y) ∈ R2 : pX,Y (x, y) > 0}.

Definition 5.3 (Marginal PMF). For discrete r.v.s X and Y , the marginal PMF of X is
pX (x) = P (X = x) = ∑_y P (X = x, Y = y) = ∑_y pX,Y (x, y).

Similarly, the marginal PMF of Y is pY (y) = ∑_x pX,Y (x, y).

Definition 5.4 (Conditional PMF). For discrete r.v.s X and Y , the conditional PMF of Y given
X = x, such that P (X = x) > 0, is defined as

pY |X (y | x) = P (Y = y | X = x) = P (X = x, Y = y) / P (X = x).

This is viewed as a function of y for a fixed x. The conditional PMF of X given Y can be defined
symmetrically.


Theorem 5.5 (Discrete form of Bayes’ rule and law of total probability). For discrete r.v.s X and Y ,
we have the following discrete form of Bayes’ rule,
P (Y = y | X = x) = P (X = x | Y = y)P (Y = y) / P (X = x).

We also have the discrete form of the law of total probability:

P (X = x) = ∑_y P (X = x | Y = y)P (Y = y).

Definition 5.6 (Joint PDF). If X and Y are continuous with joint CDF FX,Y , their joint PDF is
the derivative of the joint CDF with respect to x and y:
fX,Y (x, y) = ∂²FX,Y (x, y) / ∂x∂y.
The support of (X, Y ) is Supp(X, Y ) = {(x, y) ∈ R2 : fX,Y (x, y) > 0}.

Definition 5.7 (Marginal PDF). For continuous r.v.s X and Y with joint PDF fX,Y , the marginal
PDF of X is

fX (x) = ∫_{−∞}^{∞} fX,Y (x, y) dy.

Similarly, the marginal PDF of Y is fY (y) = ∫_{−∞}^{∞} fX,Y (x, y) dx.

Definition 5.8 (Conditional PDF). For continuous r.v.s X and Y with joint PDF fX,Y , the condi-
tional PDF of Y given X = x is
fY |X (y | x) = fX,Y (x, y) / fX (x),

for all x with fX (x) > 0. This is viewed as a function of y for fixed x. The conditional PDF of
X given Y = y can be defined symmetrically.

Theorem 5.9 (Continuous form of Bayes’ rule and law of total probability). For continuous r.v.s X
and Y , we have the following continuous form of Bayes’ rule:
fY |X (y | x) = fX|Y (x | y)fY (y) / fX (x).

And we have the following continuous form of the law of total probability:

fX (x) = ∫_{−∞}^{∞} fX|Y (x | y)fY (y) dy.

Theorem 5.10. Two random variables X and Y are independent if and only if for all x and y,

FX,Y (x, y) = FX (x)FY (y).

1. If X and Y are discrete, this is equivalent to the condition

P (X = x, Y = y) = P (X = x)P (Y = y)

for all x, y, and it is also equivalent to the condition

P (Y = y | X = x) = P (Y = y), ∀x, y, such that P (X = x) > 0,

and also equivalent to the condition

P (X = x | Y = y) = P (X = x), ∀x, y, such that P (Y = y) > 0.

2. If X and Y are continuous, this is equivalent to the condition

fX,Y (x, y) = fX (x)fY (y)

for all x, y, and it is also equivalent to the condition

fY |X (y | x) = fY (y) ∀x, y, such that fX (x) > 0,

and also equivalent to the condition

fX|Y (x | y) = fX (x) ∀x, y, such that fY (y) > 0.

5.1.2 Exercise

Exercise 1. Let X and Y have joint PDF

fX,Y (x, y) = cxy, for 0 < x < y < 1

1. Find c to make this a valid joint PDF.


2. Are X and Y independent?
3. Find the marginal PDFs of X and Y .
4. Find the conditional PDF of Y given X = x.

Answer:

1. To make the joint density function fX,Y (x, y) valid, we have to ensure that
1 = ∫_0^1 ∫_0^y fX,Y (x, y) dx dy
  = ∫_0^1 ( ∫_0^y cxy dx ) dy
  = ∫_0^1 (cy³ / 2) dy = c/8,
which implies the normalizing constant c = 8.
2. They are not independent because the support of X depends on the value of Y . This can also be
seen from the answers to questions 3 and 4 below: the joint PDF cannot be written as the product
of the two marginal PDFs, and the conditional PDF depends on x.

3. The marginal PDF of X is


fX (x) = ∫_x^1 8xy dy = 4x(1 − x²), for 0 < x < 1.

Similarly, the marginal PDF of Y has the form


fY (y) = ∫_0^y 8xy dx = 4y³, for 0 < y < 1.

4. By definition, the conditional PDF of Y given X = x is


fY |X (y | x) = fX,Y (x, y) / fX (x) = 2y / (1 − x²), for x < y < 1.
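As an optional numerical check of parts 1, 3, and 4 (not part of the original solution, and assuming numpy and scipy are available), the sketch below integrates the joint PDF over the triangular support and compares the marginal of X with 4x(1 − x²):

import numpy as np
from scipy import integrate

c = 8.0  # candidate normalizing constant from part 1

# Integrate c*x*y over 0 < x < y < 1; dblquad integrates f(y, x) with
# x running over [0, 1] and y running from x to 1.
total, _ = integrate.dblquad(lambda y, x: c * x * y, 0, 1, lambda x: x, lambda x: 1)
print(total)  # should be close to 1.0

# Compare the marginal PDF of X at a few points with 4x(1 - x^2).
for x0 in (0.2, 0.5, 0.8):
    fx, _ = integrate.quad(lambda y: c * x0 * y, x0, 1)
    print(fx, 4 * x0 * (1 - x0**2))  # the two values should agree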

Exercise 2. A random point (X, Y, Z) is chosen uniformly in the ball B = {(x, y, z) : x2 +y 2 +z 2 ≤ 1}.

1. Find the joint PDF of X, Y, Z.


2. Find the joint PDF of X, Y .
3. Find an expression for the marginal PDF of X, as an integral.

Answer:

1. Notice that the volume of the unit ball is Volume(B) = (4/3)π. Since the point (X, Y, Z) is chosen
uniformly from B, the joint PDF of X, Y, Z has the form

f (x, y, z) = 3/(4π) if x² + y² + z² ≤ 1, and f (x, y, z) = 0 otherwise.

One can readily verify that

∫∫∫_B f (x, y, z) dx dy dz = (3/(4π)) · Volume(B) = 1.
2. The joint PDF of X, Y can be derived from the joint PDF of X, Y, Z by marginalizing out the
variable Z. Since x² + y² + z² ≤ 1 implies z² ≤ 1 − (x² + y²), we have

fX,Y (x, y) = ∫_{−√(1−x²−y²)}^{√(1−x²−y²)} f (x, y, z) dz = (3/(2π)) √(1 − x² − y²),

for any x, y such that x² + y² ≤ 1.


3. We can readily obtain the marginal PDF of X from the joint PDF of X, Y , namely, fX,Y (x, y)
for x² + y² ≤ 1. Since y² ≤ 1 − x², we have, for −1 ≤ x ≤ 1,

fX (x) = ∫_{−√(1−x²)}^{√(1−x²)} fX,Y (x, y) dy = (3/(2π)) ∫_{−√(1−x²)}^{√(1−x²)} √(1 − x² − y²) dy.
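Part 3 leaves fX (x) as an integral; as an optional aside (not in the original), the integral can be evaluated numerically, and one can check it equals (3/4)(1 − x²) in closed form. A minimal sketch, assuming numpy and scipy:

import numpy as np
from scipy import integrate

def f_X(x):
    # Marginal PDF of X, computed as the integral from part 3.
    a = np.sqrt(1 - x**2)
    integrand = lambda y: (3 / (2 * np.pi)) * np.sqrt(max(0.0, 1 - x**2 - y**2))
    val, _ = integrate.quad(integrand, -a, a)  # the clamp avoids tiny negative round-off
    return val

for x0 in (0.0, 0.3, 0.7):
    print(f_X(x0), 0.75 * (1 - x0**2))  # numeric integral vs. closed form (3/4)(1 - x^2)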

Exercise 3. Let X, Y, Z be r.v.s such that X ∼ N (0, 1) and conditional on X = x, Y and Z are i.i.d.
N (x, 1).

1. Find the joint PDF of X, Y, Z.


2. By definition, Y and Z are conditionally independent given X. Discuss intuitively whether or not
Y and Z are also unconditionally independent.
3. Find the joint PDF of Y and Z.

Answer:

1. Note that the conditional PDFs of Y and Z given X = x are

fY |X (y | x) = (1/√(2π)) e^{−(y−x)²/2} and fZ|X (z | x) = (1/√(2π)) e^{−(z−x)²/2},

respectively. Given Y ⊥ Z | X = x, we then have

fY,Z|X (y, z | x) = fY |X (y | x) · fZ|X (z | x) = (1/(2π)) e^{−((y−x)² + (z−x)²)/2}.

Noticing that fX (x) = (1/√(2π)) e^{−x²/2}, the joint PDF of X, Y, Z has the form

fX,Y,Z (x, y, z) = fY,Z|X (y, z | x) · fX (x) = (1/√(2π))³ e^{−(x² + (y−x)² + (z−x)²)/2}.

2. Both Y and Z depend on X. This means that knowing the value of Y provides information
about X, which in turn is informative about Z. Thus, although Y and Z are conditionally
independent given X, they are unconditionally dependent.
3. The joint PDF of Y and Z can be obtained by marginalizing fX,Y,Z (x, y, z) over X, which gives

fY,Z (y, z) = ∫_{−∞}^{∞} fX,Y,Z (x, y, z) dx
           = (1/√(2π))³ ∫_{−∞}^{∞} e^{−(x² + (y−x)² + (z−x)²)/2} dx
           = (1/√(2π))³ e^{−(y² + z² − yz)/3} ∫_{−∞}^{∞} e^{−3(x − (y+z)/3)²/2} dx
           = (1/√(2π))³ √(2π/3) e^{−(y² + z² − yz)/3} = (1/(2√3 π)) e^{−(y² + z² − yz)/3}.

Here the second-to-last equality uses the fact that ∫_{−∞}^{∞} e^{−3(x − (y+z)/3)²/2} dx = √(2π/3). This is because

√(3/(2π)) e^{−3(x − (y+z)/3)²/2}

can be viewed as the PDF of N ((y + z)/3, 1/3), so it must integrate to 1.
This result does show that Y and Z are not unconditionally independent, since fY,Z (y, z) cannot
be factored into two functions of y, z respectively.
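As an optional numerical cross-check of part 3 (a sketch, assuming numpy and scipy), one can integrate the joint PDF from part 1 over x and compare with the closed form for fY,Z:

import numpy as np
from scipy import integrate

def f_XYZ(x, y, z):
    # Joint PDF from part 1.
    return (2 * np.pi) ** (-1.5) * np.exp(-(x**2 + (y - x) ** 2 + (z - x) ** 2) / 2)

def f_YZ(y, z):
    # Closed form from part 3.
    return np.exp(-(y**2 + z**2 - y * z) / 3) / (2 * np.sqrt(3) * np.pi)

for y0, z0 in [(0.0, 0.0), (1.0, -0.5), (2.0, 1.0)]:
    num, _ = integrate.quad(lambda x: f_XYZ(x, y0, z0), -np.inf, np.inf)
    print(num, f_YZ(y0, z0))  # the two columns should agree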

Exercise 4. Please answer the following questions about Poisson distribution and Binomial distribu-
tion:

1. If X ∼ Pois(λp), Y ∼ Pois(λq), and X and Y are independent, then what is the distribution of
N = X + Y ? What is the conditional distribution of X | N = n?
2. If N ∼ Pois(λ) and X | N = n ∼ Bin(n, p), then what is the marginal distribution of X and the
marginal distribution of Y = N − X? Are X and Y independent or dependent?

Answer:

1. Let N = X + Y where X ∼ Poisson(λp) and Y ∼ Poisson(λq) are independent. The PMF of N


has the form

P (N = k) = P (X + Y = k) = ∑_{j=0}^{k} P (X + Y = k | X = j) · P (X = j)
          = ∑_{j=0}^{k} P (Y = k − j | X = j) · P (X = j)
          = ∑_{j=0}^{k} P (Y = k − j) · P (X = j)      (independence of Y , X)
          = ∑_{j=0}^{k} [(λq)^{k−j} / (k − j)!] e^{−λq} · [(λp)^j / j!] e^{−λp}
          = (e^{−λ(p+q)} / k!) ∑_{j=0}^{k} (k! / (j!(k − j)!)) (λp)^j (λq)^{k−j}
          = e^{−λ(p+q)} (λ(p + q))^k / k!,

where the last step used the binomial theorem. So we have N ∼ Poisson(λ(p + q)). Meanwhile,
the conditional distribution of X | N = n is

P (X = j | N = n) = P (X = j, Y = n − j) / P (N = n) = P (X = j)P (Y = n − j) / P (N = n)
                  = [ ((λp)^j / j!) e^{−λp} · ((λq)^{n−j} / (n − j)!) e^{−λq} ] / [ ((λ(p + q))^n / n!) e^{−λ(p+q)} ]
                  = (n! / (j!(n − j)!)) (p / (p + q))^j (q / (p + q))^{n−j},

which is just the Binomial distribution Binom(n, p/(p + q)).



2. Given that N ∼ Poisson(λ) and X | N = n ∼ Binom(n, p), the marginal distribution of X is


P (X = k) = ∑_{n=0}^{∞} P (X = k | N = n) · P (N = n)
          = ∑_{n=k}^{∞} P (X = k | N = n) · P (N = n)
          = ∑_{n=k}^{∞} (n! / (k!(n − k)!)) p^k (1 − p)^{n−k} e^{−λ} λ^n / n!
          = ∑_{n=k}^{∞} [ e^{−λp} (λp)^k / k! ] · [ e^{−λ(1−p)} (λ(1 − p))^{n−k} / (n − k)! ]
          = e^{−λp} ((λp)^k / k!) · e^{−λ(1−p)} ∑_{m=0}^{∞} (λ(1 − p))^m / m!      (m := n − k)
          = e^{−λp} (λp)^k / k!,  for k ≥ 0,
where the last equality uses the fact that the PMF of a Poisson(λ(1 − p)) random variable sums to 1. Next,
let Y = N − X; then the marginal distribution of Y can be derived in a similar manner as

P (Y = k) = P (N − X = k) = ∑_{n=0}^{∞} P (N − X = k | N = n) · P (N = n)
          = ∑_{n=k}^{∞} P (X = n − k | N = n) · P (N = n)
          = ∑_{n=k}^{∞} (n! / ((n − k)! k!)) p^{n−k} (1 − p)^k e^{−λ} λ^n / n!
          = ∑_{n=k}^{∞} [ e^{−λ(1−p)} (λ(1 − p))^k / k! ] · [ e^{−λp} (λp)^{n−k} / (n − k)! ]
          = e^{−λ(1−p)} ((λ(1 − p))^k / k!) · e^{−λp} ∑_{m=0}^{∞} (λp)^m / m!      (m := n − k)
          = e^{−λ(1−p)} (λ(1 − p))^k / k!,
which suggests that Y = N − X follows the Poisson(λ(1 − p)) distribution. Finally, according
to Bayes' rule, we have

P (Y = n − k | X = k) = P (N = n | X = k)
                      = P (X = k | N = n) · P (N = n) / P (X = k)
                      = [ (n! / (k!(n − k)!)) p^k (1 − p)^{n−k} e^{−λ} λ^n / n! ] / [ e^{−λp} (λp)^k / k! ]
                      = e^{−λ(1−p)} (λ(1 − p))^{n−k} / (n − k)! = P (Y = n − k)

for all k ≥ 0. Therefore, this result illustrates that X and Y are indeed independent.
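These facts are easy to see in simulation. Below is a minimal sketch (the values λ = 10 and p = 0.3 are arbitrary illustrative choices, not from the exercise; numpy assumed):

import numpy as np

rng = np.random.default_rng(0)
lam, p = 10.0, 0.3        # illustrative parameters (assumed, not from the exercise)
n_sims = 200_000

# Part 2 setup: N ~ Pois(lam), X | N = n ~ Bin(n, p), Y = N - X.
N = rng.poisson(lam, size=n_sims)
X = rng.binomial(N, p)
Y = N - X

print(X.mean(), X.var(), lam * p)          # X behaves like Pois(lam * p)
print(Y.mean(), Y.var(), lam * (1 - p))    # Y behaves like Pois(lam * (1 - p))
print(np.corrcoef(X, Y)[0, 1])             # near 0, consistent with independence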

5.2 Functions of Random Variables

5.2.1 Review

Definition 5.11. Let Y1 = g1 (X1 , X2 ) and Y2 = g2 (X1 , X2 ) for given functions g1 , g2 . When X1 , X2
are discrete r.v.s, Y1 , Y2 are also discrete, and we can easily get the joint PMF of (Y1 , Y2 ) from the
PMF of (X1 , X2 ):

pY1 ,Y2 (y1 , y2 ) = P (Y1 = y1 , Y2 = y2 ) = P ((X1 , X2 ) ∈ A(y1 , y2 ))
                = ∑_{(x1 ,x2 )∈A(y1 ,y2 )} P (X1 = x1 , X2 = x2 )
                = ∑_{(x1 ,x2 )∈A(y1 ,y2 )} pX1 ,X2 (x1 , x2 ),

where A(y1 , y2 ) = {(x1 , x2 ) ∈ Supp(X1 , X2 ) : g1 (x1 , x2 ) = y1 , g2 (x1 , x2 ) = y2 }.

Definition 5.12. When X1 , X2 are continuous random variables, we can obtain the joint CDF of
(Y1 , Y2 ) from the joint PDF of (X1 , X2 ):

FY1 ,Y2 (y1 , y2 ) = P (Y1 ≤ y1 , Y2 ≤ y2 ) = P ((X1 , X2 ) ∈ B(y1 , y2 ))
                 = ∫∫_{B(y1 ,y2 )} fX1 ,X2 (x1 , x2 ) dx1 dx2 ,

where B(y1 , y2 ) = {(x1 , x2 ) ∈ Supp(X1 , X2 ) : g1 (x1 , x2 ) ≤ y1 , g2 (x1 , x2 ) ≤ y2 }. If the CDF above is also
differentiable in y1 , y2 , then we can take its derivative with respect to y1 , y2 to obtain the joint PDF
fY1 ,Y2 .

Theorem 5.13 (Change of two variables). Let X = (X1 , X2 ) be a continuous random vector with joint
PDF fX and support set A = {(x1 , x2 ) : fX (x1 , x2 ) > 0}. Let g : A → R2 be a continuously differentiable
and invertible function whose range is given by B = {(y1 , y2 ) : y1 = g1 (x1 , x2 ), y2 = g2 (x1 , x2 ), (x1 , x2 ) ∈
A}.

Let Y = g(X), and mirror this by letting y = g(x). Accordingly, we also have X = g −1 (Y) and
x = g −1 (y). Also assume that the determinant of the Jacobian matrix ∂x/∂y is never 0 on B. Then
the joint PDF of Y is
fY (y) = fX (x) · |∂x/∂y|, for y ∈ B,
and 0 otherwise, where x = g −1 (y). (The inner bars around the Jacobian say to take the determinant
and the outer bars say to take the absolute value.)

5.2.2 Exercise

Exercise 5. Let T1 be the lifetime of a refrigerator and T2 be the lifetime of a stove. Assume that
T1 ∼ Expo(λ1 ) and T2 ∼ Expo(λ2 ) are independent.

1. What is the probability P (T1 < T2 )?


2. What is the distribution of the time when the first appliance failure occurs, namely, min(T1 , T2 )?
What is the distribution of the time when both appliances fail, namely, max{T1 , T2 }?

Answer:

1. We just need to integrate the joint PDF of T1 and T2 over the appropriate region, which is all
(t1 , t2 ) with t1 > 0, t2 > 0, and t1 < t2 . This yields
P (T1 < T2 ) = ∫_0^∞ ∫_0^{t2} fT1 ,T2 (t1 , t2 ) dt1 dt2
             = ∫_0^∞ ∫_0^{t2} fT1 (t1 )fT2 (t2 ) dt1 dt2      (independence of T1 , T2 )
             = ∫_0^∞ ∫_0^{t2} λ1 e^{−λ1 t1} λ2 e^{−λ2 t2} dt1 dt2
             = ∫_0^∞ ( ∫_0^{t2} λ1 e^{−λ1 t1} dt1 ) λ2 e^{−λ2 t2} dt2
             = ∫_0^∞ (1 − e^{−λ1 t2}) λ2 e^{−λ2 t2} dt2
             = 1 − ∫_0^∞ λ2 e^{−(λ1 +λ2 )t2} dt2
             = 1 − λ2 /(λ1 + λ2 ) = λ1 /(λ1 + λ2 )

2. We can find the distribution of min(T1 , T2 ) by considering its survival function. By independence,

P (min(T1 , T2 ) > t) = P (T1 > t, T2 > t) = P (T1 > t) · P (T2 > t)


= e−λ1 t · e−λ2 t = e−(λ1 +λ2 )t

which means that


P (min(T1 , T2 ) ≤ t) = 1 − e−(λ1 +λ2 )t

This shows that the random variable min(T1 , T2 ) follows the Expo(λ1 + λ2 ) distribution.
The distribution of the time when both appliances fail, namely, max(T1 , T2 ), takes the form

P (max(T1 , T2 ) ≤ t) = P (T1 ≤ t, T2 ≤ t) = P (T1 ≤ t) · P (T2 ≤ t)


= (1 − e−λ1 t ) · (1 − e−λ2 t )
= 1 − e−λ1 t − e−λ2 t + e−(λ1 +λ2 )t

The PDF of max(T1 , T2 ) is then

f_{max(T1 ,T2 )} (t) = d/dt P (max(T1 , T2 ) ≤ t) = λ1 e^{−λ1 t} + λ2 e^{−λ2 t} − (λ1 + λ2 )e^{−(λ1 +λ2 )t}.

Note that max(T1 , T2 ) is not an exponential random variable.
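A small simulation sketch (with arbitrary illustrative rates λ1 = 0.5, λ2 = 1.5; numpy assumed) can confirm both parts:

import numpy as np

rng = np.random.default_rng(1)
lam1, lam2 = 0.5, 1.5     # illustrative rates (assumed)
n_sims = 200_000

T1 = rng.exponential(scale=1 / lam1, size=n_sims)
T2 = rng.exponential(scale=1 / lam2, size=n_sims)

print((T1 < T2).mean(), lam1 / (lam1 + lam2))        # part 1: P(T1 < T2)
print(np.minimum(T1, T2).mean(), 1 / (lam1 + lam2))  # part 2: min has mean 1/(lam1 + lam2)
print(np.maximum(T1, T2).mean(),                     # max: E[max] = E[T1] + E[T2] - E[min]
      1 / lam1 + 1 / lam2 - 1 / (lam1 + lam2))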

Exercise 6. Let X, Y be independent continuous r.v.s with PDF fX , fY , and T = X + Y be their


sum. What is the PDF of T ?

Answer: There are two equivalent ways to derive the PDF of T . One is through the law of total
probability, and the other is through the change-of-variable formula.

Approach 1: Law of total probability. Using the law of total probability and the fact that X and
Y are independent, we have
FT (t) = P (X + Y ≤ t) = ∫_{−∞}^{∞} P (X + Y ≤ t | X = x)fX (x) dx
       = ∫_{−∞}^{∞} P (Y ≤ t − x | X = x)fX (x) dx
       = ∫_{−∞}^{∞} P (Y ≤ t − x)fX (x) dx      (by independence)
       = ∫_{−∞}^{∞} FY (t − x)fX (x) dx

Consequently, the PDF of T is given by

fT (t) = FT′ (t) = ∫_{−∞}^{∞} fY (t − x)fX (x) dx,

where we used the Leibniz integral rule to differentiate under the integral sign. This equation is known
as the convolution integral formula.

Approach 2: Change-of-variable formula. We consider the invertible transformation (X, Y ) 7→


(X + Y, X) (using (X, Y ) 7→ (X + Y, Y ) would be equally valid). Once we have the joint PDF of X + Y
and X, we integrate out X to get the marginal PDF of X + Y .

Let T = X + Y, W = X, and let t = x + y, w = x. It may seem redundant to give X the new


name “W ”, but doing this makes it easier to distinguish between pre-transformation variables and
post-transformation variables: we are transforming (X, Y ) 7→ (T, W ). Then
∂(t, w)/∂(x, y) = [ 1  1 ]
                  [ 1  0 ]
has absolute determinant equal to 1. Thus, the joint PDF of T and W is

fT,W (t, w) = fX,Y (x, y) = fX (x)fY (y) = fX (w)fY (t − w)



and the marginal PDF of T is


fT (t) = ∫_{−∞}^{∞} fT,W (t, w) dw = ∫_{−∞}^{∞} fX (x)fY (t − x) dx

in agreement with our result in Approach 1 above.

The formula

fX+Y (t) = ∫_{−∞}^{∞} fX (x)fY (t − x) dx

is called the convolution formula. You can check out the appendix in Chapter 3 for more details.

Exercise 7. If X, Y are i.i.d random variables with Expo(λ) distribution, then what is the PDF of
T =X +Y?

Answer: When X and Y are i.i.d. random variables with the Expo(λ) distribution, the PDF of T = X + Y
has the following form: for any t > 0,

fT (t) = ∫_{−∞}^{∞} fY (t − x)fX (x) dx
       = ∫_0^t λe^{−λ(t−x)} λe^{−λx} dx
       = λ² e^{−λt} ∫_0^t dx = λ² t e^{−λt},

where we restricted the integral to be from 0 to t since we need t − x > 0 and x > 0. This is the PDF of
the Gamma(2, λ) distribution.
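As an optional numerical check (assuming numpy and scipy), the convolution integral from Exercise 6 can be evaluated directly and compared with λ²te^{−λt} and with scipy's Gamma(2, λ) density; the rate λ = 2 below is an arbitrary illustrative choice:

import numpy as np
from scipy import integrate, stats

lam = 2.0  # illustrative rate (assumed)

def f_expo(x):
    return lam * np.exp(-lam * x) * (x > 0)

def f_T(t):
    # Convolution formula from Exercise 6; the integrand is supported on 0 < x < t.
    val, _ = integrate.quad(lambda x: f_expo(t - x) * f_expo(x), 0, t)
    return val

for t0 in (0.2, 1.0, 3.0):
    print(f_T(t0),
          lam**2 * t0 * np.exp(-lam * t0),          # formula derived above
          stats.gamma.pdf(t0, a=2, scale=1 / lam))  # Gamma(2, lam) density in scipy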

Exercise 8. Let U ∼ Unif(0, 2π), and let T ∼ Expo(1) be independent of U . Define


X = √(2T) cos U and Y = √(2T) sin U.

Find the joint PDF of (X, Y ). Are they independent? What are their marginal distributions?

Answer: By the independence of U, T , the joint PDF of U and T is


fU,T (u, t) = fU (u)fT (t) = (1/(2π)) e^{−t},

for u ∈ (0, 2π) and t > 0. Viewing (X, Y ) as a point in the plane,

X² + Y² = 2T (cos² U + sin² U ) = 2T

is the squared distance from the origin and U is the angle; that is, (√(2T), U ) expresses (X, Y ) in polar
coordinates.

Since we can recover (U, T ) from (X, Y ), the transformation is invertible. The Jacobian matrix

∂(x, y)/∂(u, t) = [ −√(2t) sin u   (1/√(2t)) cos u ]
                  [  √(2t) cos u   (1/√(2t)) sin u ]

exists, has continuous entries, and has absolute determinant

| − sin² u − cos² u | = 1.

This implies

|∂(u, t)/∂(x, y)| = 1/|∂(x, y)/∂(u, t)| = 1.

Then letting x = √(2t) cos u, y = √(2t) sin u to mirror the transformation from (U, T ) to (X, Y ), we have

fX,Y (x, y) = fU,T (u, t) · |∂(u, t)/∂(x, y)|
           = (1/(2π)) e^{−t} · 1
           = (1/(2π)) e^{−(x² + y²)/2}
           = (1/√(2π)) e^{−x²/2} · (1/√(2π)) e^{−y²/2},
for all real x and y. We recognize the joint PDF as the product of two standard Normal PDFs, so X
and Y are i.i.d. N (0, 1) r.v.s!
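This construction is essentially the Box–Muller method for generating Normal r.v.s. A minimal simulation sketch of the transformation and the claimed marginals (numpy assumed):

import numpy as np

rng = np.random.default_rng(2)
n_sims = 200_000

U = rng.uniform(0, 2 * np.pi, size=n_sims)   # U ~ Unif(0, 2*pi)
T = rng.exponential(scale=1.0, size=n_sims)  # T ~ Expo(1)

X = np.sqrt(2 * T) * np.cos(U)
Y = np.sqrt(2 * T) * np.sin(U)

# Each of X, Y should look N(0, 1), and they should be (nearly) uncorrelated.
print(X.mean(), X.var(), Y.mean(), Y.var())
print(np.corrcoef(X, Y)[0, 1])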

Exercise 9. Let X and Y be independent positive r.v.s, with PDFs fX and fY respectively, and
consider the product T = XY . Find the PDF of T in terms of fX and fY .

Answer: We use 3 equivalent approaches to derive the PDF of T .


Approach 1: Law of total probability. Using the law of total probability and the fact that X and
Y are independent positive r.v.s, we have
FT (t) = P (XY ≤ t) = ∫_0^∞ P (XY ≤ t | X = x) fX (x) dx
       = ∫_0^∞ P (Y ≤ t/x | X = x) fX (x) dx
       = ∫_0^∞ P (Y ≤ t/x) fX (x) dx
       = ∫_0^∞ FY (t/x) fX (x) dx

Consequently, the marginal PDF of T is given by

fT (t) = ∂/∂t FT (t) = ∫_0^∞ ∂/∂t FY (t/x) fX (x) dx = ∫_0^∞ fX (x)fY (t/x) (dx/x),

where we used the Leibniz integral rule in the second equality to exchange the order of differentiation
and integration.
Similarly, we can also derive another equivalent formula by conditioning on Y = y:

fT (t) = ∫_0^∞ fX (t/y) fY (y) (dy/y)

Approach 2: Change-of-variable formula. We consider the invertible transformation (X, Y ) 7→


(XY, Y ) (using (X, Y ) 7→ (XY, X) would be equally valid). Once we have the joint PDF of XY and
Y , we integrate out Y to get the marginal PDF of XY .
Let T = XY , W = Y , and let t = xy, w = y. It may seem redundant to give Y the new name W , but
doing this makes it easier to distinguish between pre-transformation variables and post-transformation
variables: we are transforming (X, Y ) 7→ (T, W ). Then
∂(t, w)/∂(x, y) = [ y  x ]
                  [ 0  1 ]

has absolute determinant equal to y (y > 0). This means that

|∂(x, y)/∂(t, w)| = 1/|∂(t, w)/∂(x, y)| = 1/y.

Thus, the joint PDF of T and W is

fT,W (t, w) = fX,Y (x, y) |∂(x, y)/∂(t, w)| = fX (x)fY (y) (1/y) = fX (t/w) fY (w) (1/w)

and the marginal PDF of T is

fT (t) = ∫_0^∞ fT,W (t, w) dw = ∫_0^∞ fX (t/y) fY (y) (1/y) dy

Similarly, we can also derive another equivalent formula by using (X, Y ) 7→ (XY, X):

fT (t) = ∫_0^∞ fX (x)fY (t/x) (dx/x)
in agreement with our result in Approach 1 above.

Approach 3: Convolution formula. In addition, by taking the log of both sides of T = XY , we
can also use the convolution formula we proved in Exercise 6 above to perform the calculation.
Let Z = log(T ), W = log(X), V = log(Y ), so Z = W + V . According to the convolution formula in
Exercise 6, the PDF of Z is

fZ (z) = ∫_{−∞}^{∞} fW (w)fV (z − w) dw,

where by change of variables fW (w) = fX (e^w )e^w and fV (v) = fY (e^v )e^v . So

fZ (z) = ∫_{−∞}^{∞} fX (e^w )e^w fY (e^{z−w} )e^{z−w} dw = e^z ∫_{−∞}^{∞} fX (e^w )fY (e^{z−w} ) dw.

Transforming back to T , we have

fT (t) = (1/t) fZ (log t) = ∫_{−∞}^{∞} fX (e^w )fY (e^{log(t)−w} ) dw = ∫_0^∞ fX (x)fY (t/x) (dx/x),

letting x = e^w . Similarly, we can also derive another equivalent formula by integrating over v:

fT (t) = ∫_0^∞ fX (t/y) fY (y) (dy/y)

It is worth noting that the PDF of T = XY is not “a convolution with a product instead of a sum”;
namely, it is not simply ∫_0^∞ fX (x)fY (t/x) dx. Instead, there is an extra x (or y) in the denominator. This
stems from the fact that the transformation (X, Y ) to (XY, X) (or (XY, Y )) is nonlinear, in contrast
to the linear transformation (X, Y ) to (X + Y, X) (or (X + Y, Y )). This nonlinear transformation
contributes to a nontrivial Jacobian.
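The extra 1/x can also be checked numerically. The sketch below is an illustrative example (X, Y i.i.d. Expo(1) is an assumption, not from the exercise; numpy and scipy assumed): it evaluates the CDF formula from Approach 1 and the product-density formula, and compares them with simulation and with a numerical derivative of the CDF:

import numpy as np
from scipy import integrate

rng = np.random.default_rng(3)
n_sims = 200_000

# Illustrative example (assumed): X, Y i.i.d. Expo(1), T = XY.
X = rng.exponential(1.0, size=n_sims)
Y = rng.exponential(1.0, size=n_sims)
T = X * Y

def F_T(t):
    # CDF from Approach 1: integral of F_Y(t/x) f_X(x) over x > 0.
    val, _ = integrate.quad(lambda x: (1 - np.exp(-t / x)) * np.exp(-x), 0, np.inf)
    return val

def f_T(t):
    # Product-density formula, including the extra 1/x from the Jacobian.
    val, _ = integrate.quad(lambda x: np.exp(-x) * np.exp(-t / x) / x, 0, np.inf)
    return val

h = 1e-4
for t0 in (0.5, 1.0, 2.0):
    print(F_T(t0), (T <= t0).mean())                       # CDF formula vs. empirical CDF
    print(f_T(t0), (F_T(t0 + h) - F_T(t0 - h)) / (2 * h))  # density vs. numerical dF/dt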

5.3 Independence and Expectation

5.3.1 Review

Theorem 5.14 (Independence and Expectation). Suppose (X, Y ) are independent r.v.s. Then for any
functions h(X) and q(Y ) whose expectations exist, we have

E[h(X)q(Y )] = E[h(X)]E[q(Y )].

Theorem 5.15 (Independence and MGF). Suppose X and Y are independent and their MGFs MX (t)
and MY (t) exist for t in a neighborhood of 0. Then MX+Y (t) exists for t in a small neighborhood of 0,
and
MX+Y (t) = MX (t)MY (t)

for all t in a neighborhood of 0.

5.3.2 Exercise

Exercise 10. Suppose X1 , · · · , Xn are n independent random variables, with Xi having a Poisson (λi )
distribution, i = 1, · · · , n. Show that ∑_{i=1}^n Xi follows a Poisson (λ) distribution, where λ = ∑_{i=1}^n λi .

Answer: For a random variable Xi that follows a Poisson (λi ) distribution, the MGF is

Mi (t) = e^{λi (e^t − 1)},   −∞ < t < ∞.

Put X = ∑_{i=1}^n Xi . Then its MGF is

MX (t) = E[e^{t ∑_{i=1}^n Xi}] = ∏_{i=1}^n Mi (t) = ∏_{i=1}^n e^{λi (e^t − 1)} = e^{λ(e^t − 1)},

where λ = ∑_{i=1}^n λi . Note that this is exactly the MGF of the Poisson(λ) distribution (see the derivation
in Recitation 5 Exercise 2). Therefore, X ∼ Poisson(λ). In other words, the sum of n independent
Poisson (λi ) random variables is a Poisson(λ) random variable with λ = ∑_{i=1}^n λi .

Here the MGF provides a very convenient way to deal with the sum of independent Poisson random variables.
In principle, we can also directly derive the PMF of ∑_{i=1}^n Xi by applying the convolution formula in the
Appendix of Chapter 3. But that approach would be much more difficult.
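A quick simulation sketch of the conclusion (the rates below are arbitrary illustrative choices; numpy assumed):

import numpy as np

rng = np.random.default_rng(4)
lams = np.array([1.0, 2.5, 4.0])   # illustrative rates lambda_1, ..., lambda_n (assumed)
n_sims = 200_000

# Draw each X_i ~ Pois(lambda_i) independently and sum over i.
X = rng.poisson(lams, size=(n_sims, lams.size)).sum(axis=1)

lam = lams.sum()
print(X.mean(), lam)   # a Poisson(lam) r.v. has mean lam ...
print(X.var(), lam)    # ... and variance lam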

5.4 Covariance and Correlation

5.4.1 Review

Definition 5.16 (Covariance). The covariance between r.v.s X and Y is

Cov(X, Y ) = E((X − EX)(Y − EY )).

Multiplying this out and using linearity, we have an equivalent expression:

Cov(X, Y ) = E(XY ) − E(X)E(Y ).

Theorem 5.17. If X and Y are independent, then they are uncorrelated.

Proposition 5.18 (Bilinearity of Covariance). Covariance is bilinear:

• Cov(aX + bY, Z) = a Cov(X, Z) + b Cov(Y, Z) for any constants a, b.

• Cov(X, cZ + dW ) = c Cov(X, Z) + d Cov(X, W ) for any constants c, d.

It follows that for r.v.s X, Y, Z, W and constants a, b, c, d, we have

Cov(aX + bY, cZ + dW ) = ac Cov(X, Z) + ad Cov(X, W ) + bc Cov(Y, Z) + bd Cov(Y, W ).

Proposition 5.19 (Variance and Covariance). For two random variables X, Y , we have

Var(X + Y ) = Var(X) + Var(Y ) + 2 Cov(X, Y ).

For n r.v.s X1 , . . . , Xn ,
Var (X1 + · · · + Xn ) = Var (X1 ) + · · · + Var (Xn ) + 2 ∑_{i<j} Cov (Xi , Xj ) .

In particular, if X1 , . . . , Xn are independent, then the variance of their sum is the sum of their
variances:

Var( ∑_{j=1}^n Xj ) = ∑_{j=1}^n Var (Xj ) .

Definition 5.20 (Correlation). The correlation between r.v.s X and Y is

Corr(X, Y ) = Cov(X, Y ) / √(Var(X) Var(Y )) .

(This is undefined in the degenerate cases Var(X) = 0 or Var(Y ) = 0.)

5.4.2 Exercise

Exercise 11. Consider the following method for creating a bivariate Poisson (a joint distribution for
two r.v.s such that both marginals are Poissons). Let X = V + W, Y = V + Z where V, W, Z are i.i.d.
Pois(λ).

1. Find Cov(X, Y )
2. Are X and Y independent? Are they conditionally independent given V ?
3. Find the joint PMF of X, Y (as a sum).

Answer: Before answering the questions, we note that V, W, Z are all Pois(λ) r.v.s, so their means are
all λ and variances are all λ as well.

1. By bilinearity of covariance,

Cov(X, Y ) = Cov(V, V ) + Cov(V, Z) + Cov(W, V ) + Cov(W, Z) = Var(V ) = λ

2. Since X and Y are correlated (with covariance λ > 0 ), they are not independent. An alternative
way to show this is to note that E(Y ) = 2λ but E(Y | X = 0) = λ since if X = 0 occurs then
V = 0 occurs and thus Y = Z.
But X and Y are conditionally independent given V , since the conditional joint PMF is

P (X = x, Y = y | V = v) = P (W = x − v, Z = y − v | V = v)
= P (W = x − v, Z = y − v)
= P (W = x − v)P (Z = y − v)
= P (X = x | V = v)P (Y = y | V = v).

This makes sense intuitively since if we observe that V = v, then X and Y are the independent
r.v.s W and Z, shifted by the constant v.

3. By part 2, a good strategy is to condition on V :

P (X = x, Y = y) = ∑_{v=0}^{∞} P (X = x, Y = y | V = v)P (V = v)
                 = ∑_{v=0}^{min(x,y)} P (X = x | V = v)P (Y = y | V = v)P (V = v)
                 = ∑_{v=0}^{min(x,y)} P (W = x − v)P (Z = y − v)P (V = v)
                 = ∑_{v=0}^{min(x,y)} (λ^{x−v} / (x − v)!) e^{−λ} · (λ^{y−v} / (y − v)!) e^{−λ} · (λ^v / v!) e^{−λ}
                 = e^{−3λ} λ^{x+y} ∑_{v=0}^{min(x,y)} λ^{−v} / ((x − v)!(y − v)!v!),

for x and y nonnegative integers. Note that we sum only up to min(x, y) since we know for sure
that V ≤ X and V ≤ Y .
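A short simulation sketch of parts 1 and 2 (λ = 2 is an arbitrary illustrative choice; numpy assumed):

import numpy as np

rng = np.random.default_rng(5)
lam = 2.0                  # illustrative rate (assumed)
n_sims = 200_000

V = rng.poisson(lam, size=n_sims)
W = rng.poisson(lam, size=n_sims)
Z = rng.poisson(lam, size=n_sims)
X, Y = V + W, V + Z

print(np.cov(X, Y)[0, 1], lam)   # sample covariance vs. Cov(X, Y) = lambda (part 1)
print(Y[X == 0].mean(), lam)     # E(Y | X = 0) = lambda, the check noted in part 2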

Exercise 12. Let X be the number of distinct birthdays in a group of 110 people (i.e., the number
of days in a year such that at least one person in the group has that birthday). Under the usual
assumptions (no Feb 29, all the other 365 days of the year are equally likely, and the day when one
person is born is independent of the days when the other people are born), find the mean and variance
of X.

Answer: Let Ij be the indicator r.v. for the event that at least one of the people was born on the
jth day of the year, so X = ∑_{j=1}^{365} Ij with Ij ∼ Bern(p), where p = 1 − (364/365)^{110} . The Ij ’s are
dependent but by linearity, we still have

E(X) = 365p ≈ 95.083.

By symmetry, the variance is

Var(X) = 365 Var (I1 ) + 2 C(365, 2) Cov (I1 , I2 ) = 365 Var (I1 ) + 365 · 364 · Cov (I1 , I2 ) .

Here Var (I1 ) = p(1 − p) since I1 ∼ Bern(p). To get the covariance, note that Cov (I1 , I2 ) = E (I1 I2 ) −
E (I1 ) E (I2 ) = E (I1 I2 ) − p², and E (I1 I2 ) = P (I1 I2 = 1) = P (A1 ∩ A2 ), where Aj is the event that at
least one person was born on the jth day of the year. The probability of the complement is

P (A1^c ∪ A2^c ) = P (A1^c ) + P (A2^c ) − P (A1^c ∩ A2^c ) = 2 (364/365)^{110} − (363/365)^{110} ,

so Var(X) = 365p(1 − p) + 365 · 364 · [ ( 1 − ( 2 (364/365)^{110} − (363/365)^{110} ) ) − p² ] ≈ 10.019.
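These values can be reproduced by a direct Monte Carlo sketch (numpy assumed; the number of simulations is an arbitrary choice):

import numpy as np

rng = np.random.default_rng(6)
n_people, n_days, n_sims = 110, 365, 20_000

# For each simulated group, count the number of distinct birthdays.
birthdays = rng.integers(0, n_days, size=(n_sims, n_people))
X = np.array([np.unique(row).size for row in birthdays])

print(X.mean(), X.var(ddof=1))   # should be near 95.083 and 10.019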
