Recitation 5 Solution
5.1.1 Review
Definition 5.1 (Joint CDF). The joint CDF of r.v.s X and Y is the function F_{X,Y} given by F_{X,Y}(x, y) = P(X ≤ x, Y ≤ y).
• Just like univariate CDFs, a valid joint CDF has to satisfy several basic properties: for any (x, y) ∈ R², 0 ≤ F_{X,Y}(x, y) ≤ 1; F_{X,Y} is nondecreasing in each argument; F_{X,Y}(x, y) → 0 as x → −∞ or y → −∞; and F_{X,Y}(x, y) → 1 as x → ∞ and y → ∞.
Definition 5.2 (Joint PMF). The joint PMF of discrete r.v.s X and Y is the function p_{X,Y} given by p_{X,Y}(x, y) = P(X = x, Y = y).
Definition 5.3 (Marginal PMF). For discrete r.v.s X and Y , the marginal PMF of X is
p_X(x) = P(X = x) = Σ_y P(X = x, Y = y) = Σ_y p_{X,Y}(x, y).
Similarly, the marginal PMF of Y is p_Y(y) = Σ_x p_{X,Y}(x, y).
Definition 5.4 (Conditional PMF). For discrete r.v.s X and Y , the conditional PMF of Y given
X = x, such that P (X = x) > 0, is defined as
p_{Y|X}(y | x) = P(Y = y | X = x) = P(X = x, Y = y) / P(X = x).
This is viewed as a function of y for a fixed x. The conditional PMF of X given Y can be defined
symmetrically.
Theorem 5.5 (Discrete form of Bayes’ rule and law of total probability). For discrete r.v.s X and Y ,
we have the following discrete form of Bayes’ rule,
P(Y = y | X = x) = P(X = x | Y = y) P(Y = y) / P(X = x).
We also have the discrete form of law of total probability:
P(X = x) = Σ_y P(X = x | Y = y) P(Y = y).
Definition 5.6 (Joint PDF). If X and Y are continuous with joint CDF FX,Y , their joint PDF is
the derivative of the joint CDF with respect to x and y:
f_{X,Y}(x, y) = ∂²F_{X,Y}(x, y) / (∂x ∂y).
The support of (X, Y ) is Supp(X, Y ) = {(x, y) ∈ R2 : fX,Y (x, y) > 0}.
Definition 5.7 (Marginal PDF). For continuous r.v.s X and Y with joint PDF fX,Y , the marginal
PDF of X is

f_X(x) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dy.

Similarly, the marginal PDF of Y is f_Y(y) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dx.
Definition 5.8 (Conditional PDF). For continuous r.v.s X and Y with joint PDF fX,Y , the condi-
tional PDF of Y given X = x is
f_{Y|X}(y | x) = f_{X,Y}(x, y) / f_X(x),

for all x with f_X(x) > 0. This is viewed as a function of y for a fixed x. The conditional PDF of X given Y = y can be defined symmetrically.
Theorem 5.9 (Continuous form of Bayes’ rule and law of total probability). For continuous r.v.s X
and Y , we have the following continuous form of Bayes’ rule:
f_{Y|X}(y | x) = f_{X|Y}(x | y) f_Y(y) / f_X(x).
And we have the following continuous form of the law of total probability:
f_X(x) = ∫_{−∞}^{∞} f_{X|Y}(x | y) f_Y(y) dy.
Theorem 5.10. Two discrete random variables X and Y are independent if and only if for all x and y,

P(X = x, Y = y) = P(X = x) P(Y = y).
5.1.2 Exercise
Exercise 1. Let X and Y have joint PDF f_{X,Y}(x, y) = cxy for 0 ≤ x ≤ y ≤ 1 (and 0 otherwise), where c is a constant.
1. What value of c makes f_{X,Y} a valid joint PDF?
2. Are X and Y independent?
3. Find the marginal PDFs of X and Y.
4. Find the conditional PDF of Y given X = x.
Answer:
1. To make the joint density function fX,Y (x, y) valid, we have to ensure that
1 = ∫_0^1 ∫_0^y f_{X,Y}(x, y) dx dy
  = ∫_0^1 (∫_0^y cxy dx) dy
  = ∫_0^1 (cy³/2) dy = c/8,
which implies the normalizing constant c = 8.
2. They are not independent because the support of X depends on the value of Y. This can also be seen from the answers to questions 3 and 4 below: the joint PDF cannot be written as the product of the two marginal PDFs, and the conditional PDF depends on x.

3. For 0 ≤ x ≤ 1, f_X(x) = ∫_x^1 8xy dy = 4x(1 − x²); for 0 ≤ y ≤ 1, f_Y(y) = ∫_0^y 8xy dx = 4y³.

4. For 0 < x < 1, the conditional PDF of Y given X = x is f_{Y|X}(y | x) = f_{X,Y}(x, y)/f_X(x) = 2y/(1 − x²) for x ≤ y ≤ 1.
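A quick numeric sanity check (an added sketch, not part of the original solution): with c = 8, the density f(x, y) = 8xy on 0 < x < y < 1 should integrate to 1, and X and Y should be correlated. We use Monte Carlo over the unit square, treating f as a weight for the moments:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
x = rng.uniform(0, 1, n)
y = rng.uniform(0, 1, n)
f = np.where(x < y, 8 * x * y, 0.0)   # candidate density, zero off the support
print(f.mean())                       # estimate of the total integral; ≈ 1
ex, ey, exy = (x * f).mean(), (y * f).mean(), (x * y * f).mean()
print(exy - ex * ey)                  # Cov(X, Y) = 4/225 ≈ 0.018 > 0: dependent
```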
Exercise 2. A random point (X, Y, Z) is chosen uniformly in the ball B = {(x, y, z) : x² + y² + z² ≤ 1}.
1. Find the joint PDF of (X, Y, Z).
2. Find the joint PDF of (X, Y).
Answer:
1. Since the point (X, Y, Z) is uniformly chosen from B and the volume of the unit ball is Volume(B) = (4/3)π, the joint PDF of (X, Y, Z) is

f(x, y, z) = 3/(4π) if x² + y² + z² ≤ 1, and 0 otherwise.

One can readily verify that ∭_B f(x, y, z) dx dy dz = (3/(4π)) · Volume(B) = 1.
2. The joint PDF of (X, Y) can be derived from the joint PDF of (X, Y, Z) by marginalizing out the variable Z. Since x² + y² + z² ≤ 1 implies z² ≤ 1 − (x² + y²), we have

f_{X,Y}(x, y) = ∫_{−√(1−x²−y²)}^{√(1−x²−y²)} f(x, y, z) dz = (3/(2π)) √(1 − x² − y²),

for x² + y² ≤ 1, and 0 otherwise.
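A small numeric check (an added sketch, not part of the original solution): the marginal density above integrates to 1 over the unit disk. We estimate the integral by Monte Carlo over the square [−1, 1]², whose area is 4:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2_000_000
x = rng.uniform(-1, 1, n)
y = rng.uniform(-1, 1, n)
f = np.zeros(n)
inside = x**2 + y**2 <= 1
f[inside] = 3 / (2 * np.pi) * np.sqrt(1 - x[inside]**2 - y[inside]**2)
print(4 * f.mean())  # area of the square times the mean of f; ≈ 1
```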
Exercise 3. Let X, Y, Z be r.v.s such that X ∼ N(0, 1) and, conditional on X = x, Y and Z are i.i.d. N(x, 1).
1. Find the joint PDF of (X, Y, Z).
2. Are Y and Z independent?
3. Find the joint PDF of (Y, Z).
Answer:
1. By the definition of conditional PDFs and the conditional independence of Y and Z given X,

f_{X,Y,Z}(x, y, z) = f_X(x) f_{Y|X}(y | x) f_{Z|X}(z | x) = (1/√(2π))³ e^{−(x² + (y−x)² + (z−x)²)/2}.
2. Both Y and Z are dependent on X. This means that knowing the value of Y provides information about X, which in turn is informative about Z. Thus, although Y and Z are conditionally independent given X, they are unconditionally dependent.
3. The joint PDF of Y and Z can be obtained by marginalizing f_{X,Y,Z}(x, y, z) over X:

f_{Y,Z}(y, z) = ∫_{−∞}^{∞} f_{X,Y,Z}(x, y, z) dx
  = (1/√(2π))³ ∫_{−∞}^{∞} e^{−(x² + (y−x)² + (z−x)²)/2} dx
  = (1/√(2π))³ e^{−(y² + z² − yz)/3} ∫_{−∞}^{∞} e^{−3(x − (y+z)/3)²/2} dx
  = (1/√(2π))³ e^{−(y² + z² − yz)/3} √(2π/3)
  = (1/(2√3 π)) e^{−(y² + z² − yz)/3}.

Here the second-to-last equality uses the fact that ∫_{−∞}^{∞} e^{−3(x − (y+z)/3)²/2} dx = √(2π/3). This is because (1/√(2π/3)) e^{−3(x − (y+z)/3)²/2} is the PDF of a N((y + z)/3, 1/3) random variable, and hence integrates to 1.
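The joint PDF above is that of a bivariate Normal with Var(Y) = Var(Z) = 2 and Cov(Y, Z) = 1, so Corr(Y, Z) = 1/2. A short simulation sketch (an addition, not part of the original solution) confirms the unconditional dependence:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
x = rng.standard_normal(n)
y = x + rng.standard_normal(n)  # Y | X = x ~ N(x, 1)
z = x + rng.standard_normal(n)  # Z | X = x ~ N(x, 1), independent of Y given X
print(np.corrcoef(y, z)[0, 1])  # ≈ 0.5, so Y and Z are dependent
```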
Exercise 4. Please answer the following questions about the Poisson and Binomial distributions:
1. If X ∼ Pois(λp), Y ∼ Pois(λq), and X and Y are independent, then what is the distribution of
N = X + Y ? What is the conditional distribution of X | N = n?
2. If N ∼ Pois(λ) and X | N = n ∼ Bin(n, p), then what is the marginal distribution of X and the marginal distribution of Y = N − X? Are X and Y independent or dependent?
Answer:
1. For N = X + Y and any nonnegative integer k,

P(N = k) = P(X + Y = k) = Σ_{j=0}^{k} P(X + Y = k | X = j) P(X = j)
  = Σ_{j=0}^{k} P(Y = k − j | X = j) P(X = j)
  = Σ_{j=0}^{k} P(Y = k − j) P(X = j)    (independence of Y and X)
  = Σ_{j=0}^{k} [(λq)^{k−j}/(k − j)!] e^{−λq} · [(λp)^j/j!] e^{−λp}
  = (e^{−λ(p+q)}/k!) Σ_{j=0}^{k} C(k, j) (λp)^j (λq)^{k−j}
  = e^{−λ(p+q)} (λ(p + q))^k / k!,
where the last step used the binomial theorem. So we have N ∼ Poisson(λ(p + q)). Meanwhile,
the conditional distribution of X | N = n is
P(X = j | N = n) = P(X = j, Y = n − j) / P(N = n) = P(X = j) P(Y = n − j) / P(N = n)
  = [ (λp)^j/j! · e^{−λp} · (λq)^{n−j}/(n − j)! · e^{−λq} ] / [ (λ(p + q))^n/n! · e^{−λ(p+q)} ]
  = C(n, j) (p/(p + q))^j (q/(p + q))^{n−j},

so X | N = n ∼ Bin(n, p/(p + q)).
2. By the law of total probability, the marginal PMF of X is

P(X = k) = Σ_{n≥k} P(X = k | N = n) P(N = n) = Σ_{n≥k} C(n, k) p^k (1 − p)^{n−k} e^{−λ} λ^n/n!
  = e^{−λ} ((λp)^k/k!) Σ_{m≥0} (λ(1 − p))^m/m! = e^{−λp} (λp)^k/k!,

so X ∼ Pois(λp), and by the same argument Y = N − X ∼ Pois(λ(1 − p)). To show independence, note that given X = k, the events Y = n − k and N = n coincide, so

P(Y = n − k | X = k) = P(N = n | X = k) = P(X = k | N = n) P(N = n) / P(X = k)
  = [ C(n, k) p^k (1 − p)^{n−k} · e^{−λ} λ^n/n! ] / [ e^{−λp} (λp)^k/k! ]
  = e^{−λ(1−p)} (λ(1 − p))^{n−k}/(n − k)! = P(Y = n − k),
for all n ≥ k ≥ 0. Therefore, this result shows that X and Y are indeed independent.
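A brief simulation of this Poisson thinning result (an added sketch, not part of the original solution): thinning a Pois(λ) count with probability p produces independent Pois(λp) and Pois(λ(1 − p)) counts.

```python
import numpy as np

rng = np.random.default_rng(0)
lam, p, n_sim = 10.0, 0.3, 1_000_000
N = rng.poisson(lam, n_sim)
X = rng.binomial(N, p)            # keep each of the N items with probability p
Y = N - X
print(X.mean(), X.var())          # both ≈ lam * p = 3 (Poisson: mean = variance)
print(np.corrcoef(X, Y)[0, 1])    # ≈ 0, consistent with independence
```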
5.2.1 Review
Definition 5.11. Let (Y1, Y2) = g(X1, X2). When X1, X2 are discrete r.v.s, Y1, Y2 are also discrete, and we can easily get the joint PMF of (Y1, Y2) from the PMF of (X1, X2):

p_{Y1,Y2}(y1, y2) = Σ_{(x1,x2): g(x1,x2) = (y1,y2)} p_{X1,X2}(x1, x2).
Definition 5.12. When X1, X2 are continuous random variables, we can obtain the joint CDF of (Y1, Y2) from the joint PDF of (X1, X2):

F_{Y1,Y2}(y1, y2) = ∬_{B(y1,y2)} f_{X1,X2}(x1, x2) dx1 dx2,
where B(y1 , y2 ) = {(x1 , x2 ) ∈ Supp(X1 , X2 ) : g1 (x1 , x2 ) ≤ y1 , g2 (x1 , x2 ) ≤ y2 }. If the CDF above is also
differentiable in y1 , y2 , then we can take its derivative with respect to y1 , y2 to obtain the joint PDF
fY1 ,Y2 .
Theorem 5.13 (Change of two variables). Let X = (X1 , X2 ) be a continuous random vector with joint
PDF fX and support set A = {(x1 , x2 ) : fX (x1 , x2 ) > 0}. Let g : A → R2 be a continuously differentiable
and invertible function whose range is given by B = {(y1 , y2 ) : y1 = g1 (x1 , x2 ), y2 = g2 (x1 , x2 ), (x1 , x2 ) ∈
A}.
Let Y = g(X), and mirror this by letting y = g(x). Accordingly, we also have X = g −1 (Y) and
x = g −1 (y). Also assume that the determinant of the Jacobian matrix ∂x/∂y is never 0 on B. Then
the joint PDF of Y is
f_Y(y) = f_X(x) · |∂x/∂y| for y ∈ B,
and 0 otherwise, where x = g −1 (y). (The inner bars around the Jacobian say to take the determinant
and the outer bars say to take the absolute value.)
5.2.2 Exercise
Exercise 5. Let T1 be the lifetime of a refrigerator and T2 be the lifetime of a stove. Assume that T1 ∼ Expo(λ1) and T2 ∼ Expo(λ2) are independent.
1. Find P(T1 < T2).
2. Find the distributions of min(T1, T2) and max(T1, T2).
Answer:
1. We just need to integrate the joint PDF of T1 and T2 over the appropriate region, which is all
(t1 , t2 ) with t1 > 0, t2 > 0, and t1 < t2 . This yields
P(T1 < T2) = ∫_0^∞ ∫_0^{t2} f_{T1,T2}(t1, t2) dt1 dt2
  = ∫_0^∞ ∫_0^{t2} f_{T1}(t1) f_{T2}(t2) dt1 dt2    (independence of T1, T2)
  = ∫_0^∞ ∫_0^{t2} λ1 e^{−λ1 t1} λ2 e^{−λ2 t2} dt1 dt2
  = ∫_0^∞ (∫_0^{t2} λ1 e^{−λ1 t1} dt1) λ2 e^{−λ2 t2} dt2
  = ∫_0^∞ (1 − e^{−λ1 t2}) λ2 e^{−λ2 t2} dt2
  = 1 − λ2/(λ1 + λ2) = λ1/(λ1 + λ2).
2. We can find the distribution of min(T1, T2) by considering its survival function. By independence,

P(min(T1, T2) > t) = P(T1 > t) P(T2 > t) = e^{−λ1 t} e^{−λ2 t} = e^{−(λ1+λ2) t}, for t > 0.

This shows that the random variable min(T1, T2) follows the Expo(λ1 + λ2) distribution.

The distribution of the time when both appliances have failed, namely max(T1, T2), takes the form

P(max(T1, T2) ≤ t) = P(T1 ≤ t) P(T2 ≤ t) = (1 − e^{−λ1 t})(1 − e^{−λ2 t}), for t > 0.
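A quick simulation check of both parts (an added sketch, not part of the original solution):

```python
import numpy as np

rng = np.random.default_rng(0)
lam1, lam2, n = 1.0, 2.0, 1_000_000
t1 = rng.exponential(1 / lam1, n)   # numpy parametrizes Expo by its mean 1/lambda
t2 = rng.exponential(1 / lam2, n)
print((t1 < t2).mean())             # ≈ lam1 / (lam1 + lam2) = 1/3
print(np.minimum(t1, t2).mean())    # ≈ 1 / (lam1 + lam2) = 1/3, the Expo(3) mean
```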
Exercise 6. Let X and Y be independent r.v.s with PDFs f_X and f_Y, and let T = X + Y. Find the CDF and PDF of T.
Answer: There are two equivalent ways to derive the PDF of T. One is through the law of total probability, and the other is through the change-of-variable formula.
Approach 1: Law of total probability. Using the law of total probability and the fact that X and
Y are independent, we have
F_T(t) = P(X + Y ≤ t) = ∫_{−∞}^{∞} P(X + Y ≤ t | X = x) f_X(x) dx
  = ∫_{−∞}^{∞} P(Y ≤ t − x | X = x) f_X(x) dx
  = ∫_{−∞}^{∞} P(Y ≤ t − x) f_X(x) dx    (by independence)
  = ∫_{−∞}^{∞} F_Y(t − x) f_X(x) dx.

Differentiating both sides with respect to t, and exchanging the order of differentiation and integration by the Leibniz integral rule, gives f_T(t) = ∫_{−∞}^{∞} f_Y(t − x) f_X(x) dx. This equation is known as the convolution integral formula.
Approach 2: Change of variables. Let (T, W) = (X + Y, X). The inverse transformation is (x, y) = (w, t − w), whose Jacobian determinant is 1 in absolute value, so f_{T,W}(t, w) = f_X(w) f_Y(t − w); integrating out w recovers the same result.

The formula

f_{X+Y}(t) = ∫_{−∞}^{∞} f_X(x) f_Y(t − x) dx

is called the convolution formula. You can check out the appendix in Chapter 3 for more details.
Exercise 7. If X, Y are i.i.d. random variables with the Expo(λ) distribution, then what is the PDF of T = X + Y?
Answer: When X and Y are i.i.d. random variables with Expo(λ) distributions, the PDF of T = X + Y has the following form: for any t > 0,

f_T(t) = ∫_{−∞}^{∞} f_Y(t − x) f_X(x) dx
  = ∫_0^t λe^{−λ(t−x)} λe^{−λx} dx
  = λ² e^{−λt} ∫_0^t dx = λ² t e^{−λt},

where we restricted the integral to run from 0 to t since we need t − x > 0 and x > 0. This PDF f_T(t) is that of the Gamma(2, λ) distribution.
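A numeric sketch comparing a histogram of simulated sums against this Gamma(2, λ) density (an addition, not part of the original solution):

```python
import numpy as np

rng = np.random.default_rng(0)
lam, n = 2.0, 1_000_000
t = rng.exponential(1 / lam, n) + rng.exponential(1 / lam, n)  # T = X + Y
counts, edges = np.histogram(t, bins=50, range=(0, 4))
dens = counts / (n * (edges[1] - edges[0]))       # empirical density estimate
mids = (edges[:-1] + edges[1:]) / 2
pdf = lam**2 * mids * np.exp(-lam * mids)         # Gamma(2, lam) density
print(np.max(np.abs(dens - pdf)))                 # small (on the order of 0.01)
```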
Exercise 8. Let U ∼ Unif(0, 2π) and T ∼ Expo(1) be independent, and let X = √(2T) cos U and Y = √(2T) sin U. Find the joint PDF of (X, Y). Are they independent? What are their marginal distributions?
Answer:
Since we can recover (U, T ) from (X, Y ), the transformation is invertible. The Jacobian matrix
∂(x, y)/∂(u, t) = [ −√(2t) sin u    (1/√(2t)) cos u
                     √(2t) cos u    (1/√(2t)) sin u ]

has determinant −sin²u − cos²u = −1. This implies

|∂(u, t)/∂(x, y)| = 1 / |∂(x, y)/∂(u, t)| = 1.
Then letting x = √(2t) cos u and y = √(2t) sin u to mirror the transformation from (U, T) to (X, Y) (so that t = (x² + y²)/2), we have

f_{X,Y}(x, y) = f_{U,T}(u, t) · |∂(u, t)/∂(x, y)|
  = (1/(2π)) e^{−t} · 1
  = (1/(2π)) e^{−(x² + y²)/2}
  = (1/√(2π)) e^{−x²/2} · (1/√(2π)) e^{−y²/2},
for all real x and y. We recognize the joint PDF as the product of two standard Normal PDFs, so X
and Y are i.i.d. N (0, 1) r.v.s!
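This construction is the classical Box-Muller method for generating Normal r.v.s. A short simulation sketch (an addition, not part of the original solution):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
u = rng.uniform(0, 2 * np.pi, n)      # U ~ Unif(0, 2*pi)
t = rng.exponential(1.0, n)           # T ~ Expo(1)
x = np.sqrt(2 * t) * np.cos(u)
y = np.sqrt(2 * t) * np.sin(u)
print(x.mean(), x.std(), np.corrcoef(x, y)[0, 1])  # ≈ 0, ≈ 1, ≈ 0
```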
Exercise 9. Let X and Y be independent positive r.v.s, with PDFs fX and fY respectively, and
consider the product T = XY . Find the PDF of T in terms of fX and fY .
Answer: Conditioning on X = x and using the law of total probability,

F_T(t) = P(XY ≤ t) = ∫_0^∞ P(Y ≤ t/x) f_X(x) dx = ∫_0^∞ F_Y(t/x) f_X(x) dx.

Differentiating with respect to t,

f_T(t) = ∫_0^∞ f_Y(t/x) f_X(x) dx/x,

where we use the Leibniz integral rule when we exchange the order of differentiation and integration. Similarly, we can derive another equivalent formula by conditioning on Y = y:

f_T(t) = ∫_0^∞ f_X(t/y) f_Y(y) dy/y.
It is worth noting that the PDF of T = XY is not “a convolution with a product instead of a sum”; namely, it is not simply ∫_0^∞ f_X(x) f_Y(t/x) dx. Instead, there is an extra x (or y) in the denominator. This stems from the fact that the transformation from (X, Y) to (XY, X) (or (XY, Y)) is nonlinear, in contrast to the linear transformation from (X, Y) to (X + Y, X) (or (X + Y, Y)); the nonlinear transformation produces a nonconstant Jacobian.
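A numeric sketch of the product formula (an addition; the Unif(0, 1) example is chosen here purely for illustration): for X, Y i.i.d. Unif(0, 1), the formula gives f_T(t) = ∫_t^1 dx/x = −log t for 0 < t < 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
t = rng.uniform(0, 1, n) * rng.uniform(0, 1, n)   # T = XY
counts, edges = np.histogram(t, bins=40, range=(0.05, 1))
dens = counts / (n * (edges[1] - edges[0]))       # empirical density estimate
mids = (edges[:-1] + edges[1:]) / 2
print(np.max(np.abs(dens + np.log(mids))))        # small: matches -log(t)
```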
5.3.1 Review
Theorem 5.14 (Independence and Expectation). Suppose X and Y are independent r.v.s. Then for any functions h(X) and q(Y) whose expectations exist, we have

E[h(X) q(Y)] = E[h(X)] E[q(Y)].
Theorem 5.15 (Independence and MGF). Suppose X and Y are independent and their MGFs MX (t)
and MY (t) exist for t in a neighborhood of 0. Then MX+Y (t) exists for t in a small neighborhood of 0,
and
M_{X+Y}(t) = M_X(t) M_Y(t).
5.3.2 Exercise
Exercise 10. Suppose X1, . . . , Xn are n independent random variables, with Xi having a Pois(λi) distribution, i = 1, . . . , n. Show that Σ_{i=1}^n Xi follows a Pois(λ) distribution, where λ = Σ_{i=1}^n λi.
Answer: For a random variable Xi that follows a Pois(λi) distribution, the MGF is

M_i(t) = e^{λi(e^t − 1)},  −∞ < t < ∞.
Put X = Σ_{i=1}^n Xi. Then its MGF is

M_X(t) = E[e^{t Σ_{i=1}^n Xi}] = ∏_{i=1}^n M_i(t) = ∏_{i=1}^n e^{λi(e^t − 1)} = e^{λ(e^t − 1)},

where λ = Σ_{i=1}^n λi. Note that this is exactly the MGF of the Pois(λ) distribution (see the derivation in Recitation 5 Exercise 2). Therefore, X ∼ Pois(λ). In other words, the sum of n independent Pois(λi) random variables is a Pois(λ) random variable with λ = Σ_{i=1}^n λi.
Here the MGF provides a very convenient way to deal with the sum of independent Poisson random variables. In principle, we could also directly derive the PMF of Σ_{i=1}^n Xi by applying the convolution formula in the Appendix of Chapter 3, but that approach would be much more difficult.
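A quick check of the conclusion by simulation (an added sketch, not part of the original solution): the sum of independent Pois(λi) draws has the mean and variance of Pois(Σλi).

```python
import numpy as np

rng = np.random.default_rng(0)
lams = np.array([0.5, 1.0, 2.5])                      # lambda_1, ..., lambda_n
sums = rng.poisson(lams, size=(1_000_000, 3)).sum(axis=1)
print(sums.mean(), sums.var())                        # both ≈ lams.sum() = 4.0
```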
5.4.1 Review
Proposition 5.19 (Variance and Covariance). For two random variables X, Y, we have

Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y).
For n r.v.s X1 , . . . , Xn ,
Var(X1 + · · · + Xn) = Var(X1) + · · · + Var(Xn) + 2 Σ_{i<j} Cov(Xi, Xj).
In particular, if X1, . . . , Xn are independent, then the variance of their sum is the sum of their variances:

Var(Σ_{j=1}^n Xj) = Σ_{j=1}^n Var(Xj).

The correlation of X and Y is defined as

Corr(X, Y) = Cov(X, Y) / √(Var(X) Var(Y)).
5.4.2 Exercise
Exercise 11. Consider the following method for creating a bivariate Poisson (a joint distribution for
two r.v.s such that both marginals are Poissons). Let X = V + W, Y = V + Z where V, W, Z are i.i.d.
Pois(λ).
1. Find Cov(X, Y).
2. Are X and Y independent? Are they conditionally independent given V ?
3. Find the joint PMF of X, Y (as a sum).
Answer: Before answering the questions, we note that V, W, Z are all Pois(λ) r.v.s, so their means are
all λ and variances are all λ as well.
1. By bilinearity of covariance, and since V, W, Z are independent,

Cov(X, Y) = Cov(V + W, V + Z) = Cov(V, V) + Cov(V, Z) + Cov(W, V) + Cov(W, Z) = Var(V) = λ.
2. Since X and Y are correlated (with covariance λ > 0 ), they are not independent. An alternative
way to show this is to note that E(Y ) = 2λ but E(Y | X = 0) = λ since if X = 0 occurs then
V = 0 occurs and thus Y = Z.
But X and Y are conditionally independent given V , since the conditional joint PMF is
P (X = x, Y = y | V = v) = P (W = x − v, Z = y − v | V = v)
= P (W = x − v, Z = y − v)
= P (W = x − v)P (Z = y − v)
= P (X = x | V = v)P (Y = y | V = v).
This makes sense intuitively since if we observe that V = v, then X and Y are the independent
r.v.s W and Z, shifted by the constant v.
3. By the law of total probability, conditioning on V,

P(X = x, Y = y) = Σ_{v=0}^{min(x,y)} P(X = x, Y = y | V = v) P(V = v)
  = Σ_{v=0}^{min(x,y)} P(X = x | V = v) P(Y = y | V = v) P(V = v)
  = Σ_{v=0}^{min(x,y)} P(W = x − v) P(Z = y − v) P(V = v)
  = Σ_{v=0}^{min(x,y)} e^{−λ} (λ^{x−v}/(x − v)!) · e^{−λ} (λ^{y−v}/(y − v)!) · e^{−λ} (λ^v/v!)
  = e^{−3λ} λ^{x+y} Σ_{v=0}^{min(x,y)} λ^{−v} / ((x − v)! (y − v)! v!),
for x and y nonnegative integers. Note that we sum only up to min(x, y) since we know for sure
that V ≤ X and V ≤ Y .
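A simulation sketch of the construction (an addition, not part of the original solution), confirming Cov(X, Y) = λ:

```python
import numpy as np

rng = np.random.default_rng(0)
lam, n = 2.0, 1_000_000
v, w, z = rng.poisson(lam, (3, n))   # V, W, Z i.i.d. Pois(lam)
x, y = v + w, v + z
print(np.cov(x, y)[0, 1])            # ≈ lam = 2, the shared component's variance
```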
Exercise 12. Let X be the number of distinct birthdays in a group of 110 people (i.e., the number
of days in a year such that at least one person in the group has that birthday). Under the usual
assumptions (no Feb 29, all the other 365 days of the year are equally likely, and the day when one
person is born is independent of the days when the other people are born), find the mean and variance
of X.
Answer: Let Ij be the indicator r.v. for the event that at least one of the people was born on the jth day of the year, so X = Σ_{j=1}^{365} Ij with Ij ∼ Bern(p), where p = 1 − (364/365)^110. The Ij's are dependent, but by linearity we still have

E(X) = 365p = 365(1 − (364/365)^110) ≈ 95.08.

For the variance, by symmetry,

Var(X) = 365 Var(I1) + 2 C(365, 2) Cov(I1, I2),

where Var(I1) = p(1 − p) and Cov(I1, I2) = P(I1 = 1, I2 = 1) − p², with P(I1 = 1, I2 = 1) = 1 − 2(364/365)^110 + (363/365)^110 by inclusion-exclusion. Numerically, Var(X) ≈ 10.0.
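A simulation sketch for this problem (an addition, not part of the original solution), estimating both moments directly:

```python
import numpy as np

rng = np.random.default_rng(0)
n_sim, people, days = 20_000, 110, 365
draws = rng.integers(0, days, size=(n_sim, people))        # birthdays
distinct = np.array([np.unique(row).size for row in draws])
print(distinct.mean(), distinct.var())                     # ≈ 95.08 and ≈ 10.0
```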