Recitation 4 Solution
4.1.1 Review
Definition 4.1 (Uniform distribution). A continuous r.v. U is said to have the Uniform distribution
on the interval (a, b) if its PDF is
f(x) = 1/(b − a) if a < x < b, and f(x) = 0 otherwise.
Definition 4.2 (Exponential distribution). A continuous r.v. X is said to have the Exponential
distribution with parameter λ, where λ > 0, if its PDF is
f(x) = λe^{−λx}, x > 0.
We write this as X ∼ Expo(λ).
Definition 4.3 (Standard Normal distribution). A continuous r.v. Z is said to have the standard
Normal distribution if its PDF φ is given by
φ(z) = (1/√(2π)) e^{−z²/2},   −∞ < z < ∞.
We write this as Z ∼ N (0, 1).
Definition 4.4 (Gamma distribution). A r.v. Y is said to have the Gamma distribution with
parameters a and λ, where a > 0 and λ > 0, if its PDF is
f(y) = (λ^a / Γ(a)) y^{a−1} e^{−λy},   y > 0,
where Γ(a) = ∫₀^∞ x^{a−1} e^{−x} dx is the Gamma function. We write Y ∼ Gamma(a, λ).
Definition 4.5 (Beta distribution). An r.v. X is said to have the Beta distribution with parameters
a and b, where a > 0 and b > 0, if its PDF is
f(x) = (1/β(a, b)) x^{a−1} (1 − x)^{b−1},   0 < x < 1,   where β(a, b) = Γ(a)Γ(b)/Γ(a + b)
is the constant chosen to make the PDF integrate to 1. We write this as X ∼ Beta(a, b).
4.1.2 Exercise
Exercise 1. Joe lives in a city where buses always arrive exactly on time, with the time between
successive buses fixed at 10 minutes. Having lost his watch, he arrives at the bus stop at a random
time (assume that buses run 24 hours a day, and that the time that Joe arrives is uniformly random
on a particular day).
1. What is the distribution of how long Joe has to wait for the next bus? What is the average time
that Joe has to wait?
2. Given that the bus has not yet arrived after 6 minutes, what is the probability that Joe will have
to wait at least 3 more minutes?
3. Joe moves to a new city with inferior urban planning and where buses are much more erratic.
Now, when any bus arrives, the time until the next bus arrives is an Exponential random variable
with mean 10 minutes. Joe arrives at the bus stop at a random time, not knowing how long ago
the previous bus came. What is the distribution of Joe’s waiting time for the next bus? What is
the average time that Joe has to wait?
Answer:
1. Joe arrives at a uniformly random time within the 10-minute cycle between two consecutive buses, so his waiting time W is Unif(0, 10). The average wait is E(W) = 5 minutes.
2. Given that W ≥ 6, the waiting time is conditionally uniform on (6, 10), so
P(W ≥ 9 | W ≥ 6) = P(W ≥ 9)/P(W ≥ 6) = (1/10)/(4/10) = 1/4.
3. By the memoryless property of the Exponential distribution, the time from Joe's arrival until the next bus is again Exponential with mean 10 minutes, so the average wait is 10 minutes (not 5). This is an instance of the inspection (waiting-time) paradox.
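A quick way to sanity-check all three parts is by simulation. This is a minimal sketch (not part of the original solution), assuming NumPy is installed; the seed, sample sizes, and the choice t0 = 100 are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10**6

# City 1: buses run every 10 minutes and Joe arrives uniformly within a cycle,
# so his wait W until the next bus is Unif(0, 10).
wait_fixed = rng.uniform(0, 10, size=n)
print(wait_fixed.mean())                          # ~5 minutes
print(np.mean(wait_fixed[wait_fixed >= 6] >= 9))  # part 2: ~1/4

# City 2: inter-arrival times are Expo with mean 10; Joe shows up at time t0
# and waits until the next arrival after t0.
m = 200_000
gaps = rng.exponential(scale=10, size=(m, 30))    # 30 buses easily covers t0 = 100
arrivals = np.cumsum(gaps, axis=1)
t0 = 100.0
idx = (arrivals > t0).argmax(axis=1)              # index of the first bus after t0
waits = arrivals[np.arange(m), idx] - t0
print(waits.mean())                               # ~10 minutes (inspection paradox)
```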
Exercise 2. Joe is trying to transmit to Donald the answer to a yes-no question, using a noisy channel.
He encodes “yes” as 1 and “no” as 0, and sends the appropriate value. However, the channel adds noise;
specifically, Donald receives what Joe sends plus a N (0, σ 2 ) noise term (the noise is independent of
what Joe sends). If Donald receives a value greater than 1/2 he interprets it as “yes”; otherwise, he
interprets it as “no”.
1. What is the probability that Donald understands Joe correctly?
2. What happens to the result from question 1 if σ is very small? What about if σ is very large?
Explain intuitively why the results in these extreme cases make sense.
Answer:
1. Let a be the value that Joe sends and ϵ be the noise, so B = a + ϵ is what Donald receives. If
a = 1, then Donald will understand correctly if and only if ϵ > −1/2. If a = 0, then Donald will
understand correctly if and only if ϵ ≤ 1/2. By symmetry of the Normal, P (ϵ > −1/2) = P (ϵ ≤
1/2), so the probability that Donald understands does not depend on a. This probability is
P(ϵ ≤ 1/2) = P(ϵ/σ ≤ 1/(2σ)) = Φ(1/(2σ)).
2. If σ is very small, then
Φ(1/(2σ)) ≈ 1,
since Φ(x) (like any CDF) goes to 1 as x → ∞. This makes sense intuitively: when the standard
deviation σ is very small, the noise ϵ is unlikely to take values very different from 0 (i.e., there is
very little noise), then it’s easy for Donald to understand Joe. If σ is very large, then
Φ(1/(2σ)) ≈ Φ(0) = 1/2.
Again this makes sense intuitively: if there is a huge amount of noise, then Joe’s message will get
drowned out (the noise dominates over the signal).
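The value Φ(1/(2σ)) can also be checked empirically. Below is a minimal simulation sketch (assuming NumPy and SciPy are available); σ = 0.8 and the seed are arbitrary illustrative choices.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n = 10**6
sigma = 0.8

a = rng.integers(0, 2, size=n)               # Joe sends 0 ("no") or 1 ("yes")
received = a + rng.normal(0, sigma, size=n)  # channel adds N(0, sigma^2) noise
decoded = (received > 0.5).astype(int)       # Donald's decision rule

print(np.mean(decoded == a))                 # empirical P(Donald understands correctly)
print(norm.cdf(1 / (2 * sigma)))             # theoretical value Phi(1/(2*sigma))
```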
Exercise 3. In studies of anticancer drugs it was found that if mice are injected with cancer cells, the
survival time (in hours) can be modeled with the exponential distribution Expo(λ) with λ = 0.1. What
is the probability that a randomly selected mouse will survive at least 8 hours? At most 12 hours?
Between 8 and 12 hours?
Answer: Let T ∼ Expo(0.1) be the survival time (in hours) of a randomly selected mouse. Then
P(T ≥ 8) = 1 − (1 − e^{−0.1×8}) = e^{−0.1×8} = 0.4493 and P(T ≤ 12) = 1 − e^{−0.1×12} = 0.6988.
Combining these two answers, P(8 ≤ T ≤ 12) = P(T ≤ 12) − P(T < 8) = 0.6988 − (1 − 0.4493) = 0.1481.
Exercise 4. Let T be the lifetime of a certain person (how long that person lives), and let T have
CDF F and PDF f . The survival function G is defined as
1. Explain why h is called the hazard function and in particular, why h(t) is the probability density
for death at time t, given that the person survived up until then.
2. Show that the CDF and PDF are related to the hazard function by
F(t) = 1 − exp(−∫₀^t h(s) ds),   f(t) = h(t) exp(−∫₀^t h(s) ds),   ∀t > 0.
3. Show that an Exponential r.v. has constant hazard function and conversely, if the hazard function
of T is a constant then T must be Expo(λ) for some λ.
4. What is the hazard function of a Weibull(γ, λ) distribution? (A Weibull(γ, λ) distribution has
CDF F (x) = 1 − exp(−(λx)γ ) for x ≥ 0 and F (x) = 0 otherwise, where γ > 0 is a parameter).
Answer:
1. For a small time increment dt, the probability of dying in the next instant given survival up to time t is
P(t < T ≤ t + dt | T > t) = P(t < T ≤ t + dt)/P(T > t) ≈ f(t) dt / G(t) = h(t) dt.
Thus the hazard function gives a natural way to measure the instantaneous hazard of death at some time since it accounts for the person having survived up until that time.
2. Since G(s) = 1 − F (s) for any s > 0, we have
h(s) = f(s)/G(s) = F′(s)/G(s) = −G′(s)/G(s).
Integrating both sides from 0 to t gives, for any t > 0,
∫₀^t h(s) ds = −∫₀^t G′(s)/G(s) ds = −∫₀^t d log G(s).
Obviously,
∫₀^t d log G(s) = log G(t) − log G(0) = log G(t),
since G(0) = P(T > 0) = 1. Therefore ∫₀^t h(s) ds = −log G(t), so
F(t) = 1 − G(t) = 1 − exp(−∫₀^t h(s) ds).
Differentiating gives f(t) = F′(t) = h(t) exp(−∫₀^t h(s) ds), since d/dt ∫₀^t h(s) ds = h(t) (by the fundamental theorem of calculus).
3. Let T ∼ Expo(λ). Then the hazard function is
h(t) = λe^{−λt} / e^{−λt} = λ,   ∀t > 0,
which is a constant.
Conversely, suppose that h(t) = λ for all t, with λ > 0 a constant. According to the conclusion
in Question 2, we have
F(t) = 1 − exp(−∫₀^t h(s) ds) = 1 − exp(−λt).
Thus, T ∼ Expo(λ).
4. The CDF of the Weibull(γ, λ) distribution is F(t) = 1 − exp(−(λt)^γ), so its PDF is f(t) = γλ^γ t^{γ−1} exp(−(λt)^γ) and its survival function is G(t) = exp(−(λt)^γ). So we have
h(t) = f(t)/G(t) = γλ^γ t^{γ−1}.
We can observe that when γ = 1, the hazard rate is a constant so the Weibull distribution reduces
to an Exponential distribution. When γ > 1, the hazard function increases with t. This can be
used to model the wear-and-tear effects in survival times. For example, let the lifetime of a certain
machine be a random variable X. If X has an Exponential distribution, then X is memoryless
and has a constant hazard rate. This means that no matter how long the machine has been used,
its remaining lifetime has the same distribution as a new machine. However, in practice, the
machine often ages as it is used, thus the risk of machine breakdown should increase as time goes
by. This can be captured by an increasing hazard function.
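The qualitative behavior of the Weibull hazard derived above is easy to see numerically. A minimal sketch assuming NumPy; the parameter values below are arbitrary examples.

```python
import numpy as np

def weibull_hazard(t, gamma, lam):
    # Hazard h(t) = gamma * lam**gamma * t**(gamma - 1) for the Weibull(gamma, lam)
    # distribution with CDF F(t) = 1 - exp(-(lam * t)**gamma).
    return gamma * lam**gamma * t**(gamma - 1)

t = np.linspace(0.5, 5, 5)
print(weibull_hazard(t, gamma=1.0, lam=0.5))  # constant 0.5: reduces to Expo(0.5)
print(weibull_hazard(t, gamma=2.0, lam=0.5))  # increasing in t: wear-and-tear
```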
Exercise 5. Let F be a CDF which is a continuous function and strictly increasing on the support
of the distribution. This ensures that the inverse function F −1 exists, as a function from (0, 1) to R.
Prove the following statements:
1. Let U ∼ Unif(0, 1) and X = F^{−1}(U). Then X is an r.v. with CDF F.
2. Let X be an r.v. with CDF F. Then F(X) ∼ Unif(0, 1).
Answer:
1. Let X = F^{−1}(U). For any real x,
P(X ≤ x) = P(F^{−1}(U) ≤ x) = P(U ≤ F(x)) = F(x),
so the CDF of X is F, as claimed. For the last equality, we used the fact that P(U ≤ u) = u for u ∈ (0, 1) since U ∼ Unif(0, 1).
2. Let X have CDF F , and find the CDF of Y = F (X). Since F (·) takes values in (0, 1), P (Y ≤ y)
equals 0 for y ≤ 0 and equals 1 for y ≥ 1. For any y ∈ (0, 1),
P(Y ≤ y) = P(F(X) ≤ y) = P(X ≤ F^{−1}(y)) = F(F^{−1}(y)) = y,
so Y has the Unif(0, 1) CDF, as claimed.
Exercise 6. Suppose that a random variable U ∼ Unif(0, 1). Based on the results in Exercise 5,
discuss how you can construct a random variable X with the following CDFs from U:
1. F_X(x) = e^x/(1 + e^x) for all real x. (This is the Logistic distribution.)
3. F_X(x) = 1 − e^{−λx} for x > 0. (This is the Exponential distribution with parameter λ.)
Answer:
1. Suppose we have U ∼ Unif(0, 1) and wish to generate a Logistic random variable. Statement 1
in Exercise 5 says that F −1 (U ) ∼ Logistic if F is the CDF of the Logistic distribution. Thus we
first invert the CDF to get F −1 :
F^{−1}(u) = log(u/(1 − u)),
so X = log(U/(1 − U)) has the Logistic distribution.
3. Similarly, inverting F_X(x) = 1 − e^{−λx} gives
F^{−1}(u) = −log(1 − u)/λ,
so X = −log(1 − U)/λ ∼ Expo(λ).
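The inverse-CDF constructions above can be tried directly. A minimal sketch assuming NumPy; λ = 2 and the seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)
u = rng.uniform(0, 1, size=10**6)

logistic = np.log(u / (1 - u))     # F^{-1}(U) for the Logistic CDF e^x / (1 + e^x)
lam = 2.0
expo = -np.log(1 - u) / lam        # F^{-1}(U) for the Expo(lambda) CDF

print(logistic.mean())             # ~0 (the Logistic distribution is symmetric about 0)
print(expo.mean())                 # ~1/lam = 0.5
```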
Exercise 7. In Exercises 5 and 6, we consider continuous r.v.s X. Discuss how to construct a discrete random variable X taking values in {0, . . . , n} with P(X = j) = p_j, where ∑_{j=0}^n p_j = 1, from U ∼ Unif(0, 1).
Figure 4.1: Given a PMF, chop up the interval (0, 1) into pieces, with lengths given by the PMF values.
Answer: Suppose we want to use U ∼ Unif(0, 1) to construct a discrete r.v. X with pj = P (X = j) for
j = 0, 1, 2, . . . , n. As illustrated in Figure 4.1, we can chop up the interval (0, 1) into pieces
of lengths p0 , p1 , . . . , pn . By the properties of a valid PMF, the sum of the pj ’s is 1, so this perfectly
divides up the interval, without overshooting or undershooting.
Now define X to be the r.v. which equals 0 if U falls into the p0 interval, 1 if U falls into the p1 interval,
2 if U falls into the p2 interval, and so on. Then X is a discrete r.v. taking on values 0 through n.
The probability that X = j is the probability that U falls into the interval of length pj . But for a Unif
(0, 1) r.v., the probability that this r.v. falls into an interval in (0, 1) is just the length of this interval,
so P (X = j) is precisely pj , as desired!
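This construction is exactly how a discrete r.v. can be sampled in code: take the cumulative sums of the PMF as the cut points of (0, 1) and locate which piece U falls into. A minimal sketch assuming NumPy; the PMF values are arbitrary examples.

```python
import numpy as np

rng = np.random.default_rng(3)
p = np.array([0.2, 0.1, 0.4, 0.3])    # example PMF of X on {0, 1, 2, 3}
cuts = np.cumsum(p)                   # right endpoints of the pieces of (0, 1)

u = rng.uniform(0, 1, size=10**6)
x = np.searchsorted(cuts, u)          # index of the piece that each U falls into

print(np.bincount(x, minlength=len(p)) / len(u))  # empirical PMF, close to p
```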
4.3.1 Review
For a continuous and strictly monotonic transformation, we can also apply the following change-of-
variable formula.
Theorem 4.6 (Change of variables). Let X be a continuous r.v. with PDF fX , and let Y = g(X),
where g is differentiable and strictly increasing (or strictly decreasing) over the support of X. Then the
PDF of Y is given by
f_Y(y) = f_X(x) |dx/dy|,
where x = g^{−1}(y). The support of Y is all g(x) with x in the support of X, namely, Supp(Y) = {g(x) : x ∈ Supp(X)}.
4.3.2 Exercise
Exercise 8. If the length of a side of a square X is random with the PDF fX (x) = x/8, 0 < x < 4,
and Y is the area of the square, find the PDF of Y .
Answer: Since Y = X² with X ∈ (0, 4), we have Y ∈ (0, 16), and for 0 < y < 16,
F_Y(y) = P(X² ≤ y) = P(X ≤ √y) = ∫₀^{√y} (x/8) dx = y/16.
Thus, the PDF of Y is f_Y(y) = F_Y′(y) = 1/16, 0 < y < 16. That is, the area Y is uniform on (0, 16).
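Since F_X(x) = x²/16 on (0, 4), we can generate X as 4√U (Exercise 5) and check that the area is indeed uniform on (0, 16). A minimal simulation sketch assuming NumPy.

```python
import numpy as np

rng = np.random.default_rng(4)
u = rng.uniform(0, 1, size=10**6)
x = 4 * np.sqrt(u)         # inverse of F_X(x) = x^2 / 16 on (0, 4)
y = x**2                   # area of the square

print(y.mean(), y.var())   # Unif(0, 16) has mean 8 and variance 16^2 / 12 ~ 21.33
```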
Exercise 9. Find the PDF of the random variable Y in the following settings:
1. X ∼ Unif[−1, 1] and Y = |X|.
2. X ∼ Unif[−1, 1] and Y = X².
3. X ∼ Unif[−1, 3] and Y = X².
Answer:
1. Given that X is uniformly distributed in the interval [−1, 1], the probability density function
(PDF) of X is:
f_X(x) = 1/2 for −1 ≤ x ≤ 1, and f_X(x) = 0 otherwise.
To find the distribution of Y = |X|, we consider the range of Y: since X ∈ [−1, 1], we have |X| ∈ [0, 1].
Thus, the range of Y is [0, 1]. Now, we can derive the CDF of Y: for 0 ≤ y ≤ 1,
F_Y(y) = P(|X| ≤ y) = P(−y ≤ X ≤ y) = 2y × (1/2) = y.
Thus the CDF of Y is F_Y(y) = 0 for y < 0, F_Y(y) = y for 0 ≤ y < 1, and F_Y(y) = 1 for y ≥ 1.
Differentiating, the PDF is f_Y(y) = 1 for 0 < y < 1, i.e., Y ∼ Unif(0, 1).
2. Now let X ∼ Unif[−1, 1] and Y = X². Since X ∈ [−1, 1], we have X² ∈ [0, 1].
Thus, the range of Y is [0, 1]. Now, we can derive the CDF of Y: for 0 ≤ y ≤ 1,
F_Y(y) = P(X² ≤ y) = P(−√y ≤ X ≤ √y) = 2√y × (1/2) = √y.
Thus the CDF of Y is F_Y(y) = 0 for y < 0, F_Y(y) = √y for 0 ≤ y < 1, and F_Y(y) = 1 for y ≥ 1.
Differentiating, the PDF is f_Y(y) = 1/(2√y) for 0 < y < 1.
3. Finally, let X ∼ Unif[−1, 3] and Y = X², so f_X(x) = 1/4 for −1 ≤ x ≤ 3 and the range of Y is [0, 9]. We compute
F_Y(y) = P(Y ≤ y) = P(X² ≤ y).
For 0 ≤ y < 1:
F_Y(y) = P(X² ≤ y) = P(−√y ≤ X ≤ √y) = (1/2)√y.
For 1 ≤ y ≤ 9 (here −√y ≤ −1, so the lower constraint is just X ≥ −1):
F_Y(y) = P(X² ≤ y) = P(X ≤ √y) = (√y + 1)/4.
Thus, the CDF is F_Y(y) = 0 for y < 0, F_Y(y) = (1/2)√y for 0 ≤ y < 1, F_Y(y) = (√y + 1)/4 for 1 ≤ y ≤ 9, and F_Y(y) = 1 for y > 9.
Differentiating, the PDF is f_Y(y) = 1/(4√y) for 0 < y < 1, f_Y(y) = 1/(8√y) for 1 < y < 9, and f_Y(y) = 0 otherwise.
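A short simulation can confirm the piecewise CDF in part 3. A minimal sketch assuming NumPy; the evaluation points are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(-1, 3, size=10**6)
y = x**2

# Compare the empirical CDF of Y with the piecewise formula derived above.
for t in [0.25, 0.81, 4.0]:
    theo = 0.5 * np.sqrt(t) if t < 1 else (np.sqrt(t) + 1) / 4
    print(t, np.mean(y <= t), theo)
```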
4.4.1 Review
Definition 4.7 (Expectation of a discrete r.v.). The expected value (also called the expectation
or mean) of a discrete r.v. X with distinct possible values x1 , x2 , . . . is defined by
E(X) = ∑_{j=1}^∞ x_j P(X = x_j).
If the support is finite, then this is replaced by a finite sum. We can also write
E(X) = ∑_{x ∈ Supp(X)} x p_X(x),
where each term is a possible value x weighted by the PMF at x.
Definition 4.8 (Expectation of a continuous r.v.). The expected value of a continuous r.v. X with
PDF f is
E(X) = ∫_{−∞}^∞ x f(x) dx.
Theorem 4.9 (Linearity of expectation). For any r.v.s X, Y (whose expectations exist) and any
constant c,
E(X + Y) = E(X) + E(Y)   and   E(cX) = cE(X).
Theorem 4.10 (Law of the unconscious statistician (LOTUS)). The expected value or mean of a
random variable g(X), denoted by E(g(X)), is
E(g(X)) = ∫_{−∞}^∞ g(x) f_X(x) dx if X is continuous, and
E(g(X)) = ∑_{x ∈ Supp(X)} g(x) p_X(x) = ∑_{x ∈ Supp(X)} g(x) P(X = x) if X is discrete.
4.4.2 Exercise
Exercise 10. The PMF for X = the number of major defects on a randomly selected appliance of a
certain type is
x 0 1 2 3 4
p(x) 0.08 0.15 0.45 0.27 0.05
Compute the following:
1. E(X).
2. V (X) directly from the definition.
3. The standard deviation of X.
4. E(X 2 ) and E(X 3 ).
5. V (X) using the shortcut formula.
Answer:
1. E(X) = ∑_{x=0}^4 x p(x) = 0 × 0.08 + 1 × 0.15 + 2 × 0.45 + 3 × 0.27 + 4 × 0.05 = 2.06;
2. V(X) = ∑_{x=0}^4 (x − E(X))² p(x) = (0 − 2.06)² × 0.08 + (1 − 2.06)² × 0.15 + (2 − 2.06)² × 0.45 + (3 − 2.06)² × 0.27 + (4 − 2.06)² × 0.05 = 0.9364;
3. SD(X) = √V(X) = √0.9364 = 0.9677;
4. E(X²) = ∑_{x=0}^4 x² p(x) = 0² × 0.08 + 1² × 0.15 + 2² × 0.45 + 3² × 0.27 + 4² × 0.05 = 5.18;
E(X³) = ∑_{x=0}^4 x³ p(x) = 0³ × 0.08 + 1³ × 0.15 + 2³ × 0.45 + 3³ × 0.27 + 4³ × 0.05 = 14.24;
5. Using the shortcut formula, V(X) = E(X²) − (E(X))² = 5.18 − 2.06² = 0.9364, the same answer as in part 2.
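All five quantities can be reproduced in a few lines (a sketch assuming NumPy is installed).

```python
import numpy as np

x = np.arange(5)
p = np.array([0.08, 0.15, 0.45, 0.27, 0.05])

EX  = np.sum(x * p)               # 2.06
VX  = np.sum((x - EX)**2 * p)     # 0.9364
EX2 = np.sum(x**2 * p)            # 5.18
EX3 = np.sum(x**3 * p)            # 14.24

print(EX, VX, np.sqrt(VX), EX2, EX3, EX2 - EX**2)
```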
Exercise 11. Let Z ∼ N (0, 1), and c be a nonnegative constant. Find E(max(Z, 0)) and E(min(Z, 0)),
in terms of the standard Normal CDF Φ and PDF φ.
Answer: By LOTUS,
E(max(Z, 0)) = ∫_{−∞}^∞ max(z, 0) φ(z) dz = ∫₀^∞ z φ(z) dz = [−(1/√(2π)) e^{−z²/2}]₀^∞ = 1/√(2π).
Similarly,
E(min(Z, 0)) = ∫_{−∞}^∞ min(z, 0) φ(z) dz = ∫_{−∞}^0 z φ(z) dz = ∫_{−∞}^∞ z φ(z) dz − ∫₀^∞ z φ(z) dz = 0 − 1/√(2π) = −1/√(2π).
Remark: we can use these two results to obtain E(|Z|). Note that |Z| = max(Z, 0) − min(Z, 0). So
E(|Z|) = E(max(Z, 0) − min(Z, 0)) = E(max(Z, 0)) − E(min(Z, 0)) = 2/√(2π) = √(2/π).
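A Monte Carlo check of all three expectations (a minimal sketch assuming NumPy; seed and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)
z = rng.normal(size=10**7)

print(np.maximum(z, 0).mean(), 1 / np.sqrt(2 * np.pi))    # ~0.3989
print(np.minimum(z, 0).mean(), -1 / np.sqrt(2 * np.pi))   # ~-0.3989
print(np.abs(z).mean(), np.sqrt(2 / np.pi))               # ~0.7979
```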
4.5.1 Review
Definition 4.14 (Moment generating function). The moment generating function (MGF) of an r.v.
X is M(t) = E(e^{tX}), as a function of t, if this is finite on some open interval (−a, a) containing 0.
Otherwise we say the MGF of X does not exist.
Proposition 4.15 (MGF of location-scale transformation). If X has MGF M (t), then the MGF of
a + bX is
E(e^{t(a+bX)}) = E(e^{at} e^{btX}) = e^{at} E(e^{btX}) = e^{at} M(bt).
Theorem 4.16 (Moments via derivatives of the MGF). Given the MGF of X, we can get the nth
moment of X by evaluating the nth derivative of the MGF at 0:
E(X^n) = M^{(n)}(0) = (d^n/dt^n) M(t) |_{t=0}.
Theorem 4.17 (MGF determines the distribution). The MGF of a random variable determines its
distribution: if two r.v.s have the same MGF, they must have the same distribution.
Mathematically, let X, Y be two r.v.s with MGFs MX , MY and CDFs FX , FY . If MX (t) = MY (t) for
all t in an interval (−a, a) containing 0, then FX (u) = FY (u) for all u.
• The fact that an MGF uniquely determines a distribution is very useful for identifying distributions: if we can compute the MGF of an r.v. and recognize it as the MGF of a known distribution, we can conclude that the r.v. follows that distribution.
4.5.2 Exercises
Exercise 12. Let X ∼ Expo(λ). Find E(X³) in two ways:
1. Use LOTUS, the facts that E(X) = 1/λ and Var(X) = 1/λ², and integration by parts.
2. Use the MGF of X.
Answer:
1. By LOTUS, we have:
E(X³) = ∫₀^∞ x³ λe^{−λx} dx
= −∫₀^∞ x³ d(e^{−λx})
= −[x³ e^{−λx}]₀^∞ + 3∫₀^∞ x² e^{−λx} dx   (integration by parts)
= (3/λ) ∫₀^∞ x² λe^{−λx} dx
= (3/λ) E(X²) = (3/λ)(Var(X) + (EX)²) = 6/λ³.
2. The MGF of an Exponential random variable with rate parameter λ is
M(t) = E(e^{tX}) = ∫₀^∞ e^{tx} λe^{−λx} dx
= λ ∫₀^∞ e^{−(λ−t)x} dx
= (λ/(λ − t)) ∫₀^∞ (λ − t) e^{−(λ−t)x} dx
= λ/(λ − t),   ∀t < λ.
Here we use the fact that ∫₀^∞ (λ − t) e^{−(λ−t)x} dx = 1 for any t < λ. You can easily verify this, or you can simply think of (λ − t) e^{−(λ−t)x} as the density function of Expo(λ − t), which is valid since λ − t > 0. Thus there is an open interval containing 0 on which M(t) is finite.
To get the third moment, we can take the third derivative of the MGF and evaluate at t = 0 :
E(X³) = (d³/dt³) M(t) |_{t=0} = (6/λ³)(1 − t/λ)^{−4} |_{t=0} = 6/λ³.
But a much nicer way to use the MGF here is via pattern recognition: note that M (t) looks like
it came from a geometric series:
M(t) = λ/(λ − t) = 1/(1 − t/λ) = ∑_{n=0}^∞ (t/λ)^n = ∑_{n=0}^∞ (n!/λ^n)(t^n/n!),   ∀|t| < λ.
Thus the coefficient of t^n/n! here is exactly (d^n/dt^n) M(t)|_{t=0}, namely the nth moment of X. Therefore, we have E(X^n) = n!/λ^n for all nonnegative integers n. So again we get E(X³) = 6/λ³. This method not only avoids the need to compute the 3rd derivative of M(t) directly, but it also gives us all the moments of X.
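Both approaches can be checked numerically: a Monte Carlo estimate of E(X³) and a symbolic third derivative of the MGF. A minimal sketch assuming NumPy and SymPy are installed; λ = 2 is an arbitrary example value.

```python
import numpy as np
import sympy as sp

lam_val = 2.0
rng = np.random.default_rng(7)
x = rng.exponential(scale=1 / lam_val, size=10**7)
print((x**3).mean(), 6 / lam_val**3)   # Monte Carlo estimate vs. 6 / lambda^3

# Third derivative of M(t) = lambda / (lambda - t) at t = 0.
t, lam = sp.symbols('t lam', positive=True)
M = lam / (lam - t)
print(sp.diff(M, t, 3).subs(t, 0))     # 6 / lam**3
```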
Exercise 13. Let M (t) be the moment generating function (MGF) of a random variable X. The
cumulant generating function is defined to be g(t) = ln M (t). Expanding g(t) as a Taylor series
g(t) = ∑_{j=1}^∞ (c_j/j!) t^j
(the summation starts at j = 1 because g(0) = 0), the coefficient c_j is called the j-th cumulant of X.
1. Express the first two cumulants c_1 and c_2 in terms of the moments of X.
2. Let X ∼ Pois(λ). Find the j-th cumulant of X, for all j ≥ 1.
Answer:
1. First, g(0) = ln M(0) = ln E(e^{0·X}) = 0. Differentiating g(t) = ln M(t),
c_1 = g′(0) = M′(0)/M(0) = E[X]/1 = E[X],
c_2 = g″(0) = [M″(0)M(0) − (M′(0))²]/(M(0))² = E[X²] − (E[X])² = Var(X).
So the first cumulant is the mean and the second cumulant is the variance.
2. Recall that the probability mass function (PMF) of the Poisson distribution with parameter λ > 0 is
P(X = k) = (λ^k/k!) e^{−λ},   k = 0, 1, 2, . . . . Then, by definition, the MGF of X has the form
M(t) = E[e^{tX}] = ∑_{k=0}^∞ e^{tk} (λ^k/k!) e^{−λ} = e^{−λ} ∑_{k=0}^∞ (e^t λ)^k/k! = e^{−λ} e^{λe^t} = e^{λ(e^t − 1)}.
Therefore, the cumulant generating function g(t) = ln M(t) can be expanded as a Taylor series of the form
g(t) = ln M(t) = λ(e^t − 1) = λ ∑_{k=1}^∞ t^k/k!,
which means that the k-th cumulant is c_k = λ for all k ≥ 1.
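Since the first three cumulants are the mean, the variance, and the third central moment, the claim c_k = λ is easy to check empirically for k = 1, 2, 3. A minimal sketch assuming NumPy; λ = 3 is an arbitrary example value.

```python
import numpy as np

rng = np.random.default_rng(8)
lam = 3.0
x = rng.poisson(lam, size=10**7)

m = x.mean()
print(m)                      # 1st cumulant ~ lam
print(x.var())                # 2nd cumulant ~ lam
print(np.mean((x - m)**3))    # 3rd cumulant (3rd central moment) ~ lam
```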