
10510134: Probability Theory and Mathematical Statistics Fall 2023

Recitation 4

4.1 Famous Continuous Distributions

4.1.1 Review

Definition 4.1 (Uniform distribution). A continuous r.v. U is said to have the Uniform distribution
on the interval (a, b) if its PDF is

 1
if a < x < b
b−a
f (x) =
0 otherwise

We denote this by U ∼ Unif(a, b).

Definition 4.2 (Exponential distribution). A continuous r.v. X is said to have the Exponential
distribution with parameter λ, where λ > 0, if its PDF is

f(x) = λe^{−λx},   x > 0

We denote this by X ∼ Expo(λ).

Definition 4.3 (Standard Normal distribution). A continuous r.v. Z is said to have the standard
Normal distribution if its PDF φ is given by
φ(z) = (1/√(2π)) e^{−z²/2},   −∞ < z < ∞

We write this as Z ∼ N (0, 1).

Definition 4.4 (Gamma distribution). A r.v. Y is said to have the Gamma distribution with
parameters a and λ, where a > 0 and λ > 0, if its PDF is
f(y) = (λ^a / Γ(a)) y^{a−1} e^{−λy},   y > 0,

where Γ(a) = ∫_0^∞ x^{a−1} e^{−x} dx is the Gamma function. We write Y ∼ Gamma(a, λ).

Definition 4.5 (Beta distribution). An r.v. X is said to have the Beta distribution with parameters
a and b, where a > 0 and b > 0, if its PDF is
f(x) = (1/β(a, b)) x^{a−1} (1 − x)^{b−1},   0 < x < 1,   β(a, b) = Γ(a)Γ(b)/Γ(a + b),
where the constant β(a, b) is chosen to make the PDF integrate to 1. We write this as X ∼ Beta(a, b).


4.1.2 Exercise

Exercise 1. Joe lives in a city where buses always arrive exactly on time, with the time between
successive buses fixed at 10 minutes. Having lost his watch, he arrives at the bus stop at a random
time (assume that buses run 24 hours a day, and that the time that Joe arrives is uniformly random
on a particular day).

1. What is the distribution of how long Joe has to wait for the next bus? What is the average time
that Joe has to wait?
2. Given that the bus has not yet arrived after 6 minutes, what is the probability that Joe will have
to wait at least 3 more minutes?
3. Joe moves to a new city with inferior urban planning and where buses are much more erratic.
Now, when any bus arrives, the time until the next bus arrives is an Exponential random variable
with mean 10 minutes. Joe arrives at the bus stop at a random time, not knowing how long ago
the previous bus came. What is the distribution of Joe’s waiting time for the next bus? What is
the average time that Joe has to wait?

Answer:

1. The distribution is Uniform on (0, 10), so the mean is 5 minutes.


2. Let T be the waiting time. Then
P(T ≥ 6 + 3 | T > 6) = P(T ≥ 9, T > 6)/P(T > 6) = P(T ≥ 9)/P(T > 6) = (1/10)/(4/10) = 1/4.
In particular, Joe's waiting time is not memoryless: conditional on having waited 6 minutes already, there is only a 1/4 chance that he will have to wait at least another 3 minutes, whereas if he had just shown up, there would be a P(T ≥ 3) = 7/10 chance of having to wait at least 3 minutes.
3. By the memoryless property, the distribution is Exponential with parameter 1/10 (and mean
10 minutes) regardless of when Joe arrives; how much longer the next bus will take to arrive is
independent of how long ago the previous bus arrived. The average time that Joe has to wait is
10 minutes.
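A quick Monte Carlo check of parts 2 and 3 is easy to run; the following is a minimal sketch, assuming NumPy is available, that estimates P(T ≥ 9 | T > 6) under the Uniform(0, 10) waiting time and under the Expo(1/10) waiting time.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10**6

# Part 2: waiting time is Uniform(0, 10).
t_unif = rng.uniform(0, 10, n)
p_unif = np.mean(t_unif[t_unif > 6] >= 9)   # P(T >= 9 | T > 6)
print(f"Uniform:     P(T >= 9 | T > 6) ~ {p_unif:.3f}  (exact 0.25)")

# Part 3: waiting time is Exponential with mean 10 (rate 1/10).
t_expo = rng.exponential(scale=10, size=n)
p_expo = np.mean(t_expo[t_expo > 6] >= 9)   # memoryless: equals P(T >= 3)
print(f"Exponential: P(T >= 9 | T > 6) ~ {p_expo:.3f}  (exact e^(-0.3) ~ 0.741)")
```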

Exercise 2. Joe is trying to transmit to Donald the answer to a yes-no question, using a noisy channel.
He encodes “yes” as 1 and “no” as 0, and sends the appropriate value. However, the channel adds noise;
specifically, Donald receives what Joe sends plus an N(0, σ²) noise term (the noise is independent of
what Joe sends). If Donald receives a value greater than 1/2 he interprets it as “yes”; otherwise, he
interprets it as “no”.

1. Find the probability that Donald understands Joe correctly.



2. What happens to the result from question 1 if σ is very small? What about if σ is very large?
Explain intuitively why the results in these extreme cases make sense.

Answer:

1. Let a be the value that Joe sends and ϵ be the noise, so B = a + ϵ is what Donald receives. If
a = 1, then Donald will understand correctly if and only if ϵ > −1/2. If a = 0, then Donald will
understand correctly if and only if ϵ ≤ 1/2. By symmetry of the Normal, P (ϵ > −1/2) = P (ϵ ≤
1/2), so the probability that Donald understands does not depend on a. This probability is
     
P(ϵ ≤ 1/2) = P(ϵ/σ ≤ 1/(2σ)) = Φ(1/(2σ)).
2. If σ is very small, then Φ(1/(2σ)) ≈ 1, since Φ(x) (like any CDF) goes to 1 as x → ∞. This makes sense intuitively: when the standard deviation σ is very small, the noise ϵ is unlikely to take values very different from 0 (i.e., there is very little noise), so it is easy for Donald to understand Joe. If σ is very large, then Φ(1/(2σ)) ≈ Φ(0) = 1/2.

Again this makes sense intuitively: if there is a huge amount of noise, then Joe’s message will get
drowned out (the noise dominates over the signal).
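To see the two regimes numerically, one can evaluate Φ(1/(2σ)) over a range of σ values; a minimal sketch, assuming SciPy is available:

```python
from scipy.stats import norm

# P(correct) = Phi(1/(2*sigma)); small sigma -> close to 1, large sigma -> close to 1/2.
for sigma in [0.05, 0.2, 1.0, 5.0, 50.0]:
    print(f"sigma = {sigma:6.2f}:  P(correct) = {norm.cdf(1 / (2 * sigma)):.4f}")
```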

Exercise 3. In studies of anticancer drugs it was found that if mice are injected with cancer cells, the
survival time (in hours) can be modeled with the exponential distribution Expo(λ) with λ = 0.1. What
is the probability that a randomly selected mouse will survive at least 8 hours? At most 12 hours?
Between 8 and 12 hours?

Answer: Let T ∼ Expo(0.1) be the survival time (in hours) of a randomly selected mouse. Then P(T ≥ 8) = 1 − (1 − e^{−0.1×8}) = e^{−0.1×8} ≈ 0.4493 and P(T ≤ 12) = 1 − e^{−0.1×12} ≈ 0.6988. Combining these two answers, P(8 ≤ T ≤ 12) = P(T ≤ 12) − P(T < 8) = 0.6988 − (1 − 0.4493) = 0.1481.
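The same three probabilities can be checked against the Exponential CDF directly; a minimal sketch, assuming SciPy is available:

```python
from scipy.stats import expon

T = expon(scale=1 / 0.1)   # Expo(lambda = 0.1) has mean 1/lambda = 10 hours
print("P(T >= 8)       =", T.sf(8))                # survival function: ~0.4493
print("P(T <= 12)      =", T.cdf(12))              # ~0.6988
print("P(8 <= T <= 12) =", T.cdf(12) - T.cdf(8))   # ~0.1481
```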

Exercise 4. Let T be the lifetime of a certain person (how long that person lives), and let T have
CDF F and PDF f . The survival function G is defined as

G(t) = P (T > t) = 1 − F (t),

which is the probability of survival at time t. The hazard function of T is defined by


h(t) = f(t)/G(t) = f(t)/(1 − F(t)).

1. Explain why h is called the hazard function and in particular, why h(t) is the probability density
for death at time t, given that the person survived up until then.
2. Show that the CDF and PDF are related to the hazard function by

F(t) = 1 − exp(−∫_0^t h(s) ds),   f(t) = h(t) exp(−∫_0^t h(s) ds),   ∀t > 0.

3. Show that an Exponential r.v. has constant hazard function and conversely, if the hazard function
of T is a constant then T must be Expo(λ) for some λ.
4. What is the hazard function of a Weibull(γ, λ) distribution? (A Weibull(γ, λ) distribution has CDF F(x) = 1 − exp(−(λx)^γ) for x ≥ 0 and F(x) = 0 otherwise, where γ > 0 is a parameter.)

Answer:

1. Given that T ≥ t0, the conditional CDF of T is

P(T ≤ t | T ≥ t0) = P(t0 ≤ T ≤ t)/P(T ≥ t0) = (F(t) − F(t0))/(1 − F(t0))

for t ≥ t0 (and 0 otherwise). So the conditional PDF of T given T ≥ t0 is

f(t | T ≥ t0) = f(t)/(1 − F(t0)),   for t ≥ t0.

The conditional PDF of T at t0, given that the person survived up until t0, is then f(t0)/(1 − F(t0)) = h(t0). Alternatively, we can use the hybrid form of Bayes' rule:

f(t | T ≥ t) = P(T ≥ t | T = t) f(t)/P(T ≥ t) = f(t)/(1 − F(t)).
In the last display, densities are treated like probabilities in the application of Bayes' rule. This is actually legitimate; we will see more about this in the next chapter.
Thus, h(t) gives the probability density of death at time t given that the person has survived up
until then. For a very small ϵ > 0, the probability of dying before t0 + ϵ given survival up to t0 is
P(T ≤ t0 + ϵ | T ≥ t0) = ∫_{t0}^{t0+ϵ} f(t | T ≥ t0) dt ≈ h(t0) ϵ.

Thus the hazard function gives a natural way to measure the instantaneous hazard of death at
some time since it accounts for the person having survived up until that time.
2. Since G(s) = 1 − F(s) for any s > 0, we have

h(s) = f(s)/G(s) = F′(s)/G(s) = −G′(s)/G(s).

Integrating both sides with respect to s from 0 to t gives, for any t > 0,

∫_0^t h(s) ds = −∫_0^t G′(s)/G(s) ds = −∫_0^t d log G(s).

Obviously,

∫_0^t d log G(s) = log G(t) − log G(0) = log G(t),

since G(0) = 1 − F(0) = 1.


Thus,

log(1 − F(t)) = log G(t) = −∫_0^t h(s) ds,

which shows that

F(t) = 1 − exp(−∫_0^t h(s) ds).

Differentiating both sides of the above equation, we have

f(t) = h(t) exp(−∫_0^t h(s) ds),

since (d/dt) ∫_0^t h(s) ds = h(t) (by the fundamental theorem of calculus).
3. Let T ∼ Expo(λ). Then the hazard function is

h(t) = λe^{−λt}/e^{−λt} = λ,   ∀t > 0,

which is a constant.
Conversely, suppose that h(t) = λ for all t, with λ > 0 a constant. According to the conclusion
in Question 2, we have
F(t) = 1 − exp(−∫_0^t h(s) ds) = 1 − exp(−λt).

Thus, T ∼ Expo(λ).
4. The CDF and PDF of the Weibull(γ, λ) distribution are

F(t) = 1 − exp(−(λt)^γ),

f(t) = λγ(λt)^{γ−1} exp(−(λt)^γ).

So we have

h(t) = f(t)/(1 − F(t)) = λγ(λt)^{γ−1} exp(−(λt)^γ)/exp(−(λt)^γ) = λγ(λt)^{γ−1}.

We can observe that when γ = 1, the hazard rate is a constant so the Weibull distribution reduces
to an Exponential distribution. When γ > 1, the hazard function increases with t. This can be
used to model the wear-and-tear effects in survival times. For example, let the lifetime of a certain
machine be a random variable X. If X has an Exponential distribution, then X is memoryless
and has a constant hazard rate. This means that no matter how long the machine has been used,
its remaining lifetime has the same distribution as that of a new machine. However, in practice, the machine often ages as it is used, so the risk of breakdown should increase as time goes by. This can be captured by an increasing hazard function.
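A numerical sanity check of the Weibull hazard formula is straightforward: compare f(t)/(1 − F(t)) against λγ(λt)^{γ−1} on a grid of t values. A minimal sketch, assuming NumPy is available; the values λ = 2 and γ = 1.5 are arbitrary choices for illustration.

```python
import numpy as np

lam, gamma = 2.0, 1.5               # arbitrary illustrative parameters
t = np.linspace(0.1, 3.0, 6)

# Weibull(gamma, lambda) as parameterized in the exercise: F(t) = 1 - exp(-(lam*t)^gamma).
F = 1 - np.exp(-(lam * t) ** gamma)
f = lam * gamma * (lam * t) ** (gamma - 1) * np.exp(-(lam * t) ** gamma)

hazard_from_def = f / (1 - F)                          # h(t) = f(t) / (1 - F(t))
hazard_formula  = lam * gamma * (lam * t) ** (gamma - 1)

print(np.allclose(hazard_from_def, hazard_formula))    # True
```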

4.2 Universality of the Uniform Distribution

Exercise 5. Let F be a CDF which is a continuous function and strictly increasing on the support
of the distribution. This ensures that the inverse function F −1 exists, as a function from (0, 1) to R.
Prove the following statements:

1. Let U ∼ Unif(0, 1) and X = F^{−1}(U). Then X is an r.v. with CDF F.


2. Let X be an r.v. with CDF F . Then F (X) ∼ Unif(0, 1).

Answer:

1. Let U ∼ Unif(0, 1) and X = F^{−1}(U). For all real x,

P(X ≤ x) = P(F^{−1}(U) ≤ x) = P(U ≤ F(x)) = F(x),

so the CDF of X is F, as claimed. For the last equality, we used the fact that P(U ≤ u) = u for u ∈ (0, 1) since U ∼ Unif(0, 1).
2. Let X have CDF F , and find the CDF of Y = F (X). Since F (·) takes values in (0, 1), P (Y ≤ y)
equals 0 for y ≤ 0 and equals 1 for y ≥ 1. For any y ∈ (0, 1),
 
P(Y ≤ y) = P(F(X) ≤ y) = P(X ≤ F^{−1}(y)) = F(F^{−1}(y)) = y.

Thus Y has the Unif(0, 1) CDF.


Note that there is a potential notational confusion: F (x) = P (X ≤ x) by definition, but it would
be incorrect to say “F (X) = P (X ≤ X) = 1”. Rather, we should first find an expression for the
CDF F (x) as a function of x, then replace x with X to obtain an r.v. For example, if the CDF
of X is F (x) = 1 − e−x for x > 0, then F (X) = 1 − e−X when X > 0.
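Both directions of the universality property are easy to check by simulation. The sketch below, assuming NumPy is available, uses the Expo(1) CDF F(x) = 1 − e^{−x}, for which F^{−1}(u) = −log(1 − u).

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.uniform(size=10**6)

# Direction 1: X = F^{-1}(U) should be Expo(1).  Check P(X <= 1) against F(1) = 1 - e^{-1}.
x = -np.log(1 - u)
print(np.mean(x <= 1.0), 1 - np.exp(-1.0))     # both ~0.632

# Direction 2: if X ~ Expo(1), then F(X) = 1 - e^{-X} should be Unif(0, 1).
x = rng.exponential(size=10**6)
v = 1 - np.exp(-x)
print(np.mean(v <= 0.3), np.mean(v <= 0.7))    # ~0.3 and ~0.7
```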

Exercise 6. Suppose that a random variable U ∼ Unif(0, 1). Based on the results in Exercise 5, discuss how you can construct a random variable X with the following CDFs from U:

1. F_X(x) = e^x/(1 + e^x) for any x ∈ R. (This is called the Logistic distribution.)


2. F_X(x) = 1 − e^{−x²/2} for any x > 0. (This is called the Rayleigh distribution.)

3. F_X(x) = 1 − e^{−λx} for x > 0. (This is the Exponential distribution with parameter λ.)

Answer:

1. Suppose we have U ∼ Unif(0, 1) and wish to generate a Logistic random variable. Statement 1
in Exercise 5 says that F −1 (U ) ∼ Logistic if F is the CDF of the Logistic distribution. Thus we
first invert the CDF to get F −1 :
 
F^{−1}(u) = log(u/(1 − u)).

Then we plug in U for u:

F^{−1}(U) = log(U/(1 − U)).

Therefore log(U/(1 − U)) ∼ Logistic. We can verify directly that log(U/(1 − U)) has the required CDF: start from the definition of the CDF, do some algebra to isolate U on one side of the inequality, and then use the CDF of the Uniform distribution. Let's work through these calculations once for practice:
     
P(log(U/(1 − U)) ≤ x) = P(U/(1 − U) ≤ e^x)
= P(U ≤ e^x(1 − U))
= P(U ≤ e^x/(1 + e^x))
= e^x/(1 + e^x),

which is indeed the Logistic CDF.
2. The quantile function (the inverse of the CDF) is F^{−1}(u) = √(−2 log(1 − u)), so if U ∼ Unif(0, 1), then F^{−1}(U) = √(−2 log(1 − U)) ∼ Rayleigh.
3. The quantile function (the inverse of the CDF) is F^{−1}(u) = −log(1 − u)/λ, so if U ∼ Unif(0, 1), then F^{−1}(U) = −log(1 − U)/λ has the Exponential distribution with parameter λ.
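The three constructions translate directly into code; a minimal sketch, assuming NumPy is available, where λ = 2 is an arbitrary rate chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.uniform(size=10**6)
lam = 2.0   # arbitrary rate for the Exponential example

logistic    = np.log(u / (1 - u))          # F^{-1}(u) = log(u / (1 - u))
rayleigh    = np.sqrt(-2 * np.log(1 - u))  # F^{-1}(u) = sqrt(-2 log(1 - u))
exponential = -np.log(1 - u) / lam         # F^{-1}(u) = -log(1 - u) / lam

# Spot-check each empirical CDF at one point against the target CDF.
print(np.mean(logistic <= 1.0),    np.exp(1) / (1 + np.exp(1)))   # ~0.731
print(np.mean(rayleigh <= 1.0),    1 - np.exp(-0.5))              # ~0.393
print(np.mean(exponential <= 1.0), 1 - np.exp(-lam))              # ~0.865
```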

Exercise 7. In Exercises 5 and 6, we considered continuous r.v.s X. Discuss how to construct, from U ∼ Unif(0, 1), a discrete random variable X taking values in {0, . . . , n} with P(X = j) = p_j, where Σ_{j=0}^n p_j = 1.

Figure 4.1: Given a PMF, chop up the interval (0, 1) into pieces, with lengths given by the PMF values.

Answer: Suppose we want to use U ∼ Unif(0, 1) to construct a discrete r.v. X with pj = P (X = j) for
j = 0, 1, 2, . . . , n. As illustrated in Figure 4.1, we can chop up the interval (0, 1) into pieces
of lengths p0 , p1 , . . . , pn . By the properties of a valid PMF, the sum of the pj ’s is 1, so this perfectly
divides up the interval, without overshooting or undershooting.

Now define X to be the r.v. which equals 0 if U falls into the p0 interval, 1 if U falls into the p1 interval,
2 if U falls into the p2 interval, and so on. Then X is a discrete r.v. taking on values 0 through n.
The probability that X = j is the probability that U falls into the interval of length pj . But for a Unif
(0, 1) r.v., the probability that this r.v. falls into an interval in (0, 1) is just the length of this interval,
so P (X = j) is precisely pj , as desired!
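This construction is exactly how discrete sampling from a PMF is often implemented: take cumulative sums of the p_j's and find which piece of (0, 1) each uniform draw falls into. A minimal sketch, assuming NumPy is available, with an arbitrary example PMF:

```python
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.1, 0.2, 0.4, 0.2, 0.1])   # example PMF on {0, 1, 2, 3, 4}; sums to 1

# Right endpoints of the pieces of (0, 1): (0, p0], (p0, p0+p1], ...
cuts = np.cumsum(p)

u = rng.uniform(size=10**6)
x = np.searchsorted(cuts, u)               # index of the piece containing each U

# The empirical frequencies should match the PMF.
print(np.bincount(x, minlength=len(p)) / len(u))
```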

4.3 Functions of a Random Variable

4.3.1 Review

A general approach to finding the PDF of Y = g(X) for continuous X:

• Find FY (y) by converting {Y ≤ y} to {X ∈ {x : g(x) ≤ y}}.

• Differentiate FY (y) to obtain the PDF fY (y).

For a continuous and strictly monotonic transformation, we can also apply the following change-of-
variable formula.

Theorem 4.6 (Change of variables). Let X be a continuous r.v. with PDF fX , and let Y = g(X),
where g is differentiable and strictly increasing (or strictly decreasing) over the support of X. Then the
PDF of Y is given by
f_Y(y) = f_X(x) |dx/dy|,

where x = g^{−1}(y). The support of Y is all g(x) with x in the support of X, namely Supp(Y) = {g(x) : x ∈ Supp(X)}.

4.3.2 Exercise

Exercise 8. If the length of a side of a square X is random with the PDF fX (x) = x/8, 0 < x < 4,
and Y is the area of the square, find the PDF of Y .

Answer: Given the area of the square Y = X², we first have, for 0 < y < 16,

F_Y(y) = P(X² ≤ y) = P(X ≤ √y) = F_X(√y) = ∫_0^{√y} (x/8) dx = y/16.

Thus, the PDF of Y is f_Y(y) = F_Y′(y) = 1/16 for 0 < y < 16. That is, the area Y is uniform on (0, 16).
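A simulation can confirm that the area is Uniform on (0, 16). By the universality result of Exercise 5, a side length X with CDF F_X(x) = x²/16 can be generated as X = 4√U; a minimal sketch, assuming NumPy is available:

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.uniform(size=10**6)

x = 4 * np.sqrt(u)    # side length with PDF f_X(x) = x/8 on (0, 4), via inverse CDF x = 4*sqrt(u)
y = x ** 2            # area of the square

# If Y ~ Unif(0, 16), then P(Y <= c) = c/16.
for c in [4, 8, 12]:
    print(c, np.mean(y <= c), c / 16)
```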

Exercise 9. Find the PDF of the random variable Y in the following settings:

1. Y = |X| for X ∼ Unif[−1, 1].


2. Y = X² for X ∼ Unif[−1, 1].
3. Y = X² for X ∼ Unif[−1, 3].

Answer:

1. Given that X is uniformly distributed on the interval [−1, 1], the probability density function (PDF) of X is f_X(x) = 1/2 for −1 ≤ x ≤ 1, and f_X(x) = 0 otherwise.
To find the distribution of Y = |X|, we consider the range of Y :

For x ∈ [−1, 0], y = |x| ranges from 1 to 0.


For x ∈ [0, 1], y = |x| ranges from 0 to 1.

Thus, the range of Y is [0, 1]. Now, we can derive the CDF of Y : for 0 ≤ y ≤ 1,
F_Y(y) = P(|X| ≤ y) = P(−y ≤ X ≤ y) = 2y × (1/2) = y.
Thus the CDF of Y is F_Y(y) = 0 for y < 0, F_Y(y) = y for 0 ≤ y < 1, and F_Y(y) = 1 for y ≥ 1.

Differentiating the CDF to find the PDF: f_Y(y) = F_Y′(y) = 1 for 0 < y < 1. So the PDF of Y is f_Y(y) = 1 for 0 ≤ y ≤ 1, and f_Y(y) = 0 otherwise, which means that the random variable Y is uniformly distributed on the interval [0, 1].
2. Given that X is uniformly distributed on the interval [−1, 1], the probability density function (PDF) of X is f_X(x) = 1/2 for −1 ≤ x ≤ 1, and f_X(x) = 0 otherwise.

To find the distribution of Y = X², we consider the range of Y:

For x ∈ [−1, 0], y = x² ranges from 1 to 0.

For x ∈ [0, 1], y = x² ranges from 0 to 1.

Thus, the range of Y is [0, 1]. Now, we can derive the CDF of Y: for 0 ≤ y ≤ 1,

F_Y(y) = P(X² ≤ y) = P(−√y ≤ X ≤ √y) = 2√y × (1/2) = √y.
Thus the CDF of Y is F_Y(y) = 0 for y < 0, F_Y(y) = √y for 0 ≤ y < 1, and F_Y(y) = 1 for y ≥ 1.

Consequently, for 0 < y ≤ 1, the PDF of Y is f_Y(y) = F_Y′(y) = 1/(2√y). So the PDF of the random variable Y is f_Y(y) = 1/(2y^{1/2}) for 0 < y ≤ 1, and f_Y(y) = 0 otherwise.
3. Given that X is uniformly distributed on the interval [−1, 3], the probability density function (PDF) of X is f_X(x) = 1/4 for −1 ≤ x ≤ 3, and f_X(x) = 0 otherwise.

To find the distribution of Y = X², we consider the range of Y:

For x ∈ [−1, 0], y = x² ranges from 1 to 0.

For x ∈ [0, 3], y = x² ranges from 0 to 9.

Thus, the range of Y is [0, 9].


Now, the cumulative distribution function (CDF) of Y is:

F_Y(y) = P(Y ≤ y) = P(X² ≤ y).

For 0 ≤ y < 1:
F_Y(y) = P(X² ≤ y) = P(−√y ≤ X ≤ √y) = 2√y × (1/4) = √y/2.

For 1 ≤ y ≤ 9:
F_Y(y) = P(X² ≤ y) = P(X ≤ √y) = (√y + 1)/4.
Thus, the CDF is F_Y(y) = 0 for y < 0; F_Y(y) = √y/2 for 0 ≤ y < 1; F_Y(y) = (√y + 1)/4 for 1 ≤ y ≤ 9; and F_Y(y) = 1 for y > 9.

Differentiating the CDF to find the PDF: for 0 < y < 1, f_Y(y) = 1/(4y^{1/2}); for 1 ≤ y ≤ 9, f_Y(y) = 1/(8y^{1/2}). So the PDF is f_Y(y) = 1/(4y^{1/2}) for 0 < y < 1, f_Y(y) = 1/(8y^{1/2}) for 1 ≤ y ≤ 9, and f_Y(y) = 0 otherwise.
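A quick simulation check of the piecewise CDF in part 3; a minimal sketch, assuming NumPy is available:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 3, size=10**6)
y = x ** 2

def F_Y(y):
    # Piecewise CDF derived above for Y = X^2 with X ~ Unif[-1, 3].
    if y < 0:
        return 0.0
    if y < 1:
        return np.sqrt(y) / 2
    if y <= 9:
        return (np.sqrt(y) + 1) / 4
    return 1.0

for c in [0.25, 0.5, 2.0, 5.0, 8.0]:
    print(c, np.mean(y <= c), F_Y(c))   # empirical vs. exact CDF
```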

4.4 Expectation and Variance

4.4.1 Review

Definition 4.7 (Expectation of a discrete r.v.). The expected value (also called the expectation
or mean) of a discrete r.v. X with distinct possible values x1 , x2 , . . . is defined by

E(X) = Σ_{j=1}^∞ x_j P(X = x_j).

If the support is finite, then this is replaced by a finite sum. We can also write E(X) = Σ_{x ∈ Supp(X)} x p_X(x), where each term is a possible value x times the PMF evaluated at x.

Definition 4.8 (Expectation of a continuous r.v.). The expected value of a continuous r.v. X with
PDF f is
E(X) = ∫_{−∞}^∞ x f(x) dx.

Theorem 4.9 (Linearity of expectation). For any r.v.s X, Y (whose expectations exist) and any
constant c,

E(X + Y ) = E(X) + E(Y ),


E(cX) = cE(X)

Theorem 4.10 (Law of the unconscious statistician (LOTUS)). The expected value or mean of a
random variable g(X), denoted by E(g(X)), is
E(g(X)) = ∫_{−∞}^∞ g(x) f_X(x) dx if X is continuous, and E(g(X)) = Σ_{x ∈ Supp(X)} g(x) p_X(x) = Σ_{x ∈ Supp(X)} g(x) P(X = x) if X is discrete.

Definition 4.11 (Variance and standard deviation). The variance of an r.v. X is

Var(X) = E(X − EX)².

The square root of the variance is called the standard deviation:


SD(X) = √Var(X).

Theorem 4.12 (Variance shortcut formula). For any r.v. X,



Var(X) = E(X²) − (EX)².

Proposition 4.13 (Properties of Variance). For any constants a, b,

Var(aX + b) = a² Var(X),   SD(aX + b) = |a| SD(X).

4.4.2 Exercise

Exercise 10. The PMF for X = the number of major defects on a randomly selected appliance of a
certain type is
x 0 1 2 3 4
p(x) 0.08 0.15 0.45 0.27 0.05
Compute the following:

1. E(X).
2. V (X) directly from the definition.
3. The standard deviation of X.
4. E(X 2 ) and E(X 3 ).
5. V (X) using the shortcut formula.

Answer:
1. E(X) = Σ_{x=0}^4 x p(x) = 0 × 0.08 + 1 × 0.15 + 2 × 0.45 + 3 × 0.27 + 4 × 0.05 = 2.06;
2. V(X) = Σ_{x=0}^4 (x − E(X))² p(x) = (0 − 2.06)² × 0.08 + (1 − 2.06)² × 0.15 + (2 − 2.06)² × 0.45 + (3 − 2.06)² × 0.27 + (4 − 2.06)² × 0.05 = 0.9364;
3. SD(X) = √V(X) = √0.9364 = 0.9677;
4. E(X²) = Σ_{x=0}^4 x² p(x) = 0² × 0.08 + 1² × 0.15 + 2² × 0.45 + 3² × 0.27 + 4² × 0.05 = 5.18;
E(X³) = Σ_{x=0}^4 x³ p(x) = 0³ × 0.08 + 1³ × 0.15 + 2³ × 0.45 + 3³ × 0.27 + 4³ × 0.05 = 14.24;
5. Using the shortcut formula, V(X) = E(X²) − (E(X))² = 5.18 − 2.06² = 0.9364, the same answer as in part 2.
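The same arithmetic can be reproduced in a few lines; a minimal sketch, assuming NumPy is available:

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4])
p = np.array([0.08, 0.15, 0.45, 0.27, 0.05])

EX  = np.sum(x * p)                    # 2.06
VX  = np.sum((x - EX) ** 2 * p)        # 0.9364 (definition)
SD  = np.sqrt(VX)                      # 0.9677
EX2 = np.sum(x ** 2 * p)               # 5.18
EX3 = np.sum(x ** 3 * p)               # 14.24
print(EX, VX, SD, EX2, EX3, EX2 - EX ** 2)   # shortcut gives 0.9364 again
```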

Exercise 11. Let Z ∼ N (0, 1), and c be a nonnegative constant. Find E(max(Z, 0)) and E(min(Z, 0)),
in terms of the standard Normal CDF Φ and PDF φ.

Answer: By LOTUS,

E(max(Z, 0)) = ∫_{−∞}^∞ max(z, 0) φ(z) dz = ∫_0^∞ z φ(z) dz = [−(1/√(2π)) e^{−z²/2}]_0^∞ = 1/√(2π).

E(min(Z, 0)) = ∫_{−∞}^∞ min(z, 0) φ(z) dz = ∫_{−∞}^0 z φ(z) dz = ∫_{−∞}^∞ z φ(z) dz − ∫_0^∞ z φ(z) dz = 0 − 1/√(2π) = −1/√(2π).

Remark: we can use these two results to obtain E(|Z|). Note that |Z| = max(Z, 0) − min(Z, 0). So

E(|Z|) = E(max(Z, 0) − min(Z, 0)) = E(max(Z, 0)) − E(min(Z, 0)) = √(2/π).
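A Monte Carlo check of the three expectations; a minimal sketch, assuming NumPy is available:

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.standard_normal(10**7)

print(np.mean(np.maximum(z, 0)), 1 / np.sqrt(2 * np.pi))    # ~0.3989
print(np.mean(np.minimum(z, 0)), -1 / np.sqrt(2 * np.pi))   # ~-0.3989
print(np.mean(np.abs(z)), np.sqrt(2 / np.pi))               # ~0.7979
```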

4.5 Moment Generating Function

4.5.1 Review

Definition 4.14 (Moment generating function). The moment generating function (MGF) of an r.v.

X is M(t) = E(e^{tX}), as a function of t, if this is finite on some open interval (−a, a) containing 0.
Otherwise we say the MGF of X does not exist.

Proposition 4.15 (MGF of location-scale transformation). If X has MGF M (t), then the MGF of
a + bX is
  
E(e^{t(a+bX)}) = E(e^{at} e^{btX}) = e^{at} E(e^{btX}) = e^{at} M(bt).

Theorem 4.16 (Moments via derivatives of the MGF). Given the MGF of X, we can get the nth
moment of X by evaluating the nth derivative of the MGF at 0:
E(X^n) = M^{(n)}(0) = (d^n/dt^n) M(t)|_{t=0}.
Theorem 4.17 (MGF determines the distribution). The MGF of a random variable determines its
distribution: if two r.v.s have the same MGF, they must have the same distribution.

Mathematically, let X, Y be two r.v.s with MGFs MX , MY and CDFs FX , FY . If MX (t) = MY (t) for
all t in an interval (−a, a) containing 0, then FX (u) = FY (u) for all u.

• The fact that an MGF uniquely determines a distribution is very useful for

– finding the distribution of the sum of multiple independent random variables


– establishing the Central Limit Theorem.

4.5.2 Exercises

Exercise 12. Let X ∼ Expo(λ). Find E(X³) in the following two ways:

1. Use LOTUS, the facts that E(X) = 1/λ and Var(X) = 1/λ², and integration by parts.

2. Use the MGF of X.

Answer:

1. By LOTUS, we have:

E(X³) = ∫_0^∞ x³ λe^{−λx} dx
= −∫_0^∞ x³ de^{−λx}
= −[x³ e^{−λx}]_0^∞ + 3 ∫_0^∞ x² e^{−λx} dx   (integration by parts)
= (3/λ) ∫_0^∞ x² λe^{−λx} dx
= (3/λ) E(X²) = (3/λ)(Var(X) + (EX)²) = 6/λ³.
2. The MGF of an Exponential random variable with rate parameter λ is

M(t) = E(e^{tX}) = ∫_0^∞ e^{tx} λe^{−λx} dx
= λ ∫_0^∞ e^{−(λ−t)x} dx
= (λ/(λ − t)) ∫_0^∞ (λ − t) e^{−(λ−t)x} dx
= λ/(λ − t),   ∀t < λ.

Here we use the fact that ∫_0^∞ (λ − t) e^{−(λ−t)x} dx = 1 for any t < λ. You can easily verify this, or you can simply think of (λ − t) e^{−(λ−t)x} as the density function of Expo(λ − t), which is valid since λ − t > 0. Thus there is an open interval containing 0 on which M(t) is finite.
To get the third moment, we can take the third derivative of the MGF and evaluate at t = 0:

E(X³) = d³M(t)/dt³ |_{t=0} = (6/λ³)(1 − t/λ)^{−4} |_{t=0} = 6/λ³.

But a much nicer way to use the MGF here is via pattern recognition: note that M (t) looks like
it came from a geometric series:
M(t) = λ/(λ − t) = 1/(1 − t/λ) = Σ_{n=0}^∞ (t/λ)^n = Σ_{n=0}^∞ (n!/λ^n)(t^n/n!),   ∀|t| < λ.

Recall that the Taylor expansion of M(t) around t = 0 is

M(t) = Σ_{n=0}^∞ (d^n/dt^n) M(t)|_{t=0} · (t^n/n!).

Thus the coefficient of t^n/n! here is exactly (d^n/dt^n) M(t)|_{t=0}, namely the nth moment of X. Therefore, we have E(X^n) = n!/λ^n for all nonnegative integers n. So again we get E(X³) = 6/λ³. This method not only avoids the need to compute the third derivative of M(t) directly, but also gives us all the moments of X.
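Both routes can be reproduced symbolically; a minimal sketch, assuming SymPy is available, computing E(X³) by LOTUS and by the third derivative of the MGF:

```python
import sympy as sp

x = sp.symbols("x", positive=True)
t = sp.symbols("t", real=True)
lam = sp.symbols("lam", positive=True)

# LOTUS: E(X^3) = integral of x^3 * lam * exp(-lam*x) over (0, infinity).
lotus = sp.integrate(x**3 * lam * sp.exp(-lam * x), (x, 0, sp.oo))
print(sp.simplify(lotus))                           # 6/lam**3

# MGF route: M(t) = lam / (lam - t); E(X^3) = M'''(0).
M = lam / (lam - t)
print(sp.simplify(sp.diff(M, t, 3).subs(t, 0)))     # 6/lam**3
```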

Exercise 13. Let M(t) be the moment generating function (MGF) of a random variable X. The cumulant generating function is defined to be g(t) = ln M(t). Expanding g(t) as a Taylor series

g(t) = Σ_{j=1}^∞ (c_j/j!) t^j

(the summation starts at j = 1 because g(0) = 0), the coefficient c_j is called the j-th cumulant of X. Find the j-th cumulant of X, for all j ≥ 1.

1. Prove that g(0) = 0, g′(0) = c₁ = E[X], and g″(0) = c₂ = Var(X).


2. Now consider a Poisson random variable X ∼ Pois(λ). Derive the cumulant generating function of X and find the j-th cumulant of X, for all j ≥ 1.

Answer:
1. g(0) = ln M(0) = ln E(e^{0·X}) = 0; g′(0) = M′(0)/M(0) = E[X]/1 = E[X]; and
g″(0) = (M″(0)M(0) − (M′(0))²)/(M(0))² = E[X²] − (E[X])² = Var(X).

2. Recall that the probability mass function (PMF) of the Poisson distribution with parameter λ > 0 is P(X = k) = (λ^k/k!) e^{−λ}. Then, by definition, the MGF of X has the form

M(t) = E[e^{tX}] = Σ_{k=0}^∞ e^{tk} (λ^k/k!) e^{−λ} = e^{−λ} Σ_{k=0}^∞ (e^t λ)^k/k! = e^{−λ} e^{λe^t} = e^{λ(e^t − 1)}.

Therefore, the cumulant generating function g(t) = ln M(t) can be expanded as a Taylor series of the form

g(t) = ln M(t) = λ(e^t − 1) = λ Σ_{k=1}^∞ t^k/k!,

which means that the k-th cumulant is c_k = λ for all k ≥ 1.
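The cumulants can also be read off symbolically by differentiating g(t) = ln M(t) at 0; a minimal sketch, assuming SymPy is available, using the Poisson MGF M(t) = exp(λ(e^t − 1)):

```python
import sympy as sp

t = sp.symbols("t", real=True)
lam = sp.symbols("lam", positive=True)

M = sp.exp(lam * (sp.exp(t) - 1))   # MGF of Pois(lam)
g = sp.log(M)                        # cumulant generating function

# The j-th cumulant is the j-th derivative of g at t = 0; for the Poisson it is always lam.
for j in range(1, 6):
    print(j, sp.simplify(sp.diff(g, t, j).subs(t, 0)))
```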
