Continuous Distributions
In order for P(a ≤ X ≤ b) to be nonnegative for all a and b and for P(−∞ < X < ∞) = 1 we must have

f(x) ≥ 0  and  ∫_{−∞}^{∞} f(x) dx = 1        (5.2)
The simplest example is the uniform distribution on (a, b). The idea here is that we are picking a value “at random” from (a, b). That is, values outside the interval are impossible, and all those inside have the same probability (density).
If we set f(x) = c when a < x < b and 0 otherwise, then

∫ f(x) dx = ∫_a^b c dx = c(b − a)

so to make the integral equal to 1 we must take c = 1/(b − a).
Power law distributions (see Example 5.13 below, where f(x) = (ρ − 1)x^{−ρ} for x ≥ 1) are often used in situations where P(X > x) does not go to 0 very fast as x → ∞. For example, the Italian economist Pareto used them to describe the distribution of family incomes.
Distribution functions
Any random variable (discrete, continuous, or in between) has a distribution function defined by F(x) = P(X ≤ x). If X has a density function f(x) then

F(x) = P(−∞ < X ≤ x) = ∫_{−∞}^{x} f(y) dy
Since {X ≤ b} is the disjoint union of {X ≤ a} and {a < X ≤ b},

P(X ≤ b) = P(X ≤ a) + P(a < X ≤ b)

or, rearranging,

P(a < X ≤ b) = F(b) − F(a)        (5.3)

The last formula is valid for any random variable. When X has density function f, it says that

∫_a^b f(x) dx = F(b) − F(a)
i.e., the integral can be evaluated by taking the difference of the antiderivative
at the two endpoints.
To see what distribution functions look like, and to explain the use of (5.3),
we return to our examples.
Example 5.4. Uniform distribution. f(x) = 1/(b − a) for a < x < b.

F(x) = 0                   x ≤ a
       (x − a)/(b − a)     a ≤ x ≤ b
       1                   x ≥ b
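To make this concrete, here is a short Python simulation sketch (the endpoints a = 2, b = 5 are arbitrary illustrative choices) comparing this formula for F with the empirical distribution function of simulated uniform draws.

```python
import numpy as np

def uniform_cdf(x, a, b):
    """Distribution function of the uniform distribution on (a, b)."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

rng = np.random.default_rng(0)
a, b = 2.0, 5.0                              # arbitrary illustrative endpoints
samples = rng.uniform(a, b, size=100_000)
for x in [1.0, 3.0, 4.5, 6.0]:
    empirical = np.mean(samples <= x)          # fraction of draws that are <= x
    print(x, uniform_cdf(x, a, b), empirical)  # the two values should be close
```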
For the exponential density f(x) = λe^{−λx}, x ≥ 0 (and 0 otherwise), the first line of the answer, F(x) = 0 for x ≤ 0, is easy to see: since P(X > 0) = 1 we have P(X ≤ x) = 0 for x ≤ 0. For x ≥ 0 we compute

P(X ≤ x) = ∫_{−∞}^{x} f(y) dy = ∫_0^x λe^{−λy} dy
         = −e^{−λy} |_0^x = −e^{−λx} − (−1) = 1 − e^{−λx}
P(T > t + s | T > t) = P(T > t + s)/P(T > t) = e^{−λ(t+s)}/e^{−λt} = e^{−λs} = P(T > s)
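A quick Monte Carlo check of this lack-of-memory property (the values of λ, t, s below are arbitrary illustrative choices):

```python
import numpy as np

# Estimate P(T > t+s | T > t) and compare it with P(T > s) = exp(-lam*s).
rng = np.random.default_rng(1)
lam, t, s = 0.5, 2.0, 3.0
T = rng.exponential(scale=1/lam, size=1_000_000)

cond = np.mean(T > t + s) / np.mean(T > t)     # estimate of P(T > t+s | T > t)
print(cond, np.mean(T > s), np.exp(-lam * s))  # all three should be close
```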
Similarly, for the power law density f(x) = (ρ − 1)x^{−ρ}, x ≥ 1 (and 0 otherwise), the first line of the answer, F(x) = 0 for x ≤ 1, is easy to see: since P(X > 1) = 1, we have P(X ≤ x) = 0 for x ≤ 1. For x ≥ 1 we compute

P(X ≤ x) = ∫_{−∞}^{x} f(y) dy = ∫_1^x (ρ − 1) y^{−ρ} dy
         = −y^{−(ρ−1)} |_1^x = 1 − x^{−(ρ−1)}
Medians
Intuitively, the median is the place where F (x) crosses 1/2. The precise
definition we are about to give is complicated by the fact that {x : F (x) = 1/2}
may be empty or contain more than one point.
m is a median for F if P (X ≤ m) ≥ 1/2 and P (X ≥ m) ≥ 1/2.
A more computationally useful form is: let m1 = min{x : P(X ≤ x) ≥ 1/2} and m2 = max{x : P(X ≥ x) ≥ 1/2}. Then m is a median if and only if m1 ≤ m ≤ m2.
We begin with a simple example and then consider two that illustrate the
problems that can arise.
This is the only median, since if x < 2 then P (X ≤ x) ≤ P (X < 2) ≤ 1/3 and
if x > 2 then P (X ≥ x) ≤ P (X > 2) = 1/3.
Example 5.10. Suppose X takes values 1, 2, 3, 4 with probability 1/4 each. The distribution function is

F(x) = 0      x < 1
       1/4    1 ≤ x < 2
       2/4    2 ≤ x < 3
       3/4    3 ≤ x < 4
       1      4 ≤ x
Any m with 2 ≤ m ≤ 3 is a median, since then P(X ≤ m) ≥ 1/2 and P(X ≥ m) ≥ 1/2. These are the only medians, since if x < 2 then P(X ≤ x) ≤ P(X < 2) = 1/4 and if x > 3 then P(X ≥ x) ≤ P(X > 3) = 1/4.
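A small Python sketch (an illustrative helper, not a library routine) computes the interval of medians [m1, m2] for a discrete distribution and reproduces the answer [2, 3] for Example 5.10:

```python
import numpy as np

def median_interval(values, probs):
    """Return (m1, m2), the smallest and largest medians of a discrete distribution."""
    values = np.asarray(values, dtype=float)
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(values)
    values, probs = values[order], probs[order]
    cdf = np.cumsum(probs)                # P(X <= v) at each support point v
    tail = 1 - cdf + probs                # P(X >= v) at each support point v
    m1 = values[np.argmax(cdf >= 0.5)]    # smallest x with P(X <= x) >= 1/2
    m2 = values[len(values) - 1 - np.argmax(tail[::-1] >= 0.5)]  # largest x with P(X >= x) >= 1/2
    return m1, m2

print(median_interval([1, 2, 3, 4], [0.25, 0.25, 0.25, 0.25]))   # (2.0, 3.0)
```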
Expected value
For the uniform distribution on (a, b), with f(x) = 1/(b − a) for a < x < b,

EX = ∫_a^b x/(b − a) dx = b²/(2(b − a)) − a²/(2(b − a)) = (b + a)/2
For the exponential distribution with parameter λ, integrating by parts,

EX = ∫_0^∞ x λe^{−λx} dx = −x e^{−λx} |_0^∞ + ∫_0^∞ e^{−λx} dx = 1/λ
Example 5.13. Power laws. Let ρ > 1 and f(x) = (ρ − 1)x^{−ρ} for x ≥ 1, with f(x) = 0 otherwise. To compute EX:

EX = ∫_1^∞ x (ρ − 1) x^{−ρ} dx = (ρ − 1)/(ρ − 2)

if ρ > 2, while EX = ∞ if 1 < ρ ≤ 2.
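Since F(x) = 1 − x^{−(ρ−1)}, we can sample from this distribution as X = U^{−1/(ρ−1)} with U uniform on (0, 1) and check the formula for EX by simulation (the values of ρ below are arbitrary choices with ρ > 2):

```python
import numpy as np

# Sanity check of EX = (rho-1)/(rho-2) for the power law density (rho-1) x^(-rho), x >= 1.
rng = np.random.default_rng(2)
for rho in [3.0, 4.0, 6.0]:
    U = rng.random(1_000_000)
    X = U ** (-1.0 / (rho - 1.0))                  # inverse-CDF sampling
    print(rho, X.mean(), (rho - 1) / (rho - 2))    # sample mean vs. exact value
```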
For example, suppose X is exponential(λ) and Y = X². Then for y > 0, P(Y ≤ y) = P(X ≤ y^{1/2}) = 1 − e^{−λy^{1/2}}, so differentiating gives

fY(y) = d/dy P(Y ≤ y) = (λ y^{−1/2}/2) e^{−λy^{1/2}}   for y > 0

and 0 otherwise.
Generalizing from the last example, we get
Theorem 5.2. Suppose X has density f and P(a < X < b) = 1. Let Y = r(X). Suppose r : (a, b) → (α, β) is continuous and strictly increasing, and let s : (α, β) → (a, b) be the inverse of r. Then Y has density

g(y) = f(s(y)) s′(y)   for α < y < β
Before proving this, let’s see how it applies to the last example. There X has density f(x) = λe^{−λx} for x ≥ 0, so we can take a = 0 and b = ∞. The function
r(x) = x² is indeed continuous and strictly increasing on (0, ∞). (Notice, however, that x² is decreasing on (−∞, 0).) Here α = r(a) = 0 and β = r(b) = ∞. To find the inverse function we set y = x² and solve to get x = y^{1/2}, so s(y) = y^{1/2}. Differentiating, we have s′(y) = y^{−1/2}/2, and plugging into the formula, we have

g(y) = λe^{−λy^{1/2}} · y^{−1/2}/2   for y > 0
To prove Theorem 5.2, note that since r is increasing, r(X) ≤ y if and only if X ≤ s(y), so

P(Y ≤ y) = P(r(X) ≤ y) = P(X ≤ s(y))

Differentiating,

g(y) = d/dy P(Y ≤ y) = d/dy F(s(y)) = F′(s(y)) s′(y) = f(s(y)) s′(y)

by the chain rule.
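As an illustration, a short simulation sketch compares the probability of a few intervals under the derived density of Y = X² with the empirical frequencies (λ = 1 is an arbitrary choice; the exact probabilities use P(a < Y ≤ b) = e^{−λ√a} − e^{−λ√b}):

```python
import numpy as np

rng = np.random.default_rng(3)
lam = 1.0
X = rng.exponential(scale=1/lam, size=1_000_000)
Y = X ** 2

# P(a < Y <= b) = P(sqrt(a) < X <= sqrt(b)) = exp(-lam*sqrt(a)) - exp(-lam*sqrt(b)),
# which equals the integral of g(y) = (lam/2) y^(-1/2) exp(-lam*y^(1/2)) over (a, b].
for a, b in [(0.5, 1.0), (1.0, 4.0), (4.0, 9.0)]:
    exact = np.exp(-lam * np.sqrt(a)) - np.exp(-lam * np.sqrt(b))
    empirical = np.mean((Y > a) & (Y <= b))
    print((a, b), round(exact, 4), round(empirical, 4))
```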
Example 5.15. How not to water your lawn. The head of a lawn sprinkler,
which is a metal rod with a line of small holes in it, revolves back and forth so
that drops of water shoot out at angles between 0 and π/2 radians (i.e., between
0 and 90 degrees). If we use x to denote the distance from the sprinkler and y
the height off the ground, then a drop of water released at angle θ with velocity
v0 will follow a trajectory

x(t) = (v0 cos θ) t,   y(t) = (v0 sin θ) t − gt²/2

where g is the gravitational constant, 32 ft/sec². The drop lands when y(t0) = 0, that is, at time t0 = (2v0 sin θ)/g. At this time

x(t0) = (2v0²/g) sin θ cos θ = (v0²/g) sin(2θ)
If we assume that the sprinkler moves evenly back and forth between 0 and π/2,
it will spend an equal amount of time at each angle. Letting K = v02 /g, this
leads us to the following question:
If Θ is uniform on [0, π/2] then what is the distribution of Z = K sin(2Θ)?
The first difficulty we must confront when solving this problem is that sin(2x) is
increasing on [0, π/4] and decreasing on [π/4, π/2]. The solution to this problem
is simple, however. The function sin(2x) is symmetric about π/4, so if we let X
be uniform on [0, π/4] then Z = K sin(2Θ) and Y = K sin(2X) have the same
distribution. To apply Theorem 5.2 we let r(x) = K sin(2x) and solve y = K sin(2x) to get s(y) = (1/2) sin^{−1}(y/K). Plugging into the formula in Theorem 5.2 and recalling
d/dx sin^{−1}(x) = 1/√(1 − x²)

we get, since X has density 4/π on (0, π/4) and s′(y) = (1/(2K))/√(1 − (y/K)²),

g(y) = (4/π) · (1/(2K))/√(1 − (y/K)²) = 2/(π√(K² − y²))   for 0 < y < K
Reversing the ideas in the last proof, we get a result that is useful to construct
random variables with a specified distribution.
Theorem 5.4. Suppose U has a uniform distribution on (0,1). Then Y =
F −1 (U ) has distribution function F .
Proof. The definition of F^{−1} is chosen so that if 0 < y < 1 then

F^{−1}(y) ≤ x if and only if y ≤ F(x)
and this holds for any distribution function F . Taking y = U , it follows that
P (F −1 (U ) ≤ x) = P (U ≤ F (x)) = F (x)
since P (U ≤ u) = u.
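A minimal sketch of this recipe (often called the inverse transform method), applied to the exponential distribution, where F(x) = 1 − e^{−λx} and F^{−1}(u) = −ln(1 − u)/λ; the value λ = 2 is an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(4)
lam = 2.0
U = rng.random(1_000_000)          # uniform on (0, 1)
X = -np.log(1 - U) / lam           # F^{-1}(U) for the exponential(lam) distribution

# X should now have distribution function F(x) = 1 - exp(-lam*x).
for x in [0.25, 0.5, 1.0, 2.0]:
    print(x, np.mean(X <= x), 1 - np.exp(-lam * x))   # empirical vs. exact F(x)
```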
and integrating by parts with g(y) = y, h′(y) = e^{−y} (so g′(y) = 1, h(y) = −e^{−y}):

∫_0^∞ y e^{−y} dy = −y e^{−y} |_0^∞ + ∫_0^∞ e^{−y} dy = 0 + (−e^{−y}) |_0^∞ = 1
To illustrate the use of (5.6) we will now compute P (X ≤ 1), which can be
written as P ((X, Y ) ∈ A) where A = {(x, y) : x ≤ 1}. The formula in (5.6) tells
us that we find P ((X, Y ) ∈ A) by integrating the joint density over A. However,
the joint density is only positive on B = {(x, y) : 0 < x < y < ∞} so we only
need to integrate over A ∩ B = {(x, y) : 0 < x ≤ 1, x < y}, and doing this we
find

P(X ≤ 1) = ∫_0^1 ∫_x^∞ e^{−y} dy dx
To evaluate the double integral we begin by observing that

∫_x^∞ e^{−y} dy = (−e^{−y}) |_x^∞ = 0 − (−e^{−x}) = e^{−x}

so P(X ≤ 1) = ∫_0^1 e^{−x} dx = (−e^{−x}) |_0^1 = 1 − e^{−1}.
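As a numerical cross-check, one can simulate from this joint density using the representation mentioned later in the chapter (X = Z1 and Y = Z1 + Z2 with Z1, Z2 independent exponential(1)):

```python
import numpy as np

# X = Z1, Y = Z1 + Z2 with Z1, Z2 independent exponential(1) has joint density
# e^{-y} on {0 < x < y}, so P(X <= 1) should be close to 1 - e^{-1}.
rng = np.random.default_rng(5)
Z1 = rng.exponential(size=1_000_000)
Z2 = rng.exponential(size=1_000_000)
X, Y = Z1, Z1 + Z2

print(np.mean(X <= 1), 1 - np.exp(-1))
```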
Example 5.18. Uniform on a ball. Pick a point at random from the ball
B = {(x, y) : x2 + y 2 ≤ 1}. By “at random from B” we mean that a choice
outside of B is impossible and that all the points in B should be equally likely.
In terms of the joint density this means that f(x, y) = 0 when (x, y) ∉ B and there is a constant c > 0 so that f(x, y) = c when (x, y) ∈ B.
Our f(x, y) ≥ 0. To make the integral of f equal to 1, we have to choose c appropriately. Now,

∫∫ f(x, y) dx dy = c ∫∫_B dx dy = c · (area of B) = cπ

so we must take c = 1/π.
The arguments that led to the last conclusion generalize easily to show that if
we pick a point “at random” from a set S with area a then
f(x, y) = 1/a   (x, y) ∈ S
          0     otherwise        (5.7)
Consider Buffon’s needle problem: a needle of length L ≤ 1 is dropped onto a floor marked with parallel cracks one unit apart. Let X be the distance from the center of the needle to the nearest crack and Θ be the angle ∈ [0, π) that the top half of the needle makes with the crack.
(We make this choice to have sin Θ > 0.) We assume that all the ways the
needle can land are equally likely, that is, the joint distribution of (X, Θ) is
f(x, θ) = 2/π   if x ∈ [0, 1/2), θ ∈ [0, π)
          0     otherwise
The formula for the joint density follows from (5.7). We are picking a point “at
random” from a set S with area π/2, so the joint density is 2/π on S.
By drawing a picture, one sees that the needle touches
the crack if and only if (L/2) sin Θ ≥ X. (5.6) tells us that the probability of
this event is obtained by integrating the joint density over
A = {(x, θ) ∈ [0, 1/2) × [0, π) : x ≤ (L/2) sin θ}
so the probability we seek is
∫∫_A f(x, θ) dx dθ = ∫_0^π ∫_0^{(L/2) sin θ} (2/π) dx dθ
                   = (2/π) ∫_0^π (L/2) sin θ dθ = (L/π)(−cos θ) |_0^π = 2L/π
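A short Monte Carlo check of the answer 2L/π (the needle length L = 0.7 is an arbitrary choice with L ≤ 1):

```python
import numpy as np

rng = np.random.default_rng(6)
L = 0.7                                   # needle length, arbitrary choice with L <= 1
n = 1_000_000
X = rng.uniform(0, 0.5, size=n)           # distance from needle center to nearest crack
Theta = rng.uniform(0, np.pi, size=n)     # angle between needle and crack

hits = (L / 2) * np.sin(Theta) >= X       # event that the needle touches a crack
print(hits.mean(), 2 * L / np.pi)         # empirical probability vs. 2L/pi
```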
P (X ≤ x, Y ≤ y) = P (X ≤ x, Y ≤ 1) = x
by the formula for the second case. The fourth case is similar to the third, and
the fifth is trivial. X and Y are always smaller than 1 so if x > 1 and y > 1
then {X ≤ x, Y ≤ y} has probability 1.
We will not use the joint distribution function in what follows. For completeness, however, we want to mention two of its important properties. The first formula is the two-dimensional generalization of P(a < X ≤ b) = F(b) − F(a):

P(a1 < X ≤ b1, a2 < Y ≤ b2) = F(b1, b2) − F(a1, b2) − F(b1, a2) + F(a1, a2)        (5.8)
Proof. The reasoning we use here is much like that employed in studying the
probabilities of unions in Section 1.6. By adding and subtracting the probabil-
ities on the right, we end up with the desired area counted exactly once.
Using A as shorthand for P((X, Y) ∈ A), etc., where A = (a1, b1] × (a2, b2] is the rectangle of interest, B = (−∞, a1] × (a2, b2], C = (a1, b1] × (−∞, a2], and D = (−∞, a1] × (−∞, a2], we have

 F(b1, b2) =  A + B + C + D
−F(a1, b2) = −B − D
−F(b1, a2) = −C − D
 F(a1, a2) =  D

and adding the four lines leaves exactly A.
The next formula tells us how to recover the joint density function from the joint distribution function. To motivate the formula, we recall that in one dimension F′ = f since F(x) = ∫_{−∞}^{x} f(u) du. In two dimensions,

∂²F/∂x∂y = f        (5.9)
To explain why this formula is true, we note that

F(x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} f(u, v) dv du

and differentiating twice kills the two integrals. To check that (5.9) works in Example 5.20: F(x, y) = xy when 0 < x < 1 and 0 < y < 1, so ∂²F/∂x∂y = 1 there, and it is 0 otherwise.
n random variables
The developments above generalize in a straightforward way to n > 2 random
variables X1 , . . . , Xn . f is a joint density function if f (x1 , . . . , xn ) ≥ 0 and
∫ · · · ∫ f(x1, . . . , xn) dxn · · · dx1 = 1

F(x1, . . . , xn) = P(X1 ≤ x1, . . . , Xn ≤ xn) = ∫_{−∞}^{x1} · · · ∫_{−∞}^{xn} f(y1, . . . , yn) dyn · · · dy1
In the continuous case if X and Y have joint density f (x, y), then the marginal
densities of X and Y are given by
fX(x) = ∫ f(x, y) dy        fY(y) = ∫ f(x, y) dx        (5.11)
The verbal explanation of the first formula is similar to that of the discrete case: if X = x then Y will take on some value y, so to find fX(x) we integrate the joint density f(x, y) over all possible values of y.
To illustrate the use of these formulas we look at Example 5.17.
Example 5.21.

f(x, y) = e^{−y}   0 < x < y < ∞
          0        otherwise

In this case

fX(x) = ∫_x^∞ e^{−y} dy = (−e^{−y}) |_x^∞ = e^{−x}
since (5.11) tells us to integrate f(x, y) over all values of y but we only have f > 0
when y > x. Similarly,
fY(y) = ∫_0^y e^{−y} dx = y e^{−y}
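For readers who like to check such integrals by machine, a small sympy sketch reproduces both marginals:

```python
import sympy as sp

x, y = sp.symbols('x y', positive=True)
f = sp.exp(-y)                          # joint density on 0 < x < y < infinity

fX = sp.integrate(f, (y, x, sp.oo))     # integrate out y over (x, oo)
fY = sp.integrate(f, (x, 0, y))         # integrate out x over (0, y)
print(fX)                               # exp(-x)
print(fY)                               # y*exp(-y)
```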
Theorem 5.5. Two random variables with joint density f are independent if
and only if
f (x, y) = fX (x)fY (y)
that is, if the joint density is the product of the marginal densities.
We will now consider three examples that parallel the ones used in the dis-
crete case.
Example 5.22.

f(x, y) = e^{−y}   0 < x < y < ∞
          0        otherwise
In this case

f(3, 2) = 0 ≠ fX(3) fY(2) > 0
so Theorem 5.5 implies that X and Y are not independent. In general, if the
set of values where f > 0 is not a rectangle then X and Y are not independent.
Example 5.23.

f(x, y) = (1 + x + y)/2   0 < x < 1 and 0 < y < 1
          0               otherwise
In this case the set where f > 0 is a rectangle, so the joint distribution passes
the first test and we have to compute the marginal densities
fX(x) = ∫_0^1 (1 + x + y)/2 dy = [(1 + x)y/2 + y²/4] |_0^1 = x/2 + 3/4

fY(y) = y/2 + 3/4   by symmetry
These formulas are valid for 0 < x < 1 and 0 < y < 1 respectively. To check
independence we have to see if
(⋆)   (1 + x + y)/2 = (x/2 + 3/4) · (y/2 + 3/4)

Multiplying both sides by 16 and expanding the right-hand side, we see this holds if and only if

8 + 8x + 8y = 4xy + 6x + 6y + 9

for all 0 < x < 1 and 0 < y < 1, which is ridiculous. A simpler way to see that (⋆) is wrong is simply to note that when x = y = 0 it says that 1/2 = 9/16.
Example 5.24.

f(x, y) = (cos x/(e − 1)) · (y^{−3/2}/√(2π)) · e^{sin x − 1/(2y)}   0 < x < π/2, y > 0
          0                                                         otherwise
In this case it would not be very much fun to integrate to find the marginal
densities, so we adopt another approach.
Theorem 5.6. If f (x, y) can be written as g(x)h(y) then there is a constant c
so that fX (x) = cg(x) and fY (y) = h(y)/c. It follows that f (x, y) = fX (x)fY (y)
and hence X and Y are independent.
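To see how this applies to Example 5.24, one convenient factorization (the split of the constants between the two factors is an arbitrary choice) is

g(x) = cos x · e^{sin x}/(e − 1) for 0 < x < π/2,    h(y) = y^{−3/2} e^{−1/(2y)}/√(2π) for y > 0

Since ∫_0^{π/2} g(x) dx = e^{sin x}/(e − 1) |_0^{π/2} = (e − 1)/(e − 1) = 1, the constant in Theorem 5.6 is c = 1, so fX = g, fY = h, and X and Y are independent.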
Conditional distributions
Introducing fX (x|Y = y) as notation for the conditional density of X
given Y = y (which we think of as P (X = x|Y = y)), we have
fX(x|Y = y) = f(x, y)/fY(y) = f(x, y)/∫ f(u, y) du        (5.12)
Example 5.25.

f(x, y) = e^{−y}   0 < x < y < ∞
          0        otherwise

Since fY(y) = y e^{−y},

fX(x|Y = y) = e^{−y}/(y e^{−y}) = 1/y   for 0 < x < y
That is, the conditional distribution is uniform on (0, y). This should not be
surprising since the joint density does not depend on x.
To compute the other conditional distribution we recall fX(x) = e^{−x}, so
fY(y|X = x) = e^{−y}/e^{−x} = e^{−(y−x)}   for y > x
That is, given X = x, Y − x is exponential with parameter 1. The last answer
is quite reasonable since in Example 6.1 we saw that if Z1 , Z2 are independent
exponential(1) then X = Z1 , Y = Z1 +Z2 has the joint distribution given above.
If we condition on X = x then Z1 = x and Y = x + Z2 .
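A simulation sketch of this picture: generate Z1, Z2 independent exponential(1), set X = Z1 and Y = Z1 + Z2, keep only the draws with Y close to a fixed value y0, and check that the retained X values look uniform on (0, y0) (y0 = 3 and the bandwidth are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(7)
Z1 = rng.exponential(size=2_000_000)
Z2 = rng.exponential(size=2_000_000)
X, Y = Z1, Z1 + Z2

y0, eps = 3.0, 0.05
Xcond = X[np.abs(Y - y0) < eps]           # X values with Y in a thin band around y0
# For a uniform(0, y0) distribution the mean is y0/2 and P(X <= y0/2) = 1/2.
print(len(Xcond), Xcond.mean(), y0 / 2, np.mean(Xcond <= y0 / 2))
```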
P(X = x, Y = y) = P(X = x) P(Y = y|X = x)

The analogue for densities is

f(x, y) = fX(x) fY(y|X = x)        (5.13)
The next example demonstrates the use of (5.13) to compute a joint distribution.
Example 5.26. Suppose we pick a point uniformly distributed on (0, 1), call it X, and then pick a point Y uniformly distributed on (0, X). By (5.13), fX(x) = 1 for 0 < x < 1 and fY(y|X = x) = 1/x for 0 < y < x, so the joint density is

f(x, y) = 1/x   for 0 < y < x < 1
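A quick simulation check of this example; integrating the joint density gives the marginal fY(y) = ∫_y^1 (1/x) dx = ln(1/y), so P(Y ≤ t) = t − t ln t for 0 < t < 1:

```python
import numpy as np

# X uniform on (0, 1); given X, Y uniform on (0, X).
rng = np.random.default_rng(8)
X = rng.random(1_000_000)
Y = X * rng.random(1_000_000)

for t in [0.1, 0.25, 0.5]:
    exact = t - t * np.log(t)             # P(Y <= t) from the density ln(1/y)
    print(t, np.mean(Y <= t), round(exact, 4))
```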
5.5 Exercises
1. Suppose X has density function f (x) for a ≤ x ≤ b and Y = cX + d where
c > 0. Find the density function of Y .
2. Show that if X = exponential(1) then Y = X/λ is exponential(λ).
3. Suppose X is uniform on (0, 1). Find the density function of Y = X n .
4. Suppose X has density x^{−2} for x ≥ 1 and Y = X^{−2}. Find the density function of Y.
5. Suppose X has an exponential distribution with parameter λ and Y = X^{1/α}.
Find the density function of Y . This is the Weibull distribution.
6. Suppose X has an exponential distribution with parameter 1 and Y = ln(X). Find the distribution function of Y. This is the double exponential distribution.
7. Suppose X has a normal distribution and Y = eX . Find the density function
of Y . This is the lognormal distribution.
8. A drunk standing one foot from a wall shines a flashlight at a random angle
that is uniformly distributed between −π/2 and π/2. Find the density function
of the place where the light hits the wall. The answer is called the Cauchy
density.
9. Suppose X is uniform on (0, π/2) and Y = sin X. Find the density function
of Y . The answer is called the arcsine law because the distribution function
contains the arcsine function.
10. Suppose X has density function 3x^{−4} for x ≥ 1. (a) Find a function g so that g(X) is uniform on (0, 1). (b) Find a function h so that if U is uniform on (0, 1), then h(U) has density function 3x^{−4} for x ≥ 1.
11. Suppose X has density function f(x) for −1 ≤ x ≤ 1, 0 otherwise. Find the density function of (a) Y = |X|, (b) Z = X².
12. Suppose X has density function x/2 for 0 < x < 2, 0 otherwise. Find
the density function of Y = X(2 − X) by computing P (Y ≥ y) and then
differentiating.
13. A weather channel has the local forecast on the hour and at 10, 25, 30, 45,
and 55 minutes past. Suppose you wake up in the middle of the night and turn
on the TV, and let X be the time you have to wait to see the local forecast,
measured in hours. Find the density function of X.
14. Suppose r is differentiable and {x : r′(x) = 0} is finite. Show that the density function of Y = r(X) is given by

Σ_{x : r(x) = y} f(x)/|r′(x)|
15. Show that if F1(x) ≤ F2(x) are two distribution functions then by using the recipe in Theorem 5.4 we can define random variables X1 and X2 with these distributions so that X1 ≥ X2.
Joint distributions
16. Suppose we draw 2 balls out of an urn with 8 red, 6 blue, and 4 green balls.
Let X be the number of red balls we get and Y the number of blue balls. Find
the joint distribution of X and Y .
17. Suppose we roll two dice that have the numbers 1, 2, 3, and 4 on their four
sides. Let X be the maximum of the two numbers that appear and Y be the
sum. Find the joint distribution of X and Y .
18. Suppose we roll two ordinary six-sided dice, let X be the minimum of the
two numbers that appear, and let Y be the maximum of the two numbers. Find
the joint distribution of X and Y .
19. Suppose we roll one die repeatedly and let Ni be the number of the roll on
which i first appears. Find the joint distribution of N1 and N6 .
20. Suppose P (X = x, Y = y) = c(x + y) for x, y = 0, 1, 2, 3. (a) What value of
c will make this a probability function? (b) What is P (X > Y )?
21. Suppose X and Y have joint density f (x, y) = c(x + y) for 0 < x, y < 1. (a)
What is c? (b) What is P (X < 1/2)?
22. Suppose X and Y have joint density f(x, y) = 6xy² for 0 < x, y < 1. What is P(X + Y < 1)?
23. Suppose X and Y have joint density f (x, y) = 2 for 0 < y < x < 1. Find
P (X − Y > z).
24. We take a stick of length 1 and break it into 3 pieces. To be precise, we
think of taking the unit interval and cutting at X < Y where X and Y have
joint density f (x, y) = 2 for 0 < x < y < 1. What is the probability we can
make a triangle with the three pieces? (We can do this if no piece is longer than
1/2.)
25. Suppose X and Y have joint density f (x, y) = 1 for 0 < x, y < 1. Find
P (XY ≤ z).
26. X, Y , and Z have a uniform density on the unit cube. That is, their joint
density is 1 when 0 < x, y, z < 1 and 0 otherwise. Find P (X + Y + Z < 1).
27. Suppose X and Y have joint density f (x, y) = e−(x+y) for x, y > 0. Find
the distribution function.
28. Suppose X is uniform on (0, 1) and Y = X. Find the joint distribution
function of X and Y .
29. A pair of random variables X and Y take values between 0 and 1 and have P(X ≤ x, Y ≤ y) = x³y² when 0 ≤ x, y ≤ 1. Find the joint density function.
Show that if X1, . . . , Xn are independent exponential(λ), then min{X1, . . . , Xn} = exponential(nλ).
43. Suppose X1, X2, . . . are independent and have the same continuous distribution F. We say that a record occurs at time k if Xk > max_{j<k} Xj. Show that the events Ak = “a record occurs at time k” are independent and P(Ak) = 1/k.
44. Let E1 , . . . , En be events and let Xi be 1 if Ei occurs, 0 otherwise. These
are called indicator random variables, since they indicate whether or not the
ith event occurred. Show that the indicator random variables are independent
if and only if the events Ei are.