Chapter 5

Continuous Distributions

5.1 Density and Distribution Functions


In many situations random variables can take any value on the real line or in
a certain subset of the real line. For concrete examples, consider the height
or weight of a person chosen at random or the time it takes a person to drive
from Los Angeles to San Francisco. A random variable X is said to have a
continuous distribution with density function f if for all a ≤ b we have
P(a ≤ X ≤ b) = ∫_a^b f(x) dx   (5.1)

Geometrically, P (a ≤ X ≤ b) is the area under the curve f between a and b.

For the purposes of understanding and remembering formulas, it is useful to


think of f (x) as P (X = x) even though the last event has probability zero. To
explain the last remark and to prove P (X = x) = 0, note that taking a = x
and b = x + ∆x in (5.1) we have

P(x ≤ X ≤ x + ∆x) = ∫_x^{x+∆x} f(y) dy ≈ f(x)∆x

when ∆x is small. Letting ∆x → 0, we see that P (X = x) = 0, but f (x) tells


us how likely it is for X to be near x. That is,
P(x ≤ X ≤ x + ∆x)/∆x ≈ f(x)


In order for P (a ≤ X ≤ b) to be nonnegative for all a and b and for P (−∞ <
X < ∞) = 1 we must have
f(x) ≥ 0   and   ∫_{−∞}^∞ f(x) dx = 1   (5.2)

Any function f that satisfies (5.2) is said to be a density function. Some


important density functions are:
Example 5.1. Uniform distribution.
f(x) = { 1/(b − a)   a < x < b
       { 0           otherwise

The idea here is that we are picking a value “at random” from (a,b). That is,
values outside the interval are impossible, and all those inside have the same
probability (density).
If we set f (x) = c when a < x < b and 0 otherwise then
∫ f(x) dx = ∫_a^b c dx = c(b − a)

So we have to pick c = 1/(b − a) to make the integral 1. The most important


special case occurs when a = 0 and b = 1. Random numbers generated by a
computer are typically uniformly distributed on (0, 1). Another case that comes
up in applications is a = −1/2 and b = 1/2. If we take a measurement and round
it off to the nearest integer then it is reasonable to assume that the “round-off
error” is uniformly distributed on (−1/2, 1/2).
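As a concrete check on the round-off example, the sketch below simulates it (Python with NumPy; the tool choice, seed, and sample sizes are ours, not the text's) and verifies that the errors look uniform on (−1/2, 1/2):

    import numpy as np

    rng = np.random.default_rng(0)                   # fixed seed, arbitrary
    measurements = rng.uniform(0.0, 100.0, 100_000)  # arbitrary raw measurements
    errors = measurements - np.round(measurements)   # round-off errors

    # If the errors are uniform on (-1/2, 1/2), each of 10 equal bins gets ~10%.
    counts, _ = np.histogram(errors, bins=10, range=(-0.5, 0.5))
    print(counts / len(errors))                      # each entry close to 0.1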
Example 5.2. Exponential distribution.
f(x) = { λe^{−λx}   x ≥ 0
       { 0          otherwise
Here λ > 0 is a parameter.
To check that this is a density function, we note that
∫_0^∞ λe^{−λx} dx = [−e^{−λx}]_0^∞ = 0 − (−1) = 1

Exponentially distributed random variables often come up as waiting times be-


tween events; for example, the arrival times of customers at a bank or ice cream
shop. Sometimes we will indicate that X has an exponential distribution with
parameter λ by writing X = exponential(λ).
Example 5.3. Power laws.
f(x) = { (ρ − 1)x^{−ρ}   x ≥ 1
       { 0               otherwise
Here ρ > 1 is a parameter that governs how fast the probabilities go to 0 at ∞.

To check that this is a density function, we note that


∫_1^∞ (ρ − 1)x^{−ρ} dx = [−x^{−(ρ−1)}]_1^∞ = 0 − (−1) = 1

These distributions are often used in situations where P (X > x) does not go to
0 very fast as x → ∞. For example, the Italian economist Pareto used them to
describe the distribution of family incomes.
Distribution functions
Any random variable (discrete, continuous, or in between) has a distribution
function defined by F (x) = P (X ≤ x). If X has a density function f (x) then
F(x) = P(−∞ < X ≤ x) = ∫_{−∞}^x f(y) dy

That is, F is an antiderivative of f .


One of the reasons for computing the distribution function is explained by
the next formula. If a < b then {X ≤ b} = {X ≤ a} ∪ {a < X ≤ b} with the
two sets on the right-hand side disjoint so

P (X ≤ b) = P (X ≤ a) + P (a < X ≤ b)

or, rearranging,

P (a < X ≤ b) = P (X ≤ b) − P (X ≤ a) = F (b) − F (a) (5.3)

The last formula is valid for any random variable. When X has density function
f , it says that
∫_a^b f(x) dx = F(b) − F(a)
i.e., the integral can be evaluated by taking the difference of the antiderivative
at the two endpoints.
To see what distribution functions look like, and to explain the use of (5.3),
we return to our examples.
Example 5.4. Uniform distribution. f(x) = 1/(b − a) for a < x < b.

F(x) = { 0                 x ≤ a
       { (x − a)/(b − a)   a ≤ x ≤ b
       { 1                 x ≥ b

To check this, note that P (a < X < b) = 1 so P (X ≤ x) = 1 when x ≥ b


and P (X ≤ x) = 0 when x ≤ a. For a ≤ x ≤ b we compute
P(X ≤ x) = ∫_{−∞}^x f(y) dy = ∫_a^x 1/(b − a) dy = (x − a)/(b − a)

In the most important special case a = 0, b = 1 we have F (x) = x for 0 ≤ x ≤ 1.



Example 5.5. Exponential distribution. f(x) = λe^{−λx} for x ≥ 0.

F(x) = { 0             x ≤ 0
       { 1 − e^{−λx}   x ≥ 0

The first line of the answer is easy to see. Since P (X > 0) = 1 we have
P (X ≤ x) = 0 for x ≤ 0. For x ≥ 0 we compute
P(X ≤ x) = ∫_{−∞}^x f(y) dy = ∫_0^x λe^{−λy} dy = [−e^{−λy}]_0^x = −e^{−λx} − (−1) = 1 − e^{−λx}

Suppose X has an exponential distribution with parameter λ. If t ≥ 0 then
P(X > t) = 1 − P(X ≤ t) = 1 − F(t) = e^{−λt}, so if s ≥ 0 then

P(X > t + s | X > t) = P(X > t + s)/P(X > t) = e^{−λ(t+s)}/e^{−λt} = e^{−λs} = P(X > s)

This is the lack of memory property of the exponential distribution. Given


that you have been waiting t units of time, the probability you must wait an
additional s units of time is the same as if you had not been waiting at all.
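The lack of memory property is easy to check by simulation. A minimal sketch, with λ, t, and s chosen arbitrarily for illustration:

    import numpy as np

    rng = np.random.default_rng(0)
    lam, t, s = 2.0, 1.0, 0.5                      # arbitrary parameter choices
    x = rng.exponential(scale=1 / lam, size=1_000_000)

    # P(X > t+s | X > t) versus the unconditional P(X > s) and e^{-lam*s}.
    cond = np.mean(x > t + s) / np.mean(x > t)
    print(cond, np.mean(x > s), np.exp(-lam * s))  # all three should agree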
Example 5.6. Power laws. f(x) = (ρ − 1)x^{−ρ} for x ≥ 1. Here ρ > 1.

F(x) = { 0                 x ≤ 1
       { 1 − x^{−(ρ−1)}   x ≥ 1

The first line of the answer is easy to see. Since P (X > 1) = 1, we have
P (X ≤ x) = 0 for x ≤ 1. For x ≥ 1 we compute
P(X ≤ x) = ∫_{−∞}^x f(y) dy = ∫_1^x (ρ − 1)y^{−ρ} dy = [−y^{−(ρ−1)}]_1^x = 1 − x^{−(ρ−1)}

To illustrate the use of (5.3) we note that if ρ = 3 then


P(2 < X ≤ 4) = (1 − 4^{−2}) − (1 − 2^{−2}) = 1/4 − 1/16 = 3/16

Distribution functions are somewhat messier in the discrete case.


Example 5.7. Flip three coins and let X be the number of heads that we see.
The probability function is given by
x          0     1     2     3
P(X = x)   1/8   3/8   3/8   1/8

In this case the distribution function is:

F(x) = { 0     x < 0
       { 1/8   0 ≤ x < 1
       { 4/8   1 ≤ x < 2
       { 7/8   2 ≤ x < 3
       { 1     3 ≤ x

To check this, note for example that for 1 ≤ x < 2, P (X ≤ x) = P (X ∈


{0, 1}) = 1/8 + 3/8. The reader should note that F is discontinuous at each
possible value of X and the height of the jump there is P (X = x).
Theorem 5.1. All distribution functions have the following properties
(i) If x1 < x2 then F (x1 ) ≤ F (x2 ) i.e., F is nondecreasing.
(ii) limx→−∞ F (x) = 0
(iii) limx→∞ F (x) = 1
(iv) limy↓x F (y) = F (x), i.e., F is continuous from the right.
(v) limy↑x F (y) = P (X < x)
(vi) limy↓x F (y) − limy↑x F (y) = P (X = x),
i.e., the jump in F at x is equal to P (X = x).
Proof. To prove (i) we note that {X ≤ x1 } ⊂ {X ≤ x2 }, so (1.4) implies
F (x1 ) = P (X ≤ x1 ) ≤ P (X ≤ x2 ) = F (x2 ).
For (ii), we note that {X ≤ x} ↓ ∅ as x ↓ −∞ (here ↓ is short for “decreases
and converges to”), so (1.6) implies that P (X ≤ x) ↓ P (∅) = 0.
The argument for (iii) is similar {X ≤ x} ↑ Ω as x ↑ ∞ (here ↑ is short for
“increases and converges to”), so (1.5) implies that P (X ≤ x) ↑ P (Ω) = 1.
For (iv), we note that if y ↓ x then {X ≤ y} ↓ {X ≤ x}, so (1.6) implies that
P (X ≤ y) ↓ P (X ≤ x).
The argument for (v) is similar. If y ↑ x then {X ≤ y} ↑ {X < x} since
{X = x} ⊄ {X ≤ y} when y < x. Using (1.5) now, (v) follows.
Subtracting (v) from (iv) gives (vi).

Medians
Intuitively, the median is the place where F (x) crosses 1/2. The precise
definition we are about to give is complicated by the fact that {x : F (x) = 1/2}
may be empty or contain more than one point.
m is a median for F if P (X ≤ m) ≥ 1/2 and P (X ≥ m) ≥ 1/2.
A more computationally useful form is: let m_1 = min{x : P(X ≤ x) ≥ 1/2} and
m_2 = max{x : P(X ≥ x) ≥ 1/2}. Then m is a median if and only if m_1 ≤ m ≤ m_2.
We begin with a simple example and then consider two that illustrate the
problems that can arise.

Example 5.8. Suppose X has an exponential(λ) density. As we computed in


Example 5.5, the distribution function is F(x) = 1 − e^{−λx}.
To find the median we set P(X ≤ m) = 1/2, i.e., 1 − e^{−λm} = 1/2, and solve
to find m = (ln 2)/λ. To see that this is a median, we note that P(X = m) = 0
so P(X ≥ m) = P(X > m) = 1 − P(X ≤ m) = 1/2. To see that this is the
only median, we observe that if m < (ln 2)/λ then P(X ≤ m) < 1/2, while if
m > (ln 2)/λ then P(X ≥ m) < 1/2.
In the context of radioactive decay, which is commonly modeled with an
exponential distribution, the median is sometimes called the half-life, since
half of the particles will have broken down by that time. One reason for interest
in the half-life is that

P(X > k ln 2/λ) = e^{−k ln 2} = 2^{−k}

or in words, after k half-lives only 1/2^k of the particles remain radioactive.
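A short simulation confirms both facts: the median of the lifetimes is (ln 2)/λ, and a fraction 2^{−k} survives past k half-lives. A sketch, with λ an arbitrary choice:

    import numpy as np

    rng = np.random.default_rng(0)
    lam = 0.3                                     # arbitrary decay rate
    half_life = np.log(2) / lam
    lifetimes = rng.exponential(scale=1 / lam, size=1_000_000)

    print(np.median(lifetimes), half_life)        # should be close
    for k in (1, 2, 3):
        # fraction of particles still radioactive after k half-lives vs 2^{-k}
        print(k, np.mean(lifetimes > k * half_life), 2.0 ** -k)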


Example 5.9. Suppose X takes values 1, 2, 3 with probability 1/3 each. The
distribution function is

F(x) = { 0     x < 1
       { 1/3   1 ≤ x < 2
       { 2/3   2 ≤ x < 3
       { 1     3 ≤ x

To check that 2 is a median, we note that

P (X ≤ 2) = P (X ∈ {1, 2}) = 2/3


P (X ≥ 2) = P (X ∈ {2, 3}) = 2/3

This is the only median, since if x < 2 then P (X ≤ x) ≤ P (X < 2) ≤ 1/3 and
if x > 2 then P (X ≥ x) ≤ P (X > 2) = 1/3.
Example 5.10. Suppose X takes values 1, 2, 3, 4 with probability 1/4 each.
The distribution function is

F(x) = { 0     x < 1
       { 1/4   1 ≤ x < 2
       { 2/4   2 ≤ x < 3
       { 3/4   3 ≤ x < 4
       { 1     4 ≤ x

To check that any number m with 2 ≤ m ≤ 3 is a median, we note that for


any of these values

P (X ≤ m) ≥ P (X ∈ {1, 2}) = 2/4


P (X ≥ m) ≥ P (X ∈ {3, 4}) = 2/4

These are the only medians, since if x < 2 then P (X ≤ x) ≤ P (X < 2) ≤ 1/4
and if x > 3 then P (X ≥ x) ≤ P (X > 3) = 1/4.
Expected value

For a discrete random variable the expected value is defined by EX = Σ_x x P(X = x). To define the expected value for a continuous random variable, we replace the probability function by the density function and the sum by an integral:

EX = ∫ x f(x) dx   (5.4)

Example 5.11. Uniform distribution. Suppose X has density function


f (x) = 1/(b − a) for a < x < b and 0 otherwise.

In this case

EX = ∫_a^b x/(b − a) dx = b²/(2(b − a)) − a²/(2(b − a)) = (b + a)/2

since b2 − a2 = (b + a)(b − a). Notice that (a + b)/2 is the midpoint of the


interval and hence the natural choice for the average value of X.

Example 5.12. Exponential distribution. Suppose X has density function


f (x) = λe−λx for x ≥ 0 and 0 otherwise.

To compute EX we integrate by parts with f(x) = x, g′(x) = λe^{−λx}, so
f′(x) = 1 and g(x) = −e^{−λx}:

EX = ∫_0^∞ x λe^{−λx} dx = [−xe^{−λx}]_0^∞ + ∫_0^∞ e^{−λx} dx = 0 + 1/λ = 1/λ

Example 5.13. Power laws. Let ρ > 1 and f(x) = (ρ − 1)x^{−ρ} for x ≥ 1, with f(x) = 0 otherwise.

To compute EX we note that

EX = ∫_1^∞ x(ρ − 1)x^{−ρ} dx = (ρ − 1) ∫_1^∞ x^{1−ρ} dx = (ρ − 1)/(ρ − 2)

if ρ > 2, while EX = ∞ if 1 < ρ ≤ 2.
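Monte Carlo averages give a quick sanity check on all three expected value formulas. A sketch with arbitrary parameter values; the power law draw uses the inverse transform recipe derived at the end of Section 5.2:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 1_000_000

    a, b = 2.0, 5.0                                # arbitrary interval
    print(rng.uniform(a, b, n).mean(), (a + b) / 2)        # uniform: (a+b)/2

    lam = 0.5                                      # arbitrary rate
    print(rng.exponential(1 / lam, n).mean(), 1 / lam)     # exponential: 1/lam

    rho = 4.0                                      # arbitrary, must exceed 2
    x = rng.uniform(size=n) ** (-1 / (rho - 1))    # power law sample
    print(x.mean(), (rho - 1) / (rho - 2))         # power law: (rho-1)/(rho-2)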

5.2 Functions of Random Variables


In this section we will answer the question: If X has density function f and
Y = r(X), then what is the density function for Y ? Before proving a general
result, we will consider an example:
Example 5.14. Suppose X has an exponential distribution with parameter λ.
What is the distribution of Y = X 2 ?
To solve this problem we will use the distribution function. First we recall
from Example 5.5 that P(X ≤ x) = 1 − e^{−λx}, so if y ≥ 0 then

P(Y ≤ y) = P(X² ≤ y) = P(X ≤ √y) = 1 − e^{−λy^{1/2}}

Differentiating, we see that the density function of Y is

f_Y(y) = (d/dy) P(Y ≤ y) = (λ/2) y^{−1/2} e^{−λy^{1/2}}   for y ≥ 0

and 0 otherwise.
Generalizing from the last example, we get
Theorem 5.2. Suppose X has density f and P (a < X < b) = 1. Let Y =
r(X). Suppose r : (a, b) → (α, β) is continuous and strictly increasing, and let
s : (α, β) → (a, b) be the inverse of r. Then Y has density

g(y) = f(s(y)) s′(y)   for y ∈ (α, β)   (5.5)

Before proving this, let’s see how it applies to the last example. There X has
density f (x) = λe−λx for x ≥ 0 so we can take a = 0 and b = ∞. The function

r(x) = x² is indeed continuous and strictly increasing on (0, ∞). (Notice, however,
that x² is decreasing on (−∞, 0).) α = r(a) = 0 and β = r(b) = ∞. To
find the inverse function we set y = x² and solve to get x = y^{1/2}, so s(y) = y^{1/2}.
Differentiating, we have s′(y) = y^{−1/2}/2, and plugging into the formula, we have

g(y) = λe^{−λy^{1/2}} · y^{−1/2}/2   for y > 0

Proof. If y ∈ (α, β) then

P (Y ≤ y) = P (r(X) ≤ y) = P (X ≤ s(y))

since r is increasing and s is its inverse.


Writing F (x) for P (X ≤ x) and differentiating with respect to y now gives

g(y) = (d/dy) P(Y ≤ y) = (d/dy) F(s(y)) = F′(s(y)) s′(y) = f(s(y)) s′(y)
by the chain rule.
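To see (5.5) in action numerically, we can compare a histogram of Y = X² for exponential X with the density derived in Example 5.14. A sketch; λ = 1 and the bin range are arbitrary choices:

    import numpy as np

    rng = np.random.default_rng(0)
    lam = 1.0
    y = rng.exponential(1 / lam, size=1_000_000) ** 2   # Y = X^2

    # Empirical density per bin vs g(y) = (lam/2) y^{-1/2} e^{-lam y^{1/2}}
    counts, edges = np.histogram(y, bins=20, range=(0.05, 2.0))
    hist = counts / (len(y) * np.diff(edges))           # empirical density
    mids = (edges[:-1] + edges[1:]) / 2
    g = (lam / 2) * mids ** -0.5 * np.exp(-lam * np.sqrt(mids))
    print(np.max(np.abs(hist - g)))                     # should be small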
Example 5.15. How not to water your lawn. The head of a lawn sprinkler,
which is a metal rod with a line of small holes in it, revolves back and forth so
that drops of water shoot out at angles between 0 and π/2 radians (i.e., between
0 and 90 degrees). If we use x to denote the distance from the sprinkler and y
the height off the ground, then a drop of water released at angle θ with velocity
v_0 will follow a trajectory

x(t) = (v_0 cos θ)t,   y(t) = (v_0 sin θ)t − gt²/2

where g is the gravitational constant, 32 ft/sec². The drop lands when y(t_0) = 0,
that is, at time t_0 = (2v_0 sin θ)/g. At this time

x(t_0) = (2v_0²/g) sin θ cos θ = (v_0²/g) sin(2θ)

If we assume that the sprinkler moves evenly back and forth between 0 and π/2,
it will spend an equal amount of time at each angle. Letting K = v_0²/g, this
leads us to the following question:
If Θ is uniform on [0, π/2] then what is the distribution of Z = K sin(2Θ)?

The first difficulty we must confront when solving this problem is that sin(2x) is
increasing on [0, π/4] and decreasing on [π/4, π/2]. The solution to this problem
is simple, however. The function sin(2x) is symmetric about π/4, so if we let X
be uniform on [0, π/4] then Z = K sin(2Θ) and Y = K sin(2X) have the same
distribution. To apply (5.5) we let r(x) = K sin(2x) and solve y = K sin(2x) to
get s(y) = (1/2) sin^{−1}(y/K). Plugging into (5.5) and recalling

(d/dx) sin^{−1}(x) = 1/√(1 − x²)

we see that Y has density function

f(s(y)) s′(y) = (4/π) · 1/(2√(1 − y²/K²)) · (1/K) = 2/(π√(K² − y²))
when 0 < y < K and 0 otherwise. The title of this example comes from the fact
that the density function goes to ∞ as y → K, so the lawn gets very soggy at
the edge of the sprinkler's range. This is due to the fact that s′(K) = ∞, which
in turn is caused by r′(π/4) = 0.
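The soggy edge shows up clearly in a simulation: the empirical density of K sin(2Θ) climbs steeply near y = K. A sketch with K = 1, an arbitrary normalization:

    import numpy as np

    rng = np.random.default_rng(0)
    K = 1.0                                          # arbitrary: K = v0^2/g
    theta = rng.uniform(0, np.pi / 2, size=1_000_000)
    z = K * np.sin(2 * theta)                        # landing distance

    hist, edges = np.histogram(z, bins=20, range=(0.0, K), density=True)
    mids = (edges[:-1] + edges[1:]) / 2
    density = 2 / (np.pi * np.sqrt(K**2 - mids**2))  # derived density
    print(hist[0], density[0])    # moderate near y = 0
    print(hist[-1], density[-1])  # both blow up near y = K: the soggy edge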
Example 5.16. Suppose X has an exponential distribution with parameter 3.
That is, X has density function 3e^{−3x} for x ≥ 0. Let Y = 1 − e^{−3X}.
Here, r(x) = 1 − e^{−3x} is increasing on (0, ∞), α = r(0) = 0, and β =
r(∞) = 1. To find the inverse function we set y = 1 − e^{−3x} and solve to get
s(y) = −(1/3) ln(1 − y). Differentiating, we have s′(y) = (1/3)/(1 − y). So
plugging into (5.5), the density function of Y is

f(s(y)) s′(y) = 3e^{ln(1−y)} · (1/3)/(1 − y) = 3(1 − y) · (1/3)/(1 − y) = 1
for 0 < y < 1. That is, Y is uniform on (0, 1). There is nothing special about
λ = 3 here. The next result shows that there is nothing very special about the
exponential distribution.
Theorem 5.3. Suppose X has a continuous distribution. Then Y = F (X) is
uniform on (0, 1).
Proof. Even though F may not be strictly increasing, we can define an inverse
of F by

F^{−1}(y) = min{x : F(x) ≥ y}

Using this definition of F^{−1}, we have

P(Y ≤ y) = P(X ≤ F^{−1}(y)) = F(F^{−1}(y)) = y

the last equality holding since F is continuous.

Reversing the ideas in the last proof, we get a result that is useful to construct
random variables with a specified distribution.
Theorem 5.4. Suppose U has a uniform distribution on (0, 1). Then Y = F^{−1}(U) has distribution function F.
Proof. The definition of F^{−1} was chosen so that if 0 < y < 1 then

F^{−1}(y) ≤ x if and only if y ≤ F(x)

and this holds for any distribution function F. Taking y = U, it follows that

P(F^{−1}(U) ≤ x) = P(U ≤ F(x)) = F(x)

since P(U ≤ u) = u.

For a concrete example, suppose we want to construct an exponential distribution
with parameter λ. Setting 1 − e^{−λx} = y and solving gives x = −ln(1 − y)/λ.
So if U is uniform on (0, 1) then −ln(1 − U)/λ has the desired exponential
distribution. Of course, since 1 − U is uniform on (0, 1) we could also use
−ln(U)/λ. In the case of a power law, setting 1 − x^{−(ρ−1)} = y and solving
gives x = (1 − y)^{−1/(ρ−1)}. So if U is uniform on (0, 1) then U^{−1/(ρ−1)} has the
desired power law distribution.
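These recipes are exactly how such random variables are generated in practice. A sketch of both constructions from a single stream of uniforms, with λ and ρ chosen arbitrarily:

    import numpy as np

    rng = np.random.default_rng(0)
    u = rng.uniform(size=1_000_000)

    lam, rho = 2.0, 3.0                   # arbitrary parameters
    expo = -np.log(1 - u) / lam           # exponential(lam), by Theorem 5.4
    power = u ** (-1 / (rho - 1))         # power law with exponent rho

    # Compare empirical and exact distribution functions at a test point.
    print(np.mean(expo <= 1.0), 1 - np.exp(-lam))         # F(1) = 1 - e^{-lam}
    print(np.mean(power <= 2.0), 1 - 2.0 ** -(rho - 1))   # F(2) = 1 - 2^{-(rho-1)}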

5.3 Joint Distributions


Two random variables are said to have joint density function f if
P((X, Y) ∈ A) = ∫∫_A f(x, y) dx dy   (5.6)

where f(x, y) ≥ 0 and ∫∫ f(x, y) dx dy = 1.
In words, we find the probability that (X, Y ) lies in A by integrating f over
A. As we will see a number of times below, it is useful to think of f (x, y) as
P (X = x, Y = y) even though the last event has probability 0. As in Section
5.1, the precise interpretation of f (x, y) is
P(x ≤ X ≤ x + ∆x, y ≤ Y ≤ y + ∆y) = ∫_x^{x+∆x} ∫_y^{y+∆y} f(u, v) dv du ≈ f(x, y)∆x∆y
when ∆x and ∆y are small, so f (x, y) indicates how likely it is for (X, Y ) to be
near (x, y).
For a concrete example of a joint density function, consider
Example 5.17.

f(x, y) = { e^{−y}   0 < x < y < ∞
          { 0        otherwise
The story behind this example will be told later. To check that f is a density
function, we observe that
∫_0^∞ ∫_0^y e^{−y} dx dy = ∫_0^∞ ye^{−y} dy

and integrating by parts with g(y) = y, h′(y) = e^{−y} (so g′(y) = 1, h(y) = −e^{−y}):

∫_0^∞ ye^{−y} dy = [−ye^{−y}]_0^∞ + ∫_0^∞ e^{−y} dy = 0 + [−e^{−y}]_0^∞ = 1

To illustrate the use of (5.6) we will now compute P (X ≤ 1), which can be
written as P ((X, Y ) ∈ A) where A = {(x, y) : x ≤ 1}. The formula in (5.6) tells
us that we find P ((X, Y ) ∈ A) by integrating the joint density over A. However,
the joint density is only positive on B = {(x, y) : 0 < x < y < ∞} so we only
need to integrate over A ∩ B = {(x, y) : 0 < x ≤ 1, x < y}, and doing this we
find

P(X ≤ 1) = ∫_0^1 ∫_x^∞ e^{−y} dy dx

To evaluate the double integral we begin by observing that

∫_x^∞ e^{−y} dy = [−e^{−y}]_x^∞ = 0 − (−e^{−x}) = e^{−x}

so P(X ≤ 1) = ∫_0^1 e^{−x} dx = [−e^{−x}]_0^1 = 1 − e^{−1}.
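Looking ahead, Example 5.25 notes that if Z1, Z2 are independent exponential(1) then X = Z1, Y = Z1 + Z2 has exactly this joint density, which gives an easy simulation check of the last answer. A sketch:

    import numpy as np

    rng = np.random.default_rng(0)
    z1 = rng.exponential(size=1_000_000)
    z2 = rng.exponential(size=1_000_000)
    x, y = z1, z1 + z2              # (X, Y) has density e^{-y} on 0 < x < y

    print(np.mean(x <= 1.0), 1 - np.exp(-1))   # both approximately 0.632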

Example 5.18. Uniform on a ball. Pick a point at random from the ball
B = {(x, y) : x2 + y 2 ≤ 1}. By “at random from B” we mean that a choice
outside of B is impossible and that all the points in B should be equally likely.
In terms of the joint density this means that f (x, y) = 0 when (x, y) 6∈ B and
there is a constant c > 0 so that f (x, y) = c when (x, y) ∈ B.
Our f (x, y) ≥ 0. To make the integral of f equal to 1, we have to choose c
appropriately. Now,
∫∫ f(x, y) dx dy = ∫∫_B c dx dy = c (area of B) = cπ

So we choose c = 1/π to make the integral 1 and define


f(x, y) = { 1/π   x² + y² ≤ 1
          { 0     otherwise

The arguments that led to the last conclusion generalize easily to show that if
we pick a point “at random” from a set S with area a then
f(x, y) = { 1/a   (x, y) ∈ S        (5.7)
          { 0     otherwise

Example 5.19. Buffon’s needle. A floor consists of boards of width 1. If


we drop a needle of length L ≤ 1 on the floor, what is the probability it will
touch one of the cracks (i.e., the small spaces between the boards)? To make
the question simpler to answer, we assume that the needle and the cracks have
width zero.

Let X be the distance from the center of the needle to the nearest crack and
Θ be the angle ∈ [0, π) that the top half of the needle makes with the crack.
(We make this choice to have sin Θ > 0.) We assume that all the ways the
needle can land are equally likely, that is, the joint distribution of (X, Θ) is
f(x, θ) = { 2/π   if x ∈ [0, 1/2), θ ∈ [0, π)
          { 0     otherwise

The formula for the joint density follows from (5.7). We are picking a point “at
random” from a set S with area π/2, so the joint density is 2/π on S.

By drawing a picture, one sees that the needle touches
the crack if and only if (L/2) sin Θ ≥ X. (5.6) tells us that the probability of
this event is obtained by integrating the joint density over

A = {(x, θ) ∈ [0, 1/2) × [0, π) : x ≤ (L/2) sin θ}
so the probability we seek is
∫∫_A f(x, θ) dx dθ = ∫_0^π ∫_0^{(L/2) sin θ} (2/π) dx dθ = (2/π) ∫_0^π (L/2) sin θ dθ = (L/π) [−cos θ]_0^π = 2L/π

Buffon wanted to use this as a method of estimating π. Taking L = 1/2 and
performing the experiment 10,000 times on a computer, we found that 1 over
the fraction of times the needle hit the crack was 3.2310, 3.1368, and 3.0893
in the three times we tried this. We will see later that these numbers
are typical outcomes and that to compute π to 4 decimal places would require
about 10^8 (or 100 million) tosses.
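Repeating Buffon's computer experiment takes only a few lines. A sketch with L = 1/2 and 10,000 tosses, matching the setup above:

    import numpy as np

    rng = np.random.default_rng(0)
    L, n = 0.5, 10_000
    x = rng.uniform(0, 0.5, n)           # distance from center to nearest crack
    theta = rng.uniform(0, np.pi, n)     # angle with the crack

    hits = np.mean((L / 2) * np.sin(theta) >= x)   # P(hit) = 2L/pi = 1/pi
    print(1 / hits)                                # estimate of pi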
Remark. Before leaving the subject of joint densities, we would like to make
one remark that will be useful later. If X and Y have joint density f(x, y) then
P(X = Y) = 0. To see this, we observe that ∫∫_A f(x, y) dx dy is the volume of
the region over A underneath the graph of f, but this volume is 0 if A is the
line x = y.
Joint distribution function
The joint distribution of two random variables is occasionally described by
giving the joint distribution function:
F (x, y) = P (X ≤ x, Y ≤ y)
The next example illustrates this notion but also shows that in this situation
the density function is easier to write down.
Example 5.20. Suppose (X, Y ) is uniformly distributed over the square {(x, y) :
0 < x < 1, 0 < y < 1}. That is,
f(x, y) = { 1   0 < x < 1, 0 < y < 1
          { 0   otherwise
Here, we are picking a point “at random” from a set with area 1, so the formula
follows from (5.7).
By patiently considering the possible cases, one finds that

F(x, y) = { 0    if x < 0 or y < 0
          { xy   if 0 ≤ x ≤ 1 and 0 ≤ y ≤ 1
          { x    if 0 ≤ x ≤ 1 and y > 1
          { y    if x > 1 and 0 ≤ y ≤ 1
          { 1    if x > 1 and y > 1

The first case should be clear: If x < 0 or y < 0 then {X ≤ x, Y ≤ y} is


impossible since X and Y always lie between 0 and 1. For the second case we
note that when 0 ≤ x ≤ 1 and 0 ≤ y ≤ 1,
P(X ≤ x, Y ≤ y) = ∫_0^x ∫_0^y 1 dv du = xy

In the third case, since values of Y > 1 are impossible,

P (X ≤ x, Y ≤ y) = P (X ≤ x, Y ≤ 1) = x

by the formula for the second case. The fourth case is similar to the third, and
the fifth is trivial. X and Y are always smaller than 1 so if x > 1 and y > 1
then {X ≤ x, Y ≤ y} has probability 1.
We will not use the joint distribution function in what follows. For complete-
ness, however, we want to mention two of its important properties. The first
formula is the two-dimensional generalization of P (a < X ≤ b) = F (b) − F (a).

P(a_1 < X ≤ b_1, a_2 < Y ≤ b_2)
    = F(b_1, b_2) − F(a_1, b_2) − F(b_1, a_2) + F(a_1, a_2)   (5.8)

Proof. The reasoning we use here is much like that employed in studying the
probabilities of unions in Section 1.6. By adding and subtracting the probabilities
on the right, we end up with the desired region counted exactly once.
Write A = {a_1 < X ≤ b_1, a_2 < Y ≤ b_2}, B = {X ≤ a_1, a_2 < Y ≤ b_2},
C = {a_1 < X ≤ b_1, Y ≤ a_2}, and D = {X ≤ a_1, Y ≤ a_2}, and use A as
shorthand for P((X, Y) ∈ A), etc. Then

F(b_1, b_2) = A + B + C + D
−F(a_1, b_2) = −B − D
−F(b_1, a_2) = −C − D
F(a_1, a_2) = D

Adding the last four equations gives the one in (5.8).

The next formula tells us how to recover the joint density function from
the joint distribution function. To motivate the formula, we recall that in one
dimension F′ = f since F(x) = ∫_{−∞}^x f(u) du.

∂²F/∂x∂y = f   (5.9)
To explain why this formula is true, we note that
F(x, y) = ∫_{−∞}^x ∫_{−∞}^y f(u, v) dv du

and differentiating twice kills the two integrals. To check that (5.9) works in
Example 5.20, F(x, y) = xy when 0 < x < 1 and 0 < y < 1, so ∂²F/∂x∂y = 1
there and it is 0 otherwise.

n random variables
The developments above generalize in a straightforward way to n > 2 random
variables X_1, . . . , X_n. f is a joint density function if f(x_1, . . . , x_n) ≥ 0 and

∫ · · · ∫ f(x_1, . . . , x_n) dx_n · · · dx_1 = 1

The joint distribution function is defined by

F(x_1, . . . , x_n) = P(X_1 ≤ x_1, . . . , X_n ≤ x_n) = ∫_{−∞}^{x_1} · · · ∫_{−∞}^{x_n} f(y_1, . . . , y_n) dy_n · · · dy_1

and differentiating the last equality n times gives


∂ⁿF/∂x_1 · · · ∂x_n = f   (5.10)

5.4 Marginal and Conditional Distributions


In the discrete case the marginal distributions are obtained from the joint dis-
tribution by summing
P(X = x) = Σ_y P(X = x, Y = y)      P(Y = y) = Σ_x P(X = x, Y = y)

In the continuous case if X and Y have joint density f (x, y), then the marginal
densities of X and Y are given by
f_X(x) = ∫ f(x, y) dy      f_Y(y) = ∫ f(x, y) dx   (5.11)

The verbal explanation of the first formula is similar to that of the discrete case:
if X = x then Y will take on some value y, so to find P (X = x) we integrate
the joint density f (x, y) over all possible values of y.
To illustrate the use of these formulas we look at Example 5.17.

Example 5.21.

f(x, y) = { e^{−y}   0 < x < y < ∞
          { 0        otherwise

In this case

f_X(x) = ∫_x^∞ e^{−y} dy = [−e^{−y}]_x^∞ = e^{−x}

since (5.11) tells us to integrate f(x, y) over all values of y, but we only have f > 0
when y > x. Similarly,

f_Y(y) = ∫_0^y e^{−y} dx = ye^{−y}

The next result is the continuous analogue of the independence criterion in the discrete case:

Theorem 5.5. Two random variables with joint density f are independent if
and only if
f(x, y) = f_X(x) f_Y(y)
that is, if the joint density is the product of the marginal densities.

We will now consider three examples that parallel the ones used in the dis-
crete case.

Example 5.22.

f(x, y) = { e^{−y}   0 < x < y < ∞
          { 0        otherwise

In this case

f(3, 2) = 0 ≠ f_X(3) f_Y(2) > 0

so Theorem 5.5 implies that X and Y are not independent. In general, if the
set of values where f > 0 is not a rectangle then X and Y are not independent.
Example 5.23.

f(x, y) = { (1 + x + y)/2   0 < x < 1 and 0 < y < 1
          { 0               otherwise

In this case the set where f > 0 is a rectangle, so the joint distribution passes
the first test and we have to compute the marginal densities
f_X(x) = ∫_0^1 (1 + x + y)/2 dy = [(1 + x)y/2 + y²/4]_0^1 = x/2 + 3/4

f_Y(y) = y/2 + 3/4   by symmetry
These formulas are valid for 0 < x < 1 and 0 < y < 1, respectively. To check
independence we have to see if

(?)   (1 + x + y)/2 = (x/2 + 3/4) · (y/2 + 3/4)

Multiplying both sides by 16 and expanding the right-hand side, we see this
holds if and only if

8 + 8x + 8y = 4xy + 6x + 6y + 9

for all 0 < x < 1 and 0 < y < 1, which is ridiculous. A simpler way to see that
(?) is wrong is simply to note that when x = y = 0 it says that 1/2 = 9/16.
Example 5.24.

f(x, y) = { cos x e^{sin x} · y^{−3/2} e^{−1/(2y)} / ((e − 1)√(2π))   0 < x < π/2, y > 0
          { 0                                                        otherwise
In this case it would not be very much fun to integrate to find the marginal
densities, so we adopt another approach.
Theorem 5.6. If f(x, y) can be written as g(x)h(y), then there is a constant c
so that f_X(x) = cg(x) and f_Y(y) = h(y)/c. It follows that f(x, y) = f_X(x) f_Y(y)
and hence X and Y are independent.

In words, if we can write f as a product of a function of x and a function of


y then these functions must be constant multiples of the marginal densities.
Theorem 5.6 takes care of our example since

f(x, y) = (cos x e^{sin x}/(e − 1)) · (y^{−3/2} e^{−1/(2y)}/√(2π))

Proof. We begin by observing

f_X(x) = ∫ f(x, y) dy = g(x) ∫ h(y) dy
f_Y(y) = ∫ f(x, y) dx = h(y) ∫ g(x) dx
1 = ∫∫ f(x, y) dx dy = ∫ g(x) dx · ∫ h(y) dy

So if we let c = ∫ h(y) dy then the last equation implies ∫ g(x) dx = 1/c, and
the first two give us f_X(x) = cg(x) and f_Y(y) = h(y)/c.

Conditional distributions
Introducing f_X(x|Y = y) as notation for the conditional density of X
given Y = y (which we think of as P(X = x|Y = y)), we have

f_X(x|Y = y) = f(x, y)/f_Y(y) = f(x, y)/∫ f(u, y) du   (5.12)

In words, we fix y, consider the joint density function as a function of x, and


then divide by the integral to make it a probability density. To see how formula
(5.12) works, we look at our continuous example.

Example 5.25.

f(x, y) = { e^{−y}   0 < x < y < ∞
          { 0        otherwise

In this case we have computed f_Y(y) = ye^{−y} (in Example 5.21), so

f_X(x|Y = y) = e^{−y}/(ye^{−y}) = 1/y   for 0 < x < y

That is, the conditional distribution is uniform on (0, y). This should not be
surprising since the joint density does not depend on x.
To compute the other conditional distribution we recall f_X(x) = e^{−x}, so

f_Y(y|X = x) = e^{−y}/e^{−x} = e^{−(y−x)}   for y > x
That is, given X = x, Y − x is exponential with parameter 1. The last answer
is quite reasonable since if Z_1, Z_2 are independent exponential(1), then X = Z_1,
Y = Z_1 + Z_2 has the joint distribution given above (see Example 6.1). If we
condition on X = x then Z_1 = x and Y = x + Z_2.

The multiplication rule says

P (X = x, Y = y) = P (X = x)P (Y = y|X = x)

Substituting in the analogous continuous quantities, we have

f(x, y) = f_X(x) f_Y(y|X = x)   (5.13)

The next example demonstrates the use of (5.13) to compute a joint distribution.

Example 5.26. Suppose we pick a point uniformly distributed on (0, 1), call it
X, and then pick a point Y uniformly distributed on (0, X).

To find the joint density of (X, Y) we note that

f_X(x) = 1 for 0 < x < 1
f_Y(y|X = x) = 1/x for 0 < y < x

So using (5.13), we have

f(x, y) = f_X(x) f_Y(y|X = x) = 1/x   for 0 < y < x < 1

To complete the picture we compute


f_Y(y) = ∫ f(x, y) dx = ∫_y^1 (1/x) dx = −ln y

f_X(x|Y = y) = f(x, y)/f_Y(y) = (1/x)/(−ln y)   for y < x < 1

Again the conditional density of X given Y = y is obtained by fixing y, regarding


the joint density function as a function of x, and then normalizing so that the
integral is 1. The reader should note that although X is uniform on (0, 1) and
Y is uniform on (0, X), X is not uniform on (Y, 1) but has a greater probability
of being near Y .
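The two-stage construction translates directly into code, and the marginal density f_Y(y) = −ln y can be checked with a histogram. A sketch; the bin range is an arbitrary choice that stays away from y = 0, where the density blows up:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(size=1_000_000)        # X uniform on (0, 1)
    y = x * rng.uniform(size=1_000_000)    # given X = x, Y is uniform on (0, x)

    counts, edges = np.histogram(y, bins=20, range=(0.05, 1.0))
    hist = counts / (len(y) * np.diff(edges))    # empirical density values
    mids = (edges[:-1] + edges[1:]) / 2
    print(np.max(np.abs(hist + np.log(mids))))   # f_Y(y) = -ln y; should be small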

5.5 Exercises
1. Suppose X has density function f (x) for a ≤ x ≤ b and Y = cX + d where
c > 0. Find the density function of Y .
2. Show that if X = exponential(1) then Y = X/λ is exponential(λ).
3. Suppose X is uniform on (0, 1). Find the density function of Y = X n .
4. Suppose X has density x^{−2} for x ≥ 1 and Y = X^{−2}. Find the density
function of Y.
5. Suppose X has an exponential distribution with parameter λ and Y = X^{1/α}.
Find the density function of Y. This is the Weibull distribution.
6. Suppose X has an exponential distribution with parameter 1 and Y =
ln(X). Find the distribution function of Y. This is the double exponential
distribution.
7. Suppose X has a normal distribution and Y = eX . Find the density function
of Y . This is the lognormal distribution.
8. A drunk standing one foot from a wall shines a flashlight at a random angle
that is uniformly distributed between −π/2 and π/2. Find the density function
of the place where the light hits the wall. The answer is called the Cauchy
density.
9. Suppose X is uniform on (0, π/2) and Y = sin X. Find the density function
of Y . The answer is called the arcsine law because the distribution function
contains the arcsine function.
10. Suppose X has density function 3x^{−4} for x ≥ 1. (a) Find a function g so
that g(X) is uniform on (0, 1). (b) Find a function h so that if U is uniform on
(0, 1), h(U) has density function 3x^{−4} for x ≥ 1.
11. Suppose X has density function f (x) for −1 ≤ x ≤ 1, 0 otherwise. Find the
density function of (a) Y = |X|, (b) Z = X².
12. Suppose X has density function x/2 for 0 < x < 2, 0 otherwise. Find
the density function of Y = X(2 − X) by computing P (Y ≥ y) and then
differentiating.
13. A weather channel has the local forecast on the hour and at 10, 25, 30, 45,
and 55 minutes past. Suppose you wake up in the middle of the night and turn
on the TV, and let X be the time you have to wait to see the local forecast,
measured in hours. Find the density function of X.
14. Suppose r is differentiable and {x : r′(x) = 0} is finite. Show that the
density function of Y = r(X) is given by

Σ_{x : r(x) = y} f(x)/|r′(x)|

15. Show that if F_1(x) ≤ F_2(x) are two distribution functions, then by using
the recipe in Theorem 5.4 we can define random variables X_1 and X_2 with these
distributions so that X_1 ≥ X_2.
Joint distributions
16. Suppose we draw 2 balls out of an urn with 8 red, 6 blue, and 4 green balls.
Let X be the number of red balls we get and Y the number of blue balls. Find
the joint distribution of X and Y .
17. Suppose we roll two dice that have the numbers 1, 2, 3, and 4 on their four
sides. Let X be the maximum of the two numbers that appear and Y be the
sum. Find the joint distribution of X and Y .
18. Suppose we roll two ordinary six-sided dice, let X be the minimum of the
two numbers that appear, and let Y be the maximum of the two numbers. Find
the joint distribution of X and Y .
19. Suppose we roll one die repeatedly and let Ni be the number of the roll on
which i first appears. Find the joint distribution of N1 and N6 .
20. Suppose P (X = x, Y = y) = c(x + y) for x, y = 0, 1, 2, 3. (a) What value of
c will make this a probability function? (b) What is P (X > Y )?
21. Suppose X and Y have joint density f (x, y) = c(x + y) for 0 < x, y < 1. (a)
What is c? (b) What is P (X < 1/2)?
22. Suppose X and Y have joint density f(x, y) = 6xy² for 0 < x, y < 1. What
is P (X + Y < 1)?
23. Suppose X and Y have joint density f (x, y) = 2 for 0 < y < x < 1. Find
P (X − Y > z).
24. We take a stick of length 1 and break it into 3 pieces. To be precise, we
think of taking the unit interval and cutting at X < Y where X and Y have
joint density f (x, y) = 2 for 0 < x < y < 1. What is the probability we can
make a triangle with the three pieces? (We can do this if no piece is longer than
1/2.)
25. Suppose X and Y have joint density f (x, y) = 1 for 0 < x, y < 1. Find
P (XY ≤ z).
26. X, Y , and Z have a uniform density on the unit cube. That is, their joint
density is 1 when 0 < x, y, z < 1 and 0 otherwise. Find P (X + Y + Z < 1).
27. Suppose X and Y have joint density f (x, y) = e−(x+y) for x, y > 0. Find
the distribution function.
28. Suppose X is uniform on (0, 1) and Y = X. Find the joint distribution
function of X and Y .
29. A pair of random variables X and Y take values between 0 and 1 and have
P(X ≤ x, Y ≤ y) = x³y² when 0 ≤ x, y ≤ 1. Find the joint density function.

Marginal Distributions, Independence


30. Suppose a point (X, Y) is chosen at random from the circle x² + y² ≤ 1.
Find the marginal density of X.
31. Suppose X and Y have joint density f(x, y) = x + 2y³ when 0 < x < 1 and
0 < y < 1. Find the marginal densities of X and Y .
32. Suppose X and Y have joint density f (x, y) = 6y when x > 0, y > 0, and
x + y < 1. Find the marginal densities of X and Y .
33. Suppose X and Y have joint density f(x, y) = 10x²y when 0 < y < x < 1.
Find the marginal densities of X and Y .
34. Let (X, Y, Z) be a random point in the unit sphere. That is, their joint
density is 3/(4π) when x² + y² + z² ≤ 1, 0 otherwise. Find the marginal density
of (a) (X, Y), (b) Z.
35. Given the joint distribution function F_{X,Y}(x, y) = P(X ≤ x, Y ≤ y), how
do you recover the marginal distribution F_X(x) = P(X ≤ x)?
36. Suppose X and Y have joint density f (x, y). Are X and Y independent if
(a) f(x, y) = xe^{−x(1+y)} for x, y ≥ 0?
(b) f(x, y) = 6xy² when x, y ≥ 0 and x + y ≤ 1?
(c) f(x, y) = 2xy + x when 0 < x < 1 and 0 < y < 1?
(d) f(x, y) = (x + y)² − (x − y)² when 0 < x < 1 and 0 < y < 1?
In each case f (x, y) = 0 otherwise.
37. Two people agree to meet for a drink after work but they are impatient and
each will only wait 15 minutes for the other person to show up. Suppose that
they each arrive at independent random times uniformly distributed between 5
p.m. and 6 p.m. What is the probability they will meet?
38. Suppose X_1 and X_2 are independent and uniform on (0, 1). In Exercise 6.1
you will show that the joint density of Y = X_1/X_2 and Z = X_1 X_2 is given by
f_{(Y,Z)}(y, z) = 1/(2y) when y > z > 0 and yz < 1, 0 otherwise. Find the marginal
densities of Y and Z.
39. Suppose X_1 and X_2 are independent and normal(0, 1). Find the distribution
of Y = (X_1² + X_2²)^{1/2}. This is the Rayleigh distribution.
40. Suppose X_1, X_2, X_3 are independent and normal(0, 1). Find the distribution
of Y = (X_1² + X_2² + X_3²)^{1/2}. This is the Maxwell distribution. It is used in
physics for the speed of particles in a gas.
41. Suppose X_1, . . . , X_n are independent and have distribution function F(x).
Find the distribution functions of (a) Y = max{X_1, . . . , X_n} and (b) Z =
min{X_1, . . . , X_n}.
42. Suppose X_1, . . . , X_n are independent exponential(λ). Show that

min{X_1, . . . , X_n} = exponential(nλ)

43. Suppose X_1, X_2, . . . are independent and have the same continuous distribution
F. We say that a record occurs at time k if X_k > max_{j<k} X_j. Show that
the events A_k = "a record occurs at time k" are independent and P(A_k) = 1/k.
44. Let E_1, . . . , E_n be events and let X_i be 1 if E_i occurs, 0 otherwise. These
are called indicator random variables, since they indicate whether or not the
ith event occurred. Show that the indicator random variables are independent
if and only if the events E_i are.
