Chapter 4 Maths

Uploaded by

22051774
4 Continuous Random Variables and Probability Distributions

INTRODUCTION
Chapter 3 concentrated on the development of probability distributions for discrete
random variables. In this chapter, we consider the second general type of
random variable that arises in many applied problems. Sections 4.1 and 4.2
present the basic definitions and properties of continuous random variables and
their probability distributions. In Section 4.3, we study in detail the normal random
variable and distribution, unquestionably the most important and useful in
probability and statistics. Sections 4.4 and 4.5 discuss some other continuous
distributions that are often used in applied work. In Section 4.6, we introduce a
method for assessing whether given sample data is consistent with a specified
distribution.


Copyright 2016 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s).
Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.

4.1 Probability Density Functions

A discrete random variable (rv) is one whose possible values either constitute a finite
set or else can be listed in an infinite sequence (a list in which there is a first element,
a second element, etc.). A random variable whose set of possible values is an entire
interval of numbers is not discrete.
Recall from Chapter 3 that a random variable X is continuous if (1) possible
values comprise either a single interval on the number line (for some A < B, any
number x between A and B is a possible value) or a union of disjoint intervals, and
(2) P(X = c) = 0 for any number c that is a possible value of X.

Example 4.1 If in the study of the ecology of a lake, we make depth measurements at randomly
chosen locations, then X = the depth at such a location is a continuous rv. Here A is
the minimum depth in the region being sampled, and B is the maximum depth. ■

Example 4.2 If a chemical compound is randomly selected and its pH X is determined, then X is
a continuous rv because any pH value between 0 and 14 is possible. If more is known
about the compound selected for analysis, then the set of possible values might be a
subinterval of [0, 14], such as 5.5 ≤ x ≤ 6.5, but X would still be continuous. ■

Example 4.3 Let X represent the amount of time a randomly selected customer spends waiting for
a haircut before his/her haircut commences. Your first thought might be that X is
a continuous random variable, since a measurement is required to determine its
value. However, there are customers lucky enough to have no wait whatsoever
before climbing into the barber’s chair. So it must be the case that P(X = 0) > 0.
Conditional on no chairs being empty, though, the waiting time will be continuous
since X could then assume any value between some minimum possible time A and a
maximum possible time B. This random variable is neither purely discrete nor purely
continuous but instead is a mixture of the two types. ■

One might argue that although in principle variables such as height, weight,
and temperature are continuous, in practice the limitations of our measuring instru-
ments restrict us to a discrete (though sometimes very finely subdivided) world.
However, continuous models often approximate real-world situations very well, and
continuous mathematics (the calculus) is frequently easier to work with than math-
ematics of discrete variables and distributions.

Probability Distributions for Continuous Variables
Suppose the variable X of interest is the depth of a lake at a randomly chosen point
on the surface. Let M = the maximum depth (in meters), so that any number in the
interval [0, M ] is a possible value of X. If we “discretize” X by measuring depth to
the nearest meter, then possible values are nonnegative integers less than or equal to
M. The resulting discrete distribution of depth can be pictured using a probability his-
togram. If we draw the histogram so that the area of the rectangle above any possible
integer k is the proportion of the lake whose depth is (to the nearest meter) k, then
the total area of all rectangles is 1. A possible histogram appears in Figure 4.1(a).
If depth is measured much more accurately and the same measurement axis as
in Figure 4.1(a) is used, each rectangle in the resulting probability histogram is much


narrower, though the total area of all rectangles is still 1. A possible histogram is
pictured in Figure 4.1(b); it has a much smoother appearance than the histogram in
Figure 4.1(a). If we continue in this way to measure depth more and more finely, the
resulting sequence of histograms approaches a smooth curve, such as is pictured in
Figure 4.1(c). Because for each histogram the total area of all rectangles equals 1,
the total area under the smooth curve is also 1. The probability that the depth at a
randomly chosen point is between a and b is just the area under the smooth curve
between a and b. It is exactly a smooth curve of the type pictured in Figure 4.1(c)
that specifies a continuous probability distribution.

Figure 4.1 (a) Probability histogram of depth measured to the nearest meter; (b) probability
histogram of depth measured to the nearest centimeter; (c) a limit of a sequence of discrete
histograms (each panel has a depth axis running from 0 to M)

DEFINITION Let X be a continuous rv. Then a probability distribution or probability
density function (pdf) of X is a function f(x) such that for any two numbers a and
b with a ≤ b,

\[ P(a \le X \le b) = \int_a^b f(x)\,dx \]

That is, the probability that X takes on a value in the interval [a, b] is the area
above this interval and under the graph of the density function, as illustrated in
Figure 4.2. The graph of f(x) is often referred to as the density curve.

Figure 4.2 P(a ≤ X ≤ b) = the area under the density curve between a and b

For f(x) to be a legitimate pdf, it must satisfy the following two conditions:
1. f(x) ≥ 0 for all x
2. $\int_{-\infty}^{\infty} f(x)\,dx$ = area under the entire graph of f(x) = 1

Example 4.4 The direction of an imperfection with respect to a reference line on a circular object
such as a tire, brake rotor, or flywheel is, in general, subject to uncertainty. Consider
the reference line connecting the valve stem on a tire to the center point, and let X
be the angle measured clockwise to the location of an imperfection. One possible
pdf for X is

\[ f(x) = \begin{cases} \dfrac{1}{360} & 0 \le x < 360 \\ 0 & \text{otherwise} \end{cases} \]

The pdf is graphed in Figure 4.3. Clearly f(x) ≥ 0. The area under the density curve
is just the area of a rectangle: (height)(base) = (1/360)(360) = 1. The probability
that the angle is between 90° and 180° is

\[ P(90 \le X \le 180) = \int_{90}^{180} \frac{1}{360}\,dx = \left.\frac{x}{360}\right|_{x=90}^{x=180} = \frac{1}{4} = .25 \]

The probability that the angle of occurrence is within 90° of the reference line is

\[ P(0 \le X \le 90) + P(270 \le X < 360) = .25 + .25 = .50 \]

Figure 4.3 The pdf and probability from Example 4.4 (left panel: the density at constant height 1/360 over 0 to 360; right panel: shaded area = P(90 ≤ X ≤ 180)) ■
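
The two legitimacy conditions and the probabilities of Example 4.4 can also be checked numerically. The following is only a sketch: the helper names `integrate` and `pdf_angle`, the use of the midpoint rule, and the number of subintervals are choices made for this illustration and do not come from the text.

```python
def integrate(f, a, b, n=100_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def pdf_angle(x):
    """Uniform pdf of Example 4.4: 1/360 on [0, 360), 0 elsewhere."""
    return 1 / 360 if 0 <= x < 360 else 0.0


# Condition 2: total area under the density curve (the support is [0, 360)).
total_area = integrate(pdf_angle, 0, 360)

# P(90 <= X <= 180), and the probability of being within 90 degrees
# of the reference line, P(0 <= X <= 90) + P(270 <= X < 360).
p_90_180 = integrate(pdf_angle, 90, 180)
p_within_90 = integrate(pdf_angle, 0, 90) + integrate(pdf_angle, 270, 360)

print(round(total_area, 4), round(p_90_180, 4), round(p_within_90, 4))
```

For a constant density the midpoint rule is exact up to rounding, so the three printed values should agree with 1, .25, and .50 from the example.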

Because whenever 0 ≤ a ≤ b ≤ 360 in Example 4.4, P(a ≤ X ≤ b) depends only
on the width b − a of the interval, X is said to have a uniform distribution.

DEFINITION A continuous rv X is said to have a uniform distribution on the interval
[A, B] if the pdf of X is

\[ f(x; A, B) = \begin{cases} \dfrac{1}{B - A} & A \le x \le B \\ 0 & \text{otherwise} \end{cases} \]

The graph of any uniform pdf looks like the graph in Figure 4.3 except that the inter-
val of positive density is [A, B] rather than [0, 360].
In the discrete case, a probability mass function (pmf) tells us how little
“blobs” of probability mass of various magnitudes are distributed along the mea-
surement axis. In the continuous case, probability density is “smeared” in a continu-
ous fashion along the interval of possible values. When density is smeared uniformly
over the interval, a uniform pdf, as in Figure 4.3, results.
When X is a discrete random variable, each possible value is assigned positive
probability. This is not true of a continuous random variable (that is, the second
condition of the definition is satisfied) because the area under a density curve that
lies above any single value is zero:

\[ P(X = c) = \int_c^c f(x)\,dx = \lim_{\varepsilon \to 0} \int_{c-\varepsilon}^{c+\varepsilon} f(x)\,dx = 0 \]

The fact that P(X = c) = 0 when X is continuous has an important practical
consequence: The probability that X lies in some interval between a and b does not
depend on whether the lower limit a or the upper limit b is included in the probability
calculation:

\[ P(a \le X \le b) = P(a < X < b) = P(a < X \le b) = P(a \le X < b) \tag{4.1} \]

If X is discrete and both a and b are possible values (e.g., X is binomial with n = 20
and a = 5, b = 10), then all four of the probabilities in (4.1) are different.
The zero probability condition has a physical analog. Consider a solid circular
rod with cross-sectional area = 1 in². Place the rod alongside a measurement axis
and suppose that the density of the rod at any point x is given by the value f(x) of a
density function. Then if the rod is sliced at points a and b and this segment is
removed, the amount of mass removed is $\int_a^b f(x)\,dx$; if the rod is sliced just at the
point c, no mass is removed. Mass is assigned to interval segments of the rod but
not to individual points.

Example 4.5 “Time headway” in traffic flow is the elapsed time between the time that one car
finishes passing a fixed point and the instant that the next car begins to pass that
point. Let X = the time headway for two randomly chosen consecutive cars on a
freeway during a period of heavy flow. The following pdf of X is essentially the one
suggested in “The Statistical Properties of Freeway Traffic” (Transp. Res., vol.
11: 221–228):

\[ f(x) = \begin{cases} .15e^{-.15(x-.5)} & x \ge .5 \\ 0 & \text{otherwise} \end{cases} \]

The graph of f(x) is given in Figure 4.4; there is no density associated with
headway times less than .5, and headway density decreases rapidly (exponentially
fast) as x increases from .5. Clearly, f(x) ≥ 0; to show that $\int_{-\infty}^{\infty} f(x)\,dx = 1$, we use
the calculus result $\int_a^{\infty} e^{-kx}\,dx = (1/k)e^{-k \cdot a}$. Then

\[ \int_{-\infty}^{\infty} f(x)\,dx = \int_{.5}^{\infty} .15e^{-.15(x-.5)}\,dx = .15e^{.075} \int_{.5}^{\infty} e^{-.15x}\,dx = .15e^{.075} \cdot \frac{1}{.15}e^{-(.15)(.5)} = 1 \]

Figure 4.4 The density curve for time headway in Example 4.5 (the curve starts at height .15 at x = .5 and decays as x increases; the shaded area is P(X ≤ 5))

The probability that headway time is at most 5 sec is

\[ P(X \le 5) = \int_{-\infty}^{5} f(x)\,dx = \int_{.5}^{5} .15e^{-.15(x-.5)}\,dx = .15e^{.075} \int_{.5}^{5} e^{-.15x}\,dx = .15e^{.075} \cdot \left.\left(-\frac{1}{.15}e^{-.15x}\right)\right|_{x=.5}^{x=5} \]

\[ = e^{.075}\left(-e^{-.75} + e^{-.075}\right) = 1.078(-.472 + .928) = .491 = P(\text{less than 5 sec}) = P(X < 5) \] ■
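
The two integrals of Example 4.5 can be cross-checked numerically. The sketch below is illustrative only: the function names, the midpoint rule, and the truncation of the infinite upper limit at 200 are assumptions made here. The closed form used for comparison, 1 − e^{−.15(5−.5)}, is algebraically the same quantity as the expression evaluated in the example.

```python
import math


def pdf_headway(x):
    """pdf of Example 4.5: .15 * exp(-.15 (x - .5)) for x >= .5, else 0."""
    return 0.15 * math.exp(-0.15 * (x - 0.5)) if x >= 0.5 else 0.0


def integrate(f, a, b, n=200_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


# Closed form: P(X <= 5) = 1 - exp(-.15 * (5 - .5)).
p_closed = 1 - math.exp(-0.15 * 4.5)

# Numerical check of the same integral over [.5, 5].
p_numeric = integrate(pdf_headway, 0.5, 5)

# The infinite upper limit is truncated at 200 (an assumption of this sketch);
# the neglected tail is exp(-.15 * 199.5), which is negligible.
total_area = integrate(pdf_headway, 0.5, 200)

print(round(p_closed, 3), round(p_numeric, 3), round(total_area, 3))
```

Because P(X = 5) = 0 for a continuous rv, the same number serves for both P(X ≤ 5) and P(X < 5).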

Unlike discrete distributions such as the binomial, hypergeometric, and negative
binomial, the distribution of any given continuous rv cannot usually be derived
using simple probabilistic arguments. Instead, one must make a judicious choice of
pdf based on prior knowledge and available data. Fortunately, there are some general
families of pdf’s that have been found to be sensible candidates in a wide variety of
experimental situations; several of these are discussed later in the chapter.
Just as in the discrete case, it is often helpful to think of the population of
interest as consisting of X values rather than individuals or objects. The pdf is then
a model for the distribution of values in this numerical population, and from this
model various population characteristics (such as the mean) can be calculated.

EXERCISES Section 4.1 (1–10)

1. The current in a certain circuit as measured by an ammeter is a continuous random variable X with the following density function:

\[ f(x) = \begin{cases} .075x + .2 & 3 \le x \le 5 \\ 0 & \text{otherwise} \end{cases} \]

a. Graph the pdf and verify that the total area under the density curve is indeed 1.
b. Calculate P(X ≤ 4). How does this probability compare to P(X < 4)?
c. Calculate P(3.5 ≤ X ≤ 4.5) and also P(4.5 < X).

2. Suppose the reaction temperature X (in °C) in a certain chemical process has a uniform distribution with A = −5 and B = 5.
a. Compute P(X < 0).
b. Compute P(−2.5 < X < 2.5).
c. Compute P(−2 ≤ X ≤ 3).
d. For k satisfying −5 < k < k + 4 < 5, compute P(k < X < k + 4).

3. The error involved in making a certain measurement is a continuous rv X with pdf

\[ f(x) = \begin{cases} .09375(4 - x^2) & -2 \le x \le 2 \\ 0 & \text{otherwise} \end{cases} \]

a. Sketch the graph of f(x).
b. Compute P(X > 0).
c. Compute P(−1 < X < 1).
d. Compute P(X < −.5 or X > .5).

4. Let X denote the vibratory stress (psi) on a wind turbine blade at a particular wind speed in a wind tunnel. The article “Blade Fatigue Life Assessment with Application to VAWTS” (J. of Solar Energy Engr., 1982: 107–111) proposes the Rayleigh distribution, with pdf

\[ f(x; \theta) = \begin{cases} \dfrac{x}{\theta^2} \cdot e^{-x^2/(2\theta^2)} & x > 0 \\ 0 & \text{otherwise} \end{cases} \]

as a model for the X distribution.
a. Verify that f(x; θ) is a legitimate pdf.
b. Suppose θ = 100 (a value suggested by a graph in the article). What is the probability that X is at most 200? Less than 200? At least 200?
c. What is the probability that X is between 100 and 200 (again assuming θ = 100)?
d. Give an expression for P(X ≤ x).

5. A college professor never finishes his lecture before the end of the hour and always finishes his lectures within 2 min after the hour. Let X = the time that elapses between the end of the hour and the end of the lecture and suppose the pdf of X is

\[ f(x) = \begin{cases} kx^2 & 0 \le x \le 2 \\ 0 & \text{otherwise} \end{cases} \]

a. Find the value of k and draw the corresponding density curve. [Hint: Total area under the graph of f(x) is 1.]
b. What is the probability that the lecture ends within 1 min of the end of the hour?
c. What is the probability that the lecture continues beyond the hour for between 60 and 90 sec?
d. What is the probability that the lecture continues for at least 90 sec beyond the end of the hour?

6. The actual tracking weight of a stereo cartridge that is set to track at 3 g on a particular changer can be regarded as a continuous rv X with pdf

\[ f(x) = \begin{cases} k[1 - (x - 3)^2] & 2 \le x \le 4 \\ 0 & \text{otherwise} \end{cases} \]

a. Sketch the graph of f(x).
b. Find the value of k.
c. What is the probability that the actual tracking weight is greater than the prescribed weight?
d. What is the probability that the actual weight is within .25 g of the prescribed weight?
e. What is the probability that the actual weight differs from the prescribed weight by more than .5 g?

7. The article “Second Moment Reliability Evaluation vs. Monte Carlo Simulations for Weld Fatigue Strength” (Quality and Reliability Engr. Intl., 2012: 887–896) considered the use of a uniform distribution with A = .20 and B = 4.25 for the diameter X of a certain type of weld (mm).
a. Determine the pdf of X and graph it.
b. What is the probability that diameter exceeds 3 mm?
c. What is the probability that diameter is within 1 mm of the mean diameter?
d. For any value a satisfying .20 < a < a + 1 < 4.25, what is P(a < X < a + 1)?

8. In commuting to work, a professor must first get on a bus near her house and then transfer to a second bus. If the waiting time (in minutes) at each stop has a uniform distribution with A = 0 and B = 5, then it can be shown that the total waiting time Y has the pdf

\[ f(y) = \begin{cases} \dfrac{1}{25}y & 0 \le y < 5 \\ \dfrac{2}{5} - \dfrac{1}{25}y & 5 \le y \le 10 \\ 0 & y < 0 \text{ or } y > 10 \end{cases} \]

a. Sketch a graph of the pdf of Y.
b. Verify that $\int_{-\infty}^{\infty} f(y)\,dy = 1$.
c. What is the probability that total waiting time is at most 3 min?
d. What is the probability that total waiting time is at most 8 min?
e. What is the probability that total waiting time is between 3 and 8 min?
f. What is the probability that total waiting time is either less than 2 min or more than 6 min?

9. Based on an analysis of sample data, the article “Pedestrians’ Crossing Behaviors and Safety at Unmarked Roadways in China” (Accident Analysis and Prevention, 2011: 1927–1936) proposed the pdf $f(x) = .15e^{-.15(x-1)}$ when x ≥ 1 as a model for the distribution of X = time (sec) spent at the median line.
a. What is the probability that waiting time is at most 5 sec? More than 5 sec?
b. What is the probability that waiting time is between 2 and 5 sec?

10. A family of pdf’s that has been used to approximate the distribution of income, city population size, and size of firms is the Pareto family. The family has two parameters, k and θ, both > 0, and the pdf is

\[ f(x; k, \theta) = \begin{cases} \dfrac{k \cdot \theta^k}{x^{k+1}} & x \ge \theta \\ 0 & x < \theta \end{cases} \]

a. Sketch the graph of f(x; k, θ).
b. Verify that the total area under the graph equals 1.
c. If the rv X has pdf f(x; k, θ), for any fixed b > θ, obtain an expression for P(X ≤ b).
d. For θ < a < b, obtain an expression for the probability P(a ≤ X ≤ b).

4.2 Cumulative Distribution Functions and Expected Values

Several of the most important concepts introduced in the study of discrete distribu-
tions also play an important role for continuous distributions. Definitions analogous
to those in Chapter 3 involve replacing summation by integration.


The Cumulative Distribution Function


The cumulative distribution function (cdf) F(x) for a discrete rv X gives, for any specified
number x, the probability P(X ≤ x). It is obtained by summing the pmf p(y) over all
possible values y satisfying y ≤ x. The cdf of a continuous rv gives the same probabilities
P(X ≤ x) and is obtained by integrating the pdf f(y) between the limits −∞ and x.

DEFINITION The cumulative distribution function F(x) for a continuous rv X is defined
for every number x by

\[ F(x) = P(X \le x) = \int_{-\infty}^{x} f(y)\,dy \]

For each x, F(x) is the area under the density curve to the left of x. This is illus-
trated in Figure 4.5, where F(x) increases smoothly as x increases.

Figure 4.5 A pdf and associated cdf

Example 4.6 Let X, the thickness of a certain metal sheet, have a uniform distribution on
[A, B]. The density function is shown in Figure 4.6. For x < A, F(x) = 0, since
there is no area under the graph of the density function to the left of such an x. For
x ≥ B, F(x) = 1, since all the area is accumulated to the left of such an x. Finally, for
A ≤ x ≤ B,

\[ F(x) = \int_{-\infty}^{x} f(y)\,dy = \int_A^x \frac{1}{B - A}\,dy = \left.\frac{1}{B - A} \cdot y\right|_{y=A}^{y=x} = \frac{x - A}{B - A} \]

Figure 4.6 The pdf for a uniform distribution (shaded area = F(x))

The entire cdf is

\[ F(x) = \begin{cases} 0 & x < A \\ \dfrac{x - A}{B - A} & A \le x < B \\ 1 & x \ge B \end{cases} \]

The graph of this cdf appears in Figure 4.7.

Figure 4.7 The cdf for a uniform distribution ■
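
The three-piece cdf of Example 4.6 translates directly into a small function. This is a minimal sketch: the function name `uniform_cdf` and the limits A = 1.0, B = 3.0 are made up for illustration, since the example does not give numerical values for the sheet thickness.

```python
def uniform_cdf(x, A, B):
    """cdf of the uniform distribution on [A, B] (requires A < B)."""
    if A >= B:
        raise ValueError("need A < B")
    if x < A:
        return 0.0                # no area to the left of x
    if x >= B:
        return 1.0                # all the area is to the left of x
    return (x - A) / (B - A)      # area of a rectangle of width x - A


# Illustrative limits only (hypothetical values, not from the example).
A, B = 1.0, 3.0
values = [uniform_cdf(x, A, B) for x in (0.5, 1.0, 2.0, 3.0, 4.0)]
print(values)  # [0.0, 0.0, 0.5, 1.0, 1.0]
```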

Using F(x) to Compute Probabilities


The importance of the cdf here, just as for discrete rv’s, is that probabilities of vari-
ous intervals can be computed from a formula for or table of F(x).

PROPOSITION Let X be a continuous rv with pdf f(x) and cdf F(x). Then for any number a,

\[ P(X > a) = 1 - F(a) \]

and for any two numbers a and b with a < b,

\[ P(a \le X \le b) = F(b) - F(a) \]

Figure 4.8 illustrates the second part of this proposition; the desired probability is
the shaded area under the density curve between a and b, and it equals the difference
between the two shaded cumulative areas. This is different from what is appropriate
for a discrete integer-valued random variable (e.g., binomial or Poisson):
P(a ≤ X ≤ b) = F(b) − F(a − 1) when a and b are integers.

Figure 4.8 Computing P(a ≤ X ≤ b) from cumulative probabilities

Example 4.7 Suppose the pdf of the magnitude X of a dynamic load on a bridge (in newtons) is
given by

\[ f(x) = \begin{cases} \dfrac{1}{8} + \dfrac{3}{8}x & 0 \le x \le 2 \\ 0 & \text{otherwise} \end{cases} \]

For any number x between 0 and 2,

\[ F(x) = \int_{-\infty}^{x} f(y)\,dy = \int_0^x \left(\frac{1}{8} + \frac{3}{8}y\right) dy = \frac{x}{8} + \frac{3}{16}x^2 \]

Thus

\[ F(x) = \begin{cases} 0 & x < 0 \\ \dfrac{x}{8} + \dfrac{3}{16}x^2 & 0 \le x \le 2 \\ 1 & 2 < x \end{cases} \]

The graphs of f(x) and F(x) are shown in Figure 4.9. The probability that the load
is between 1 and 1.5 is

\[ P(1 \le X \le 1.5) = F(1.5) - F(1) = \left[\frac{1}{8}(1.5) + \frac{3}{16}(1.5)^2\right] - \left[\frac{1}{8}(1) + \frac{3}{16}(1)^2\right] = \frac{19}{64} = .297 \]

The probability that the load exceeds 1 is

\[ P(X > 1) = 1 - P(X \le 1) = 1 - F(1) = 1 - \left[\frac{1}{8}(1) + \frac{3}{16}(1)^2\right] = \frac{11}{16} = .688 \]

Figure 4.9 The pdf and cdf for Example 4.7 (f(x) rises from 1/8 at x = 0 to 7/8 at x = 2; F(x) rises from 0 to 1 over the same interval) ■
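
The two probabilities in Example 4.7 can be reproduced with exact rational arithmetic. This is a sketch only; the function name `cdf_load` and the use of Python's `fractions` module are choices made for this illustration.

```python
from fractions import Fraction


def cdf_load(x):
    """cdf of Example 4.7: F(x) = x/8 + 3x^2/16 on [0, 2]."""
    x = Fraction(x)
    if x < 0:
        return Fraction(0)
    if x > 2:
        return Fraction(1)
    return x / 8 + Fraction(3, 16) * x**2


p_between = cdf_load(Fraction(3, 2)) - cdf_load(1)   # P(1 <= X <= 1.5)
p_exceeds = 1 - cdf_load(1)                          # P(X > 1)

print(p_between, float(p_between))   # 19/64 0.296875
print(p_exceeds, float(p_exceeds))   # 11/16 0.6875
```

No integration is repeated here: once F(x) is known, each probability is a difference of two cdf values.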

Once the cdf has been obtained, any probability involving X can easily be cal-
culated without any further integration.

Obtaining f(x) from F(x)


For X discrete, the pmf is obtained from the cdf by taking the difference between two
F(x) values. The continuous analog of a difference is a derivative. The following
result is a consequence of the Fundamental Theorem of Calculus.

PROPOSITION If X is a continuous rv with pdf f(x) and cdf F(x), then at every x at which the
derivative F′(x) exists, F′(x) = f(x).

Example 4.8 (Example 4.6 continued) When X has a uniform distribution, F(x) is differentiable except at x = A and x = B,
where the graph of F(x) has sharp corners. Since F(x) = 0 for x < A and F(x) = 1
for x > B, F′(x) = 0 = f(x) for such x. For A < x < B,

\[ F'(x) = \frac{d}{dx}\left(\frac{x - A}{B - A}\right) = \frac{1}{B - A} = f(x) \] ■
dx B 2 A B2A

Percentiles of a Continuous Distribution


When we say that an individual’s test score was at the 85th percentile of the popu-
lation, we mean that 85% of all population scores were below that score and 15%
were above. Similarly, the 40th percentile is the score that exceeds 40% of all scores


and is exceeded by 60% of all scores (having a value corresponding to a high per-
centile is not necessarily good; e.g., you would not want to be at the 99th percentile
for blood alcohol content).

DEFINITION Let p be a number between 0 and 1. The (100p)th percentile of the distribution
of a continuous rv X, denoted by η(p), is defined by

\[ p = F(\eta(p)) = \int_{-\infty}^{\eta(p)} f(y)\,dy \tag{4.2} \]

According to Expression (4.2), η(p) is that value on the measurement axis such that
100p% of the area under the graph of f(x) lies to the left of η(p) and 100(1 − p)%
lies to the right. Thus η(.75), the 75th percentile, is such that the area under the graph
of f(x) to the left of η(.75) is .75. Figure 4.10 illustrates the definition.

Figure 4.10 The (100p)th percentile of a continuous distribution (left panel: the shaded area under f(x) to the left of η(p) equals p; right panel: p = F(η(p)) on the graph of F(x))

Example 4.9 The distribution of the amount of gravel (in tons) sold by a particular construction
supply company in a given week is a continuous rv X with pdf

\[ f(x) = \begin{cases} \dfrac{3}{2}(1 - x^2) & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases} \]

The cdf of sales for any x between 0 and 1 is

\[ F(x) = \int_0^x \frac{3}{2}(1 - y^2)\,dy = \left.\frac{3}{2}\left(y - \frac{y^3}{3}\right)\right|_{y=0}^{y=x} = \frac{3}{2}\left(x - \frac{x^3}{3}\right) \]

The graphs of both f(x) and F(x) appear in Figure 4.11. The (100p)th percentile of
this distribution satisfies the equation

\[ p = F(\eta(p)) = \frac{3}{2}\left[\eta(p) - \frac{(\eta(p))^3}{3}\right] \]

that is,

\[ (\eta(p))^3 - 3\eta(p) + 2p = 0 \]

For the 50th percentile, p = .5, and the equation to be solved is η³ − 3η + 1 = 0;
the solution is η = η(.5) = .347. If the distribution remains the same from week to
week, then in the long run 50% of all weeks will result in sales of less than .347 ton
and 50% in more than .347 ton.

Figure 4.11 The pdf and cdf for Example 4.9 (f(x) falls from 1.5 at x = 0 to 0 at x = 1; F(.347) = .5) ■
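
The cubic percentile equation in Example 4.9 has no convenient closed-form solution, so in practice the equation F(x) = p is solved numerically. The sketch below uses bisection, which is one reasonable choice and not necessarily the method used to obtain .347 in the text; the function names and the tolerance are assumptions made here.

```python
def cdf_gravel(x):
    """cdf of Example 4.9: F(x) = (3/2)(x - x^3/3) on [0, 1]."""
    if x <= 0:
        return 0.0
    if x >= 1:
        return 1.0
    return 1.5 * (x - x**3 / 3)


def percentile(F, p, lo, hi, tol=1e-10):
    """Solve F(x) = p by bisection; F must be increasing on [lo, hi]."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if F(mid) < p:
            lo = mid      # the percentile lies to the right of mid
        else:
            hi = mid      # the percentile lies at or to the left of mid
    return (lo + hi) / 2


median = percentile(cdf_gravel, 0.5, 0.0, 1.0)
print(round(median, 3))  # 0.347
```

Bisection relies only on F being increasing over the interval of positive density, so the same `percentile` helper works for any p strictly between 0 and 1.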

DEFINITION The median of a continuous distribution, denoted by μ̃, is the 50th percentile,
so μ̃ satisfies .5 = F(μ̃). That is, half the area under the density curve is to the
left of μ̃ and half is to the right of μ̃.

A continuous distribution whose pdf is symmetric (the graph of the pdf to the
left of some point is a mirror image of the graph to the right of that point) has
median μ̃ equal to the point of symmetry, since half the area under the curve lies
to either side of this point. Figure 4.12 gives several examples. The error in
a measurement of a physical quantity is often assumed to have a symmetric
distribution.

Figure 4.12 Medians of symmetric distributions

Expected Values
For a discrete random variable X, E(X) was obtained by summing x · p(x) over possible
X values. Here we replace summation by integration and the pmf by the pdf to
get a continuous weighted average.

DEFINITION The expected or mean value of a continuous rv X with pdf f(x) is

\[ \mu_X = E(X) = \int_{-\infty}^{\infty} x \cdot f(x)\,dx \]

Example 4.10 (Example 4.9 continued) The pdf of weekly gravel sales X was

\[ f(x) = \begin{cases} \dfrac{3}{2}(1 - x^2) & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases} \]

so

\[ E(X) = \int_{-\infty}^{\infty} x \cdot f(x)\,dx = \int_0^1 x \cdot \frac{3}{2}(1 - x^2)\,dx = \frac{3}{2}\int_0^1 (x - x^3)\,dx = \left.\frac{3}{2}\left(\frac{x^2}{2} - \frac{x^4}{4}\right)\right|_{x=0}^{x=1} = \frac{3}{8} \] ■

When the pdf f(x) specifies a model for the distribution of values in a numerical
population, then μ is the population mean, which is the most frequently used
measure of population location or center.
Often we wish to compute the expected value of some function h(X) of the
rv X. If we think of h(X) as a new rv Y, techniques from mathematical statistics can
be used to derive the pdf of Y, and E(Y) can then be computed from the definition.
Fortunately, as in the discrete case, there is an easier way to compute E[h(X)].

PROPOSITION If X is a continuous rv with pdf f(x) and h(X) is any function of X, then

\[ E[h(X)] = \mu_{h(X)} = \int_{-\infty}^{\infty} h(x) \cdot f(x)\,dx \]

That is, just as E(X) is a weighted average of possible X values, where the weighting
function is the pdf f (x), E[h(X)] is a weighted average of h(X) values.

EXAMPLE 4.11 Two species are competing in a region for control of a limited amount of a certain
resource. Let X = the proportion of the resource controlled by species 1 and suppose
X has pdf
\[
f(x) = \begin{cases} 1 & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}
\]
which is a uniform distribution on [0, 1]. (In her book Ecological Diversity, E. C.
Pielou calls this the “broken-stick” model for resource allocation, since it is analogous
to breaking a stick at a randomly chosen point.) Then the species that controls
the majority of this resource controls the amount
\[
h(X) = \max(X, 1 - X) = \begin{cases} 1 - X & \text{if } 0 \le X < \frac{1}{2} \\ X & \text{if } \frac{1}{2} \le X \le 1 \end{cases}
\]
The expected amount controlled by the species having majority control is then
\[
E[h(X)] = \int_{-\infty}^{\infty} \max(x, 1 - x) \cdot f(x)\,dx = \int_0^1 \max(x, 1 - x) \cdot 1\,dx
        = \int_0^{1/2} (1 - x) \cdot 1\,dx + \int_{1/2}^1 x \cdot 1\,dx = \frac{3}{4}
\]
■
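The broken-stick calculation can be checked numerically. The following is a minimal Python sketch, not part of the original text: the helper names `expected_value`, `uniform_pdf`, and `majority_share` are illustrative assumptions, and the integral is approximated with a simple midpoint rule rather than evaluated in closed form.

```python
def expected_value(h, pdf, lower, upper, n=100_000):
    """Approximate E[h(X)] = integral of h(x) * pdf(x) over [lower, upper].

    Uses the midpoint rule with n equal-width subintervals.
    """
    width = (upper - lower) / n
    total = 0.0
    for i in range(n):
        x = lower + (i + 0.5) * width  # midpoint of the i-th subinterval
        total += h(x) * pdf(x)
    return total * width


def uniform_pdf(x):
    """pdf of the uniform distribution on [0, 1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


def majority_share(x):
    """h(x) = max(x, 1 - x): the share held by the majority species."""
    return max(x, 1.0 - x)


expected_majority = expected_value(majority_share, uniform_pdf, 0.0, 1.0)
print(round(expected_majority, 4))
```

The same `expected_value` helper could be reused for any other h and pdf whose support is a bounded interval, which mirrors how the proposition replaces a derivation of the pdf of h(X) by a single integral.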

In the discrete case, the variance of X was defined as the expected squared deviation
from $\mu$ and was calculated by summation. Here again integration replaces
summation.


DEFINITION The variance of a continuous random variable X with pdf $f(x)$ and mean value
$\mu$ is
\[
\sigma_X^2 = V(X) = \int_{-\infty}^{\infty} (x - \mu)^2 \cdot f(x)\,dx = E[(X - \mu)^2]
\]

The standard deviation (SD) of X is $\sigma_X = \sqrt{V(X)}$.

The variance and standard deviation give quantitative measures of how much spread
there is in the distribution or population of x values. Again $\sigma$ is roughly the size of
a typical deviation from $\mu$. Computation of $\sigma^2$ is facilitated by using the same shortcut
formula employed in the discrete case.

PROPOSITION $V(X) = E(X^2) - [E(X)]^2$

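EXAMPLE 4.12 (Example 4.10 continued) For X = weekly gravel sales, we computed $E(X) = \frac{3}{8}$. Since
\[
E(X^2) = \int_{-\infty}^{\infty} x^2 \cdot f(x)\,dx = \int_0^1 x^2 \cdot \frac{3}{2}(1 - x^2)\,dx
       = \frac{3}{2}\int_0^1 (x^2 - x^4)\,dx = \frac{1}{5}
\]
\[
V(X) = \frac{1}{5} - \left(\frac{3}{8}\right)^2 = \frac{19}{320} = .059 \quad \text{and} \quad \sigma_X = .244
\]
■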
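As a numerical cross-check of the shortcut formula, the sketch below computes the variance of the gravel-sales distribution both from the definition and from $E(X^2) - [E(X)]^2$. It assumes the pdf is $f(x) = \frac{3}{2}(1 - x^2)$ on [0, 1], as the integrands in the example indicate; the function names are illustrative and the integrals are midpoint-rule approximations.

```python
import math


def integrate(g, lower, upper, n=100_000):
    """Midpoint-rule approximation of the integral of g over [lower, upper]."""
    width = (upper - lower) / n
    return sum(g(lower + (i + 0.5) * width) for i in range(n)) * width


def gravel_pdf(x):
    """Assumed pdf of weekly gravel sales: (3/2)(1 - x^2) on [0, 1], else 0."""
    return 1.5 * (1.0 - x * x) if 0.0 <= x <= 1.0 else 0.0


# E(X) and E(X^2) as integrals of x * f(x) and x^2 * f(x)
mean = integrate(lambda x: x * gravel_pdf(x), 0.0, 1.0)
second_moment = integrate(lambda x: x * x * gravel_pdf(x), 0.0, 1.0)

# Shortcut formula versus the defining integral of (x - mu)^2 * f(x)
variance_shortcut = second_moment - mean ** 2
variance_definition = integrate(lambda x: (x - mean) ** 2 * gravel_pdf(x), 0.0, 1.0)
sd = math.sqrt(variance_shortcut)

print(round(mean, 3), round(variance_shortcut, 3), round(sd, 3))
```

Both routes should agree up to numerical error; the shortcut is usually preferred by hand because it avoids expanding $(x - \mu)^2$ inside the integral.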

When $h(X) = aX + b$, the expected value and variance of $h(X)$ satisfy the same
properties as in the discrete case: $E[h(X)] = a\mu + b$ and $V[h(X)] = a^2 \cdot \sigma^2$.

EXERCISES Section 4.2 (11–27)

11. Let X denote the amount of time a book on two-hour reserve is actually checked out, and suppose the cdf is
\[
F(x) = \begin{cases} 0 & x < 0 \\ \dfrac{x^2}{4} & 0 \le x < 2 \\ 1 & 2 \le x \end{cases}
\]
a. Calculate $P(X \le 1)$.
b. Calculate $P(.5 \le X \le 1)$.
c. Calculate $P(X > 1.5)$.
d. What is the median checkout duration $\tilde{\mu}$? [solve $.5 = F(\tilde{\mu})$].
e. Obtain the density function $f(x)$.
f. Calculate $E(X)$.
g. Calculate $V(X)$ and $\sigma_X$.
h. If the borrower is charged an amount $h(X) = X^2$ when checkout duration is X, compute the expected charge $E[h(X)]$.

12. The cdf for X (= measurement error) of Exercise 3 is
\[
F(x) = \begin{cases} 0 & x < -2 \\ \dfrac{1}{2} + \dfrac{3}{32}\left(4x - \dfrac{x^3}{3}\right) & -2 \le x < 2 \\ 1 & 2 \le x \end{cases}
\]
a. Compute $P(X < 0)$.
b. Compute $P(-1 < X < 1)$.
c. Compute $P(.5 < X)$.


d. Verify that $f(x)$ is as given in Exercise 3 by obtaining $F'(x)$.
e. Verify that $\tilde{\mu} = 0$.

13. Example 4.5 introduced the concept of time headway in traffic flow and proposed a particular distribution for X = the headway between two randomly selected consecutive cars (sec). Suppose that in a different traffic environment, the distribution of time headway has the form
\[
f(x) = \begin{cases} \dfrac{k}{x^4} & x > 1 \\ 0 & x \le 1 \end{cases}
\]
a. Determine the value of k for which $f(x)$ is a legitimate pdf.
b. Obtain the cumulative distribution function.
c. Use the cdf from (b) to determine the probability that headway exceeds 2 sec and also the probability that headway is between 2 and 3 sec.
d. Obtain the mean value of headway and the standard deviation of headway.
e. What is the probability that headway is within 1 standard deviation of the mean value?

14. The article “Modeling Sediment and Water Column Interactions for Hydrophobic Pollutants” (Water Research, 1984: 1169–1174) suggests the uniform distribution on the interval (7.5, 20) as a model for depth (cm) of the bioturbation layer in sediment in a certain region.
a. What are the mean and variance of depth?
b. What is the cdf of depth?
c. What is the probability that observed depth is at most 10? Between 10 and 15?
d. What is the probability that the observed depth is within 1 standard deviation of the mean value? Within 2 standard deviations?

15. Let X denote the amount of space occupied by an article placed in a 1-ft³ packing container. The pdf of X is
\[
f(x) = \begin{cases} 90x^8(1 - x) & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}
\]
a. Graph the pdf. Then obtain the cdf of X and graph it.
b. What is $P(X \le .5)$ [i.e., $F(.5)$]?
c. Using the cdf from (a), what is $P(.25 < X \le .5)$? What is $P(.25 \le X \le .5)$?
d. What is the 75th percentile of the distribution?
e. Compute $E(X)$ and $\sigma_X$.
f. What is the probability that X is more than 1 standard deviation from its mean value?

16. The article “A Model of Pedestrians’ Waiting Times for Street Crossings at Signalized Intersections” (Transportation Research, 2013: 17–28) suggested that under some circumstances the distribution of waiting time X could be modeled with the following pdf:
\[
f(x; \theta, \tau) = \begin{cases} \dfrac{\theta}{\tau}\left(1 - \dfrac{x}{\tau}\right)^{\theta - 1} & 0 \le x < \tau \\ 0 & \text{otherwise} \end{cases}
\]
a. Graph $f(x; \theta, 80)$ for the three cases $\theta = 4$, 1, and .5 (these graphs appear in the cited article) and comment on their shapes.
b. Obtain the cumulative distribution function of X.
c. Obtain an expression for the median of the waiting time distribution.
d. For the case $\theta = 4$, $\tau = 80$, calculate $P(50 \le X \le 70)$ without at this point doing any additional integration.

17. Let X have a uniform distribution on the interval [A, B].
a. Obtain an expression for the (100p)th percentile.
b. Compute $E(X)$, $V(X)$, and $\sigma_X$.
c. For n, a positive integer, compute $E(X^n)$.

18. Let X denote the voltage at the output of a microphone, and suppose that X has a uniform distribution on the interval from −1 to 1. The voltage is processed by a “hard limiter” with cutoff values −.5 and .5, so the limiter output is a random variable Y related to X by $Y = X$ if $|X| \le .5$, $Y = .5$ if $X > .5$, and $Y = -.5$ if $X < -.5$.
a. What is $P(Y = .5)$?
b. Obtain the cumulative distribution function of Y and graph it.

19. Let X be a continuous rv with cdf
\[
F(x) = \begin{cases} 0 & x \le 0 \\ \dfrac{x}{4}\left[1 + \ln\left(\dfrac{4}{x}\right)\right] & 0 < x \le 4 \\ 1 & x > 4 \end{cases}
\]
[This type of cdf is suggested in the article “Variability in Measured Bedload-Transport Rates” (Water Resources Bull., 1985: 39–48) as a model for a certain hydrologic variable.] What is
a. $P(X \le 1)$?
b. $P(1 \le X \le 3)$?
c. The pdf of X?

20. Consider the pdf for total waiting time Y for two buses
\[
f(y) = \begin{cases} \dfrac{1}{25}y & 0 \le y < 5 \\ \dfrac{2}{5} - \dfrac{1}{25}y & 5 \le y \le 10 \\ 0 & \text{otherwise} \end{cases}
\]
introduced in Exercise 8.
a. Compute and sketch the cdf of Y. [Hint: Consider separately $0 \le y < 5$ and $5 \le y \le 10$ in computing $F(y)$. A graph of the pdf should be helpful.]
b. Obtain an expression for the (100p)th percentile. [Hint: Consider separately $0 < p < .5$ and $.5 < p < 1$.]


c. Compute $E(Y)$ and $V(Y)$. How do these compare with the expected waiting time and variance for a single bus when the time is uniformly distributed on [0, 5]?

21. An ecologist wishes to mark off a circular sampling region having radius 10 m. However, the radius of the resulting region is actually a random variable R with pdf
\[
f(r) = \begin{cases} \dfrac{3}{4}\left[1 - (10 - r)^2\right] & 9 \le r \le 11 \\ 0 & \text{otherwise} \end{cases}
\]
What is the expected area of the resulting circular region?

22. The weekly demand for propane gas (in 1000s of gallons) from a particular facility is an rv X with pdf
\[
f(x) = \begin{cases} 2\left(1 - \dfrac{1}{x^2}\right) & 1 \le x \le 2 \\ 0 & \text{otherwise} \end{cases}
\]
a. Compute the cdf of X.
b. Obtain an expression for the (100p)th percentile. What is the value of $\tilde{\mu}$?
c. Compute $E(X)$ and $V(X)$.
d. If 1.5 thousand gallons are in stock at the beginning of the week and no new supply is due in during the week, how much of the 1.5 thousand gallons is expected to be left at the end of the week? [Hint: Let $h(x)$ = amount left when demand = x.]

23. If the temperature at which a certain compound melts is a random variable with mean value 120°C and standard deviation 2°C, what are the mean temperature and standard deviation measured in °F? [Hint: °F = 1.8°C + 32.]

24. Let X have the Pareto pdf
\[
f(x; k, \theta) = \begin{cases} \dfrac{k \cdot \theta^k}{x^{k+1}} & x \ge \theta \\ 0 & x < \theta \end{cases}
\]
introduced in Exercise 10.
a. If $k > 1$, compute $E(X)$.
b. What can you say about $E(X)$ if $k = 1$?
c. If $k > 2$, show that $V(X) = k\theta^2(k - 1)^{-2}(k - 2)^{-1}$.
d. If $k = 2$, what can you say about $V(X)$?
e. What conditions on k are necessary to ensure that $E(X^n)$ is finite?

25. Let X be the temperature in °C at which a certain chemical reaction takes place, and let Y be the temperature in °F (so $Y = 1.8X + 32$).
a. If the median of the X distribution is $\tilde{\mu}$, show that $1.8\tilde{\mu} + 32$ is the median of the Y distribution.
b. How is the 90th percentile of the Y distribution related to the 90th percentile of the X distribution? Verify your conjecture.
c. More generally, if $Y = aX + b$, how is any particular percentile of the Y distribution related to the corresponding percentile of the X distribution?

26. Let X be the total medical expenses (in 1000s of dollars) incurred by a particular individual during a given year. Although X is a discrete random variable, suppose its distribution is quite well approximated by a continuous distribution with pdf $f(x) = k(1 + x/2.5)^{-7}$ for $x \ge 0$.
a. What is the value of k?
b. Graph the pdf of X.
c. What are the expected value and standard deviation of total medical expenses?
d. This individual is covered by an insurance plan that entails a \$500 deductible provision (so the first \$500 worth of expenses are paid by the individual). Then the plan will pay 80% of any additional expenses exceeding \$500, and the maximum payment by the individual (including the deductible amount) is \$2500. Let Y denote the amount of this individual’s medical expenses paid by the insurance company. What is the expected value of Y?
[Hint: First figure out what value of X corresponds to the maximum out-of-pocket expense of \$2500. Then write an expression for Y as a function of X (which involves several different pieces) and calculate the expected value of this function.]

27. When a dart is thrown at a circular target, consider the location of the landing point relative to the bull’s eye. Let X be the angle in degrees measured from the horizontal, and assume that X is uniformly distributed on [0, 360]. Define Y to be the transformed variable $Y = h(X) = (2\pi/360)X - \pi$, so Y is the angle measured in radians and Y is between $-\pi$ and $\pi$. Obtain $E(Y)$ and $\sigma_Y$ by first obtaining $E(X)$ and $\sigma_X$, and then using the fact that $h(X)$ is a linear function of X.

4.3 The Normal Distribution

The normal distribution is the most important one in all of probability and statistics.
Many numerical populations have distributions that can be fit very closely by an
appropriate normal curve. Examples include heights, weights, and other physical
characteristics (the famous 1903 Biometrika article “On the Laws of Inheritance in
Man” discussed many examples of this sort), measurement errors in scientific

experiments, anthropometric measurements on fossils, reaction times in psychological
experiments, measurements of intelligence and aptitude, scores on various
tests, and numerous economic measures and indicators. In addition, even when indi-
vidual variables themselves are not normally distributed, sums and averages of the
variables will under suitable conditions have approximately a normal distribution;
this is the content of the Central Limit Theorem discussed in the next chapter.

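DEFINITION A continuous rv X is said to have a normal distribution with parameters $\mu$
and $\sigma$ (or $\mu$ and $\sigma^2$), where $-\infty < \mu < \infty$ and $0 < \sigma$, if the pdf of X is
\[
f(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x - \mu)^2/(2\sigma^2)} \qquad -\infty < x < \infty \tag{4.3}
\]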

Again e denotes the base of the natural logarithm system and equals approximately
2.71828, and $\pi$ represents the familiar mathematical constant with approximate
value 3.14159. The statement that X is normally distributed with parameters $\mu$ and
$\sigma^2$ is often abbreviated $X \sim N(\mu, \sigma^2)$.
Clearly $f(x; \mu, \sigma) \ge 0$, but a somewhat complicated calculus argument must
be used to verify that $\int_{-\infty}^{\infty} f(x; \mu, \sigma)\,dx = 1$. It can be shown that $E(X) = \mu$ and
$V(X) = \sigma^2$, so the parameters are the mean and the standard deviation of X. Figure 4.13
presents graphs of $f(x; \mu, \sigma)$ for several different $(\mu, \sigma)$ pairs. Each density
curve is symmetric about $\mu$ and bell-shaped, so the center of the bell (point of symmetry)
is both the mean of the distribution and the median. The mean $\mu$ is a location
parameter, since changing its value rigidly shifts the density curve to one side or the
other; $\sigma$ is referred to as a scale parameter, because changing its value stretches or
compresses the curve horizontally without changing the basic shape. The inflection
points of a normal curve (points at which the curve changes from turning downward to
turning upward) occur at $\mu - \sigma$ and $\mu + \sigma$. Thus the value of $\sigma$ can be visualized as
the distance from the mean to these inflection points. A large value of $\sigma$ corresponds to
a density curve that is quite spread out about $\mu$, whereas a small value yields a highly
concentrated curve. The larger the value of $\sigma$, the more likely it is that a value of X far
from the mean may be observed.
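The claims that the normal pdf integrates to 1 and has mean $\mu$ and variance $\sigma^2$ can be illustrated numerically, even though the analytic proof needs more calculus. The Python sketch below is only an illustration under stated assumptions: the function names are made up for this example, the parameter values $\mu = 100$, $\sigma = 5$ are borrowed from one of the curves in Figure 4.13, and the infinite range is truncated to $\mu \pm 10\sigma$ before applying the trapezoid rule.

```python
import math


def normal_pdf(x, mu, sigma):
    """Normal density f(x; mu, sigma) from Expression (4.3)."""
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return math.exp(exponent) / (math.sqrt(2.0 * math.pi) * sigma)


def trapezoid(g, lower, upper, n=20_000):
    """Trapezoid-rule approximation of the integral of g over [lower, upper]."""
    width = (upper - lower) / n
    inner = sum(g(lower + i * width) for i in range(1, n))
    return width * (0.5 * g(lower) + inner + 0.5 * g(upper))


mu, sigma = 100.0, 5.0
lo, hi = mu - 10.0 * sigma, mu + 10.0 * sigma  # tails beyond this are negligible

area = trapezoid(lambda x: normal_pdf(x, mu, sigma), lo, hi)
mean = trapezoid(lambda x: x * normal_pdf(x, mu, sigma), lo, hi)
variance = trapezoid(lambda x: (x - mu) ** 2 * normal_pdf(x, mu, sigma), lo, hi)

print(round(area, 6), round(mean, 4), round(variance, 4))
```

Changing `mu` shifts the curve without altering `variance`, and changing `sigma` rescales it without altering `mean`, which is the location/scale behavior described above.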

Figure 4.13 (a) Two different normal density curves ($\mu = 100$, $\sigma = 5$ and $\mu = 80$, $\sigma = 15$) (b) Visualizing $\mu$ and $\sigma$ for a normal
distribution


The Standard Normal Distribution


The computation of $P(a \le X \le b)$ when X is a normal rv with parameters $\mu$ and $\sigma$
requires evaluating
\[
\int_a^b \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x - \mu)^2/(2\sigma^2)}\,dx \tag{4.4}
\]
None of the standard integration techniques can be used to accomplish this. Instead,
for $\mu = 0$ and $\sigma = 1$, Expression (4.4) has been calculated using numerical techniques
and tabulated for certain values of a and b. This table can also be used
to compute probabilities for any other values of $\mu$ and $\sigma$ under consideration.

DEFINITION The normal distribution with parameter values $\mu = 0$ and $\sigma = 1$ is called the
standard normal distribution. A random variable having a standard normal
distribution is called a standard normal random variable and will be
denoted by Z. The pdf of Z is
\[
f(z; 0, 1) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2} \qquad -\infty < z < \infty
\]
The graph of $f(z; 0, 1)$ is called the standard normal (or z) curve. Its inflection
points are at 1 and −1. The cdf of Z is $P(Z \le z) = \int_{-\infty}^{z} f(y; 0, 1)\,dy$, which we
will denote by $\Phi(z)$.

The standard normal distribution almost never serves as a model for a naturally
arising population. Instead, it is a reference distribution from which information
about other normal distributions can be obtained. Appendix Table A.3 gives
$\Phi(z) = P(Z \le z)$, the area under the standard normal density curve to the left of z,
for z = −3.49, −3.48, …, 3.48, 3.49. Figure 4.14 illustrates the type of cumulative
area (probability) tabulated in Table A.3. From this table, various other probabilities
involving Z can be calculated.

Figure 4.14 Standard normal cumulative areas tabulated in Appendix Table A.3 (shaded area under the standard normal (z) curve to the left of z = $\Phi(z)$)

EXAMPLE 4.13 Let’s determine the following standard normal probabilities: (a) $P(Z \le 1.25)$,
(b) $P(Z > 1.25)$, (c) $P(Z \le -1.25)$, (d) $P(-.38 \le Z \le 1.25)$, and (e) $P(Z \le 5)$.
a. $P(Z \le 1.25) = \Phi(1.25)$, a probability that is tabulated in Appendix Table A.3 at
the intersection of the row marked 1.2 and the column marked .05. The number
there is .8944, so $P(Z \le 1.25) = .8944$. Figure 4.15(a) illustrates this probability.
b. $P(Z > 1.25) = 1 - P(Z \le 1.25) = 1 - \Phi(1.25)$, the area under the z curve
to the right of 1.25 (an upper-tail area). Then $\Phi(1.25) = .8944$ implies that
$P(Z > 1.25) = .1056$. Since Z is a continuous rv, $P(Z \ge 1.25) = .1056$. See
Figure 4.15(b).


Figure 4.15 Normal curve areas (probabilities) for Example 4.13: (a) shaded area = $\Phi(1.25)$, to the left of 1.25 under the z curve; (b) area to the right of 1.25 under the z curve

c. $P(Z \le -1.25) = \Phi(-1.25)$, a lower-tail area. Directly from Appendix Table
A.3, $\Phi(-1.25) = .1056$. By symmetry of the z curve, this is the same answer as
in part (b).
d. $P(-.38 \le Z \le 1.25)$ is the area under the standard normal curve above the interval
whose left endpoint is −.38 and whose right endpoint is 1.25. From Section
4.2, if X is a continuous rv with cdf $F(x)$, then $P(a \le X \le b) = F(b) - F(a)$.
Thus $P(-.38 \le Z \le 1.25) = \Phi(1.25) - \Phi(-.38) = .8944 - .3520 = .5424$.
(See Figure 4.16.)

Figure 4.16 $P(-.38 \le Z \le 1.25)$ as the difference between two cumulative areas

e. $P(Z \le 5) = \Phi(5)$, the cumulative area under the z curve to the left of 5. This
probability does not appear in the table because the last row is labeled 3.4.
However, the last entry in that row is $\Phi(3.49) = .9998$. That is, essentially all of
the area under the curve lies to the left of 3.49 (at most 3.49 standard deviations
to the right of the mean). Therefore we conclude that $P(Z \le 5) \approx 1$. ■
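The table lookups in Example 4.13 can be reproduced in software. The sketch below is an illustration rather than part of the text: it relies on the standard identity $\Phi(z) = \frac{1}{2}\left[1 + \operatorname{erf}(z/\sqrt{2})\right]$ and on Python's `math.erf`; the function name `phi` and the variable names are assumptions chosen for this example.

```python
import math


def phi(z):
    """Standard normal cdf: Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


p_a = phi(1.25)                 # (a) P(Z <= 1.25)
p_b = 1.0 - phi(1.25)           # (b) P(Z > 1.25), an upper-tail area
p_c = phi(-1.25)                # (c) P(Z <= -1.25), equal to (b) by symmetry
p_d = phi(1.25) - phi(-0.38)    # (d) P(-.38 <= Z <= 1.25)
p_e = phi(5.0)                  # (e) P(Z <= 5), essentially 1

for label, value in zip("abcde", (p_a, p_b, p_c, p_d, p_e)):
    print(label, round(value, 4))
```

Unlike Table A.3, this approach is not limited to two-decimal values of z or to the range from −3.49 to 3.49.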

Percentiles of the Standard Normal Distribution
For any p between 0 and 1, Appendix Table A.3 can be used to obtain the (100p)th
percentile of the standard normal distribution.

EXAMPLE 4.14 The 99th percentile of the standard normal distribution is that value on the horizontal
axis such that the area under the z curve to the left of the value is .9900. Appendix
Table A.3 gives for fixed z the area under the standard normal curve to the left of z,
whereas here we have the area and want the value of z. This is the “inverse” problem
to $P(Z \le z) = ?$ so the table is used in an inverse fashion: Find in the middle of
the table .9900; the row and column in which it lies identify the 99th z percentile.
Here .9901 lies at the intersection of the row marked 2.3 and column marked .03,
so the 99th percentile is (approximately) z = 2.33. (See Figure 4.17.) By symmetry,
the first percentile is as far below 0 as the 99th is above 0, so equals −2.33 (1% lies
below the first and also above the 99th). (See Figure 4.18.)


Figure 4.17 Finding the 99th percentile (shaded area = .9900 under the z curve)

Figure 4.18 The relationship between the 1st and 99th percentiles (−2.33 = 1st percentile, 2.33 = 99th percentile; each shaded tail area = .01)
■

In general, the (100p)th percentile is identified by the row and column of Appendix
Table A.3 in which the entry p is found (e.g., the 67th percentile is obtained by finding
.6700 in the body of the table, which gives z = .44). If p does not appear, the number
closest to it is typically used, although linear interpolation gives a more accurate
answer. For example, to find the 95th percentile, look for .9500 inside the table.
Although it does not appear, both .9495 and .9505 do, corresponding to z = 1.64
and 1.65, respectively. Since .9500 is halfway between the two probabilities that do
appear, we will use 1.645 as the 95th percentile and −1.645 as the 5th percentile.
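Reading the table "in an inverse fashion" corresponds to evaluating the inverse cdf. The following sketch, offered as one possible illustration, uses `NormalDist.inv_cdf` from Python's standard `statistics` module (available in Python 3.8 and later); the variable names are assumptions for this example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

# (100p)th percentile = value z with Phi(z) = p
z_99 = standard_normal.inv_cdf(0.99)   # 99th percentile
z_95 = standard_normal.inv_cdf(0.95)   # 95th percentile
z_67 = standard_normal.inv_cdf(0.67)   # 67th percentile
z_01 = standard_normal.inv_cdf(0.01)   # 1st percentile, mirror image of the 99th

print(round(z_99, 3), round(z_95, 3), round(z_67, 3), round(z_01, 3))
```

The software values carry more decimals than the table, so they may differ from table-based answers in the third decimal place (for instance, roughly 2.326 rather than 2.33 for the 99th percentile).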

$z_\alpha$ Notation for z Critical Values

In statistical inference, we will need the values on the horizontal z axis that capture
certain small tail areas under the standard normal curve.

Notation
$z_\alpha$ will denote the value on the z axis for which $\alpha$ of the area under the z curve
lies to the right of $z_\alpha$. (See Figure 4.19.)

For example, $z_{.10}$ captures upper-tail area .10, and $z_{.01}$ captures upper-tail area .01.

Figure 4.19 $z_\alpha$ notation illustrated (shaded area under the z curve to the right of $z_\alpha$ = $P(Z \ge z_\alpha) = \alpha$)

Since $\alpha$ of the area under the z curve lies to the right of $z_\alpha$, $1 - \alpha$ of the area
lies to its left. Thus $z_\alpha$ is the $100(1 - \alpha)$th percentile of the standard normal distribution.
By symmetry the area under the standard normal curve to the left of $-z_\alpha$ is
also $\alpha$. The $z_\alpha$’s are usually referred to as z critical values. Table 4.1 lists the most
useful z percentiles and $z_\alpha$ values.


Table 4.1 Standard Normal Percentiles and Critical Values

Percentile                                     90     95     97.5   99     99.5   99.9   99.95
$\alpha$ (upper-tail area)                     .1     .05    .025   .01    .005   .001   .0005
$z_\alpha$ = $100(1 - \alpha)$th percentile    1.28   1.645  1.96   2.33   2.58   3.08   3.27

EXAMPLE 4.15 $z_{.05}$ is the 100(1 − .05)th = 95th percentile of the standard normal distribution, so
$z_{.05} = 1.645$. The area under the standard normal curve to the left of $-z_{.05}$ is also
.05. (See Figure 4.20.)

Figure 4.20 Finding $z_{.05}$ (shaded area = .05 to the left of $-z_{.05} = -1.645$ and shaded area = .05 to the right of $z_{.05}$ = 95th percentile = 1.645) ■

Nonstandard Normal Distributions


When $X \sim N(\mu, \sigma^2)$, probabilities involving X are computed by “standardizing.” The
standardized variable is $(X - \mu)/\sigma$. Subtracting $\mu$ shifts the mean from $\mu$ to zero, and
then dividing by $\sigma$ scales the variable so that the standard deviation is 1 rather than $\sigma$.

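PROPOSITION If X has a normal distribution with mean $\mu$ and standard deviation $\sigma$, then
\[
Z = \frac{X - \mu}{\sigma}
\]
has a standard normal distribution. Thus
\[
P(a \le X \le b) = P\left(\frac{a - \mu}{\sigma} \le Z \le \frac{b - \mu}{\sigma}\right)
                 = \Phi\left(\frac{b - \mu}{\sigma}\right) - \Phi\left(\frac{a - \mu}{\sigma}\right)
\]
\[
P(X \le a) = \Phi\left(\frac{a - \mu}{\sigma}\right) \qquad P(X \ge b) = 1 - \Phi\left(\frac{b - \mu}{\sigma}\right)
\]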

According to the first part of the proposition, the area under the normal $(\mu, \sigma^2)$ curve
that lies above the interval [a, b] is identical to the area under the standard normal curve
that lies above the interval from the standardized lower limit $(a - \mu)/\sigma$ to the standardized
upper limit $(b - \mu)/\sigma$. An illustration of the second part appears in Figure 4.21.
The key idea is that by standardizing, any probability involving X can be expressed as
a probability involving a standard normal rv Z, so that Appendix Table A.3 can be used.
The proposition can be proved by writing the cdf of $Z = (X - \mu)/\sigma$ as
\[
P(Z \le z) = P(X \le \sigma z + \mu) = \int_{-\infty}^{\sigma z + \mu} f(x; \mu, \sigma)\,dx
\]


Figure 4.21 Equality of nonstandard and standard normal curve areas (area to the left of a under the $N(\mu, \sigma^2)$ curve = area to the left of $(a - \mu)/\sigma$ under the $N(0, 1)$ curve)

Using a result from calculus, this integral can be differentiated with respect to z to
yield the desired pdf f(z; 0, 1).

EXAMPLE 4.16 The time that it takes a driver to react to the brake lights on a decelerating vehicle
is critical in helping to avoid rear-end collisions. The article “Fast-Rise Brake
Lamp as a Collision-Prevention Device” (Ergonomics, 1993: 391–395) suggests
that reaction time for an in-traffic response to a brake signal from standard
brake lights can be modeled with a normal distribution having mean value
1.25 sec and standard deviation of .46 sec. What is the probability that reaction
time is between 1.00 sec and 1.75 sec? If we let X denote reaction time, then
standardizing gives
\[
1.00 \le X \le 1.75
\]
if and only if
\[
\frac{1.00 - 1.25}{.46} \le \frac{X - 1.25}{.46} \le \frac{1.75 - 1.25}{.46}
\]
Thus
\[
P(1.00 \le X \le 1.75) = P\left(\frac{1.00 - 1.25}{.46} \le Z \le \frac{1.75 - 1.25}{.46}\right)
  = P(-.54 \le Z \le 1.09) = \Phi(1.09) - \Phi(-.54) = .8621 - .2946 = .5675
\]
This is illustrated in Figure 4.22. Similarly, if we view 2 sec as a critically long reaction
time, the probability that actual reaction time will exceed this value is
\[
P(X > 2) = P\left(Z > \frac{2 - 1.25}{.46}\right) = P(Z > 1.63) = 1 - \Phi(1.63) = .0516
\]

Figure 4.22 Normal curves for Example 4.16 (area between 1.00 and 1.75 under the normal curve with $\mu = 1.25$, $\sigma = .46$ equals the area between −.54 and 1.09 under the z curve) ■
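The standardizing step in Example 4.16 can be sketched in code. This is a hedged illustration: `phi` and `normal_interval_probability` are hypothetical helper names, and $\Phi$ is evaluated through `math.erf` rather than read from Table A.3. Because the code does not round the standardized limits to two decimals, its answers can differ from the table-based ones in the third decimal place.

```python
import math


def phi(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_interval_probability(a, b, mu, sigma):
    """P(a <= X <= b) for X ~ N(mu, sigma^2), computed by standardizing."""
    z_low = (a - mu) / sigma    # standardized lower limit
    z_high = (b - mu) / sigma   # standardized upper limit
    return phi(z_high) - phi(z_low)


mu, sigma = 1.25, 0.46          # reaction-time model from the example (sec)
p_between = normal_interval_probability(1.00, 1.75, mu, sigma)
p_exceeds_2 = 1.0 - phi((2.0 - mu) / sigma)

print(round(p_between, 4), round(p_exceeds_2, 4))
```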


Standardizing amounts to nothing more than calculating a distance from the mean
value and then reexpressing the distance as some number of standard deviations.
Thus, if $\mu = 100$ and $\sigma = 15$, then x = 130 corresponds to z = (130 − 100)/15 =
30/15 = 2.00. That is, 130 is 2 standard deviations above (to the right of) the mean
value. Similarly, standardizing 85 gives (85 − 100)/15 = −1.00, so 85 is 1 standard
deviation below the mean. The z table applies to any normal distribution provided that
we think in terms of number of standard deviations away from the mean value.

EXAMPLE 4.17 The breakdown voltage of a randomly chosen diode of a particular type is known to
be normally distributed. What is the probability that a diode’s breakdown voltage is
within 1 standard deviation of its mean value? This question can be answered without
knowing either $\mu$ or $\sigma$, as long as the distribution is known to be normal; the
answer is the same for any normal distribution:
\[
P(X \text{ is within 1 standard deviation of its mean}) = P(\mu - \sigma \le X \le \mu + \sigma)
  = P\left(\frac{\mu - \sigma - \mu}{\sigma} \le Z \le \frac{\mu + \sigma - \mu}{\sigma}\right)
  = P(-1.00 \le Z \le 1.00) = \Phi(1.00) - \Phi(-1.00) = .6826
\]
The probability that X is within 2 standard deviations of its mean is
$P(-2.00 \le Z \le 2.00) = .9544$ and within 3 standard deviations of the mean is
$P(-3.00 \le Z \le 3.00) = .9974$. ■

The results of Example 4.17 are often reported in percentage form and referred
to as the empirical rule (because empirical evidence has shown that histograms of
real data can very frequently be approximated by normal curves).

If the population distribution of a variable is (approximately) normal, then


1. Roughly 68% of the values are within 1 SD of the mean.
2. Roughly 95% of the values are within 2 SDs of the mean.
3. Roughly 99.7% of the values are within 3 SDs of the mean.

It is indeed unusual to observe a value from a normal population that is much farther
than 2 standard deviations from $\mu$. These results will be important in the development
of hypothesis-testing procedures in later chapters.
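The percentages in the empirical rule follow directly from the standard normal cdf. A minimal sketch, assuming the identity $P(-k \le Z \le k) = \operatorname{erf}(k/\sqrt{2})$ and using an illustrative function name:

```python
import math


def prob_within_k_sd(k):
    """P(mu - k*sigma <= X <= mu + k*sigma) for any normal X.

    Equals Phi(k) - Phi(-k) = erf(k / sqrt(2)); it does not depend on mu or sigma.
    """
    return math.erf(k / math.sqrt(2.0))


within = {k: prob_within_k_sd(k) for k in (1, 2, 3)}
for k, p in within.items():
    print(f"within {k} SD: {p:.4f}")
```

The computed values agree with the table-based .6826, .9544, and .9974 up to the rounding inherent in Table A.3.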

Percentiles of an Arbitrary Normal Distribution
The (100p)th percentile of a normal distribution with mean $\mu$ and standard deviation
$\sigma$ is easily related to the (100p)th percentile of the standard normal distribution.

PROPOSITION
\[
(100p)\text{th percentile for normal } (\mu, \sigma) = \mu + \left[(100p)\text{th percentile for standard normal}\right] \cdot \sigma
\]


Another way of saying this is that if z is the desired percentile for the standard normal
distribution, then the desired percentile for the normal $(\mu, \sigma)$ distribution is z
standard deviations from $\mu$.

EXAMPLE 4.18 The authors of “Assessment of Lifetime of Railway Axle” (Intl. J. of Fatigue,
2013: 40–46) used data collected from an experiment with a specified initial crack
length and number of loading cycles to propose a normal distribution with mean
value 5.496 mm and standard deviation .067 mm for the rv X = final crack depth.
For this model, what value of final crack depth would be exceeded by only .5% of all
cracks under these circumstances? Let c denote the requested value. Then the desired
condition is that $P(X > c) = .005$, or, equivalently, that $P(X \le c) = .995$. Thus c is
the 99.5th percentile of the normal distribution with $\mu = 5.496$ and $\sigma = .067$. The
99.5th percentile of the standard normal distribution is 2.58, so
\[
c = \eta(.995) = 5.496 + (2.58)(.067) = 5.496 + .173 = 5.669 \text{ mm}
\]
This is illustrated in Figure 4.23.

Figure 4.23 Distribution of final crack depth for Example 4.18 (shaded area = .995 to the left of c = 99.5th percentile = 5.669; $\mu = 5.496$) ■
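The percentile relationship $\mu + z \cdot \sigma$ can be checked against a direct inverse-cdf evaluation. The sketch below is illustrative only; it uses `statistics.NormalDist` from the Python standard library, and the variable names are assumptions.

```python
from statistics import NormalDist

mu, sigma = 5.496, 0.067        # crack-depth model from Example 4.18 (mm)
p = 0.995

# Route 1: standard normal percentile, then shift and scale
z = NormalDist().inv_cdf(p)
c_from_z = mu + z * sigma

# Route 2: ask the N(mu, sigma) distribution for its percentile directly
c_direct = NormalDist(mu, sigma).inv_cdf(p)

print(round(z, 3), round(c_from_z, 3), round(c_direct, 3))
```

Both routes should give essentially the same value, about 5.669 mm, matching the hand calculation that uses z = 2.58.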

The Normal Distribution and Discrete Populations
The normal distribution is often used as an approximation to the distribution of val-
ues in a discrete population. In such situations, extra care should be taken to ensure
that probabilities are computed in an accurate manner.

EXAMPLE 4.19 IQ in a particular population (as measured by a standard test) is known to be
approximately normally distributed with $\mu = 100$ and $\sigma = 15$. What is the probability
that a randomly selected individual has an IQ of at least 125? Letting
X = the IQ of a randomly chosen person, we wish $P(X \ge 125)$. The temptation
here is to standardize $X \ge 125$ as in previous examples. However, the IQ population
distribution is actually discrete, since IQs are integer-valued. So the normal
curve is an approximation to a discrete probability histogram, as pictured in
Figure 4.24.
The rectangles of the histogram are centered at integers. IQs of at least 125
correspond to rectangles beginning at 124.5, as shaded in Figure 4.24. Thus we
really want the area under the approximating normal curve to the right of 124.5.
Standardizing this value gives $P(Z \ge 1.63) = .0516$, whereas standardizing 125
results in $P(Z \ge 1.67) = .0475$. The difference is not great, but the answer .0516 is
more accurate. Similarly, $P(X = 125)$ would be approximated by the area between
124.5 and 125.5, since the area under the normal curve above the single value 125
is zero.


[Figure 4.24 A normal approximation to a discrete distribution; the shaded histogram rectangles begin at 124.5, just below the value 125] ■
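
The effect of the .5 adjustment in Example 4.19 can be seen numerically. The sketch below assumes the standard-library `statistics.NormalDist`; the variable names are illustrative. Because it does not round z to two decimals, its results differ slightly in the third decimal place from the table-based values .0516 and .0475.

```python
from statistics import NormalDist

# IQ model from Example 4.19: approximately N(100, 15), but integer-valued.
iq = NormalDist(mu=100, sigma=15)

# P(X >= 125) with the continuity correction: area to the right of 124.5.
p_corrected = 1 - iq.cdf(124.5)

# The same probability without the correction: area to the right of 125.
p_uncorrected = 1 - iq.cdf(125)

# P(X = 125) approximated by the area between 124.5 and 125.5.
p_equal_125 = iq.cdf(125.5) - iq.cdf(124.5)

print(round(p_corrected, 4), round(p_uncorrected, 4), round(p_equal_125, 4))
```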


The correction for discreteness of the underlying distribution in Exam-
ple 4.19—that is, the addition or subtraction of .5 before standardizing—is often
called a continuity correction. It is useful in the following application of the normal
distribution to the computation of binomial probabilities.

Approximating the Binomial Distribution


Recall that the mean value and standard deviation of a binomial random variable
X are μ_X = np and σ_X = √(npq), respectively. Figure 4.25 displays a binomial
probability histogram for the binomial distribution with n = 25, p = .6, for
which μ = 25(.6) = 15 and σ = √(25(.6)(.4)) = 2.449. A normal curve with this
μ and σ has been superimposed on the probability histogram. Although the probability
histogram is a bit skewed (because p ≠ .5), the normal curve gives a very
good approximation, especially in the middle part of the picture. The area of any
rectangle (probability of any particular X value) except those in the extreme tails can
be accurately approximated by the corresponding normal curve area. For example,
P(X = 10) = B(10; 25, .6) − B(9; 25, .6) = .021, whereas the area under the normal
curve between 9.5 and 10.5 is P(−2.25 ≤ Z ≤ −1.84) = .0207.

[Figure 4.25 Binomial probability histogram for n = 25, p = .6 with normal approximation curve (mean 15, standard deviation 2.449) superimposed; horizontal axis X, vertical axis Density]
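
The single-rectangle comparison for P(X = 10) can be checked directly. This is a sketch using only the standard library (`math.comb` for the exact binomial pmf, `statistics.NormalDist` for the curve area); the variable names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 25, 0.6
q = 1 - p

# Exact binomial probability P(X = 10) from the pmf.
exact = comb(n, 10) * p**10 * q**(n - 10)

# Normal curve with the same mean and standard deviation as the binomial.
approx_dist = NormalDist(mu=n * p, sigma=sqrt(n * p * q))

# Area under the normal curve over the rectangle centered at 10.
approx = approx_dist.cdf(10.5) - approx_dist.cdf(9.5)

print(round(exact, 4), round(approx, 4))
```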


More generally, as long as the binomial probability histogram is not too
skewed, binomial probabilities can be well approximated by normal curve areas. It
is then customary to say that X has approximately a normal distribution.

PROPOSITION Let X be a binomial rv based on n trials with success probability p. Then if
the binomial probability histogram is not too skewed, X has approximately a
normal distribution with μ = np and σ = √(npq). In particular, for x = a possible
value of X,

$$P(X \le x) = B(x; n, p) \approx \left(\begin{array}{c}\text{area under the normal curve}\\ \text{to the left of } x + .5\end{array}\right) = \Phi\!\left(\frac{x + .5 - np}{\sqrt{npq}}\right)$$

In practice, the approximation is adequate provided that both np ≥ 10 and
nq ≥ 10 (i.e., the expected number of successes and the expected number
of failures are both at least 10), since there is then enough symmetry in the
underlying binomial distribution.

A direct proof of the approximation’s validity is quite difficult. In the next chapter
we’ll see that it is a consequence of a more general result called the Central Limit
Theorem. In all honesty, the approximation is not so important for probability cal-
culation as it once was. This is because software can now calculate binomial prob-
abilities exactly for quite large values of n.

Example 4.20 Suppose that 25% of all students at a large public university receive financial aid. Let
X be the number of students in a random sample of size 50 who receive financial
aid, so that p = .25. Then μ = 12.5 and σ = 3.06. Since np = 50(.25) = 12.5 ≥ 10
and nq = 37.5 ≥ 10, the approximation can safely be applied. The probability that
at most 10 students receive aid is

$$P(X \le 10) = B(10; 50, .25) \approx \Phi\!\left(\frac{10 + .5 - 12.5}{3.06}\right) = \Phi(-.65) = .2578$$

Similarly, the probability that between 5 and 15 (inclusive) of the selected students
receive aid is

$$P(5 \le X \le 15) = B(15; 50, .25) - B(4; 50, .25) \approx \Phi\!\left(\frac{15.5 - 12.5}{3.06}\right) - \Phi\!\left(\frac{4.5 - 12.5}{3.06}\right) = .8320$$

The exact probabilities are .2622 and .8348, respectively, so the approximations are
quite good. In the last calculation, P(5 ≤ X ≤ 15) is being approximated by the area
under the normal curve between 4.5 and 15.5; the continuity correction is used for
both the upper and lower limits. ■
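
The two calculations in Example 4.20 can be checked against exact binomial sums. The sketch below uses only the standard library; the helper names `binom_cdf` and `normal_approx_cdf` are illustrative, not functions from the text. It uses the unrounded σ = √9.375, so the approximate values can differ from .2578 and .8320 in the third decimal place.

```python
from math import comb, sqrt
from statistics import NormalDist


def binom_cdf(x, n, p):
    """Exact binomial cdf B(x; n, p) = P(X <= x)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))


def normal_approx_cdf(x, n, p):
    """Normal approximation to P(X <= x) with the continuity correction."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    return NormalDist().cdf((x + 0.5 - mu) / sigma)


n, p = 50, 0.25

exact_at_most_10 = binom_cdf(10, n, p)
approx_at_most_10 = normal_approx_cdf(10, n, p)

# P(5 <= X <= 15) = B(15; n, p) - B(4; n, p), and likewise for the approximation.
exact_5_to_15 = binom_cdf(15, n, p) - binom_cdf(4, n, p)
approx_5_to_15 = normal_approx_cdf(15, n, p) - normal_approx_cdf(4, n, p)

print(round(exact_at_most_10, 4), round(approx_at_most_10, 4))
print(round(exact_5_to_15, 4), round(approx_5_to_15, 4))
```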

When the objective of our investigation is to make an inference about a population
proportion p, interest will focus on the sample proportion of successes X/n rather
than on X itself. Because this proportion is just X multiplied by the constant 1/n, it will
also have approximately a normal distribution (with mean μ = p and standard deviation
σ = √(pq/n)) provided that both np ≥ 10 and nq ≥ 10. This normal approximation
is the basis for several inferential procedures to be discussed in later chapters.


EXERCISES Section 4.3 (28–58)

28. Let Z be a standard normal random variable and calculate the following probabilities, drawing pictures wherever appropriate.
a. P(0 ≤ Z ≤ 2.17) b. P(0 ≤ Z ≤ 1)
c. P(−2.50 ≤ Z ≤ 0) d. P(−2.50 ≤ Z ≤ 2.50)
e. P(Z ≤ 1.37) f. P(−1.75 ≤ Z)
g. P(−1.50 ≤ Z ≤ 2.00) h. P(1.37 ≤ Z ≤ 2.50)
i. P(1.50 ≤ Z) j. P(|Z| ≤ 2.50)

29. In each case, determine the value of the constant c that makes the probability statement correct.
a. Φ(c) = .9838 b. P(0 ≤ Z ≤ c) = .291
c. P(c ≤ Z) = .121 d. P(−c ≤ Z ≤ c) = .668
e. P(c ≤ |Z|) = .016

30. Find the following percentiles for the standard normal distribution. Interpolate where appropriate.
a. 91st b. 9th c. 75th
d. 25th e. 6th

31. Determine z_α for the following values of α:
a. α = .0055 b. α = .09
c. α = .663

32. Suppose the force acting on a column that helps to support a building is a normally distributed random variable X with mean value 15.0 kips and standard deviation 1.25 kips. Compute the following probabilities by standardizing and then using Table A.3.
a. P(X ≤ 15) b. P(X ≤ 17.5)
c. P(X ≥ 10) d. P(14 ≤ X ≤ 18)
e. P(|X − 15| ≤ 3)

33. Mopeds (small motorcycles with an engine capacity below 50 cm³) are very popular in Europe because of their mobility, ease of operation, and low cost. The article “Procedure to Verify the Maximum Speed of Automatic Transmission Mopeds in Periodic Motor Vehicle Inspections” (J. of Automobile Engr., 2008: 1615–1623) described a rolling bench test for determining maximum vehicle speed. A normal distribution with mean value 46.8 km/h and standard deviation 1.75 km/h is postulated. Consider randomly selecting a single such moped.
a. What is the probability that maximum speed is at most 50 km/h?
b. What is the probability that maximum speed is at least 48 km/h?
c. What is the probability that maximum speed differs from the mean value by at most 1.5 standard deviations?

34. The article “Reliability of Domestic-Waste Biofilm Reactors” (J. of Envir. Engr., 1995: 785–790) suggests that substrate concentration (mg/cm³) of influent to a reactor is normally distributed with μ = .30 and σ = .06.
a. What is the probability that the concentration exceeds .50?
b. What is the probability that the concentration is at most .20?
c. How would you characterize the largest 5% of all concentration values?

35. In a road-paving process, asphalt mix is delivered to the hopper of the paver by trucks that haul the material from the batching plant. The article “Modeling of Simultaneously Continuous and Stochastic Construction Activities for Simulation” (J. of Construction Engr. and Mgmnt., 2013: 1037–1045) proposed a normal distribution with mean value 8.46 min and standard deviation .913 min for the rv X = truck haul time.
a. What is the probability that haul time will be at least 10 min? Will exceed 10 min?
b. What is the probability that haul time will exceed 15 min?
c. What is the probability that haul time will be between 8 and 10 min?
d. What value c is such that 98% of all haul times are in the interval from 8.46 − c to 8.46 + c?
e. If four haul times are independently selected, what is the probability that at least one of them exceeds 10 min?

36. Spray drift is a constant concern for pesticide applicators and agricultural producers. The inverse relationship between droplet size and drift potential is well known. The paper “Effects of 2,4-D Formulation and Quinclorac on Spray Droplet Size and Deposition” (Weed Technology, 2005: 1030–1036) investigated the effects of herbicide formulation on spray atomization. A figure in the paper suggested the normal distribution with mean 1050 μm and standard deviation 150 μm was a reasonable model for droplet size for water (the “control treatment”) sprayed through a 760 ml/min nozzle.
a. What is the probability that the size of a single droplet is less than 1500 μm? At least 1000 μm?
b. What is the probability that the size of a single droplet is between 1000 and 1500 μm?
c. How would you characterize the smallest 2% of all droplets?
d. If the sizes of five independently selected droplets are measured, what is the probability that exactly two of them exceed 1500 μm?

37. Suppose that blood chloride concentration (mmol/L) has a normal distribution with mean 104 and standard deviation 5 (information in the article “Mathematical Model of Chloride Concentration in Human Blood,” J. of Med. Engr. and Tech., 2006: 25–30, including a normal


probability plot as described in Section 4.6, supports this assumption).
a. What is the probability that chloride concentration equals 105? Is less than 105? Is at most 105?
b. What is the probability that chloride concentration differs from the mean by more than 1 standard deviation? Does this probability depend on the values of μ and σ?
c. How would you characterize the most extreme .1% of chloride concentration values?

38. There are two machines available for cutting corks intended for use in wine bottles. The first produces corks with diameters that are normally distributed with mean 3 cm and standard deviation .1 cm. The second machine produces corks with diameters that have a normal distribution with mean 3.04 cm and standard deviation .02 cm. Acceptable corks have diameters between 2.9 cm and 3.1 cm. Which machine is more likely to produce an acceptable cork?

39. The defect length of a corrosion defect in a pressurized steel pipe is normally distributed with mean value 30 mm and standard deviation 7.8 mm [suggested in the article “Reliability Evaluation of Corroding Pipelines Considering Multiple Failure Modes and Time-Dependent Internal Pressure” (J. of Infrastructure Systems, 2011: 216–224)].
a. What is the probability that defect length is at most 20 mm? Less than 20 mm?
b. What is the 75th percentile of the defect length distribution, that is, the value that separates the smallest 75% of all lengths from the largest 25%?
c. What is the 15th percentile of the defect length distribution?
d. What values separate the middle 80% of the defect length distribution from the smallest 10% and the largest 10%?

40. The article “Monte Carlo Simulation—Tool for Better Understanding of LRFD” (J. of Structural Engr., 1993: 1586–1599) suggests that yield strength (ksi) for A36 grade steel is normally distributed with μ = 43 and σ = 4.5.
a. What is the probability that yield strength is at most 40? Greater than 60?
b. What yield strength value separates the strongest 75% from the others?

41. The automatic opening device of a military cargo parachute has been designed to open when the parachute is 200 m above the ground. Suppose opening altitude actually has a normal distribution with mean value 200 m and standard deviation 30 m. Equipment damage will occur if the parachute opens at an altitude of less than 100 m. What is the probability that there is equipment damage to the payload of at least one of five independently dropped parachutes?

42. The temperature reading from a thermocouple placed in a constant-temperature medium is normally distributed with mean μ, the actual temperature of the medium, and standard deviation σ. What would the value of σ have to be to ensure that 95% of all readings are within .1° of μ?

43. Vehicle speed on a particular bridge in China can be modeled as normally distributed (“Fatigue Reliability Assessment for Long-Span Bridges under Combined Dynamic Loads from Winds and Vehicles,” J. of Bridge Engr., 2013: 735–747).
a. If 5% of all vehicles travel less than 39.12 m/h and 10% travel more than 73.24 m/h, what are the mean and standard deviation of vehicle speed? [Note: The resulting values should agree with those given in the cited article.]
b. What is the probability that a randomly selected vehicle’s speed is between 50 and 65 m/h?
c. What is the probability that a randomly selected vehicle’s speed exceeds the speed limit of 70 m/h?

44. If bolt thread length is normally distributed, what is the probability that the thread length of a randomly selected bolt is
a. Within 1.5 SDs of its mean value?
b. Farther than 2.5 SDs from its mean value?
c. Between 1 and 2 SDs from its mean value?

45. A machine that produces ball bearings has initially been set so that the true average diameter of the bearings it produces is .500 in. A bearing is acceptable if its diameter is within .004 in. of this target value. Suppose, however, that the setting has changed during the course of production, so that the bearings have normally distributed diameters with mean value .499 in. and standard deviation .002 in. What percentage of the bearings produced will not be acceptable?

46. The Rockwell hardness of a metal is determined by impressing a hardened point into the surface of the metal and then measuring the depth of penetration of the point. Suppose the Rockwell hardness of a particular alloy is normally distributed with mean 70 and standard deviation 3.
a. If a specimen is acceptable only if its hardness is between 67 and 75, what is the probability that a randomly chosen specimen has an acceptable hardness?
b. If the acceptable range of hardness is (70 − c, 70 + c), for what value of c would 95% of all specimens have acceptable hardness?
c. If the acceptable range is as in part (a) and the hardness of each of ten randomly selected specimens is independently determined, what is the expected number of acceptable specimens among the ten?
d. What is the probability that at most eight of ten independently selected specimens have a hardness of less than 73.84? [Hint: Y = the number among the ten specimens with hardness less than 73.84 is a binomial variable; what is p?]


47. The weight distribution of parcels sent in a certain manner is normal with mean value 12 lb and standard deviation 3.5 lb. The parcel service wishes to establish a weight value c beyond which there will be a surcharge. What value of c is such that 99% of all parcels are at least 1 lb under the surcharge weight?

48. Suppose Appendix Table A.3 contained Φ(z) only for z ≥ 0. Explain how you could still compute
a. P(−1.72 ≤ Z ≤ −.55)
b. P(−1.72 ≤ Z ≤ .55)
Is it necessary to tabulate Φ(z) for z negative? What property of the standard normal curve justifies your answer?

49. Consider babies born in the “normal” range of 37–43 weeks gestational age. Extensive data supports the assumption that for such babies born in the United States, birth weight is normally distributed with mean 3432 g and standard deviation 482 g. [The article “Are Babies Normal?” (The American Statistician, 1999: 298–302) analyzed data from a particular year; for a sensible choice of class intervals, a histogram did not look at all normal, but after further investigations it was determined that this was due to some hospitals measuring weight in grams and others measuring to the nearest ounce and then converting to grams. A modified choice of class intervals that allowed for this gave a histogram that was well described by a normal distribution.]
a. What is the probability that the birth weight of a randomly selected baby of this type exceeds 4000 g? Is between 3000 and 4000 g?
b. What is the probability that the birth weight of a randomly selected baby of this type is either less than 2000 g or greater than 5000 g?
c. What is the probability that the birth weight of a randomly selected baby of this type exceeds 7 lb?
d. How would you characterize the most extreme .1% of all birth weights?
e. If X is a random variable with a normal distribution and a is a numerical constant (a ≠ 0), then Y = aX also has a normal distribution. Use this to determine the distribution of birth weight expressed in pounds (shape, mean, and standard deviation), and then recalculate the probability from part (c). How does this compare to your previous answer?

50. In response to concerns about nutritional contents of fast foods, McDonald’s has announced that it will use a new cooking oil for its french fries that will decrease substantially trans fatty acid levels and increase the amount of more beneficial polyunsaturated fat. The company claims that 97 out of 100 people cannot detect a difference in taste between the new and old oils. Assuming that this figure is correct (as a long-run proportion), what is the approximate probability that in a random sample of 1000 individuals who have purchased fries at McDonald’s,
a. At least 40 can taste the difference between the two oils?
b. At most 5% can taste the difference between the two oils?

51. Chebyshev’s inequality (see Exercise 44, Chapter 3) is valid for continuous as well as discrete distributions. It states that for any number k satisfying k ≥ 1, P(|X − μ| ≥ kσ) ≤ 1/k² (see Exercise 44 in Chapter 3 for an interpretation). Obtain this probability in the case of a normal distribution for k = 1, 2, and 3, and compare to the upper bound.

52. Let X denote the number of flaws along a 100-m reel of magnetic tape (an integer-valued variable). Suppose X has approximately a normal distribution with μ = 25 and σ = 5. Use the continuity correction to calculate the probability that the number of flaws is
a. Between 20 and 30, inclusive.
b. At most 30. Less than 30.

53. Let X have a binomial distribution with parameters n = 25 and p. Calculate each of the following probabilities using the normal approximation (with the continuity correction) for the cases p = .5, .6, and .8 and compare to the exact probabilities calculated from Appendix Table A.1.
a. P(15 ≤ X ≤ 20)
b. P(X ≤ 15)
c. P(20 ≤ X)

54. Suppose that 10% of all steel shafts produced by a certain process are nonconforming but can be reworked (rather than having to be scrapped). Consider a random sample of 200 shafts, and let X denote the number among these that are nonconforming and can be reworked. What is the (approximate) probability that X is
a. At most 30?
b. Less than 30?
c. Between 15 and 25 (inclusive)?

55. Suppose only 75% of all drivers in a certain state regularly wear a seat belt. A random sample of 500 drivers is selected. What is the probability that
a. Between 360 and 400 (inclusive) of the drivers in the sample regularly wear a seat belt?
b. Fewer than 400 of those in the sample regularly wear a seat belt?

56. Show that the relationship between a general normal percentile and the corresponding z percentile is as stated in this section.

57. a. Show that if X has a normal distribution with parameters μ and σ, then Y = aX + b (a linear function of X) also has a normal distribution. What are the parameters of the distribution of Y [i.e., E(Y) and V(Y)]? [Hint: Write the cdf of Y, P(Y ≤ y), as an integral involving the pdf of X, and then differentiate with respect to y to get the pdf of Y.]


b. If, when measured in °C, temperature is normally distributed with mean 115 and standard deviation 2, what can be said about the distribution of temperature measured in °F?

58. There is no nice formula for the standard normal cdf Φ(z), but several good approximations have been published in articles. The following is from “Approximations for Hand Calculators Using Small Integer Coefficients” (Mathematics of Computation, 1977: 214–222). For 0 < z ≤ 5.5,

$$P(Z \ge z) = 1 - \Phi(z) \approx .5 \exp\left\{-\left[\frac{(83z + 351)z + 562}{703/z + 165}\right]\right\}$$

The relative error of this approximation is less than .042%. Use this to calculate approximations to the following probabilities, and compare whenever possible to the probabilities obtained from Appendix Table A.3.
a. P(Z ≥ 1) b. P(Z < −3)
c. P(−4 < Z < 4) d. P(Z > 5)

4.4 The Exponential and Gamma Distributions

The density curve corresponding to any normal distribution is bell-shaped and
therefore symmetric. There are many practical situations in which the variable of
interest to an investigator might have a skewed distribution. One family of distribu-
tions that has this property is the gamma family. We first consider a special case, the
exponential distribution, and then generalize later in the section.

The Exponential Distribution


The family of exponential distributions provides probability models that are very
widely used in engineering and science disciplines.

DEFINITION X is said to have an exponential distribution with (scale) parameter λ (λ > 0)
if the pdf of X is

$$f(x; \lambda) = \begin{cases} \lambda e^{-\lambda x} & x \ge 0 \\ 0 & \text{otherwise} \end{cases} \tag{4.5}$$

Some sources write the exponential pdf in the form (1/β)e^(−x/β), so that β = 1/λ. The
expected value of an exponentially distributed random variable X is

$$\mu = E(X) = \int_0^\infty x \lambda e^{-\lambda x}\,dx$$

Obtaining this expected value necessitates doing an integration by parts. The variance
of X can be computed using the fact that V(X) = E(X²) − [E(X)]². The determination
of E(X²) requires integrating by parts twice in succession. The results of
these integrations are as follows:

$$\mu = \frac{1}{\lambda} \qquad \sigma^2 = \frac{1}{\lambda^2}$$

Both the mean and standard deviation of the exponential distribution equal 1/λ.
Graphs of several exponential pdf’s are illustrated in Figure 4.26.


[Figure 4.26 Exponential density curves f(x; λ) for λ = 2, λ = 1, and λ = .5]

The exponential pdf is easily integrated to obtain the cdf.

$$F(x; \lambda) = \begin{cases} 0 & x < 0 \\ 1 - e^{-\lambda x} & x \ge 0 \end{cases}$$

Example 4.21 The article “Probabilistic Fatigue Evaluation of Riveted Railway Bridges” (J. of
Bridge Engr., 2008: 237–244) suggested the exponential distribution with mean value
6 MPa as a model for the distribution of stress range in certain bridge connections. Let’s
assume that this is in fact the true model. Then E(X) = 1/λ = 6 implies that λ = .1667.
The probability that stress range is at most 10 MPa is
P(X ≤ 10) = F(10; .1667) = 1 − e^(−(.1667)(10)) = 1 − .189 = .811
The probability that stress range is between 5 and 10 MPa is
P(5 ≤ X ≤ 10) = F(10; .1667) − F(5; .1667) = (1 − e^(−1.667)) − (1 − e^(−.8335))
= .246 ■
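
The two probabilities in Example 4.21 follow directly from the exponential cdf. A minimal sketch, assuming only the standard library; the helper name `exponential_cdf` is illustrative.

```python
from math import exp


def exponential_cdf(x, lam):
    """Exponential cdf F(x; lam) = 1 - exp(-lam * x) for x >= 0, else 0."""
    return 1 - exp(-lam * x) if x >= 0 else 0.0


# Example 4.21: mean stress range 6 MPa, so lam = 1/6 (about .1667).
lam = 1 / 6

p_at_most_10 = exponential_cdf(10, lam)
p_between_5_and_10 = exponential_cdf(10, lam) - exponential_cdf(5, lam)

print(round(p_at_most_10, 3), round(p_between_5_and_10, 3))
```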

The exponential distribution is frequently used as a model for the distribution
of times between the occurrence of successive events, such as customers arriving at
a service facility or calls coming in to a switchboard. The reason for this is that the
exponential distribution is closely related to the Poisson process discussed in Chapter 3.

PROPOSITION Suppose that the number of events occurring in any time interval of length t
has a Poisson distribution with parameter αt (where α, the rate of the event
process, is the expected number of events occurring in 1 unit of time) and that
numbers of occurrences in nonoverlapping intervals are independent of one
another. Then the distribution of elapsed time between the occurrence of two
successive events is exponential with parameter λ = α.

Although a complete proof is beyond the scope of the text, the result is easily verified
for the time X₁ until the first event occurs:

$$P(X_1 \le t) = 1 - P(X_1 > t) = 1 - P[\text{no events in } (0, t)] = 1 - \frac{e^{-\alpha t} \cdot (\alpha t)^0}{0!} = 1 - e^{-\alpha t}$$

which is exactly the cdf of the exponential distribution.


Example 4.22 Suppose that calls to a rape crisis center in a certain county occur according to a
Poisson process with rate α = .5 call per day. Then the number of days X between
successive calls has an exponential distribution with parameter value .5, so the probability
that more than 2 days elapse between calls is
P(X > 2) = 1 − P(X ≤ 2) = 1 − F(2; .5) = e^(−(.5)(2)) = .368
The expected time between successive calls is 1/.5 = 2 days. ■
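
The link between the Poisson count and the exponential waiting time in Example 4.22 can be verified numerically: the chance of no calls in 2 days should equal the chance that the wait exceeds 2 days. This is a sketch using only the standard library; `poisson_pmf` is an illustrative helper name.

```python
from math import exp, factorial


def poisson_pmf(k, mean):
    """Poisson probability of exactly k events when the expected count is mean."""
    return exp(-mean) * mean**k / factorial(k)


alpha = 0.5   # rate from Example 4.22: .5 call per day
t = 2         # length of the waiting period in days

# P(no calls in t days) from the Poisson distribution with parameter alpha * t.
p_no_calls = poisson_pmf(0, alpha * t)

# P(X > t) from the exponential cdf with lam = alpha.
p_wait_exceeds_t = 1 - (1 - exp(-alpha * t))

expected_gap = 1 / alpha  # expected time between calls, in days

print(round(p_no_calls, 3), round(p_wait_exceeds_t, 3), expected_gap)
```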

Another important application of the exponential distribution is to model
the distribution of component lifetime. A partial reason for the popularity of
such applications is the “memoryless” property of the exponential distribution.
Suppose component lifetime is exponentially distributed with parameter λ. After
putting the component into service, we leave for a period of t₀ hours and then return
to find the component still working; what now is the probability that it lasts at least
an additional t hours? In symbols, we wish P(X ≥ t + t₀ | X ≥ t₀). By the definition
of conditional probability,

$$P(X \ge t + t_0 \mid X \ge t_0) = \frac{P[(X \ge t + t_0) \cap (X \ge t_0)]}{P(X \ge t_0)}$$

But the event X ≥ t₀ in the numerator is redundant, since both events can occur if
and only if X ≥ t + t₀. Therefore,

$$P(X \ge t + t_0 \mid X \ge t_0) = \frac{P(X \ge t + t_0)}{P(X \ge t_0)} = \frac{1 - F(t + t_0; \lambda)}{1 - F(t_0; \lambda)} = e^{-\lambda t}$$

This conditional probability is identical to the original probability P(X ≥ t) that the
component lasted t hours. Thus the distribution of additional lifetime is exactly the
same as the original distribution of lifetime, so at each point in time the component
shows no effect of wear. In other words, the distribution of remaining lifetime is
independent of current age.
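
A quick numerical check of the memoryless property is sketched below. The rate `lam = 0.01` per hour and the ages tried are hypothetical values chosen only for illustration; they do not come from the text. For each current age t₀, the conditional probability of lasting 50 more hours should match the unconditional P(X ≥ 50).

```python
from math import exp


def survival(x, lam):
    """P(X >= x) = 1 - F(x; lam) for an exponential lifetime, x >= 0."""
    return exp(-lam * x)


lam = 0.01   # hypothetical failure rate per hour (not from the text)
t = 50       # additional hours of service we ask about

for t0 in (0, 100, 500):
    # Conditional probability of lasting t more hours given survival to t0.
    conditional = survival(t + t0, lam) / survival(t0, lam)
    print(t0, round(conditional, 6), round(survival(t, lam), 6))
```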
Although the memoryless property can be justified at least approximately
in many applied problems, in other situations components deteriorate with age or
occasionally improve with age (at least up to a certain point). More general lifetime
models are then furnished by the gamma, Weibull, and lognormal distributions (the
latter two are discussed in the next section).

The Gamma Function


To define the family of gamma distributions, we first need to introduce a function
that plays an important role in many branches of mathematics.

DEFINITION For α > 0, the gamma function Γ(α) is defined by

$$\Gamma(\alpha) = \int_0^\infty x^{\alpha - 1} e^{-x}\,dx \tag{4.6}$$

The most important properties of the gamma function are the following:
1. For any α > 1, Γ(α) = (α − 1) · Γ(α − 1) [via integration by parts]
2. For any positive integer, n, Γ(n) = (n − 1)!
3. Γ(1/2) = √π
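
The three properties can be spot-checked with the standard-library `math.gamma`. This is only a numerical illustration at a few sample arguments (the value 3.7 is an arbitrary choice), not a proof.

```python
from math import factorial, gamma, isclose, pi, sqrt

# Property 1: Gamma(a) = (a - 1) * Gamma(a - 1) for a > 1 (checked at a = 3.7).
a = 3.7
recursion_holds = isclose(gamma(a), (a - 1) * gamma(a - 1))

# Property 2: Gamma(n) = (n - 1)! for positive integers n.
factorial_holds = all(isclose(gamma(n), factorial(n - 1)) for n in range(1, 11))

# Property 3: Gamma(1/2) = sqrt(pi).
half_holds = isclose(gamma(0.5), sqrt(pi))

print(recursion_holds, factorial_holds, half_holds)
```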


Now let

$$f(x; \alpha) = \begin{cases} \dfrac{x^{\alpha - 1} e^{-x}}{\Gamma(\alpha)} & x \ge 0 \\ 0 & \text{otherwise} \end{cases} \tag{4.7}$$

Then f(x; α) ≥ 0. Expression (4.6) implies that ∫₀^∞ f(x; α) dx = Γ(α)/Γ(α) = 1. Thus
f(x; α) satisfies the two basic properties of a pdf.

The Gamma Distribution

DEFINITION A continuous random variable X is said to have a gamma distribution if the
pdf of X is

$$f(x; \alpha, \beta) = \begin{cases} \dfrac{1}{\beta^{\alpha}\,\Gamma(\alpha)}\, x^{\alpha - 1} e^{-x/\beta} & x \ge 0 \\ 0 & \text{otherwise} \end{cases} \tag{4.8}$$

where the parameters α and β satisfy α > 0, β > 0. The standard gamma
distribution has β = 1, so the pdf of a standard gamma rv is given by (4.7).

The exponential distribution results from taking α = 1 and β = 1/λ.
Figure 4.27(a) illustrates the graphs of the gamma pdf f(x; α, β) (4.8) for several
(α, β) pairs, whereas Figure 4.27(b) presents graphs of the standard gamma pdf.
For the standard pdf, when α ≤ 1, f(x; α) is strictly decreasing as x increases from 0;
when α > 1, f(x; α) rises from 0 at x = 0 to a maximum and then decreases. The
parameter β in (4.8) is a scale parameter, and α is referred to as a shape parameter
because changing its value alters the basic shape of the density curve.

[Figure 4.27 (a) Gamma density curves f(x; α, β); (b) standard gamma density curves f(x; α); horizontal axis x in both panels]

The mean and variance of a random variable X having the gamma distribution
f(x; α, β) are

$$E(X) = \mu = \alpha\beta \qquad V(X) = \sigma^2 = \alpha\beta^2$$

When X is a standard gamma rv, the cdf of X,

$$F(x; \alpha) = \int_0^x \frac{y^{\alpha - 1} e^{-y}}{\Gamma(\alpha)}\,dy \qquad x > 0 \tag{4.9}$$

is called the incomplete gamma function [sometimes the incomplete gamma function
refers to Expression (4.9) without the denominator Γ(α) in the integrand]. There

Copyright 2016 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s).
Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.
174 Chapter 4 Continuous random Variables and probability Distributions

are extensive tables of F(x; a) available; in Appendix Table A.4, we present a small
tabulation for a 5 1, 2,…, 10 and x 5 1, 2,…, 15.

Example 4.23 The article “The Probability Distribution of Maintenance Cost of a System Affected
by the Gamma Process of Degradation” (Reliability Engr. and System Safety, 2012:
65–76) notes that the gamma distribution is widely used to model the extent of degradation
such as corrosion, creep, or wear. Let X represent the amount of degradation of a
certain type, and suppose that it has a standard gamma distribution with $\alpha = 2$. Since

$$P(a \le X \le b) = F(b) - F(a)$$

when X is continuous,

$$P(3 \le X \le 5) = F(5; 2) - F(3; 2) = .960 - .801 = .159$$

The probability that the amount of degradation exceeds 4 is

$$P(X > 4) = 1 - P(X \le 4) = 1 - F(4; 2) = 1 - .908 = .092 \qquad ■$$
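
The tabled values used in this example can be reproduced without a table when $\alpha$ is a positive integer, because repeated integration by parts of (4.9) gives the closed form $F(x; n) = 1 - e^{-x}\sum_{k=0}^{n-1} x^k/k!$. The Python sketch below relies on that identity, which is a standard result but is not stated in the text; the function name is hypothetical.

```python
import math

def std_gamma_cdf(x, alpha):
    """Incomplete gamma function F(x; alpha) of (4.9), for a positive INTEGER alpha only."""
    if x <= 0:
        return 0.0
    partial_sum = sum(x ** k / math.factorial(k) for k in range(alpha))
    return 1.0 - math.exp(-x) * partial_sum

# Standard gamma distribution with alpha = 2, as in the example.
p_between_3_and_5 = std_gamma_cdf(5, 2) - std_gamma_cdf(3, 2)
p_exceeds_4 = 1.0 - std_gamma_cdf(4, 2)

print(round(p_between_3_and_5, 3), round(p_exceeds_4, 3))  # 0.159 0.092
```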

The incomplete gamma function can also be used to compute probabilities
involving nonstandard gamma distributions. These probabilities can also be obtained
almost instantaneously from various software packages.

PROPOSITION Let X have a gamma distribution with parameters $\alpha$ and $\beta$. Then for any $x > 0$,
the cdf of X is given by

$$P(X \le x) = F(x;\alpha,\beta) = F\!\left(\frac{x}{\beta};\alpha\right)$$

where $F(\,\cdot\,;\alpha)$ is the incomplete gamma function.

Example 4.24 Suppose the survival time X in weeks of a randomly selected male mouse exposed
to 240 rads of gamma radiation has (what else!) a gamma distribution with
$\alpha = 8$ and $\beta = 15$. (Data in Survival Distributions: Reliability Applications in
the Biomedical Services, by A. J. Gross and V. Clark, suggests $\alpha \approx 8.5$ and
$\beta \approx 13.3$.) The expected survival time is $E(X) = (8)(15) = 120$ weeks, whereas
$V(X) = (8)(15)^2 = 1800$ and $\sigma_X = \sqrt{1800} = 42.43$ weeks. The probability that a
mouse survives between 60 and 120 weeks is

$$P(60 \le X \le 120) = P(X \le 120) - P(X \le 60)$$
$$= F(120/15; 8) - F(60/15; 8)$$
$$= F(8; 8) - F(4; 8) = .547 - .051 = .496$$

The probability that a mouse survives at least 30 weeks is

$$P(X \ge 30) = 1 - P(X < 30) = 1 - P(X \le 30) = 1 - F(30/15; 8) = .999 \qquad ■$$
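
The rescaling in the proposition is easy to mimic in code. The following Python sketch approximates the incomplete gamma function (4.9) by a midpoint rule, so it also accepts noninteger $\alpha \ge 1$, and then evaluates the probabilities of this example. The function names and the number of integration steps are assumptions made for illustration; statistical software would normally be used instead.

```python
import math

def std_gamma_cdf_numeric(x, alpha, steps=20_000):
    """Midpoint-rule approximation of the incomplete gamma function (4.9)."""
    if x <= 0:
        return 0.0
    h = x / steps
    area = 0.0
    for i in range(steps):
        y = (i + 0.5) * h                      # midpoint of the i-th subinterval
        area += y ** (alpha - 1) * math.exp(-y)
    return area * h / math.gamma(alpha)

def gamma_cdf(x, alpha, beta):
    """Proposition: F(x; alpha, beta) = F(x / beta; alpha)."""
    return std_gamma_cdf_numeric(x / beta, alpha)

alpha, beta = 8, 15                            # parameter values from the example
p_60_to_120 = gamma_cdf(120, alpha, beta) - gamma_cdf(60, alpha, beta)
p_at_least_30 = 1.0 - gamma_cdf(30, alpha, beta)

print(round(p_60_to_120, 3), round(p_at_least_30, 3))
```

The two printed values should agree with .496 and .999 to the three decimal places used in the text.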

The Chi-Squared Distribution


The chi-squared distribution is important because it is the basis for a number of
procedures in statistical inference. The central role played by the chi-squared
distribution in inference springs from its relationship to normal distributions (see
Exercise 71). We’ll discuss this distribution in more detail in later chapters.

DEFINITION Let $\nu$ be a positive integer. Then a random variable X is said to have a
chi-squared distribution with parameter $\nu$ if the pdf of X is the gamma density
with $\alpha = \nu/2$ and $\beta = 2$. The pdf of a chi-squared rv is thus

$$f(x;\nu) = \begin{cases} \dfrac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\, x^{(\nu/2)-1}e^{-x/2} & x \ge 0 \\ 0 & x < 0 \end{cases} \qquad (4.10)$$

The parameter $\nu$ is called the number of degrees of freedom (df) of X. The
symbol $\chi^2$ is often used in place of “chi-squared.”

EXERCISES Section 4.4 (59–71)

59. Let X = the time between two successive arrivals at the drive-up window of a local bank.
If X has an exponential distribution with $\lambda = 1$ (which is identical to a standard
gamma distribution with $\alpha = 1$), compute the following:
a. The expected time between two successive arrivals
b. The standard deviation of the time between successive arrivals
c. $P(X \le 4)$
d. $P(2 \le X \le 5)$

60. Let X denote the distance (m) that an animal moves from its birth site to the first
territorial vacancy it encounters. Suppose that for banner-tailed kangaroo rats, X has an
exponential distribution with parameter $\lambda = .01386$ (as suggested in the article
“Competition and Dispersal from Multiple Nests,” Ecology, 1997: 873–883).
a. What is the probability that the distance is at most 100 m? At most 200 m? Between 100
and 200 m?
b. What is the probability that distance exceeds the mean distance by more than 2 standard
deviations?
c. What is the value of the median distance?

61. Data collected at Toronto Pearson International Airport suggests that an exponential
distribution with mean value 2.725 hours is a good model for rainfall duration (Urban
Stormwater Management Planning with Analytical Probabilistic Models, 2000, p. 69).
a. What is the probability that the duration of a particular rainfall event at this location
is at least 2 hours? At most 3 hours? Between 2 and 3 hours?
b. What is the probability that rainfall duration exceeds the mean value by more than
2 standard deviations? What is the probability that it is less than the mean value by more
than one standard deviation?

62. The article “Microwave Observations of Daily Antarctic Sea-Ice Edge Expansion and
Contribution Rates” (IEEE Geosci. and Remote Sensing Letters, 2006: 54–58) states that
“The distribution of the daily sea-ice advance/retreat from each sensor is similar and is
approximately double exponential.” The proposed double exponential distribution has density
function $f(x) = .5\lambda e^{-\lambda|x|}$ for $-\infty < x < \infty$. The standard deviation is given as
40.9 km.
a. What is the value of the parameter $\lambda$?
b. What is the probability that the extent of daily sea-ice change is within 1 standard
deviation of the mean value?

63. A consumer is trying to decide between two long-distance calling plans. The first one
charges a flat rate of 10¢ per minute, whereas the second charges a flat rate of 99¢ for
calls up to 20 minutes in duration and then 10¢ for each additional minute exceeding 20
(assume that calls lasting a noninteger number of minutes are charged proportionately to a
whole-minute’s charge). Suppose the consumer’s distribution of call duration is exponential
with parameter $\lambda$.
a. Explain intuitively how the choice of calling plan should depend on what the expected
call duration is.
b. Which plan is better if expected call duration is 10 minutes? 15 minutes? [Hint: Let
$h_1(x)$ denote the cost for the first plan when call duration is x minutes and let $h_2(x)$
be the cost function for the second plan. Give expressions for these two cost functions,
and then determine the expected cost for each plan.]

64. Evaluate the following:
a. $\Gamma(6)$
b. $\Gamma(5/2)$
c. $F(4; 5)$ (the incomplete gamma function) and $F(5; 4)$
d. $P(X \le 5)$ when X has a standard gamma distribution with $\alpha = 7$.
e. $P(3 < X < 8)$ when X has the distribution specified in (d).

65. Let X denote the data transfer time (ms) in a grid computing system (the time required
for data transfer between a “worker” computer and a “master” computer). Suppose that X has a
gamma distribution with mean value 37.5 ms and standard deviation 21.6 (suggested by the
article “Computation Time of Grid Computing with Data Transfer Times that Follow a Gamma
Distribution,” Proceedings of the First International Conference on Semantics, Knowledge,
and Grid, 2005).
a. What are the values of $\alpha$ and $\beta$?
b. What is the probability that data transfer time exceeds 50 ms?
c. What is the probability that data transfer time is between 50 and 75 ms?

66. The two-parameter gamma distribution can be generalized by introducing a third parameter
$\gamma$, called a threshold or location parameter: replace x in (4.8) by $x - \gamma$ and $x \ge 0$
by $x \ge \gamma$. This amounts to shifting the density curves in Figure 4.27 so that they begin
their ascent or descent at $\gamma$ rather than 0. The article “Bivariate Flood Frequency
Analysis with Historical Information Based on Copulas” (J. of Hydrologic Engr., 2013:
1018–1030) employs this distribution to model X = 3-day flood volume ($10^8\ \text{m}^3$). Suppose
that values of the parameters are $\alpha = 12$, $\beta = 7$, $\gamma = 40$ (very close to estimates
in the cited article based on past data).
a. What are the mean value and standard deviation of X?
b. What is the probability that flood volume is between 100 and 150?
c. What is the probability that flood volume exceeds its mean value by more than one
standard deviation?
d. What is the 95th percentile of the flood volume distribution?

67. Suppose that when a transistor of a certain type is subjected to an accelerated life
test, the lifetime X (in weeks) has a gamma distribution with mean 24 weeks and standard
deviation 12 weeks.
a. What is the probability that a transistor will last between 12 and 24 weeks?
b. What is the probability that a transistor will last at most 24 weeks? Is the median of
the lifetime distribution less than 24? Why or why not?
c. What is the 99th percentile of the lifetime distribution?
d. Suppose the test will actually be terminated after t weeks. What value of t is such that
only .5% of all transistors would still be operating at termination?

68. The special case of the gamma distribution in which $\alpha$ is a positive integer n is
called an Erlang distribution. If we replace $\beta$ by $1/\lambda$ in Expression (4.8), the
Erlang pdf is

$$f(x;\lambda,n) = \begin{cases} \dfrac{\lambda(\lambda x)^{n-1}e^{-\lambda x}}{(n-1)!} & x \ge 0 \\ 0 & x < 0 \end{cases}$$

It can be shown that if the times between successive events are independent, each with an
exponential distribution with parameter $\lambda$, then the total time X that elapses before
all of the next n events occur has pdf $f(x;\lambda,n)$.
a. What is the expected value of X? If the time (in minutes) between arrivals of successive
customers is exponentially distributed with $\lambda = .5$, how much time can be expected to
elapse before the tenth customer arrives?
b. If customer interarrival time is exponentially distributed with $\lambda = .5$, what is the
probability that the tenth customer (after the one who has just arrived) will arrive within
the next 30 min?
c. The event $\{X \le t\}$ occurs iff at least n events occur in the next t units of time. Use
the fact that the number of events occurring in an interval of length t has a Poisson
distribution with parameter $\lambda t$ to write an expression (involving Poisson
probabilities) for the Erlang cdf $F(t;\lambda,n) = P(X \le t)$.

69. A system consists of five identical components connected in series as shown:

[Figure: components 1, 2, 3, 4, 5 connected in series]

As soon as one component fails, the entire system will fail. Suppose each component has a
lifetime that is exponentially distributed with $\lambda = .01$ and that components fail
independently of one another. Define events $A_i$ = {ith component lasts at least t hours},
$i = 1, \ldots, 5$, so that the $A_i$s are independent events. Let X = the time at which the
system fails, that is, the shortest (minimum) lifetime among the five components.
a. The event $\{X \ge t\}$ is equivalent to what event involving $A_1, \ldots, A_5$?
b. Using the independence of the $A_i$’s, compute $P(X \ge t)$. Then obtain
$F(t) = P(X \le t)$ and the pdf of X. What type of distribution does X have?
c. Suppose there are n components, each having exponential lifetime with parameter $\lambda$.
What type of distribution does X have?

70. If X has an exponential distribution with parameter $\lambda$, derive a general expression
for the (100p)th percentile of the distribution. Then specialize to obtain the median.

71. a. The event $\{X^2 \le y\}$ is equivalent to what event involving X itself?
b. If X has a standard normal distribution, use part (a) to write the integral that equals
$P(X^2 \le y)$. Then differentiate this with respect to y to obtain the pdf of $X^2$ [the
square of a N(0, 1) variable]. Finally, show that $X^2$ has a chi-squared distribution with
$\nu = 1$ df [see (4.10)]. [Hint: Use the following identity.]

$$\frac{d}{dy}\left\{\int_{a(y)}^{b(y)} f(x)\,dx\right\} = f[b(y)]\cdot b'(y) - f[a(y)]\cdot a'(y)$$

4.5 Other Continuous Distributions

The normal, gamma (including exponential), and uniform families of distributions
provide a wide variety of probability models for continuous variables, but there are
many practical situations in which no member of these families fits a set of observed
data very well. Statisticians and other investigators have developed other families of
distributions that are often appropriate in practice.

The Weibull Distribution


The family of Weibull distributions was introduced by the Swedish physicist
Waloddi Weibull in 1939; his 1951 article “A Statistical Distribution Function
of Wide Applicability” (J. of Applied Mechanics, vol. 18: 293–297) discusses a
number of applications.

DEFINITION A random variable X is said to have a Weibull distribution with shape parameter
$\alpha$ and scale parameter $\beta$ ($\alpha > 0$, $\beta > 0$) if the pdf of X is

$$f(x;\alpha,\beta) = \begin{cases} \dfrac{\alpha}{\beta^{\alpha}}\, x^{\alpha-1}e^{-(x/\beta)^{\alpha}} & x \ge 0 \\ 0 & x < 0 \end{cases} \qquad (4.11)$$

In some situations, there are theoretical justifications for the appropriateness
of the Weibull distribution, but in many applications $f(x;\alpha,\beta)$ simply provides a
good fit to observed data for particular values of $\alpha$ and $\beta$. When $\alpha = 1$, the pdf
reduces to the exponential distribution (with $\lambda = 1/\beta$), so the exponential distribution
is a special case of both the gamma and Weibull distributions. However, there
are gamma distributions that are not Weibull distributions and vice versa, so one
family is not a subset of the other. Both $\alpha$ and $\beta$ can be varied to obtain a number of
different-looking density curves, as illustrated in Figure 4.28.

Figure 4.28 Weibull density curves

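Integrating to obtain $E(X)$ and $E(X^2)$ yields

$$\mu = \beta\,\Gamma\!\left(1 + \frac{1}{\alpha}\right) \qquad \sigma^2 = \beta^2\left\{\Gamma\!\left(1 + \frac{2}{\alpha}\right) - \left[\Gamma\!\left(1 + \frac{1}{\alpha}\right)\right]^2\right\}$$

The computation of $\mu$ and $\sigma^2$ thus necessitates using the gamma function.
The integration $\int_0^x f(y;\alpha,\beta)\,dy$ is easily carried out to obtain the cdf of X.

The cdf of a Weibull rv having parameters $\alpha$ and $\beta$ is

$$F(x;\alpha,\beta) = \begin{cases} 0 & x < 0 \\ 1 - e^{-(x/\beta)^{\alpha}} & x \ge 0 \end{cases} \qquad (4.12)$$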
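
The mean, variance, and cdf formulas translate directly into code. The Python sketch below uses only the standard library; the function names are hypothetical, and the check at the end uses the special case $\alpha = 1$, where the Weibull distribution reduces to an exponential distribution with $\lambda = 1/\beta$ (mean $\beta$, variance $\beta^2$). The value $\beta = 2$ is an illustrative assumption.

```python
import math

def weibull_cdf(x, alpha, beta):
    """Weibull cdf (4.12)."""
    if x < 0:
        return 0.0
    return 1.0 - math.exp(-((x / beta) ** alpha))

def weibull_mean(alpha, beta):
    """mu = beta * Gamma(1 + 1/alpha)."""
    return beta * math.gamma(1 + 1 / alpha)

def weibull_variance(alpha, beta):
    """sigma^2 = beta^2 * {Gamma(1 + 2/alpha) - [Gamma(1 + 1/alpha)]^2}."""
    return beta ** 2 * (math.gamma(1 + 2 / alpha) - math.gamma(1 + 1 / alpha) ** 2)

# Sanity check with alpha = 1: exponential distribution with lambda = 1/beta.
beta = 2.0
print(weibull_mean(1, beta), weibull_variance(1, beta))  # expected: beta and beta**2
print(weibull_cdf(beta, 1, beta))                        # expected: 1 - e**(-1)
```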

Example 4.25 In recent years the Weibull distribution has been used to model engine emissions of
various pollutants. Let X denote the amount of NOx emission (g/gal) from a randomly
selected four-stroke engine of a certain type, and suppose that X has a Weibull distribution
with $\alpha = 2$ and $\beta = 10$ (suggested by information in the article “Quantification
of Variability and Uncertainty in Lawn and Garden Equipment NOx and Total
Hydrocarbon Emission Factors,” J. of the Air and Waste Management Assoc.,
2002: 435–448). The corresponding density curve looks exactly like the one in Figure 4.28
for $\alpha = 2$, $\beta = 1$ except that now the values 50 and 100 replace 5 and 10 on
the horizontal axis. Then

$$P(X \le 10) = F(10; 2, 10) = 1 - e^{-(10/10)^2} = 1 - e^{-1} = .632$$

Similarly, $P(X \le 25) = .998$, so the distribution is almost entirely concentrated on
values between 0 and 25. The value c which separates the 5% of all engines having
the largest amounts of NOx emissions from the remaining 95% satisfies

$$.95 = 1 - e^{-(c/10)^2}$$

Isolating the exponential term on one side, taking logarithms, and solving the resulting
equation gives $c \approx 17.3$ as the 95th percentile of the emission distribution. ■
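
The algebra described in the last sentence (isolate the exponential term, take logarithms, solve for c) gives $c = \beta[-\ln(1 - p)]^{1/\alpha}$ for a general cumulative probability p. A minimal Python sketch of that inversion follows; the function name is hypothetical.

```python
import math

def weibull_percentile(p, alpha, beta):
    """Solve p = 1 - exp(-(c / beta) ** alpha) for c, where 0 < p < 1."""
    return beta * (-math.log(1.0 - p)) ** (1.0 / alpha)

c = weibull_percentile(0.95, 2, 10)   # alpha = 2, beta = 10 as in the example
print(round(c, 1))                    # 17.3

# Round trip: plugging c back into the cdf (4.12) should return .95.
print(1.0 - math.exp(-((c / 10) ** 2)))
```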

In practical situations, a Weibull model may be reasonable except that the
smallest possible X value may be some value $\gamma$ not assumed to be zero (this would
also apply to a gamma model; see Exercise 66). The quantity $\gamma$ can then be regarded
as a third (threshold or location) parameter of the distribution, which is what Weibull
did in his original work. For, say, $\gamma = 3$, all curves in Figure 4.28 would be shifted 3
units to the right. This is equivalent to saying that $X - \gamma$ has the pdf (4.11), so that
the cdf of X is obtained by replacing x in (4.12) by $x - \gamma$.

Example 4.26 An understanding of the volumetric properties of asphalt is important in designing
mixtures which will result in high-durability pavement. The article “Is a Normal
Distribution the Most Appropriate Statistical Distribution for Volumetric
Properties in Asphalt Mixtures?” (J. of Testing and Evaluation, Sept. 2009:
1–11) used the analysis of some sample data to recommend that for a particular
mixture, X = air void volume (%) be modeled with a three-parameter Weibull distribution.
Suppose the values of the parameters are $\gamma = 4$, $\alpha = 1.3$, and $\beta = .8$ (quite
close to estimates given in the article).
For $x > 4$, the cumulative distribution function is

$$F(x;\alpha,\beta,\gamma) = F(x; 1.3, .8, 4) = 1 - e^{-[(x-4)/.8]^{1.3}}$$

The probability that the air void volume of a specimen is between 5% and 6% is

$$P(5 \le X \le 6) = F(6; 1.3, .8, 4) - F(5; 1.3, .8, 4) = e^{-[(5-4)/.8]^{1.3}} - e^{-[(6-4)/.8]^{1.3}}$$
$$= .263 - .037 = .226$$

Figure 4.29 shows a graph from Minitab of the corresponding Weibull density function
in which the shaded area corresponds to the probability just calculated.

Figure 4.29 Weibull density curve with threshold = 4, shape = 1.3, scale = .8 ■
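
The three-parameter calculation above can be sketched in a few lines of Python by replacing x with $x - \gamma$ in the cdf (4.12). The function and argument names (in particular `threshold` for $\gamma$) are hypothetical choices for this illustration.

```python
import math

def weibull3_cdf(x, alpha, beta, threshold):
    """Three-parameter Weibull cdf: (4.12) with x replaced by x - threshold."""
    if x < threshold:
        return 0.0
    return 1.0 - math.exp(-(((x - threshold) / beta) ** alpha))

# Parameter values from the example: gamma = 4, alpha = 1.3, beta = .8.
p_5_to_6 = weibull3_cdf(6, 1.3, 0.8, 4) - weibull3_cdf(5, 1.3, 0.8, 4)
print(round(p_5_to_6, 3))
```

The result should agree with the value .226 obtained in the example, up to rounding.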

The Lognormal Distribution

DEFINITION A nonnegative rv X is said to have a lognormal distribution if the rv
$Y = \ln(X)$ has a normal distribution. The resulting pdf of a lognormal rv when
$\ln(X)$ is normally distributed with parameters $\mu$ and $\sigma$ is

$$f(x;\mu,\sigma) = \begin{cases} \dfrac{1}{\sqrt{2\pi}\,\sigma x}\, e^{-[\ln(x)-\mu]^2/(2\sigma^2)} & x \ge 0 \\ 0 & x < 0 \end{cases}$$

Be careful here; the parameters $\mu$ and $\sigma$ are not the mean and standard deviation of
X but of $\ln(X)$. The mean and variance of X can be shown to be

$$E(X) = e^{\mu + \sigma^2/2} \qquad V(X) = e^{2\mu + \sigma^2}\cdot\left(e^{\sigma^2} - 1\right)$$

In Chapter 5, we will present a theoretical justification for this distribution in connection
with the Central Limit Theorem. But as with other distributions, the lognormal
can be used as a model even in the absence of such justification. Figure 4.30
illustrates graphs of the lognormal pdf; although a normal curve is symmetric, a
lognormal curve has a positive skew.
Because $\ln(X)$ has a normal distribution, the cdf of X can be expressed in terms
of the cdf $\Phi(z)$ of a standard normal rv Z.

$$F(x;\mu,\sigma) = P(X \le x) = P[\ln(X) \le \ln(x)]$$
$$= P\!\left(Z \le \frac{\ln(x) - \mu}{\sigma}\right) = \Phi\!\left(\frac{\ln(x) - \mu}{\sigma}\right) \qquad x \ge 0 \qquad (4.13)$$

Figure 4.30 Lognormal density curves

Example 4.27 According to the article “Predictive Model for Pitting Corrosion in Buried Oil
and Gas Pipelines” (Corrosion, 2009: 332–342), the lognormal distribution has
been reported as the best option for describing the distribution of maximum pit depth
data from cast iron pipes in soil. The authors suggest that a lognormal distribution
with $\mu = .353$ and $\sigma = .754$ is appropriate for maximum pit depth (mm) of buried
pipelines. For this distribution, the mean value and variance of pit depth are

$$E(X) = e^{.353 + (.754)^2/2} = e^{.6373} = 1.891$$
$$V(X) = e^{2(.353) + (.754)^2}\cdot\left(e^{(.754)^2} - 1\right) = (3.57697)(.765645) = 2.7387$$

The probability that maximum pit depth is between 1 and 2 mm is

$$P(1 \le X \le 2) = P(\ln(1) \le \ln(X) \le \ln(2)) = P(0 \le \ln(X) \le .693)$$
$$= P\!\left(\frac{0 - .353}{.754} \le Z \le \frac{.693 - .353}{.754}\right) = \Phi(.45) - \Phi(-.47) = .354$$

This probability is illustrated in Figure 4.31 (from Minitab).

Figure 4.31 Lognormal density curve with $\mu = .353$ and $\sigma = .754$

What value c is such that only 1% of all specimens have a maximum pit depth
exceeding c? The desired value satisfies

$$.99 = P(X \le c) = P\!\left(Z \le \frac{\ln(c) - .353}{.754}\right)$$

The z critical value 2.33 captures an upper-tail area of .01 ($z_{.01} = 2.33$), and thus a
cumulative area of .99. This implies that

$$\frac{\ln(c) - .353}{.754} = 2.33$$

from which $\ln(c) = 2.1098$ and $c = 8.247$. Thus 8.247 is the 99th percentile of the
maximum pit depth distribution. ■
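
The calculations in this example can be sketched in Python using only the standard library. The standard normal cdf is written in terms of the error function, $\Phi(z) = \tfrac{1}{2}[1 + \operatorname{erf}(z/\sqrt{2})]$, a standard identity that the text does not use (it relies on the normal table instead); the function names are hypothetical.

```python
import math

def std_normal_cdf(z):
    """Phi(z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def lognormal_cdf(x, mu, sigma):
    """Lognormal cdf (4.13)."""
    if x <= 0:
        return 0.0
    return std_normal_cdf((math.log(x) - mu) / sigma)

mu, sigma = 0.353, 0.754                      # parameter values from the example

mean = math.exp(mu + sigma ** 2 / 2)
variance = math.exp(2 * mu + sigma ** 2) * (math.exp(sigma ** 2) - 1)
p_1_to_2 = lognormal_cdf(2, mu, sigma) - lognormal_cdf(1, mu, sigma)
c_99 = math.exp(mu + 2.33 * sigma)            # 2.33 is the z critical value used in the text

print(round(mean, 3), round(variance, 4), round(p_1_to_2, 3), round(c_99, 3))
```

Because the table-based calculation rounds the standardized limits to two decimal places, the probability computed here may differ from the text's value in the fourth decimal place.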

The Beta Distribution


All families of continuous distributions discussed so far except for the uniform dis-
tribution have positive density over an infinite interval (though typically the density
function decreases rapidly to zero beyond a few standard deviations from the mean).
The beta distribution provides positive density only for X in an interval of finite length.

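DEFINITION A random variable X is said to have a beta distribution with parameters $\alpha$, $\beta$
(both positive), A, and B if the pdf of X is

$$f(x;\alpha,\beta,A,B) = \begin{cases} \dfrac{1}{B-A}\cdot\dfrac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\cdot\Gamma(\beta)}\left(\dfrac{x-A}{B-A}\right)^{\alpha-1}\left(\dfrac{B-x}{B-A}\right)^{\beta-1} & A \le x \le B \\ 0 & \text{otherwise} \end{cases}$$

The case $A = 0$, $B = 1$ gives the standard beta distribution.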

Figure 4.32 illustrates several standard beta pdf’s. Graphs of the general pdf are
similar, except they are shifted and then stretched or compressed to fit over [A, B].
Unless $\alpha$ and $\beta$ are integers, integration of the pdf to calculate probabilities is difficult.
Either a table of the incomplete beta function or appropriate software should
be used. The mean and variance of X are

$$\mu = A + (B - A)\cdot\frac{\alpha}{\alpha+\beta} \qquad \sigma^2 = \frac{(B-A)^2\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$$

Figure 4.32 Standard beta density curves

Example 4.28 Project managers often use a method labeled PERT (for program evaluation and
review technique) to coordinate the various activities making up a large project.
(One successful application was in the construction of the Apollo spacecraft.) A
standard assumption in PERT analysis is that the time necessary to complete any
particular activity once it has been started has a beta distribution with A = the optimistic
time (if everything goes well) and B = the pessimistic time (if everything
goes badly). Suppose that in constructing a single-family house, the time X (in days)
necessary for laying the foundation has a beta distribution with $A = 2$, $B = 5$, $\alpha = 2$,
and $\beta = 3$. Then $\alpha/(\alpha + \beta) = .4$, so $E(X) = 2 + (3)(.4) = 3.2$. For these values of
$\alpha$ and $\beta$, the pdf of X is a simple polynomial function. The probability that it takes
at most 3 days to lay the foundation is

$$P(X \le 3) = \int_2^3 \frac{1}{3}\cdot\frac{4!}{1!\,2!}\left(\frac{x-2}{3}\right)\left(\frac{5-x}{3}\right)^2 dx$$
$$= \frac{4}{27}\int_2^3 (x-2)(5-x)^2\,dx = \frac{4}{27}\cdot\frac{11}{4} = \frac{11}{27} = .407 \qquad ■$$
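
The integral in this example can be checked numerically. The Python sketch below codes the general beta pdf and integrates it with composite Simpson's rule, which is exact (up to floating-point rounding) for the cubic polynomial that arises when $\alpha = 2$ and $\beta = 3$. The helper names and the choice of Simpson's rule are assumptions made for illustration; for noninteger parameters, statistical software or a table of the incomplete beta function would be the usual route.

```python
import math

def beta_pdf(x, alpha, beta, A=0.0, B=1.0):
    """General beta density on [A, B]; A = 0, B = 1 gives the standard beta pdf."""
    if x < A or x > B:
        return 0.0
    const = math.gamma(alpha + beta) / (math.gamma(alpha) * math.gamma(beta))
    u = (x - A) / (B - A)                      # rescale x to the unit interval
    return const * u ** (alpha - 1) * (1 - u) ** (beta - 1) / (B - A)

def simpson(g, lower, upper, n=100):
    """Composite Simpson's rule with an even number n of subintervals."""
    h = (upper - lower) / n
    total = g(lower) + g(upper)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lower + i * h)
    return total * h / 3

A, B, alpha, beta = 2, 5, 2, 3                 # parameter values from the example
p_at_most_3 = simpson(lambda x: beta_pdf(x, alpha, beta, A, B), A, 3)
mean = A + (B - A) * alpha / (alpha + beta)

print(round(p_at_most_3, 3), mean)             # 0.407 3.2
```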

The standard beta distribution is commonly used to model variation in the
proportion or percentage of a quantity occurring in different samples, such as the
proportion of a 24-hour day that an individual is asleep or the proportion of a certain
element in a chemical compound.

EXERCISES Section 4.5 (72–86)

72. The lifetime X (in hundreds of hours) of a certain type of vacuum tube has a Weibull
distribution with parameters $\alpha = 2$ and $\beta = 3$. Compute the following:
a. $E(X)$ and $V(X)$
b. $P(X \le 6)$
c. $P(1.5 \le X \le 6)$
(This Weibull distribution is suggested as a model for time in service in “On the Assessment
of Equipment Reliability: Trading Data Collection Costs for Precision,” J. of Engr. Manuf.,
1991: 105–109.)

73. The authors of the article “A Probabilistic Insulation Life Model for Combined
Thermal-Electrical Stresses” (IEEE Trans. on Elect. Insulation, 1985: 519–522) state that
“the Weibull distribution is widely used in statistical problems relating to aging of solid
insulating materials subjected to aging and stress.” They propose the use of the
distribution as a model for time (in hours) to failure of solid insulating specimens
subjected to AC voltage. The values of the parameters depend on the voltage and temperature;
suppose $\alpha = 2.5$ and $\beta = 200$ (values suggested by data in the article).
a. What is the probability that a specimen’s lifetime is at most 250? Less than 250? More
than 300?
b. What is the probability that a specimen’s lifetime is between 100 and 250?
c. What value is such that exactly 50% of all specimens have lifetimes exceeding that value?

74. Once an individual has been infected with a certain disease, let X represent the time
(days) that elapses before the individual becomes infectious. The article “The Probability
of Containment for Multitype Branching Process Models for Emerging Epidemics” (J. of Applied
Probability, 2011: 173–188) proposes a Weibull distribution with $\alpha = 2.2$, $\beta = 1.1$,
and $\gamma = .5$ (refer to Example 4.26).
a. Calculate $P(1 < X < 2)$.
b. Calculate $P(X > 1.5)$.
c. What is the 90th percentile of the distribution?
d. What are the mean and standard deviation of X?

75. Let X have a Weibull distribution with the pdf from Expression (4.11). Verify that
$\mu = \beta\,\Gamma(1 + 1/\alpha)$. [Hint: In the integral for $E(X)$, make the change of variable
$y = (x/\beta)^{\alpha}$, so that $x = \beta y^{1/\alpha}$.]

76. The article “The Statistics of Phytotoxic Air Pollutants” (J. of Royal Stat. Soc., 1989:
183–198) suggests the lognormal distribution as a model for SO2 concentration above a
certain forest. Suppose the parameter values are $\mu = 1.9$ and $\sigma = .9$.
a. What are the mean value and standard deviation of concentration?
b. What is the probability that concentration is at most 10? Between 5 and 10?

77. The authors of the article from which the data in Exercise 1.27 was extracted suggested
that a reasonable probability model for drill lifetime was a lognormal distribution with
$\mu = 4.5$ and $\sigma = .8$.
a. What are the mean value and standard deviation of lifetime?
b. What is the probability that lifetime is at most 100?
c. What is the probability that lifetime is at least 200? Greater than 200?

78. The article “On Assessing the Accuracy of Offshore Wind Turbine Reliability-Based Design
Loads from the Environmental Contour Method” (Intl. J. of Offshore and Polar Engr., 2005:
132–140) proposes the Weibull distribution with $\alpha = 1.817$ and $\beta = .863$ as a model
for 1-hour significant wave height (m) at a certain site.
a. What is the probability that wave height is at most .5 m?
b. What is the probability that wave height exceeds its mean value by more than one standard
deviation?
c. What is the median of the wave-height distribution?
d. For $0 < p < 1$, give a general expression for the 100pth percentile of the wave-height
distribution.

79. Nonpoint source loads are chemical masses that travel to the main stem of a river and
its tributaries in flows that are distributed over relatively long stream reaches, in
contrast to those that enter at well-defined and regulated points. The article “Assessing
Uncertainty in Mass Balance Calculation of River Nonpoint Source Loads” (J. of Envir. Engr.,
2008: 247–258) suggested that for a certain time period and location, X = nonpoint source
load of total dissolved solids could be modeled with a lognormal distribution having mean
value 10,281 kg/day/km and a coefficient of variation CV = .40 (CV = $\sigma_X/\mu_X$).
a. What are the mean value and standard deviation of $\ln(X)$?
b. What is the probability that X is at most 15,000 kg/day/km?
c. What is the probability that X exceeds its mean value, and why is this probability not .5?
d. Is 17,000 the 95th percentile of the distribution?

80. a. Use Equation (4.13) to write a formula for the median $\tilde{\mu}$ of the lognormal
distribution. What is the median for the load distribution of Exercise 79?
b. Recalling that $z_{\alpha}$ is our notation for the $100(1 - \alpha)$ percentile of the standard
normal distribution, write an expression for the $100(1 - \alpha)$ percentile of the lognormal
distribution. In Exercise 79, what value will load exceed only 1% of the time?

81. Sales delay is the elapsed time between the manufacture of a product and its sale.
According to the article “Warranty Claims Data Analysis Considering Sales Delay” (Quality and
Reliability Engr. Intl., 2013: 113–123), it is quite common for investigators to model sales
delay using a lognormal distribution. For a particular product, the cited article proposes
this distribution with parameter values $\mu = 2.05$ and $\sigma^2 = .06$ (here the unit for delay
is months).
a. What are the variance and standard deviation of delay time?
b. What is the probability that delay time exceeds 12 months?
c. What is the probability that delay time is within one standard deviation of its mean
value?
d. What is the median of the delay time distribution?
e. What is the 99th percentile of the delay time distribution?
f. Among 10 randomly selected such items, how many would you expect to have a delay time
exceeding 8 months?

82. As in the case of the Weibull and Gamma distributions, the lognormal distribution can be
modified by the introduction of a third parameter $\gamma$ such that the pdf is shifted to be
positive only for $x > \gamma$. The article cited in Exercise 4.39 suggested that a shifted
lognormal distribution with shift (i.e., threshold) = 1.0, mean value = 2.16, and standard
deviation = 1.03 would be an appropriate model for the rv X = maximum-to-average depth ratio
of a corrosion defect in pressurized steel.
a. What are the values of $\mu$ and $\sigma$ for the proposed distribution?
b. What is the probability that depth ratio exceeds 2?
c. What is the median of the depth ratio distribution?
d. What is the 99th percentile of the depth ratio distribution?

83. What condition on $\alpha$ and $\beta$ is necessary for the standard beta pdf to be symmetric?

84. Suppose the proportion X of surface area in a randomly selected quadrat that is covered
by a certain plant has a standard beta distribution with $\alpha = 5$ and $\beta = 2$.
a. Compute $E(X)$ and $V(X)$.
b. Compute $P(X \le .2)$.
c. Compute $P(.2 \le X \le .4)$.
d. What is the expected proportion of the sampling region not covered by the plant?

85. Let X have a standard beta density with parameters $\alpha$ and $\beta$.
a. Verify the formula for $E(X)$ given in the section.
b. Compute $E[(1 - X)^m]$. If X represents the proportion of a substance consisting of a
particular ingredient, what is the expected proportion that does not consist of this
ingredient?

86. Stress is applied to a 20-in. steel bar that is clamped in a fixed position at each end.
Let Y = the distance from the left end at which the bar snaps. Suppose Y/20 has a standard
beta distribution with $E(Y) = 10$ and $V(Y) = 100/7$.
a. What are the parameters of the relevant standard beta distribution?
b. Compute $P(8 \le Y \le 12)$.
c. Compute the probability that the bar snaps more than
delay time?


4.6 Probability Plots

An investigator will often have obtained a numerical sample x_1, x_2, …, x_n and wish
to know whether it is plausible that it came from a population distribution of some
particular type (e.g., from a normal distribution). For one thing, many formal pro-
cedures from statistical inference are based on the assumption that the population
distribution is of a specified type. The use of such a procedure is inappropriate
if the actual underlying probability distribution differs greatly from the assumed
type. For example, the article “Toothpaste Detergents: A Potential Source of
Oral Soft Tissue Damage” (Intl. J. of Dental Hygiene, 2008: 193–198) contains
the following statement: “Because the sample number for each experiment (rep-
lication) was limited to three wells per treatment type, the data were assumed to
be normally distributed.” As justification for this leap of faith, the authors wrote
that “Descriptive statistics showed standard deviations that suggested a normal
distribution to be highly likely.” Note: This argument is not very persuasive.
Additionally, understanding the underlying distribution can sometimes give
insight into the physical mechanisms involved in generating the data. An effective
way to check a distributional assumption is to construct what is called a probability
plot. The essence of such a plot is that if the distribution on which the plot is based is
correct, the points in the plot should fall close to a straight line. If the actual distribu-
tion is quite different from the one used to construct the plot, the points will likely
depart substantially from a linear pattern.

Sample Percentiles
The details involved in constructing probability plots differ a bit from source to source.
The basis for our construction is a comparison between percentiles of the sample data
and the corresponding percentiles of the distribution under consideration. Recall that
the (100p)th percentile of a continuous distribution with cdf F(·) is the number η(p)
that satisfies F(η(p)) = p. That is, η(p) is the number on the measurement scale such
that the area under the density curve to the left of η(p) is p. Thus the 50th percentile
η(.5) satisfies F(η(.5)) = .5, and the 90th percentile satisfies F(η(.9)) = .9. Consider
as an example the standard normal distribution, for which we have denoted the cdf
by Φ(·). From Appendix Table A.3, we find the 20th percentile by locating the row
and column in which .2000 (or a number as close to it as possible) appears inside the
table. Since .2005 appears at the intersection of the −.8 row and the .04 column, the
20th percentile is approximately −.84. Similarly, the 25th percentile of the standard
normal distribution is (using linear interpolation) approximately −.675.
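
The defining equation F(η(p)) = p can also be solved numerically rather than by table lookup. The following is a minimal sketch, not a prescribed method: the helper name `percentile` and the search interval are illustrative assumptions, and the standard normal cdf is taken from Python's standard library.

```python
from statistics import NormalDist

def percentile(cdf, p, lo, hi, tol=1e-10):
    """Solve cdf(eta) = p for eta by bisection, assuming cdf is continuous
    and increasing on [lo, hi] with cdf(lo) < p < cdf(hi)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < p:
            lo = mid   # eta(p) lies to the right of mid
        else:
            hi = mid   # eta(p) lies at or to the left of mid
    return (lo + hi) / 2

phi = NormalDist().cdf  # standard normal cdf

eta_20 = percentile(phi, 0.20, -10, 10)
eta_25 = percentile(phi, 0.25, -10, 10)
print(round(eta_20, 3), round(eta_25, 3))
```

Both values should agree with the table-based approximations −.84 and −.675 to within rounding. The same function works for any continuous, increasing cdf once a bracketing interval is supplied.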
Roughly speaking, sample percentiles are defined in the same way that percentiles
of a population distribution are defined. The 50th sample percentile should separate
the smallest 50% of the sample from the largest 50%, the 90th percentile should
be such that 90% of the sample lies below that value and 10% lies above, and so on.
Unfortunately, we run into problems when we actually try to compute the sample
percentiles for a particular sample of n observations. If, for example, n = 10, we can split
off 20% of these values or 30% of the data, but there is no value that will split off exactly
23% of these ten observations. To proceed further, we need an operational definition
of sample percentiles (this is one place where different people do slightly different
things). Recall that when n is odd, the sample median or 50th sample percentile is the
middle value in the ordered list, for example, the sixth-largest value when n = 11. This
amounts to regarding the middle observation as being half in the lower half of the data
and half in the upper half. Similarly, suppose n = 10. Then if we call the third-smallest
value the 25th percentile, we are regarding that value as being half in the lower group
(consisting of the two smallest observations) and half in the upper group (the seven
largest observations). This leads to the following general definition of sample percentiles.

DEFINITION Order the n sample observations from smallest to largest. Then the ith smallest
observation in the list is taken to be the [100(i − .5)/n]th sample percentile.

Once the percentage values 100(i − .5)/n (i = 1, 2, …, n) have been calculated,
sample percentiles corresponding to intermediate percentages can be obtained
by linear interpolation. For example, if n = 10, the percentages corresponding to
the ordered sample observations are 100(1 − .5)/10 = 5%, 100(2 − .5)/10 = 15%,
25%, …, and 100(10 − .5)/10 = 95%. The 10th percentile is then halfway between
the 5th percentile (smallest sample observation) and the 15th percentile
(second-smallest observation). For our purposes, such interpolation is not necessary because
a probability plot will be based only on the percentages 100(i − .5)/n corresponding
to the n sample observations.
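
The definition and the interpolation rule can be sketched in a few lines of Python. The function names and the ten data values below are hypothetical, chosen only so that the arithmetic is easy to follow; they do not come from the text.

```python
def sample_percentages(n):
    """Percentages 100(i - .5)/n attached to the n ordered observations."""
    return [100 * (i - 0.5) / n for i in range(1, n + 1)]

def sample_percentile(data, pct):
    """Sample percentile by linear interpolation between ordered observations.
    pct must lie between the smallest and largest attached percentages."""
    xs = sorted(data)
    pcts = sample_percentages(len(xs))
    if not pcts[0] <= pct <= pcts[-1]:
        raise ValueError("percentage outside the range covered by the sample")
    for k in range(len(xs) - 1):
        if pcts[k] <= pct <= pcts[k + 1]:
            frac = (pct - pcts[k]) / (pcts[k + 1] - pcts[k])
            return xs[k] + frac * (xs[k + 1] - xs[k])
    return xs[-1]  # only reached when the sample has a single observation

# Hypothetical sample of n = 10 values (illustration only).
data = [12, 15, 11, 19, 14, 18, 10, 16, 13, 17]
print(sample_percentages(len(data)))   # 5%, 15%, ..., 95%
print(sample_percentile(data, 10))     # halfway between the two smallest values
```

With these values the two smallest observations are 10 and 11, so the 10th sample percentile is 10.5, halfway between them as described above.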

A Probability Plot
Suppose now that for percentages 100(i − .5)/n (i = 1, …, n) the percentiles are
determined for a specified population distribution whose plausibility is being
investigated. If the sample was actually selected from the specified distribution,
the sample percentiles (ordered sample observations) should be reasonably close
to the corresponding population distribution percentiles. That is, for i = 1, 2, …, n
there should be reasonable agreement between the ith smallest sample observation
and the [100(i − .5)/n]th percentile for the specified distribution. Let’s consider the
(population percentile, sample percentile) pairs, that is, the pairs

([100(i − .5)/n]th percentile of the distribution, ith smallest sample observation)

for i = 1, …, n. Each such pair can be plotted as a point on a two-dimensional
coordinate system. If the sample percentiles are close to the corresponding population
distribution percentiles, the first number in each pair will be roughly equal to
the second number. The plotted points will then fall close to a 45° line. Substantial
deviations of the plotted points from a 45° line cast doubt on the assumption that the
distribution under consideration is the correct one.

Example 4.29 The value of a certain physical constant is known to an experimenter. The
experimenter makes n = 10 independent measurements of this value using a particular
measurement device and records the resulting measurement errors (error = observed
value − true value). These observations appear in the accompanying table.

Percentage              5       15       25       35       45
z percentile         −1.645   −1.037    −.675    −.385    −.126
Sample observation   −1.91    −1.25     −.75     −.53      .20

Percentage             55       65       75       85       95
z percentile           .126     .385     .675    1.037    1.645
Sample observation     .35      .72      .87     1.40     1.56


Is it plausible that the random variable measurement error has a standard normal
distribution? The needed standard normal (z) percentiles are also displayed in the table.
Thus the points in the probability plot are (−1.645, −1.91), (−1.037, −1.25), …,
and (1.645, 1.56). Figure 4.33 shows the resulting plot. Although the points deviate
a bit from the 45° line, the predominant impression is that this line fits the points
very well. The plot suggests that the standard normal distribution is a reasonable
probability model for measurement error.
Figure 4.33 Plot of pairs (z percentile, observed value) for the data of Example 4.29
(horizontal axis: z percentile; vertical axis: x; 45° line shown)
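
The plotted pairs for this sample can be reproduced without a table. The sketch below assumes Python's standard-library `NormalDist` for the z percentiles; the variable names are illustrative, and the numerical gap from the 45° line is offered only as a companion to the visual judgment, not as a formal criterion.

```python
from statistics import NormalDist

# Measurement errors from Example 4.29 (already ordered).
errors = [-1.91, -1.25, -0.75, -0.53, 0.20, 0.35, 0.72, 0.87, 1.40, 1.56]

n = len(errors)
std_normal = NormalDist()  # mean 0, standard deviation 1
# z percentiles for the percentages 100(i - .5)/n, i = 1, ..., n
z_percentiles = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

pairs = list(zip(z_percentiles, sorted(errors)))
for z, x in pairs:
    # x - z is the vertical distance of the point from the 45-degree line
    print(f"{z:7.3f}  {x:6.2f}  {x - z:+.3f}")

largest_gap = max(abs(x - z) for z, x in pairs)
print(f"largest vertical gap from the 45-degree line: {largest_gap:.3f}")
```

The computed z percentiles may differ from the tabled ones in the third decimal place because the table values were obtained by interpolation.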

Figure 4.34 shows a plot of pairs (z percentile, observation) for a second sample
of ten observations. The 45° line gives a good fit to the middle part of the sample but
not to the extremes. The plot has a well-defined S-shaped appearance. The two
smallest sample observations are considerably larger than the corresponding z percentiles
(the points on the far left of the plot are well above the 45° line). Similarly, the two
largest sample observations are much smaller than the associated z percentiles. This
plot indicates that the standard normal distribution would not be a plausible choice
for the probability model that gave rise to these observed measurement errors.

Figure 4.34 Plot of pairs (z percentile, observed value) for the scenario of Example
4.29: second sample (horizontal axis: z percentile; vertical axis: x; 45° line and
S-shaped curve shown) ■


An investigator is typically not interested in knowing just whether a particular
probability distribution, such as the standard normal distribution (normal with
μ = 0 and σ = 1) or the exponential distribution with λ = .1, is a plausible model
for the population distribution from which the sample was selected. Instead, the
issue is whether some member of a family of probability distributions specifies
a plausible model: the family of normal distributions, the family of exponential
distributions, the family of Weibull distributions, and so on. The values of the
parameters of a distribution are usually not specified at the outset. If the family
of Weibull distributions is under consideration as a model for lifetime data, are
there any values of the parameters α and β for which the corresponding Weibull
distribution gives a good fit to the data? Fortunately, it is frequently the case that
just one probability plot will suffice for assessing the plausibility of an entire
family. If the plot deviates substantially from a straight line, no member of the
family is plausible. When the plot is quite straight, further work is necessary to
estimate values of the parameters that yield the most reasonable distribution of
the specified type.
Let’s focus on a plot for checking normality. Such a plot is useful in applied
work because many formal statistical procedures give accurate inferences only when
the population distribution is at least approximately normal. These procedures should
generally not be used if the normal probability plot shows a very pronounced depar-
ture from linearity. The key to constructing an omnibus normal probability plot is the
relationship between standard normal (z) percentiles and those for any other normal
distribution:

normal (μ, σ) percentile = μ + σ · (corresponding z percentile)

Consider first the case μ = 0. If each observation is exactly equal to the
corresponding normal percentile for some value of σ, the pairs (σ · [z percentile],
observation) fall on a 45° line, which has slope 1. This then implies that the
(z percentile, observation) pairs fall on a line passing through (0, 0) (i.e., one with
y-intercept 0) but having slope σ rather than 1. The effect of a nonzero value of
μ is simply to change the y-intercept from 0 to μ.

A plot of the n pairs

([100(i − .5)/n]th z percentile, ith smallest observation)

is called a normal probability plot. If the sample observations are in fact
drawn from a normal distribution with mean value μ and standard deviation
σ, the points should fall close to a straight line with slope σ and intercept μ.
Thus a plot for which the points fall close to some straight line suggests that
the assumption of a normal population distribution is plausible.

Example 4.30 There has been recent increased use of augered cast-in-place (ACIP) and drilled
displacement (DD) piles in the foundations of buildings and transportation structures.
In the article “Design Methodology for Axially Loaded Auger Cast-in-Place and
Drilled Displacement Piles” (J. of Geotech. Geoenviron. Engr., 2012: 1431–1441),
researchers propose a design methodology to enhance the efficiency of these piles.
Here are length-diameter ratio measurements based on 17 static pile load tests on
ACIP and DD piles from various construction sites. The values of p for which z
percentiles are needed are (1 − .5)/17 = .029, (2 − .5)/17 = .088, …, and .971.

x(i):          30.86  37.68  39.04  42.78  42.89  42.89  45.05  47.08  47.08
z percentile:  −1.89  −1.35  −1.05  −0.82  −0.63  −0.46  −0.30  −0.15   0.00

x(i):          48.79  48.79  52.56  52.56  54.80  55.17  56.31  59.94
z percentile:   0.15   0.30   0.46   0.63   0.82   1.05   1.35   1.89

Figure 4.35 shows the corresponding normal probability plot generated by the R
software package. The pattern in the plot is quite straight, indicating it is plausible
that the population distribution of length-diameter ratio is normal.

Figure 4.35 Normal probability plot from R for the Length-Diameter Ratio data ■
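
Because the points of a normal probability plot should fall near a line with slope σ and intercept μ, a line fitted through the plot gives rough values for those two quantities. The sketch below uses an ordinary least-squares fit, which is an assumption made here for illustration and not a method described in the text; formal parameter estimation is a separate topic.

```python
from statistics import NormalDist, mean

# Length-diameter ratios from Example 4.30 (ordered).
ratios = [30.86, 37.68, 39.04, 42.78, 42.89, 42.89, 45.05, 47.08, 47.08,
          48.79, 48.79, 52.56, 52.56, 54.80, 55.17, 56.31, 59.94]

n = len(ratios)
z = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
x = sorted(ratios)

# Least-squares line x = intercept + slope * z through the plotted points
# (illustrative choice; any reasonable line through the points would do).
z_bar, x_bar = mean(z), mean(x)
slope = (sum((zi - z_bar) * (xi - x_bar) for zi, xi in zip(z, x))
         / sum((zi - z_bar) ** 2 for zi in z))
intercept = x_bar - slope * z_bar
print(f"slope (rough sigma): {slope:.2f}, intercept (rough mu): {intercept:.2f}")
```

Since the z percentiles are symmetric about 0, the fitted intercept coincides with the sample mean of the ratios.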

There is an alternative version of a normal probability plot in which the z percentile
axis is replaced by a nonlinear percentage axis. The scaling on this axis is constructed
so that plotted points should again fall close to a line when the sampled distribution is
normal. Figure 4.36 shows such a plot from Minitab for the ratio data of Example 4.30.
(The last two numbers in the small box on the right will be explained in Chapter 14.)

Figure 4.36 Normal probability plot of the ratio data from Minitab (horizontal axis:
Ratio; vertical axis: Percent; box on the right: Mean 47.31, StDev 7.560, N 17,
RJ 0.990, P-Value > 0.100)


A nonnormal population distribution can often be placed in one of the following
three categories:
1. It is symmetric and has “lighter tails” than does a normal distribution; that is, the
density curve declines more rapidly out in the tails than does a normal curve.
2. It is symmetric and heavy-tailed compared to a normal distribution.
3. It is skewed.
A uniform distribution is light-tailed, since its density function drops to zero outside
a finite interval. The Cauchy density function f(x) = 1/[πβ(1 + ((x − θ)/β)²)] for
−∞ < x < ∞ is heavy-tailed, since 1/(1 + x²) declines much less rapidly than does
e^(−x²/2). Lognormal and Weibull distributions are among those that are skewed. When
the points in a normal probability plot do not adhere to a straight line, the pattern
will frequently suggest that the population distribution is in a particular one of these
three categories.
The largest and smallest observations in a sample from a light-tailed distribution
are usually not as extreme as would be expected from a normal random sample.
Visualize a straight line drawn through the middle part of the plot; points on the far
right tend to be below the line (observed value < z percentile), whereas points on the
left end of the plot tend to fall above the straight line (observed value > z percentile).
The result is an S-shaped pattern of the type pictured in Figure 4.34.
A sample from a heavy-tailed distribution also tends to produce an S-shaped
plot. However, in contrast to the light-tailed case, the left end of the plot curves
downward (observed < z percentile), as shown in Figure 4.37(a). If the underlying
distribution is positively skewed (a short left tail and a long right tail), the smallest
sample observations will be larger than expected from a normal sample and so will
the largest observations. In this case, points on both ends of the plot will fall above
a straight line through the middle part, yielding a curved pattern, as illustrated in
Figure 4.37(b). A sample from a lognormal distribution will usually produce such
a pattern. A plot of (z percentile, ln(x)) pairs should then resemble a straight line.

Figure 4.37 Probability plots that suggest a nonnormal distribution (each panel: x versus
z percentile): (a) a plot consistent with a heavy-tailed distribution; (b) a plot
consistent with a positively skewed distribution
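
The three patterns can be checked on idealized data in which each "observation" is replaced by the exact percentile of a nonnormal distribution, so that no sampling variability is present. This is a sketch under stated assumptions: the line through the middle of the plot is taken to be the line through the two quartile points, and the helper `end_deviations` and the three example distributions (uniform on (0, 1), standard Cauchy, lognormal with μ = 0 and σ = 1) are illustrative choices.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def end_deviations(quantile, n=20):
    """Return (left, right): how far the smallest and largest idealized
    observations lie above (+) or below (-) the straight line through the
    quartile points of the normal probability plot."""
    p_small, p_large = 0.5 / n, (n - 0.5) / n
    z_q1, z_q3 = STD_NORMAL.inv_cdf(0.25), STD_NORMAL.inv_cdf(0.75)
    x_q1, x_q3 = quantile(0.25), quantile(0.75)
    slope = (x_q3 - x_q1) / (z_q3 - z_q1)

    def line(z):
        return x_q1 + slope * (z - z_q1)

    z_small = STD_NORMAL.inv_cdf(p_small)
    z_large = STD_NORMAL.inv_cdf(p_large)
    return (quantile(p_small) - line(z_small),
            quantile(p_large) - line(z_large))

uniform = lambda p: p                                   # light-tailed, symmetric
cauchy = lambda p: math.tan(math.pi * (p - 0.5))        # heavy-tailed, symmetric
lognormal = lambda p: math.exp(STD_NORMAL.inv_cdf(p))   # positively skewed

for name, q in [("uniform", uniform), ("Cauchy", cauchy), ("lognormal", lognormal)]:
    left, right = end_deviations(q)
    print(f"{name:10s} left end: {left:+.2f}   right end: {right:+.2f}")
```

The signs match the descriptions above: above on the left and below on the right for the light-tailed case, the reverse for the heavy-tailed case, and above at both ends for the positively skewed case.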

Even when the population distribution is normal, the sample percentiles will
not coincide exactly with the theoretical percentiles because of sampling variability.
How much can the points in the probability plot deviate from a straight-line pattern
before the assumption of population normality is no longer plausible? This is not an
easy question to answer. Generally speaking, a small sample from a normal distribution
is more likely to yield a plot with a nonlinear pattern than is a large sample. The
book Fitting Equations to Data (see the Chapter 13 bibliography) presents the results
of a simulation study in which numerous samples of different sizes were selected
from normal distributions. The authors concluded that there is typically greater
variation in the appearance of the probability plot for sample sizes smaller than 30, and
only for much larger sample sizes does a linear pattern generally predominate. When
a plot is based on a small sample size, only a very substantial departure from linearity
should be taken as conclusive evidence of nonnormality. A similar comment applies
to probability plots for checking the plausibility of other types of distributions.
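
The effect of sample size can be explored with a small simulation. The sketch below is not the study from the cited book: it draws normal samples with Python's `random` module and summarizes each plot by the correlation between z percentiles and ordered observations, a summary adopted here purely as a convenient numeric stand-in for "straightness". The seed, the number of replications, and the function names are arbitrary illustrative choices.

```python
import random
from statistics import NormalDist, mean

def plot_correlation(sample):
    """Correlation between z percentiles and ordered observations
    (values near 1 correspond to a nearly straight plot)."""
    n = len(sample)
    z = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    x = sorted(sample)
    z_bar, x_bar = mean(z), mean(x)
    sxz = sum((a - z_bar) * (b - x_bar) for a, b in zip(z, x))
    szz = sum((a - z_bar) ** 2 for a in z)
    sxx = sum((b - x_bar) ** 2 for b in x)
    return sxz / (szz * sxx) ** 0.5

def simulate(n, reps=200, seed=1):
    """Correlations for reps standard normal samples of size n."""
    rng = random.Random(seed)
    return [plot_correlation([rng.gauss(0, 1) for _ in range(n)])
            for _ in range(reps)]

for n in (10, 30, 100):
    r = simulate(n)
    print(f"n = {n:3d}: smallest r = {min(r):.3f}, average r = {mean(r):.3f}")
```

Even though every sample here really is normal, the small samples should show noticeably lower and more variable correlations than the large ones, which is the point made in the paragraph above.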

Beyond Normality
Consider a family of probability distributions involving two parameters, θ1 and θ2,
and let F(x; θ1, θ2) denote the corresponding cdf’s. The family of normal distributions
is one such family, with θ1 = μ, θ2 = σ, and F(x; μ, σ) = Φ[(x − μ)/σ].
Another example is the Weibull family, with θ1 = α, θ2 = β, and

$$F(x; \alpha, \beta) = 1 - e^{-(x/\beta)^{\alpha}}$$

Still another family of this type is the gamma family, for which the cdf is an integral
involving the incomplete gamma function that cannot be expressed in any simpler form.
The parameters θ1 and θ2 are said to be location and scale parameters, respectively,
if F(x; θ1, θ2) is a function of (x − θ1)/θ2. The parameters μ and σ of the
normal family are location and scale parameters, respectively. In general, changing θ1
shifts the location of the corresponding density curve to the right or left, and changing
θ2 amounts to stretching or compressing the horizontal measurement scale. Another
example is given by the cdf

$$F(x; \theta_1, \theta_2) = 1 - e^{-e^{(x - \theta_1)/\theta_2}} \qquad -\infty < x < \infty$$

A random variable with this cdf is said to have an extreme value distribution. It is
used in applications involving component lifetime and material strength.
Although the form of the extreme value cdf might at first glance suggest that θ1
is the point of symmetry for the density function, and therefore the mean and median,
this is not the case. Instead, P(X ≤ θ1) = F(θ1; θ1, θ2) = 1 − e^(−1) = .632, and the
density function f(x; θ1, θ2) = F′(x; θ1, θ2) is negatively skewed (a long lower tail).
Similarly, the scale parameter θ2 is not the standard deviation (μ = θ1 − .5772θ2 and
σ = 1.283θ2). However, changing the value of θ1 does rigidly shift the density curve
to the left or right, whereas a change in θ2 rescales the measurement axis.
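
These three numerical facts about the extreme value distribution can be verified directly from the cdf. The following sketch inverts the cdf to get the percentile function θ1 + θ2 ln[−ln(1 − p)] and approximates the mean and standard deviation by averaging that function over a fine grid of p values (a midpoint rule); the parameter values, grid size, and function names are arbitrary illustrative assumptions.

```python
import math

def extreme_value_cdf(x, theta1, theta2):
    """F(x; theta1, theta2) = 1 - exp(-exp((x - theta1) / theta2))."""
    return 1 - math.exp(-math.exp((x - theta1) / theta2))

def extreme_value_percentile(p, theta1, theta2):
    """Inverse of the cdf: theta1 + theta2 * ln(-ln(1 - p))."""
    return theta1 + theta2 * math.log(-math.log1p(-p))

def mean_and_sd(theta1, theta2, cells=100_000):
    """Midpoint-rule approximation of the mean and standard deviation,
    obtained by integrating the percentile function over 0 < p < 1."""
    total = total_sq = 0.0
    for k in range(cells):
        q = extreme_value_percentile((k + 0.5) / cells, theta1, theta2)
        total += q
        total_sq += q * q
    m = total / cells
    return m, math.sqrt(total_sq / cells - m * m)

theta1, theta2 = 2.0, 3.0  # arbitrary illustrative parameter values
prob_below_theta1 = extreme_value_cdf(theta1, theta1, theta2)
m, s = mean_and_sd(theta1, theta2)
print(f"P(X <= theta1) = {prob_below_theta1:.3f}")
print(f"mean: {m:.3f} versus theta1 - .5772*theta2 = {theta1 - 0.5772 * theta2:.3f}")
print(f"sd:   {s:.3f} versus 1.283*theta2 = {1.283 * theta2:.3f}")
```

Small discrepancies in the third decimal place are expected, both from the numerical integration and from the rounding of the constants .5772 and 1.283.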
The parameter β of the Weibull distribution is a scale parameter, but α is not
a location parameter. A similar comment applies to the parameters α and β of the
gamma distribution. And for the lognormal distribution, μ is not a location parameter,
nor is σ a scale parameter. In the usual form, the density function for any member
of these families is positive for x > 0 and 0 otherwise. Examples and exercises
in the two previous sections introduced a third location (i.e., threshold) parameter
γ for these three distributions; this shifts the density function so that it is positive if
x > γ and zero otherwise.
When the family under consideration has only location and scale parameters,
the issue of whether any member of the family is a plausible population distribution
can be addressed via a single, easily constructed probability plot. One first obtains
the percentiles of the standard distribution, the one with θ1 = 0 and θ2 = 1, for
percentages 100(i − .5)/n (i = 1, …, n). The n (standardized percentile, observation)
pairs give the points in the plot. This is exactly what we did to obtain an omnibus
normal probability plot. Somewhat surprisingly, this methodology can be applied to
yield an omnibus Weibull probability plot. The key result is that if X has a Weibull
distribution with shape parameter α and scale parameter β, then the transformed
variable ln(X) has an extreme value distribution with location parameter θ1 = ln(β)
and scale parameter 1/α. Thus a plot of the (extreme value standardized percentile,
ln(x)) pairs showing a strong linear pattern provides support for choosing the
Weibull distribution as a population model.

Example 4.31 The accompanying observations are on lifetime (in hours) of power apparatus
insulation when thermal and electrical stress acceleration were fixed at particular values
(“On the Estimation of Life of Power Apparatus Insulation Under Combined
Electrical and Thermal Stress,” IEEE Trans. on Electrical Insulation, 1985:
70–78). A Weibull probability plot necessitates first computing the 5th, 15th, …,
and 95th percentiles of the standard extreme value distribution. The (100p)th
percentile η(p) satisfies

$$p = F(\eta(p)) = 1 - e^{-e^{\eta(p)}}$$

from which η(p) = ln[−ln(1 − p)].

Percentile   −2.97   −1.82   −1.25    −.84    −.51
x              282     501     741     851    1072
ln(x)         5.64    6.22    6.61    6.75    6.98

Percentile    −.23     .05     .33     .64    1.10
x             1122    1202    1585    1905    2138
ln(x)         7.02    7.09    7.37    7.55    7.67

The pairs (−2.97, 5.64), (−1.82, 6.22), …, (1.10, 7.67) are plotted as points in
Figure 4.38. The straightness of the plot argues strongly for using the Weibull
distribution as a model for insulation life, a conclusion also reached by the author of
the cited article.
Figure 4.38 A Weibull probability plot of the insulation lifetime data (horizontal axis:
Percentile; vertical axis: ln(x)) ■
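
The table entries of Example 4.31 can be regenerated from the formula η(p) = ln[−ln(1 − p)]. The sketch below also fits a least-squares line through the plotted points; reading rough values of α and β off that line (slope as 1/α, intercept as ln(β)) is an illustrative assumption that follows from the location and scale relationship stated earlier, not an estimation method given in the text.

```python
import math
from statistics import mean

# Insulation lifetimes (hours) from Example 4.31, already ordered.
lifetimes = [282, 501, 741, 851, 1072, 1122, 1202, 1585, 1905, 2138]

n = len(lifetimes)
# Standard extreme value percentiles eta(p) = ln[-ln(1 - p)], p = (i - .5)/n.
eta = [math.log(-math.log(1 - (i - 0.5) / n)) for i in range(1, n + 1)]
log_x = [math.log(x) for x in sorted(lifetimes)]

for e, lx in zip(eta, log_x):
    print(f"{e:6.2f}  {lx:5.2f}")

# Least-squares line ln(x) = intercept + slope * eta. Under a Weibull model
# the intercept plays the role of ln(beta) and the slope the role of 1/alpha.
e_bar, lx_bar = mean(eta), mean(log_x)
slope = (sum((e - e_bar) * (lx - lx_bar) for e, lx in zip(eta, log_x))
         / sum((e - e_bar) ** 2 for e in eta))
intercept = lx_bar - slope * e_bar
alpha_rough, beta_rough = 1 / slope, math.exp(intercept)
print(f"rough alpha: {alpha_rough:.2f}, rough beta: {beta_rough:.0f}")
```

The printed pairs should match the tabled percentiles and ln(x) values to two decimal places.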

The gamma distribution is an example of a family involving a shape parameter
for which there is no transformation h(·) such that h(X) has a distribution that
depends only on location and scale parameters. Construction of a probability plot
necessitates first estimating the shape parameter from sample data (some methods
for doing this are described in Chapter 6). Sometimes an investigator wishes to
know whether the transformed variable X^θ has a normal distribution for some value
of θ (by convention, θ = 0 is identified with the logarithmic transformation, in
which case X has a lognormal distribution). The book Graphical Methods for Data
Analysis, listed in the Chapter 1 bibliography, discusses this type of problem as well
as other refinements of probability plotting. Fortunately, the wide availability of
various probability plots with statistical software packages means that the user can
often sidestep technical details.

EXERCISES Section 4.6 (87–97)

87. The accompanying normal probability plot was constructed from a sample of 30
readings on tension for mesh screens behind the surface of video display tubes
used in computer monitors. Does it appear plausible that the tension distribution is
normal?

[Normal probability plot: horizontal axis z percentile (−2 to 2); vertical axis x (200 to 350)]

88. A sample of 15 female collegiate golfers was selected and the clubhead velocity
(km/hr) while swinging a driver was determined for each one, resulting in the following
data (“Hip Rotational Velocities During the Full Golf Swing,” J. of Sports Science
and Medicine, 2009: 296–299):

69.0  69.7  72.7  80.3  81.0
85.0  86.0  86.3  86.7  87.7
89.3  90.7  91.0  92.5  93.0

The corresponding z percentiles are

−1.83  −1.28  −0.97  −0.73  −0.52
−0.34  −0.17   0.0    0.17   0.34
 0.52   0.73   0.97   1.28   1.83

Construct a normal probability plot and a dotplot. Is it plausible that the population
distribution is normal?

89. The accompanying sample consisting of n = 20 observations on dielectric breakdown
voltage of a piece of epoxy resin appeared in the article “Maximum Likelihood
Estimation in the 3-Parameter Weibull Distribution” (IEEE Trans. on Dielectrics and
Elec. Insul., 1996: 43–55). The values of (i − .5)/n for which z percentiles are needed
are (1 − .5)/20 = .025, (2 − .5)/20 = .075, …, and .975. Would you feel comfortable
estimating population mean voltage using a method that assumed a normal population
distribution?

Observation    24.46  25.61  26.25  26.42  26.66
z percentile   −1.96  −1.44  −1.15   −.93   −.76
Observation    27.15  27.31  27.54  27.74  27.94
z percentile    −.60   −.45   −.32   −.19   −.06
Observation    27.98  28.04  28.28  28.49  28.50
z percentile     .06    .19    .32    .45    .60
Observation    28.87  29.11  29.13  29.50  30.88
z percentile     .76    .93   1.15   1.44   1.96

90. The article “A Probabilistic Model of Fracture in Concrete and Size Effects on
Fracture Toughness” (Magazine of Concrete Res., 1996: 311–320) gives arguments for
why fracture toughness in concrete specimens should have a Weibull distribution and
presents several histograms of data that appear well fit by superimposed Weibull curves.
Consider the following sample of size n = 18 observations on toughness for
high-strength concrete (consistent with one of the histograms); values of
p_i = (i − .5)/18 are also given.

Observation    .47    .58    .65    .69    .72    .74
p_i          .0278  .0833  .1389  .1944  .2500  .3056
Observation    .77    .79    .80    .81    .82    .84
p_i          .3611  .4167  .4722  .5278  .5833  .6389
Observation    .86    .89    .91    .95   1.01   1.04
p_i          .6944  .7500  .8056  .8611  .9167  .9722

Construct a Weibull probability plot and comment.

91. Construct a normal probability plot for the fatigue-crack propagation data given in
Exercise 39 (Chapter 1). Does it appear plausible that propagation life has a normal
distribution? Explain.

92. The article “The Load-Life Relationship for M50 Bearings with Silicon Nitride
Ceramic Balls” (Lubrication Engr., 1984: 153–159) reports the accompanying data on
bearing load life (million revs.) for bearings tested at a 6.45 kN load.

 47.1   68.1   68.1   90.8  103.6  106.0  115.0
126.0  146.6  229.0  240.0  240.0  278.0  278.0
289.0  289.0  367.0  385.9  392.0  505.0

a. Construct a normal probability plot. Is normality plausible?
b. Construct a Weibull probability plot. Is the Weibull distribution family plausible?


93. Construct a probability plot that will allow you to assess the plausibility of the lognormal distribution as a model for the rainfall data of Exercise 83 in Chapter 1.

94. The accompanying observations are precipitation values during March over a 30-year period in Minneapolis-St. Paul.

.77 1.20 3.00 1.62 2.81 2.48
1.74 .47 3.09 1.31 1.87 .96
.81 1.43 1.51 .32 1.18 1.89
1.20 3.37 2.10 .59 1.35 .90
1.95 2.20 .52 .81 4.75 2.05

a. Construct and interpret a normal probability plot for this data set.
b. Calculate the square root of each value and then construct a normal probability plot based on this transformed data. Does it seem plausible that the square root of precipitation is normally distributed?
c. Repeat part (b) after transforming by cube roots.

95. Use a statistical software package to construct a normal probability plot of the tensile ultimate-strength data given in Exercise 13 of Chapter 1, and comment.

96. Let the ordered sample observations be denoted by \(y_1, y_2, \ldots, y_n\) (\(y_1\) being the smallest and \(y_n\) the largest). Our suggested check for normality is to plot the \((\Phi^{-1}((i - .5)/n),\ y_i)\) pairs. Suppose we believe that the observations come from a distribution with mean 0, and let \(w_1, \ldots, w_n\) be the ordered absolute values of the \(x_i\)'s. A half-normal plot is a probability plot of the \(w_i\)'s. More specifically, since \(P(|Z| \le w) = P(-w \le Z \le w) = 2\Phi(w) - 1\), a half-normal plot is a plot of the \((\Phi^{-1}\{[(i - .5)/n + 1]/2\},\ w_i)\) pairs. The virtue of this plot is that small or large outliers in the original sample will now appear only at the upper end of the plot rather than at both ends. Construct a half-normal plot for the following sample of measurement errors, and comment: −3.78, −1.27, 1.44, −.39, 12.38, −43.40, 1.15, −3.96, −2.34, 30.84.

97. The following failure time observations (1000s of hours) resulted from accelerated life testing of 16 integrated circuit chips of a certain type:

82.8 11.6 359.5 502.5 307.8 179.7
242.0 26.5 244.8 304.3 379.1 212.6
229.9 558.9 366.7 204.6

Use the corresponding percentiles of the exponential distribution with \(\lambda = 1\) to construct a probability plot. Then explain why the plot assesses the plausibility of the sample having been generated from any exponential distribution.
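
The plotting coordinates called for in Exercises 96 and 97 can be computed directly from the plotting positions \((i - .5)/n\). The sketch below is one possible way to do this, using only the Python standard library; the helper names are hypothetical, and the measurement-error values are entered under the assumption that the sample reads −3.78, −1.27, 1.44, −.39, 12.38, −43.40, 1.15, −3.96, −2.34, 30.84.

```python
import math
from statistics import NormalDist


def plotting_positions(n):
    """The values (i - .5)/n for i = 1..n."""
    return [(i - 0.5) / n for i in range(1, n + 1)]


def exponential_percentiles(n, lam=1.0):
    """Solve 1 - exp(-lam * x) = p for each plotting position p."""
    return [-math.log(1.0 - p) / lam for p in plotting_positions(n)]


def half_normal_pairs(errors):
    """(Phi^{-1}{[(i - .5)/n + 1]/2}, w_i) pairs, w_i = ordered |x_i|."""
    w = sorted(abs(x) for x in errors)
    std_normal = NormalDist()
    q = [std_normal.inv_cdf((p + 1.0) / 2.0)
         for p in plotting_positions(len(w))]
    return list(zip(q, w))


# Failure times (1000s of hours) from Exercise 97.
failure_times = [82.8, 11.6, 359.5, 502.5, 307.8, 179.7,
                 242.0, 26.5, 244.8, 304.3, 379.1, 212.6,
                 229.9, 558.9, 366.7, 204.6]
exp_pairs = list(zip(exponential_percentiles(len(failure_times)),
                     sorted(failure_times)))

# Measurement errors from Exercise 96 (values as assumed in the lead-in).
errors = [-3.78, -1.27, 1.44, -0.39, 12.38, -43.40, 1.15, -3.96, -2.34, 30.84]
half_pairs = half_normal_pairs(errors)

for q, x in exp_pairs:
    print(f"{q:7.4f}  {x:6.1f}")
```

Each list holds (percentile, ordered observation) pairs ready to be passed to a plotting routine; interpreting the resulting patterns is left to the exercises.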

SUPPLEMENTARY EXERCISES (98–128)

98. Let X = the time it takes a read/write head to locate a desired record on a computer disk memory device once the head has been positioned over the correct track. If the disks rotate once every 25 millisec, a reasonable assumption is that X is uniformly distributed on the interval [0, 25].
a. Compute \(P(10 \le X \le 20)\).
b. Compute \(P(X \ge 10)\).
c. Obtain the cdf \(F(x)\).
d. Compute \(E(X)\) and \(\sigma_X\).

99. A 12-in. bar that is clamped at both ends is to be subjected to an increasing amount of stress until it snaps. Let Y = the distance from the left end at which the break occurs. Suppose Y has pdf

\[
f(y) = \begin{cases}
\dfrac{1}{24}\, y \left(1 - \dfrac{y}{12}\right) & 0 \le y \le 12 \\
0 & \text{otherwise}
\end{cases}
\]

Compute the following:
a. The cdf of Y, and graph it.
b. \(P(Y \le 4)\), \(P(Y > 6)\), and \(P(4 \le Y \le 6)\)
c. \(E(Y)\), \(E(Y^2)\), and \(V(Y)\)
d. The probability that the break point occurs more than 2 in. from the expected break point.
e. The expected length of the shorter segment when the break occurs.

100. Let X denote the time to failure (in years) of a certain hydraulic component. Suppose the pdf of X is \(f(x) = 32/(x + 4)^3\) for \(x > 0\).
a. Verify that \(f(x)\) is a legitimate pdf.
b. Determine the cdf.
c. Use the result of part (b) to calculate the probability that time to failure is between 2 and 5 years.
d. What is the expected time to failure?
e. If the component has a salvage value equal to \(100/(4 + x)\) when its time to failure is x, what is the expected salvage value?

101. The completion time X for a certain task has cdf \(F(x)\) given by

\[
F(x) = \begin{cases}
0 & x < 0 \\
\dfrac{x^3}{3} & 0 \le x < 1 \\
1 - \dfrac{1}{2}\left(\dfrac{7}{3} - x\right)\left(\dfrac{7}{4} - \dfrac{3}{4}x\right) & 1 \le x \le \dfrac{7}{3} \\
1 & x > \dfrac{7}{3}
\end{cases}
\]


a. Obtain the pdf \(f(x)\) and sketch its graph.
b. Compute \(P(.5 \le X \le 2)\).
c. Compute \(E(X)\).

102. Let X represent the number of individuals who respond to a particular online coupon offer. Suppose that X has approximately a Weibull distribution with \(\alpha = 10\) and \(\beta = 20\). Calculate the best possible approximation to the probability that X is between 15 and 20, inclusive.

103. The article “Computer Assisted Net Weight Control” (Quality Progress, 1983: 22–25) suggests a normal distribution with mean 137.2 oz and standard deviation 1.6 oz for the actual contents of jars of a certain type. The stated contents was 135 oz.
a. What is the probability that a single jar contains more than the stated contents?
b. Among ten randomly selected jars, what is the probability that at least eight contain more than the stated contents?
c. Assuming that the mean remains at 137.2, to what value would the standard deviation have to be changed so that 95% of all jars contain more than the stated contents?

104. When circuit boards used in the manufacture of compact disc players are tested, the long-run percentage of defectives is 5%. Suppose that a batch of 250 boards has been received and that the condition of any particular board is independent of that of any other board.
a. What is the approximate probability that at least 10% of the boards in the batch are defective?
b. What is the approximate probability that there are exactly 10 defectives in the batch?

105. Exercise 38 introduced two machines that produce wine corks, the first one having a normal diameter distribution with mean value 3 cm and standard deviation .1 cm, and the second having a normal diameter distribution with mean value 3.04 cm and standard deviation .02 cm. Acceptable corks have diameters between 2.9 and 3.1 cm. If 60% of all corks used come from the first machine and a randomly selected cork is found to be acceptable, what is the probability that it was produced by the first machine?

106. The reaction time (in seconds) to a certain stimulus is a continuous random variable with pdf

\[
f(x) = \begin{cases}
\dfrac{3}{2} \cdot \dfrac{1}{x^2} & 1 \le x \le 3 \\
0 & \text{otherwise}
\end{cases}
\]

a. Obtain the cdf.
b. What is the probability that reaction time is at most 2.5 sec? Between 1.5 and 2.5 sec?
c. Compute the expected reaction time.
d. Compute the standard deviation of reaction time.
e. If an individual takes more than 1.5 sec to react, a light comes on and stays on either until one further second has elapsed or until the person reacts (whichever happens first). Determine the expected amount of time that the light remains lit. [Hint: Let h(X) = the time that the light is on as a function of reaction time X.]

107. Let X denote the temperature at which a certain chemical reaction takes place. Suppose that X has pdf

\[
f(x) = \begin{cases}
\dfrac{1}{9}\left(4 - x^2\right) & -1 \le x \le 2 \\
0 & \text{otherwise}
\end{cases}
\]

a. Sketch the graph of \(f(x)\).
b. Determine the cdf and sketch it.
c. Is 0 the median temperature at which the reaction takes place? If not, is the median temperature smaller or larger than 0?
d. Suppose this reaction is independently carried out once in each of ten different labs and that the pdf of reaction temperature in each lab is as given. Let Y = the number among the ten labs at which the temperature exceeds 1. What kind of distribution does Y have? (Give the names and values of any parameters.)

108. An oocyte is a female germ cell involved in reproduction. Based on analyses of a large sample, the article “Reproductive Traits of Pioneer Gastropod Species Colonizing Deep-Sea Hydrothermal Vents After an Eruption” (Marine Biology, 2011: 181–192) proposed the following mixture of normal distributions as a model for the distribution of X = oocyte diameter (mm):

\[
f(x) = p f_1(x; \mu_1, \sigma) + (1 - p) f_2(x; \mu_2, \sigma)
\]

where \(f_1\) and \(f_2\) are normal pdfs. Suggested parameter values were \(p = .35\), \(\mu_1 = 4.4\), \(\mu_2 = 5.0\), and \(\sigma = .27\).
a. What is the expected (i.e., mean) value of oocyte diameter?
b. What is the probability that oocyte diameter is between 4.4 mm and 5.0 mm? [Hint: Write an expression for the corresponding integral, carry the integral operation through to the two components, and then use the fact that each component is a normal pdf.]
c. What is the probability that oocyte diameter is smaller than its mean value? What does this imply about the shape of the density curve?

109. The article “The Prediction of Corrosion by Statistical Analysis of Corrosion Profiles” (Corrosion Science, 1985: 305–315) suggests the following cdf for the depth X of the deepest pit in an experiment involving the exposure of carbon manganese steel to acidified seawater.

\[
F(x; \alpha, \beta) = e^{-e^{-(x - \alpha)/\beta}} \qquad -\infty < x < \infty
\]

The authors propose the values \(\alpha = 150\) and \(\beta = 90\). Assume this to be the correct model.
a. What is the probability that the depth of the deepest pit is at most 150? At most 300? Between 150 and 300?


b. Below what value will the depth of the maximum pit be observed in 90% of all such experiments?
c. What is the density function of X?
d. The density function can be shown to be unimodal (a single peak). Above what value on the measurement axis does this peak occur? (This value is the mode.)
e. It can be shown that \(E(X) \approx .5772\beta + \alpha\). What is the mean for the given values of \(\alpha\) and \(\beta\), and how does it compare to the median and mode? Sketch the graph of the density function. [Note: This is called the largest extreme value distribution.]

110. Let t = the amount of sales tax a retailer owes the government for a certain period. The article “Statistical Sampling in Tax Audits” (Statistics and the Law, 2008: 320–343) proposes modeling the uncertainty in t by regarding it as a normally distributed random variable with mean value \(\mu\) and standard deviation \(\sigma\) (in the article, these two parameters are estimated from the results of a tax audit involving n sampled transactions). If a represents the amount the retailer is assessed, then an under-assessment results if \(t > a\) and an over-assessment results if \(a > t\). The proposed penalty (i.e., loss) function for over- or under-assessment is \(L(a, t) = t - a\) if \(t > a\) and \(= k(a - t)\) if \(t \le a\) (\(k > 1\) is suggested to incorporate the idea that over-assessment is more serious than under-assessment).
a. Show that \(a^* = \mu + \sigma\Phi^{-1}(1/(k + 1))\) is the value of a that minimizes the expected loss, where \(\Phi^{-1}\) is the inverse function of the standard normal cdf.
b. If \(k = 2\) (suggested in the article), \(\mu\) = $100,000, and \(\sigma\) = $10,000, what is the optimal value of a, and what is the resulting probability of over-assessment?

111. The mode of a continuous distribution is the value \(x^*\) that maximizes \(f(x)\).
a. What is the mode of a normal distribution with parameters \(\mu\) and \(\sigma\)?
b. Does the uniform distribution with parameters A and B have a single mode? Why or why not?
c. What is the mode of an exponential distribution with parameter \(\lambda\)? (Draw a picture.)
d. If X has a gamma distribution with parameters \(\alpha\) and \(\beta\), and \(\alpha > 1\), find the mode. [Hint: \(\ln[f(x)]\) will be maximized iff \(f(x)\) is, and it may be simpler to take the derivative of \(\ln[f(x)]\).]
e. What is the mode of a chi-squared distribution having \(\nu\) degrees of freedom?

112. The article “Error Distribution in Navigation” (J. of the Institute of Navigation, 1971: 429–442) suggests that the frequency distribution of positive errors (magnitudes of errors) is well approximated by an exponential distribution. Let X = the lateral position error (nautical miles), which can be either negative or positive. Suppose the pdf of X is

\[
f(x) = (.1)e^{-.2|x|} \qquad -\infty < x < \infty
\]

a. Sketch a graph of \(f(x)\) and verify that \(f(x)\) is a legitimate pdf (show that it integrates to 1).
b. Obtain the cdf of X and sketch it.
c. Compute \(P(X \le 0)\), \(P(X \le 2)\), \(P(-1 \le X \le 2)\), and the probability that an error of more than 2 miles is made.

113. The article “Statistical Behavior Modeling for Driver-Adaptive Precrash Systems” (IEEE Trans. on Intelligent Transp. Systems, 2013: 1–9) proposed the following mixture of two exponential distributions for modeling the behavior of what the authors called “the criticality level of a situation” X.

\[
f(x; \lambda_1, \lambda_2, p) = \begin{cases}
p\lambda_1 e^{-\lambda_1 x} + (1 - p)\lambda_2 e^{-\lambda_2 x} & x \ge 0 \\
0 & \text{otherwise}
\end{cases}
\]

This is often called the hyperexponential or mixed exponential distribution. This distribution is also proposed as a model for rainfall amount in “Modeling Monsoon Affected Rainfall of Pakistan by Point Processes” (J. of Water Resources Planning and Mgmnt., 1992: 671–688).
a. Determine \(E(X)\) and \(V(X)\). Hint: For X distributed exponentially, \(E(X) = 1/\lambda\) and \(V(X) = 1/\lambda^2\); what does this imply about \(E(X^2)\)?
b. Determine the cdf of X.
c. If \(p = .5\), \(\lambda_1 = 40\), and \(\lambda_2 = 200\) (values of the \(\lambda\)'s suggested in the cited article), calculate \(P(X > .01)\).
d. For the parameter values given in (c), what is the probability that X is within one standard deviation of its mean value?
e. The coefficient of variation of a random variable (or distribution) is \(CV = \sigma/\mu\). What is CV for an exponential rv? What can you say about the value of CV when X has a hyperexponential distribution?
f. What is CV for an Erlang distribution with parameters \(\lambda\) and n as defined in Exercise 68? [Note: In applied work, the sample CV is used to decide which of the three distributions might be appropriate.]

114. Suppose a particular state allows individuals filing tax returns to itemize deductions only if the total of all itemized deductions is at least $5000. Let X (in 1000s of dollars) be the total of itemized deductions on a randomly chosen form. Assume that X has the pdf

\[
f(x; \alpha) = \begin{cases}
k/x^{\alpha} & x \ge 5 \\
0 & \text{otherwise}
\end{cases}
\]

a. Find the value of k. What restriction on \(\alpha\) is necessary?
b. What is the cdf of X?
c. What is the expected total deduction on a randomly chosen form? What restriction on \(\alpha\) is necessary for \(E(X)\) to be finite?
d. Show that \(\ln(X/5)\) has an exponential distribution with parameter \(\alpha - 1\).

115. Let \(I_i\) be the input current to a transistor and \(I_0\) be the output current. Then the current gain is proportional to \(\ln(I_0/I_i)\). Suppose the constant of proportionality is 1


(which amounts to choosing a particular unit of measurement), so that current gain \(= X = \ln(I_0/I_i)\). Assume X is normally distributed with \(\mu = 1\) and \(\sigma = .05\).
a. What type of distribution does the ratio \(I_0/I_i\) have?
b. What is the probability that the output current is more than twice the input current?
c. What are the expected value and variance of the ratio of output to input current?

116. The article “Response of SiCf/Si3N4 Composites Under Static and Cyclic Loading—An Experimental and Statistical Analysis” (J. of Engr. Materials and Technology, 1997: 186–193) suggests that tensile strength (MPa) of composites under specified conditions can be modeled by a Weibull distribution with \(\alpha = 9\) and \(\beta = 180\).
a. Sketch a graph of the density function.
b. What is the probability that the strength of a randomly selected specimen will exceed 175? Will be between 150 and 175?
c. If two randomly selected specimens are chosen and their strengths are independent of one another, what is the probability that at least one has a strength between 150 and 175?
d. What strength value separates the weakest 10% of all specimens from the remaining 90%?

117. Let Z have a standard normal distribution and define a new rv Y by \(Y = \sigma Z + \mu\). Show that Y has a normal distribution with parameters \(\mu\) and \(\sigma\). [Hint: \(Y \le y\) iff \(Z \le\ ?\) Use this to find the cdf of Y and then differentiate it with respect to y.]

118. a. Suppose the lifetime X of a component, when measured in hours, has a gamma distribution with parameters \(\alpha\) and \(\beta\). Let Y = the lifetime measured in minutes. Derive the pdf of Y. [Hint: \(Y \le y\) iff \(X \le y/60\). Use this to obtain the cdf of Y and then differentiate to obtain the pdf.]
b. If X has a gamma distribution with parameters \(\alpha\) and \(\beta\), what is the probability distribution of \(Y = cX\)?

119. In Exercises 117 and 118, as well as many other situations, one has the pdf \(f(x)\) of X and wishes to know the pdf of \(Y = h(X)\). Assume that \(h(\cdot)\) is an invertible function, so that \(y = h(x)\) can be solved for x to yield \(x = k(y)\). Then it can be shown that the pdf of Y is

\[
g(y) = f[k(y)] \cdot |k'(y)|
\]

a. If X has a uniform distribution with \(A = 0\) and \(B = 1\), derive the pdf of \(Y = -\ln(X)\).
b. Work Exercise 117, using this result.
c. Work Exercise 118(b), using this result.

120. Based on data from a dart-throwing experiment, the article “Shooting Darts” (Chance, Summer 1997, 16–19) proposed that the horizontal and vertical errors from aiming at a point target should be independent of one another, each with a normal distribution having mean 0 and variance \(\sigma^2\). It can then be shown that the pdf of the distance V from the target to the landing point is

\[
f(v) = \frac{v}{\sigma^2} \cdot e^{-v^2/(2\sigma^2)} \qquad v > 0
\]

a. This pdf is a member of what family introduced in this chapter?
b. If \(\sigma = 20\) mm (close to the value suggested in the paper), what is the probability that a dart will land within 25 mm (roughly 1 in.) of the target?

121. The article “Three Sisters Give Birth on the Same Day” (Chance, Spring 2001, 23–25) used the fact that three Utah sisters had all given birth on March 11, 1998 as a basis for posing some interesting questions regarding birth coincidences.
a. Disregarding leap year and assuming that the other 365 days are equally likely, what is the probability that three randomly selected births all occur on March 11? Be sure to indicate what, if any, extra assumptions you are making.
b. With the assumptions used in part (a), what is the probability that three randomly selected births all occur on the same day?
c. The author suggested that, based on extensive data, the length of gestation (time between conception and birth) could be modeled as having a normal distribution with mean value 280 days and standard deviation 19.88 days. The due dates for the three Utah sisters were March 15, April 1, and April 4, respectively. Assuming that all three due dates are at the mean of the distribution, what is the probability that all births occurred on March 11? [Hint: The deviation of birth date from due date is normally distributed with mean 0.]
d. Explain how you would use the information in part (c) to calculate the probability of a common birth date.

122. Let X denote the lifetime of a component, with \(f(x)\) and \(F(x)\) the pdf and cdf of X. The probability that the component fails in the interval \((x, x + \Delta x)\) is approximately \(f(x) \cdot \Delta x\). The conditional probability that it fails in \((x, x + \Delta x)\) given that it has lasted at least x is \(f(x) \cdot \Delta x/[1 - F(x)]\). Dividing this by \(\Delta x\) produces the failure rate function:

\[
r(x) = \frac{f(x)}{1 - F(x)}
\]

An increasing failure rate function indicates that older components are increasingly likely to wear out, whereas a decreasing failure rate is evidence of increasing reliability with age. In practice, a “bathtub-shaped” failure rate is often assumed.
a. If X is exponentially distributed, what is \(r(x)\)?
b. If X has a Weibull distribution with parameters \(\alpha\) and \(\beta\), what is \(r(x)\)? For what parameter values will \(r(x)\) be increasing? For what parameter values will \(r(x)\) decrease with x?
c. Since \(r(x) = -(d/dx)\ln[1 - F(x)]\), \(\ln[1 - F(x)] = -\int r(x)\,dx\). Suppose


\[
r(x) = \begin{cases}
\alpha\left(1 - \dfrac{x}{\beta}\right) & 0 \le x \le \beta \\
0 & \text{otherwise}
\end{cases}
\]

so that if a component lasts \(\beta\) hours, it will last forever (while seemingly unreasonable, this model can be used to study just “initial wearout”). What are the cdf and pdf of X?

123. Let U have a uniform distribution on the interval [0, 1]. Then observed values having this distribution can be obtained from a computer’s random number generator. Let \(X = -(1/\lambda)\ln(1 - U)\).
a. Show that X has an exponential distribution with parameter \(\lambda\). [Hint: The cdf of X is \(F(x) = P(X \le x)\); \(X \le x\) is equivalent to \(U \le\ ?\)]
b. How would you use part (a) and a random number generator to obtain observed values from an exponential distribution with parameter \(\lambda = 10\)?

124. Consider an rv X with mean \(\mu\) and standard deviation \(\sigma\), and let \(g(X)\) be a specified function of X. The first-order Taylor series approximation to \(g(X)\) in the neighborhood of \(\mu\) is

\[
g(X) \approx g(\mu) + g'(\mu) \cdot (X - \mu)
\]

The right-hand side of this equation is a linear function of X. If the distribution of X is concentrated in an interval over which \(g(\cdot)\) is approximately linear [e.g., \(\sqrt{x}\) is approximately linear in (1, 2)], then the equation yields approximations to \(E(g(X))\) and \(V(g(X))\).
a. Give expressions for these approximations. [Hint: Use rules of expected value and variance for a linear function \(aX + b\).]
b. If the voltage v across a medium is fixed but current I is random, then resistance will also be a random variable related to I by \(R = v/I\). If \(\mu_I = 20\) and \(\sigma_I = .5\), calculate approximations to \(\mu_R\) and \(\sigma_R\).

125. A function \(g(x)\) is convex if the chord connecting any two points on the function’s graph lies above the graph. When \(g(x)\) is differentiable, an equivalent condition is that for every x, the tangent line at x lies entirely on or below the graph. (See the figure below.) How does \(g(\mu) = g(E(X))\) compare to \(E(g(X))\)? [Hint: The equation of the tangent line at \(x = \mu\) is \(y = g(\mu) + g'(\mu) \cdot (x - \mu)\). Use the condition of convexity, substitute X for x, and take expected values.] [Note: Unless \(g(x)\) is linear, the resulting inequality (usually called Jensen’s inequality) is strict (\(<\) rather than \(\le\)); it is valid for both continuous and discrete rv’s.]

[Figure: graph of a convex function with its tangent line at x]

126. Let X have a Weibull distribution with parameters \(\alpha = 2\) and \(\beta\). Show that \(Y = 2X^2/\beta^2\) has a chi-squared distribution with \(\nu = 2\). [Hint: The cdf of Y is \(P(Y \le y)\); express this probability in the form \(P(X \le g(y))\), use the fact that X has a cdf of the form in Expression (4.12), and differentiate with respect to y to obtain the pdf of Y.]

127. An individual’s credit score is a number calculated based on that person’s credit history that helps a lender determine how much he/she should be loaned or what credit limit should be established for a credit card. An article in the Los Angeles Times gave data which suggested that a beta distribution with parameters \(A = 150\), \(B = 850\), \(\alpha = 8\), \(\beta = 2\) would provide a reasonable approximation to the distribution of American credit scores. [Note: credit scores are integer-valued.]
a. Let X represent a randomly selected American credit score. What are the mean value and standard deviation of this random variable? What is the probability that X is within 1 standard deviation of its mean value?
b. What is the approximate probability that a randomly selected score will exceed 750 (which lenders consider a very good score)?

128. Let V denote rainfall volume and W denote runoff volume (both in mm). According to the article “Runoff Quality Analysis of Urban Catchments with Analytical Probability Models” (J. of Water Resource Planning and Management, 2006: 4–14), the runoff volume will be 0 if \(V \le \nu_d\) and will be \(k(V - \nu_d)\) if \(V > \nu_d\). Here \(\nu_d\) is the volume of depression storage (a constant), and k (also a constant) is the runoff coefficient. The cited article proposes an exponential distribution with parameter \(\lambda\) for V.
a. Obtain an expression for the cdf of W. [Note: W is neither purely continuous nor purely discrete; instead it has a “mixed” distribution with a discrete component at 0 and is continuous for values \(w > 0\).]
b. What is the pdf of W for \(w > 0\)? Use this to obtain an expression for the expected value of runoff volume.
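
The "mixed" behavior described in Exercise 128 can be seen in a small simulation. The sketch below draws rainfall volumes with the inverse-cdf transformation \(X = -(1/\lambda)\ln(1 - U)\) from Exercise 123 and converts each to a runoff volume. The parameter values (`LAM`, `NU_D`, `K`), the seed, and the function names are hypothetical choices made only for illustration; none of them come from the cited article.

```python
import math
import random


def sample_exponential(lam, rng):
    """Inverse-cdf draw: X = -(1/lam) * ln(1 - U), U uniform on [0, 1)."""
    return -math.log(1.0 - rng.random()) / lam


def runoff(v, nu_d, k):
    """Runoff is 0 when rainfall v <= nu_d, otherwise k * (v - nu_d)."""
    return k * (v - nu_d) if v > nu_d else 0.0


# Hypothetical parameter values, chosen only for illustration.
LAM, NU_D, K = 0.1, 5.0, 0.6

rng = random.Random(12345)  # fixed seed so the run is repeatable
rainfall = [sample_exponential(LAM, rng) for _ in range(100_000)]
w = [runoff(v, NU_D, K) for v in rainfall]

# Share of simulated runoff values that are exactly 0 (the discrete part).
frac_zero = sum(1 for x in w if x == 0.0) / len(w)
mean_v = sum(rainfall) / len(rainfall)

print(f"simulated P(W = 0): {frac_zero:.3f}")
print(f"P(V <= nu_d):       {1 - math.exp(-LAM * NU_D):.3f}")
print(f"sample mean of V:   {mean_v:.2f}")
```

A sizable share of the simulated runoff values are exactly 0, while the positive values spread over a continuum, which is the feature the note in part (a) points to.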

BIBLIOGRAPHY
Bury, Karl, Statistical Distributions in Engineering, Cambridge Univ. Press, Cambridge, England, 1999. A readable and informative survey of distributions and their properties.
Johnson, Norman, Samuel Kotz, and N. Balakrishnan, Continuous Univariate Distributions, vols. 1–2, Wiley, New York, 1994. These two volumes together present an exhaustive survey of various continuous distributions.
Nelson, Wayne, Applied Life Data Analysis, Wiley, New York, 1982. Gives a comprehensive discussion of distributions and methods that are used in the analysis of lifetime data.
Olkin, Ingram, Cyrus Derman, and Leon Gleser, Probability Models and Applications (2nd ed.), Macmillan, New York, 1994. Good coverage of general properties and specific distributions.
