PROBABILITY FOR ENGINEERING
with Applications to Reliability
ELECTRICAL ENGINEERING, COMMUNICATIONS,
AND SIGNAL PROCESSING
Raymond L. Pickholtz, Series Editor
Lavon B. Page
North Carolina State University
for Jo Ellen—
Contents
Chapter 2: Applications
2.1 Circuits
2.2 Networks
2.3 Case Study: Two Network Reliability Problems
2.4 Fault Trees
Problems
References 213
Index 230
Preface
This book is an attempt to merge the best of two worlds. It brings precise
mathematical language and terminology into an arena of modern engineering
reliability problems. These include such topics as simple analysis of circuits, design
of redundancy into systems, reliability of communication networks, mean time to
failure, time-dependent system reliability, and uses of probability in recursive
solutions to real-world problems.
All books reflect the point of view of the author. Texts on probability written
by engineers usually move as quickly as possible to the applications of most interest
to the writer. The mathematical foundations are presented casually and quickly, if at
all, and understanding is gleaned from examples rather than from definitions and
theorems. Mathematicians, on the other hand, tend to write probability books that
dwell on topics such as combinatorics, labeling problems, allocation schemes, and
limit theorems. Such books present little that seems relevant to the world of an
undergraduate engineering student.
In teaching probability to engineering students for more than a decade, I have
seen the difficulties caused by both of these kinds of textbooks. Students with
good intuition are often handicapped from never having come to grips with the basic
concepts. On the other hand, the importance of fundamental concepts escapes the
student unless some application is in sight. For this reason, significant applications
appear early in this text. Chapter 2, for example, illustrates how the idea of
conditional probability can be blended with algorithmic problem solving to develop
tools for reliability analysis of complex systems such as circuits, communication
networks, and chemical reactors. These contemporary problems mix probability
and discrete mathematics, and they require a solid understanding of the basics. But
the student is rewarded with a sense of relevance that isn’t matched by drawing
balls out of an urn.
The essentials of an introduction to probability are found in Chapters 1, 3, 4,
the first three sections of Chapter 5, and the first four sections of Chapter 6. The
remaining sections of Chapter 6 introduce conditional density functions, conditional
expectation, and the central limit theorem. (The De Moivre-Laplace version of the
central limit theorem appears in Chapter 5.) Chapter 7 gives a brief introduction to
stochastic processes, with heavy emphasis on the Poisson process. Chapter 2
Lavon B. Page
September 1988
Chapter 1: The Basics
In January 1986 the space shuttle Challenger exploded in midair. The space
shuttle had previously been considered so safe by NASA that plans were afoot to
send plutonium powered modules into orbit. The previous year an accident at the
Union Carbide plant in Bhopal, India, had killed thousands of people in what was
at the time the worst industrial accident in history. A few months later, the Soviet
reactor at Chernobyl was to run amok and burn out of control for days, while
spewing radioactivity into the atmosphere. The Soviets had thought the probability
of such a major accident to be extremely low.
As long as such accidents continue, and there is every indication that they
will, there will be a lot of interest in trying to determine the reliability of things like
space vehicles, nuclear and chemical reactors, and such ordinary devices as
automobiles, garage door openers, or heating systems. Skepticism now greets
claims that something or other is “less likely to happen to you than being hit by
lightning,” or has “only 1 chance in 10,000 of occurring in the next 50 years.”
Often such claims have been based more on hope than on science.
Estimates of the reliability of equipment or complex systems depend heavily
on the field of mathematics known as probability. Probability can be abused, just
as can most tools. The best mathematical model can’t produce true answers if
incorrect or naive assumptions are fed into it. Even at a fairly elementary level,
however, probability opens the door to the investigation of complex systems and
situations. If we want to answer such questions as “What were the chances of that
happening?” or “How much do we expect to gain if we make that decision?”, the
answer will have to be expressed in the language of probability. The purpose of
this book is to present the basics of that language and to show its application to a
variety of meaningful examples, with an emphasis on the idea of reliability.
Interest in probability blossomed around the gambling tables of Europe
hundreds of years ago, though much earlier references can be found in Hebrew and
Chinese. Many games can be analyzed by looking at the possible outcomes of an
experiment, such as rolling a pair of dice or dealing some cards from a deck.
Frequently something about the situation suggests that the various possible
outcomes should be considered equally likely. For example, the symmetric shape
of a six-sided die suggests that the six outcomes are equally probable, and the
purpose in shuffling a deck of cards before dealing is to try to approximate a
situation in which one arrangement of the deck is just as likely as another.
The concept of equally likely outcomes leads to a natural concept of
probability. For example, since there are 13 hearts in a deck of cards, there are 13
chances out of 52 that an arbitrary card drawn from a deck will be a heart.
Considering the ratio of these two numbers gives the intuitively satisfying
conclusion that the probability of drawing a heart when a card is drawn from a deck
should be 13/52, or 1/4.
A more skeptical person might argue that the only way to test probabilities is
to experiment. For example, given a coin of unknown characteristics, the only way
to determine the probability of the coin coming up heads is to toss it many times and
see what happens. A mathematician might look at the situation in this way: If Hₙ
is the number of heads obtained in the first n tosses of a sequence of tosses, the
probability of heads might be taken as

lim (n → ∞) Hₙ / n
Of course there’s no way actually to toss a real coin an infinite number of times to
evaluate the limit, but the intuitive idea is that such a limit ought to exist and should
define whatever it is we mean by the probability of the coin coming up heads.
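The limit cannot be evaluated physically, but it can be explored numerically. The short sketch below is ours, not the book's (the function name is invented for illustration); it simulates n tosses of a fair coin and prints the relative frequency Hₙ/n, which drifts toward 1/2 as n grows.

import random

def relative_frequency_of_heads(n, seed=0):
    # Toss a fair coin n times and return H_n / n.
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(n))
    return heads / n

for n in (10, 100, 10_000, 1_000_000):
    print(n, relative_frequency_of_heads(n))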
A third way that probabilities are tossed about in everyday conversation
involves subjective considerations. Someone might say, for example, “Notre Dame
is a 2 to 1 favorite to beat Michigan.” This statement has a clear meaning as far as
probabilities go. The speaker is saying that Notre Dame’s chances of winning are 2
chances out of 3, which means a probability of 2/3. Such a statement is the
speaker’s quantitative pronouncement of his or her or somebody’s opinion on the
matter. Another such illustration is a weather forecaster announcing a 30% chance
of rain. Presumably such a statement would be based on existing weather data and
would not be purely subjective. It may be, however, that some kind of subjective
guesswork went into building the weather model from which the 30% figure is
obtained.
A mathematically useful treatment of probability must lay a common
groundwork so that everyone is speaking the same language. Much of this
groundwork consists of the elementary language of sets and set operations.
1.1 Sets and Set Operations
complement of A = {x : x > 1} is

Aᶜ = {x : x ≤ 1}

The difference of two sets, A − B, is defined as A ∩ Bᶜ.
Some of the important elementary laws governing the set operations are as
follows:
Figure 1.1 Venn diagram representing the set A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C).
Example 1.1. A group of 9 men and 7 women are administered a test for
high blood pressure. Among the men, 4 are found to have high blood pressure,
whereas 2 of the women have high blood pressure. Use a Venn diagram to
illustrate this data.
Solution:
The circle labeled H represents the 6 people having high blood pressure, and
the circle labeled W represents the 7 women. The numbers placed in the various
regions indicate how many people there are in the category corresponding to the
region. For example, there are 4 people who have high blood pressure and who are
not women. Such people are in set H but not in set W; that is, they belong to the
set H ∩ Wᶜ. The number 5 in the lower right corner indicates the number of men
without high blood pressure.
The decision to use circles to represent “high blood pressure” and “women”
was quite arbitrary. We could just as well use circles for “low blood pressure” and
“men.” (See Problem 1.25.)
Example 1.2. If A, B, and C are sets, draw a Venn diagram and shade the
region corresponding to the set (A ∪ Bᶜ) ∩ C.
Example 1.3. If A, B, and C are the following sets of characters, then
determine the set (A ∪ Bᶜ) ∩ C. Here we will consider the universal set to be
all 26 letters of the alphabet.

A = {…}
B = {d, g, o, t}
C = {d, a, g}
Example 1.4. Ifa coin is tossed, the sample space could be taken to be the
set S = {H, T}. If an ordinary six-sided die is rolled, the sample space could be
taken to be the set S = {1, 2, 3, 4, 5, 6}.
1.2 The Sample Space
Example 1.5. A card is drawn from a standard deck of 52. Here one could
take the sample space S to be
S = {2♠, 2♥, 2♦, 2♣, 3♠, 3♥, …, K♣, A♠, A♥, A♦, A♣}

A simpler convention would be to agree to think of the cards as being identified
with the numbers 1, 2, …, 52 and to simply think of S as consisting of these 52
numbers. It is important to realize that the particular bookkeeping scheme used is
not very important compared to the conceptual understanding of what kind of set is
an appropriate model. Whatever notation is used here, the sample space is a set of
52 elements.
Example 1.6. Suppose a simple electric circuit has two components, say A
and B. Either component can be “good” or “bad” in the sense that the component
may or may not be in working order. If we are interested in all possible states of
the circuit, the sample space used could be S = {GG, GB, BG, BB}, where the
convention might be that “GB,” for example, means that component A is good and
component B is bad.
Example 1.7. A pair of dice is rolled. Let’s refer to them as “red” and
“green.” An appropriate choice for the sample space for this experiment would be
the set S = {(1, 1), (1, 2), (1, 3), …, (6, 5), (6, 6)}, where, for instance, we
might agree that (3, 5) represents the outcome of 3 on red and 5 on green. The set
S contains 36 elements since either die can come up 6 different ways and 6 x 6 is
36. (There’s a basic underlying principle here that some elementary texts call the
multiplication principle. When one task can be performed in m different ways and
another task can be performed in n different ways, then the number of different
ways of performing the two operations together is mn.)
In playing many games, one is not interested in the individual numbers that
appear on the dice, but rather in the sum. In this case it might be tempting to take
the sample space to be S = {2, 3, 4, …, 11, 12}. This is not necessarily wrong,
but it does sacrifice information. For example, if this sample space is used, one can
no longer answer questions such as “Did the red die show an even number?” The
result on the red die is not even being recorded. Another reason for caution is that
the outcomes of this set of possible sums are not equally likely. For example, a
sum of 2 occurs only if both dice show the number 1, whereas a sum of 7 occurs in
six different ways. It is often necessary to consider sample spaces in which
individual outcomes don’t all have the same probability. One needs, however, to
be aware when this is the situation. A frequent naive mistake is to assume that
outcomes are equally likely when they are not. It is common to refer to a sample
space in which all the elements are considered equally probable as a uniform
sample space.
Example 1.8. When binary data is transmitted, the output can be thought of
as a string of 0’s and 1’s. (In electrical transmission, a voltage above a certain level
could be defined as 1 and below a certain level as 0.) If 4 bits are transmitted, what
would be an appropriate sample space to represent the possibilities?
Solution: The logical choice would be to take the sample space to be all
ordered quadruples of binary digits. In other words, S = {0000, 0001, 0010,
0011, 0100, 0101, 0110, 0111, 1000, 1001, 1010, 1011, 1100, 1101, 1110,
1111}. These 16 outcomes are simply the numbers from 0 to 15 written in binary
form. There are 16 elements of the sample space because 2⁴ = 16. Eight bits
(commonly referred to as a byte) can represent 2⁸ = 256 different possibilities.
This is equivalent to saying that if 8 bits are transmitted, then the sample space for
all possible outcomes has 256 elements.
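For readers who want to see the bookkeeping concretely, a few lines of Python (a sketch of ours, not part of the text) generate this sample space and confirm the count:

import itertools

# All ordered quadruples of binary digits: the sample space for 4 transmitted bits.
S = ["".join(bits) for bits in itertools.product("01", repeat=4)]
print(S)       # ['0000', '0001', ..., '1111']
print(len(S))  # 16, since 2**4 == 16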
The intuitive basis for all three axioms should be obvious. When we say that
something has probability 1/10, we mean that there is one chance in 10 that it will
occur. A negative probability has no conceivable meaning. Similarly, a probability
of 1 represents absolute certainty, and probabilities greater than 1 would be
meaningless. The last axiom is very important in that it allows the probability of
events (in the case of a finite sample space) to be computed in terms of the
probabilities of the individual elements that make up the events. This will be
demonstrated shortly. (See Equation 1.1.)
In other words, the probability of any (finite) event may be computed by simply
adding up the probabilities of the individual elements of the event.
Later in this book we will see that probabilities of events often must be
approached from a different perspective when the sample space is infinite. The idea
of computing probabilities of all events via a sum, as in Equation 1.1, must be
abandoned. In Chapter 5 we will see that in “continuous” models, integration takes
the place of summation.
Continuous models arise, for instance, when measurements are being made
on some kind of continuous scale and one wishes to think of an interval of possible
values. For example, think of the experiment of “selecting a random number
between 0 and 1.” Clearly it would be intuitively satisfying to think that the
probability is 1/2 that the number should come from the subinterval [0, 1/2], or 1/5
that the number chosen should be in the interval [3/5, 4/5]. In fact, we would like
to know that the probability of the number falling in any particular subinterval is
simply the length of the subinterval. Simple as this situation sounds, the actual
demonstration that there is a probability measure on the interval [0, 1] that has these
pleasant properties was a milestone in mathematics around the turn of the century
and made possible the field of study known as real variables. An interesting
consequence of Axioms 1 through 3 is that such a probability measure must
necessarily assign probability zero to every individual number. To understand
why, simply think of a number x sitting in the interval B_δ = [x − δ, x + δ]. The
length of this interval is 2δ. But in light of Property 3, P({x}) ≤ P(B_δ) for
every δ > 0, and so P({x}) must be zero. (To say that an event has probability
zero is not to say that the event is impossible. For instance, if a random number is
selected between zero and one, the probability that the number would be 1/2 is zero
as just indicated. Yet it is not impossible for the number selected to be 1/2.)
For the sake of honesty, it is advisable to admit at this point that there are still
other pathological things that occur with infinite sample spaces. With finite sample
spaces, an event is simply a subset of the sample space. With infinite sample
spaces, it is not always possible to consider every subset of the sample space to be
an event. The problem is that for infinite sample spaces it simply isn’t usually
possible to have our probability measures defined for all subsets of the sample
space and still to satisfy the desired axioms. You needn’t lose sleep over this,
however, because in all the frequently encountered sample spaces, the subsets that
must be avoided are ones that you would never encounter anyway because they
cannot be expressed in terms of elementary sets. In other words, you can ignore
this conceptual difficulty and never run into computational problems because of it.
1.4 Conditional Probability

Definition 1.1: If A and B are events in a sample space and P(A) ≠ 0, then
the conditional probability of B, given A, is denoted by P(B | A) and is defined
by

P(B | A) = P(B ∩ A) / P(A)
Example 1.9. A card is drawn from a 52 card deck. Let A be the event that
the card is black and B the event that the card is a spade. Thus A ∩ B = B, so
P(A ∩ B) = 13/52, and P(A) = 26/52. This means that

P(B | A) = (13/52) / (26/52) = 1/2
Example 1.10. A pair of dice is rolled (one red and one green). The sample
space is as in Example 1.7. Let A be the event that the sum on the two dice is 9,
and let B be the event that the red die shows the number 5. Then A ∩ B contains
the single outcome (5, 4) and has probability 1/36, and P(A) = 4/36 because A =
{(3, 6), (4, 5), (5, 4), (6, 3)}. Thus, P(B | A) = (1/36)/(4/36) = 1/4. This is
intuitively correct, for knowing that the sum is 9 guarantees that the red die must
show one of the four numbers 3, 4, 5, or 6.
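Because the sample space is uniform, conditional probabilities like this one reduce to counting, which is easy to verify by brute force. The following sketch is ours (the helper P is an invented convenience); it replays Example 1.10:

from fractions import Fraction
import itertools

# Uniform sample space for a red and a green die, as in Example 1.7.
S = list(itertools.product(range(1, 7), repeat=2))  # (red, green) pairs

A = {(r, g) for (r, g) in S if r + g == 9}  # the sum is 9
B = {(r, g) for (r, g) in S if r == 5}      # the red die shows 5

def P(E):
    return Fraction(len(E), len(S))  # valid because all 36 outcomes are equally likely

print(P(A & B) / P(A))  # P(B | A) = 1/4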
Example 1.11. In the sample space of Example 1.5, let B denote the spades
and now let C denote the aces. Then P(B ∩ C) = 1/52 since there is only one
card that is the ace of spades, and so P(B | C) = (1/52)/(4/52) = 1/4. Notice that in
this case P(B | C) = P(B). It can be easily verified that in this example it is also
true that P(C | B) = P(C). These two equations indicate that the knowledge that
one of the events occurs does not affect the probability of the other occurring. Two
events that have this relation to each other are called independent events (or
statistically independent events). The concept of independence is very important in
mathematical models that involve probability. In many situations it will be apparent
for one reason or another that two events can have no possible effect on one
another, and in the mathematical model this leads to an assumption that the events
are independent.
Example 1.12. Two bits of binary data are transferred. Let’s consider the
sample space to be S = {00, 01, 10, 11}. So 01 represents the case in which a 0 is
transmitted first and a 1 is transmitted as the second bit. We will consider the
transmission to be random in the sense that each of these arrangements has
probability 1/4.
Now let’s consider the following two events:
A = {10, 11}
B = {01, 11}

Event A can easily be described as the event “the first digit transmitted is the
digit 1,” and B as the event “the second digit is the digit 1.”
Notice that P(B) = 1/2, and furthermore that P(B |A) = 1/2. (Why?) The
fact that P(B) = 1/2 says that the probability that the second digit is a 1 is equal to
14 Chapter 1: The Basics
1/2. That P(B |A) = 1/2 says that if we know that the first digit is a 1, then the
conditional probability (based on this information) that the second digit is a 1 is
still 1/2. This is another illustration of independence, the concept introduced in
Definition 1.2 below. Problem 1.28 asks you to show that the three conditions are
in fact equivalent (that is, if one of them is true, then all of them have to be true).
This idea can be extended to compute the probability that more than two events all
occur. For example,

P(A ∩ B ∩ C) = P(A) P(B | A) P(C | A ∩ B)    (1.3)
Now for some examples that illustrate the use of these simple but useful little
formulas.
Example 1.13. A box contains 10 transistors, of which 7 are good and 3 are
defective. If 2 transistors are randomly taken from the box, what is the probability
that both are good?
Solution: Think of the transistors as being drawn one at a time. (This just
means that we're going to label them “first” and “second.” Whether they are
actually drawn one at a time or simultaneously doesn’t matter in the least. You
should convince yourself that this little mental trick is legitimate.) Let A be the
event that the first is good, and B the event that the second is good. Then the
probability that both are good is
P(A ∩ B) = P(A) P(B | A) = (7/10)(6/9) = 7/15
The reason that P(B |A) = 6/9 is that knowing that the first one is good means
that there are 6 good left among the 9 possible ones that might be chosen second.
In a similar fashion, the probability that all would be good if 3 were selected
could be computed as
(7/10)(6/9)(5/8) = 7/24
The last factor, 5/8, is the conditional probability that the third would be good given
that the first two are both good.
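The running product of conditional probabilities is easy to automate. A small sketch of ours (the function name is invented) computes the probability that the first k transistors drawn are all good:

from fractions import Fraction

def all_good(k, good=7, total=10):
    # Multiply P(next one is good | all previous ones were good), as in Example 1.13.
    p = Fraction(1)
    for i in range(k):
        p *= Fraction(good - i, total - i)
    return p

print(all_good(2))  # 7/15
print(all_good(3))  # 7/24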
Example 1.14. Consider the experiment of rolling a red and a green die.
Let A be the event that the red die shows a 5, and B be the event that the green die
shows a 3. Most people’s intuition says that the two dice should not exercise any
control over each other; that is, the outcome on one die should be “independent” of
the outcome on the other. If the sample space of Example 1.7 is used, then
B = {(1, 3), (2, 3), (3, 3), (4, 3), (5, 3), (6, 3)}, and so P(B) = 6/36.
Thus,

P(A | B) = (1/36)/(6/36) = 1/6 = P(A)
This shows that A and B are independent according to Definition 1.2,
just as our intuition would indicate. (A somewhat subtle but important point should
be made here. One cannot prove mathematically whether two real-world dice
behave independently or not. All we are illustrating is that if the sample space
introduced in Example 1.7 is used to model a pair of dice, with the assumption that
the outcomes are equally likely, then the theoretical dice of the model behave
independently.)
Solution: Let B denote the event that the randomly selected item is good,
and let A₁ and A₂ be the events that it comes from plants 1 and 2, respectively.
Then P(B) = P(B ∩ A₁) + P(B ∩ A₂) because A₁ and A₂ are disjoint
events and every element of B must be in either A₁ or A₂. But then

P(B) = P(A₁) P(B | A₁) + P(A₂) P(B | A₂)
     = (.4)(.95) + (.6)(.9) = .92
(We are using Equation 1.2 here. An alternative is to use a tree diagram for this
computation. Tree diagrams are explained in Section 1.5.)
Solution: Let S = smokers and L = people who develop lung cancer. The
question is: what is P(S | L)? By definition, P(S | L) = P(S ∩ L)/P(L).
However,

P(S ∩ L) = P(S) P(L | S) = (.3)(.03) = .009
Moreover,
P(L) = P(S ∩ L) + P(Sᶜ ∩ L)    since S ∩ L and Sᶜ ∩ L are disjoint
     = P(S) P(L | S) + P(Sᶜ) P(L | Sᶜ)
     = (.3)(.03) + (.7)(.005) = .009 + .0035 = .0125

Thus P(S | L) = .009/.0125 = .72.
Bayes’ theorem comes into play whenever the elements of the sample space
under consideration are divided up into mutually exclusive categories. Specifically,
suppose that S is the sample space and that A₁, A₂, …, Aₙ are events in S
having the property that A₁, A₂, …, Aₙ are disjoint and that A₁ ∪ ⋯ ∪ Aₙ
= S. (This means that each element of the sample space S belongs to exactly one
of the events A₁, A₂, …, Aₙ, and for that reason such a collection of sets is
often referred to as a partition of the sample space.) Then for any event B ⊆ S,
the additivity property of probabilities implies that

P(B) = P(B ∩ A₁) + ⋯ + P(B ∩ Aₙ)
If each of the terms on the right is rewritten using the definition of conditional
probability, the first equation in Proposition 1.1 is obtained (the multiplicative law).
Bayes’ theorem itself is an immediate consequence of the multiplicative law. The
definition of conditional probability says that
P(Aₖ | B) = P(Aₖ ∩ B) / P(B)
If the definition of conditional probability is used now to rewrite the numerator of
this fraction, and if the multiplicative law is used to replace P(B) in the
denominator by the sum on the right side of Equation 1 in Proposition 1.1, then
Equation 2 (Bayes’ theorem) is obtained.
P(Aₖ | B) = P(Aₖ) P(B | Aₖ) / Σᵢ₌₁ⁿ P(Aᵢ) P(B | Aᵢ)
The solution shown in Example 1.16 is exactly the one that would be
produced by using Bayes’ theorem. To make the identification, simply let the event
L in Example 1.16 correspond to B in Bayes’ theorem, and let S and Sᶜ
correspond to A₁ and A₂ (with n = 2 and k = 1) in Bayes’ theorem.
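Bayes’ theorem translates directly into code. The sketch below is ours; it applies the formula to the numbers of Example 1.16, with A₁ = S and A₂ = Sᶜ.

def bayes(priors, likelihoods, k):
    # P(A_k | B) for a partition A_1, ..., A_n with priors P(A_i)
    # and likelihoods P(B | A_i).
    joint = [p * l for p, l in zip(priors, likelihoods)]
    return joint[k] / sum(joint)

# Smoking example: P(S) = .3, P(S^c) = .7, P(L | S) = .03, P(L | S^c) = .005.
print(bayes([0.3, 0.7], [0.03, 0.005], 0))  # 0.72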
Many simple problems that can be done with Proposition 1.1 can also be done
with tree diagrams. The concept of a tree diagram is easily understood by looking
at a few examples.
[Tree diagram for the dolphin example: first-level branches “High” (6/25) and “Low” (19/25); the probabilities of the four outcomes across the top are .05, .19, .19, and .57.]
At the lower level in the tree, the nodes labeled “High” and “Low” represent
the two possibilities (high or low mercury concentration) for the first dolphin.
(Again, it doesn’t really matter whether they are chosen one at a time or
simultaneously. We can simply think of one of them as being labeled “first”? and
the other as being labeled “‘second.”) The probabilities on the branches are 6/25 and
19/25 because of what is known about the makeup of the group of 25. At the
second level in the tree, the possible conditions of the second dolphin are recorded.
The numbers on the second level of branches are conditional probabilities. For
example, the leftmost number 5/24 at the second level is the conditional probability
that the second dolphin has a high concentration of mercury given that the first one
does. The reason this conditional probability is 5/24 is that once we know that the
first one has a high concentration, we know that 5 of the remaining 24 also have
high concentrations. The four ovals at the top of the figure represent the four
possible outcomes of the experiment. From left to right they may be described as:
both high, first high and second low, first low and second high, both low. Thus
the tree diagram is consistent with thinking of the sample space as something like
S = {HH, HL, LH, LL}. Finally, the numbers at the top of the figure are the
probabilities of the four outcomes corresponding to the ovals. These are computed
by multiplying the numbers on the branches connecting the root of the tree to the
ovals. For example, the probability of the outcome HH is computed as 6/25 x 5/24
= .05. This use of multiplication is an application of Equation 1.2.
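Exact fractions make the whole tree easy to check by machine. A sketch of ours, using the branch probabilities described above:

from fractions import Fraction

# Multiply branch probabilities along each root-to-leaf path (Equation 1.2):
# 6 of the 25 dolphins have high mercury concentrations, and two are examined.
p_HH = Fraction(6, 25) * Fraction(5, 24)
p_HL = Fraction(6, 25) * Fraction(19, 24)
p_LH = Fraction(19, 25) * Fraction(6, 24)
p_LL = Fraction(19, 25) * Fraction(18, 24)

print(p_HH, float(p_HH))          # 1/20 = 0.05
print(p_HH + p_HL + p_LH + p_LL)  # 1, as the four outcome probabilities must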
three outcomes in which exactly three resistors are tested (the circles at the
intermediate level). Since each has probability 1/10, P(B) = 3/10. A ∩ B
consists of two of these outcomes; so P(A ∩ B) = 2/10. Therefore,

P(A | B) = (2/10)/(3/10) = 2/3,
that is, if it is known that exactly three resistors are tested, then the probability that
both defectives are tested is 2/3.
For example, three events A, B, and C are independent provided all of the
following are true:
1. P(A ∩ B ∩ C) = P(A) P(B) P(C)
2. P(A ∩ B) = P(A) P(B)
3. P(A ∩ C) = P(A) P(C)
4. P(B ∩ C) = P(B) P(C)
So the collection of three events does not form an independent collection. One way
to understand this is to observe, for example, that if we know that A and B occur,
then we know with absolute certainty that C occurs. Intuitively, independence of
the collection of events would indicate that no knowledge involving only A and B
could influence the probability of C occurring. (A more mathematically precise
statement is the following: If A,B, and C are independent, then C will be
independent of any event that can be formed from A and B using set union,
intersection and complementation. This can be proved using Definition 1.3.) In
particular, in a collection of independent events, any of the events may be replaced
by their complements and the resulting collection of events will still be
independent. This will be extremely important to us in Chapter 2. Most of the
applications will feature the concept of independent events in one form or another.
In Chapter 2, we will constantly be performing calculations such as
P(A ∩ B ∩ Cᶜ) = P(A) P(B) [1 − P(C)]
if A, B, and C are independent.
C(n, r) = n! / (r! (n − r)!)
Example 1.20. If you have offered to give a friend his choice of any 3 books
from 10 that you own, the number of different ways he can make his selection is
C(10, 3) = 10! / (3! 7!) = 120
Notice that the selection of 3 books to be given away also completely determines
which 7 will be left behind. In other words, C(10, 3) = C(10, 7). More
generally, if 0 ≤ r ≤ n, then C(n, r) = C(n, n − r).
Example 1.21. In a club with 10 members, how many ways are there to fill
a slate of officers consisting of a president, secretary, and treasurer? The answer is
10 × 9 × 8 = 720. The difference between this example and the last is that here we are not
just selecting 3 people from 10, but specifically 3 people for 3 separate positions.
Utilizing the multiplication principle, you can think of this as 10 choices for
president, then (after that choice has been made) 9 possibilities for secretary, and
then 8 for treasurer. The assumption is that no one occupies two offices.
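Python’s standard library computes both counts, which makes the contrast between the two examples concrete (a sketch of ours using math.comb and math.perm):

import math

print(math.comb(10, 3))  # 120 ways to choose 3 books from 10
print(math.comb(10, 7))  # also 120, since C(n, r) = C(n, n - r)
print(math.perm(10, 3))  # 720 ordered slates: president, secretary, treasurer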
P(exactly k sixes in n rolls) = C(n, k) (1/6)ᵏ (5/6)ⁿ⁻ᵏ
You should verify the validity of the formula in this simple case by looking at the
same experiment in terms of a tree diagram, with one level of branching for each
roll of the die (with the outcomes for each roll shown as “6” or “not 6”). Doing
this, in fact, leads to understanding why the formula is correct. For in a tree
diagram to represent n independent trials, there would be exactly C(n, k)
outcomes across the top corresponding to k successes in the n trials, and each of
these outcomes would have probability pᵏ qⁿ⁻ᵏ.
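The formula is one line of code. In the sketch below (ours; the choice of three rolls is our illustration, not the text’s), we compute the probability of exactly one 6 in three rolls of a fair die:

import math

def binomial(n, k, p):
    # P(exactly k successes in n independent trials): C(n, k) p^k q^(n-k).
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

print(binomial(3, 1, 1/6))  # C(3,1)(1/6)(5/6)^2, about 0.347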
In the tree diagram for an independent trials process, the probability for
success is the same at all locations in the tree. It is important to keep in mind that
much more general kinds of experiments can be viewed in terms of a tree diagram
than can be viewed as an independent trials process. The relation between an
independent trials process and the concept of independent events is this: In an
independent trials process, if A is an event that may be described in terms of the
outcome of one trial and B is an event that may be described in terms of the
outcome of another trial, then A and B will be independent events. This statement
may be generalized to more than two events. For example, if a coin is tossed 10
times, the experiment may be viewed as 10 trials in an independent trials process.
If A₁ is the event that “heads occurs on the first toss,” A₂ that “heads occurs on
the second toss,” and so forth, then A₁, …, A₁₀ are independent events.
Axiom 3.1: If A₁, A₂, … is a sequence of disjoint events in a sample space,
then

P(A₁ ∪ A₂ ∪ ⋯) = P(A₁) + P(A₂) + ⋯
Axiom 3.1 simply says that even with a countably infinite collection of
disjoint events, the probability of the union is still the “sum of the probabilities”
provided that the latter is interpreted as the sum of an infinite series.
The concept of a geometric series will be useful in the next example and in
other applications later.
Solution: Since there is no upper bound that can be placed on how many
tosses might be required, an infinite sample space is required to model this
experiment. In fact, we can simply think of the sample space S as being the set of
positive integers, where the integer 4, for example, would represent the outcome
that the head occurs on the fourth toss.
How should the probabilities be assigned? In order for the head to occur on
the fourth toss, tails would have been necessary on the first three tosses. The
probability of getting 3 tails followed by a head is .5⁴ when computed by any of
our standard methods. In general, an integer k ∈ S represents the outcome in
which a head follows k − 1 tails and hence has probability .5ᵏ. To check
consistency, observe that the sum of the probabilities of all the outcomes in S is

(1/2) + (1/2)² + (1/2)³ + ⋯ = 1
Factoring out 1/2 from each term here one is left with the geometric series found in
Proposition 1.4 with r = 1/2.
The question that is asked is what is the probability of the event
A = {2, 4, 6, …}

that is, that the head occurs on an even-numbered toss. Then

P(A) = (1/2)² + (1/2)⁴ + (1/2)⁶ + ⋯
     = (1/4) [1 + (1/4) + (1/4)² + ⋯]
     = (1/4) · 1/(1 − 1/4) = 1/3
In this example we have been interested in the number of the toss on which a
head will first appear. We have viewed the sample space as simply being the
positive integers,
S = {1, 2, 3, …}
For example, the integer 17 represents the outcome in which the first head appears
on the seventeenth toss.
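The geometric series converges quickly, so truncating it confirms the answer numerically. A sketch of ours:

from fractions import Fraction

# P(first head on toss k) = (1/2)^k; sum over even k, truncating the series.
p_even = sum(Fraction(1, 2**k) for k in range(2, 60, 2))
print(float(p_even))  # about 1/3, matching the geometric-series calculation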
Someone might raise the question, “But isn’t it also possible that a head will
never appear?” You might be inclined to include some other element in the sample
space to represent that possibility, for example, the symbol ∞. Is it correct to do
this?
Actually, it doesn’t matter. Since the sum of the probabilities that 1, 2, 3, …
occur is equal to 1, we would have no choice but to assign probability 0 to any
additional outcome that we included in the sample space, including an outcome to
represent the possibility of heads never occurring. So including it is pointless, but
not necessarily incorrect.
This example involves a countably infinite sample space. A countably
infinite set is one that can be put in one-to-one correspondence with the positive
integers. Notice that the techniques used are almost identical to those used with
finite sample spaces. We needed only to replace the idea of a finite sum by that of
an infinite series.
In Chapter 5 we will be studying continuous models. As mentioned earlier,
in continuous models the computations involve integrals rather than sums. Often
these are, in fact, easier to work with. This is because the standard methods of
calculus often are sufficient to evaluate integrals, whereas evaluating sums (even
finite ones) can be pretty tedious at times. There is not much that can be said in this
introductory chapter about continuous models, however, because the groundwork
that is laid in Chapter 3 must come first.
Problems
1.1 Assume that P(A) = .6, P(B) = .3, and A and B are independent.
Give the following probabilities:
(a) P(A ∩ B)   (b) P(A ∩ Bᶜ)   (c) P(A ∪ B)   (d) P(A ∪ Bᶜ)
1.2 Two devices are tested from a batch of 6 items of which 4 are good and 2
are defective. Find the probability that
(a) both are good.
(b) both are good given that at least one is good.
1.4 Peter has 4 blue socks, 3 brown socks, and 2 white socks in his drawer. If
he randomly reaches in and pulls out a pair of socks, what is the probability
that they will match?
1.5 Suppose A and B are events with the properties that P(Aᶜ ∩ Bᶜ) = .2,
that P(A ∩ B) = .2, and that P(Bᶜ) = .4. Find P(A).
1.6 A device has 5 components. Assume that each one has a probability .1 of
being defective and that whether a given one is defective does not depend on
the condition of any of the other components. Thus the 5 components may
be viewed as an independent trials process with n = 5 = number of trials.
(a) For k = 0, 1, 2, 3, 4, and 5, find the probability that exactly k are
good.
(b) What is the probability that at least 3 are good?
1.8 A and B are going to play a tennis match. The winner must win two sets
in order to win the match. (They stop playing when someone has won two
sets.) In each set, the probability that A wins the set is .6.
(a) Draw a tree diagram for the tennis match.
(b) What is the probability that A wins the match?
(c) What is the probability that B wins exactly one set?
(d) What is the probability that B wins at least one set?
(e) What is the probability that A wins the match given that B wins the
first set?
1.9 Assume the following to be true: Thirty percent of the members of a club
play tennis. Of those who play tennis, 70% also play badminton. Of those
who do not play tennis, only 10% play badminton.
(a) What is the probability that a randomly chosen person from the club
plays badminton?
(b) If you know that a randomly chosen person plays badminton, what
then is the probability that he or she plays tennis?
1.10 A system has five components, two of which are known to have a special
marking stamped on the side. (The other three are known not to have the
marking.) Components are checked one at a time until two of the
unmarked components have been found.
(a) Draw a tree diagram for this experiment.
(b) What is the probability that both of the marked components are
checked?
(c) What is the probability that exactly one of the marked components is
checked?
(d) What is the probability that at least one marked component is checked?
1.11 In a certain population, 5% have a disease. A diagnostic test for the disease
returns a positive result 90% of the time when it is used on a person who
actually does have the disease. (The other 10% of the time the test fails to
detect the disease.) However, when the test is used on a person who
actually does not have the disease, the test returns a positive result 8% of the
time.
(a) What is the probability that a randomly chosen person actually has the
disease and will test positive?
(b) If the test returns a positive result for a randomly chosen person, what
is the probability the person actually has the disease?
In a certain high school, 10% of the male students weigh 200 lb or more.
Of those who do weigh at least 200 lb, 30% play football. Of those who
weigh less than 200 lb, only 5% play football. If a randomly chosen male
student is known to play football, what is the probability that he weighs at
least 200 lb? Do this problem with a tree diagram and again using Bayes’
theorem.
Suppose S₁ and S₂ are finite sample spaces associated with two different
experiments. The sample space for the combined experiment consisting of
both of these experiments could be taken to be the set of all ordered pairs
(ω₁, ω₂) where ω₁ ∈ S₁ and ω₂ ∈ S₂. Do you see a natural way of
defining a probability measure on this sample space that will make the
results of the two experiments independent of each other? (Consider this
question in light of Problem 1.14.)
1.18 A system has 100 components, 30 of which are defective. If you pick out
10 different components for testing, what is the probability that exactly 2 of
them are defective? (Use the combinations formula.) Make sure you
understand the difference in the situation described here and the situation of
Problem 1.17. This is a bit subtle, for in Problem 1.17 you presumably are
considering the testing to take place on 10 different items as well. Yet in
this problem, the sampling would have to be considered to be sampling
with replacement in order to produce the same answer as in 1.17.
(Sampling with replacement means that an item tested once may be tested
again. In other words, each item is replaced in the original pool from which
the samples are being drawn immediately after it is tested. For example, if 2
cards are drawn with replacement from a deck of cards, they both could be
the ace of spades.)
A device is put into service on a Monday and operates seven days each
week. Each day there is a 10% chance that the device will break down.
(This includes the first day of operation.) The maintenance crew is not
available on weekends, and so the manager hopes that the first breakdown
does not occur on a weekend. What is the probability that the first
breakdown will occur on a weekend. [Hint: Make use of the concept of a
geometric series. View each day as a separate trial in an independent trials
process in which you are waiting for the first success.]
1.20 Show by induction that for any collection of events A₁, A₂, …, Aₙ, it
is true that P(A₁ ∪ ⋯ ∪ Aₙ) ≤ P(A₁) + ⋯ + P(Aₙ).
1.23 Show that if a pair of events A and B is both independent and mutually
exclusive, then at least one of the events must have probability zero.
1.25 Draw a Venn diagram for the data in Example 1.1, using circles to
correspond to the categories “low blood pressure” and “men” as opposed to
“high blood pressure” and “women,” as is shown in the text.
1.26 In the Venn diagram that follows, A, B, and C represent the sets of letters
given in Example 1.3. For each of the 26 letters of the alphabet, place the
letter in the correct one of the eight regions shown in the Venn diagram.
For example, the letter x would go in the outer region (outside the 3 circles)
as shown in the figure because it doesn’t belong to any of the three events.
1.27 In a Venn diagram with three sets, the figure is divided into eight connected
regions.

[Figure: three-circle Venn diagram with one of the eight regions shaded and labeled with its set expression in terms of A, B, and C.]
One of the eight regions is shaded in the figure, and the set that is
represented by the region is described in terms of A, B, C, and the
elementary set operations. Give a similar description for each of the other 7
regions in the figure.
1.28 To check whether two given events are independent, any of three different
equations can be used. (See Definition 1.2.) Prove that the three equations
in Definition 1.2 are equivalent. In other words, show that if one of them is
true, then all of them must be true. (This is very easy. For example, you
might first show that equations 1 and 3 are equivalent. By symmetry, an
identical argument would show that Equations 2 and 3 are equivalent.)
1.29 Example 1.22 shows that if a coin is tossed repeatedly until a head occurs,
the probability is 1/3 that the first head will occur on an even-numbered
toss. Give a similar argument to show that the probability is 2/3 that the
first head will occur on an odd-numbered toss.
Chapter 2: Applications
This section will give further engineering applications of the topics discussed
in Chapter 1. Some such problems involve reliability of systems like circuits and
communication networks. The examples given in this chapter will be elementary,
but they will show the usefulness of the properties and techniques that have already
been encountered.
2.1 Circuits
In the circuits we are interested in determining the probability that the circuit
will carry current, and we assume we know the probability that current can flow
through each of the components. It is common practice to call the probability that
the circuit will carry current the reliability of the circuit, and similarly for the
components.
Series Circuits
Example 2.1. The series circuit in Figure 2.1 (a) can carry current only if all
components are functioning. Thus the reliability of the circuit is

rel = P(A ∩ B ∩ C) = P(A) P(B) P(C)
More generally, it should be clear that for any series circuit, the reliability is
simply the product of the reliabilities of the individual components, assuming that
the events corresponding to the components being good are independent events.
(Engineers usually describe this assumption by saying that “the components fail
independently.”)
Parallel Circuits
Example 2.2. The parallel circuit in Figure 2.1 (b) can carry current if at
least one of the components is functioning. Thus rel = P(A ∪ B ∪ C). The
easiest way to evaluate this probability is to recall De Morgan’s law:

(A ∪ B ∪ C)ᶜ = Aᶜ ∩ Bᶜ ∩ Cᶜ

The fact that A, B, and C are independent guarantees that Aᶜ, Bᶜ, and Cᶜ are
also independent. Thus,

rel = 1 − P(Aᶜ ∩ Bᶜ ∩ Cᶜ) = 1 − P(Aᶜ) P(Bᶜ) P(Cᶜ)
    = 1 − (1 − p_A)(1 − p_B)(1 − p_C)    (2.2)

where p_A, p_B, and p_C denote P(A), P(B), and P(C), respectively.
In general, a parallel circuit with n components would have reliability
rel = 1 − (1 − p₁)(1 − p₂) ⋯ (1 − pₙ)

where pᵢ denotes the reliability of the ith component, and the components are
assumed independent.
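Both formulas are one-liners in code. The sketch below is ours, and the component reliabilities in the example calls are made-up numbers:

def series_rel(ps):
    # A series circuit works only if every component works.
    rel = 1.0
    for p in ps:
        rel *= p
    return rel

def parallel_rel(ps):
    # A parallel circuit fails only if every component fails.
    fail = 1.0
    for p in ps:
        fail *= (1.0 - p)
    return 1.0 - fail

print(series_rel([0.9, 0.95, 0.99]))    # product of the reliabilities
print(parallel_rel([0.9, 0.95, 0.99]))  # 1 minus the product of the unreliabilities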
In Examples 2.1 and 2.2 we have expressed the reliability of simple series
and parallel circuits in terms of the reliabilities of the individual components. If
there is more reason to be interested in failure of the circuit than in reliability, the
situation can be turned around and the unreliability of the circuit can be expressed
in terms of the unreliability of the components. The unreliability of a circuit or
component is simply the probability of failure; that is, the probability that the device
cannot carry current. (See Problem 2.6.)
Series-Parallel Circuits
2.2 Networks
[Figure 2.2: two networks; in the first, a source is joined to a sink through links A, B, and C in series; the second is a more general source-to-sink network.]
In the case of the circuits, it was assumed that only the components can fail;
that is, we did not consider the possibility of any of the “wires” connecting the
components failing. Similarly, in the case of the networks we are assuming that the
nodes are perfectly reliable. (In Figure 2.2 the nodes are the black dots at the ends
of the links.)
One important current class of network problems has to do with
communication networks. Visualize a large group of computers or terminals
(nodes) linked together by a variety of communication lines (edges). Since things
like power outages or equipment failures can cause communication links to fail,
there is a clear need to study the reliability of such systems. It is also possible to
study systems in which both the vertices and the edges can fail. We did not allow
for that possibility in treating Figures 2.1 and 2.2 above.
Graphs
Both networks and circuits, when viewed from a purely mathematical point of
view, are instances of what are called graphs. A graph consists of “edges” and
“vertices.” When drawn, the edges are usually lines or curves, and the vertices are
the junction points where edges meet. Thus graphs are visually similar to the
networks of Figure 2.2. To identify a circuit with a graph, one identifies the
components of the circuit with the edges of the graph. The individual connected
clumps of “connecting wires” that connect the components in the circuit correspond
to the vertices of the graph. Graph theory is a branch of mathematics which has
important applications to many types of engineering problems.
Example 2.4. The nice thing about series-parallel networks (or circuits) is
that reliability calculations can be manually reduced to simpler problems in a
straightforward way. More general kinds of networks do not necessarily lend
themselves to such direct treatment. Figure 2.3 illustrates this. In this figure we
assume that the edges A, B, C, D, and E have known probabilities of being in a
working condition and that these events are independent.
If w and y are the source and the sink, then the network is series-parallel
and the reliability is the probability of the event (A ∩ B) ∪ E ∪ (C ∩ D).
If x and z are source and sink, then the problem becomes substantially more
difficult. It is no longer possible to merge series or parallel components. There are
a variety of approaches, however, that can be used. Perhaps the simplest is to
“partition” the problem depending upon whether link E is working or not. To see
what this means, let’s let G represent the event that a communication route from x
to z is available. Then

P(G) = P(E) P(G | E) + P(Eᶜ) P(G | Eᶜ)

(In this equation the event E denotes the event that link E is functioning.) This is
Proposition 1.1, the multiplicative law. Since P(E) and P(Eᶜ) are assumed
known, we still need to know how to evaluate P(G | E) and P(G | Eᶜ).
Figure 2.3 If w and y are viewed as the source and the sink, this is a series-
parallel network. If x and z are source and sink, the network is not series-parallel.
There is a conceptual point here that is more important than the nitty-gritty
details. It is that by “pivoting” on the edge E, that is, by splitting the problem into
two cases depending upon whether edge E is good or bad, the problem can be split
into two simpler problems. This idea plays a key role even in very sophisticated
techniques and in current research.
Even very important contemporary problems needn’t involve highly abstract
concepts. The network reliability problems introduced above illustrate this fact.
Suppose that a group of telephone users are hooked together by a network of
telephone lines of known reliabilities and that whether a given line is down is
independent of whether any of the other lines are down. It is hypothetically
possible to determine the probability that all individuals can communicate with each
other or that any particular subgroup can communicate with each other. This
calculation can be done using a variety of the techniques developed in Chapter 1.
Some of the techniques can be refined into quite powerful tools, while others have
to be discarded as too crude. All known methods of determining network reliability are
very “inefficient” algorithms in that the time required for the calculation grows
exponentially with the size of the network. The amount of time or the number of
computations required to perform a calculation is referred to as the computational
complexity of the method. The development of computer algorithms for
performing such analyses is an important area of contemporary research.
Example 2.5. In Figure 2.4, the five circles in the network G₀ at the top
represent five people linked together via the seven communication lines shown in
the picture as edges e₁, …, e₇. We will assume that the reliability of each of the
links is known; that is, that the probability the link is working is known.
Furthermore, let’s assume that the links are independent. As a convenient notation,
we denote by pₖ the probability that link eₖ is good.
The problem we are going to investigate is the problem of determining the
reliability of communication between the two people represented by the black
nodes. What are some possible approaches that could be used?
Tree Diagrams
Here we are looking at seven links, each of which is going to be either good
or bad. To represent all states of this system in a tree diagram, we would need
to have seven layers of branching in the tree, one layer for each line. Since the
number of outcomes shown in the tree will double for each layer, we will wind up
with 2⁷ = 128 outcomes in the tree.
In reality it isn’t necessary to draw the entire tree in order to do this problem.
For instance, suppose we were to start out by considering edge e₃ first and edge
e₂ second as we start building the tree. If both these links are bad, then certainly it
is going to be impossible for the two black nodes to communicate. Therefore, there
isn’t any point in continuing to draw the part of the tree that would continue on
from the assumption that these two links have both failed, if all we are
interested in getting from our tree diagram is the probability that the black nodes can
communicate. Even so, this problem clearly taxes the methodology of a tree
diagram to its limit. While it may be (barely) feasible to do this problem with a tree
diagram, given a very large sheet of paper, it wouldn’t be much fun.
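The same computation the tree represents can be done by brute force: enumerate every good/bad combination of the links and add up the probabilities of the states in which the two terminals are connected. The sketch below is ours; since Figure 2.4 is not reproduced here, the edge list is a hypothetical 7-link network, not the book’s G₀.

import itertools

# Each edge is (name, endpoint, endpoint, reliability). Hypothetical data.
edges = [("e1", 1, 2, 0.9), ("e2", 0, 3, 0.8), ("e3", 0, 1, 0.9),
         ("e4", 1, 4, 0.85), ("e5", 2, 4, 0.9), ("e6", 3, 4, 0.8),
         ("e7", 2, 3, 0.7)]

def connected(up, s, t):
    # Is t reachable from s using only the working edges in `up`?
    seen, stack = {s}, [s]
    while stack:
        u = stack.pop()
        for _, a, b, _ in up:
            for x, y in ((a, b), (b, a)):
                if x == u and y not in seen:
                    seen.add(y)
                    stack.append(y)
    return t in seen

def reliability(edges, s, t):
    total = 0.0
    for state in itertools.product([True, False], repeat=len(edges)):
        prob = 1.0
        for (_, _, _, p), good in zip(edges, state):
            prob *= p if good else 1.0 - p
        if connected([e for e, good in zip(edges, state) if good], s, t):
            total += prob  # add P(this state) when the terminals communicate
    return total

print(reliability(edges, 0, 4))  # sums over all 2**7 = 128 states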
There are clearly many routes that could be used to send a message from one
black node to the other. A few are e₃e₄e₅ and e₂e₆ and e₃e₁e₇e₅. For
any one of these possible paths, we can easily compute the probability of the path
being available because of the independence we are assuming for the links. For
example, the probability that the path e₃e₄e₅ is a working path is simply the
product p₃ × p₄ × p₅, since it depends on all three of the edges involved
being good. Therefore, we could list all the possible paths and compute the
probability for each path. The problem with this approach, however, is that the
different paths aren’t mutually exclusive. For example, consider the two paths
e₃e₄e₅ and e₃e₁e₇e₅. Let’s think of event A as being the event that the first
of these two paths is good and event B as the event that the second is good.
Clearly A and B aren’t mutually exclusive, because it’s quite possible that both
paths might be good, and so we can’t compute P(A U B) just by adding P(A)
and P(B). The best we can do is
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)    (2.3)
         = p₃p₄p₅ + p₃p₁p₇p₅ − p₃p₄p₅p₁p₇

This computation is easy, since A ∩ B, the event that both paths are good,
depends only upon e₃, e₄, e₅, e₁, and e₇ being good, and the probability of this
happening is given by p₃p₄p₅p₁p₇.
So if there were only two possible paths connecting the black nodes, the
computation we have just described would do the trick. The problem is that there
are many such paths, not just two. Let’s think of n as representing the number of
paths between the black nodes and A₁, …, Aₙ as being the events “path 1 is
good,” “path 2 is good,” and so forth. Since what is needed in order for
communication to be possible is for at least one of the paths to be good, the task is
to compute the probability P(A₁ ∪ ⋯ ∪ Aₙ).
There is a standard way to do this using what is sometimes called the
inclusion-exclusion principle. In fact, Equation 2.3 above is the special case of
this principle when n = 2. For n = 3 the inclusion-exclusion principle says that
P(A ∪ B ∪ C) = P(A) + P(B) + P(C)
             − P(A ∩ B) − P(A ∩ C) − P(B ∩ C)
             + P(A ∩ B ∩ C)

For the union A₁ ∪ ⋯ ∪ Aₙ, the inclusion-exclusion principle says that
the probability P(A₁ ∪ ⋯ ∪ Aₙ) can be computed by adding the probabilities of
each of the individual events, then subtracting off the probability of the intersection
of all possible pairs of the events, then adding on the probabilities of all
intersections of three events, subtracting off the probabilities of all intersections of
four, and so forth. While it’s not terribly difficult to see why this principle is
correct (in fact, it can be derived from Equation 2.3 using mathematical induction),
it clearly isn’t a recipe that’s going to be very pleasant to use if n is large.
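Mechanically, though, the principle is straightforward: the intersection of several path events is simply the event that every edge in the union of those paths is good. The sketch below is ours; it uses the three paths named above with made-up reliabilities, whereas the network of Figure 2.4 has more paths than these.

import itertools

p = {"e1": 0.9, "e2": 0.8, "e3": 0.9, "e4": 0.85,
     "e5": 0.9, "e6": 0.8, "e7": 0.7}  # hypothetical link reliabilities
paths = [{"e3", "e4", "e5"}, {"e2", "e6"}, {"e3", "e1", "e7", "e5"}]

def prob_all_good(edge_set):
    out = 1.0
    for e in edge_set:
        out *= p[e]
    return out

def union_prob(paths):
    total = 0.0
    for r in range(1, len(paths) + 1):
        for subset in itertools.combinations(paths, r):
            # P(A_i1 and ... and A_ir) = P(every edge in the union is good)
            term = prob_all_good(set().union(*subset))
            total += term if r % 2 == 1 else -term  # inclusion-exclusion signs
    return total

print(union_prob(paths))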
The usefulness of this little formula lies in the fact that there is a very natural
interpretation to each of the conditional probabilities contained in it. The
conditional probability P(A | E) is the probability that communication is possible
given that the link e₁ is good. If e₁ is good, however, then we know for certain
that the two nodes at the ends of e₁ will be able to communicate, and so any
message that can get to one can automatically get to the other. Why not then think
of them as a single node? In other words, take e₁ out of the picture altogether and
“collapse” the two nodes at the ends of e₁ into a single logical node, because being
able to get to one of them is equivalent logically to being able to get to the other one
given that e₁ is good. What this observation boils down to is the fact that the
conditional probability P(A | E) is the same as the unconditional probability that
the two black nodes can communicate in network G₂ of Figure 2.4. On the other
hand, the conditional probability P(A | Eᶜ) has an even simpler interpretation.
For if e₁ is bad, we might as well remove it from the picture altogether. The
conditional probability P(A | Eᶜ) is the same as the probability that the two black
nodes could communicate if e₁ weren’t there at all, and this is what is shown in
network G₁ in the figure. The conditional probability P(A | Eᶜ) is the same as
the unconditional probability that the black nodes can communicate in the network
G₁.
It is common to refer to G₁ as the network (or graph) obtained from G₀ by
deleting the edge e₁ and to refer to G₂ as the network obtained from G₀ by
contracting the edge e₁. Notationally this is often written as G₁ = G₀ − e₁ and
G₂ = G₀ · e₁, as is indicated in the figure.
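The deletion/contraction split is itself a complete, if exponential, algorithm: rel(G) = pₑ · rel(G · e) + (1 − pₑ) · rel(G − e), with the empty network as the base case. Below is a bare-bones sketch of ours, without the series and parallel reductions that make the book’s method practical; the example network is hypothetical.

def two_terminal_rel(edges, s, t):
    # edges: list of (u, v, p) links; returns P(s and t can communicate).
    if not edges:
        return 1.0 if s == t else 0.0  # no links left: reliable iff s, t merged
    (u, v, p), rest = edges[0], edges[1:]
    # Delete the pivot edge: simply drop it.
    r_del = two_terminal_rel(rest, s, t)
    # Contract the pivot edge: merge node v into node u everywhere.
    merged = [(u if a == v else a, u if b == v else b, q) for a, b, q in rest]
    r_con = two_terminal_rel(merged, u if s == v else s, u if t == v else t)
    return p * r_con + (1 - p) * r_del

# Hypothetical bridge network in the spirit of Figure 2.3 (not copied from it):
edges = [(0, 1, 0.9), (0, 2, 0.8), (1, 2, 0.7), (1, 3, 0.8), (2, 3, 0.9)]
print(two_terminal_rel(edges, 0, 3))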
You may not have noticed, but a very important thing has just happened. We
have described how to solve our original problem by expressing the solution in
terms of two simpler problems. Specifically, we have passed from having to deal
with network G₀ to having to deal with networks G₁ and G₂. Furthermore, each
of these networks can be further simplified by series and parallel reductions. For
example, e₃ and e₄ in G₁ are series edges, and G₃ shows them being replaced
by a single edge e₈ whose reliability is given by p₈ = p₃ × p₄. Similarly, G₂
has e₂ and e₃ as parallel edges, as well as e₄ and e₇, and G₄ shows these pairs
each being replaced by a single edge with the appropriate probability. G₇ and G₈
continue this process, and in fact G₂ can be completely solved by series and
parallel reductions.
In solving G₁, however, we are stuck again when we get to G₃, because
G₃ has no series or parallel edges. For that reason we have to repeat the trick we
used to start with, that is, to pick another edge and to delete and contract it, leading
to two further problems. In the figure, the edge chosen is e₇ (the particular choice
of edge is not critical), and deleting and contracting e₇ leads to the two networks
G₆ and G₅, respectively. G₅ is then reduced to G₉ and G₆ to G₁₀.
All known methods for doing problems such as this one require a number of
computational steps that grows exponentially with the size of the network.
Reduction methods similar to what is shown in Figure 2.4, however, work about
as well as any known method. In fact, it is possible to develop algorithms using
such reduction methods and to run the algorithms on a microcomputer.
Figure 2.5 shows a much more complicated network than the one we have
been considering. In fact it is based on a computer network that existed in the
1970s with source in California and sink in New England. The probability that the
two black nodes can communicate in the network of Figure 2.5 can be calculated on
a microcomputer in less than a minute using an algorithm that implements the ideas
of Figure 2.4. Since the network has 22 links, a complete tree diagram for all
states of the network would show 2^22 = 4,194,304 outcomes.
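Although no implementation is given in the text, the delete-and-contract idea translates directly into a short recursive program. The following Python sketch is one illustrative way to code the factoring identity rel(G) = p rel(G * e) + (1 − p) rel(G − e); the data representation (a list of (node, node, reliability) triples) and the small test network are our own choices, and it omits the series and parallel reductions that a serious implementation would use to prune the recursion.

    def reliability(edges, s, t):
        # Two-terminal reliability by deletion and contraction.
        # edges: list of (u, v, p) triples; s, t: the two terminal nodes.
        if s == t:
            return 1.0                      # the terminals have been merged
        if not edges:
            return 0.0                      # no edges left, so no path
        (u, v, p), rest = edges[0], edges[1:]
        # Contract: merge node v into node u in the remaining edges.
        merged = [(u if a == v else a, u if b == v else b, q) for a, b, q in rest]
        merged = [(a, b, q) for a, b, q in merged if a != b]   # drop self-loops
        s2 = u if s == v else s
        t2 = u if t == v else t
        return p * reliability(merged, s2, t2) + (1 - p) * reliability(rest, s, t)

    # A small "bridge" network with every link of reliability .9:
    edges = [(1, 2, .9), (1, 3, .9), (2, 3, .9), (2, 4, .9), (3, 4, .9)]
    print(reliability(edges, 1, 4))          # about 0.9785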
Figure 2.5 The network reliability problem is to determine the probability that
communication is possible between the source and the sink.
[Figure: networks G1 and G2 of Example 2.6, with nodes v1, . . . , v5. In G2 the edges e1 and e2 of G1 are replaced by a single edge e3 with reliability p3 = p1 p2 / (p1 + p2 − p1 p2), and rel(G1) = (p1 + p2 − p1 p2) rel(G2).]
Clearly P(C) = p1 + p2 − p1 p2, where p1 and p2 are the reliabilities of
e1 and e2, respectively. What about the term P(A | C)? Given that C occurs,
we know that the node v1 is not cut off from all other nodes. Furthermore, it is
possible that v2 and v3 can communicate through v1 if both e1 and e2 are
good. Here’s where things become a bit subtle. If we knew for certain that
exactly one of the edges e1 and e2 were good, we could simply remove e1, e2,
and v1 from the network, and the reliability of the reduced network would be the
same as the original. However, since both might be good, what we need to do is
replace them by link e3 in the reduced network and to assign the proper probability
to e3. The reason for the presence of e3 is to take care of the possibility that e1
and e2 can provide a communication route from v2 to v3. The probability of this
is the probability that both e1 and e2 are good, which is p1 p2. Given the
information that at least one of them is good, the conditional probability that both
are good is the probability p3 = p1 p2 / (p1 + p2 − p1 p2) shown in the figure.
The relationship between the two networks is described by the equation at the
bottom of the figure. The reliability of G2 is the conditional probability that G1
is good given that at least one of the two original edges e1 and e2 is good. The
important point though is that the equation

rel(G1) = (p1 + p2 − p1 p2) rel(G2)
enables us to describe the solution to our original problem in terms of the solution
to a simpler problem.
So What?
Examples 2.5 and 2.6 have been used to show how solutions to important
contemporary problems can be formulated in such a way that the probability
concepts involved are basically elementary. Some other very important problem-
solving techniques have come up also. In fact, it might be worthwhile to list them.
2.4 Fault Trees
[Figure: (a) OR gate (union).]
Sometimes it is possible to construct the fault tree in such a way that the
primary inputs will be independent events. (Some of the most serious defects in
safety analyses, however, have centered around the assumption of independence of
events that, in fact, were later found not to be independent. So independence
assumptions should be made with a great deal of caution. For instance, in Figure
2.7 if water is supplied by an electric pump, then nodes 9 and 7 will not be
independent unless the electricity supplying the pump comes from a different
source than that described in node 7.)
If the primary inputs are assumed to be independent, then it is easy to
compute the probability of the top event by a simple bottom-up hand calculation.
To make things concrete, let’s introduce some hypothetical probabilities:
Event                      Probability
node 4   pipe plugged
node 6   valve defective   .01
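Under the independence assumption, the bottom-up calculation is easy to mechanize. The sketch below is an illustration in Python, not the book's method verbatim: the tree and the probabilities are hypothetical stand-ins, and (as discussed next) the calculation is valid only when each primary input appears at a single place in the tree.

    from math import prod

    def top_event_probability(node, p):
        # A node is either a primary-input name or a tuple
        # ('AND', children) / ('OR', children).  Primary inputs are
        # assumed independent, each appearing once in the tree.
        if isinstance(node, str):
            return p[node]
        gate, children = node
        probs = [top_event_probability(c, p) for c in children]
        if gate == 'AND':
            return prod(probs)                    # all inputs must occur
        return 1 - prod(1 - q for q in probs)     # OR: some input occurs

    p = {'pipe plugged': .01, 'valve defective': .01, 'no electricity': .001}
    tree = ('OR', ['pipe plugged', ('AND', ['valve defective', 'no electricity'])])
    print(top_event_probability(tree, p))          # about .01001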
Normally fault trees require much more subtle approaches than the one shown
here. Suppose, for instance, that there were more than one way in which electricity
failure could contribute to failure of the tank. In this case we would have the same
primary input (node 7) appearing at more than one location in the fault tree. This
means that the kind of bottom-up approach used here would not work because
subtrees beneath certain nodes would fail to be independent.
Fault tree analysis was developed primarily within the aerospace and nuclear
industries during the 1960s and 1970s. Better ways of calculating or estimating the
probability of the top event for very large and complex trees (sometimes having
hundreds or even thousands of nodes) are still being sought.
Example 2.7. Draw a fault tree for the circuit of Figure 2.1c. The top event
of the fault tree should represent the event that the circuit can transmit current.
Solution: There is not a unique way of constructing such a fault tree. Either
of the following trees would be correct, and, in fact, it should be easy to convince
yourself that these two fault trees are logically equivalent.
[Figure: two logically equivalent fault trees for the circuit of Figure 2.1c.]
Notice that if we wish the top event to represent failure of the circuit, then a
different fault tree is called for. Such a tree is the “dual” of a tree that represents
“non-failure” of the circuit. Simply convert all the AND gates to OR gates and all
the OR gates to AND gates. For example, here are the duals of the two fault trees
above:
In either of these fault trees the top event represents failure of the circuit if we now
interpret the primary inputs A, B, C, and D to represent failures of the
respective components.
Problems
2.1 (a) What is the reliability of 5 items in parallel if each has reliability .6?
(b) What is the reliability of 5 items in series if each has reliability .6?
2.2 Find the reliability of the circuits shown below. The reliability of each
component is shown with the component.
2.3 Find the reliability of the circuits shown below. The reliability of each
component is shown with the component.
2.4 The following figure is a highway network. Consider each of the 5 links to
function independently of the others.
(a) Assume that for each link the probability is .9 that the link is open.
What is the probability that you can get from “start” to “finish”? (As an
instructive exercise and a check of your answer, it might be worthwhile
to list a sample space for this network. Since there are 5 links (edges),
there are 2^5 = 32 possible states (outcomes). One such state, for
example, would be to have link A be good, B good, C bad, D bad,
and E good. By the assumed independence of the links, the
probability of this state is

P(A ∩ B ∩ C^c ∩ D^c ∩ E)
= P(A) P(B) P(C^c) P(D^c) P(E)
The desired probability then can be obtained by just adding up the
probabilities of all of the 32 states for which it is possible to get from
“start” to “finish.”)
(b) Now drop the assumption that each link has reliability .9 and simply
write the reliability of the network in terms of the reliabilities of the 5
links.
[Figure: highway network with five links A, B, C, D, and E connecting “start” to “finish.”]
2.5 Find the probability that a route is open from the source to the sink in the
network that follows. The reliability of each edge is listed alongside the
edge.
[Figure: network with source and sink; edge reliabilities shown.]
2.6 Examples 2.1 and 2.2 show how the reliabilities of simple series and parallel
circuits are expressed in terms of the reliabilities of the individual
components. Show how the failure probability (the unreliability) for the
circuits in Figure 2.1(a, b) can be expressed in terms of the failure
probabilities of the individual components. Let qA, qB, and qC denote the
failure probabilities of components A, B, and C, respectively.
2.7 Show how the unreliability of the circuit in Figure 2.1(c) can be expressed
in terms of the failure probabilities (the unreliabilities) of the individual
components. (If you have already done Problem 2.6, the meaning of this
should be clear.)
2.8 The circuit in the following picture shows a battery, a light, and two
switches for redundancy. The two switches are operated by different
people, and for each person there is a probability of .9 that the person will
remember to turn on the switch. The battery and the light have reliability
.99. Assuming that the battery, the light, and the two people all function
independently, what is the probability that the light will actually turn on?
[Figure: circuit with a battery, two switches in parallel, and a light.]
2.9 In the figure below, the probability that the battery is good is p4, the
probability that the light bulb is good is p5, and the probabilities that the
resistors R1, R2, and R3 are good are p1, p2, and p3, respectively.
is white, let B be the event that the two items drawn are of the same
color, and let C be the event that the items are drawn from box 1.
Show that A and B are independent events, but not conditionally
independent relative to C and C^c in the above sense. (This is easy to
check with a tree diagram, but you should also try to understand
intuitively why A and B are independent and why they cease to be
independent once we know what box we are drawing from.)
(b) Suppose again that box 1 contains 2 red and 1 white items and box 2
contains 1 red and 2 white items. Again randomly choose one of the
boxes and draw two items, only this time replace the first item before
drawing the second. Let A be the event that the first item is red, let B
denote the event that the second item is red, and let C be the event that
the items are being drawn from box 1. Show (via a tree diagram) that
A and B are conditionally independent relative to C and C^c, but that
A and B are not independent. This says that A and B become
independent once we know which box we are drawing from, but in the
absence of this knowledge, A and B are dependent.
2.13 Fill in the details of the calculation in Example 2.5. Suppose that each of
the links shown in the original network G0 has reliability .9. Determine
then, with the assumption that the links function independently, what the
probability is that the source and sink shown as the black nodes in network
G0 will be able to communicate.
2.14 Complete Example 2.6. Assume that all links have reliability .9 and that the
links function independently. What is the probability that every node will
be able to communicate with every other node?
2.15 Draw a fault tree for each of the circuits of Problems 2.2 and 2.3. First
construct a fault tree in which the top event is the event that the circuit is in
working order, and then give the dual fault tree in which the top event
represents the failure of the circuit. (See Example 2.7.)
Chapter 3: Random Variables
complicated sample spaces and instead do all our work with real numbers where
everything is simpler.
The common practice in studying probability is to use uppercase letters from
near the end of the alphabet such as X, Y, or Z to denote random variables.
While this may seem strange at first, it does enable us to reserve common letters
such as f, g, and so on, for use in the conventional calculus sense, that is, as real-
valued functions of a real variable.
Example 3.1. A pair of dice is rolled. Let X denote the random variable
that indicates the sum on the two dice. Using the usual 36-element sample space to
represent the experiment, the value that X takes on for the outcome (2, 4) is 6.
The domain of the random variable X is the 36-element sample space, and the
range of X is the set of numbers {2, ..., 12}. One might indicate the action of
X on (6, 4), for example, by writing
X(6, 4) = 10
(One can of course think of functions in terms of inputs and outputs. In the case of
the random variable X, if we input the element (6, 4) from the sample space S,
then X outputs the value 10.)
Example 3.2. A room full of people is considered as the sample space. The
random variable Y gives the weight of each person; that is, when a given person is
“input” to Y, the random variable Y “outputs” the weight of the person.
Example 3.3. A single die is tossed. Consider the random variable that
indicates the number that appears on the die. Here it is natural to identify the
outcomes of the experiment with the numbers 1, ..., 6; that is, it is reasonable to
consider the sample space to be the set of numbers S = {1, 2, 3, 4, 5, 6}. If one
does this, then X is simply the identity function on S; that is, X(w) = w for
each element w ∈ S. This makes sense in this case, because S is a set of
numbers.
Example 3.4. A coin is tossed twice. Let X be the number of heads that
occurs. If we take S = {HH, HT, TH, TT}, then X(HH) = 2, X(HT) = 1,
X(TH) = 1, and X(TT) = 0. This is similar to Example 3.1.
Someone may take the following view, however. If the number of heads is
all we are interested in, why not record just that information; that is, why not
consider the sample space to be just S = {0, 1, 2}, where 0 represents the outcome
of no heads, 1 represents the outcome of 1 head, and 2 represents the outcome of 2
heads? In this way we can, in effect, consider the random variable that counts the
number of heads to be simply the identity function X(w) = w on this sample
space.
This latter viewpoint is not really as different as it might first seem. We shall
soon see that what is really needed in order to understand a random variable is
knowledge of the probabilities of the random variable assuming specific values.
Furthermore, the above interpretations are consistent in this respect if consistent
probability measures are used on each of the two possible sample spaces. For
example, the probability of the event {w: X(w) = 1} is 1/2 in either case. In the
former sample space, this event is {HT, TH}. In the latter it is {1}. The latter
sample space is not a uniform one since two of the three outcomes listed have
probability 1/4 and the other 1/2.
Example 3.5. Four bits of binary data are transmitted, so the sample space is
that of Example 1.8. Consider Z to be the random variable that assigns to any 4-
bit pattern the number of 1’s appearing among the 4 bits. So
Z: 0110 ↦ 2

If we identify the sample space here with the numbers 0, ..., 15, then the random
variable is simply assigning to each number the number of 1’s that appear in its
binary representation.
Example 3.6. Five coins are tossed. The random variable Y indicates how
many heads are obtained. Here Y acts on a uniform sample space of 32 = 2^5
possible outcomes for the experiment and Y assumes values {0, 1, 2, 3, 4, 5}.
Example 3.7. Lines of text are being transmitted. Each line is 80 characters.
We will suppose that for formatting purposes it is necessary to know how many
blank spaces are included in each line. Furthermore, we assume that characters
may consist of the 26 letters, 10 digits, or 6 different punctuation symbols, for a
total of 42 different characters.
The number of different possible lines that could be transmitted would then be
equal to 42^80. If we take this set of all possible lines as our sample space and if we
wish to count the blank characters, then we should think of the random variable
defined on this sample space that simply assigns to each possible line the number of
blank characters within the line. Clearly the values that this random variable can
assume are restricted to the counting numbers from 0 (no blanks in a line) to 80 (the
whole line being made up of blank spaces).
Example 3.8. A resistor is put into a circuit and the length of time X it lasts
before burning out is measured. This situation is a bit more vague since it is not
clear what the “possible outcomes” are. As a first effort one might try to visualize
some hypothetical batch of resistors as the sample space. Since it is “time to
burnout” that we are interested in, however, it is perhaps more useful to think of the
sample space as a set of numbers representing possibilities for the time to burnout.
In fact, one could construe any non-negative length of time as being hypothetically
possible, in which case the random variable could take on all non-negative values.
With this point of view, the random variable X is simply the identity function, in
other words X(w) = w for all w ≥ 0. Clearly in this example we have no guess
as to how to treat probabilities on such a sample space. That will come later.
It is useful to have a short notation for certain events associated with random
variables. For instance, in Example 3.1, the event {w : X(w) = 4} is simply the
set {(1, 3), (2, 2), (3, 1)}. The notation {w : X(w) = 4} is often shortened to
(X = 4). Similarly, {w : X(w) < 9} is shortened to (X < 9). Other similar
situations are handled in a like manner. And finally, the probabilities of the two
events (X = 4) and (X < 9) are usually written as P(X = 4) and P(X < 9).
Example 3.9. If X is the sum when a pair of dice is rolled, then X has
range equal to the set {2, ..., 12}. So p_X(t) takes on non-zero values only
when t ∈ {2, ..., 12}. For example, p_X(5) = 4/36 = 1/9, whereas p_X(13) = 0.
Example 3.10. If X is the number of 1’s that appear when 6 random binary
digits are transmitted, then p_X(t) ≠ 0 only when t ∈ {0, 1, 2, 3, 4, 5, 6}. If we
think of the stream of digits as an independent trials process, then the values of the
probability mass function are easily obtained from the independent trials formula.
For example, the probability of exactly two 1’s being transmitted among the 6 digits
is p_X(2) = C(6, 2) (1/2)^2 (1/2)^4.
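The independent trials formula is easy to evaluate on a machine; here is a one-function Python check (purely illustrative):

    from math import comb

    def binomial_pmf(k, n, p):
        # P(X = k) for the number of successes in n independent trials
        return comb(n, k) * p**k * (1 - p)**(n - k)

    print(binomial_pmf(2, 6, 0.5))      # 15/64 = 0.234375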
Solution: If we let X denote the number of 1’s among the 10 digits, then
what we have to compute is
One can intuitively think of a continuous random variable as one that takes on
values in some interval of the real line. For example, many continuous random
variables have as their range of possible values the non-negative real numbers. The
concept of the probability mass function does not apply to random variables of this
type. The analogous concept is that of the density function of the random variable.
Fundamentally, when dealing with continuous random variables, one has to ask a
different kind of question. Rather than asking for the probability that the random
variable assumes a specific value, we ask instead what is the probability that it
assumes a value in some interval of numbers. The role that the probability mass
function plays in the discrete case is played by the density function in the
continuous case.
∫_{−∞}^{∞} f(t) dt = 1
The analogy between this definition and the definition of the probability mass
function in the discrete case is that the requirement that the integral equals 1 in
Definition 3.3 corresponds to the fact that the sum of the (nonzero) values of a
probability mass function is 1. Whereas each discrete random variable has a
probability mass function, every continuous random variable has a density
function. The relationship between a continuous random variable and its density
function is given in Definition 3.4.
The values a = −∞ and b = ∞ are permitted here, in which case this
becomes an improper integral.
Example 3.13. Perhaps the most intuitively natural density function is the
function f defined by f(t) = 1 for 0 ≤ t ≤ 1 and f(t) = 0 otherwise. If a and b
are two numbers with 0 ≤ a < b ≤ 1, then

P(a ≤ X ≤ b) = ∫_a^b f(t) dt = ∫_a^b 1 dt = b − a
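A quick simulation makes the computation concrete. The sketch below (an illustration only, with an arbitrary choice of a and b) estimates P(a ≤ X ≤ b) by drawing many uniform random numbers:

    import random

    a, b, n = 0.25, 0.6, 100_000
    hits = sum(1 for _ in range(n) if a <= random.random() <= b)
    print(hits / n)      # close to b - a = 0.35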
Example 3.14. Figure 3.1 shows the density functions for three random
variables. Suppose X1 has f1 as its density function and X2 has f2. Since both
functions vanish off the interval [0, 1], we know that P(X1 < 0) = 0 and also that
P(X1 > 1) = 0, and similarly for X2. In other words, X1 and X2 are random
variables that assume values between 0 and 1. The graph of f1, however, is higher
toward the right end of this interval, and the graph of f2 is higher toward the left
end. This tells us that values for X1 are more probable near 1 than near 0, and vice
versa for X2. As a sample computation to illustrate this, let’s compute the
Example 3.14 illustrates that a simple glance at the density function gives
much information about the random variable. A random variable having the
function g at the bottom of Figure 3.1 as its density function would be one that
assumes values between 2 and 5 with values around 3 being most probable.
[Figure 3.1: Three density functions: f1(t) = 2t for 0 ≤ t ≤ 1 (0 otherwise), f2(t) = 2 − 2t for 0 ≤ t ≤ 1 (0 otherwise), and a density g concentrated on the interval [2, 5] with its peak near 3.]
To see this, simply think of taking a small interval of radius δ centered at t0. Then

P(X = t0) ≤ P(t0 − δ < X < t0 + δ) = ∫_{t0−δ}^{t0+δ} f_X(t) dt

and the fact that f is an integrable function guarantees that the integral on the right
tends to zero as δ → 0. Since P(X = t0) is less than or equal to this quantity for
every positive δ, P(X = t0) must equal zero.
Proposition 3.1 includes the observation we have just made as well as a
useful additional consequence of this fact.
The explanation for the latter part of Proposition 3.1 is simple. For example,
P(a ≤ X < b) = P(a < X < b) + P(X = a)
             = P(a < X < b) + 0 = P(a < X < b)
The other parts may be proved in a similar way.
Other examples of commonly occurring density functions will be given in
Chapter 5. Before looking at continuous models, however, it is a good idea to get a
better understanding of the mathematical implications of the relation between a
random variable and its density function. It is important to understand some
parallels between discrete and continuous random variables, and one way to
understand the parallelism is in terms of a third way of representing the “probability
distribution” of a random variable: the cumulative distribution function.
[Figure: a distribution function that is a step function, with jumps of 1/8 and 3/8 at values of the random variable.]
The similarity between the discrete case and the continuous case is illustrated
by Figure 3.3. The integrals encountered when working with continuous random
variables replace the sums that appear in the discrete case, and the density function
replaces the probability mass function.
[Figure 3.3: the parallel between the discrete and continuous cases.

P(a ≤ X ≤ b):   Σ_{a ≤ t_k ≤ b} p_X(t_k)   (discrete)      ∫_a^b f_X(t) dt   (continuous)

F_X(t):         Σ_{t_k ≤ t} p_X(t_k)       (discrete)      ∫_{−∞}^{t} f_X(s) ds   (continuous)]

F_Y(s) = ∫_0^1 2t dt = 1   if s > 1
A little should be said about the “not quite uniqueness” of the density function
for a random variable. Since the relation between a random variable and its density
function depends solely on integrating the density function, strictly speaking the
density function is not unique. (Two different functions can be density functions
for the same random variable.) To see this, simply think about changing the value
of the density function at some finite set of numbers. Making such a change would
not change the value of the integral of the function over any interval. For example,
in the case of the uniform density on the interval [0, 1], it doesn’t matter whether we
say f(t) = 1 on the closed interval 0 ≤ t ≤ 1 or the open interval 0 < t < 1. The
only difference in the two involves the value of f(t) at the endpoints 0 and 1, and
the value assigned at these two places will not affect the value of the integral of f
over any subinterval of the real line. These observations notwithstanding, it is a
common practice to speak of the density function of a random variable with the
realization that this is a slight abuse of language.
When dealing with continuous random variables, Proposition 3.1 says that
one can disregard the endpoints of intervals since the probability of the random
variable assuming a value in a given interval is the same regardless of whether the
endpoints are included or not. With discrete random variables this is not the case.
For a discrete random variable X, P(a ≤ X ≤ b) and P(a < X < b) are not the
same in general, the reason being simply that there is no longer any guarantee that
P(X = a) and P(X = b) are 0. In fact, it is easy to see that P(a ≤ X ≤ b) =
P(a < X < b) if and only if P(X = a) = P(X = b) = 0.
value is E(X) = 3.5, which is just the average of the numbers 1 to 6 that constitute
the values of X.
If Y is the number of defects in a sample of two devices drawn from a batch
of 5 of which 2 are defective, then it is easy to check (using a tree diagram) that
P(Y = 0) = .3, P(Y = 1) = .6, and P(Y = 2) = .1. The expected value of Y is
defined as

E(Y) = 0(.3) + 1(.6) + 2(.1) = .8
This example illustrates that the expected value of a random variable is indeed a
weighted average, where the weights are simply the probabilities of the respective
outcomes. The rationale is that if an experiment were conducted a large number of
times, each leading to an “observed value” of the random variable and if each value
assumed by the random variable were to be “observed” exactly as frequently as
predicted by the probability mass function, then this theoretical weighted average
would indeed match the true average of all the observed results of the sequence of
experiments. The fact that E(Y) = .8 in this simple illustration reflects the fact that
if the experiment were repeated a large number of times and if no defects were
found 30% of the time, 1 defect was found 60% of the time, and 2 were found 10%
of the time, then the average number found would be .8. This idea is incorporated
in Definition 3.6.
If X assumes infinitely many values, then there is no guarantee that the series
in the above definition converges. If the series fails to converge, then the mean of
the random variable is not defined. As a matter of fact, when one speaks of a
random variable having an expected value, it is generally understood that not only
does the series above converge, but that it converges absolutely.
If we keep in mind that for continuous random variables integration replaces
summation, the corresponding definition of the expected value is

E(X) = ∫_{−∞}^{∞} t f(t) dt
Just as with the sum in the discrete case, there is no guarantee that the
improper integral here will converge. Thus when one says that a given continuous
random variable has a mean or expected value, the statement implies that the integral
converges, and again common usage is to interpret this as meaning that the integral
actually converges absolutely.
E(X) = ∫_0^1 t f(t) dt = ∫_0^1 t · 2t dt = (2/3) t³ ]_0^1 = 2/3
Proposition 3.4: If X and Y are random variables (with finite expected value)
defined on the same sample space and if a and b are real numbers, then

E(aX + bY) = a E(X) + b E(Y)
For discrete random variables the proof is a direct computation:

E(aX + bY) = Σ_k (aX + bY)(w_k) P({w_k})
           = a Σ_k X(w_k) P({w_k}) + b Σ_k Y(w_k) P({w_k})
           = a E(X) + b E(Y)
X = X1 + X2 + · · · + Xn
Definition 3.8: If X is a random variable with expected value μ_X, then the
variance of X is defined by

var(X) = E[(X − μ_X)²]

The standard deviation of X is the square root of var(X). It is common practice
to denote the standard deviation of X by σ_X.
Whereas the mean represents the theoretical average value, the variance
indicates the extent to which values of the random variable tend to concentrate
around the average (leading to a small variance) or fluctuate greatly (leading to a
large variance). The variance or standard deviation is used as a measure of how
“spread out” the values of a random variable are. If the values of X tend to cluster
tightly around the mean μ_X, then the variance of X will be small. On the other
hand, if the values of X vary widely with high probability, then the variance will
be large. It is of course possible that (X − μ_X)² does not have finite expected value,
which is just to say that not all random variables have a finite variance.
Definition 3.8 is useful in that it gives a good intuitive understanding of what
it is that the variance measures. There is an alternate way of computing the
variance, however, that is often more convenient. This alternative is given in
Proposition 3.5. Proposition 3.5 says that if we know the mean, then all we need
to know additionally is E(X?).
= a² E(X²) − a² E(X)²
= a² var(X)
While most random variables encountered in practice fall into one of the
categories discrete or continuous, it is not hard to visualize situations where a
mixture of the two might occur. First, let’s examine a situation that leads to a mix
of two continuous random variables of the type found in Problem 3.14.
Solution: We will denote by X the random variable that gives the time to
failure for the randomly selected device. What we must do is to partition the sample
space according to the manufacturer of the chosen device. If we think of E and F
as the following two events,

E = event that device is made by manufacturer A
F = event that device is made by manufacturer B

then for any t > 0,

P(X > t) = P(E) P(X > t | E) + P(F) P(X > t | F)
         = .4 e^{−.25t} + .6 e^{−.1t}
The exponential terms in this expression come from the simple calculation
called for in Problems 3.14 and 3.15. (By Problem 3.15, the expected value of an
exponential random variable is the reciprocal of the parameter A in the distribution.
The reciprocal of 4 is .25 and the reciprocal of 10 is .1). Thus the distribution
function for X is given by

F_X(t) = 1 − .4 e^{−.25t} − .6 e^{−.1t}   when t > 0

The density function then is

f_X(t) = .1 e^{−.25t} + .06 e^{−.1t}   for t > 0
Problem 3.20 asks you to find the expected value for the random variable X,
that is, the expected time to failure for the randomly chosen component under
discussion.
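The expected value asked for in Problem 3.20 can be checked numerically. By conditioning on the manufacturer it should be .4(4) + .6(10) = 7.6, since the two exponential lifetimes have means 4 and 10. The following Python sketch (the cutoff at t = 200 and the step size are arbitrary choices made for the illustration) integrates the density by the trapezoidal rule:

    from math import exp

    def f(t):
        # the density of Example 3.18
        return 0.1 * exp(-0.25 * t) + 0.06 * exp(-0.1 * t)

    n, upper = 200_000, 200.0             # the tail beyond 200 is negligible
    h = upper / n
    mass = (f(0) + f(upper)) / 2 * h + sum(f(i * h) for i in range(1, n)) * h
    mean = sum((i * h) * f(i * h) for i in range(1, n)) * h
    print(mass)    # about 1.0, confirming f is a density
    print(mean)    # about 7.6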
F_X(t) = 1 − P(X > t) =
    .4(1 − e^{−t})   if t < 10
    1                if t ≥ 10
The graph of this distribution function is shown in the following picture.
From the form of the equations defining F_X(t), it is apparent that there is going to
be a jump discontinuity of magnitude .6 at t = 10.
[Figure: the graph of F_X, with a jump of magnitude .6 at t = 10.]
If you bear in mind that a discrete random variable always has a step function
as its distribution function and that a continuous random variable always has a
continuous function as its distribution function, it is apparent here that we are
looking at a distribution function that is of neither type. In particular, there will be
no probability mass function or density function for this random variable. For
distributions of this “mixed” type, the cumulative distribution function is the only
tool available in computations.
E(g(X)) = ∫_{−∞}^{∞} g(t) f_X(t) dt
Example 3.23. Let’s consider taking the square root of a random number
between 0 and 1. Denote by X the random variable representing the random
number between 0 and 1, and the assumption is that the probability distribution of
X is the uniform distribution on [0, 1] as in Example 3.13. Let’s denote Y = √X,
in which case Y = g(X), where g is the function g(t) = √t.
According to Proposition 3.7,
E(Y) = ∫_{−∞}^{∞} √t f_X(t) dt = ∫_0^1 √t dt = 2/3
In this example it is not difficult to work out the density function for Y and
find the value of E(Y) from Definition 3.7. In fact, for any t ≥ 0,

F_Y(t) = P(Y ≤ t) = P(√X ≤ t) = P(X ≤ t²) = F_X(t²)
In Problem 3.2 the distribution F_X is determined. From that exercise it follows
that
F_Y(t) =
    0    if t ≤ 0
    t²   if 0 < t < 1
    1    if t ≥ 1
Differentiation of Fy gives the density function for Y:
f_Y(t) =
    2t   if 0 < t < 1
    0    otherwise
E(Y) = ∫_{−∞}^{∞} t f_Y(t) dt = ∫_0^1 t(2t) dt = 2/3
Except in elementary cases, such as the above, it is often not easy to
determine the density function for Y = g(X) even if g is simple and X has an
easily described density function. Proposition 3.7 is very valuable in enabling
computation of E(Y) without needing to know the density function for Y.
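A simulation offers a third route to the same number and shows why Proposition 3.7 is so convenient: the density of Y is never needed. (Python, illustrative only.)

    import random

    n = 100_000
    samples = [random.random() ** 0.5 for _ in range(n)]   # Y = sqrt(X)
    print(sum(samples) / n)       # close to E(Y) = 2/3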
Example. Suppose X has the density function f of Problem 3.19(a). What is
the variance of X?
Solution:
Problems
3.1 Graph the cumulative distribution function for the random variable that
counts the number of heads in four coin tosses.
3.2 Suppose X is a random variable with density function equal to the uniform
density function on the interval [0, 1] as in Example 3.13. Determine the
distribution function of X and sketch its graph.
3.3 Determine the probability mass function and sketch a graph of the
distribution function for the random variable that gives the largest number
appearing when a pair of dice is rolled.
3.7 If X has the density function f(t) = 3e^{−3t} for t ≥ 0, with f(t) = 0
whenever t < 0, find P(X > 1) and P(1 < X < 2).
3.8 Show that the function defined by f(t) = (1/2)e^{−|t|} for all t is a density
function, and for a random variable X having this density function, find
P(|X| < 1).
3.9 Suppose f(t) = c(4 − t²) for −2 < t < 2, with f(t) = 0 otherwise.
Determine the value that c must have in order for f to be a density
function.
3.10 A word is chosen at random from the following list of words: horse, dog,
cow, elephant, pig. (“At random” in this context means that each word
has an equal chance of being chosen.) Let X be the random variable that
tells the number of vowels in the word. Determine the probability mass
function for X, sketch the graph of F_X, and find E(X).
3.14 Show that if X has the density function

f(t) = λ e^{−λt}   for t > 0   (and f(t) = 0 for t ≤ 0)

where λ > 0, then P(X > t) = e^{−λt}.
3.15 Show that the expected value of a random variable having the density
function of Problem 3.14 is E(X) = 1/λ.
3.16 Suppose X is uniform on [0, 1]. Determine and sketch a graph of the
density function and the distribution function of Y = 2X + 3.
(Notice that the first of these two functions is unbounded, so the integral in
Definition 3.3 must be interpreted as an improper integral.)
3.20 Verify that the function defined by f(t) = .1e^{−.25t} + .06e^{−.1t} for all
numbers t > 0, with f(t) = 0 for t ≤ 0, is a density function and find the
expected value of a random variable having such a density function. (This
is the final step in Example 3.18.)
Chapter 4: Discrete Models

This chapter will demonstrate how some of the most common discrete
probability distributions are used to model real-world phenomena. Two of these
have been presented in Chapter 1, though not by name. They are the binomial
distribution and the geometric distribution. Both are intimately involved with
independent trials processes.
4.1 The Binomial Distribution
The first part of this proposition is extremely plausible. If 10 coins are tossed,
the expected number of heads is 5. (Here n = 10 and p = 1/2.)
Some texts provide proofs for this proposition that depend on rather tedious
observations involving binomial coefficients. It is more instructive to focus on the
relationship between the first equation E(X) = np and the linearity of expected
value, Proposition 3.4.
Consider n trials in an independent trials process with success probability p.
It is useful to introduce random variables X1, ..., Xn to indicate whether
success occurs on each of the n trials. Specifically, let X1 be the random variable
that assigns value 1 to any outcome of the sequence of trials in which success
occurs on the first trial and assigns value 0 to any outcome in which failure occurs
on the first trial. Similarly, X2 = 1 if success occurs on the second trial and X2 =
0 if failure occurs on the second trial, etc.
Now if we look at X = X1 + · · · + Xn, the random variable X will tell how
many successes occur in the n trials. If you are at all confused about what is going
on here, it might be a good idea to consider a special case, such as n = 3. In that
case the sample space can be represented by a tree diagram or can be viewed as
{SSS, SSF, SFS, SFF, FSS, FSF, FFS, FFF} where S is success and F is
failure. Give the value of X1, X2, and X3 for each of the eight elements of the
sample space, and make sure you understand that X = X1 + X2 + X3 is indeed
the number of successes in the three trials.
By the linearity property of expected value (Proposition 3.4), the expected
value of X is E(X) = E(X1) + · · · + E(Xn). However, for each k from 1 to
n, Xk assumes only the values 0 and 1, and E(Xk) = P(Xk = 1) = p. Thus
E(X), being a sum of n terms each of which is equal to p, is in fact np.
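The indicator decomposition is easy to watch in action. The short simulation below (illustrative Python, with the arbitrary choices n = 10 and p = 1/2) generates many runs of n trials, counts the successes in each run, and averages:

    import random

    n, p, reps = 10, 0.5, 100_000
    total = 0
    for _ in range(reps):
        # X for one run: the sum of the indicator variables X1, ..., Xn
        total += sum(1 if random.random() < p else 0 for _ in range(n))
    print(total / reps)       # close to n*p = 5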
A short proof of the second part of Proposition 4.1 will be given in Chapter 6
as an application of the fact that variance is additive for “independent” random
variables. (The concept of independence of random variables is introduced in
Chapter 6.)
4.2 The Geometric Distribution
Definition 4.2: A random variable X that assumes positive integer values is said
to be geometric with parameter p provided that for each positive integer k
P(X = k) = q^{k−1} p

where q = 1 − p.
variable. This alternate way of modeling the waiting time for a flood to occur will
be considered in the next chapter.
An alternative and intuitively helpful way to observe that P(X > n) = q^n is
to think of X as modeling waiting time for success in an independent trials
process. Then P(X > n) is simply the probability that more than n trials are
required, that is, that the first n trials result in failure. However, the probability
that the first n trials result in failure is just q^n.
If k > 0 then the intersection of the events (X > n) and (X = n + k) is the
event (X = n + k), since (X = n + k) is a subset of (X > n). Thus

P(X = n + k | X > n) = P(X = n + k) / P(X > n)
                     = p q^{n+k−1} / q^n = p q^{k−1} = P(X = k)
Let’s interpret the meaning of this property in light of Example 4.1. Suppose
in that example that a flood does not occur during the first five years. The
conditional probability that the first flood occurs in year 12, based on this
information, is then exactly the same as the initial (unconditional) probability that a
flood would occur in year 7. In other words, there is no penalty for the 5 years of
good luck. The conditional probabilities after the 5 years of good luck are the same
as if we think of the process as starting all over again.
An example that is simpler still is the independent trials process consisting of
repeated tosses of a coin. If we think of the random variable X that indicates the
number of the trial on which the first head occurs, then X is geometric (with
parameter p = 1/2 if the coin is unbiased). The lack of memory property says, for
example, that if tails occur on the first two tosses, then the probability that the first
head occurs on the fifth toss is the same as the original probability that the first
head occurs on the third toss. In other words, you can forget about the first two
tosses that have resulted in tails and think of the whole process as starting all over
again. This mathematical model is consistent with most people’s intuitive feeling
that a coin doesn’t have a memory, and it provides a little more evidence supporting
the choice of an independent trials model as the “correct” one for repeated coin
tossing.
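The lack of memory property can also be checked directly from the formulas P(X = j) = q^{j−1} p and P(X > m) = q^m. Here is an illustrative Python check of the coin example just described:

    p = q = 0.5
    pmf = lambda j: q ** (j - 1) * p     # P(X = j)
    tail = lambda m: q ** m              # P(X > m)

    # P(X = 5 | X > 2) versus P(X = 3): both should be (1/2)**3 = .125
    print(pmf(5) / tail(2))              # 0.125
    print(pmf(3))                        # 0.125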
Proof: By definition,
It is a pleasant fact from calculus that power series may be differentiated term by
term. This means that if
f(x) = Σ_{n=0}^{∞} a_n x^n

then f′(x) = Σ_{n=1}^{∞} n a_n x^{n−1}. In particular, if

f(x) = 1 + x + x² + · · · = 1/(1 − x)

then

f′(x) = 1 + 2x + 3x² + · · · = 1/(1 − x)²
This implies that the infinite series that defines E(X) in the first
line of the proof does sum to p × (1 − q)^{−2} = p/p² = 1/p.
Example 4.3. Proposition 1.1 in Chapter 1 is valid even when the sample
space is partitioned into an infinite sequence of mutually exclusive events
= (1/2)(1/2) + (1/4)(1/4) + (1/8)(1/8) + · · ·
= r + r² + r³ + · · ·   where r = 1/4
= r(1 + r + r² + · · ·)
= (1/4)(4/3) = 1/3

Notice that we have used the facts that

P(X < Y | X = 1) = 1/2
P(X < Y | X = 2) = 1/4
P(X < Y | X = 3) = 1/8
and so on, in the derivation of the above series. Do you understand the basis for
this? See Problem 4.17.
4.4 Poisson Random Variables
The Poisson distribution is less easily motivated than those considered to this
point. It is commonly used to model situations that involve some kind of random
phenomena such as malfunctions of equipment, calls coming in to a switchboard,
cars entering a parking lot, etc. In each of these situations, the Poisson distribution
is used to model the number of times the phenomenon occurs during a fixed time
interval. For example, the number of calls received at a telephone switchboard
during a 30-minute time interval might be assumed to be a Poisson random
variable. A more thorough discussion of these applications will come after the
discussion of Poisson processes in Chapter 7.
To note that these probabilities sum to 1 requires remembering the usual series
expansion for the exponential function:

e^x = Σ_{n=0}^{∞} x^n / n!     (4.1)

Replacing the x in this expansion by λ, we can see that the reason that the e^{−λ}
factor is present in the definition of the Poisson mass function is simply to make the
probabilities sum to 1.
Example 4.4. Suppose the number of cars entering a certain parking lot
during a 30-second time period is known to be a random variable having a Poisson
mass function with parameter λ = 5. What is the probability that during a given 30-
second period exactly 7 cars will enter the lot? What is the probability that more
than 5 cars enter the lot during this time period?
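Both questions come down to evaluating the Poisson mass function, which is routine on a machine; an illustrative Python sketch:

    from math import exp, factorial

    def poisson_pmf(k, lam):
        # P(X = k) = e**(-lam) * lam**k / k!
        return exp(-lam) * lam**k / factorial(k)

    lam = 5
    print(poisson_pmf(7, lam))                              # about .104
    print(1 - sum(poisson_pmf(k, lam) for k in range(6)))   # P(X > 5), about .384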
The proof of the first part of Proposition 4.4 depends only on the series
representation for the exponential function, Equation 4.1. (See Problem 4.9.) The
derivation of the variance of the Poisson distribution requires more manipulation
with power series and is a more difficult computation. (See Problem 4.18.)
Example 4.6. Assume now that 17 items are to be tested from a batch of 100
items of which 5% are defective. What is the probability that exactly 2 in the
sample of 17 will be defective?
Solution: The question that has to be decided first is this: What are we going
to do with an item after we test it? Are we going to lay it aside, or are we going to
put it back in with the others so that the same item might get tested again? (In the
latter case we would not be testing 17 different items necessarily.)
First let’s suppose that we lay items aside once they are tested, so that there is
no chance of testing the same item twice. In this case 17 items are being chosen
from the batch of 100, of which 95 are good and 5 defective. If X is the number
of defectives in the sample of 17, then X is a random variable having a
hypergeometric distribution with N = 100, n = 17, and M=5. And
P(X = 2) = C(5, 2) C(95, 15) / C(100, 17)
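This quotient of binomial coefficients is tedious by hand but immediate on a machine; an illustrative check in Python:

    from math import comb

    N, M, n, k = 100, 5, 17, 2
    print(comb(M, k) * comb(N - M, n - k) / comb(N, n))   # about .166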
Often it is useful to study the distribution of values in some fixed data set. The
following example illustrates the relation between this idea and the concept of a
probability distribution. One can, of course, divorce the data set analysis from
probability altogether, but the idea of a histogram, or frequency distribution, is in
fact quite similar to the ideas of the probability mass function and the density
function.
6    1
7    0
8    1
There are two important observations to make about this simple situation. One
is that the decision to include or exclude the outcomes 5 and 7 shown in the table is
arbitrary. If they are included, they will be assigned probability 0. However,
having them included with probability 0 serves no useful purpose. Rather than
include them in the sample space with assigned probability 0, it is simpler to leave
them out. In reality, there is no more reason to include the outcomes 5 and 7 in the
sample space with assigned probability zero than there is to include the values 9 and
10 or any other numbers. The only thing special about 5 and 7 is that they happen
to be included in the table of data above.
The second important point in this example is that a probability model based on
this data does not necessarily have predictive value for other data. For example, if
this data were accumulated for items made at one plant, it does not mean necessarily
that similar equipment made at another plant would have the same failure rates. Nor
does it mean that equipment from the same plant made during a different time
period would have similar failure rates. The fact that 5 or 7 failures were never
observed during the period that the table covers does not mean that the next piece of
equipment made at the plant will not fail 5 or 7 times during the first year.
Different data sets may or may not reflect a similar distribution. An analysis of
the “correlation” between data sets is a question for statistics. The important
probabilistic concept is that the probabilities introduced in this example to describe
this data set do only precisely that: The probabilities describe the probability
distribution of the given data set.
The following figure is a histogram. Histograms are frequently used to
display frequency distributions. In the figure the values 5 and 7 are shown only to
maintain the linearity of the horizontal scale. Notice that the idea of a histogram is
basically the same as the idea of displaying the probabilities of the various
outcomes. If the values on the vertical scale were divided by 250, the heights of the
bars would be the probabilities of the outcomes. This would then be a visual
representation of the probability mass function of the random variable that gives the
number of failures during the first year of operation for the equipment described by
the data set. (This would not technically be a “graph” of the probability mass
function because the mass function satisfies p(x) = 0 except for a few integer
values of x. However, it could be construed as the graph of the density function
Problems
4.1 Three cards are drawn without replacement from the 13 spades in a deck of
cards. Let X denote the number of face cards drawn. (The ace, king,
queen, and jack are the face cards.) Determine the probability mass function
for X.
4.3 A random variable X assumes values 1, 2, 3, ..., 100, and for each
integer k where 1 ≤ k ≤ 100, P(X = k) = .01. Using Σ notation, give a
sum that would evaluate each of the following:
(a) E(X)   (b) var(X)   (c) E(Y) where Y = 2^X
4.4 A coin is tossed 3 times. Let X denote the number of heads obtained. Let
Y be the absolute value of the number of heads minus the number of tails.
Compute E(Y) in each of the following ways:
(a) Write Y as a function of X; that is, determine g such that Y = g(X)
and then use Proposition 3.6.
(b) Determine the probability mass function for Y and compute E(Y)
directly from Definition 3.6.
4.7 Suppose that it is known that the number of components to fail in a complex
electrical device during a one-day time period is a random variable X having
a Poisson distribution with parameter λ = 5. What is the probability that
there will be exactly 3 failures next Tuesday?
4.8 Suppose it is known that a particular electrical device had 50 components fail
during the past 10 days. What is the probability that exactly 3 should have
failed yesterday? (The model you should use is that of an independent trials
process. Each of the 50 breakdowns could have occurred yesterday or on 1
of the other 9 days. Considering the days equally likely means that a
particular breakdown had a 1/10 chance of occurring yesterday and a 9/10
chance of occurring one of the other days.)
Notice that the answers to this problem and to Problem 4.7 are almost
identical. In each case we are looking at a random variable for which E(X)
= 5. (For the Poisson case, this is a consequence of Problem 4.9.) In fact
there is a very close relationship between the Poisson and the binomial
distributions which these problems illustrate. This relationship will be
explored in Chapter 7.
4.9 Show that if X is a Poisson random variable with parameter λ, then E(X)
= λ. [Hint: This is easy. Just write out the series that defines E(X) and
4.10 Determine the variance of the discrete uniform random variable that assumes
values 1, 2, 3, 4, and 5, each with probability .2.
4.12 A random variable X gives the voltage output from an acoustical transducer.
The voltage varies from —3.5 to +3.5 volts, but X measures the voltage as
rounded off to the nearest integer. Assuming that the probability distribution
for X is the discrete uniform distribution and that X assumes the values
{—3, -2, -1, 0, 1, 2, 3}, find the cumulative distribution function for X and
sketch its graph.
4.13 A “black box” transmits binary data in the form of 0’s and 1’s which go into
the box and are transmitted out according to the following probabilities.
Let’s denote by X and Y the random variables that give the input digit and
the output digit, respectively. The relation between X and Y is as follows:
P(Y = 0 | X = 0) = p0 and P(Y = 1 | X = 0) = 1 − p0. On the other
hand, if a 1 is input then the probability is p1 that a 1 is output and 1 − p1
that a 0 is output.
(a) If the probability is 1/3 that a given input digit is 0 and 2/3 that it is 1,
find the probability that the output digit is a 0 and the probability that it
is a 1. These will be expressed in terms of p0 and p1.
(b) If an output digit 1 is observed, what is the probability that the input
digit was a 0? (This is a conditional probability question. Continue to
assume that the unconditional probability of 0 as the input digit is 1/3.)
4.14 The number of letters mailed out from a business office in a day was
surveyed over a period of time. Results of the survey were as follows:
4            12
5            9
6            5
7            5
8            4
9            1
10 or more   4
Construct a probability distribution based on this data with the outcome “10
or more” considered as one of the elements of the sample space. Construct a
histogram for this data set, and sketch a graph of the distribution function of
the random variable that corresponds to the number of letters leaving the
office in a day.
Specifically, show that F_X(t) = F_Y(t) if t is an integer. Now compare the
graphs of F_X and F_Y.
4.16 Use the idea of a geometric series to show that for any geometric random
variable X, P(X = 1) + P(X = 2) + P(X = 3) + · · · = 1.
4.17 In Example 4.3, give an explanation for why P(X < Y | X = 1) = 1/2,
P(X < Y | X = 2) = 1/4, P(X < Y | X = 3) = 1/8, and so on.
4.18 Show that if X is a Poisson random variable with parameter λ, then
σ_X² = λ.
[Note: This problem should be attempted only if you are comfortable with
4.19 A motel owner has bought 7 television sets from UltraView TV Corporation.
If 26% of UltraView’s televisions have to be returned for repair during the
first year of operation, what is the probability that the motel owner will have
to return more than two of his sets? (Make sure that you know how to frame
your solution in terms of the binomial distribution.)
4.20 The motel owner of Problem 4.19 put 4 of his 7 televisions in private rooms
and 3 in the lobby. If 3 of them have to be returned for repairs during the
first year, what is the probability that 2 of the 3 to be repaired are ones that
were in the lobby? [Hint: Assume that the seven televisions are all equally
likely to fail. Frame your model here in terms of the hypergeometric
distribution. What are n, M, and N? Can you summarize explicitly what
assumption you are making about this question when you decide to model it
using the hypergeometric distribution?]
4.21 Suppose the motel owner of Problems 4.19 and 4.20 has 3 instances during
the year of a guest damaging a TV so that it must be repaired. What is the
probability that two of the repairs are to televisions in the lobby?
This question is ambiguous in that it is not explicitly stated whether the
same television may be damaged more than once. Common sense would
suggest that it might be; so let’s assume that each case of vandalism is
equally likely to happen to any of the 7 televisions. Which of the probability
distributions discussed in this chapter is your model then based on? What is
the relation between this problem and Problem 4.20?
4.22 Suppose you were going to sample the opinion of 200 registered voters in
the United States as a gauge of public opinion on some issue. Would it make
a significant difference whether you chose sampling with replacement or
sampling without replacement? Why or why not?
Chapter 5: Continuous Models
Continuous random variables arise when quantities are being observed that
may assume values throughout some interval of numbers. The necessary
information to do computations involving such random variables is carried by the
density function. In Chapter 3 we saw a few elementary examples of density
functions, and in this chapter we will see how some of the most common are used
to model real-world phenomena. Just as was the case with discrete models, the
skill that the modeler must bring to a problem involving continuous random
variables is the experience and intuitive understanding necessary to judge what kind
of distribution “fits” the situation.
A random variable that has this density function is said to be uniform (or
uniformly distributed) on the interval [a, b].
The reason f(t) must assume the value 1/(b — a) on the interval [a, b], of
course, is that the area under the graph is required to be 1, in accordance with
Definition 3.3. It should be apparent from the symmetry of this density function
that the mean of a uniform random variable is simply the midpoint of the interval
and that the variance is greater the longer the interval. (See Problem 5.1.)
Figure 5.1 shows two uniform density functions, the uniform density on [0,
1/2] and the uniform density on [3, 5]. The area under the graph (which is required
to equal 1) is shaded in each instance.
[Figure 5.2: Two exponential density functions, one with parameter λ = 2 and one with parameter λ = 1/2.]
P(X > w) = ∫_w^∞ λ e^{−λt} dt = e^{−λw}
(This computation has appeared previously as Problem 3.14.) From this, it follows
that P(X > s + t) = e^{−λ(s+t)}, and so

P(X > s + t | X > t) = e^{−λ(s+t)} / e^{−λt} = e^{−λs} = P(X > s)
Solution: From Problem 3.14, if s > 0 then P(X > s) = e^{−λs}. Thus
that the first failure occurs after month 30” or as being “what is the probability that
the first failure occurs during or after month number 30?”
You might be interested to know that Proposition 5.1 characterizes
exponential random variables. In other words, a continuous random variable that
has the lack of memory property exhibited in Proposition 5.1 necessarily has an
exponential density function.
f(t) = (1/√(2π)) e^{−t²/2}

A random variable X that has this density function is called a standard normal
random variable.
Showing that ∫_{−∞}^{∞} f(t) dt = 1
requires a special trick from multivariable calculus. Finding the mean and the
Definition 5.4: The normal density function with parameters μ and σ (where
σ > 0) is the function

f(t) = (1/(σ√(2π))) e^{−(t−μ)²/(2σ²)}

If a random variable has this density function, it is said to be a normal random
variable with parameters μ and σ.
5.3.) Thus the value of μ determines where the “bell” is located on the t axis.
This is shown in Figure 5.3. It is important to realize that the general shape of the
bell is determined by the parameter σ. A small value of σ produces a tall skinny
bell, whereas a large value of σ causes the values to be much more spread out with
a correspondingly lower peak.
Proposition 5.3:
1. If X is normal with parameters μ and σ, then X has mean μ and
variance σ².
2. If X is normal with parameters μ and σ, then the random variable
(X − μ)/σ is standard normal.
3. If X is standard normal and μ and σ are numbers with σ > 0, then the
random variable Y = σX + μ is normal with parameters μ and σ.
Solution: Let X denote the lifetime of the bulb. Then X is normal with
parameters μ = 1,000 and σ = 200. Furthermore, if we let Y denote (X − μ)/σ,
then Y is standard normal.
We wish to know P(X > 900). However,

P(X > 900) = P((X − μ)/σ > (900 − μ)/σ)
           = P(Y > −.5) = 1 − P(Y ≤ −.5)
           = 1 − F_Y(−.5) = 1 − .3085 = .6915
Thus the probability that the bulb lasts more than 900 hours is approximately
.6915. (See Problem 5.14.)
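Values such as F_Y(−.5) = .3085 come from a table like the one in Appendix B, but Φ can also be computed from the error function, which is one standard numerical route. An illustrative Python sketch of this example:

    from math import erf, sqrt

    def phi(z):
        # standard normal cdf via the error function
        return (1 + erf(z / sqrt(2))) / 2

    mu, sigma = 1000, 200
    print(1 - phi((900 - mu) / sigma))    # P(X > 900), about .6915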
process is consistent enough that observed lifetimes do tend to cluster around some
mean value.
Both the normal and the exponential distributions are used to model “time to
failure” for various devices. The normal distribution fits situations such as the
lightbulb example above where some kind of actual “wearing out” or “burning out”
is taking place. Clearly, a bulb that has burned for 1,000 hours is not physically
the same as a new bulb. There is no “lack of memory” property in this situation.
On the other hand, something like a piece of electrical cable may not be undergoing
any physical deterioration during use and may be no more likely to fail after 10
years of use than the day it was installed. Instead, it might fail only because of
some random occurrence having nothing to do with any aging process, and thus its
lifetime might well have the “lack of memory” property exhibited by the exponential
distribution.
(X − μ_X)/σ_X = (X − 800)/20

is “approximately” standard normal. Thus,
P(X ≤ 775) = P(X − 800 ≤ −25)
           = P[(X − 800)/20 ≤ −1.25]
           = P(Y ≤ −1.25)
           = F_Y(−1.25) = Φ(−1.25) = .1056
Some ways of obtaining this value of Φ(−1.25) are discussed in the next
section. All require use of some kind of numerical method.
What we have done in this example is to use the normal distribution as an
Bear in mind that the larger n is, the better the approximation. This is
fortunate, because with large values of n, the binomial distribution is cumbersome
to use. So the approximation provided by the normal distribution is most accurate
just where it is most needed, for large values of n.
The relation between the binomial and the normal distribution is of more than
just theoretical interest. It provides a useful tool for estimating the reliability of
“sampling” strategies. The following example illustrates this.
Suppose the treatment works on each of the n patients tested independently with
probability p, and let X be the number of patients (out of n) on which it works.
The firm estimates p by the fraction X/n and wants this estimate to be within
.03 of p. Now

P( |X/n − p| ≥ .03 ) = P( |X − np| / √(npq) ≥ .03√n / √(pq) )
≤ P( |X − np| / √(npq) ≥ .06√n )      (5.1)
The reason this latter inequality is true depends on a little elementary calculus.
Check that the function f(p) = p(1 − p) takes on its maximum value at p = 1/2,
where f(p) = 1/4. This means that pq, where q = 1 − p, is always less than or
equal to 1/4, and therefore √(pq) ≤ 1/2 for all possible values of p and q = 1 − p.
Therefore,

.03√n / √(pq) ≥ .06√n
Because of this inequality, the event whose probability forms the left side of
Inequality 5.1 above is a subset of the event whose probability is on the right. So
the inequality is a special case of the fact that P(A) ≤ P(B) if A ⊆ B.
Here’s where the normal distribution comes in. Recall that the random
variable
(X − np) / √(npq)
is approximately standard normal. Problem 5.30 asks you to show that if Z is a
standard normal random variable, then P(|Z| > d) = 2 − 2Φ(d) for any positive
number d. Therefore, the rightmost term in Inequality 5.1 is approximately equal
to 2 − 2Φ(.06√n).
The company would like to be 95% confident of their claim; that is, the goal
is that this term be no greater than .05. So let's just set 2 − 2Φ(.06√n) equal to
.05 and solve for n. Then, since 2 − 2Φ(.06√n) = .05, we know Φ(.06√n) =
.975. In the next section we will discuss ways of evaluating the function Φ and its
inverse. If Φ(.06√n) = .975, then .06√n ≈ 1.96 (use the table in Appendix B if
you like), and solving for n gives n ≈ 1067.
Conclusion: If the firm wishes to claim with 95% confidence that they have
estimated the probability p to within .03 by testing n cases and then estimating p
by the fraction of the cases on which the treatment works, it will be necessary for
them to test at least 1067 patients with the drug.
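The arithmetic is short enough to sketch in Python, assuming as above that the
table value 1.96 satisfies Φ(1.96) = .975:

    z = 1.96                # Phi(1.96) = .975, from the table in Appendix B
    n = (z / 0.06) ** 2     # solve .06 * sqrt(n) = 1.96 for n
    print(n)                # about 1067.1, so roughly 1067 patients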
Clearly there are numerous situations to which the techniques of the above
example would be applicable. The most familiar is that of the public opinion
research firms that make statements such as, “If the election were held today,
candidate A would get 42% of the vote, plus or minus 3%.” The 95% confidence
level is so often used that it has become something of a standard and is often left
unsaid in statements such as this.
There is another useful tool for estimating the probability that a random
variable differs from its mean by more than a given amount. It is called
Chebyshev's inequality.

Proposition 5.5 (Chebyshev's inequality): If X is a random variable with mean μ
and variance var(X), then for any ε > 0,

P( |X − μ| ≥ ε ) ≤ var(X) / ε²
Comment on Proposition 5.5: Let’s examine what the proof looks like in the
continuous case. (The discrete case may be proved similarly by replacing the
integrals by sums.)
First notice that it is sufficient to prove the proposition for the special case in
which E(X) = 0. The reason is that the random variables X and X − μ have the
same variance. Therefore, if the proposition is true for X − μ, it will be true for X
also.
So now suppose that X is a continuous random variable with mean μ = 0 and
with density function f; then
P( |X| ≥ ε ) = P(X ≥ ε) + P(X ≤ −ε)
= ∫_ε^∞ f(t) dt + ∫_{−∞}^{−ε} f(t) dt
≤ ∫_ε^∞ (t²/ε²) f(t) dt + ∫_{−∞}^{−ε} (t²/ε²) f(t) dt
≤ (1/ε²) ∫_{−∞}^{∞} t² f(t) dt = var(X) / ε²
Example 5.5. A lot of 2,000 items has been produced on an assembly line
on which 2% of the items produced are defective. Treating the assembly line as an
independent trials process with each item produced having probability .02 of being
defective, use Chebyshev’s inequality to give a bound on the probability that in the
batch of 2,000 items the number of defects is between 30 and 50.
Solution: Let X denote the number of defects in the 2,000 items. The
assumption is that X is binomial with parameters n = 2,000 and p = .02. This
enables us to compute μ and σ²: μ = np = 40 and σ² = npq = 39.2. If we now
take ε = 10 in Chebyshev's inequality, we have

P(30 < X < 50) = P( |X − μ| < 10 ) ≥ 1 − 39.2/100 = .608
This should be viewed not so much as an estimate of the actual probability as a
bound on the probability. The probability that X assumes a value within 10 of its
mean is at least as great as .608. Since Chebyshev’s inequality is based on
nothing more than the mean and the variance of the random variable involved, it
can’t possibly give accurate estimates.
By way of comparison, we can relate this to the estimate provided by the
central limit theorem (Proposition 5.4):

P(30 < X < 50) ≈ 2Φ( 10/√(npq) ) − 1
≈ 2 × .9441 − 1
= .888
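For readers who want to see the answers side by side, here is a minimal Python
sketch (ours, not the text's) that prints the Chebyshev bound, the central-limit
estimate, and the exact binomial probability P(30 < X < 50):

    from math import comb, erf, sqrt

    n, p = 2000, 0.02
    mu, var = n * p, n * p * (1 - p)         # 40 and 39.2

    cheby = 1 - var / 10**2                  # Chebyshev lower bound, eps = 10
    clt = 2 * 0.5 * (1 + erf((10 / sqrt(var)) / sqrt(2))) - 1
    exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(31, 50))
    print(round(cheby, 3), round(clt, 3), round(exact, 3))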
Chebyshev's inequality also yields an important theoretical conclusion. Suppose
X_n denotes the number of successes in n trials of an independent trials process
with success probability p, so that X_n is binomial with mean np and variance
npq. Chebyshev's inequality then gives

P( |X_n − np| < ε ) ≥ 1 − npq/ε²
The value of this information is that it is true for whatever choice of € > 0 that we
wish to make. In particular, let's replace ε by nδ, where δ is thought of as
"small." Then
P( |X_n/n − p| < δ ) ≥ 1 − pq/(nδ²)
Now notice that for any value of δ, the right side converges to 1 as n → ∞.
What does this mean? It means that for any choice of δ, no matter how small, the
probability that the fraction of trials resulting in success differs from the theoretical
probability p by less than δ tends to 1 as the number of trials increases without
bound. In other words, it is guaranteed that the observed relative frequency of
successes will converge to the theoretical relative frequency (as measured by p) as
the number of trials tends to ∞. This is a special case of an important theoretical
result called the law of large numbers.
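The law of large numbers is easy to watch in a simulation. A small Python sketch,
with an arbitrary choice of p = .3:

    import random

    random.seed(1)
    p = 0.3
    for n in (100, 10_000, 1_000_000):
        successes = sum(random.random() < p for _ in range(n))
        print(n, successes / n)    # the relative frequency settles near p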
Φ(s) = ∫_{−∞}^{s} f(t) dt

where f is the standard normal density function.
Thus, if values for the standard normal distribution function are needed, what is
necessary is a good way to approximate this improper integral. In fact, the
symmetry of the integrand f(t) and the fact that f is a density function mean that

∫_{−∞}^{∞} f(t) dt = 1   and   ∫_{−∞}^{0} f(t) dt = .5

Therefore, if s ≤ 0,

Φ(s) = .5 − ∫_{s}^{0} f(t) dt
On the other hand, if s > 0, then the symmetry of f(t) about the y-axis implies
that

∫_{−∞}^{−s} f(t) dt = ∫_{s}^{∞} f(t) dt

so that Φ(−s) = 1 − Φ(s), and only one sign of s ever needs to be considered.
Simpson's Rule

One simple numerical method is Simpson's rule. For example, dividing [0, 1] into
four equal subintervals gives

∫_{0}^{1} f(t) dt ≈ (1/12) [ f(0) + 4 f(.25) + 2 f(.5) + 4 f(.75) + f(1) ]
where f is the standard normal density function. Such a calculation is easy with a
scientific calculator and even easier with a programmable one. Students who know
Simpson’s rule are encouraged to try it out as a method of getting numerical
answers in exercises involving normal random variables.
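As a concrete illustration, here is a short Python sketch of the calculation just
quoted (the function names are ours); the four-subinterval rule already recovers
Φ(1) − Φ(0) to four decimal places:

    from math import exp, pi, sqrt

    def f(t):
        # standard normal density
        return exp(-t * t / 2) / sqrt(2 * pi)

    approx = (1 / 12) * (f(0) + 4 * f(0.25) + 2 * f(0.5) + 4 * f(0.75) + f(1))
    print(round(approx, 4))          # about .3413
    print(round(0.5 + approx, 4))    # Phi(1), about .8413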
Since the mid 1970s many scientific calculators have included the normal
distribution function as a built-in function. Some program must then reside within
the calculator for computing the values. The one described in Problem 5.14 has
been used in Texas Instruments™ calculators for a number of years. It is easily
programmed on any programmable calculator or pocket computer.
Tables
Problems
years?
(c) If such a device has already lasted 10 years, what is the probability that
it will fail during the next 10 years?
5.4 Suppose you have just bought a piece of equipment that has been advertised
to have an expected lifetime of 2 years. Is it likely to actually last that long?
(In other words, how probable is it that the device will actually last that
long?)
(a) Assume that the lifetime is normally distributed.
(b) Assume that the lifetime is exponentially distributed.
5.6 Suppose X is a uniform random variable on [0, 1]. Let Y = X². Compute
the distribution function for Y and the density function for Y. Notice that
fy is an unbounded function, but that part 2 of Definition 3.3 is satisfied by
fy if the integral is interpreted as an improper integral. Compute E(Y)
using Definition 3.7 and again using Proposition 3.7.
5.7 Assume X is uniform on [0, 2], but now suppose Y = |X − 1|. Determine
the distribution function, the density function, and the expected value of Y.
[Hint: The biggest difference between this and Problem 5.6 is that you will
have to be more careful and clever in trying to write F_Y in terms of F_X.]
One interpretation of this problem is to think of it as a model for choosing a
random number between 0 and 2 and then to consider how close the number
chosen is to 1.
5.8 A wire is 2 ft long. A random point on the wire is picked, and the wire is cut
at that point.
(a) What is the expected value for the length of the shortest piece?
(b) If Y is the random variable that denotes the length of the shortest piece
of wire obtained when the original is cut, find the distribution function
and the density function for Y.
[Hint: This problem is closely related to Problem 5.7. Think of the wire as
lying on the interval [0, 2] and X as being the point where the cut is made.
Assume X is uniform on [0, 2]. Now write Y as a function of X.]
5.11 If X is normal with parameters μ = 5 and σ = 2, find P(2 < X < 4). You
can use Simpson’s rule, the numerical method of Problem 5.14, or the table
in Appendix B.
5.13 Show that if X is an exponential random variable with parameter λ, then the
variance of X is 1/λ².
5.14 It used to be commonplace to consult long and tedious tables when values of
the standard normal distribution function were needed. Many scientific
calculators now have this function built in. In addition, there are many
common and simple numerical methods that may be used to evaluate it. The
following peculiar but useful formula is given in the Handbook of
Mathematical Functions of the National Bureau of Standards.
If Φ denotes the standard normal distribution function and

f(t) = (1/√(2π)) e^(−t²/2)

denotes the standard normal density function, then values of Φ may be
approximated in terms of f by a short polynomial formula.
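The formula itself is lost at the page break in this copy. One approximation of
exactly this type from the NBS Handbook is Abramowitz and Stegun's formula
26.2.17; whether it is the precise formula the problem prints cannot be confirmed
here, so treat the constants below as an illustrative assumption:

    from math import exp, pi, sqrt

    def phi_approx(x):
        # polynomial-in-t approximation of Phi (Abramowitz & Stegun 26.2.17)
        f = exp(-x * x / 2) / sqrt(2 * pi)
        t = 1 / (1 + 0.2316419 * abs(x))
        poly = t * (0.319381530 + t * (-0.356563782 + t * (1.781477937
               + t * (-1.821255978 + t * 1.330274429))))
        p = 1 - f * poly              # Phi(|x|)
        return p if x >= 0 else 1 - p

    print(round(phi_approx(-1.25), 4))   # .1056, matching the value used earlier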
5.15 Assume that the number of defects in a batch of 10,000 manufactured items
is a normal random variable with mean μ = 300 and standard deviation σ =
100. (Clearly the actual number is an integer-valued random variable. By
now you should be getting used to the idea of using a continuous model for a
situation that technically is discrete.) Find the probability that the actual
number of defects found in a batch will be between 225 and 275.
5.16 If 100 items are tested from a batch in which 20% are bad, what is the
probability that the number of bad ones found will be between 15 and 25
inclusive, that is, P(15 ≤ X ≤ 25), where X = number of defectives
found?
(a) If each one tested is considered an independent trial with probability p
= .2 that the item is bad, then the number of bad items found is binomial
with n = 100 and p = .2. Compute P(15 ≤ X ≤ 25) assuming that
X has this probability distribution.
(b) Now approximate this probability using the fact that (X − np)/√(npq) is
approximately standard normal, and compare your answer to the exact
answer in part a. (If you are really interested in great accuracy in your
approximation here, you might want to consider further the fact that X
is actually integer-valued. This means that, in fact,
P(15 ≤ X ≤ 25) = P(14 < X < 26)

If one "averages" these two intervals and uses

P(14.5 ≤ X ≤ 25.5)
in calculating the normal approximation, then the approximation of the
discrete binomial with the continuous normal will be very good.)
5.17 Suppose that a flagpole is erected and that it is assumed that the “lifetime” of
the flagpole is a random variable having an exponential density function and
5.20 Prove Proposition 5.2. [Hint: The mean is easy. To compute the variance,
use integration by parts with u = t and dv = t e^(−t²/2) dt. Then to evaluate

∫ v du

in the integration-by-parts formula, where

∫ u dv = uv − ∫ v du

use the fact that the standard normal density is a density function.]
5.21 Suppose that the number of cases of a certain disease per 100,000 population
is a normally distributed random variable with mean 300 and standard
deviation 100. What is the probability that a particular community of
100,000 people would have more than 500 cases of the disease? (Notice that
this is another instance where we are using a continuous model for what is
actually a discrete phenomenon.)
5.22 Suppose a random number is generated (uniform on [0, 1]) and then
truncated after the first digit so the result is one of the numbers 0, .1, .2,
…, .9. Compute the expected value for this truncated result, and understand
your computation in the context of Proposition 3.7. One interesting feature
of this situation is that we are looking at an instance of the relation Y =
g(X), where X is continuous and Y is discrete.
5.23 Suppose that a > 0 and that X is a random variable that is uniformly
distributed on the interval [0, 1]. Find the distribution function and the
density function for the random variable

Y = −(1/a) ln(1 − X)
(This problem has very useful applications, because many devices such
as calculators and computer routines are equipped with random number
generators that produce numbers between 0 and 1 according to a uniform
density on [0, 1]. If one wants a “random” value for an exponentially
distributed random variable and has such a “uniform on [0, 1]” random
number generator available, this exercise shows how to get one.)
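Assuming the reconstruction Y = −(1/a) ln(1 − X) above, the technique the
problem describes looks like this in Python (the function name is ours):

    import random
    from math import log

    def exponential(a):
        # exponential variate built from a uniform [0, 1] variate
        return -log(1 - random.random()) / a

    random.seed(1)
    samples = [exponential(0.5) for _ in range(100_000)]
    print(sum(samples) / len(samples))   # should be close to 1/a = 2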
5.24 A random number between 0 and 1 is produced using the uniform density on
[0, 1]. What is the probability that the first digit in the decimal representation
of the square root of the number is a 7?
5.25 In the circuit pictured below, the resistance of the resistor labeled R
fluctuates between 1 and 2 ohms. Assume a uniform distribution for the
resistance between 1 and 2 ohms. If the battery output is a constant 12 volts,
what is the probability distribution of the voltage drop across this resistor?
Find the distribution function and the density function for this voltage drop.
What is the expected value for this voltage drop across R?
[Figure: a 12-volt battery connected in series with a 1-ohm resistor and the resistor R.]
5.29 Redo the calculation in Example 5.3 using the continuity correction (as
described in that example) to see how much it changes the answer.
5.30 Show that if Z is a standard normal random variable, then P(|Z| > d) =
2 − 2Φ(d) for any number d > 0.
5.34 If X is a normal random variable, find the probability that the value assumed
by X lies more than 2 standard deviations away from the mean. [Hint: Part
of the problem here is to convince yourself that the answer to this question is
the same regardless of what the mean and variance of X happen to be.]
Many common mathematical models involve more than one quantity. For
example, a model for a gas is concerned with the interaction of temperature and
pressure. When dealing with a probabilistic model, this means that we must be
concerned with the way that two or more random variables interact. Knowledge of
the individual behavior of the two will not be adequate because we may need to
know how certain values assumed by one affect the probability of the other
assuming given values. In the discrete case the interaction is studied via the joint
probability mass function, and in the continuous case via the joint density function.
There is additionally a natural way to extend the concept of the distribution function
to this “joint” setting, but the joint distribution function is less useful for
computation and will not be emphasized in this book.
When studying the relation between two random variables, we often will need
to consider the intersection of two events of the form (X =a) and (Y=b). A
useful way to denote the intersection of these two events is to write the intersection
as (X =a, Y=b). The probability of this event is written P(X =a, Y =b).
Expressions such as P(a <X <b,c <Y <d) have a similar meaning, that is,
the event whose probability is being represented is the intersection of the two
events separated by commas.
Definition 6.1: If X and Y are discrete random variables, then their joint
probability mass function p_{X,Y} is the function of two variables defined by

p_{X,Y}(x, y) = P(X = x, Y = y)
Example 6.1. Four items are labeled with the numbers 0, 1, 2, and 3, and
two of the items are randomly selected one at a time without replacement. We will
denote the number on the first item selected by X, and the number on the second
by Y.
It is sometimes useful, in examples such as this where there are a small
number of possible outcomes, to exhibit all values of the joint mass function in a
table such as the one shown in Figure 6.1, which presents everything there is to say
about the joint mass function for X and Y.
The 16 numbers in the table that are not in boldface give all values of p_{X,Y}.
For example, the first 1/12 in the top row gives the probability P(X = 1, Y = 0).
The boldface numbers at the bottom and at the right edge are the values for the mass
function of X (bottom row) and Y (rightmost column). Notice that these
probabilities may be obtained simply by adding the values of the joint mass function
in the corresponding row or column. The reason is elementary. For example, the
event (X = 1) is the disjoint union

(X = 1) = (X = 1, Y = 0) ∪ (X = 1, Y = 1) ∪ (X = 1, Y = 2) ∪ (X = 1, Y = 3)

and therefore,

p_X(1) = p_{X,Y}(1, 0) + p_{X,Y}(1, 1) + p_{X,Y}(1, 2) + p_{X,Y}(1, 3)
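A short Python sketch can rebuild the contents of the table in Figure 6.1: each
of the 12 ordered selections is equally likely, and summing across a row recovers
a marginal mass function.

    from fractions import Fraction
    from itertools import permutations

    # the 12 equally likely ordered selections of two distinct items
    outcomes = list(permutations(range(4), 2))
    pmf = {pair: Fraction(1, len(outcomes)) for pair in outcomes}

    for x in range(4):   # marginal mass function of X, row by row
        print(x, sum(pmf.get((x, y), Fraction(0)) for y in range(4)))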
Proposition 6.1 summarizes the relation between p_X, p_Y, and p_{X,Y}. The
proof of the proposition is based on the same ideas that are illustrated in the
example above. Conceptually, it is important to understand that the joint probability
mass function carries all the information necessary to do computations related to the
interaction of X and Y. In particular, Proposition 6.1 summarizes how the mass
functions of X and Y individually may be determined from the joint mass
function. Often the probability mass functions of X and Y are referred to as the
marginal probability mass functions of X and Y to distinguish them from the
joint mass function for the pair X, Y.
Proposition 6.1: The relationship between the marginal mass functions p_X and
p_Y and the joint mass function p_{X,Y} can be described in general as follows:
1. For any number x, p_X(x) = Σ_j p_{X,Y}(x, y_j), where the sum extends
over all values y_j assumed by Y.
2. For any number y, p_Y(y) = Σ_k p_{X,Y}(x_k, y), where the sum extends
over all values x_k assumed by X.
The basic idea is that two random variables are independent if events
described in terms of one random variable are independent from events described in
terms of the other. Literally, the definition says that X and Y will be considered
independent random variables provided that events of the form (X = x) and
(Y =y) are independent events for all choices of numbers x and y.
Example 6.2. One 4-ohm resistor and two 8-ohm resistors are in a box. A
resistor is randomly drawn from the box and inspected; then it is replaced in the box
and a second is drawn. We will denote by X the resistance of the first one drawn
from the box and by Y the resistance of the second. Since this is sampling with
replacement, the result of one test has no influence on the result of the other test.
This is indicated in the following table of values of the joint mass function for X
and Y. In this case X and Y are independent random variables.
When X and Y are independent, the values of the joint mass function are
determined by the values of the mass functions of X and Y. For this example,
this is clearly visible in Figure 6.2. The circled entries illustrate that 2/9 = 1/3 x 2/3.
Similarly, each of the four values of the joint mass function shown in Figure 6.2 is
the product of the appropriate value of the mass function of X and the mass
function of Y. In general (when independence is lacking), it is not possible to
construct the joint probability mass function from a knowledge of the mass
functions of X and Y individually. See Problem 6.31(a) for an interesting
illustration of this.
The joint density for a pair of continuous random variables will also
necessarily be a function of two variables. It plays the same role that the joint mass
function plays for a pair of discrete random variables. The requirement that the
probability of the entire sample space be 1 becomes the requirement that the double
integral of f over the xy-plane be 1, just as the integral in Definition 3.3 in
Chapter 3 is required to be 1.
P{ (X, Y) ∈ B } = ∬_B f(x, y) dx dy      (6.1)

for any region B in the xy-plane.
Among the easiest to understand examples of joint density functions are the
joint uniform densities.
Definition 6.5: Given a region T in the xy-plane, the uniform joint density
function on the region T is the function f defined by

f(x, y) = 1/A   if (x, y) ∈ T
f(x, y) = 0     if (x, y) ∉ T

where A denotes the area of the region T.
The reason that the constant value that f assumes on T must be the reciprocal
of the area of T is that the integral of the constant function 1 over the region T is
precisely the area of T. So this requirement makes the joint uniform density satisfy
condition 2 of Definition 6.3.
Example 6.3. Suppose X and Y have as joint density function the uniform
density on the square T = { (x, y) : 0 ≤ x ≤ 1, 0 ≤ y ≤ 1 }. What is the
probability that X takes on a value larger than Y? In other words, what is the
probability of the event (X > Y)?

Solution: The event (X > Y) is { (X, Y) ∈ B }, where B is the half-plane
below the line y = x. Therefore,

P{ (X, Y) ∈ B } = ∬_B f(x, y) dx dy = ∬_{B ∩ T} 1 dx dy = 1/2

(Since we are integrating the constant function 1 over the triangle B ∩ T, the value
of the integral is just the area of the triangle.)
Recall how, in the discrete case, the mass functions for X and Y can be
constructed by adding the appropriate values of the joint mass function. A similar
method is available in the continuous case for obtaining the density functions for X
and Y from the joint density. This method is described in Proposition 6.2, which
is similar to Proposition 6.1 except that (as usual) summation is replaced by
integration. Often the density functions of X and Y are referred to as the
marginal densities to distinguish them from the joint density for X and Y.
Proposition 6.2: If X and Y have joint density function f, then the densities
for X and Y may be obtained from f as follows:

f_X(x) = ∫_{−∞}^{∞} f(x, y) dy   and   f_Y(y) = ∫_{−∞}^{∞} f(x, y) dx
Proof: Given any numbers a and b with a <b, P(a <X <b) can be
computed from the joint density function using Definition 6.4 by taking B to be the
region
{ (x, y) : a ≤ x ≤ b, −∞ < y < ∞ }
that is, B is an infinite vertical strip in which no restriction is placed on the second
coordinate. Then
P(a ≤ X ≤ b) = P{ (X, Y) ∈ B } = ∬_B f(x, y) dx dy
= ∫_a^b [ ∫_{−∞}^{∞} f(x, y) dy ] dx = ∫_a^b g(x) dx

where g(x) is the function

g(x) = ∫_{−∞}^{∞} f(x, y) dy
But knowing that

P(a ≤ X ≤ b) = ∫_a^b g(x) dx
for all choices of a and b with a < b is exactly what we need to know in order to
conclude that g is the density function for X. (See Definition 3.4.)
Clearly the proof of the second statement involving the density of Y can be
given in a similar fashion.
Example 6.4. Suppose X and Y have a joint density function f that is constant,
f(x, y) = c, on the first-quadrant region bounded by the parabola y = 4 − x² and
the coordinate axes, with f(x, y) = 0 outside this region.
(a) Find the value c must have in order to make this a joint density.
(b) Find the marginal densities f_X and f_Y.
(c) What is the probability that 3X is greater than Y?
Solution: (a) ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = ∫_0^2 ∫_0^{4−x²} c dy dx
= c ∫_0^2 (4 − x²) dx = (16/3) c = 1

so c = 3/16.

(b) f_X(x) = ∫_0^{4−x²} (3/16) dy = 3(4 − x²)/16 for 0 ≤ x ≤ 2, with f_X(x) = 0
otherwise, and

f_Y(y) = ∫_0^{√(4−y)} (3/16) dx = 3√(4 − y) / 16

Moreover, f_Y(y) = 0 if y is not between 0 and 4.
(c) P(Y < 3X) is computed by integrating the joint density function over
the shaded region in the picture that follows.
[Figure: the region in the first quadrant below the line y = 3x and under the parabola y = 4 − x².]
If the double integral is evaluated using iterated integration in the order shown, the
endpoints of integration for the integration with respect to y are 0 and 3. The
reason is that the y coordinate of the point where the line y = 3x and the parabola
y = 4 − x² intersect is 3.
P(Y < 3X) = ∫_0^3 ∫_{y/3}^{√(4−y)} (3/16) dx dy
= (3/16) ∫_0^3 ( √(4 − y) − y/3 ) dy
= (3/16) ( 14/3 − 3/2 ) = 19/32
One natural way to define independence for a pair of continuous random variables
X and Y is to require that

P(a ≤ X ≤ b, c ≤ Y ≤ d) = P(a ≤ X ≤ b) P(c ≤ Y ≤ d)

for all values of a, b, c, and d. This approach would be satisfactory (see Problem
6.11), but it is a bit easier to take the following definition, which is essentially
equivalent and is easier to use.
Example 6.6. Two devices have independent, exponentially distributed lifetimes,
one with an expected lifetime of 5 years and the other with an expected lifetime of
8 years. What is the probability that the longer-lived type lasts more than twice as
long as the other?

Solution: Let X be the lifetime of the device with the expected lifetime of 5
years and Y the lifetime of the device with expected lifetime of 8 years. This
simply says E(X) = 5 and E(Y) = 8. Furthermore, since both have exponential
distributions, we know from Problem 3.15 that X is exponential with parameter λ
= .2 and Y is exponential with parameter λ = .125.
Since X and Y are to be assumed independent, we know that the function
f(x, y) = f_X(x) f_Y(y) will be a joint density function for the pair. The question
posed is, what is the probability that Y > 2X? The picture guides the calculation.
[Figure: the wedge-shaped region of the first quadrant lying above the line y = 2x.]
The event (Y > 2X) can be viewed as { (X, Y) ∈ B }, where B is the region
consisting of the half-plane above the line y = 2x in the picture. So the
probability of this event can be computed by integrating the joint density over this
region. However, the joint density function itself is equal to 0 except in the first
quadrant. Therefore the integral to be computed actually can be reduced to an
integral over the wedge shaped slice of the first quadrant shown in the picture. The
computation then goes as follows:
P{ (X, Y) ∈ B } = ∬_{wedge} f(x, y) dx dy
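Problem 6.16 asks you to finish this integral by hand. As an independent check,
here is a Monte Carlo sketch in Python (random.expovariate takes the rate
parameter λ):

    import random

    random.seed(1)
    trials = 1_000_000
    hits = 0
    for _ in range(trials):
        x = random.expovariate(0.2)     # expected lifetime 5 years
        y = random.expovariate(0.125)   # expected lifetime 8 years
        hits += (y > 2 * x)
    print(hits / trials)                # estimate of P(Y > 2X)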
f(x_1, x_2, …, x_n) = f_{X_1}(x_1) f_{X_2}(x_2) ⋯ f_{X_n}(x_n)
In this book we will not do computations involving such n-dimensional
density functions. In the next section, however, we will be examining some
situations where more than two independent random variables interact. While the
technical definitions involve concepts such as those illustrated in the above
paragraph, the most important idea you need to grasp is the intuitive meaning of
independence of more than two random variables. The idea is that knowledge
about values assumed by certain ones is independent (in the sense of independent
events) from knowledge about others. For example, if X_1, X_2, X_3, and X_4 are
independent, this means, for instance, that the values assumed by X_1 and X_3
would have no effect on the values assumed by X_2 and X_4.
The reason this intuitive understanding is important is that in constructing a
mathematical model, one decision that often must be made is the decision as to
whether certain random variables can be considered to be independent.
Historically, some of the most serious errors in constructing complex probability
models (for example, safety analyses of complicated systems such as aircraft or
nuclear reactors) have come about because certain random variables were assumed
to be independent when, in fact, they were not. Section 6.4 will introduce various
scenarios where the concept of independence is important and useful.
Proposition 6.3:
1. If X and Y are discrete random variables and if Z = g(X, Y), then
the expected value of Z is given by

E(Z) = Σ_i Σ_k g(x_i, y_k) p_{X,Y}(x_i, y_k)

where the sum extends over all ordered pairs x_i, y_k, where x_i is a
value assumed by X and y_k is a value assumed by Y.
2. If X and Y are continuous random variables with joint density function
f and if Z = g(X, Y), then

E(Z) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) f(x, y) dx dy
Example 6.7. Two signals will occur at a random time during a one-hour
period. Assume that they are timed independently and that the time of occurrence of
each is uniformly distributed throughout the one-hour interval. What is the
expected length of time between the occurrences of the two signals?
Solution: Let X and Y denote the times when the two signals occur. The
assumption is that both X and Y are uniform random variables on [0, 1] and that
they are independent. Their joint density then is the uniform density on the unit
square, as in Example 6.3. The length of time between occurrence of the two
signals is the random variable Z = |X − Y| = g(X, Y), where g is the function
g(x, y) = |x − y|. So the expected value of Z can be computed by performing
the integration

E(Z) = ∫_0^1 ∫_0^1 |x − y| dx dy = 1/3

(Problem 6.18 asks you to fill in the details.)
Example 6.8 (Redundant safety systems). The picture below shows a pair of
redundant safety switches that are installed to cut off the power to a device if a
dangerous situation should quickly develop.
[Figure: a power source connected to the process that may need to be interrupted, through Switch 1 and Switch 2.]
Different operators control the two switches, and so the two switches are
viewed as functioning independently. There is a brief time period between the time
that the dangerous situation arises and the time the switch is actually opened. We
will assume that for both switches the delay time is uniformly distributed over a 5
second time interval. Clearly, if there were only one safety switch, the expected
value for the length of time that elapses from the time that danger develops until the
switch is actually thrown is 2.5 seconds. How much safety is actually achieved via
the use of a backup operator controlling a second switch? In other words, using the
two-switch system, what is the expected value for the elapsed time before a switch
is opened to shut down the system?
Solution: We can let X denote the waiting time until switch 1 is opened and
Y the waiting time until switch 2 is opened. The assumption is that X and Y are
each uniformly distributed on the time interval [0, 5] and that they are independent.
If Z is the waiting time until the current is actually cut off to the device, then Z =
g(X, Y), where g is the function of two variables
g(x, y) = min{x,y}
that is, g is the function that selects the minimum of the two numbers input.
The relation between Z,X, and Y is that Z = min{X, Y}. For any
number 5s,
P(Z > s) = P(X > s, Y > s) = P(X > s) P(Y > s)

the last expression coming from the independence of X and Y. (See Problem
6.11.) In particular, if 0 ≤ s ≤ 5, this probability is

P(Z > s) = ( (5 − s)/5 )² = (5 − s)²/25
(Check this detail. It depends only upon understanding what the uniform
distribution on the interval [0,5] looks like.)
Therefore,
(5-5)
55
F7(s) = P(ZS$s)=1-P(Z>s)=1-
whenever 0 ≤ s ≤ 5. The density function f_Z for Z may be obtained by
differentiating F_Z. The two functions are as follows:

F_Z(s) = 0 for s < 0,   F_Z(s) = 1 − (5 − s)²/25 for 0 ≤ s ≤ 5,   F_Z(s) = 1 for s > 5

f_Z(s) = 2(5 − s)/25 for 0 ≤ s ≤ 5,   f_Z(s) = 0 otherwise
E(Z) = ∫_{−∞}^{∞} t f_Z(t) dt = ∫_0^5 t · 2(5 − t)/25 dt
= .04 ∫_0^5 (10t − 2t²) dt
= .04 ( 125 − 250/3 )
= 5/3
So the addition of the backup safety switch controlled by an independent operator
reduces the expected time until shutdown from 2.5 seconds to 1.67 seconds.
Of course, we can also compute E(Z) from Proposition 6.3, and this will be
the easiest way to get the expected value if we have no particular interest in the
probability density. Since Z = min{X, Y},

E(Z) = ∫_0^5 ∫_0^5 min{x, y} (.04) dy dx
The factor .04 in the last integral is the product of fy(x) and fy(y), each of which
is 1/5. The way to evaluate the integral on the right is to realize that the value of the
function g(x, y) = min{x, y} is equal to y if (x, y) is a point lying below the
line y = x and is equal to x if (x, y) lies above the line y = x. So we split the
double integral into two pieces as in the equation below, and then we compute
E(Z) = ∫_0^5 ∫_0^x .04 y dy dx + ∫_0^5 ∫_x^5 .04 x dy dx
These integrals are easily evaluated. Each is equal to 5/6, so the calculation again
confirms that E(Z) = 5/3.
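A Monte Carlo check of this answer, as a Python sketch:

    import random

    random.seed(1)
    trials = 1_000_000
    total = 0.0
    for _ in range(trials):
        x = random.uniform(0, 5)   # delay of switch 1
        y = random.uniform(0, 5)   # delay of switch 2
        total += min(x, y)
    print(total / trials)          # close to 5/3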
Proposition 6.4: For any two independent continuous random variables X and
Y, the distribution and density functions for Z = max{X, Y} are given by

F_Z(t) = F_X(t) F_Y(t)   and   f_Z(t) = f_X(t) F_Y(t) + f_Y(t) F_X(t)

The explanation for this observation is simple. To say that Z ≤ t is to say
that X ≤ t and Y ≤ t, that is, (Z ≤ t) = (X ≤ t) ∩ (Y ≤ t), and so

F_Z(t) = P(Z ≤ t) = P(X ≤ t, Y ≤ t) = P(X ≤ t) P(Y ≤ t) = F_X(t) F_Y(t)

This equation can be differentiated with respect to t to obtain the density function

f_Z(t) = f_X(t) F_Y(t) + F_X(t) f_Y(t)
X̄ = (1/n) (X_1 + X_2 + ⋯ + X_n)

where X_1, ⋯, X_n are the random variables that give the readings on the n
individual trials.
Often it is desirable to reproduce experiments for the purpose of verification
of the results. In such cases it is important to know that the experiments are being
conducted “independently” in the sense that the outcome of one experiment doesn’t
affect the outcome of the others. In this context, the random variables X,,---,
X,, are independent random variables.
Sums of independent random variables are especially important. One tool for
working with such sums is provided by the “convolution” integral.
Definition 6.7: If f and g are functions defined on the entire real number line,
then the convolution of f and g is the function denoted by f*g and defined by

(f*g)(t) = ∫_{−∞}^{∞} f(t − s) g(s) ds

By definition, f and g appear to play different roles here, but in fact the order
doesn't matter: f*g = g*f (see Problem 6.30).
Proposition 6.5: If X and Y are independent continuous random variables with
density functions f_X and f_Y, then the density function of the sum X + Y is the
convolution f_X * f_Y.

Example. Two random numbers are generated independently according to the
uniform density on [0, 1]. What is the density function of their sum?

Solution: It is intuitively clear that the sum will assume values between 0 and
2. Is it also clear to you that the values of the sum will be more concentrated near 1
than toward the endpoints of the interval?
Denote the two numbers generated independently by X and Y, and denote
their sum by Z=X+Y.
From Proposition 6.5, we need only compute the convolution of fy and fy,
each of which is the uniform density on [0, 1]. This is not difficult if we get the
proper geometry in mind. This is shown in the figure below.
Here the functions g and f in parts a and b of the figure are simply two
copies of the uniform density on [0, 1]. It is the convolution of g and f that we
need to determine. In each part of the figure, the shaded area is the region between
the graph and the s axis. In part c we have a similar representation of the function
146 Chapter 6: Joint Distributions
f(−s), where f is the uniform density on [0, 1]. And in parts d and e we have
graphs of f(t − s). In part d the graph of f(t − s) is shown for a typical value of
t such that 0 ≤ t ≤ 1, and part e shows a similar graph for a t value in the interval
1 < t ≤ 2.
[Figure: graphs of g(s), f(s), f(−s), and f(t − s).]

Carrying out the convolution integral over each range of t gives

f_Z(t) = 0 for t < 0,   f_Z(t) = t for 0 ≤ t ≤ 1,   f_Z(t) = 2 − t for 1 ≤ t ≤ 2,   f_Z(t) = 0 for t > 2
Notice that the result of this calculation does indeed show that values of the random
variable Z = X + Y will be more concentrated near the center of the interval [0, 2].
(Graph the density function if this isn't obvious to you.)
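The whole convolution can also be carried out numerically. The Python sketch
below discretizes the uniform density on [0, 1] and convolves it with itself; the
printed values trace out the triangular density just computed.

    n = 1000                 # subintervals per unit length
    h = 1.0 / n
    f = [1.0] * n            # the uniform density on [0, 1)

    conv = [0.0] * (2 * n)   # (f*f)(t) sampled on [0, 2)
    for i in range(n):
        for j in range(n):
            conv[i + j] += f[i] * f[j] * h

    for t in (0.25, 0.5, 1.0, 1.5, 1.75):
        print(t, round(conv[int(t * n)], 3))   # roughly t, then 2 - t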
Proposition 6.6: If X and Y are independent random variables, then
E(XY) = E(X) E(Y).

Proof: We will check the validity of this proposition in the continuous case.
The discrete case is similar if one replaces the integrals and the density functions by
sums and the mass functions.
Since the product XY is expressed as a simple function of the random
variables X and Y, its expected value is given by Proposition 6.3 as

E(XY) = ∬ x y f(x, y) dx dy = ∬ x y f_X(x) f_Y(y) dx dy
= ( ∫ x f_X(x) dx ) ( ∫ y f_Y(y) dy ) = E(X) E(Y)

where the factorization of the joint density comes from the independence of X
and Y.
The converse of Proposition 6.6 is not true. Knowing just the fact that
E(XY) = E(X)E(Y) is not enough to conclude that X and Y are independent.
There is, in fact, a special term that describes a pair of random variables for which
E(XY) = E(X)E(Y). IfX and Y satisfy this condition, they are said to be
uncorrelated. So Proposition 6.6 states that if two random variables are
independent, then they are uncorrelated.
Recall that for any pair of random variables X and Y defined on the same
sample space, E(X + Y) = E(X) + E(Y). While expected value is additive in
this sense, in general variance is not. In the case of independent random variables,
however, the variance is additive.

Proposition 6.7: If X and Y are independent random variables, then
var(X + Y) = var(X) + var(Y).
Two families of random variables deserve special attention in connection with
sums: those with a normal distribution and those with a Poisson
distribution. These are the two that reproduce themselves when independent
random variables are summed. Let's consider sums of normal random variables
first.
Proposition 6.8: If X and Y are normal random variables and are independent,
then Z = X + Y is also normal with
μ_Z = μ_X + μ_Y   and   var(Z) = var(X) + var(Y)
Solution: Let's consider the length of time the original remains in service to
be X and the length of time the replacement remains in service to be Y. The
assumption then is that X and Y are normal random variables with mean 2 years
and standard deviation 1/2 year.
The combined lifetime of the two then is the random variable Z = X + Y,
which is normal with mean

E(Z) = 2 + 2 = 4

years, and variance

var(Z) = 1/4 + 1/4 = 1/2
The fact that the variance of the average of the measurements is appreciably
smaller than the variance of an individual measurement means that it is much more
probable that the average will be near μ than will an arbitrary single measurement.
A common assumption is that the measuring technique is not biased, that is, that the
expected value μ of any given measurement agrees with the true value of whatever
quantity is being measured.
Suppose, for example, that a voltage reading is a normal random variable X with
mean 115 volts (the true voltage) and standard deviation 5, and let Y denote the
average of four independent readings. Then Y has mean 115 and standard
deviation 5/2, so

P( |Y − 115| > 3 ) = P( |Y − 115| / 2.5 > 1.2 ) = 2 − 2Φ(1.2) = .2301

whereas for a single reading,

P( |X − 115| > 3 ) = P( |X − 115| / 5 > .6 ) = 2 − 2Φ(.6) = .5485
Summary: The probability that an individual reading will differ from the
actual voltage by more than 3 volts is .5485, whereas if four readings are taken and
the average used, then the probability that the average will differ from the true
value by more than 3 volts is only .2301.
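A simulation makes the comparison vivid. The Python sketch below assumes the
reconstruction above (true voltage 115 volts, individual readings normal with
standard deviation 5):

    import random

    random.seed(1)
    trials = 200_000
    one = sum(abs(random.gauss(115, 5) - 115) > 3 for _ in range(trials))
    four = sum(abs(sum(random.gauss(115, 5) for _ in range(4)) / 4 - 115) > 3
               for _ in range(trials))
    print(one / trials)    # near .5485
    print(four / trials)   # near .2301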
Definition 6.8: If X and Y are discrete random variables and if b is a number
having the property that P(Y = b) ≠ 0, then the conditional probability mass
function p_{X|Y=b} is defined by

p_{X|Y=b}(x) = P(X = x | Y = b) = P(X = x, Y = b) / P(Y = b)

for each real number x.
Often situations involving more than one random variable are most naturally
described in terms of conditional probability mass functions. The following
example illustrates this.

Solution: Conditioning in this way leads, for example, to

P(X = 2) = .0214
Determining the expected value for the number of failed devices in the system
is not a difficult computation if we proceed along the lines begun above. Having
computed P(X = 2), it is clear that P(X = 0), P(X = 1), P(X = 3) and
P(X = 4) can also be determined in a similar manner. Moreover, once all the
values of the mass function of X are known, the expected value is easy.
There is another point of view, however, that makes this computation still
easier. It involves the idea of conditional expectation. Just as the expectation of a
discrete random variable utilizes the probability mass function, the conditional
expectation utilizes the conditional probability mass function.
Definition 6.9: Suppose that X and Y are discrete random variables and that
P(Y = b) > 0. The conditional expectation of X given that Y = b is denoted
by E(X | Y = b) and defined by

E(X | Y = b) = Σ_k x_k p_{X|Y=b}(x_k)

where the sum extends over all values x_k assumed by X.
Notice that all that is happening in this definition is that the probability mass
function in Definition 3.6 is now being replaced by the conditional probability mass
function.
Proposition 6.10 will now demonstrate a way in which the conditional
expectation can be used to determine the unconditional expectation. Depending on
what kind of information is available, this may in fact be the most attractive
approach to get the expected value. The continuation of Example 6.12 that appears
after Proposition 6.10 will illustrate this.
Proposition 6.10: If X and Y are discrete random variables, then

E(X) = Σ_j E(X | Y = y_j) P(Y = y_j)

where the sum extends over all values y_j assumed by Y.

Proof:

Σ_j E(X | Y = y_j) P(Y = y_j) = Σ_j Σ_k x_k P(X = x_k | Y = y_j) P(Y = y_j)
= Σ_j Σ_k x_k P(X = x_k, Y = y_j)
= Σ_k x_k P(X = x_k)
= E(X)
Notice that this proof depends only on rearranging the terms of the sum and using
elementary properties of conditional probabilities.
Now let’s turn to the case of continuous random variables. What will be
meant by the conditional probability density function in the case of two continuous
random variables? We cannot proceed as in Definition 6.8 because the two events
(X =x) and (Y = b) will both have probability 0 if X and Y are continuous
random variables. Remember, however, that even in the discrete case
P(X = x | Y = b) = P(X = x, Y = b) / P(Y = b)
Both the numerator and the denominator of this expression on the right have parallel
concepts in the continuous case. The numerator is the joint probability mass
function, whose continuous analogue is the joint density function; similarly, the
analogue of the denominator P(Y = b) is the density f_Y(b). This suggests the
following definition.
Definition 6.10: Suppose X and Y are continuous random variables. For all
numbers y where the density function f_Y satisfies f_Y(y) ≠ 0, the conditional
density function of X given that Y = y is denoted by f_{X|Y=y} and defined by

f_{X|Y=y}(x) = f(x, y) / f_Y(y)

Similarly,

f_{Y|X=x}(y) = f(x, y) / f_X(x)
Example. A point is selected at random from the triangle having vertices (0, 0),
(1, 0), and (1, 1). The assumption will be that the coordinates of the point selected, which we
denote by X and Y, have as their joint density function the uniform density on the
triangle.
In this context, what is meant by the conditional density f_{X|Y=3/4}? From the
definition, f_{X|Y=3/4}(x) = 2/f_Y(3/4) provided that the point (x, 3/4) lies inside the
triangle, and f_{X|Y=3/4}(x) = 0 otherwise. (Recall that the joint density has the
constant value 2 throughout the triangle and 0 outside the triangle.) Furthermore,
the point (x, 3/4) will be inside the triangle precisely when 3/4 ≤ x ≤ 1.
It is easy to compute (using Proposition 6.2) that f_Y(y) = 2 − 2y whenever
0 ≤ y ≤ 1. So f_Y(3/4) = 1/2. Therefore f_{X|Y=3/4}(x) = 2/(1/2) = 4 whenever
3/4 ≤ x ≤ 1. Notice that this is simply the uniform density function on the interval
[3/4, 1].
One final look at the picture should make all of this fall into place. Once we
know that Y = 3/4, this guarantees that if (X, Y) is in the triangle then X must
satisfy the inequality 3/4 < X <1. The “uniformity” of the joint density on the
triangle is passed down to X when we are conditioning on the information that Y
= 3/4 in the sense that the conditional density for X then is the uniform density on
the interval [3/4, 1].
E(X | Y = y) = ∫_{−∞}^{∞} t f_{X|Y=y}(t) dt
Notice that the only way in which the above definition differs from the
definition of E(X) is that the density function f_X is replaced by the conditional
density function. Also it differs from Definition 6.9 only in that summation of the
conditional mass function turns to integration with the conditional density function.
In the continuous case, just as in the discrete case, the conditional expectation
can be used to compute the unconditional expectation. The summation and mass
function in Proposition 6.10 are replaced by integration and the density function in
the continuous case.
Proposition 6.11: If X and Y are continuous random variables having a joint
density function, then

E(X) = ∫_{−∞}^{∞} E(X | Y = y) f_Y(y) dy
Proof:

∫_{−∞}^{∞} E(X | Y = y) f_Y(y) dy = ∫_{−∞}^{∞} [ ∫_{−∞}^{∞} t f_{X|Y=y}(t) dt ] f_Y(y) dy
= ∫_{−∞}^{∞} ∫_{−∞}^{∞} t ( f(t, y) / f_Y(y) ) f_Y(y) dt dy
= ∫_{−∞}^{∞} ∫_{−∞}^{∞} t f(t, y) dt dy
The last integral above is equal to E(X). In fact, this integral is just the
special case of the integral in part 2 of Proposition 6.3 in which g is the function
g(x,y) =x. (You may, of course, change the variable ftto x in the double
integral here if it helps you to see the connection between this and Proposition 6.3.)
Does this make sense? The worst cables (where λ = 1) have expected time to
failure 1 year. The best ones (where λ = .5) have expected time to failure 2 years.
The parameter in the distribution for the available supply of cables is assumed to be
uniformly distributed between .5 and 1, so the average would be .75. In other
words, E(Y) = 3/4. Notice that E(X | Y = 3/4) = 4/3 since the conditional
density f_{X|Y=3/4} is the exponential density with parameter λ = 3/4. If it seems
paradoxical to you that this 4/3 doesn’t agree with the answer 1.386 above, the
explanation is that two different kinds of averaging are being compared.
Example 6.15. A person will arrive at work between 9 and 10 o’clock in the
morning. Sometime before 10 o’clock an important phone call must be placed.
Assume that the time of arrival is uniformly distributed between 9:00 and 10:00,
and assume that the time that the call is placed is uniformly distributed between the
time of arrival and 10:00. What is the probability distribution of the time at which
the call is placed, and when is the expected time for the call to be placed?
Solution: This situation ties together several of the concepts of this chapter
because there are two random variables that interact. One is the time of arrival (let’s
call it X), and the other is the time at which the call is placed (which we’ll call Y).
If we agree to measure time in hours starting at 9:00, then the assumption
regarding X is that X is uniformly distributed on the interval [0, 1]. But what
about Y? Here the information is conditional. It is that given the information X =
x, then Y is uniformly distributed on the interval [x, 1]. Or to put it more
succinctly, fyy-, is the uniform density on the interval [x, 1] for each value of
x between 0 and 1.
We can solve for the joint density by simply looking back at Definition 6.10.
Since f_X(x) = 1 whenever 0 ≤ x ≤ 1 and since f_{Y|X=x}(y) = 1/(1 − x) whenever
x ≤ y ≤ 1, this means that

f(x, y) = 1/(1 − x)   whenever 0 ≤ x ≤ y ≤ 1

and f(x, y) = 0 otherwise.
The first term here is 0. (You have to look carefully at the limit as t → 0 to see
that this is true, since ln t → −∞.) The integral on the right is equal to 3/4. This
means that the expected time for the call to be placed is 9:45.
However, if it's the expected value we want, then the easy route is to use
Proposition 6.11. Since f_{Y|X=x} is the uniform density on [x, 1], this means that
E(Y | X = x) = (x + 1)/2, simply the midpoint of the interval [x, 1]. But then
Proposition 6.11 says that

E(Y) = ∫_{−∞}^{∞} E(Y | X = x) f_X(x) dx = ∫_0^1 (x + 1)/2 dx = 3/4
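Both answers are easy to confirm by simulation; a Python sketch of the two-stage
model:

    import random

    random.seed(1)
    trials = 1_000_000
    total = 0.0
    for _ in range(trials):
        x = random.random()          # arrival time, uniform on [0, 1]
        y = random.uniform(x, 1)     # call time, uniform on [x, 1]
        total += y
    print(total / trials)            # close to 3/4, i.e. 9:45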
Proposition 5.4 is the original version of the central limit theorem and is often
referred to as the De Moivre-Laplace theorem. It states that the binomial distribution
with parameters n and p is closely approximated by the normal distribution if the
parameter n is large.
Recall that every binomial random variable is a sum of Bernoulli random
variables. (A Bernoulli random variable is a binomial random variable in which the
parameter n is equal to 1.) For example, if Y is the number of successes in 4
trials of an independent trials process, then

Y = X_1 + X_2 + X_3 + X_4

where X_1 is either 0 or 1, depending upon whether success occurs on the first
trial, and X_2, X_3, and X_4 similarly indicate whether success occurs on the
second, third, and fourth trials. Furthermore, the random variables X_1, ⋯, X_4
all have the same distribution function and they are independent.
If the parameter n in the binomial distribution of Y were large, then Y
would be the sum of a large number of independent, identically distributed random
variables. A more general form of the central limit theorem says that this is all that
is required in order for “convergence” to the normal distribution to take place.
Specifically, suppose that X_1, X_2, X_3, ⋯ are random variables which are
independent and have a common distribution function. Furthermore, suppose that
they have finite mean μ and variance σ². (Since they all have the same distribution,
they must all have the same mean and variance.) For each positive integer n, let

S_n = X_1 + ⋯ + X_n      (6.2)

Then S_n has mean nμ and variance nσ². (The variance of a sum is the sum of the
variances. Proposition 6.7 states this fact for two random variables, but it is
equally true for any finite number of independent random variables.)
The De Moivre-Laplace version of the central limit theorem, Proposition 5.4,
requires that we subtract the mean of the binomial random variable and divide by
the standard deviation in order to make the binomial random variable approximately
standard normal. The effect of subtracting the mean and dividing by the standard
deviation is to give the “adjusted” random variable mean 0 and variance 1, just as
the standard normal distribution has mean 0 and variance 1. The corresponding
adjusted form of the random variable S_n in Equation 6.2 is the random variable

(S_n − nμ) / (σ√n)      (6.3)

The general form of the central limit theorem asserts that this adjusted random
variable is approximately standard normal when n is large. (More precisely, the
distribution functions converge to the standard normal distribution function as
n → ∞.)
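The convergence is easy to see empirically. In the Python sketch below the X_i
are uniform on [0, 1] (so μ = 1/2 and σ² = 1/12), and the adjusted sums of
Equation 6.3 behave like a standard normal sample:

    import random
    from math import sqrt

    mu, sigma, n = 0.5, sqrt(1 / 12), 48

    random.seed(1)
    zs = []
    for _ in range(100_000):
        s = sum(random.random() for _ in range(n))
        zs.append((s - n * mu) / (sigma * sqrt(n)))   # Equation 6.3

    print(sum(-1 <= z <= 1 for z in zs) / len(zs))    # about .683 = 2*Phi(1) - 1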
Problems
6.1 A coin is tossed twice. Let X denote the number of heads on the first toss
(0 or 1) and let Y denote the total number of heads on the two tosses (0, 1,
or 2). Draw a table similar to Figure 6.1, showing all values of the joint
probability mass function for X and Y.
6.2 In Example 6.1, let Z denote the sum of the numbers on the two items
selected. Draw a table similar to Figure 6.1, showing all values of the joint
mass function for X and Z. As in Figure 6.1, sum the rows and columns
to show the values for the mass functions of X and Z along the bottom
and right side.
6.3. A device contains three transistors and three resistors. One transistor and
one resistor are defective. Two of the six components are randomly
selected. Let X denote the number of transistors selected and Y denote the
number of defective components selected. Draw a table that shows all the
values of the joint probability mass function for X and Y.
6.4 A card is drawn from a standard deck of 52. Let X be the number of hearts
drawn and Y the number of red cards drawn. (So X and Y assume
values 0 or 1.) Show all values of the joint probability mass function in a
table.
6.5 A coin is tossed three times. The random variable X is the number of
heads occurring on the first two tosses, and Y is the number of heads
occurring on the last two tosses. Draw a table like Figure 6.1 for the joint
probability mass function for X and Y. (Observe that X and Y are not
independent.)
6.6 Let T be the triangle bounded by the x and y axes and the line x + y = 1.
Suppose that f is the function defined by f(x, y) = cxy for (x, y) in T,
and that f(x, y) = 0 when (x, y) is not in T.
(a) Find the value that the constant c must have in order for f to be a joint
density function.
(b) Suppose that this is the joint density for random variables X and Y.
Find the density f_Y.
(c) Let Z = max{X, Y} and find P(Z ≤ 1/2).
6.7 Suppose X and Y have the joint density function f(x, y) = 24xy inside
the triangle bounded by the line x + y = 1 and the coordinate axes, with
f(x, y) = 0 for points (x, y) not in this triangle. Find the expected
values of X, Y, and XY.
6.8 Two random numbers are independently generated using a random number
generator which generates numbers according to the uniform density on
[0,1]. Find the expected value for the square of the difference of the two
numbers.
6.9 Two dice are rolled. Let X denote the maximum of the two numbers that
appear, and let Y denote the minimum of the two numbers that appear.
(a) Ina table show all values of the joint mass function py y.
(b) Find E(X) and E(Y).
6.10 This exercise is designed to check your understanding of what Equation 6.1
says. Suppose two dice are rolled and X and Y are the numbers that
appear on the two dice. Let B denote the region in the xy plane consisting
of all points (x, y) such that x² + y² < 15. Compute the probability
P{ (X, Y) ∈ B } and verify that Equation 6.1 is valid in this specific case.
6.13 Suppose that X and Y have as joint density function the uniform density
on the triangle in the xy plane having vertices at (0, 0), (1, 0), and (1, 1).
Determine the density functions of X and Y. Are X and Y independent?
6.15 Let f(x, y) = y e^(−y(x+1)) whenever x > 0 and y > 0, with f(x, y) = 0
otherwise.
(a) Show that f is a density function.
(b) If f is the joint density function for a pair of random variables X and
Y, find the density function for X and similarly for Y.
6.16 Complete the calculation of the probability that the device with expected
lifetime 8 years lasts more than twice as long as the device with expected
lifetime 5 years in Example 6.6.
6.17 A friend says he will call you on the phone between 8 and 9 o'clock.
(Assume a uniform distribution for the time of the call during this interval.)
Other phone calls occur on your phone according to an exponential
distribution; that is, the waiting time for a call is an exponential random
variable. Let’s assume the expected waiting time for a call from someone
other than the friend is 30 minutes. If you arrive at home at precisely 8
o’clock, how likely is it that your friend will be the first person to call you?
Comment on this problem: Remember the lack-of-memory property that
exponential random variables have. This means that if you walk in at 8:00,
it really doesn’t matter when the last call occurred. You might as well
consider the whole process starting from 8:00; that is, the waiting time as
measured from 8:00 until the next call (from someone other than the friend
who promised to call) can be assumed to be exponentially distributed with
expected value 30 minutes.
6.18 Fill in the details in calculating that E(Z) = 1/3 in Example 6.7.
6.19 A room is lighted with two 100-watt bulbs and one 60-watt bulb. During
the course of a week, two of the bulbs burn out. Let X be the wattage of
the first bulb to burn out and Y the wattage of the second. Assume that
bulbs are not replaced when they burn out and that the bulbs are equally
likely to burn out.
(a) Draw a tree diagram to represent the possibilities.
(b) Ina table similar to Figure 6.1, show all values of the joint mass
function of X and Y.
(c) What is the expected value of X + Y?
6.20 Suppose the random variables X and Y represent the lifetimes of two
devices (measured in hours) and that the joint density function for X and
Y is given by
f(x, y) = .02 e^(−.1x − .2y)   for x > 0, y > 0
but you may find it easier to compute this expression by looking at the
complement and using P(X + Y > 10) = 1 − P(X + Y ≤ 10).]
6.21 Answer parts c and d of Problem 6.20 under the assumption that X and Y
are independent and uniformly distributed. This time assume that the
expected lifetime of each device is 10 hours.
6.25 A measurement is made, and the result is a random variable having mean 10
and standard deviation 1. How many times would the experiment have to
be repeated independently until the average for all the measurements
obtained would have mean 10 and standard deviation less than 1/5?
6.26 Two houses are built in separate flood plains. House A is in a 100-year
flood plain, and house B is in a 50-year flood plain. This means that the
expected waiting time for a flood in the two areas is, respectively, 100 years
and 50 years. Assume these waiting times both to be exponentially
distributed random variables. Assume furthermore that these two waiting
times are independent random variables. This means that you know the
joint density function since you know the distribution of each. What then is
the probability that house A will be washed out by a flood before house B
is?
6.27 Suppose both X and Y are uniform on [0, 1] and are independent.
(a) Find P(X + Y ≤ 1).
(b) Find P(X + Y ≤ t), assuming that 0 ≤ t ≤ 1.
(c) Find P(X + Y ≤ t), assuming now that 1 ≤ t ≤ 2.
(d) Now put all this information together and sketch a graph of the
cumulative distribution function for W =X + Y.
6.28 Suppose X and Y are random variables having as joint density function
the uniform density on the square S = {(x, y) |0<x<1,0<y<1}.
(a) Find the probability P(Y < X²).
(b) Find E(X²Y²).
6.29 In Example 6.8, greater safety (in the sense of a shorter expected time to
shutdown) is achieved by putting the two switches in series and having
them independently controlled. If they were instead placed in parallel, as in
the picture below, the effect would be to make it more difficult to shut down
the system.
[Figure: the power source connected to the process that may need to be interrupted, through Switch 1 and Switch 2 in parallel.]
With this kind of configuration, the system would not be shut down
until both operators had opened their switches. This kind of configuration
could be desirable if time is not terribly critical (from the standpoint of the
danger involved) and if a shutdown is expensive enough that we elect not to
shut down unless both operators agree that a shutdown is necessary. With
this configuration, the waiting time for the system to be shut down is the
random variable Z = max{X, Y}. What is the expected waiting time for
the system to be shut down if this configuration is used? (Assume that the
assumptions about X and Y remain the same as in Example 6.8.)
6.30 Show that f*g = g*f. (See the comment following Definition 6.7.)
6.31 (a) What does the joint probability mass function of X and Y look like if
X and Y are the same random variable? Make up a simple example if
this situation sounds confusing to you. For example, let X and Y
both be equal to the number obtained when a die is rolled. If we were
to roll the die twice and let X be the number on the first roll and Y the
number on the second, then X and Y are independent and certainly
not equal. (Their probability mass functions are equal.) Now,
however, you need to think of X and Y as both corresponding to the
same roll of the die.
(b) As a follow up to part a, can you intuitively see why in the continuous
case if Y = X, then it is impossible for X and Y to have a joint
density function? The reason is that P{(X, Y)eB} would then have
to be 0 unless the region B intersects the line y = x. From this it can
be proved that the joint density would have to be 0 except on the line y
=x, and such a function could not possibly satisfy
∬ f(x, y) dx dy = 1
6.32 Many devices, such as lightbulbs, for example, have time to failure which is
approximately normally distributed. (Notice that the exponential
distribution is not a good model for a device that is physically wearing out
when in use. The lack-of-memory property would be totally out of place
here. For a device with the lack-of-memory property, a used device is
always just as good as a new one.) Let’s suppose that 75-watt lightbulbs
have normally distributed time to failure with mean 750 hours and standard
deviation 150 hours. If you buy three bulbs, what is the probability that
you get more than 2,000 hours use from the three?
6.33 The lifetime of a device is a normal random variable with expected value
1,000 hours and standard deviation 100 hours.
(a) If an identical backup device is available to be placed in service when
the original fails, what is the probability that the total length of service
they provide is more than 1,800 hours?
(b) If three such devices are available, what is the probability that the
cumulative service they provide exceeds 2,700 hours?
(Assume in all cases that the devices function independently; that is, the
lifetimes of the devices are independent random variables.)
6.35 A person’s blood contains 70 parts per million (ppm) of a certain substance.
When a particular technique is used to measure the concentration of the
substance in the person’s blood, the result is a normal random variable with
mean 70 ppm and standard deviation 10 ppm. Assuming that it is possible
to reproduce the tests independently, what is the probability that the average
of four tests conducted independently would be between 65 and 75 ppm?
6.36 In Example 6.1, find the conditional probability mass functions p_{X|Y=2} and
p_{X|Y=3}. Then use these to compute E(X | Y = 2) and E(X | Y = 3).
6.37 If X and Y are independent discrete random variables and if P(Y = b) ≠
0, then show that the mass function p_X and the conditional mass function
p_{X|Y=b} are identical as functions.
6.38 If X and Y are as in Problem 6.3, find the conditional mass functions
p_{X|Y=1} and p_{Y|X=1}. Then find E(X | Y = 1) and E(Y | X = 1).
6.39 In Example 6.14, what is the joint density function for the pair of random
variables X and Y?
6.40 In Example 6.4, what is the conditional density function f_{Y|X=0}? Now
find the conditional expectation E(Y | X = 0).
Chapter 7: Stochastic Processes
It is, in fact, true that Equations 1 and 2 in Proposition 7.1 characterize the
binomial distribution. In other words, if all one knows about X_1, X_2, … is that
Equations 1 and 2 in Proposition 7.1 are valid, it is easy to show that each of
the random variables X_n does have the binomial distribution with parameters n
and p. The argument uses mathematical induction. Suppose that Equations 1 and
2 are known to be true. Equation 1 says that X_1 is binomial with parameters n =
1 and p. Assume now that X_{n−1} is binomial with parameters n − 1 and p. Then
from Equation 2,

P(X_n = k) = p P(X_{n−1} = k − 1) + (1 − p) P(X_{n−1} = k)

But since X_{n−1} is binomial with parameters n − 1 and p, this means that
One thing to notice is that the right side here is clearly 0 unless n + k is an
even integer. This is because after an even number of changes in the number of
users (that is, when n is even), the net gain must be an even number (that is, Y_n is
even). Thus P(Y_n = k) will necessarily be 0 if n is even and k is odd. For
similar reasons, P(Y_n = k) will be 0 if n is odd and k is even. Since X_n is
known to be binomial with parameters n and p, we can now describe the
probability distribution of Y_n.
If a = (n + k)/2 is an integer, then P(Y_n = k) = P(X_n = a).
someone is entering the system and 1/3 that someone is leaving. Then,

P(Y_5 = 3) = P(X_5 = (5 + 3)/2) = P(X_5 = 4)
While the Poisson and exponential distributions have been introduced earlier,
it is in the context of modeling phenomena via the concept of a Poisson process that
the intricate relationship between Poisson and exponential random variables is most
visible.
At an intuitive level, the essence of a Poisson process is the idea of “random
phenomena” occurring intermittently in accordance with certain descriptive
assumptions. Real phenomena that might be modeled via a Poisson process are
things such as the following: (1) calls arriving at a telephone switchboard, (2)
breakdowns of a piece of equipment, (3) traffic entering a parking lot, (4)
emissions of alpha particles from a quantity of radioactive substance, and (5) traffic
accidents in a city. Some of these examples are idealized. For example, if a traffic
light affects the flow of traffic into a parking lot, the movement of cars into the lot
will not satisfy the axioms of a Poisson process.
A Poisson process involves a family {X_t : t ≥ 0} of random variables. The
basic idea is that for a given time t, X_t counts the number of occurrences during
the time interval [0, t] of whatever phenomenon is being modeled. So time is
measured on a continuous scale relative to some fixed “starting time” referred to as
time t = 0. For example, if we are observing cars entering a parking lot and if time
is being measured in minutes, then X_{10} would be the number of cars that enter the
parking lot during the first 10 minutes. Notice that X_{t_2} − X_{t_1} would represent the
number of occurrences of the phenomenon between times t_1 and t_2.
Let’s look at the precise requirements that a Poisson process must satisfy and
try to understand intuitively what each of the conditions is all about. The
requirements are given in Definition 7.1.
1. X_0 = 0, and the numbers of occurrences in disjoint time intervals are
independent random variables.

2. (a) lim_{Δt→0+} [1 − P(X_{t+Δt} = X_t)] / Δt = μ

(b) lim_{Δt→0+} P(X_{t+Δt} − X_t > 1) / Δt = 0

(c) lim_{Δt→0+} P(X_{t+Δt} − X_t = 1) / Δt = μ
Property 1 simply says that we start counting at time t = 0 and that the
number of occurrences in two disjoint time intervals should be independent of each
other. This is clearly intuitively plausible for many situations. If we are modeling
traffic accidents, the number of accidents between 10 o’clock and 11 o’clock would
have no apparent reason to affect the number of accidents between 1 o’clock and 2
o’clock.
Property 2(c) says that for small time intervals, the probability of exactly one
occurrence of the phenomenon during the time interval is approximately proportional
to the length of the time interval. The constant μ is the constant of proportionality.
Property 2(b) says that for short time intervals the likelihood of more than one
occurrence is negligible, that is, negligible in comparison to the length of the time
interval.
Notice that the expression 1 − P(X_{t+Δt} = X_t) in the numerator of Property
2(a) is 1 minus the probability that no occurrence has taken place during the time
interval from t to t + Δt. But since 1 − P(E) = P(E^c) for any event E, this
is simply the probability of at least one occurrence between time t and time t + Δt.
So Property 2(a) says that for small time intervals the probability of at least one
occurrence is approximately proportional to the length of the time interval, with μ
being the constant of proportionality. Property 2(a) is a logical consequence of
Properties 2(b) and 2(c), but it is listed separately for reference later.
Let f_0(t) = P(X_t = 0). Then

f_0(t + Δt) = P(X_{t+Δt} = 0) = P(X_t = 0, X_{t+Δt} − X_t = 0) = P(X_t = 0) P(X_{t+Δt} − X_t = 0)

Being able to write this last product depends on Property 1; that is, the random
variables X_t and X_{t+Δt} − X_t are independent. But now
[f_0(t + Δt) − f_0(t)] / Δt = f_0(t) · [P(X_{t+Δt} − X_t = 0) − 1] / Δt → −μ f_0(t) as Δt → 0
[This uses Property 2(a).] However, the value of this limit is, by definition, the
derivative f_0'(t), and so we have shown that f_0'(t) = −μ f_0(t).
This simple differential equation is easily solved to give f_0(t) = A e^{−μt},
where A is constant. If we take into account now that f_0(0) = P(X_0 = 0) = 1
(because no occurrences have taken place yet at time t = 0), this tells us that A = 1,
and so

f_0(t) = P(X_t = 0) = e^{−μt}    (7.1)

For k ≥ 1, let f_k(t) = P(X_t = k). The same kind of decomposition gives

f_k(t + Δt) = P(X_t = k) P(X_{t+Δt} − X_t = 0) + P(X_t = k − 1) P(X_{t+Δt} − X_t = 1) + other terms

where the other terms involve more than one occurrence between t and t + Δt and
are negligible in comparison to Δt, so that

[f_k(t + Δt) − f_k(t)] / Δt → −μ f_k(t) + μ f_{k−1}(t) as Δt → 0
f_k'(t) = −μ f_k(t) + μ f_{k−1}(t)    (7.2)
Since we already know what f_0 is, we can use Equation 7.2 with k = 1 to
find f_1. The function f_1 must satisfy the differential equation

f_1'(t) = −μ f_1(t) + μ e^{−μt}

This differential equation is easily solved to give f_1(t) = (μt + c) e^{−μt}, where c
is a constant. However, f_1(0) = P(X_0 = 1) = 0, since there have been no
occurrences yet at time t = 0, and this tells us that the constant c = 0, which
allows us to conclude that

f_1(t) = μt e^{−μt}
Now, using Equation 7.2 again, it is possible to compute f_2, since we now
know f_1. Equation 7.2 says that

f_2'(t) = −μ f_2(t) + μ² t e^{−μt}

and solving this equation (with f_2(0) = 0) gives f_2(t) = (μt)²/2 · e^{−μt}.
The method that has been used to find f_1 and f_2 from Equation 7.2 can be
continued, since each function f_k is recursively defined in terms of f_{k−1} by
Equation 7.2. It is not difficult to show via mathematical induction that for any
positive integer k,

f_k(t) = P(X_t = k) = (1/k!) (μt)^k e^{−μt}    (7.3)
Recall now from Chapter 4 the definition of the Poisson distribution. A
random variable X is a Poisson random variable with parameter λ provided

P(X = k) = e^{−λ} λ^k / k!
The computation we have just finished has demonstrated Proposition 7.2.
Example 7.2. Suppose that the calls coming into a switchboard constitute a
Poisson process with intensity μ = 4 calls per minute. Then for any value of t, the
random variable X_t, which counts the number of calls that have arrived by time t,
has the Poisson distribution with parameter λ = 4t. So if we wish to measure the
number of calls coming into the switchboard during a 10-minute time period, we
can think of this as a Poisson process in which X_{10} is the random variable of
interest, and it will be Poisson with parameter λ = 4 × 10 = 40. Since the
probability distribution is now explicitly known, any probabilities of interest can be
computed. For example, the probability that 30 or fewer calls come in during a 10-
minute time interval would be

P(X_{10} ≤ 30) = Σ_{k=0}^{30} e^{−40} 40^k / k!
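A quick numerical check of this sum (not in the original text), computing the Poisson probabilities directly from Equation 7.3:

    from math import exp, factorial

    lam = 40.0   # intensity of 4 calls/min times 10 minutes
    prob = sum(exp(-lam) * lam**k / factorial(k) for k in range(31))
    print(prob)  # P(X_10 <= 30), roughly 0.06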
Now consider the waiting time for the first occurrence in a Poisson process. Let Y_1 be
this random variable; that is, Y_1 gives the length of time from the starting time
(time t = 0) to the time of the first occurrence. In Example 7.2, Y_1 would be the
length of time that passes during the observed time period before the first call comes
in to the switchboard.
The key relation between Y_1 and the Poisson process {X_t : t ≥ 0} is that
P(Y_1 > t) = P(X_t = 0). This is because (Y_1 > t) and (X_t = 0) represent
precisely the same event; to say that the waiting time for the first occurrence is
greater than t is the same as saying that at time t there have been no occurrences.
However, Equation 7.1 gives P(X_t = 0) as e^{−μt}. Therefore P(Y_1 > t) =
e^{−μt} and P(Y_1 ≤ t) = 1 − e^{−μt}. We can differentiate to find that for t > 0, the
density function of Y_1 is the exponential density f_{Y_1}(t) = μ e^{−μt}. So
Y_1, the waiting time for the first occurrence, is exponential with parameter μ.
It might be a good idea at this time to review the analogy between an
independent trials process and a Poisson process. The binomial distribution arises
when we fix the number of trials in an independent trials process and count the
number of successes. The Poisson distribution arises when we fix the time interval
in a Poisson process and count the number of occurrences of the observed
phenomenon. The geometric distribution arises in an independent trials process
when we consider “waiting time for first success” in terms of the number of trials
required. The exponential distribution arises when we consider “waiting time for
first occurrence” in a Poisson process in which time is measured on a continuous
scale. Both of these “waiting time” phenomena, described by geometric or
exponential random variables, have the lack-of-memory property discussed in
Chapters 4 and 5.
Proposition 7.3 summarizes the main features of a Poisson process. Property
3 says, for example, that if we let Y_2 denote the length of time required for two
occurrences (in other words, the total waiting time from time t = 0 until the second
occurrence), then Y_2 − Y_1 has the same distribution as Y_1. What is perhaps of
more interest is that Y_2 − Y_1 is independent of Y_1. All this is summarized by
saying that the waiting time between the first and second occurrences is independent of
the time of the first occurrence and has the same probability distribution as the
waiting time for the first occurrence. This is a consequence of the lack-of-memory
property. In fact it can be shown that the entire Poisson process has a certain lack-
of-memory feature in that the choice of starting time is irrelevant. In other words,
for any t_0 > 0, the random variables X_t and X_{t_0+t} − X_{t_0} have the same
distribution. The former counts occurrences during the time interval [0, t] and the
latter during the interval [t_0, t_0 + t]. The distributions turn out to be the same
because the intervals are of the same length. A stochastic process having this
property is said to have stationary increments. There are other senses in which
some stochastic processes are unaffected by time shifts. Stationary stochastic
processes are introduced and discussed in Section 7.7.
1. For every t_0 ≥ 0, the random variable X_{t_0+t} − X_{t_0} (which gives the
number of occurrences during the time interval from t_0 to t_0 + t) is a
Poisson random variable with parameter λ = μt.

5. If there is exactly one occurrence between times t_1 and t_2, then the time of
that occurrence is uniformly distributed on the interval [t_1, t_2].
Example 7.3. Suppose a snowstorm has begun and a grid such as the one
shown in Figure 7.1 has been laid out on the ground. The “intensity” of the
snowfall is 3 flakes per square inch per minute. The experiment we are going to do
is to count the number of snowflakes hitting the one-square-inch shaded section of
the grid during a specified 1-minute time interval. Let’s call this random variable
X. What kind of model is appropriate here?
Solution: The first thing to consider is that there are at least two different
interpretations to the statement that “snowflakes are falling at a rate of three
snowflakes per square inch per minute.”
Figure 7.1 The grid, with a shaded one-square-inch section.
Model 2: Now let’s model this situation as a Poisson process. Since the
intensity of the snowfall is three snowflakes per square inch per minute, this means
that X, the number of snowflakes to hit the shaded square in a 1-minute interval,
will now be considered to have a Poisson distribution with parameter A = intensity
x length of time interval = 3 x 1 = 3. Since X is Poisson now, E(X) =A, by
Problem 4.9, so again we have E(X) = 3, just as it should be.
In this model, what is the probability that exactly two snowflakes hit the
shaded square during the 1-minute interval? Since X is Poisson with λ = 3, it is
P(X = 2) = e^{−3} 3²/2! ≈ .2240.
Isn’t it astounding that two apparently very different models should give such
nearly identical answers for this calculation? The reason this happens can be
explained mathematically by saying that if 4 is fixed, then the binomial distribution
with parameters n and p =A/n converges to the Poisson distribution with
parameter A asn— co. A less precise but intuitively more pleasant way of
describing this is to say that if n is “large” and p is “small,” then the binomial
distribution with parameters n and p is approximately the same as the Poisson
distribution with parameter A = np.
But which model is the correct one? A little further insight is gained by
considering one more scenario. Suppose that the 10 by 10 grid is replaced by a 100
by 100 grid and that the binomial model is used. Now n = 30,000 and p = .0001.
However, np = 3 still. In this model,

P(X = 2) = C(30000, 2)(.0001)²(.9999)^{29998} = .2240
This calculation is consistent with the Poisson model to at least four significant
digits. Furthermore, it sheds additional light on the question as to which model is
correct. If one interprets the average rate of 3 snowflakes per square inch per
minute as being an exact rate measured over a finite grid, then the correct model is a
binomial model based on the idea of an independent trials process. If, however,
one imagines an infinite plane on which snow is falling at a rate of 3 snowflakes per
square inch per minute, then the correct model is the Poisson model based on the
idea of a Poisson process. (Of course, the very idea of snowflakes falling at a rate
of 3 per square inch per minute over an infinite plane inherently involves the limit
concept.)
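A short comparison (not in the original text) of the two models, with the binomial parameters chosen so that np = 3 for each grid size; binom_pmf and poisson_pmf are our own helper names:

    from math import comb, exp, factorial

    def binom_pmf(k, n, p):
        return comb(n, k) * p**k * (1 - p)**(n - k)

    def poisson_pmf(k, lam):
        return exp(-lam) * lam**k / factorial(k)

    for n, p in [(300, 0.01), (30000, 0.0001)]:  # finer grids, same np = 3
        print(n, binom_pmf(2, n, p))             # about .2245, then .2240
    print(poisson_pmf(2, 3.0))                   # about .2240, the limit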
To make this idea still more concrete, let’s return to the example

X_t = A cos(ωt + θ)

Here ω is a fixed constant. We’ll pretend that the values of the random variables A
and θ are produced independently by random-number generators so that 0 ≤ A ≤ 1
and 0 ≤ θ ≤ 2π. The “possible outcomes” of the experiment then correspond to
pairs of random numbers produced, that is, the values of A and θ. For each such
“possible outcome,” there results a sample function. The sample function

x(t) = .2 cos(ωt + 5)

results, for instance, if the value produced by the random-number generator for A
is .2 and for θ is 5.
The stochastic process X_t = A cos(ωt + θ) is somewhat specialized in the
sense that every random variable X_t here is described in terms of the random
variables A and θ. Generally the random variables that make up a stochastic
process do not have such a simple relationship connecting them with each other.
Even so, the concept of a “sample function” remains that of fixing an outcome
s ∈ S in the underlying sample space and considering how X_t(s) varies as a
function of t.
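As an illustration (ours, not from the text), here is how one might generate a few sample functions of this process by fixing outcomes for A and θ; the variable names are our own:

    import numpy as np

    rng = np.random.default_rng(0)
    omega = 2.0                            # the fixed constant omega
    t = np.linspace(0.0, 10.0, 1000)

    for _ in range(3):
        a = rng.uniform(0.0, 1.0)          # one outcome for A
        theta = rng.uniform(0.0, 2*np.pi)  # one outcome for theta
        x = a * np.cos(omega * t + theta)  # the sample function X_t(s)
        print(round(a, 3), round(theta, 3), x[:3])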
As a further illustration, suppose we are monitoring a Poisson process such
as calls coming into a telephone switchboard. One “possible outcome” can be
partially described by saying that the first call arrives at time t_1 = 1.3 minutes, the
second at time t_2 = 3.2 minutes, and the third at time t_3 = 3.8 minutes. In this
case, a portion of the graph of the sample function that corresponds to such an
“observation” of the process would be as shown in Figure 7.2. This is a graph of
X_t corresponding to the observation just partially described, where X_t is the total
number of calls to have arrived by time t.
Figure 7.2 Typical sample function for a Poisson process.
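One can simulate such a sample function (this sketch is not from the text) by drawing independent exponential interarrival times, as Proposition 7.3 justifies, and counting arrivals:

    import numpy as np

    rng = np.random.default_rng(1)
    mu = 1.0                                               # intensity
    arrival_times = np.cumsum(rng.exponential(1/mu, 20))   # occurrence times

    def X(t):
        # number of occurrences in [0, t] for this particular outcome
        return int(np.searchsorted(arrival_times, t, side="right"))

    print([X(t) for t in (1.0, 2.0, 3.0, 4.0)])  # a staircase like Figure 7.2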
There are some very subtle points related to the study of stochastic processes
that require sophisticated and advanced mathematics. This can be illustrated, for
example, even with regard to an independent trials process. How does one think of
the sample space for an infinite sequence of coin tosses? A finite sequence of
tosses is no problem, but when one envisions the stochastic process {X_n}, where
X_n is the number of heads to occur in the first n tosses of an infinite sequence of
tosses, things get a bit confusing. While it’s clear what the probability distribution
of X_n is going to be, it’s not so clear just how one can envision a sample space
(with probability measure) on which all of the X_n’s are defined. Fortunately, one
can do a lot of worthwhile modeling without dealing with such riddles, but it is
worth realizing that the complexities required to do a rigorous mathematical
treatment are substantial.
Solution:

R_X(t_1, t_2) = E(X_{t_1} X_{t_2}) = (A²/2π) ∫_{−π}^{π} cos(ωt_1 + s) cos(ωt_2 + s) ds

= (A²/2π) · (1/2) ∫_{−π}^{π} [ cos(ωt_1 + ωt_2 + 2s) + cos(ωt_1 − ωt_2) ] ds

The last integral is easily evaluated. Notice that one of the cosine terms does
not contain s at all and therefore may be treated as a constant as far as integration
with respect to s is concerned. The other term has a corresponding sine term as its
antiderivative with respect to s, and since the endpoints of integration are −π and
π, the periodicity of the sine function makes that term integrate to 0. Thus, the
answer is simply

R_X(t_1, t_2) = (A²/2) cos(ω(t_2 − t_1))
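A Monte Carlo check of this result (ours, not in the original text): with A and ω constant and θ drawn uniformly from [−π, π], the sample average of X_{t_1} X_{t_2} should approach (A²/2) cos(ω(t_2 − t_1)):

    import numpy as np

    rng = np.random.default_rng(2)
    A, omega, t1, t2 = 1.5, 2.0, 0.7, 1.9
    theta = rng.uniform(-np.pi, np.pi, size=1_000_000)

    empirical = np.mean(A*np.cos(omega*t1 + theta) * A*np.cos(omega*t2 + theta))
    exact = (A**2 / 2) * np.cos(omega * (t2 - t1))
    print(empirical, exact)   # agreement to a few decimal places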
Definition 7.3 treats a manner in which time shifts are irrelevant in some
stochastic processes. This definition says, for instance, that the probability
distribution for X_t is the same for every value of t. However, it says much more
than this. Remember that the joint distribution contains all the information as to
how the random variables interact with each other. The definition requires that all
the details of this interaction be preserved if there is a time shift.
We can illustrate this definition by considering exactly what is and what is not
stationary with regard to an independent trials process.
When one talks about an “independent trials process,” there are really two
stochastic processes lurking in the background. One is the process {X_n}, where
X_n is the number of successes to occur during the first n trials. For each n, X_n
is binomial with parameters n and p, where p is the success probability
associated with the process. Clearly, if n ≠ m then X_n and X_m have different
distributions, and so this stochastic process isn’t stationary.

Rather than looking at the accumulated number of successes during the first
n trials (which is what X_n tells us), we could consider the process {Y_n}, where
Y_n is 1 or 0 depending upon whether success does or does not (respectively) occur
on the nth trial. In other words, Y_n is the number of successes (0 or 1) that occur
on the nth trial. This stochastic process {Y_n} is stationary. This is an easy
consequence of two facts: One is that P(Y_n = 1) = p and P(Y_n = 0) = 1 − p
for every n (so the Y_n’s all have the same probability distribution), and the other
is that Y_1, Y_2, … are independent. The fact that they are independent means that
the joint probability mass function of any finite collection of them is just the product
of their individual mass functions, which are all the same. The stochastic process
{Y_n} then clearly meets the criteria of Definition 7.3.
Standard terminology is to refer to {X_n} as the binomial process and {Y_n}
as the Bernoulli process. Either can be defined in terms of the other quite easily:
X_n = Y_1 + Y_2 + ⋯ + Y_n, and Y_n = X_n − X_{n−1}.
Notice that the computation shown in Example 7.6 was precisely the
computation needed to show that the stochastic process X_t = A cos(ωt + θ), where A
and ω are constant and θ is uniform on [−π, π], is stationary in the wide sense.
Stochastic processes that satisfy Definition 7.3 are often called strictly stationary to
contrast them with the wide-sense stationary processes of Definition 7.4. It is easy
to see that any strictly stationary stochastic process is also wide-sense stationary,
but the converse isn’t true. (See Problem 7.16.)
Comment on Proof: From Definition 7.3, any two random variables X_{t_1}
and X_{t_2} in the process must have the same probability distribution. It follows then
that they must have the same expected value.

But why must R_X(t_1, t_2) = R_X(t_1 + t, t_2 + t)? We will illustrate in the
case in which X_{t_1} and X_{t_2} are continuous random variables with a joint density
function. This will enable us to view the computation in terms of Proposition 6.3.

Since R_X(t_1, t_2) = E(X_{t_1} X_{t_2}), it is first necessary to recall how this
expected value can be computed in terms of the joint density function for X_{t_1} and
X_{t_2}. The means for doing this is provided by Proposition 6.3:
For the two-state process considered here, where each X_t takes only the values
0 and 1, this reduces to

R_X(t_1, t_2) = E(X_{t_1} X_{t_2}) = Σ_j Σ_k j k P(X_{t_1} = j, X_{t_2} = k) = P(X_{t_1} = 1, X_{t_2} = 1)
The reason for the last step is simply that the only way that the product of X_{t_1} and
X_{t_2} can equal 1 is for each of them to be 1.

For definiteness, let’s suppose that t_1 < t_2. The best way then to determine
the probability P(X_{t_1} = 1, X_{t_2} = 1) is to use the idea of conditional probability:
P(B ∩ A) = P(A) P(B | A). This gives
this example gives the probability for a given number of changes in the state of the
line during a given time interval, we can write the probability of an even number of
changes during a time interval of length t, and thus

R_X(t_1, t_2) = E(X_{t_1} X_{t_2}) = (1/2) e^{−μt} [ 1 + (μt)²/2! + (μt)⁴/4! + (μt)⁶/6! + ⋯ ]

where t = t_2 − t_1 is the length of the time interval between times t_1 and t_2.

The remaining question is how to evaluate the sum of the infinite series on
the right. This is not difficult if we make use of the standard power series
representation for the exponential function. The sum of the bracketed series is
(1/2)(e^{μt} + e^{−μt}), and therefore

R_X(t_1, t_2) = (1/2) e^{−μt} · (1/2)(e^{μt} + e^{−μt}) = (1/4)(1 + e^{−2μt})
Notice that we have shown that this stochastic process is stationary in the
wide sense. This is a consequence of the fact that R_X(t_1, t_2) depends here only
on the difference t_2 − t_1. Consequently, if t_1 and t_2 were translated by the same
amount, there would be no change in R_X(t_1, t_2).
Consider the quantity (1/T) ∫_0^T X_t dt, the time average of the process X_t for the time interval [0, T]. The value
will depend on what “observation” of the process we are looking at. In other
words, since X_t = A cos(ωt + θ), the value we get for the time average will
depend on what value between −π and π we use for the random variable θ. If we
carry out this averaging, treating θ for the time being as just an unspecified real
number, we get

(1/T) ∫_0^T X_t dt = (1/T) ∫_0^T A cos(ωt + θ) dt

= (1/ωT) ∫_θ^{ωT+θ} A cos z dz    (by a change of variables z = ωt + θ)

= (A/ωT) [ sin z ]_θ^{ωT+θ}

= (A/ωT) [ sin(ωT + θ) − sin θ ]
Observe that the time average obtained here does depend on the value that the
random variable θ assumes. However, if we take the limit of this as T → ∞, we
get 0, which is the expected value of X_t. This is true no matter what value θ
assumes. So in fact we have

E(X_t) = lim_{T→∞} (1/T) ∫_0^T X_t dt    (7.5)
A stochastic process that satisfies this condition is said to be ergodic in the mean.
Notice that in order for this condition to be met, a few curious things have to
be happening. Normally one would expect E(X_t) to be different for different
values of t. However, the right side of this equation does not depend on t, since it
is a time average over the whole positive real line. Thus E(X_t) must necessarily
be the same for all values of t when Equation 7.5 is valid. Secondly, the right side
would normally be expected to depend upon which “observation” of the
process we are averaging, that is, which sample function. In this simple example,
this means that we would normally expect the right side of Equation 7.5 to be
different for different values assumed by the random variable θ. What we saw in
the above computation is that this is, in fact, true for any finite interval. But when
we average over the entire positive real line, such differences wash out.

In theory, the two kinds of averaging being performed on the two sides of
Equation 7.5 are very different. The left side averages the value of a fixed random
variable X_t over the sample space, and the right side averages all the X_t’s over time.
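To see Equation 7.5 numerically (a sketch, not from the text): fix one observation θ and watch the time average settle toward E(X_t) = 0 as T grows. The parameter values below are arbitrary illustrations.

    import numpy as np

    A, omega, theta = 1.0, 2.0, 0.8      # theta: one fixed "observation"
    for T in (10, 100, 1000, 10000):
        t = np.linspace(0.0, T, 200_000)
        time_avg = np.mean(A * np.cos(omega * t + theta))  # approximates (1/T) * integral
        print(T, time_avg)               # tends to 0, whichever theta is fixed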
Problems
7.3 For an independent trials process with success probability p, derive the
probability mass function for the random variable X that gives the number
of the trial on which the second success occurs.
7.5 Using Equations 7.1 and 7.3, derive the probability density function for the
random variable that measures the “waiting time until the second occurrence”
in a Poisson process.
7.8 Calls are coming into a switchboard at a rate of 30 calls per hour. Assume
this to be a Poisson process, and let X be the random variable that gives the
number of incoming calls between 1:30 and 1:40, and Y the random variable
that gives the number of incoming calls between 1:40 and 1:50. What kind
of probability distribution does X have? How about Y? How about their
sum X + Y? [Hint: Use the properties of Poisson processes described in
this chapter. It would be possible to compute the mass function from basic
principles based upon knowledge of the mass function of X and of Y and
the fact that they are independent. In a Poisson process, however, the
number of occurrences during any interval is a Poisson random variable
where the parameter in the distribution is determined by the length of the time
interval and the intensity of the process.]
7.10 Consider the sine-wave process X_t = A cos(ωt + θ), where ω is a
constant and where A and θ are independent random variables with A
uniform on [0, 1] and θ uniform on [0, 2π]. Find the expected value of X_t
as a function of t. (The function t ↦ E(X_t) can be viewed as the average
value of the process as a function of t.)
7.11 Let Y_t = A cos ωt + B sin ωt, where A and B are both uniform on
[0, 1] and independent, and where ω is a fixed constant. Find E(Y_t).
Z = √(A² + B²)
Notice that this is a random variable that does not depend on t. The
expected value of this random variable, E(Z), is then the expected
amplitude. Write down an integral that would give the value of E(Z).
(Don’t try to evaluate the integral.)
7.12 Let X_t = A + Bt, where A and B are independent and where both A
and B have mean 1 and variance 1. Find the expected value and the
variance of the stochastic process X_t. In other words, express E(X_t) and
var(X_t) as functions of t.
7.13 Find E(X_t) and var(X_t), where X_t = A cos t and where A is standard
normal.
the negative part of the real line.) Consider the stochastic process

X_t = e^{−At} for t ≥ 0

What is the relation between E(X_t) and the Laplace transform?
7.15 Flow of traffic into a parking lot has been cited as an example of a Poisson
process. If the flow of traffic into the lot is regulated by a stoplight, why
then will the entrance of cars into the lot not be a Poisson process? [Hint:
Which of the properties listed in Proposition 7.3 are not now satisfied?]
Chapter 8: Time-Dependent System Reliability
This means that R(t) is simply the probability that the device is still functioning at
time t. Since F_X(t) → 1 as t → ∞, it follows that R(t) → 0 as t → ∞.
The hazard function h(t) for the device is then defined by
h(t) = f_X(t) / [1 − F_X(t)] = f_X(t) / R(t)    (8.2)
To see why this is a useful concept, notice first that the conditional
probability P(t ≤ X < t + Δt | X > t) is the probability that the device fails
between time t and time t + Δt, given that the device is still functioning at time t.
The hazard rate h(t) is the limiting value of this conditional probability per unit
time:

h(t) = lim_{Δt→0+} P(t ≤ X < t + Δt | X > t) / Δt

Thus,

P(t ≤ X < t + Δt | X > t) / Δt = P(t ≤ X < t + Δt) / [Δt · P(X > t)]

= [1 / P(X > t)] · [P(t ≤ X < t + Δt) / Δt]

→ f_X(t) / [1 − F_X(t)]    as Δt → 0
Equation 8.2 describes the hazard rate h(t) in such a way that it is easily
computed if the density function of X is specified. It is also possible to turn the
situation around and describe the density function or distribution function in terms
of the hazard rate. The key relationship is

R(t) = 1 − F_X(t) = exp{ −∫_0^t h(s) ds }    (8.3)
where exp denotes the exponential function. The derivation of this expression is as
follows. If we integrate Equation 8.2, we have

∫_0^t h(s) ds = ∫_0^t f_X(s) / [1 − F_X(s)] ds = −ln[1 − F_X(s)] |_0^t = −ln[1 − F_X(t)]
The last step here assumes that F_X(0) = 0; that is, at time t = 0 we
know the device is in working condition. (Remember that f_X is the derivative of
F_X; that’s what the integration here is based on.) Applying the exponential
function to both sides of this equation gives Equation 8.3.
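Equation 8.3 also lends itself to numerical evaluation when the integral of h has no convenient closed form. The sketch below is ours, not from the text; it uses a linear hazard h(t) = kt (as in Problem 8.6), for which Equation 8.3 gives the closed form e^{−kt²/2} as a check.

    from math import exp

    def reliability(h, t, steps=10_000):
        # midpoint rule for the integral of h over [0, t], then Equation 8.3
        dt = t / steps
        integral = sum(h((i + 0.5) * dt) for i in range(steps)) * dt
        return exp(-integral)

    k = 0.5
    print(reliability(lambda s: k * s, 2.0))  # numerical, about 0.3679
    print(exp(-k * 2.0**2 / 2))               # closed form, e^{-1}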
A common scenario with real-world devices is a hazard rate something like
what is shown in Figure 8.1. The curve is relatively high initially because of
potential failure during the break-in period, and the curve rises again as the device
ages. Clearly, not all devices behave this way, however. The most intuitively
pleasant and easily understood situation is that of a constant failure rate; that is,
h(t) is a constant function λ. In this case Equation 8.3 becomes simply

R(t) = 1 − F_X(t) = e^{−λt}

which is the familiar case of the exponential distribution. This is important because
it is the canonical example for this concept.
Figure 8.1 A typical hazard rate curve; time is on the horizontal axis.
Summary: To say that the failure rate of a device, as given by the hazard
function of Equation 8.2, is constant is precisely the same as saying that the “time
to failure” for the device is exponentially distributed, that is, that the random
variable that measures elapsed time until the device fails has an exponential density
function.
One widely used statistic that is an indicator of the reliability of a device is the
mean time to failure (mttf). Intuitively, this is just the average waiting time from
the time that the device is put into service until it first experiences failure. If X is
the random variable that gives the time at which first failure occurs, then the mean
time to failure is mttf = E(X).
Proposition 8.1: Suppose a device has reliability function R(t). The mean
time to failure, or mttf, for the device is given by

mttf = ∫_0^∞ R(t) dt
Proof: If X is the random variable that indicates time to failure for the
device, then by definition mttf = E(X) = ∫_0^∞ t f_X(t) dt. However, an
integration by parts, using f_X(t) = −R'(t), converts this integral into ∫_0^∞ R(t) dt.

(For an exponential lifetime with parameter p_1 the mttf is 1/p_1, and that would
correspond to p_1 failures per unit time. For example, if p_1 = 2 and time
is measured in years, then the expected value for time until failure would be 1/2
year, which corresponds to a failure rate of two failures per year.)
From the form of this expression, we know that the “waiting time for the circuit to
fail” is itself an exponential random variable with parameter λ = p_1 + p_2 + p_3. (The
reason is simply that this reliability expression is equivalent to the statement that the
distribution function for “time to failure for the circuit” is given by 1 − e^{−λt}, where
λ = p_1 + p_2 + p_3.)
For the parallel circuit in part b of the picture, at least one device must still be
functioning in order for current to flow, and so

R(t) = e^{−p_1 t} + e^{−p_2 t} + e^{−p_3 t} − e^{−(p_1+p_2)t} − e^{−(p_1+p_3)t} − e^{−(p_2+p_3)t} + e^{−(p_1+p_2+p_3)t}
The mean time to failure is easy to compute here, because each of the exponential
terms is of the form c e^{−at}, and ∫_0^∞ c e^{−at} dt = c/a. Thus

mttf = 1/p_1 + 1/p_2 + 1/p_3 − 1/(p_1+p_2) − 1/(p_1+p_3) − 1/(p_2+p_3) + 1/(p_1+p_2+p_3)
P(A good) P(B good) P(C good or D good) = p_A p_B (p_C + p_D − p_C p_D)
Notice that in this expression it doesn’t matter whether we are thinking of the
reliability as being time-dependent or not. Now if we utilize the assumption that
device A has an exponentially distributed lifetime with parameter p_A, this means
that at time t the probability that A is good is e^{−p_A t}. If we substitute e^{−p_A t} for
p_A in the equation above and do a similar replacement for the other probabilities,
the reliability function for the system is obtained just as in Equation 8.4. As a more
elaborate example, see Problem 8.3. A solution using these ideas is given in
Appendix A.
irrelevant.)
We will similarly assume that the “time to repair” for the device is also
exponentially distributed; that is, the total time that the device is out of service
because of the failure is also an exponential random variable. We will denote by μ
the parameter in the exponential distribution for the time the unit is out of service.
It is now possible to track the performance of the device over a long period
of time. Presumably there would be a number of times that the device fails, and in
each case there would be a certain delay in getting it serviced and put back into
operation. What are some of the things that would be useful for us to know in this
situation? One thing we would be interested in is the percentage of the time that the
device is operational. For example, if breakdowns are so difficult to repair that the
device is unavailable for use 60% of the time, this is certainly something that a
prospective buyer would be interested in knowing. Or we might be interested in
knowing the probability that the device will be functioning at some specific time in
the future. This question is related to the former one. For instance, if we knew that
the device was going to be out of service for 60% of the time during the next
several years, that certainly suggests that over the long haul its probability of being
operational ought to “average out” in some sense to 40%. Whether its probability
of being operational next Tuesday afternoon is 40% is another question, and this
suggests that we look at the situation in more detail.
First, let’s remember some things we have already encountered. Since time
to failure is exponentially distributed with parameter 1, we know that the mean time
to failure for the device is 1/A. Similarly, since the time to repair is exponential
with parameter [1, the mean time required for the device to be repaired is 1/u. For
example, if time is measured in years and A = .5 and uy = 12, then the mean time to
failure will be 2 years and the mean repair time 1 month.
Now let’s consider how we might actually work out the probability that the
device will be functional at some specific time in the future. In order to do that, we
need first to establish the necessary notation. Specifically, we would like to know
about the following two functions:
f(t) = probability the device is operational at time t
g(t) = 1 − f(t) = probability the device is not operational at time t
We need to recall a few properties of the exponential function. If we think of
the random variable X as representing time to failure for the device (as measured
from the initial time when the device is known to be working), then P(X > Δt) =
e^{−λΔt}. Now remember from the power series representation for the exponential
function that

e^{−λΔt} = 1 − λΔt + (λΔt)²/2! − ⋯

In particular, if λΔt is small, then e^{−λΔt} ≈ 1 − λΔt. (This approximation
amounts to using a linear approximation to evaluate the exponential function near
0.) This means that

P(X > Δt) ≈ 1 − λΔt
when Δt is small, and it means that the probability that a failure will not occur
before time Δt is approximately 1 − λΔt. Conversely, the probability that a failure
will occur before time Δt is approximately λΔt. Applying this same idea to the
time to repair leads to the conclusion that for a small time interval Δt, if we know
the device to be nonoperational at the beginning of the time interval, then the
probability that it will still be nonoperational after time Δt has elapsed is
approximately 1 − μΔt, whereas the probability that it will have been repaired
during the time interval is approximately μΔt.
We now need to consider f(t + Δt), where t is some fixed time and Δt is
considered a small positive number. By definition, f(t + Δt) is the probability
that the device is working at time t + Δt.
There are two primary ways that the device can be working at time t + Δt:
(1) It could be working at time t and experience no failure between time t and time
t + Δt. (2) It could have been out of service at time t but have been repaired
between time t and t + Δt. There are clearly other possibilities involving more
than one failure and/or repair during the time interval, but if Δt is small then the
probabilities of such terms will be small compared to the terms above. (If all of this
seems a little fishy, it might be a good idea to go back and review the discussion of
Poisson processes in Chapter 7. The assumptions here are quite similar to those
that describe a Poisson process.)
So we can break f(t + Δt) down as follows:

f(t + Δt) = probability device works at time t and
doesn’t fail during time interval of length Δt
+ probability device is failed at time t
but is repaired during interval of length Δt
= f(t)(1 − λΔt) + g(t) μΔt

If we rearrange the terms and replace g(t) by 1 − f(t), we can write this as

[f(t + Δt) − f(t)] / Δt = −(λ + μ) f(t) + μ

and letting Δt → 0 gives the differential equation f'(t) = −(λ + μ) f(t) + μ. With
the initial condition f(0) = 1 (the device starts out working), the solution is
f(t) = μ/(λ + μ) + [λ/(λ + μ)] e^{−(λ+μ)t}.
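A small sketch (ours, not from the text) of this availability function, using Problem 8.7’s rates as an illustration (mean time between breakdowns 90 days, mean repair time 3 days, so λ = 1/90 and μ = 1/3 per day):

    from math import exp

    lam, mu = 1/90, 1/3

    def availability(t):
        # f(t) = mu/(lam+mu) + (lam/(lam+mu)) e^{-(lam+mu) t}, with f(0) = 1
        s = lam + mu
        return mu/s + (lam/s) * exp(-s * t)

    print(mu / (lam + mu))   # long-run fraction operable: 30/31, about .968
    print(availability(5))   # probability of operating 5 days from now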
remain in each state was assumed exponentially distributed in our example, and in
our case there is no doubt about which state we will move to when a change of state
takes place. In a more general setting with numerous states involved, the transition
probabilities specify this information. For example, if one is modeling the number
of items waiting to be repaired at a service shop and if the items are being delivered
and processed individually, then the “state” of the system would be the number of
items present, and a change of state would be caused by (1) finishing work on an
item and shipping it out or (2) a new item being brought to the shop. If the state is
k, then a change of the first type makes the new state of the system k − 1, and a
change of the second type makes the new state k + 1. Whether the transition
probability of going from state k to state k − 1 is greater or less than the
probability of going from state k to k + 1 is then simply a question of whether the
items are being processed faster than they are arriving.
The precise definition of a Markov process is somewhat technical, but the
general idea is that the probability of making a transition from one state (state i) to
another (state j) in a certain time period should not be dependent on how one came
to be in state i in the first place. In our problem above, this assumption is
implicitly present in the assumed exponential waiting times for a change of state.
From any point in time and from either of the two states, future transitions are
governed by the current state of the device and by the two exponential distributions
involved and are independent of the past history of the device.
Problems
8.1 Determine the mean time to failure (mttf) for the system in Figure 8.2(c),
assuming that the components A, B, C, and D function independently and
have constant failure rates p_1, p_2, p_3, and p_4, respectively; that is, assuming
that the time to failure for each component is an exponential random variable
with the given parameter.
8.2 Consider a device with hazard function given by h(t) = 1 + (2 − t)²
whenever t ≥ 0. If we think of time as measured in years, you can notice
that the instantaneous failure rate drops off until the age of the device
reaches 2 years, at which time the failure rate starts to increase again because
of age. Let X denote “time to failure.” Determine the density function for X.
8.3 In the circuits shown in Problem 2.2 of Chapter 2, assume that each
component has an exponentially distributed lifetime with mean 1 year, and
that the components function independently. Find the reliability function
for each circuit.
8.4 In the highway network shown in Problem 2.4 of Chapter 2, assume that
each link in the network (each edge of the graph) has an exponentially
distributed lifetime with mean 6 weeks. Find the reliability function
for the network.
8.5 Determine the density function for “waiting time to system failure” for the
series-parallel system in the following figure.
8.6 Suppose a device has a hazard function that is a linear function of time; that
is, h(t) =kt, where k is a positive constant. (This says that the rate at
which such devices fail is proportional to the age of the devices.) Find the
reliability function for such a device.
8.7 A piece of equipment averages going 90 days between breakdowns, and the
time requirement to get it serviced averages 3 days. Answer the following
questions using the model developed in Section 8.2.
(a) In the long run, what fraction of the time is the equipment operable?
(b) If it is operating now, what is the probability it will be operating 5 days
from now? [Hint: Treat “now” as time zero and the given information
that it is operating at time zero as an initial condition.]
8.8 A telephone is a two-state device in that the line is either free or busy.
Assume that when the phone is not being used, the time until a shift into the
“busy” state is exponentially distributed with expected value 10 minutes.
And assume that when the phone is busy, the time until a shift into the not-
in-use state is exponentially distributed with expected value 3 minutes.
Based on the model of Section 8.2, answer the following:
(a) If the phone is busy at time t = 0, find the function f(t) = probability
that the phone will be in use at time t.
(b) In the long run, what fraction of the time is the phone in use?
change to state 1? This is simply the question: What is the probability that
the present caller will hang up before the next incoming call arrives? We
solved this problem in Chapter 6. (See Problem 6.26.)
If f(t), g(t), and h(t) represent the probabilities that at time t we
will be in states 1, 2, and 3, respectively, see if you can derive the system of
differential equations that must be satisfied by these three functions. You can
do so by mimicking the approach used in Section 8.2.
References
Breiman, Leo, Probability and Stochastic Processes: With a View Toward Applications,
Houghton Mifflin, Boston, 1969.

Page, Lavon B., and Jo Ellen Perry, “A practical implementation of the factoring
theorem for network reliability,” IEEE Trans. Reliability, R-36, pp. 259-267, 1988.
This article describes a microcomputer algorithm which uses conditional probabilities to treat network
reliability problems in a manner reflecting the ideas encountered in Chapter 2.
Appendices
Appendix A
Chapter 1
1.2 This problem can be done easily using a tree diagram or the combinations
formula. If the combinations formula is used the answers can be given by
1.5 P(A) =
1.8 The answers are easily obtained from the following tree diagram.
1.11 Using the following tree, we see that P(disease and +) = .045 and
P(disease | +) = .045/.121 = .3719. (Tree: P(disease) = .05 and
P(no disease) = .95; given disease the test is + with probability .90,
and given no disease it is + with probability .08.)
1.22 If n is the number of random digits generated, then P(no 7’s) = (.9)^n.
Therefore, P(at least one 7) = 1 − (.9)^n. So what we want is to have n
sufficiently large that 1 − (.9)^n ≥ .95. This means we need (.9)^n ≤ .05.
Take the natural logarithm of both sides: to have n ln(.9) ≤ ln(.05)
requires that n ≥ ln(.05)/ln(.9) ≈ 28.43. Thus we need to generate at least
29 random digits.
Chapter 2
2.4 Going from start to finish is possible provided that link E is good and that at
least one of the two pairs A, B or C, D is good. The probability is

P{[(A ∩ B) ∪ (C ∩ D)] ∩ E}
= P[(A ∩ B) ∪ (C ∩ D)] × P(E)
= [P(A ∩ B) + P(C ∩ D) − P(A ∩ B ∩ C ∩ D)] × P(E)
= p_A p_B p_E + p_C p_D p_E − p_A p_B p_C p_D p_E
2.10 Suppose n of the devices are used. The probability they all fail is then .2^n,
so the probability that at least one works is 1 − .2^n. We wish this probability
to be ≥ .9999. So we need to have n large enough that 1 − .2^n ≥ .9999.
This inequality is easily solved for n: the inequality .2^n ≤ .0001 will be
true whenever n ln(.2) ≤ ln(.0001). This requires that n ≥ 5.72. Since a
whole number of devices is required, we should use n = 6 devices.
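Solutions 1.22 and 2.10 follow the same pattern. A tiny sketch (ours, not from the book) solving “smallest n with 1 − qⁿ ≥ target”:

    from math import ceil, log

    def min_trials(q, target):
        # smallest n with 1 - q**n >= target, i.e. q**n <= 1 - target
        return ceil(log(1 - target) / log(q))

    print(min_trials(0.9, 0.95))    # 29 random digits (solution 1.22)
    print(min_trials(0.2, 0.9999))  # 6 devices        (solution 2.10)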
Chapter 3
3.1. Your graph should have a jump discontinuity at 0, 1, 2, 3, and 4 and should
be constant on any interval not containing one of these numbers.
3.3
F_X(t) = 0 if t < 0
F_X(t) = t if 0 ≤ t ≤ 1
F_X(t) = 1 if t > 1
3.6 This random variable could be interpreted as what you’re looking at if you
choose a random number between 0 and 2 and truncate it after the first
decimal place.
3.12 Skip ahead and look at Definition 4.1. In Chapter 4 the random variable X
will be called a binomial random variable with parameters n = 4 and p = .1.
3.16 F_Y(t) = P(Y ≤ t) = P(2X + 3 ≤ t) = P(X ≤ (t − 3)/2) = F_X((t − 3)/2).
You computed F_X in Problem 3.3. Using that information we get

F_Y(t) = 0 if t < 3
F_Y(t) = (t − 3)/2 if 3 ≤ t ≤ 5
F_Y(t) = 1 if t > 5

To obtain f_Y simply differentiate F_Y. What you will notice is that Y is
uniformly distributed on the interval [3, 5]. This should not be surprising
given the way that Y is defined in terms of X.
Chapter 4
4.1 For example, …
4.10 var(X) =2
4.19 .2646
Chapter 5
5.4 (a) The answer is 1/2. You don’t need a table if you just remember the
symmetry of the density function.
(b) .36788
5.6 f_Y is given by

f_Y(t) = … (0 otherwise)

E(Y) = 5
5.8 This problem is really a combination of two problems already done. Think
of the wire as being laid on the x-axis from 0 to 2. If X is the x coordinate
of the point where the cut is made, the assumption is that X is uniform on
[0, 2]. Figure out that the length Y of the shortest piece is the random
variable Y = 1 − |X − 1|. To put this in the context of problems already
done, let’s temporarily use the notation Z = |X − 1|. From Problem 5.7 we
know that Z is uniform on [0, 1]. Since Y = 1 − Z, we know from
Problem 5.3 that Y then is also uniform on [0, 1], and so E(Y) = 1/2. The
expected length of the shortest piece of wire is 6 inches.
Of course it is not absolutely necessary to view this problem in terms of
Problems 5.3 and 5.7. This problem could be done from scratch using the
same techniques as are used in those problems.
E(X’) = { Rre™ dt
0
Φ(−1/4) − Φ(−3/4) = Φ(3/4) − Φ(1/4) = .1747
5.18 (a) .98^25 ≈ .6035 (b) .98^50 ≈ .3642 (c) .98^100 ≈ .1326
5.19 Let X denote the random number, and let Y denote the value of X
truncated to a single digit after the decimal. Then Y = g(X), where g is the
step function shown in Figure 4.1. This means that E(Y) can be computed
as E(g(X)) = ∫ g(x) f_X(x) dx.
Chapter 6
6.3 The computations to obtain the table below are a bit tedious, but not terribly
difficult if you get the probabilities from a tree diagram.
2 0 1/15 0 4/15
Y |] 15 3/5 1/5
6.12 Let X and Y be the two numbers. The first step is to realize that the
probability in question can be obtained by integrating the joint density
function f_{X,Y} over the half-plane below the line y = x/2. Secondly, the
joint density function is easily obtained from Definition 6.6 since X and Y
are independent. The problem then boils down to integrating the joint density
function over the shaded triangle in the picture. Since the joint density is
simply the uniform density on the unit square, the probability is just the area
of the triangle, which is 1/4.
6.15 The density function for X is f_X(x) = 1/(1 + x)² whenever x > 0, with
f_X(x) = 0 when x < 0. The density for Y is simply the exponential density
with parameter λ = 1.
6.17 Set time t = 0 to be 8 o’clock. Let X be the time when your friend calls
you, so the assumption is that X is uniform on [0, 1]. Let Y be the waiting
time for the first call from someone other than your friend, so Y is
exponential with parameter A = 2. (Remember that in the exponential
distribution the parameter is the reciprocal of the expected value.)
The problem is to find P(X < Y). This is computed by integrating the
joint density function over the region consisting of the half-plane above the
line y =x. The joint density is 0 except when 0 < x < 1 and y > 0, and the
part of this region that lies in this half-plane is the region shaded in the picture.
So the probability is computed by performing the integration

∫_0^1 ∫_x^∞ 2e^{−2y} dy dx = .4323
6.19
|
E(X+Y)= Se
20
6.20 (a) It is easy to show that both X and Y have exponential densities.
(Dan Y eS:
(Cha ROG SAO ee LO een,
(d) Integrate the joint density over the appropriate triangle and subtract from
1.
6.22 It should be apparent that this computation is going to require integrating the
joint density over a triangle in the first quadrant. The density function works
out to be like this: If Z = X + Y, then

f_Z(t) = 4t e^{−2t} if t > 0
f_Z(t) = 0 otherwise
6.23 This is an excellent problem, and it will be well worthwhile for you to work it
two different ways. One way is to work out the density function for the
random variable Z = max{X, Y}, and then to compute E(Z) using this
density function. The second way is to simply utilize the joint density, since
Z is already expressed as a function of X and Y. Since this requires
figuring out how to integrate the function of two variables g(x, y) =
max{x, y}, the picture below should be helpful.
Chapter 7
7.3 Denote by Y the number of the trial on which the second success occurs. In
order for the second success to occur on trial n, there must be exactly one
success during the first n − 1 trials. Given any two specific trials among
the first n trials, the probability of success occurring on those 2 and failure
on all others would be p²q^{n−2}. How many ways are there in which the
second success could occur on the nth trial? The first success must then occur on
one of the trials from trial 1 to trial n − 1, so there are n − 1 ways for the
second to occur on the nth trial. Thus

P(Y = n) = (n − 1) p² q^{n−2}
7.5 If Y_1 and Y_2 are the times of the first and second occurrences respectively,
then Y_2 = Y_1 + (Y_2 − Y_1). This simply amounts to looking at Y_2 as the
time to the first occurrence plus the time between the first and second
occurrences. From part 3 of Proposition 7.3, Y_1 and Y_2 − Y_1 are
exponential with parameter μ = intensity of the Poisson process, and they are
independent. So Y_2 is the sum of two independent exponential random
variables with parameter μ. This type of problem was treated in Chapter 6.
See for example Problem 6.27.
7.7 Let X denote the number of breakdowns to occur during 6 months. Then
X is Poisson with parameter λ = μt = (10 per year) × (.5 year) = 5. And
the probability of 7 or more occurring is then

P(X ≥ 7) = 1 − { P(X = 0) + ⋯ + P(X = 6) }
7.8 X is Poisson with parameter λ = μt = (30 per hour) × (1/6 hour) = 5, and
Y has the same distribution. X + Y is Poisson with parameter

λ = μt = (30 per hour) × (1/3 hour) = 10

Thus, using the properties of a Poisson process described in this section,
there is no work to do in this problem. However, there is a basic underlying
fact at work: If X is Poisson with parameter λ_1 and Y is Poisson with
parameter λ_2 and X and Y are independent, then X + Y is Poisson with
parameter λ_1 + λ_2.
Chapter 8
8.3 First work out a reliability expression for the circuit. The following is one
possibility:

rel = [ (p_B + p_D − p_B p_D) p_C + p_A − p_A p_C (p_B + p_D − p_B p_D) ] p_E

Now since each of the components is assumed to have an exponentially
distributed lifetime with parameter 1, this means that for any given device, at
time t the probability that the device is still functioning is e^{−t}. So for every
one of the probabilities on the right side of this equation, we can substitute
e^{−t}. This gives the reliability of the system as a function of t, which works
out to be

rel_sys(t) = e^{−2t} + 2e^{−3t} − 3e^{−4t} + e^{−5t}

Let’s denote by X the time to failure for the whole circuit. So (X > t) is
the event that the circuit is still working at time t, and this is what rel_sys(t)
denotes above. Therefore

F_X(t) = P(X ≤ t) = 1 − rel_sys(t) = 1 − { e^{−2t} + 2e^{−3t} − 3e^{−4t} + e^{−5t} }

Differentiation gives the density function for X, and it is then easy to
compute E(X), which is the mttf for the circuit. Is it clear to you that F_X is
a distribution function? If not, you should check that it is. Can you make
that observation based on the fact that 1 + 2 − 3 + 1 = 1?
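Continuing this solution numerically (a sketch, ours, using the reliability function as reconstructed above): by Proposition 8.1 the mttf is the integral of rel_sys(t), and each term c·e^{−at} integrates to c/a.

    from fractions import Fraction

    terms = [(1, 2), (2, 3), (-3, 4), (1, 5)]      # (coefficient, rate) pairs
    mttf = sum(Fraction(c, a) for c, a in terms)   # 1/2 + 2/3 - 3/4 + 1/5
    print(mttf, float(mttf))                       # 37/60, about 0.617 year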
8.7 (a) μ/(λ + μ) = 30/31 ≈ .968; that is, 96.8% of the time the equipment is
operable.
Values of the standard normal distribution function Φ(z). The row label gives z to one decimal place; the column adds the second decimal.

z | .00 .01 .02 .03 .04 .05 .06 .07 .08 .09
0.00 | 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.10 | 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.20 | 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.30 | 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
0.40 | 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
0.50 | 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
0.60 | 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
0.70 | 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852
0.80 | 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133
0.90 | 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
1.00 | 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
1.10 | 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830
1.20 | 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
1.30 | 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
1.40 | 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319
1.50 | 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441
1.60 | 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545
1.70 | 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633
1.80 | 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706
1.90 | 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767
2.00 | 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817
2.10 | 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857
2.20 | 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890
2.30 | 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916
2.40 | 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936
2.50 | 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952
2.60 | 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964
2.70 | 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974
2.80 | 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981
2.90 | 0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986
3.00 | 0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990
3.10 | 0.9990 0.9991 0.9991 0.9991 0.9992 0.9992 0.9992 0.9992 0.9993 0.9993
3.20 | 0.9993 0.9993 0.9994 0.9994 0.9994 0.9994 0.9994 0.9995 0.9995 0.9995
3.30 | 0.9995 0.9995 0.9995 0.9996 0.9996 0.9996 0.9996 0.9996 0.9996 0.9997
3.40 | 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9998
Index
A
all-terminal reliability, 46
AND gate, 49
autocorrelation function, 186
average intensity of Poisson process, 179, 181

B
Bayes’ Theorem, 17
Bernoulli process, 84, 189
Bernoulli trials, 84
binomial distribution, 84, 182
binomial process, 188

C
central limit theorem, 112, 160-161
Chebyshev’s inequality, 115
circuit
  parallel, 35
  series, 35
  series-parallel, 36
combinations, 23
combinations formula, 22
communication path, 41
complement (of a set), 3
computational complexity, 40
conditional density function, 156
conditional expectation
  continuous random variables, 157
  discrete random variables, 154
conditional probability, 11-12
conditional probability mass function, 152
continuity correction, 112
continuous (random variable), 60, 62-63
continuous parameter stochastic process, 171
contract (an edge of a graph), 45
convolution, 144
countably infinite, 26
cumulative distribution function, 65-66

D
De Moivre-Laplace theorem, 160
De Morgan’s laws, 4
degree-two vertex, 46
delete (an edge of a graph), 45
density function, 62, 131
discrete (random variable), 60
discrete parameter stochastic process, 171
discrete uniform distribution, 91
disjoint, 3
distribution, 67
distribution function, 65

E
ergodic, 193
ergodic in the mean, 194
event, 8
expectation (see expected value)
expected value, 69-71
exponential distribution, 104, 179, 201

H
hazard function, 199
hazard rate, 199
histogram, 96
hypergeometric distribution, 94

I
inclusion-exclusion principle, 43
independent
  continuous random variables, 136
  discrete random variables, 129
  events, 13, 21
independent trials formula, 23
independent trials process, 22, 171-172
indicator (random variable), 73
intensity of Poisson process, 179, 181
intersection (of sets), 3

M
marginal density function, 133
marginal mass function, 129
Markov process, 208
mean (see expected value)
mean time to failure, 201-202

N
networks, 37
normal density function, 108
normal distribution, 107

O
observation (of stochastic process), 184, 194
OR gate, 49

P
permutations, 23
Poisson distribution, 92
Poisson process, 174-175, 181
probability mass function, 60
probability measure, 8

R
random variable, 57
  continuous, 62-63
  discrete, 60
  expected value, 69-71
  mean, 69-71
  variance, 73-74

S
sample function, 184
sample space, 6
sampling with replacement, 94, 96
sampling without replacement, 94, 96
sets
  complement, 3
  difference, 4
  intersection, 3
  union, 3
  universal set, 3
Simpson’s rule, 118
sink, 37
source, 37
source-to-sink communication, 46
standard deviation, 74
standard normal distribution, 107
state (of a Markov process), 208
stationary stochastic process, 181, 186, 188-189
stochastic process, 171
strictly stationary stochastic process, 190

V
variance, 73-74
Venn diagram, 4
PROBABILITY FOR ENGINEERING
WITH APPLICATIONS TO RELIABILITY
ISBN 0-7167-8187-5