Emanuel Parzen Modern Probability Theory and Its Applications
EMANUEL PARZEN
Associate Professor of Statistics
Stanford University
affecting the time a light bulb will burn may be quite different, yet
similar mathematical ideas may be used to describe both quantities.
On the other hand, a course in probability theory should serve as a
background to many courses (such as statistics, statistical physics, in-
dustrial engineering, communication engineering, genetics, statistical
psychology, and econometrics) in which probabilistic ideas and tech-
niques are employed. Consequently, in the basic course in probabil-
ity theory one should attempt to provide the student with a confident
technique for solving probability problems. To solve these problems,
there is no need to employ intuitive witchcraft. In this book it is shown
how one may formulate probability problems in a mathematical manner
so that they may be systematically attacked by routine methods. The
basic step in this procedure is to express any event whose probability
of occurrence is being sought as a set of sample descriptions, defined on
the sample description space of the random phenomenon under con-
sideration. In a similar spirit, the notion of random variable, together
with the sometimes bewildering array of notions that must be introduced
simultaneously, is presented in easy stages by first discussing the notion
of numerical valued random phenomena.
This book is written as a textbook for a course in probability that can
be adapted to the needs of students with diverse interests and back-
grounds. In particular, it has been my aim to present the major ideas
of modern probability theory without assuming that the reader knows
the advanced mathematics necessary for a rigorous discussion.
The first six chapters constitute a one-quarter course in elementary
probability theory at the sophomore or junior level. For the study of
these chapters, the student need have had only one year of college
calculus. Students with more mathematical background would also
cover Chapters 7 and 8. The material in the first eight chapters (omit-
ting the last section in each) can be conveniently covered in thirty-nine
class hours by students with a good working knowledge of calculus.
Many of the sections of the book can be read independently of one an-
other without loss of continuity.
Chapters 9 and 10 are much less elementary in character than the
first eight chapters. They constitute an introduction to the limit
theorems of probability theory and to the role of characteristic functions
in probability theory. These chapters provide careful and rigorous
derivations of the law of large numbers and the central limit theorem
and contain many new proofs.
In studying probability theory, the reader is exploring a way of think-
ing that is undoubtedly novel to him. Consequently, it is important that
he have available a large number of interesting problems that at once
illustrate and test his grasp of the theory. More than 160 examples,
120 theoretical exercises, and 480 exercises are contained in the text.
The exercises are divided into two categories and are collected at the
end of each section rather than at the end of the book or at the end
of each chapter. The theoretical exercises extend the theory; they are
stated in the form of assertions that the student is asked to prove. The
nontheoretical exercises are numerical problems concerning concrete
random phenomena and illustrate the variety of situations to which
probability theory may be applied. The answers to odd-numbered
exercises are given at the end of the book; the answers to even-
numbered exercises are available in a separate booklet.
In choosing the notation I have adopted in this book, it has been my
aim to achieve a symbolism that is self-explanatory and that can be read
as if it were English. Thus the symbol Fx(x) is defined as "the dis-
tribution function of the random variable X evaluated at the real num-
ber x." The terminology adopted agrees, I believe, with that used by
most recent writers on probability theory.
The author of a textbook is indebted to almost everyone who has
touched the field. I especially desire to express my intellectual indebted-
ness to the authors whose works are cited in the brief literature survey
given in section 8 of Chapter 1.
To my colleagues at Stanford, and especially to Professors A. Bowker
and S. Karlin, I owe a great personal debt for the constant encourage-
ment they have given me and for the stimulating atmosphere they have
provided. All have contributed much to my understanding of proba-
bility theory and statistics.
I am very grateful for the interest and encouragement accorded me
by various friends and colleagues. I particularly desire to thank Marvin
Zelen for his valuable suggestions.
To my students at Stanford who have contributed to this book by
their comments, I offer my thanks. Particularly valuable assistance has
been rendered by E. Dalton and D. Ylvisaker and also by M. Boswell
and P. Williams.
To the cheerful, hard-working staff of the Applied Mathematics and
Statistics Laboratory at Stanford, I wish to express my gratitude for
their encouragement. Great thanks are due also to Mrs. Mary Alice
McComb and Mrs. Isolde Field for their excellent typing and to Mrs.
Betty Jo Prine for her excellent drawings.
EMANUEL PARZEN
Stanford, California
January 1960
Contents

CHAPTER
1. Probability Theory as the Study of Mathematical Models of Random Phenomena

Tables 441
Answers to Odd-Numbered Exercises 447
Index 459

List of Important Tables

TABLE
2-6A THE PROBABILITIES OF VARIOUS EVENTS DEFINED ON THE GENERAL OCCUPANCY AND SAMPLING PROBLEMS 84
CHAPTER 1
Probability Theory
as the Study
of Mathematical Models
of Random Phenomena
One of the most striking features of the present day is the steadily
increasing use of the ideas of probability theory in a wide variety of
scientific fields, involving matters as remote and different as the prediction
by geneticists of the relative frequency with which various characteristics
occur in groups of individuals, the calculation by telephone engineers of
the density of telephone traffic, the maintenance by industrial engineers of
manufactured products at a certain standard of quality, the transmission
(by engineers concerned with the design of communications and automatic-
control systems) of signals in the presence of noise, and the study by
physicists of thermal noise in electric circuits and the Brownian motion of
particles immersed in a liquid or gas. What is it that is studied in proba-
bility theory that enables it to have such diverse applications? In order
to answer this question, we must first define the property that is possessed
in common by phenomena such as the number of individuals possessing
a certain genetical characteristic, the number of telephone calls made in a
given city between given hours of the day, the standard of quality of the
items manufactured by a certain process, the number of automobile
accidents each day on a given highway, and so on. Each of these phenom-
ena may often be considered a random phenomenon in the sense of the
following definition.
A random (or chance) phenomenon is an empirical phenomenon charac-
terized by the property that its observation under a given set of circum-
stances does not always lead to the same observed outcome (so that there
is no deterministic regularity) but rather to different outcomes in such a
way that there is statistical regularity. By this is meant that numbers exist
between 0 and 1 that represent the relative frequency with which the
different possible outcomes may be observed in a series of observations of
independent occurrences of the phenomenon.
Closely related to the notion of a random phenomenon are the notions
of a random event and of the probability of a random event. A random
event is one whose relative frequency of occurrence, in a very long sequence
of observations of randomly selected situations in which the event may
occur, approaches a stable limit value as the number of observations is
increased to infinity; the limit value of the relative frequency is called the
probability of the random event.
In order to bring out in more detail what is meant by a random phenom-
enon, let us consider a typical random event; namely, an automobile
accident. It is evident that just where, when, and how a particular
accident takes place depends on an enormous number of factors, a slight
change in any one of which could greatly alter the character of the accident
or even avoid it altogether. For example, in a collision of two cars, if one
of the motorists had started out ten seconds earlier or ten seconds later,
if he had stopped to buy cigarettes, slowed down to avoid a cat that happened
to cross the road, or altered his course for any one of an unlimited number
of similar reasons, this particular accident would never have happened;
whereas even a slightly different turn of the steering wheel might have
prevented the accident altogether or changed its character completely,
either for the better or for the worse. For any motorist starting out on a
given highway it cannot be predicted that he will or will not be involved in
an automobile accident. Nevertheless, if we observe all (or merely some
very large number of) the motorists starting out on this highway on a
given day, we may determine the proportion that will have automobile
accidents. If this proportion remains the same from day to day, then we
may adopt the belief that what happens to a motorist driving on this high-
way is a random phenomenon and that the event of his having an automo-
bile accident is a random event.
Another typical random phenomenon arises when we consider the
experiment of drawing a ball from an urn. In particular, let us examine an
urn (or a bowl) containing six balls, of which four are white, and two are
red. Except for color, the balls are identical in every detail. Let a ball
be drawn and its color noted. We might be tempted to ask "what will be
the color of a ball drawn from the urn?" However, it is clear that there is
no answer to this question. If one actually performs the experiment of
drawing a ball from an urn, such as the one described, the color of the ball
one draws will sometimes be white and sometimes red. Thus the outcome
of the experiment of drawing a ball is unpredictable.
Yet there are things that are predictable about this experiment. In
Table IA the results of 600 independent trials are given (that is, we have
TABLE 1A
The number of white balls drawn in 600 trials of the experiment of drawing
a ball from an urn containing four white balls and two red balls.
taken an urn containing four white balls and two red balls, mixed the balls
well, drawn a ball, and noted its color, after which the ball drawn was
returned to the urn; these operations were repeated 600 times). It is seen
that in each block of 100 trials (as well as in the entire set of 600 trials) the
proportion of experiments in which a white ball is drawn is approximately
equal to 2/3. Consequently, one may be tempted to assert that the proportion
2/3 has some real significance for this experiment and that in a reasonably
long series of trials of the experiment 2/3 of the balls drawn will be colored
white. If one succumbs to this temptation, then one has asserted that the
outcome of the experiment (of drawing a ball from an urn containing six
balls, of which four are white and two are red) is a random phenomenon.
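The statistical regularity just described lends itself to simulation. The following sketch (in Python; the seed, the representation of the urn, and the block size of 100 trials are arbitrary choices made for illustration) repeats the draw-and-replace experiment 600 times and prints the proportion of white balls in each block of 100 trials; by statistical regularity, each proportion should hover near 2/3.

```python
import random

random.seed(1)  # an arbitrary seed, fixed so the run is reproducible
urn = ["W", "W", "W", "W", "R", "R"]  # four white balls, two red balls

# Draw a ball, note its color, and return it to the urn, 600 times.
draws = [random.choice(urn) for _ in range(600)]

# Proportion of white balls in each block of 100 trials, and overall.
for start in range(0, 600, 100):
    block = draws[start:start + 100]
    print(f"trials {start + 1:3d}-{start + 100:3d}: proportion white = "
          f"{block.count('W') / 100:.2f}")
print(f"all 600 trials: proportion white = {draws.count('W') / 600:.3f}")
```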
More generally, if one believes that the experiment of drawing a ball
from an urn will, in a long series of trials, yield a white ball in some definite
proportion (which one may not know) of the trials of the experiment, then
one has asserted (i) that the drawing of a ball from such an urn is a random
phenomenon and (ii) that the drawing of a white ball is a random event.
Let us give an illustration of the way in which one may use the know-
ledge (or belief) that a phenomenon is random. Consider a group of
300 persons who are candidates for admission to a certain school at which
there are facilities for only 200 students. In the interest of fairness it is
decided to use a random mechanism to choose the students from among
the candidates. In one possible random method the 300 candidates are
assembled in a room. Each candidate draws a ball from an urn containing
six balls, of which four are white; those who draw white balls are admitted
as students. Given an individual student, it cannot be foretold whether or
not he will be admitted by this method of selection. Yet, if we believe that
the outcome of the experiment of drawing a ball possesses the property of
statistical regularity, then on the basis of the experiment represented by
Table 1A, which indicates that the probability of drawing a white ball is
2/3, we believe that the number of candidates who will draw white balls, and
consequently be admitted as students, will be approximately equal to 200
(note that 200 represents the product of (i) the number of trials of the
experiment and (ii) the probability of the event that the experiment will
yield a white ball). By a more careful analysis, one can show that the
probability is quite high that the number of candidates who will draw white
balls is between 186 and 214.
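A sketch of that more careful analysis, assuming the 300 draws may be treated as independent, each yielding a white ball with probability 2/3 (the binomial formula used here is developed in Chapter 2, section 3), is the following computation:

```python
from math import comb

n, p = 300, 2 / 3  # 300 candidates; each draws a white ball with probability 2/3

def binom_pmf(k: int) -> float:
    """Probability that exactly k of the n candidates draw a white ball."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Probability that the number admitted lies between 186 and 214 inclusive.
prob = sum(binom_pmf(k) for k in range(186, 215))
print(f"P[186 <= number admitted <= 214] = {prob:.3f}")
```

Under these assumptions the computed probability is approximately 0.92, which justifies the phrase "quite high."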
One of the aims of this book is to show how by means of probability
theory the same mathematical procedure can be used to solve quite
different problems. To illustrate this point, we consider a variation of the
foregoing problem which is of great practical interest. Many colleges find
that only a certain proportion of the students they admit as students
actually enroll. Consequently a college must decide how many students
to admit in order to be sure that enough students will enroll. Suppose that
a college finds that only two-thirds of the students it admits enroll; one
may then say that the probability is 2/3 that a student will enroll. If the
college desires to ensure that about 200 students will enroll, it should admit
300 students.
EXERCISES
1935 1053
1940 1054
1945 1055
1950 1054
1951 1052
1952 1051
1953 1053
1954 1051
1955 1051
Would you say the event that a newborn baby is a boy is a random event?
If so, what is the probability of this random event? Explain your reasoning.
1.3. A discussion question. Describe how you would explain to a layman the
meaning of the following statement: An insurance company is not gambling
with its clients because it knows with sufficient accuracy what will happen
to every thousand or ten thousand or a million people even when the
company cannot tell what will happen to any individual among them.
One view that one may take about the nature of probability theory is
that it is part of the study of nature in the same way that physics, chemistry,
and biology are. Physics, chemistry, and biology may each be defined as
the study of certain observable phenomena, which we may call, respectively,
the physical, chemical, and biological phenomena. Similarly, one might be
tempted to define probability theory as the study of certain observable
phenomena, namely the random phenomena. However, a random
phenomenon is generally also a phenomenon of some other type; it is a
random physical phenomenon, or a random chemical phenomenon, and
so on. Consequently, it would seem overly ambitious for researchers in
probability theory to take as their province of research all random
phenomena. In this book we take the view that probability theory is
not directly concerned with the study of random phenomena but rather
with the study of the methods of thinking that can be used in the
study of random phenomena. More precisely, we make the following
definition.
The theory of probability is concerned with the study of those methods of
analysis that are common to the study of random phenomena in all the fields in
which they arise. Probability theory is thus the study of the study of
random phenomena, in the sense that it is concerned with those properties
of random phenomena that depend essentially on the notion of random-
ness and not on any other aspects of the phenomenon considered. More
fundamentally, the notions of randomness, of a random phenomenon, of
statistical regularity, and of "probability" cannot be said to be obvious
or intuitive. Consequently, one of the main aims of a study of the theory
of probability is to clarify the meaning of these notions and to provide us
with an understanding of them, in much the same way that the study of
arithmetic enables us to count concrete objects and the study of electro-
magnetic wave theory enables us to transmit messages by wireless.
We regard probability theory as a part of mathematics. As is the case
with all parts of mathematics, probability theory is constructed by means
of the axiomatic method. One begins with certain undefined concepts.
One then makes certain statements about the properties possessed by, and
the relations between, these concepts. These statements are called the
axioms of the theory. Then, by means of logical deduction, without any
appeal to experience, various propositions (called theorems) are obtained
from the axioms. Although the propositions do not refer directly to
the real world, but are merely logical consequences of the axioms,
they do represent conclusions about real phenomena, namely those real
phenomena one is willing to assume possess the properties postulated in
the axioms.
We are thus led to the notion of a mathematical model of a real phenomenon.
A mathematical theory constructed by the axiomatic method is
said to be a model of a real phenomenon, if one gives a rule for translating
propositions of the mathematical theory into propositions about the real
phenomenon. This definition is vague, for it does not state the character
of the rules of translation one must employ. However, the foregoing
definition is not meant to be a precise one but only to give the reader an
intuitive understanding of the notion of a mathematical model. Generally
speaking, to use a mathematical theory as a model for a real phenomenon,
one needs only to give a rule for identifying the abstract objects about which
the axioms of the mathematical theory speak with aspects of the real
phenomenon. It is then expected that the theorems of the theory will
depict the phenomenon to the same extent that the axioms do, for the
theorems are merely logical consequences of the axioms.
As an example of the problem of building models for real phenomena,
let us consider the problem of constructing a mathematical theory (or
explanation) of the experience recorded in Table 1A, which led us to
believe that a long series of trials (of the experiment of drawing a ball from
an urn containing six balls, of which four are white and two red) would
yield a white ball in approximately 2/3 of the trials. In the remainder of this
chapter we shall construct a mathematical theory of this phenomenon,
which we believe to be a satisfactory model of certain features of it. It may
clarify the ideas involved, however, if we consider here an explanation of
this phenomenon, which we shall then criticize.
We imagine that we are permitted to label the six balls in the urn with
numbers 1 to 6, labeling the four white balls with numbers 1 to 4. When a
ball is drawn from the urn, there are six possible outcomes that can be
recorded; namely, that ball number 1 was drawn, that ball number 2 was
drawn, etc. Now four of these outcomes correspond to the outcome that a
white ball is drawn. Therefore the ratio of the number of outcomes of the
experiment favorable to a white ball being drawn to the number of all
possible outcomes is equal to 2/3. Consequently, in order to "explain" why
the observed relative frequency of the drawing of a white ball from the
urn is equal to 2/3, one need only adopt this assumption (stated rather
informally): the probability of an event (by which is meant the relative
frequency with which an event, such as the drawing of a white ball, is
observed to occur in a long series of trials of some experiment) is equal to
the ratio of the number of outcomes of the experiment in which the event
may be observed to the number of all possible outcomes of the experiment.
There are several grounds on which one may criticize the foregoing
explanation. First, one may state that it is not mathematical, since it does
not possess a structure of axioms and theorems. This defect may perhaps
be remedied by using the tools that we develop in the remainder of this
chapter; consequently, we shall not press this criticism. However, there
is a second defect in the explanation that cannot be repaired. The assumption
stated, that the probability of an event is equal to a certain ratio, does
not lead to an explanation of the observed phenomenon because by counting
in different ways one can obtain different values for the ratio. We have
already obtained a value of 2/3 for the ratio; we next obtain a value of 1/2.
If one argues that there are merely two outcomes (either a white ball or a
nonwhite ball is drawn), then exactly one of these outcomes is favorable
to a white ball being drawn. Therefore, the ratio of the number of
outcomes favorable to a white ball being drawn to the number of possible
outcomes is 1/2.
We now proceed to develop the mathematical tools we require to construct
satisfactory models of random phenomena.
4. EVENTS
Fig. 4A. A Venn diagram. The shaded area represents Eᶜ.
Fig. 4B. A Venn diagram. The shaded area represents EF.
Fig. 4C. A Venn diagram. The shaded area represents E ∪ F.
Fig. 4D. A Venn diagram. The shaded area (or rather the lack of a shaded area)
represents the impossible event ∅, which is the intersection of the two mutually exclusive
events E and F.
We then have the basic principle that E equals F if and only if E is a
subevent of F and F is a subevent of E. In symbols,

(4.1) E = F if and only if E ⊂ F and F ⊂ E.
The interesting question arises whether the operations of event union
and event intersection may be applied to an arbitrary pair of events E and
F. In particular, consider two events, E and F, that contain no descriptions
in common; for example, suppose S = {1, 2, 3, 4, 5, 6}, E = {1, 2}, F =
{3, 4}. The union E ∪ F = {1, 2, 3, 4} is defined. However, what
meaning is to be assigned to the intersection EF? To meet this need, we
introduce the notion of the impossible event, denoted by ∅. The impossible
event ∅ is defined as the event that contains no descriptions and therefore
cannot occur. In set theory the impossible event is called the empty set.
One important property of the impossible event is that it is the complement
of the certain event S; clearly Sᶜ = ∅, for it is impossible for S not to
occur. A second important property of the impossible event is that it is
equal to the intersection of any event E and its complement Eᶜ; clearly,
EEᶜ = ∅, for it is impossible for both an event and its complement to occur
simultaneously.
Any two events, E and F, that cannot occur simultaneously, so that
their intersection EF is the impossible event, are said to be mutually
exclusive (or disjoint). Thus, two events, E and F, are mutually exclusive
if and only if EF = ∅.
Two mutually exclusive events may be represented on a Venn diagram
by the interiors of two geometrical figures that do not overlap, as in
Fig. 4D. The impossible event may be represented by the shaded area on a
Venn diagram, in which there is no shading, as in Fig. 4D.
Events may be defined verbally, and it is important to be able to express
them in terms of the event operations. For example, let us consider two
events, E and F. The event that exactly one of the events, E and F, will
occur is equal to EFᶜ ∪ EᶜF; the event that exactly none of the events,
E and F, will occur is equal to EᶜFᶜ. The event that at least one (that is, one
or more) of the events, E or F, will occur is equal to E ∪ F. The event that
at most one (that is, one or less) of the events will occur is equal to (EF)ᶜ =
Eᶜ ∪ Fᶜ.
The operations of event union and event intersection have many of the
algebraic properties of ordinary addition and multiplication of numbers
(although they are conceptually quite distinct from the latter operations).
Among the important algebraic properties of the operations E ∪ F and
EF are the following relations, which hold for any events E, F, and G:

Commutative law:  E ∪ F = F ∪ E;  EF = FE
Associative law:  E ∪ (F ∪ G) = (E ∪ F) ∪ G;  E(FG) = (EF)G
Distributive law:  E(F ∪ G) = EF ∪ EG;  E ∪ (FG) = (E ∪ F)(E ∪ G)
Idempotency law:  E ∪ E = E;  EE = E
In order to verify these identities, one can establish in each case that the
left-hand side of the identity is a subevent of the right-hand side and that
the right-hand side is a subevent of the left-hand side.
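These identities can also be checked mechanically for particular events. In the sketch below (in Python; the sample description space S and the events E, F, G are arbitrary choices), events are represented as sets of descriptions, with & and | playing the roles of intersection and union.

```python
# The sample description space and the events are arbitrary choices.
S = {1, 2, 3, 4, 5, 6}
E, F, G = {1, 2}, {2, 3, 4}, {4, 5}

Ec = S - E  # the complement of E with respect to the certain event S

# Distributive laws: E(F U G) = EF U EG and E U (FG) = (E U F)(E U G).
assert E & (F | G) == (E & F) | (E & G)
assert E | (F & G) == (E | F) & (E | G)

# Commutative and idempotency laws.
assert E | F == F | E and E & F == F & E
assert E | E == E and E & E == E

# An event and its complement are mutually exclusive: their intersection is empty.
assert E & Ec == set()
print("all identities verified for these particular events")
```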
EXERCISES
4.1. An experiment consists of drawing 3 radio tubes from a lot and testing them
for some characteristic of interest. If a tube is defective, assign the letter D
to it. If a tube is good, assign the letter G to it. A drawing is then described
by a 3-tuple, each of whose components is either D or G. For example,
(D, G, G) denotes the outcome that the first tube drawn was defective and
the remaining 2 were good. Let A1 denote the event that the first tube drawn
was defective, A2 denote the event that the second tube drawn was defective,
and A3 denote the event that the third tube drawn was defective. Write
down the sample description space of the experiment and list all sample
descriptions in the events A1, A2, A3, A1 ∪ A2, A1 ∪ A3, A2 ∪ A3,
A1 ∪ A2 ∪ A3, A1A2, A1A3, A2A3, A1A2A3.
4.2. For each of the following 16 events draw a Venn diagram similar to Figure
4A or 4B and on it shade the area corresponding to the event. Only 7
diagrams will be required to illustrate the 16 events, since some of the events
described are equivalent. (i) ABᶜ, (ii) ABᶜ ∪ AᶜB, (iii) (A ∪ B)ᶜ, (iv) AᶜBᶜ,
(v) (AB)ᶜ, (vi) Aᶜ ∪ Bᶜ, (vii) the event that exactly 0 of the events, A and B,
occurs, (viii) the event that exactly 1 of the events, A and B, occurs, (ix) the
event that exactly 2 of the events, A and B, occur, (x) the event that at least
0 of the events, A and B, occurs, (xi) the event that at least 1 of the events,
A and B, occurs, (xii) the event that at least 2 of the events, A and B, occur,
(xiii) the event that no more than 0 of the events, A and B, occurs, (xiv) the
event that no more than 1 of the events, A and B, occurs, (xv) the event that
no more than 2 of the events, A and B, occur, (xvi) the event that A occurs
and B does not occur. Remark: By "at least 1" we mean "1 or more," by
"no more than 1" we mean "1 or less," and so on.
4.3. Let S = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}, A = {1, 2, 3, 4, 5, 6}, and
B = {4, 5, 6, 7, 8, 9}. For each of the events described in exercise 4.2, write
out the numbers that are members of the event.
4.4. For each of the following 12 events draw a Venn diagram and on it shade
the area corresponding to the event: the event that of the events A, B, C,
there occur (i) exactly 0, (ii) exactly 1, (iii) exactly 2, (iv) exactly 3, (v) at least
0, (vi) at least 1, (vii) at least 2, (viii) at least 3, (ix) no more than 0, (x) no
more than 1, (xi) no more than 2, (xii) no more than 3.
4.5. Let S, A, B be as in exercise 4.3, and let C = {7, 8, 9}. For each of the
events described in exercise 4.4, write out the numbers that are members of
the event.
4.6. Prove (4.4). Note that (4.4) states that the impossible event ∅ behaves under
the operations of intersection and union in a manner similar to the way in
which the number 0 behaves under the operations of multiplication and
addition.
4.7. Prove (4.5). Show further that the events F and EFᶜ are mutually exclusive.
The mathematical notions are now at hand with which one may state the
postulates of a mathematical model of a random phenomenon. Let us
recall that in our heuristic discussion of the notion of a random phenomenon
in section 1 we accepted the so-called "frequency" interpretation of
probability, according to which the probability of an event E is a number
(which we denote by P[E]). This number can be known to us only by
experience as the result of a very long series of observations of independent
trials of the event E. (By a trial of E is meant an occurrence of the
phenomenon on which E is defined.) Having observed a long series of
trials, the probability of E represents the fraction of trials whose outcome
has a description that is a member of E. In view of the frequency
interpretation of P[E], it follows that a mathematical definition of the probability
of an event cannot tell us the value of P[E] for any particular event E.
Rather a mathematical theory of probability must be concerned with the
EXERCISES
The event ABᶜ ∪ BAᶜ is the event that exactly 1 of the events, A and B,
will occur. Contrast (5.12) with (5.4), which could be called the formula
for the probability that at least 1 of 2 events will occur.
5.3. Show that for any 3 events, A, B, and C, defined on a probability space,
the probability of the event that at least 1 of the events will occur is given by
P[A ∪ B ∪ C] = P[A] + P[B] + P[C] - P[AB] - P[AC] - P[BC] + P[ABC].
To prove (6.1), one need note only that if E consists of the descriptions
Di1, Di2, ..., Dik, then E can be written as the union of the mutually
exclusive single-member events {Di1}, {Di2}, ..., {Dik}. Equation (6.1)
follows immediately from (5.8).
Let E be the event that the ball drawn on the first draw is white. The event
E may be represented as a set of descriptions by E = {(W, W), (W, R)}.
Then, by (6.1), P[E] = P[{(W, W)}] + P[{(W, R)}] = 2/3.
TABLE 7A
LICENSE PLATES WITH FIRST DIGIT 1

All license plates in the following intervals have first digit 1:

Interval                 Number of integers in this interval
1                        1
10-19                    10
100-199                  100
1000-1999                1000
10,000-19,999            10,000
100,000-199,999          100,000
1,000,000-1,999,999      1,000,000
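The table may be checked, and the probability asked for in exercise 7.4 computed for any number of registered automobiles, by a direct count. The following Python sketch (the brute-force count over all plate numbers is chosen for transparency rather than speed) counts the plates whose serial number begins with the digit 1:

```python
def count_first_digit_one(m: int) -> int:
    """Number of integers from 1 to m whose decimal expansion begins with 1."""
    return sum(1 for n in range(1, m + 1) if str(n)[0] == "1")

# Probability that a randomly selected plate (numbered serially from 1)
# has first digit 1, for two of the registration totals in exercise 7.4.
for m in (999_999, 2_000_000):
    print(m, count_first_digit_one(m) / m)
```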
EXERCISES
7.1. Suppose that a die (with faces marked 1 to 6) is loaded in such a manner
that, for k = 1, ... , 6, the probability of the face marked k turning up
when the die is tossed is proportional to k. Find the probability of the event
that the outcome of a toss of the die will be an even number.
7.2. What is the probability that the thirteenth of the month will be (i) a Friday
or a Saturday, (ii) a Saturday, Sunday, or Monday?
7.3. Let a number be chosen from the integers 1 to 100 in such a way that each of
these numbers is equally likely to be chosen. What is the probability that
the number chosen will be (i) a multiple of 7, (ii) a multiple of 14?
7.4. Consider a state in which the license plates of automobiles are numbered
serially, beginning with 1. What is the probability that the first digit on the
license plate of an automobile selected at random will be the digit 1,
assuming that the number of automobiles registered in the state is equal to
(i) 999,999, (ii) 1,000,000, (iii) 1,500,000, (iv) 2,000,000, (v) 6,000,000?
7.5. What is the probability that a ball, drawn from an urn containing 3 red
balls, 4 white balls, and 5 blue balls, will be white? State carefully any
assumptions that you make.
7.6. A research problem. Using the same assumptions as those with which the
table in (7.3) was derived, find the probability that Christmas (December 25)
is a Monday. Indeed, show that the probability that Christmas will fall on a
given day of the week is supplied by the following table:
x        Sunday   Monday   Tuesday   Wednesday   Thursday   Friday   Saturday
P[{x}]   58/400   56/400   58/400    57/400      57/400     58/400   56/400
CHAPTER 2

Basic Probability Theory

(1.4) n = 0, 1, 2, ..., M.

(1.5) 0! = 1.
(1.6)
Example 1C. (4)_0 = 1, (4)_1 = 4, (4)_2 = 12, (4)_3 = 24, (4)_4 = 4! = 24.
Note that (4)_5 is undefined at present. It is later defined as having
value 0.
(1.7)

(1.9) (a + b)^N = Σ (N choose k) a^k b^(N-k), the sum extending over k = 0, 1, ..., N,

where the binomial coefficient (N choose k) is defined to be 0 if either k < 0 or k > N.
We next note the extremely useful relation, holding for N = 1,2, ... ,
and k = 0, ±1, ±2, ...,
(1.11) (N choose k-1) + (N choose k) = (N+1 choose k).
This relation may be verified directly from the definition of binomial
coefficients. An intuitive justification of (1.11) can be obtained. Given a
set S, with N + 1 members, choose an element t in S. The number of
subsets of S of size k in which t is not present is equal to (N choose k),
whereas the number of subsets of size k in which t is present is equal to
(N choose k-1); the sum of these two quantities is equal to (N+1 choose k),
the total number of subsets of S of size k.
Equation (1.11) is the algebraic expression of a fact represented in
tabular form by Pascal's triangle:
(1 choose 0) = 1   (1 choose 1) = 1
(2 choose 0) = 1   (2 choose 1) = 2   (2 choose 2) = 1
(3 choose 0) = 1   (3 choose 1) = 3   (3 choose 2) = 3   (3 choose 3) = 1
(4 choose 0) = 1   (4 choose 1) = 4   (4 choose 2) = 6   (4 choose 3) = 4   (4 choose 4) = 1
(5 choose 0) = 1   (5 choose 1) = 5   (5 choose 2) = 10   (5 choose 3) = 10   (5 choose 4) = 5   (5 choose 5) = 1

and so on. Equation (1.11) expresses the fact that each term in Pascal's
triangle is the sum of the two terms above it.
One also notices in Pascal's triangle that the entries on each line
are symmetric about the middle entry (or entries). More precisely, the
binomial coefficients have the property that for any positive integer N and
k = 0, 1, 2, ..., N

(1.12) (N choose k) = (N choose N-k).

To prove (1.12) one need note only that each side of the equation is equal
to N!/k!(N - k)!.
It should be noted that with (1.11) and the aid of the principle of
mathematical induction one may prove the binomial theorem.
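Relations (1.11) and (1.12) lend themselves to a quick numerical check. The following Python sketch (the choice N = 5 and the range of k are arbitrary; math.comb is the standard-library binomial coefficient) prints the first rows of Pascal's triangle and verifies both identities, using the convention that the coefficient vanishes outside 0 ≤ k ≤ N:

```python
from math import comb

def binom(n: int, k: int) -> int:
    """Binomial coefficient, taken to be 0 outside the range 0 <= k <= n."""
    return comb(n, k) if 0 <= k <= n else 0

# The first rows of Pascal's triangle.
for n in range(1, 6):
    print([binom(n, k) for k in range(n + 1)])

# Relation (1.11), which holds for k = 0, +-1, +-2, ...
N = 5
for k in range(-2, N + 3):
    assert binom(N, k - 1) + binom(N, k) == binom(N + 1, k)

# Symmetry relation (1.12).
assert all(binom(N, k) == binom(N, N - k) for k in range(N + 1))
```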
The mathematical facts are now at hand to determine how many
subsets of a set of size N one may form. From the binomial theorem (1.9),
with a = b = 1, it follows that
(1.13) (N choose 0) + (N choose 1) + ... + (N choose N) = 2^N.

From (1.13) it follows that the number of events (including the impossible
event) that can be formed on a sample description space of size N is 2^N.
For there is one impossible event, (N choose 1) events of size 1, (N choose 2)
events of size 2, and so on.
(1.15)

Continuing in this manner, one finds that (1.14) is equal to

(1.16) N!/(k1! k2! ... kr!).

Quantities of the form of (1.16) arise frequently, and a special notation is
introduced to denote them. For any integer N, and r nonnegative integers
k1, k2, ..., kr whose sum is N, we define the multinomial coefficient

(1.17) (N choose k1, k2, ..., kr) = N!/(k1! k2! ... kr!).

The multinomial coefficients derive their name from the fact that they are
the coefficients in the expansion of the Nth power of the multinomial form
a1 + a2 + ... + ar in terms of powers of a1, a2, ..., ar:

(1.18) (a1 + a2 + ... + ar)^N = Σ (N choose k1, k2, ..., kr) a1^k1 a2^k2 ... ar^kr,

in which the sum extends over all r-tuples of nonnegative integers (k1, k2, ..., kr) whose sum is N.
The number of possible bridge hands is (52 choose 13), since a bridge hand
constitutes a set of thirteen cards selected from a set of 52. The number of
ways in which a bridge deck may be dealt into four hands (labeled, as is
usual, North, West, South, and East) is the multinomial coefficient

(52 choose 13, 13, 13, 13) = 52!/(13! 13! 13! 13!).
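As a numerical illustration, this multinomial coefficient may be evaluated directly. The sketch below (in Python; the helper function multinomial is an illustrative convenience, not notation from the text) computes both the number of bridge hands and the number of ways of dealing the full deck:

```python
from math import factorial, prod

def multinomial(n: int, *ks: int) -> int:
    """The multinomial coefficient (n; k1, ..., kr), where k1 + ... + kr = n."""
    assert sum(ks) == n
    return factorial(n) // prod(factorial(k) for k in ks)

# Number of distinct bridge hands: (52 choose 13, 39) = (52 choose 13).
print(multinomial(52, 13, 39))          # 635013559600
# Number of ways of dealing the deck into four labeled hands of 13 each.
print(multinomial(52, 13, 13, 13, 13))  # 52!/(13!)^4, about 5.36 x 10^28
```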
EXERCISES
1.1. A restaurant menu lists 3 soups, 10 meat dishes, 5 desserts, and 3 beverages.
In how many ways can a meal (consisting of soup, meat dish, dessert, and
beverage) be ordered?
1.2. Find the value of (i) (5)_3, (ii) (5)_3, (iii) 5!, (iv) (5 choose 3).
1.3. How many subsets of size 3 does a set of size 5 possess? How many
subsets does a set of size 5 possess?
1.4. In how many ways can a bridge deck be partitioned into 4 hands, each of
size 13?
1.5. Five politicians meet at a party. How many handshakes are exchanged if
each politician shakes hands with every other politician once and only once?
1.6. Consider a college professor who every year tells exactly 3 jokes in his
course. If it is his policy never to tell the same 3 jokes in any year that he
has told in any other year, what is the minimum number of jokes he will
tell in 35 years? If it is his policy never to tell the same joke twice, what is
the minimum number of jokes he will tell in 35 years?
1.7. In how many ways can a student answer an 8-question, true-false examina-
tion if (i) he marks half the questions true and half the questions false,
(ii) he marks no two consecutive answers the same?
1.8. State, by inspection, the value of
3^4 + 4·3^3 + (4·3)/(1·2) · 3^2 + (4·3·2)/(1·2·3) · 3 + 1.
1.10. Find the value of (i) (4 choose 2, 2), (ii) (3 choose 2, 1), (iii) (5 choose 5, 0),
(iv) (3 choose 3, 0). Explain why (3 choose 3, 0) = (3 choose 3).
1.12. Given an alphabet of n symbols, in how many ways can one form words
consisting of exactly k symbols? Consequently, find the number of possible
3-letter words that can be formed in the English language.
1.13. Find the number of 3-letter words that can be formed in the English
language whose first and third letters are consonants and whose middle
letter is a vowel.
1.14. Use (1.11) and the principle of mathematical induction to prove the
binomial theorem, which is stated by (1.9).
In words, one may read (2.1) as follows: S is the set of all 2-tuples (z1, z2)
whose components are any numbers, 1 to 6, subject to the restriction that
no two components of a 2-tuple are equal. The jth component zj of a
description represents the number of the ball drawn on the jth draw. Now
let A be the event that both balls drawn are white, let B be the event that
both balls drawn are red, and let C be the event that at least one of the
balls drawn is white. The problem at hand can then be stated as one of
finding (i) P[A], (ii) P[A ∪ B], (iii) P[C]. It should be noted that C = Bᶜ,
so that P[C] = 1 - P[B]. Further, A and B are mutually exclusive, so
that P[A ∪ B] = P[A] + P[B]. Now
(2.2) A = {(1, 2), (1, 3), (1, 4), (2, 1), (2, 3), (2, 4),
(3, 1), (3, 2), (3, 4), (4, 1), (4, 2), (4, 3)},

whereas B = {(5, 6), (6, 5)}. Let us assume that all descriptions in S are
equally likely. Then
The answers to the questions posed in example 2A are given, in the case
of sampling without replacement, by (i) P[A] = 0.4, (ii) P[A ∪ B] = 0.466,
(iii) P[C] = 0.933. These probabilities have been obtained under the
assumption that the balls in the urn may be regarded as numbered
(distinguishable) and that all descriptions in the sample description space
S given in (2.1) are equally likely. In the case of sampling with replacement,
a similar analysis may be carried out; one obtains the answers
(2.4) P[A] = (4·4)/(6·6) = 0.444, P[B] = (2·2)/(6·6) = 0.111,
Under the assumption that all descriptions in S are equally likely, one
would conclude that P[A] = 2/5, P[A ∪ B] = 7/15, P[C] = 14/15.
The next example illustrates the treatment of problems concerning urns
of arbitrary composition. It also leads to a conclusion that the reader
may find startling if he considers the following formulation of it. Suppose
that at a certain time the milk section of a self-service market is known
to contain 150 quart bottles, of which 100 are fresh. If one assumes that
each bottle is equally likely to be drawn, then the probability is 2/3 that a
bottle drawn from the section will be fresh. However, suppose that one
selects one bottle after each of fifty other persons have selected a bottle.
Is one's probability of drawing a fresh bottle changed from what it would
have been had one been the first to draw? By the reasoning employed in
example 2B it can be shown that the probability that the fifty-first bottle
drawn will be fresh is the same as the probability that the first bottle
drawn will be fresh.
Example 2B. An urn of arbitrary composition. An urn contains M
balls, of which Mw are white and the remaining M - Mw are red. A sample
of size 2 is
drawn with replacement (without replacement). What is the probability
that (i) the first ball drawn will be white, (ii) the second ball drawn will
be white, (iii) both balls drawn will be white?
Solution: Let A denote the event that the first ball drawn is white,
B denote the event that the second ball drawn is white, and C denote the
event that both balls drawn are white. It should be noted that C = AB.
Let the balls in the urn be numbered 1 to M, the white balls bearing
numbers 1 to Mw, and the red balls bearing numbers Mw + 1 to M.
We consider first the case of sampling with replacement. The sample
description space S of the experiment consists of ordered 2-tuples (z1, z2),
in which z1 is the number of the ball drawn on the first draw and z2 is
the number of the ball drawn on the second draw. Clearly, N[S] = M^2.
To compute N[A], we use the fact that a description is in A if and only if
its first component is a number 1 to Mw (meaning a white ball was
drawn on the first draw) and its second component is a number 1 to M
(due to the sampling with replacement the color of the ball drawn on the
second draw is not affected by the fact that the first ball drawn was white).
Thus there are Mw possibilities for the first component, and for each of
these M possibilities for the second component of a description in A.
Consequently, by (1.1), the size of A is MwM. Similarly, N[B] = MM w,
since there are M possibilities for the first component and Mw possi-
bilities for the second component of a description in B. The reader may
verify by a similar argument that the event AB (a white ball is drawn on
both draws) has size N[AB] = Mw·Mw. Thus in the case of sampling
with replacement one obtains the result, if all descriptions are equally
likely, that

(2.5) P[A] = P[B] = Mw/M, P[AB] = (Mw·Mw)/(M·M).

A similar count of descriptions in the case of sampling without replacement
yields

(2.6) P[A] = P[B] = Mw/M, P[AB] = Mw(Mw - 1)/(M(M - 1)).
Another way of computing P[B], which the reader may find more
convincing on first acquaintance with the theory of probability, is as
follows. Let B1 denote the event that the first ball drawn is white and
the second ball drawn is white. Let B2 denote the event that the first ball
drawn is red and the second ball drawn is white. Clearly, N[B1] =
Mw(Mw - 1), N[B2] = (M - Mw)Mw. Since P[B] = P[B1] + P[B2], we
have

P[B] = Mw(Mw - 1)/(M(M - 1)) + (M - Mw)Mw/(M(M - 1)) = Mw/M.
To illustrate the use of (2.5) and (2.6), let us consider an urn containing
M = 6 balls, of which Mw = 4 are white. Then P[A] = P[B] = 2/3 and
P[AB] = 4/9 in sampling with replacement, whereas P[A] = P[B] = 2/3 and
P[AB] = 2/5 in sampling without replacement.
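These values can be confirmed by listing every equally likely description, since sample description spaces of this size are small enough to enumerate. The Python sketch below (the numbering of the balls follows the text: balls 1 through 4 are white) computes P[A], P[B], and P[AB] under both sampling schemes:

```python
from itertools import permutations, product

M, Mw = 6, 4  # six balls; balls numbered 1 to 4 are white
balls = range(1, M + 1)

def probabilities(descriptions):
    """P[A], P[B], P[AB] when all listed descriptions are equally likely."""
    n = len(descriptions)
    n_A = sum(1 for z in descriptions if z[0] <= Mw)   # first ball white
    n_B = sum(1 for z in descriptions if z[1] <= Mw)   # second ball white
    n_AB = sum(1 for z in descriptions if z[0] <= Mw and z[1] <= Mw)
    return n_A / n, n_B / n, n_AB / n

# With replacement: all M*M ordered 2-tuples; expect 2/3, 2/3, 4/9.
print(probabilities(list(product(balls, repeat=2))))
# Without replacement: 2-tuples with distinct components; expect 2/3, 2/3, 2/5.
print(probabilities(list(permutations(balls, 2))))
```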
The reader may find (2.6) startling. It is natural, in the case of sampling
with replacement, in which P[A] = P[B], that the probability of drawing
a white ball is the same on the second draw as it is on the first draw,
since the composition of the urn is the same in both draws. However,
it seems very unnatural, if not unbelievable, that in sampling without
replacement P[A] = P[B]. The following remarks may clarify the meaning
of (2.6).
Suppose that one desired to regard the event that a white ball is drawn
on the second draw as an event defined on the sample description space,
denoted by S', which consists of all possible outcomes of the second draw.
To begin with, one might write S' = {1, 2, ..., M}. However, how is a
probability function to be defined on the subsets of S' in the case in which
the sample is drawn without replacement?
the outcome of the first draw, perhaps one might regard all descriptions
in S' as being equally likely; then, P[B] = Mw/M. However, suppose
one knows that a white ball was drawn on the first draw. Then the
descriptions in S' are no longer equally likely; rather, it seems plausible
to assign probability 0 to the description corresponding to the (white)
ball, which is not available on the second draw, and assume the remaining
descriptions to be equally likely. One then computes that the probability
of the event B (that a white ball will be drawn on the second draw), given
that the event A (that a white ball was drawn on the first draw) has
occurred, is equal to (Mw - 1)/(M - 1). Thus (Mw - 1)/(M - 1)
represents a conditional probability of the event B (and, in particular, the
conditional probability of B, given that the event A has occurred), whereas
Mw/M represents the unconditional probability of the event B. The
distinction between unconditional and conditional probability is made
precise in section 4.
The next example we shall consider is a generalization of the celebrated
problem of repeated birthdays. Suppose that one is present in a room in
which there are n people. What is the probability that no two persons in
the room have the same birthday? Let it be assumed that each person
in the room can have as his birthday any one of the 365 days in the year
(ignoring the existence of leap years) and that each day of the year is
equally likely to be the person's birthday. Then selecting a birthday for
each person is the same as selecting a number randomly from an urn
containing M = 365 balls, numbered 1 to 365. It is shown in example 2C
that the probability that no two persons in a room containing n persons
will have the same birthday is given by
(2.7) (1 - 1/365)(1 - 2/365) ... (1 - (n - 1)/365).
The value of (2.7) for various values of n appears in Table 2A.
TABLE 2A
In a room containing n persons let P n be the probability
that there are not two or more persons in the room with the
same birthday and let Qn be the probability that there are
two or more persons with the same birthday.
n Pn Qn
4 0.984 0.016
8 0.926 0.074
12 0.833 0.167
16 0.716 0.284
20 0.589 0.411
22 0.524 0.476
23 0.493 0.507
24 0.462 0.538
28 0.346 0.654
32 0.247 0.753
40 0.109 0.891
48 0.039 0.961
56 0.012 0.988
64 0.003 0.997
From Table 2A one determines a fact that many students find startling
and completely contrary to intuition. How many people must there be
in a room in order for the probability to be greater than 0.5 that at least
two of them will have the same birthday? Students who have been asked
this question have given answers as high as 100, 150, 365, and 730. In
fact, the answer is 23!
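Formula (2.7) and Table 2A are easy to reproduce. The Python sketch below evaluates (2.7) for increasing n and locates the smallest room size for which Q_n exceeds 0.5 (the function name and the default M = 365 are choices made for this illustration):

```python
def p_no_repeated_birthday(n: int, M: int = 365) -> float:
    """Formula (2.7): probability that n persons all have different birthdays."""
    p = 1.0
    for j in range(1, n):
        p *= 1 - j / M
    return p

# Smallest room size for which a shared birthday is more likely than not.
n = 1
while p_no_repeated_birthday(n) > 0.5:
    n += 1
print(n, 1 - p_no_repeated_birthday(n))  # prints 23 and Q_23 = 0.507...
```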
Example 2C. The probability of a repetition in a sample drawn with
replacement. Let a sample of size n be drawn with replacement from an
urn containing M balls, numbered 1 to M. Let P denote the probability
that there are no repetitions in the sample (that is, that all the numbers
in the sample occur just once). Let us show that
were, in fact, red. To compute the size of A, we notice that the three
numbers in a description in A, corresponding to a correct guess, may be
P[A] = (4 choose 3)(4 choose 1)/(8 choose 4) = 8/35.
EXERCISES
(3.1) P[Ak] = (n choose k) (Mw)_k (M - Mw)_(n-k) / (M)_n,  k = 0, 1, ..., n,

whereas in the case of sampling with replacement

(3.2) P[Ak] = (n choose k) Mw^k (M - Mw)^(n-k) / M^n,  k = 0, 1, ..., n.

Let

(3.3) p = Mw/M, q = 1 - p

denote the proportion of white balls in the urn. The formula for P[Ak] can then
be compactly written, in the case of sampling with replacement,

(3.4) P[Ak] = (n choose k) p^k q^(n-k),

and, in the case of sampling without replacement,

(3.5) P[Ak] = (n choose k) (Mp)_k (Mq)_(n-k) / (M)_n.
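The two sampling schemes may be compared numerically. The Python sketch below (the helper falling implements the symbol (m)_k; the urn of M = 6 balls with Mw = 4 white and the sample size n = 2 are taken from the earlier examples) evaluates (3.1) and (3.4) for each k:

```python
from math import comb

def falling(m: int, k: int) -> int:
    """(m)_k = m(m-1)...(m-k+1), the number of ordered samples of size k."""
    out = 1
    for j in range(k):
        out *= m - j
    return out

def p_without(n: int, k: int, M: int, Mw: int) -> float:
    """Formula (3.1): exactly k white balls, sampling without replacement."""
    return comb(n, k) * falling(Mw, k) * falling(M - Mw, n - k) / falling(M, n)

def p_with(n: int, k: int, M: int, Mw: int) -> float:
    """Formula (3.4): exactly k white balls, sampling with replacement."""
    p = Mw / M
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# The urn of Table 1A: M = 6 balls, Mw = 4 white, sample of size n = 2.
for k in range(3):
    print(k, p_without(2, k, 6, 4), p_with(2, k, 6, 4))
```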
(3.6)

(3.7) P(p) = (1000q choose 100)/(1000 choose 100) + 1000p (1000q choose 99)/(1000 choose 100).
[Figure: the operating characteristic curve P(p), plotted against p for p between 0 and 0.1.]
(3.9) P0 = 1/e = (2.718)^(-1) = 0.368, P1 = 1 - e^(-1) = 0.632.
(3.10) N[E] = (6 choose 3, 2, 1) = 6!/(3! 2! 1!) = 60,

so that P[E] = 60/3^6 = 0.082. To prove (3.10), we note that the number
of samples of size 6 which contain three calls for A, two calls for B, and
one call for C is the number of ways one can partition the set {1, 2, 3, 4, 5, 6}
into three ordered subsets of sizes 3, 2, and 1, respectively.
THEORETICAL EXERCISES
3.3. Consider an urn containing n balls, each of a different color. Let r be any
integer. Show that the probability that a sample of size r drawn with replace-
ment will contain r1 balls of color 1, r2 balls of color 2, ..., rn balls of
color n, where r1 + r2 + ... + rn = r, is given by

(1/n^r) (r choose r1, r2, ..., rn).
3.4. An urn contains M balls, numbered 1 to M. Let N numbers be designated
"lucky," where N:O;; M. Let a sample of size n be drawn either without
replacement (in which case n :0;; M), or with replacement. Show that the
probability that the sampl«. will contain exactly k balls with "lucky"
numbers is given by (3.1) and (3.2), respectively, with Mw replaced by N.
EXERCISES
3.5. Consider 3 urns; urn I contains 2 white and 4 red balls, urn II contains
8 white and 4 red balls, urn III contains 1 white and 3 red balls. One ball
is selected from each urn. Find the probability that the sample drawn will
contain exactly 2 white balls.
3.6. A box contains 24 bulbs, 4 of which are known to be defective and the
remainder of which is known to be nondefective. What is the probability
that 4 bulbs selected at random from the box will be nondefective?
3.7. A box contains 50 razor blades, 5 of which are known to be used, the
remainder unused. What is the probability that 5 razor blades selected
from the box will be unused?
3.8. A fisherman caught 10 fish, 3 of which were smaller than the law permits
to be caught. A game warden inspects the catch by examining 2, which he
selects at random among the fish. What is the probability that he will not
select any undersized fish?
3.9. A professional magician named Sebastian claimed to be able to "read
minds." In order to test his claims, an experiment is conducted with 5
cards, numbered 1 to 5. A person concentrates on the numbers of 2 of
the cards, and Sebastian attempts to "read his mind" and to name the 2
cards. What is the probability that Sebastian will correctly name the 2
cards, under the assumption that he is merely guessing?
3.10. Find approximately the probability that a sample of 100 items drawn
from a lot of 1000 items contains 1 or fewer defective items if the pro-
portion of the lot that is defective is (i) 0.01, (ii) 0.02, (iii) 0.05.
3.11. The contract between a manufacturer of electrical equipment (such as
resistors or condensers) and a purchaser provides that out of each lot of
100 items 2 will be selected at random and subjected to a test. In negotia-
tions for the contract the following two acceptance sampling plans are
considered. Plan (a): reject the lot if both items tested are defective;
otherwise accept the lot. Plan (b): accept the lot if both items tested are
good; otherwise reject the lot. Obtain the operating characteristic curves
of each of these plans. Which plan is more satisfactory to (i) the purchaser,
(ii) the manufacturer? If you were the purchaser, would you consider
either of the plans acceptable?
3.12. Consider a lottery that sells 25 tickets, and offers (i) 3 prizes, (ii) 5 prizes.
If one buys 5 tickets, what is the probability of winning a prize?
3.13. Consider an electric fixture (such as Christmas tree lights) containing 5
electric light bulbs which are connected so that none will operate if any
one of them is defective. If the light bulbs in the fixture are selected
randomly from a batch of 1000 bulbs, 100 of which are known to be
defective, find the probability that all the bulbs in the electric fixture will
operate.
3.14. An urn contains 52 balls, numbered 1 to 52. Find the probability that a
sample of 13 balls drawn without replacement will contain (i) each of the
numbers 1 to 13, (ii) each of the numbers 1 to 7.
3.15. An urn contains balls of 4 different colors, each color being represented
by the same number of balls. Four balls are drawn, with replacement.
What is the probability that at least 3 different colors are represented in the
sample?
3.16. From a committee of 3 Romans, 4 Babylonians, and 5 Philistines a sub-
committee of 4 is selected by lot. Find the probability that the subcommittee
will consist of (i) 2 Romans and 2 Babylonians, (ii) 1 Roman, 1 Babylonian,
and 2 Philistines, (iii) 4 Philistines.
3.17. Consider a town in which there are 3 plumbers; on a certain day 4
residents telephone for a plumber. If each resident selects a plumber at
random from the telephone directory, what is the probability that (i) all
plumbers will be telephoned, (ii) exactly 1 plumber will be telephoned?
3.18. Six persons, among whom are A and B, are arranged at random (i) in a
row, (ii) in a ring. What is the probability that (a) A and B will stand
next to each other, (b) A and B will be separated by one and only one
person?
4. CONDITIONAL PROBABILITY
If the balls numbered 1 to 4 are colored white, and the balls numbered 5
and 6 are colored red, then the outcome of the thirty trials can be recorded
as follows:
(W, R), (W, R), (W, W), (R, W), (W, W), (W, W)
(W, W), (R, W), (W, W), (W, W), (W, R), (R, R)
(R, W), (W, W), (R, W), (R, R), (W, R), (R, W)
(W, W), (R, W), (W, W), (W, R), (W, R), (R, W)
(W, W), (R, W), (W, R), (R, W), (R, W), (W, W)
(4.1) P[A] = N_A/N, P[B] = N_B/N, P[AB] = N_AB/N.
In analogy with (4.3) we now give the following formal definition of
P[B | A]:

FORMAL DEFINITION OF CONDITIONAL PROBABILITY. Let A and B be two
events on a sample description space S, on the subsets of which is defined
a probability function P[·]. The conditional probability of the event B,
given the event A, denoted by P[B | A], is defined by

P[B | A] = P[AB]/P[A],

provided that P[A] > 0.
Example 4D. Consider a family with two children. Assume that each
child is as likely to be a boy as it is to be a girl. What is the conditional
probability that both children are boys, given that (i) the older child is a
boy, (ii) at least one of the children is a boy?
Solution: Let A be the event that the older child is a boy, and let B be
the event that the younger child is a boy. Then A U B is the event that
at least one of the children is a boy, and AB is the event that both children
are boys. The probability that both children are boys, given that the
older is a boy, is equal to
(4.7) P[AB | A] = P[AB]/P[A] = (1/4)/(1/2) = 1/2.
The probability that both children are boys, given that at least one of
them is a boy, is equal to (since (AB)(A ∪ B) = AB)

(4.8) P[AB | A ∪ B] = P[AB]/P[A ∪ B] = (1/4)/(3/4) = 1/3.
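Both conditional probabilities can be checked by enumerating the four equally likely families. The Python sketch below (the encoding of children as "B" and "G" and the helper cond_prob are illustrative choices) reproduces the answers 1/2 and 1/3:

```python
from itertools import product

# The four equally likely families; each component is the sex of one child,
# listed as (older child, younger child).
families = list(product("BG", repeat=2))

def cond_prob(event, given):
    """P[event | given], computed as a ratio of counts of equally likely cases."""
    return sum(1 for f in given if event(f)) / len(given)

both_boys = lambda f: f == ("B", "B")
older_is_boy = [f for f in families if f[0] == "B"]
at_least_one_boy = [f for f in families if "B" in f]

print(cond_prob(both_boys, older_is_boy))      # 1/2, as in (4.7)
print(cond_prob(both_boys, at_least_one_boy))  # 1/3, as in (4.8)
```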
Example 4E. The outcome of a draw, given the outcome of a sample. Let
a sample of size 4 be drawn with replacement (without replacement)
from an urn containing twelve balls, of which eight are white. Find the
conditional probability that the ball drawn on the third draw was white,
given that the sample contains three white balls.
Solution: Let A be the event that the sample contains exactly three
white balls, and let B be the event that the ball drawn on the third draw
was white. The problem at hand is to find P[B | A]. In the case of sampling
with replacement

(4.9) P[A] = (4 choose 3) 8^3 · 4 / 12^4,  P[B | A] = 3/4.
In the case of sampling without replacement, P[A] = (8 choose 3)(4 choose 1)/(12 choose 4), and again P[B | A] = 3/4.
THEORETICAL EXERCISES
4.1. Prove the following statements, for any events A, B, and C, such that
P[C] > 0. These relations illustrate the fact that all general theorems
on probabilities are also valid for conditional probabilities with respect
to any particular event C.
(i) P[S | C] = 1, where S is the certain event.
(ii) P[A | C] = 1 if C is a subevent of A.
(iii) P[A | C] = 0 if P[A] = 0.
(iv) P[A ∪ B | C] = P[A | C] + P[B | C] - P[AB | C].
(v) P[Aᶜ | C] = 1 - P[A | C].
4.2. Let B be an event of positive probability. Show that for any event A,
(i) A ⊂ B implies P[A | B] = P[A]/P[B],
(ii) B ⊂ A implies P[A | B] = 1.
4.3. Let A and B be two events, each with positive probability. Show that
statement (i) is true, whereas statements (ii) and (iii) are, in general, false:
(i) P[A | B] + P[Aᶜ | B] = 1.
(ii) P[A | B] + P[A | Bᶜ] = 1.
(iii) P[A | B] + P[Aᶜ | Bᶜ] = 1.
4.4. An urn contains M balls, of which Mw are white (where Mw ≤ M).
Let a sample of size n be drawn from the urn either with replacement or
without replacement. For j = 1, 2, ..., n let Bj be the event that the
ball drawn on the jth draw is white. For k = 1, 2, ..., n let Ak be the
event that the sample (of size n) contains exactly k white balls. Show that
P[Bj | Ak] = k/n. Express this fact in words.
4.5. An urn contains M balls, of which Mw are white. n balls are drawn and
laid aside (not replaced in the urn), their color unnoted. Another ball is
drawn (it is assumed that n is less than M). What is the probability that
it will be white? Hint: Compare example 2B.
EXERCISES
4.1. A man tosses 2 fair coins. What is the conditional probability that he has
tossed 2 heads, given that he has tossed at least 1 head?
4.2. An urn contains 12 balls, of which 4 are white. Five balls are drawn and
laid aside (not replaced in the urn), their color unnoted.
(i) Another ball is drawn. What is the probability that it will be white?
(ii) A sample of size 2 is drawn. What is the probability that it will contain
exactly one white ball?
(iii) What is the conditional probability that it will contain exactly 2 white
balls, given that it contains at least 1 white ball?
4.3. In the milk section of a self-service market there are 150 quarts, 100 of
which are fresh, and 50 of which are a day old.
(i) If 2 quarts are selected, what is the probability that both will be fresh?
(ii) Suppose that the 2 quarts are selected after 50 quarts have been removed
from the section. What is the probability that both will be fresh?
(iii) What is the conditional probability that both will be fresh, given that
at least 1 of them is fresh?
4.4. The student body of a certain college is composed of 60% men and 40%
women. The following proportions of the students smoke cigarettes:
40% of the men and 60% of the women. What is the probability that a
student who is a cigarette smoker is a man? A woman?
4.5. Consider two events A and B such that P[A] = 1/4, P[B | A] = 1/2,
P[A | B] = 1/4. For each of the following 4 statements, state whether it is
true or false: (i) the events A and B are mutually exclusive, (ii) A is a
subevent of B, (iii) P[A^c | B^c] = 3/4, (iv) P[A | B] + P[A | B^c] = 1.
4.6. Consider an urn containing 12 balls, of which 8 are white. Let a sample
of size 4 be drawn with replacement (without replacement). What is the
conditional probability that the first ball drawn will be white, given that
the sample contained exactly (i) 2 white balls, (ii) 3 white balls?
4.7. Consider an urn containing 6 balls, of which 4 are white. Let a sample
of size 3 be drawn with replacement (without replacement). Let A denote
the event that the sample contains exactly 2 white balls, and let B denote
the event that the ball drawn on the third draw is white. Verify numeri-
cally that (4.5) holds in this case.
4.8. Consider an urn containing 12 balls, of which 8 are white. Let a sample
of size 4 be drawn with replacement (without replacement). What is the
conditional probability that the second and third balls drawn will be
white, given that the sample contains exactly three white balls?
4.9. Consider 3 urns; urn I contains 2 white and 4 red balls, urn II contains
8 white and 4 red balls, urn III contains 1 white and 3 red balls. One
ball is selected from each urn. What is the probability that the ball
selected from urn II will be white, given that the sample drawn contains
exactly 2 white balls?
4.10. Consider an urn in which 4 balls have been placed by the following scheme.
A fair coin is tossed; if the coin falls heads, a white ball is placed in the
urn, and if the coin falls tails, a red ball is placed in the urn.
(i) What is the probability that the urn will contain exactly 3 white balls?
(ii) What is the probability that the urn will contain exactly 3 white balls,
given that the first ball placed in the urn was white?
4.11. A man tosses 2 fair dice. What is the (conditional) probability that the
sum of the 2 dice will be 7, given that (i) the sum is odd, (ii) the sum is
greater than 6, (iii) the outcome of the first die was odd, (iv) the outcome
of the second die was even, (v) the outcome of at least 1 of the dice was
odd, (vi) the 2 dice had the same outcomes, (vii) the 2 dice had different
outcomes, (viii) the sum of the 2 dice was 13?
4.12. A man draws a sample of 3 cards one at a time (without replacement)
from a pile of 8 cards, consisting of the 4 aces and the 4 kings in a bridge
deck. What is the (conditional) probability that the sample will contain
at least 2 aces, given that it contains (i) the ace of spades, (ii) at least one
ace? Explain why the answers to (i) and (ii) need not be equal.
4.13. Consider 4 cards, on each of which is marked off a side 1 and a side 2.
On card 1, both side 1 and side 2 are colored red. On card 2, both side 1
and side 2 are colored black. On card 3, side 1 is colored red and side 2
is colored black. On card 4, side 1 is colored black and side 2 is colored
red. A card is chosen at random. What is the (conditional) probability
that if one side of the card selected is red the other side of the card will
be black? What is the (conditional) probability that if side 1 of the card
selected is examined and found to be red, side 2 of the card will be black?
Hint: Compare example 4D.
4.14. A die is loaded in such a way that the probability of a given number
turning up is proportional to that number (for instance, a 4 is twice as
probable as a 2).
(i) What is the probability of rolling a 5, given that an odd number turns
up?
(ii) What is the probability of rolling an even number, given that a number
less than 5 turns up?
5. UNORDERED AND PARTITIONED SAMPLES

For a sample of size n drawn from an urn containing M balls, of which
M_w are white, let A_k be the event that the sample contains exactly k
white balls. Under the model of unordered samples drawn without
replacement,

(5.1) P[A_k] = \binom{M_w}{k} \binom{M - M_w}{n - k} \Big/ \binom{M}{n}.
It is readily verified that the value of P[A_k], given by the model of unordered
samples, agrees with the value of P[A_k], given by the model of ordered
samples, in the case of sampling without replacement. However, in the
case of sampling with replacement the probability that an unordered
sample of size n, drawn from an urn containing M balls, of which M_w
are white, will contain exactly k white balls is equal to

(5.2) P[A_k] = \binom{M_w + k - 1}{k} \binom{M - M_w + n - k - 1}{n - k} \Big/ \binom{M + n - 1}{n},
which does not agree with the value of P[A_k], given by the model of
ordered samples.
► Example 5B. Distributing balls among urns (the occupancy problem).
Suppose that we are given M urns, numbered 1 to M, among which we
are to distribute n balls, where n < M. What is the probability that each
of the urns numbered 1 to n will contain exactly 1 ball?
Solution: Let A be the event that each of the urns numbered 1 to n will
contain exactly 1 ball. In order to determine the probability space on
which the event A is defined, we must first make assumptions regarding
(i) the distinguishability of the balls and (ii) the manner in which the
distribution of balls is to be carried out.
If the balls are regarded as being distinguishable (by being labeled with
the numbers 1 to n), then to describe the results of distributing n balls
among the M urns one may write an n-tuple (z_1, z_2, ..., z_n), whose jth
component z_j designates the number of the urn in which ball j was
deposited. If the balls are regarded as being all alike, and therefore
indistinguishable, then to describe the results of distributing n balls
among the M urns one may write a set {z_1, z_2, ..., z_n} of size n, in which
each member z_j represents the number of an urn into which a ball has
been deposited. Thus ordered and unordered samples correspond in the
occupancy problem to distributing distinguishable and indistinguishable balls,
respectively.
Next, in distributing the balls, one may or may not impose an exclusion
rule to the effect that in distributing the balls one ball at most may be
put into any urn. It is clear that imposing an exclusion rule is equivalent
to choosing the urn numbers (sampling) without replacement, since an
urn may be chosen once at most. If an exclusion rule is not imposed, so
that in any urn one may deposit as many balls as one pleases, then one is
choosing the urn numbers (sampling) with replacement.
Let us now return to the problem of computing P[A]. The size of the
sample description space depends on which of the foregoing models is
adopted; the four possibilities are summarized in Table 5A.
TABLE 5A

                            Without exclusion        With exclusion
                            (with replacement)       (without replacement)

Ordered samples             M^n                      (M)_n
(distinguishable balls)     (Maxwell-Boltzmann)

Unordered samples           \binom{M+n-1}{n}         \binom{M}{n}
(indistinguishable balls)   (Bose-Einstein)          (Fermi-Dirac)

In the case of distinguishable balls distributed subject to the exclusion
rule, the event A contains n! descriptions, so that

(5.5) P[A] = \frac{n!}{(M)_n} = 1 \Big/ \binom{M}{n}.
Each of the different probability models for occupancy problems, described
in the foregoing, finds application in statistical physics. Suppose one seeks
to determine the equilibrium state of a physical system composed of a
very large number n of "particles" of the same nature: electrons, protons,
photons, mesons, neutrons, etc. For simplicity, assume that there are M
microscopic states in which each of the particles can be (for example,
there are M energy levels that a particle can occupy). To describe the
macroscopic state of the system, suppose that it suffices to state the M-
tuple (n_1, n_2, ..., n_M) whose jth component n_j is the number of "particles"
in the jth microscopic state. The equilibrium state of the system of particles
is defined as that macroscopic state (n_1, n_2, ..., n_M) with the highest
probability of occurring. To compute the probability of any given
macroscopic state, an assumption must be made as to whether or not
the particles obey the Pauli exclusion principle (which states that there
cannot be more than one particle in any of the microscopic states). If
the indistinguishable particles are assumed to obey the exclusion principle,
then they are said to possess Fermi-Dirac statistics. If the indistinguishable
particles are not required to obey the exclusion principle, then they are
said to possess Bose-Einstein statistics. If the particles are assumed to be
distinguishable and do not obey the exclusion principle, then they are
said to possess Maxwell-Boltzmann statistics. Although physical particles
cannot be considered distinguishable, Maxwell-Boltzmann statistics are
correct in certain circumstances as approximations to Bose-Einstein and
Fermi-Dirac statistics. ◄
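The counts underlying these systems of statistics are those of Table 5A. A minimal sketch in Python (the values of M and n are arbitrary illustrations) evaluates the four sample-space sizes.

    from math import comb, perm

    M, n = 6, 3                    # M microscopic states, n particles (illustrative)
    print(M ** n)                  # ordered, no exclusion: Maxwell-Boltzmann
    print(perm(M, n))              # ordered, with exclusion
    print(comb(M + n - 1, n))      # unordered, no exclusion: Bose-Einstein
    print(comb(M, n))              # unordered, with exclusion: Fermi-Dirac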
The probabilities of various events defined on the general occupancy and
sampling problems are summarized in Table 6A on p. 84.
Partitioned Samples. If we examine certain card games, we may notice
still another type of sampling. We may extract n distinguishable balls (or
cards) from an urn (or deck of cards), which can then be divided into a
number of subsets (in a bridge game, into four hands). More precisely,
we may specify a positive integer r and nonnegative integers k_1, k_2, ..., k_r,
with k_1 + k_2 + \cdots + k_r = n, and divide the sample into an ordered
array of r subsets, whose first component is the first subset, second
component is the second subset, ..., rth component is the rth subset.
We call a sample of this type a partitioned sample. The number of ways
in which the M balls may be so divided into subsets of sizes k_1, k_2, ..., k_r,
and M - n is given by the multinomial coefficient

\binom{M}{k_1\ k_2\ \cdots\ k_r\ (M-n)} = \frac{M!}{k_1!\, k_2! \cdots k_r!\, (M - n)!}.
The interested reader may desire to consider for himself the theory of
partitions that are unordered, rather than ordered, arrays of subsets.
THEORETICAL EXERCISES
5.2. The number of unordered samples with replacement. Let U(M, n) denote the
number of unordered samples of size n that one may draw, by sampling
with replacement, from an urn containing M distinguishable balls. Show
that

U(M, n) = \binom{M + n - 1}{n}.

Hint: To prove the assertion, make use of the principle of mathematical
induction. Let P(n) be the proposition that, whatever M, U(M, n) =
\binom{M + n - 1}{n}. P(1) is clearly true, since there are M unordered samples
of size 1. To complete the proof, we must show that P(n) implies P(n + 1).
The following formula is immediately obtained: for any M = 1, 2, ...,
and n = 1, 2, ...,

U(M, n + 1) = U(M, n) + U(M - 1, n) + \cdots + U(1, n).

To obtain this formula, let the balls be numbered 1 to M. Let each
unordered sample be arranged so that the numbers of the balls in the
sample are in nondecreasing order (as in the example in the text involving
unordered samples of size 3 from an urn containing 4 balls). Then there
are U(M, n) samples of size (n + 1) whose first entry is 1, U(M - 1, n)
samples of size (n + 1) whose first entry is 2, and so on, until there are
U(1, n) whose first entry is M. Now, by the induction hypothesis, U(k, n) =
\binom{k + n - 1}{n}. Consequently, U(k, n) = \binom{k + n}{n + 1} - \binom{k + n - 1}{n + 1}. Summing this
relation over k = 1, 2, ..., M, the sum telescopes, and we obtain
U(M, n + 1) = \binom{M + n}{n + 1}, which proves P(n + 1).
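The recursion in the hint is easy to check numerically. The following is an illustrative sketch in Python (the function name is an assumption) comparing it with the closed form for small M and n.

    from math import comb
    from functools import lru_cache

    @lru_cache(maxsize=None)
    def U(M, n):
        """Number of unordered samples of size n, drawn with replacement."""
        if n == 1:
            return M                                   # P(1): M samples of size 1
        return sum(U(k, n - 1) for k in range(1, M + 1))

    assert all(U(M, n) == comb(M + n - 1, n)
               for M in range(1, 8) for n in range(1, 6))
    print("U(M, n) agrees with C(M + n - 1, n)")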
EXERCISES
5.2. A certain young woman has 3 men friends. She is told by a fortune teller
that she will be married twice and that both her husbands will come from
this group of 3 men. How many possible marital histories can this
woman have? Consider 4 cases. (May she marry the same man twice?
Does the order in which she marries matter?)
5.3. The legitimate theater in New York gives both afternoon and evening
performances on Saturdays. A man comes to New York one Saturday
to attend 2 performances (1 in the afternoon and 1 in the evening) of the
living theater. There are 6 shows that he might consider attending. In
how many ways can he choose 2 shows? Consider 4 cases.
5.4. An urn contains 52 balls, numbered 1 to 52. Let the balls be drawn 1 at
a time and divided among 4 people. Suppose that the balls numbered
1, 11, 31, and 41 are considered "lucky." What is the probability that
(i) each person will have a "lucky" ball, (ii) 1 person will have all 4 "lucky"
balls?
5.5. A bridge player announces that his hand (of 13 cards) contains (i) an ace
(that is, at least 1 ace), (ii) the ace of hearts. What is the probability that
it will contain another one?
5.6. What is the probability that in a division of a deck of cards into 4 bridge
hands, 1 of the hands will contain (i) 13 cards of the same suit, (ii) 4 aces
and 4 kings, (iii) 3 aces and 3 kings?
5.7. Prove that the probability of South's receiving exactly k aces when a
bridge deck is divided into 4 hands is the same as the probability that a
hand of 13 cards drawn from a bridge deck will contain exactly k aces.
5.8. An urn contains 8 balls numbered 1 to 8. Four balls are drawn without
replacement; suppose x is the second smallest of the 4 numbers drawn.
What is the probability that x = 3?
5.9. A red card is removed from a bridge deck of 52 cards; 13 cards are then
drawn and found to be the same color. Show that the (conditional)
probability that all will be black is equal to 2/3.
5.10. A room contains 10 people who are wearing badges numbered 1 to 10.
What is the probability that if 3 persons are selected at random (i) the
largest (ii) the smallest badge number chosen will be 5?
5.11. From a pack of 52 cards an even number of cards is drawn. Show that
the probability that half of these cards will be red and half will be black
is

\left( \frac{52!}{(26!)^2} - 1 \right) \Big/ \left( 2^{51} - 1 \right).

Hint: Show, and then use, the facts that for any integer n

\sum_{k=0}^{n} \binom{n}{k}^2 = \binom{2n}{n} \quad \text{and} \quad \sum_{k=0}^{n} (-1)^k \binom{n}{k} = 0;

the latter (with n = 52) implies that the number of ways of drawing an
even positive number of cards from the pack is 2^{51} - 1.
6. OCCURRENCE OF A GIVEN NUMBER OF EVENTS

(6.1) S_r = \sum_{1 \le k_1 < k_2 < \cdots < k_r \le n} P[A_{k_1} A_{k_2} \cdots A_{k_r}].

The definition of S_r is usually written

(6.1') S_r = \sum_{\{k_1, \ldots, k_r\}} P[A_{k_1} A_{k_2} \cdots A_{k_r}],

in which the sum extends over all sets {k_1, ..., k_r} of r distinct integers
chosen from 1, 2, ..., n.
Before giving the proof of this theorem, we shall discuss various applica-
tions of it.
► Example 6A. The matching problem (case of sampling without replace-
ment). Suppose that we have M urns, numbered 1 to M, and M balls,
numbered 1 to M. Let the balls be inserted randomly in the urns, with one
ball in each urn. If a ball is put into the urn bearing the same number as
the ball, a match is said to have occurred. Show that the probability that
(i) at least one match will occur is

(6.5) 1 - \frac{1}{2!} + \frac{1}{3!} - \cdots \pm \frac{1}{M!} \doteq 1 - e^{-1} = 0.63212,

and that (ii) exactly m matches will occur is

(6.6) \frac{1}{m!} \left[ 1 - 1 + \frac{1}{2!} - \cdots \pm \frac{1}{(M - m)!} \right] \doteq \frac{1}{m!}\, e^{-1} \quad \text{for } M - m \text{ large}.
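A Monte Carlo check of (6.5) is straightforward. This illustrative Python sketch shuffles M = 10 balls into M urns repeatedly and compares the observed frequency of at least one match with the partial sum of the series.

    import random
    from math import factorial

    random.seed(2)
    M, trials = 10, 200_000
    hits = 0
    for _ in range(trials):
        balls = list(range(M))
        random.shuffle(balls)                   # ball balls[i] goes into urn i
        if any(balls[i] == i for i in range(M)):
            hits += 1

    exact = sum((-1) ** (k + 1) / factorial(k) for k in range(1, M + 1))
    print(hits / trials, exact)                 # both near 1 - 1/e = 0.63212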
The matching problem may be formulated in a variety of ways. First
variation: if M married gentlemen and their wives (in a monogamous
society) draw lots for a dance in such a way that each gentleman is equally
likely to dance with any of the M wives, what is the probability that
exactly m gentlemen will dance with their own wives? Second variation:
if M soldiers who sleep in the same barracks arrive home one evening so
drunk that each soldier chooses at random a bed in which to sleep, what
is the probability that exactly m soldiers will sleep in their own beds?
Third variation: if M letters and M corresponding envelopes are typed by a
tipsy typist and the letters are put into the envelopes in such a way that
each envelope contains just one letter that is equally likely to be anyone
of the M letters, what is the probability that exactly m letters will be
inserted into their corresponding envelopes? Fourth variation: If two
similar decks of M cards (numbered 1 to M) are shuffled and dealt
simultaneously, one card from each deck at a time, what is the probability
that on just m occasions the two cards dealt will bear the same number?
There is a considerable literature on the matching problem that has
particularly interested psychologists. The reader may consult papers by
D. E. Barton, Journal of the Royal Statistical Society, Vol. 20 (1958),
pp. 73-92, and P. E. Vernon, Psychological Bulletin, Vol. 33 (1936), pp.
149-77, which give many references. Other references may be found in
an editorial note in the American Mathematical Monthly, Vol. 53 (1946),
p. 107. The matching problem was stated and solved by the earliest
writers on probability theory. It may be of value to reproduce here the
statement of the matching problem given by De Moivre (Doctrine of
Chances, 1714, Problem 35): "Any number of letters a, b, c, d, e, f, etc.,
all of them different, being taken promiscuously as it happens; to find the
Probability that some of them shall be found in their places according to
the rank they obtain in the alphabet and that others of them shall at the
same time be displaced."
Solution: To describe the distribution of the balls among the urns, write
an M-tuple (z_1, z_2, ..., z_M) whose jth component z_j represents the number of
the ball inserted in the jth urn. For k = 1, 2, ..., M the event A_k that a
match will occur in the kth urn may be written A_k = {(z_1, z_2, ..., z_M):
z_k = k}. It is clear that for any integer r = 1, 2, ..., M and any r unequal
integers k_1, ..., k_r, P[A_{k_1} A_{k_2} \cdots A_{k_r}] = (M - r)!/M!. In the same way one
may treat the case of sampling with replacement: if a sample of size n is
drawn with replacement from an urn containing M balls, numbered 1 to M,
the probability that exactly m of the integers 1 to M will not be found in
the sample is

(6.9) \binom{M}{m} \sum_{k=0}^{M-m} (-1)^k \binom{M - m}{k} \left( 1 - \frac{m + k}{M} \right)^n.

For k = 1, 2, ..., M let A_k be the event that the integer k does not appear
in the sample, so that

(6.10) A_k = \{(z_1, z_2, \ldots, z_n): \text{for } j = 1, 2, \ldots, n,\ z_j \neq k\}.
It is easy to obtain the probability of the intersection of any number of
the events A_1, ..., A_M. We have

(6.11) P[A_k] = \left( \frac{M - 1}{M} \right)^n = \left( 1 - \frac{1}{M} \right)^n, \quad k = 1, \ldots, M,

(6.12) P[A_{k_1} A_{k_2}] = \left( 1 - \frac{2}{M} \right)^n, \quad k_1 = 1, \ldots, M; \; k_2 = k_1 + 1, \ldots, M,

and, in general, for any r unequal integers k_1, ..., k_r,

P[A_{k_1} A_{k_2} \cdots A_{k_r}] = \left( 1 - \frac{r}{M} \right)^n,

so that

(6.13) S_r = \binom{M}{r} \left( 1 - \frac{r}{M} \right)^n, \quad r = 0, 1, \ldots, M.

Let B_m be the event that exactly m of the integers 1 to M will not be found
in the sample. Clearly B_m is the event that exactly m of the events A_1,
A_2, ..., A_M will occur. By (6.2) and (6.13),

(6.14) P[B_m] = \sum_{r=m}^{M} (-1)^{r-m} \binom{r}{m} \binom{M}{r} \left( 1 - \frac{r}{M} \right)^n = \binom{M}{m} \sum_{k=0}^{M-m} (-1)^k \binom{M - m}{k} \left( 1 - \frac{m + k}{M} \right)^n,

which coincides with (6.9).
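Formula (6.14) can be verified exactly for small cases by enumeration. The sketch below is illustrative Python (the values of M, n, m are arbitrary assumptions) comparing a direct count with the formula.

    from itertools import product
    from math import comb

    M, n, m = 4, 3, 2        # illustrative values
    direct = sum(1 for s in product(range(M), repeat=n)
                 if M - len(set(s)) == m) / M ** n
    formula = comb(M, m) * sum((-1) ** k * comb(M - m, k)
                               * (1 - (m + k) / M) ** n
                               for k in range(M - m + 1))
    print(direct, formula)    # both 0.5625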
Other applications of the theorem stated at the beginning of this section
may be found in a paper by J. O. Irwin, "A Unified Derivation of Some
Well-Known Frequency Distributions of Interest in Biometry and Statis-
tics," Journal of the Royal Statistical Society, Series A, Vol. 118 (1955),
pp. 389-404 (including discussion).
The remainder of this section* is concerned with the proof of the
theorem stated at the beginning of the section. Our proof is based on the
method of indicator functions and is the work of M. Loève. It has
the advantage of being constructive, in the sense that it is not merely a
verification that (6.2) is correct but rather obtains (6.2) from first principles.
The method of indicator functions proceeds by interpreting operations
* The remainder of this section may be omitted in a first reading of the book.
on events in terms of arithmetic operations. Given an event A on a sample
description space S, we define its indicator function, denoted by I(A), as a
function defined on S, with value at any description s, denoted by I(A; s),
equal to 1 or 0, depending on whether the description s does or does not
belong to A.
The two basic properties of indicator functions, which enable us to
operate with them, are the following.
First, a product of indicator functions can always be replaced by a single
indicator function; more precisely, for any events A_1, A_2, ..., A_n

(6.15) I(A_1) I(A_2) \cdots I(A_n) = I(A_1 A_2 \cdots A_n),

so that the product of the indicator functions of the sets A_1, A_2, ..., A_n is
equal to the indicator function of the intersection A_1 A_2 \cdots A_n. Equation
(6.15) is an equation involving functions; strictly speaking, it is a brief
method of expressing the following family of equations: for every
description s

I(A_1; s) I(A_2; s) \cdots I(A_n; s) = I(A_1 A_2 \cdots A_n; s).

To prove (6.15), one need note only that I(A_1 A_2 \cdots A_n; s) = 0 if and only
if s does not belong to A_1 A_2 \cdots A_n. This is so if and only if, for some
j = 1, ..., n, s does not belong to A_j, which is equivalent to, for some
j = 1, ..., n, I(A_j; s) = 0, which is equivalent to the product I(A_1; s) \cdots
I(A_n; s) = 0.
Second, a sum of indicator functions can in certain circumstances be
replaced by a single indicator function; more precisely, if the events A_1,
A_2, ..., A_n are mutually exclusive, then

(6.16) I(A_1 \cup A_2 \cup \cdots \cup A_n) = I(A_1) + I(A_2) + \cdots + I(A_n).

The proof of (6.16) is left to the reader. One case, in which n = 2 and
A_2 = A_1^c, is of especial importance. Then A_1 ∪ A_2 = S, and I(A_1 ∪ A_2)
is identically equal to 1. Consequently, we have, for any event A,

I(A) + I(A^c) = 1.
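The two basic properties, and the identity just obtained, are mechanical enough to verify on a small sample description space; the following illustrative Python sketch does so.

    S = set(range(10))
    A = {s for s in S if s % 2 == 0}
    B = {s for s in S if s < 5}

    def I(E):
        """Indicator function of the event E."""
        return lambda s: 1 if s in E else 0

    # (6.15): a product of indicators is the indicator of the intersection
    assert all(I(A)(s) * I(B)(s) == I(A & B)(s) for s in S)
    # (6.16) for the mutually exclusive events A - B and B - A
    assert all(I(A - B)(s) + I(B - A)(s) == I((A - B) | (B - A))(s) for s in S)
    # I(A) + I(A^c) is identically 1
    assert all(I(A)(s) + I(S - A)(s) == 1 for s in S)
    print("indicator identities verified")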
In words, E[f(·)] is equal to the sum, over all possible values k of the
function f(·), of the product of the value k and the probability that f(·)
will assume the value k. In particular, if f(·) is an indicator function, so
that f(·) = I(A) for some set A, then E[f(·)] = P[A]. Consider now
another function g(·), which may be written
(6.21) g(\cdot) = \sum_{j=-N(g)}^{N(g)} j\, I[D_j(g)].

Then

E[f(\cdot) + g(\cdot)] = \sum_{k=-N(f)}^{N(f)} k \sum_{j=-N(g)}^{N(g)} P[D_k(f)\, D_j(g)] + \sum_{j=-N(g)}^{N(g)} j \sum_{k=-N(f)}^{N(f)} P[D_k(f)\, D_j(g)]

= \sum_{k=-N(f)}^{N(f)} k\, P[D_k(f)] + \sum_{j=-N(g)}^{N(g)} j\, P[D_j(g)] = E[f(\cdot)] + E[g(\cdot)],
and the proof of (6.22) is complete. By mathematical induction we deter-
mine from (6.22) that, for any n functions f_1(\cdot), f_2(\cdot), \ldots, f_n(\cdot),

(6.23) E[f_1(\cdot) + \cdots + f_n(\cdot)] = E[f_1(\cdot)] + \cdots + E[f_n(\cdot)].

Finally, from (6.23), and the fact that E[I(A)] = P[A], we obtain the
principle we set out to prove; namely, that, for any events A, A_1, ..., A_n,
if

(6.24) I(A) = c_1 I(A_1) + c_2 I(A_2) + \cdots + c_n I(A_n),

then

P[A] = c_1 P[A_1] + c_2 P[A_2] + \cdots + c_n P[A_n].
TABLE 6A
PROBABILITIES OF VARIOUS EVENTS DEFINED ON THE GENERAL OCCUPANCY AND SAMPLING PROBLEMS

The table treats, for ordered samples and for unordered samples, drawn
with and without replacement, the following events:

I. A specified urn will contain k balls (in the language of sampling, a
specified ball will appear k times in the sample), where k ≤ n.
II. The first urn contains k_1 balls, the second urn contains k_2 balls, ...,
the Mth urn contains k_M balls (in the sample, the jth ball appears k_j
times), where k_1 + k_2 + \cdots + k_M = n.
III. Each of N specified urns will be occupied (each of N specified balls
is contained in the sample), where N ≤ M. For ordered samples this
probability is

\sum_{k=0}^{N} (-1)^k \binom{N}{k} \left( 1 - \frac{k}{M} \right)^n

in the case of sampling with replacement, and

\sum_{k=0}^{N} (-1)^k \binom{N}{k} \frac{(M - k)_n}{(M)_n} = \binom{M - N}{n - N} \Big/ \binom{M}{n}

in the case of sampling without replacement.
IV. Exactly m of N specified urns will be empty (exactly m of N specified
balls are not contained in the sample), where N ≤ M and m = 0, 1, ..., N.
(6.29) P[B_m] = \sum_{k=0}^{M-m} (-1)^k \binom{m + k}{m} S_{k+m},

which is the same as (6.2). Equation (6.4) follows immediately from (6.2)
by induction.
THEORETICAL EXERCISES
6.1. Matching problem (case of sampling without replacement). Show that for
j = 1, ..., M the conditional probability of a match in the jth urn, given
that there are m matches, is m/M. Show, for any 2 unequal integers j and
k, 1 to M, that the conditional probability that the ball number j was
placed in urn number k, given that there are m matches, is equal to
(M - m)(M - m - l)/M(M - 1).
6.2. Matching (case of sampling with replacement). Consider the matching
problem under the assumption that, for j = 1, 2, ..., M, the ball inserted
in the jth urn was chosen randomly from all the M balls available (and then
made available as a candidate for insertion in the (j + 1)st urn). Show that
the probability of at least 1 match is 1 - [1 - (1/M)]^M ≐ 1 - e^{-1} =
0.63212. Find the probability of exactly m matches.
6.3. A man addresses n envelopes and writes n checks in payment of n bills.
(i) If the n bills are placed at random in the n envelopes, show that the
probability that each bill will be placed in the wrong envelope is

\sum_{k=2}^{n} (-1)^k \frac{1}{k!}.

(ii) If the n bills and n checks are placed at random in the n envelopes, 1 in
each envelope, show that the probability that in no instance will the
enclosures be completely correct is

\sum_{k=0}^{n} (-1)^k \frac{(n - k)!}{n!\, k!}.

(iii) Show that in part (ii) the probability that each bill and each check will
be in a wrong envelope is equal to the square of the answer to part (i).
6.4. A sampling (or coupon collecting) problem. Consider an urn that contains
rM balls, for given integers rand M. Suppose that for each integer j,
1 to M, exactly r balls bear the integer j. Find the probability that in a
sample of size n (in which n ~ M), drawn without replacement from the
urn, exactly m of the integers 1 to M will be missing.
Hint: S_j = \binom{M}{j} \big[ r(M - j) \big]_n \Big/ (rM)_n.
CHAPTER 3

Independence and Dependence
THEORETICAL EXERCISES
1.6. Let A, B, and C be independent events. In terms of P[A], P[B], and P[C],
express, for k = 0, 1,2, 3, (i) P[exactly k of the events A, B, C will occur],
(ii) P[at least k of the events A, B, C will occur], (iii) P[at most k of the
events A, B, C will occur].
EXERCISES
1.4. Consider example 1E. Find the probability that (i) both A' and B' will
state that car 1 stopped suddenly, (ii) neither A' nor C' will state that car 1
stopped suddenly, (iii) at least 1 of A', B', and C' will state that car 1
stopped suddenly.
1.6. Compute the probabilities asked for in exercise 1.5 under the assumption
that P[A_1] = 0.1, P[A_2] = 0.2, P[A_3] = 0.3.
1.7. A manufacturer of sports cars enters n drivers in a race. For i = 1, ... , n
let Ai be the event that the ith driver shows (see exercise 1.5). Assume
that the events AI' ... , An are independent and have equal probability
P[A_i] = p. Show that the probability that exactly k of the drivers will
show is \binom{n}{k} p^k q^{n-k} for k = 0, 1, ..., n.
1.8. Suppose you have to choose a team of 3 persons to enter a race. The rules
of the race are that a team must consist of 3 people whose respective pro-
babilities p_1, p_2, p_3 of showing must add up to 1/2; that is, p_1 + p_2 + p_3 = 1/2.
What probabilities of showing would you desire the members of your team
to have in order to maximize the probability that at least 1 member of
your team will show? (Assume independence.)
1.9. Let A and B be 2 independent events such that the probability is 1/4 that
they will occur simultaneously and 1/4 that neither of them will occur. Find
P[A] and P[B]; are P[A] and P[B] uniquely determined?
1.10. Let A and B be 2 independent events such that the probability is 1/4 that they
will occur simultaneously and 1/4 that A will occur and B will not occur.
Find P[A] and P[B]; are P[A] and P[B] uniquely determined?
2. INDEPENDENT TRIALS
Let Z_1, Z_2, ..., Z_n be n sample description spaces (which may be alike)
on whose subsets, respectively, are defined probability functions P_1,
P_2, ..., P_n. For example, suppose we are drawing, with replacement, a
sample of size n from an urn containing N balls, numbered 1 to N. We
define (for k = 1, 2, ..., n) Z_k as the sample description space of the
outcome of the kth draw; consequently, Z_k = {1, 2, ..., N}. If the
descriptions in Z_k are assumed to be equally likely, then the probability
function P_k is defined on the events C_k of Z_k by P_k[C_k] = N[C_k]/N[Z_k].
Now suppose we perform in succession the n random experiments whose
sample description spaces are Z_1, Z_2, ..., Z_n, respectively. The sample
description space S of this series of n random experiments consists of n-
tuples (z_1, z_2, ..., z_n), which may be formed by taking for the first com-
ponent z_1 any member of Z_1, by taking for the second component z_2 any
member of Z_2, and so on, until for the nth component z_n we take any
member of Z_n. We introduce a notation to express these facts; we write
S = Z_1 ⊗ Z_2 ⊗ \cdots ⊗ Z_n, which we read "S is the combinatorial product
of the spaces Z_1, Z_2, ..., Z_n." More generally, we define the notion of a
combinatorial product event on S. For any events C_1 on Z_1, C_2 on Z_2, ...,
and C_n on Z_n we define the combinatorial product event C = C_1 ⊗ C_2 ⊗ \cdots ⊗ C_n
as the set of all n-tuples (z_1, z_2, ..., z_n) which can be formed by taking for
the first component z_1 any member of C_1, for the second component z_2 any
member of C_2, and so on, until for the nth component z_n we take any
member of C_n.
We now define a probability function P[·] on the subsets of S. For every
event C on S that is a combinatorial product event, so that C = C_1 ⊗
C_2 ⊗ \cdots ⊗ C_n for some events C_1, C_2, ..., C_n, which belong, respectively,
to Z_1, Z_2, ..., Z_n, we define

(2.1) P[C] = P_1[C_1]\, P_2[C_2] \cdots P_n[C_n].

Not every event in S is a combinatorial product event. However, it can
be shown that it is possible to define a unique probability function P[·] on
the events of S in such a way that (2.1) holds for combinatorial product
events.
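A minimal sketch of these definitions in Python (the urn and the events are illustrative assumptions) builds a combinatorial product space for two draws and checks that (2.1) holds.

    from itertools import product
    from fractions import Fraction

    Z = [1, 2, 3]                                  # one draw from 3 balls
    P1 = {z: Fraction(1, 3) for z in Z}            # probability function on Z1
    P2 = {z: Fraction(1, 3) for z in Z}            # probability function on Z2
    P = {s: P1[s[0]] * P2[s[1]] for s in product(Z, Z)}   # (2.1) on S

    C1, C2 = {1, 2}, {3}                           # events on Z1 and on Z2
    C = {s for s in P if s[0] in C1 and s[1] in C2}  # combinatorial product event
    print(sum(P[s] for s in C))                              # 2/9
    print(sum(P1[z] for z in C1) * sum(P2[z] for z in C2))   # 2/9 as well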
It may help to clarify the meaning of the foregoing ideas if we consider
the special (but, nevertheless, important) case in which each sample
description space Z_1, Z_2, ..., Z_n is finite, of sizes N_1, N_2, ..., N_n,
respectively. As in section 6 of Chapter 1, we list the descriptions in
Z_1, Z_2, ..., Z_n: for j = 1, ..., n,

Z_j = \{D_1^{(j)}, D_2^{(j)}, \ldots, D_{N_j}^{(j)}\}.

(2.3) P[\{(D_{i_1}^{(1)}, D_{i_2}^{(2)}, \ldots, D_{i_n}^{(n)})\}] = P_1[\{D_{i_1}^{(1)}\}]\, P_2[\{D_{i_2}^{(2)}\}] \cdots P_n[\{D_{i_n}^{(n)}\}].
Equation (2.4) follows from the fact that an event A_k depends on the kth
trial if and only if the decision as to whether or not a description
(z_1, z_2, ..., z_n) belongs to A_k depends only on the kth component z_k of the
description. Next, let A_1, A_2, ..., A_n be events depending, respectively,
on the first, second, ..., nth trial. For each A_k we have a representation of
the form of (2.4). We next assert that the intersection may be written as a
combinatorial product event:

(2.5) A_1 A_2 \cdots A_n = C_1 ⊗ C_2 ⊗ \cdots ⊗ C_n.
We leave the verification of (2.5), which requires only a little thought, to
the reader. Now, from (2.1) and (2.5),

(2.6) P[A_1 A_2 \cdots A_n] = P_1[C_1]\, P_2[C_2] \cdots P_n[C_n],

and, in particular,

(2.7) P[A_k] = P_k[C_k].

From (2.6) and (2.7) it is seen that (1.11) is satisfied, so that S consists of n
independent trials.
The foregoing considerations are not only sufficient to define a proba-
bility space that consists of independent trials but are also necessary, in
the sense of the following theorem, which we state without proof. Let the
sample description space S be a combinatorial product of n sample description
spaces Z_1, Z_2, ..., Z_n. Let P[·] be a probability function defined on the
subsets of S. The probability space S consists of n independent trials if and
only if there exist probability functions P_1[·], P_2[·], ..., P_n[·], defined,
respectively, on the subsets of the sample description spaces Z_1, Z_2, ..., Z_n,
with respect to which P[·] satisfies (2.6) for every set of n events A_1, A_2, ...,
A_n on S such that, for k = 1, ..., n, A_k depends only on the kth trial (and
then C_k is defined by (2.4)).
To illustrate the foregoing considerations, we consider the following
example.
► Example 2D. A man tosses two fair coins independently. Let C1 be
the event that the first coin tossed is a head, let C2 be the event that the
second coin tossed is a head, and let C be the event that both coins tossed
are heads. Consider sample description spaces: S = {(H, H), (H, T),
(T, H), (T, T)}, ZI = Z2 = {H, T}. Clearly S is the sample description
space of the outcome of the two tosses, whereas ZI and Z2 are the sample
description spaces of the outcome of the first and second tosses, respectively.
We assume that each of these sample description spaces has equally
likely descriptions.
The event C1 may be defined on either S or Z1. If defined on Z1, C1 =
{H}. If defined on S, C1 = {(H, H), (H, T)}. The event C2 may in a
similar manner be defined on either Z2 or S. However, the event C can be
defined only on S; C = {(H, H)}.
The spaces on which C1 and C2 are defined determine the relation that
exists between C1, C2, and C. If both C1 and C2 are defined on S, then
C = C1C2. If C1 and C2 are defined on Z1 and Z2, respectively, then C =
C1 ⊗ C2.
In order to speak of the independence of C1 and C2 , we must regard them
as being defined on the same sample description space. That C1 and C2 are
independent events is intuitively clear, since S consists of two independent
trials and C1 depends on the first trial, whereas C2 depends on the second
trial. Events can be independent without depending on independent trials.
For example, consider the event D = {(H, H), (T, T)} that the two tosses
have the same outcome. One may verify that D and C1 are independent
and also that D and C2 are independent. On the other hand, the events D,
C1, and C2 are not independent. ◄
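The last remark is easy to verify by direct computation; the illustrative Python sketch below checks that D is independent of C1 and of C2 separately, but that D, C1, C2 fail to be independent.

    from fractions import Fraction
    from itertools import product

    S = list(product("HT", repeat=2))
    P = {s: Fraction(1, 4) for s in S}
    prob = lambda E: sum(P[s] for s in E)

    C1 = {s for s in S if s[0] == "H"}     # first coin falls heads
    C2 = {s for s in S if s[1] == "H"}     # second coin falls heads
    D = {s for s in S if s[0] == s[1]}     # the two tosses agree

    print(prob(D & C1) == prob(D) * prob(C1))                      # True
    print(prob(D & C2) == prob(D) * prob(C2))                      # True
    print(prob(D & C1 & C2) == prob(D) * prob(C1) * prob(C2))      # False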
EXERCISES
2.1. Consider a man who has made 2 tosses of a die. State whether each of the
following six statements is true or false.
Let Al be the event that the outcome of the first throw is a 1 or a 2.
Statement I: Al depends on the first throw.
Let A2 be the event that the outcome of the second throw is a 1 or a 2.
Statement 2: Al and A2 are mutually exclusive events.
Let BI be the event that the sum of the outcomes is 7.
Statement 3: BI depends on the first throw.
Let B2 be the event that the sum of the outcomes is 3.
Statement 4: BI and B2 are mutually exclusive events.
Let C be the event that one of the outcomes is a 1 and the other is a 2.
Statement 5: A1 ∪ A2 is a subevent of C.
Statement 6: C is a subevent of B2.
2.2. Consider a man who has made 2 tosses of a coin. He assumes that the
possible outcomes of the experiment, together with their probability, are
given by the following table:
Sample Descriptions D    (H, H)    (H, T)    (T, H)    (T, T)
P[{D}]                    1/6       1/3       1/3       1/6
Show that this probability space does not consist of 2 independent trials.
Is there a unique probability function that must be assigned on the subsets
of the foregoing sample description space in order that it consist of 2
independent trials?
2.3. Consider 3 urns; urn I contains 1 white and 2 black balls, urn II contains
3 white and 2 black balls, and urn III contains 2 white and 3 black balls.
One ball is drawn from each urn. What is the probability that among the
balls drawn there will be (i) 1 white and 2 black balls, (ii) at least 2 black
balls, (iii) more black than white balls?
2.4. If you had to construct a mathematical model for events A and B, as
described below, would it be appropriate to assume that A and Bare
independent? Explain the reasons for your opinion.
(i) A is the event that a subscriber to a certain magazine owns a car, and B
is the event that the same subscriber is listed in the telephone directory.
(ii) A is the event that a married man has blue eyes, and B is the event that
his wife has blue eyes.
(iii) A is the event that a man aged 21 is more than 6 feet tall, and B is the
event that the same man weighs less than 150 pounds.
(iv) A is the event that a man lives in the Northern Hemisphere, and B is
the event that he lives in the Western Hemisphere.
(v) A is the event that it will rain tomorrow, and B is the event that it will
rain within the next week.
2.5. Explain the meaning of the following statements:
(i) A random phenomenon consists of n trials.
(ii) In drawing a sample of size n, one is performing n trials.
(iii) An event A depends on the third trial.
(iv) The event that the third ball drawn is white depends on the third trial.
(v) In drawing with replacement a sample of size 6, one is performing 6
independent trials of an experiment.
(vi) If S is the sample description space of the experiment of drawing with
replacement a sample of size 6 from an urn containing balls, numbered 1
to 10, then S = Z_1 ⊗ Z_2 ⊗ \cdots ⊗ Z_6, in which Z_j = {1, 2, ..., 10} for
j = 1, ..., 6.
(vii) If, in (vi), balls numbered 1 to 7 are white and if A is the event that all
balls drawn are white, then A = C_1 ⊗ C_2 ⊗ \cdots ⊗ C_6, in which C_j =
{1, 2, ..., 7} for j = 1, ..., 6.
3. INDEPENDENT BERNOULLI TRIALS

► Example 3A. Suppose that a man tosses ten times a possibly unfair
coin, whose probability of falling heads is p, which may be any number
between 0 and 1, inclusive, depending on the construction of the coin. On
each trial a success s is said to have occurred if the coin falls heads. Let
us find the probability of the event A that the coin will fall heads on the
first four tosses and tails on the last six tosses, assuming that the tosses are
independent. It is equal to p^4 q^6, since the event A is the same as the single-
member event {(s, s, s, s, f, f, f, f, f, f)}. ◄
* A reader who has omitted the preceding section may take this rule as the definition
of n independent repeated Bernoulli trials.
happen in as many ways as k letters s may be distributed among n places;
this is the same as the number of subsets of size k that may be formed from
a set containing n members. Consequently, there are \binom{n}{k} descriptions
containing exactly k successes and n - k failures. Each such description
has probability p^k q^{n-k}. Thus we have obtained a basic formula.

The Binomial Law. The probability, denoted by b(k; n, p), that n
independent repeated Bernoulli trials, with probability p for success and
q = 1 - p for failure, will result in k successes and n - k failures (in
which k = 0, 1, ..., n) is given by

(3.3) b(k; n, p) = \binom{n}{k} p^k q^{n-k}.

Note that \sum_{k=0}^{n} b(k; n, p) = (p + q)^n = 1, since p + q = 1.
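The binomial law translates directly into a computing routine. The following is a minimal sketch in Python (the function name is illustrative), together with a check of the normalization and of the symmetry relation b(k; n, p) = b(n - k; n, 1 - p) used below.

    from math import comb

    def b(k, n, p):
        """Probability of exactly k successes in n Bernoulli trials."""
        return comb(n, k) * p ** k * (1 - p) ** (n - k)

    print(b(2, 5, 0.2))                                 # a typical evaluation
    print(sum(b(k, 5, 0.2) for k in range(6)))          # equals 1, since p + q = 1
    print(abs(b(3, 10, 0.7) - b(7, 10, 0.3)) < 1e-12)   # b(k; n, p) = b(n-k; n, 1-p)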
The reader should note that (3.2) is very similar to (3.4) of Chapter 2.
However, (3.2) represents the solution to a probability problem that does
not involve equally likely descriptions. The importance of this fact is
illustrated by the following example. Suppose one is throwing darts at a
target. It is difficult to see how one could compute the probability of the
event E that one will hit the target by setting up some appropriate sample
description space with equally likely descriptions. Rather, p may have to
be estimated approximately by means of the frequency definition of
probability. Nevertheless, even though p cannot be computed, once one
has assumed a value for p one can compute by the methods of this section
the probability of any event A that can be expressed in terms of independent
trials of the event E.
The reader should also note that (3.2) is very similar to (1.13). By means
of the considerations of section 2, it can be seen that (3.2) and (1.13) are
equivalent formulations of the same law.
The binomial law, and consequently the quantity b(k; n, p), occurs
frequently in applications of probability theory. The quantities b(k; n, p),
k = 0, 1, ..., n, are tabulated for p = 0.01 (0.01) 0.50 and n = 2 (1) 49
(that is, for all values of p and n in the ranges p = 0.01, 0.02, 0.03, ...,
0.50 and n = 2, 3, 4, ..., 49) in "Tables of the Binomial Probability
Distribution," National Bureau of Standards, Applied Mathematics Series
6, Washington, 1950. A short table of b(k; n, p) for various values of p
between 0.01 and 0.5 and for n = 2, 3, ..., 10 is given in Table II on
p. 442. It should be noted that values of b(k; n, p) for p > 0.5 can be
obtained from Table II by means of the formula
(3.4) b(k; n, p) = b(n - k; n, 1 - p).
► Example 3B. By a series of tests of a certain type of electrical relay,
it has been determined that in approximately 5 % of the trials the relay will
fail to operate under certain specified conditions. What is the probability
that in ten trials made under these conditions the relay will fail to operate
one or more times?
Solution: To describe the results of the ten trials, we write a 10-tuple
(z_1, z_2, ..., z_10) whose kth component z_k = s or f, depending on whether
the relay did or did not operate on the kth trial. We next assume that the
ten trials constitute ten independent repeated Bernoulli trials, with
probability of success p = 0.95 at each trial. The probability of no failures
in the ten trials is b(10; 10, 0.95) = (0.95)^{10} = b(0; 10, 0.05). Consequently,
the probability of one or more failures in the ten trials is equal to
1 - (0.95)^{10} = 1 - b(0; 10, 0.05) = 1 - 0.5987 = 0.4013. ◄
► Example 3C. How to tell skill from luck. A rather famous personage
in statistical circles is the tea-tasting lady whose claims have been discussed
by such outstanding scholars as R. A. Fisher and J. Neyman; see J.
Neyman, First Course in Probability and Statistics, Henry Holt, New York,
1950, pp. 272-289. "A Lady declares that by tasting a cup of tea made
with milk she can discriminate whether the milk or the tea infusion was
first added to the cup." Specifically, the lady's claim is "not that she could
draw the distinction with invariable certainty, but that, though sometimes
mistaken, she would be right more often than not." To test the lady's
claim, she will be subjected to an experiment. She will be required to
taste and classify n pairs of cups of tea, each pair containing one cup of
tea made by each of the two methods under consideration. Let p be the
probability that the lady will correctly classify a pair of cups. Assuming
that the n pairs of cups are classified under independent and identical
conditions, the probability that the lady will correctly classify k of the n
pairs is \binom{n}{k} p^k q^{n-k}. Suppose that it is decided to grant the lady's claims if
she correctly classifies at least eight of ten pairs of cups. Let P(p) be the
probability of granting the lady's claims, given that her true probability of
classifying a pair of cups is p. Then

P(p) = \binom{10}{8} p^8 q^2 + \binom{10}{9} p^9 q + p^{10},

since P(p) is equal to the probability that the lady will correctly classify at
least eight of ten pairs. In particular, the probability that the lady will
establish her claim, given that she is skillful (say, p = 0.85), is given by
P(0.85) = 0.820, whereas the probability that the lady will establish her
claim, given that she is merely lucky (that is, p = 0.50), is given by P(0.50) =
0.055. ◄
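The two numbers quoted are reproduced by a direct computation; the following Python sketch (illustrative names) evaluates P(p) for the acceptance rule of the example.

    from math import comb

    def P(p):
        """Probability of at least 8 correct classifications in 10 pairs."""
        return sum(comb(10, k) * p ** k * (1 - p) ** (10 - k)
                   for k in range(8, 11))

    print(f"{P(0.85):.3f}")     # 0.820: the skillful lady usually succeeds
    print(f"{P(0.50):.3f}")     # 0.055: the merely lucky lady rarely does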
► Example 3D. The game of "odd man out". Let N distinguishable coins
be tossed simultaneously and independently, where N > 3. Suppose that
each coin has probability p of falling heads. What is the probability that
either exactly one of the coins will fall heads or that exactly one of the coins
will fall tails?
Application: In a game, which we shall call "odd man out," N persons
toss coins to determine one person who will buy refreshments for the
group. If there is a person in the group whose outcome (be it heads or
tails) is not the same as that of any other member of the group, then that
person is called an odd man and must buy refreshment for each member of
the group. The probability asked for in this example is the probability
that in any play of the game there will be an odd man. The next example is
concerned with how many plays of the game will be required to determine
an odd man.
Solution: To describe the results of the N tosses, we write an N-tuple
(z_1, z_2, ..., z_N) whose kth component is s or f, depending on whether the
kth coin tossed fell heads or tails. We are then considering N independent
repeated Bernoulli trials, with probability p of success at each trial. The
probability of exactly one success is \binom{N}{1} p q^{N-1}, whereas the probability of
exactly one failure is \binom{N}{N-1} p^{N-1} q. Consequently, the probability that
either exactly one of the coins will fall heads or exactly one of the coins
will fall tails is equal to N(p q^{N-1} + p^{N-1} q). If the coins are fair, so that
p = 1/2, then the probability is N/2^{N-1}. Thus, if five persons play the game
of "odd man out" with fair coins, the probability that in any play of the
game there will be a loser is 5/16. ◄
► Example 3E. The duration of the game of "odd man out". Let N persons
play the game of "odd man out" with fair coins. What is the probability
for n = 1,2, ... that n plays will be required to conclude the game (that
is, the nth play is the first play in which one of the players will have an
outcome on his coin toss different from those of all the other players)?
Solution: Let us rephrase the problem. (See theoretical exercise 3.3.)
Suppose that n independent plays are made of the game of "odd man out."
What is the probability that on the nth play, but not on any preceding play,
there will be an odd man? Let P be the probability that on any play there
will be an odd man. In example 3D it was shown that P = N/2^{N-1} if N
persons are tossing fair coins. Let Q = 1 - P. To describe the results of
n plays, we write an n-tuple (z_1, z_2, ..., z_n) whose kth component is s or f,
depending on whether the kth play does or does not result in an odd man.
Assuming that the plays are independent, the n plays thus constitute
repeated independent Bernoulli trials with probability P = N/2^{N-1} of
success at each trial. Consequently, the event {(f, f, ..., f, s)} of failure
at all trials but the nth has probability Q^{n-1} P. Thus, if five persons toss
fair coins, the probability that four tosses will be required to produce an
odd man is (11/16)^3 (5/16). ◄
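The waiting-time law of this example is the geometric law of theoretical exercise 3.3; a short illustrative Python sketch tabulates it for N = 5.

    N = 5
    P = N / 2 ** (N - 1)            # probability of an odd man on one play (= 5/16)
    Q = 1 - P

    def duration(n):
        """Probability that the nth play is the first to produce an odd man."""
        return Q ** (n - 1) * P

    print(duration(4))                               # (11/16)**3 * (5/16)
    print(sum(duration(n) for n in range(1, 500)))   # nearly 1: the game ends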
Various approximations that exist for computing the binomial proba-
bilities are discussed in section 2 of Chapter 6. We now briefly indicate
the nature of one of these approximations, namely, that of the binomial
probability law by the Poisson probability law.
The Poisson Law. A random phenomenon whose sample description
space S consists of all the integers from 0 onward, so that S = {0, 1, 2, ...},
and on whose subsets a probability function P[·] is defined in terms of a
parameter λ > 0 by

(3.5) P[\{k\}] = e^{-\lambda} \frac{\lambda^k}{k!}, \quad k = 0, 1, 2, \ldots,

is said to obey the Poisson probability law with parameter λ. Examples of
random phenomena that obey the Poisson probability law are given in
section 3 of Chapter 6. For the present, let us show that under certain
circumstances the number of successes in n independent repeated Bernoulli
trials, with probability of success p at each trial, approximately obeys the
Poisson probability law with parameter λ = np.
More precisely, we show that for any fixed k = 0, 1, 2, ... and λ > 0

(3.6) \lim_{n \to \infty} \binom{n}{k} \left( \frac{\lambda}{n} \right)^k \left( 1 - \frac{\lambda}{n} \right)^{n-k} = e^{-\lambda} \frac{\lambda^k}{k!}.

To prove (3.6), we need only rewrite its left-hand side as

\frac{\lambda^k}{k!} \left( 1 - \frac{\lambda}{n} \right)^{n-k} \frac{n(n - 1) \cdots (n - k + 1)}{n^k}.

Since \lim_{n \to \infty} [1 - (\lambda/n)]^n = e^{-\lambda}, we obtain (3.6).
Since (3.6) holds in the limit, we may write that it is approximately true
for large values of n that

b(k; n, p) \doteq e^{-np} \frac{(np)^k}{k!}.
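The quality of this approximation is easily examined numerically; the sketch below (Python, with illustrative values n = 100, p = 0.02) compares b(k; n, p) with the Poisson terms for λ = np.

    from math import comb, exp, factorial

    n, p = 100, 0.02
    lam = n * p                       # λ = 2
    for k in range(6):
        binom = comb(n, k) * p ** k * (1 - p) ** (n - k)
        poisson = exp(-lam) * lam ** k / factorial(k)
        print(k, f"{binom:.4f}", f"{poisson:.4f}")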
We shall not consider here the remainder terms for the determination of the
To prove (3.12), one must note only that the number of descriptions in
S which contain k_1 s_1's, k_2 s_2's, ..., k_r s_r's is equal to the number of ways a
set of size n can be partitioned into r ordered subsets of sizes k_1, k_2, ..., k_r,
respectively, which is equal to \binom{n}{k_1\, k_2\, \cdots\, k_r}. Each of these descriptions
has probability p_1^{k_1} p_2^{k_2} \cdots p_r^{k_r}.
THEORETICAL EXERCISES
3.3. Suppose one performed a sequence of independent Bernoulli trials (in
which the probability of success at each trial is p) until the first success
occurs. Show for any integer n = 1,2, ... that the probability that n
will be the number of trials required to achieve the first success is pq^{n-1}.
Note: Strictly speaking, this problem should be rephrased as follows.
Consider n independent Bernoulli trials, with probability p for success
on any trial. What is the probability that the nth trial will be the first
trial on which a success occurs? To show that the problem originally
stated is equivalent to the reformulated problem requires the consideration
of the theory of a countably infinite number of independent repeated
Bernoulli trials; this is beyond the scope of this book.
3.4. The behavior of the binomial probabilities. Show that, as k goes from 0
to n, the terms b(k; n, p) increase monotonically, then decrease monotoni-
cally, reaching their largest value (i) in the case that (n + 1)p is not an
integer, when k is equal to the integer m satisfying the inequalities

(3.14) (n + 1)p - 1 < m < (n + 1)p,

and (ii) in the case that (n + 1)p is an integer, when k is equal to either
(n + 1)p - 1 or (n + 1)p. Hint: Use the fact that

(3.15) \frac{b(k; n, p)}{b(k - 1; n, p)} = \frac{(n - k + 1)p}{kq} = 1 + \frac{(n + 1)p - k}{kq}.
3.5. Consider a series of n independent repeated Bernoulli trials at which the
probability of success at each trial is p. Show that in order to have two
successive integers, k_1 and k_2, between 0 and n, such that the probability of
k_1 successes in the n trials will be equal to the probability of k_2 successes in
the n trials, it is necessary and sufficient that (n + 1)p be an integer.
3.6. Show that the probability [denoted by P(r + 1), say] of at least (r + 1)
successes in (n + 1) independent repeated Bernoulli trials, with proba-
bility p of success at each trial, is equal to
EXERCISES
3.1. Assuming that each child has probability 0.51 of being a boy, find the
probability that a family of 4 children will have (i) exactly 1 boy, (ii)
exactly 1 girl, (iii) at least 1 boy, (iv) at least 1 girl.
3.2. Find the number of children a couple should have in order that the
probability of their having at least 2 boys will be greater than 0.75.
3.3. Assuming that each dart has probability 0.20 of hitting its target, find the
probability that if one throws 5 darts at a target one will score (i) no hits,
(ii) exactly 1 hit, (iii) at least 2 hits.
3.4. Assuming that each dart has probability 0.20 of hitting its target, find
the number of darts one should throw at a target in order that the proba-
bility of at least 2 hits will be greater than 0.60.
3.5. Consider a family with 4 children, and assume that each child has proba-
bility 0.51 of being a boy. Find the conditional probability that all the
children will be boys, given that (i) the eldest child is a boy, (ii) at least
1 of the children is a boy.
3.6. Assuming that each dart has probability 0.20 of hitting its target, find
the conditional probability of obtaining 2 hits in 5 throws, given that one
has scored an even number of hits in the 5 throws.
3.7. A certain manufacturing process yields electrical fuses, of which, in the
long run, 15% are defective. Find the probability that in a sample of 10
fuses selected at random there will be (i) no defectives, (ii) at least 1
defective, (iii) no more than 1 defective.
3.8. A machine normally makes items of which 5 % are defective. The practice
of the producer is to check the machine every hour by drawing a sample of
size 10, which he inspects. If the sample contains no defectives, he allows
the machine to run for another hour. What is the probability that this
practice will lead him to leave the machine alone when in fact it has shifted
to producing items of which 10% are defective?
3.9. (Continuation of 3.8.) How large a sample should be inspected to insure
that if p = 0.10 the probability that the machine will not be stopped is
less than or equal to 0.01?
3.10. Consider 3 friends who contract a disease; medical experience has shown
that 10% of people contracting this disease do not recover. What is the
probability that (i) none of the 3 friends will recover, (ii) all of them will
recover?
3.11. Let the probability that a person aged x years will survive 1 year be
denoted by p_x, whereas q_x = 1 - p_x is the probability that he will die
within a year. Consider a board of directors, consisting of a chairman
and 5 members; all of the members are 60, the chairman is 65. Find the
probability, in terms of q_60 and q_65, that within a year (i) no members will
die, (ii) not more than 1 member will die, (iii) neither a member nor the
chairman will die, (iv) only the chairman will die. Evaluate these proba-
bilities under the assumption that q_60 = 0.025 and q_65 = 0.040.
3.12. Consider a young man who is waiting for a young lady, who is late. To
amuse himself while waiting, he decides to take a walk under the following
set of rules. He tosses a coin (which we may assume is fair). If the coin
falls heads, he walks 10 yards north; if the coin falls tails, he walks 10
yards south. He repeats this process every 10 yards and thus executes
what is called a "random walk." What is the probability that after
walking 100 yards he will be (i) back at his starting point, (ii) within 10
yards of his starting point, (iii) exactly 20 yards away from his starting
point?
3.13. Do the preceding exercise under the assumption that the coin tossed by
the young man is unfair and has probability 0.51 of falling heads (proba-
bility 0.49 of falling tails).
3.14. Let 4 persons play the game of "odd man out" with fair coins. What is the
probability, for n = 1, 2, ..., that n plays will be required to conclude the
game (that is, the nth play is the first play on which 1 of the players will
have an outcome on his coin toss that is different from those of all the
other players)?
3.15. Consider an experiment that consists of tossing 2 fair dice independently.
Consider a sequence of n repeated independent trials of the experiment.
What is the probability that the nth throw will be the first time that the
sum of the 2 dice is a 7?
3.16. A man wants to open his door; he has 5 keys, only 1 of which fits the door.
He tries the keys successively, choosing them (i) without replacement,
(ii) with replacement, until he opens the door. For each integer k =
1, 2, ... , find the probability that the kth key tried will be the first to fit
the door.
3.17. A man makes 5 independent throws of a dart at a target. Let p denote
his probability of hitting the target at each throw. Given that he has
made exactly 3 hits in the 5 throws, what is the probability that the first
throw hit the target? Express your answer in as simple a form as you can.
3.18. Consider a loaded die; in 10 independent throws the probability that an
even number will appear 5 times is twice the probability that an even
number will appear 4 times. W:hat is the probability that an even number
will not appear at all in 10 independent throws of the die?
3.19. An accident insurance company finds that 0.001 of the population incurs
a certain kind of accident each year. Assuming that the company has
insured 10,000 persons selected randomly from the population, what is
the probability that not more than 3 of the company's policyholders will
incur this accident in a given year?
3.20. A certain airline finds that 4 per cent of the persons making reservations
on a certain flight will not show up for the flight. Consequently, their
policy is to sell to 75 persons reserved seats on a plane that has exactly 73
seats. What is the probability that for every person who shows up for
the flight there will be a seat available?
3.21. Consider a flask containing 1000 cubic centimeters of vaccine drawn
from a vat that contains on the average 5 live viruses in every 1000 cubic
centimeters of vaccine. What is the probability that the flask contains (i)
exactly 5 live viruses, (ii) 5 or more live viruses?
3.22. The items produced by a certain machine may be classified in 4 grades,
A, B, C, and D. It is known that these items are produced in the following
proportions:
Grade A Grade B Grade C Grade D
0.3 0.4 0.2 0.1
What is the probability that there will be exactly 1 item of each grade in a
sample of 4 items, selected at random from the output of the machine?
3.23. A certain door-to-door salesman sells 3 sizes of brushes, which he calls
large, extra large, and giant. He estimates that among the persons he calls
upon the probabilities are 0.4 that he will make no sale, 0.3 that he will
sell a large brush, 0.1 that he will sell an extra large brush, and 0.2 that he
will sell a giant brush. Find the probability that in 4 calls he will sell (i) no
brushes, (ii) 4 large brushes, (iii) at least 1 brush of each kind.
3.24. Consider a man who claims to be able to locate hidden sources of water
by use of a divining rod. To test his claim, he is presented with 10 covered
cans, 1 at a time; he must decide, by means of his divining rod, whether
each can contains water. What is the probability that the diviner will
make at least 7 correct decisions just by chance? Do you think that the
test described in this exercise is fairer than the test described in exercise
2.14 of Chapter 2? Will it make a difference if the diviner knows how
many of the cans actually contain water?
3.25. In their paper "Testing the claims of a graphologist," Journal of Person-
ality, Vol. 16 (1947), pp. 192-197, G. R. Pascal and B. Suttell describe an
experiment designed to evaluate the ability of a professional graphologist.
The graphologist claimed that she could distinguish the handwriting of
abnormal from that of normal persons. The experimenters selected 10
persons who had been diagnosed as psychotics by at least 2 psychiatrists.
For each of these persons a normal-control person was matched for age,
sex, and education. Handwriting samples from each pair of persons
were placed in a separate folder and presented to the graphologist, who
was able to identify correctly the sample of the psychotic in 6 of the 10
pairs.
(i) What is the probability that she would have been correct on at least
6 pairs just by chance?
(ii) How many correct judgements would the graphologist need to make
so that the probability of her getting at least that many correct by chance
is 5 % or less?
3.26. Two athletic teams play a series of games; the first team winning 4 games
is the winner. The World Series is an example. Suppose that 1 of the
teams is stronger than the other and has probability p of winning each
game, independent of the outcomes of any other games. Assume that a
game cannot end in a tie. Show that the probabilities that the series will
end in 4, 5, 6, or 7 games are (i) if p = 2/3, 0.21, 0.296, 0.274, and 0.22,
respectively, and (ii) if p = 1/2, 0.125, 0.25, 0.3125, and 0.3125, respectively.
(A numerical check is sketched after exercise 3.29.)
3.27. Suppose that 9 people, chosen at random, are asked if they favor a certain
proposal. Find the probability that a majority of the persons polled will
favor the proposal, given that 45 % of the population favor the proposal.
3.28. Suppose that (i) 2, (ii) 3 restaurants compete for the same 10 patrons.
Find the number of seats each restaurant should have in order to have a
probability greater than 95 % that it can serve all patrons who come to it
(assuming that all patrons arrive at the same time and choose, indepen-
dently of one another, each restaurant with equal probability).
3.29. A fair die is to be thrown 9 times. What is the most probable number of
throws on which the outcome is (i) a 6, (ii) an even number?
4. DEPENDENT TRIALS
(4.2)  P[A_1], \quad P[A_2 \mid A_1], \quad P[A_3 \mid A_1 A_2], \quad \cdots, \quad P[A_n \mid A_1 A_2 \cdots A_{n-1}]

for any events A_1 in 𝒜_1, A_2 in 𝒜_2, ..., A_n in 𝒜_n, one has thereby
specified the value of P[A] for any event A on S.

(4.3)  P[A_i \mid A_1 A_2 \cdots A_{i-1}] = \frac{M_w - (i-1)}{M - (i-1)},

since just before the ith draw there are M - (i - 1) balls in the urn, of
which M_w - (i - 1) are white. Let us assume that (4.3) is valid; more
generally, we assume a knowledge of all the probabilities in (4.2) by means
of the assumption that, whatever the first (i - 1) choices, at the ith draw
each of the remaining M - i + 1 elements will have probability
1/(M - i + 1) of being chosen. Then, from (4.1) it follows that
In making these assumptions (4.6) we have used the fact that the woman has
no family history of hemophilia. A boy usually carries an X chromosome
and a Y chromosome; he has hemophilia if and only if, instead of an X
chromosome, he has an X′ chromosome, which bears a gene causing
hemophilia. Let m be the probability of mutation of an X chromosome
into an X′ chromosome. Now the mother carries two X chromosomes.
Event A_1 can occur only if at least one of these X chromosomes is a
mutant; this will happen with probability 1 - (1 - m)^2 ≈ 2m, since m^2
is much smaller than 2m. Assuming that the woman is a hemophilia
carrier and exactly one of her chromosomes is X′, it follows that her son
will have probability 1/2 of inheriting the X′ chromosome.
We are seeking P[A_3 | A_2]. Now
Thus the conditional probability that the second son of a woman with no
family history of hemophilia will have hemophilia, given that her first son
has hemophilia, is approximately 1/4. ◄
A very important use of the notion of conditional probability derives
from the following extension of (4.5). Let C_1, C_2, ..., C_n be n events,
each of positive probability, which are mutually exclusive and are also
exhaustive (that is, the union of all the events C_1, C_2, ..., C_n is equal to
the certain event). Then, for any event B, one may express the unconditional
probability P[B] of B in terms of the conditional probabilities P[B | C_1], ...,
P[B | C_n] and the unconditional probabilities P[C_1], ..., P[C_n]:

(4.11)  P[B] = P[B \mid C_1]P[C_1] + P[B \mid C_2]P[C_2] + \cdots + P[B \mid C_n]P[C_n]

if

C_1 \cup C_2 \cup \cdots \cup C_n = S, \qquad C_i C_j = \emptyset \text{ for } i \ne j, \qquad P[C_i] > 0.

Equation (4.11) follows immediately from the relation

(4.12)  P[B] = P[BC_1] + P[BC_2] + \cdots + P[BC_n]

and the fact that P[BC_i] = P[B | C_i]P[C_i] for any event C_i.
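Formula (4.11) translates directly into a few lines of code. The following Python sketch is illustrative only; the function name and the numbers are assumptions made here, not taken from the text.

```python
# Sketch of the law of total probability (4.11).
def total_probability(cond_probs, priors):
    """P[B] = sum of P[B | C_i] * P[C_i] over a partition C_1, ..., C_n."""
    assert abs(sum(priors) - 1.0) < 1e-9  # the C_i must be exhaustive
    return sum(pb * pc for pb, pc in zip(cond_probs, priors))

# Example: three mutually exclusive, exhaustive causes (illustrative values).
print(total_probability([0.2, 0.5, 0.9], [0.3, 0.5, 0.2]))  # 0.49
```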
(4.14)  \sum_{j=1}^{5} \frac{j}{5}\binom{5}{j}(0.2)^j(0.8)^{5-j} = (0.2)\sum_{j=1}^{5}\binom{4}{j-1}(0.2)^{j-1}(0.8)^{4-(j-1)} = 0.2,

in which we have used the identity

(4.15)  \frac{j}{5}\binom{5}{j} = \binom{4}{j-1}

and the fact that the last sum in (4.14) is equal to 1 by the binomial
theorem. Combining (4.13) and (4.14), we have P[B] = 0.2. In words, we
have proved that selecting an item randomly from a sample which has been
selected randomly from a larger population is statistically equivalent to
selecting the item from the larger population. Note the fact that P[B] = 0.2
does not imply that the box containing five tubes will always contain one
defective tube.
Let us next consider part (ii) of example 4D. To describe the results of
the experiment that consists in selecting five tubes from the output of the
machine and then selecting two tubes from among the five previously
selected, we write a 7-tuple (z_1, z_2, ..., z_7), in which z_6 and z_7 denote the
tubes drawn from the box containing the first five tubes selected. Let
C_0, ..., C_5 and B be defined as before. Let A be the event that the seventh
tube is defective. We seek P[A | B]. Now, if two tubes, each of which
has probability 0.2 of being defective, are drawn independently, the
conditional probability that the second tube will be defective, given that
the first tube is defective, is equal to the unconditional probability that the
second tube will be defective, which is equal to 0.2. We now proceed to
prove that P[A | B] = 0.2. In so doing, we are proving a special case of
the principle that a sample of size 2, drawn without replacement from a
sample of any size whose members are selected independently from a given
population, has statistically the same properties as a sample of size 2 whose
members are selected independently from the population! More general
statements of this principle are given in the theoretical exercises of section
4, Chapter 4. We prove that P[A | B] = 0.2 under the assumption that
P[AB | C_j] = (j)_2/(5)_2 for j = 0, ..., 5. Then, by (4.11),

P[AB] = \sum_{j=0}^{5} \frac{(j)_2}{(5)_2}\binom{5}{j}(0.2)^j(0.8)^{5-j} = (0.2)^2 \sum_{j=2}^{5}\binom{3}{j-2}(0.2)^{j-2}(0.8)^{3-(j-2)} = (0.2)^2.
has cancer. Let us compute P[C | A], the probability that a person who
according to the test has cancer actually has it. We have
Let us assume that the probability that a person taking the test actually
has cancer is given by P[C] = 0.005. Then

(4.18)  P[C \mid A] = \frac{(0.95)(0.005)}{(0.95)(0.005) + (0.05)(0.995)} = \frac{0.00475}{0.00475 + 0.04975} = 0.087.

One should carefully consider the meaning of this result. On the one hand,
the cancer diagnostic test is highly reliable, since it will detect cancer in
95% of the cases in which cancer is present. On the other hand, in only
8.7% of the cases in which the test gives a positive result and asserts cancer
to be present is it actually true that cancer is present! (This example is
continued in exercise 4.8.) ◄
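The arithmetic of (4.18) is easy to reproduce; a minimal Python sketch using the probabilities assumed in the example:

```python
# Reproduces the Bayes computation in (4.18) for the cancer test.
p_a_given_c = 0.95      # test positive given cancer (sensitivity)
p_a_given_not_c = 0.05  # test positive given no cancer (false positives)
p_c = 0.005             # prior probability of cancer

numerator = p_a_given_c * p_c
denominator = numerator + p_a_given_not_c * (1 - p_c)
print(numerator / denominator)  # approximately 0.087
```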
► Example 4F. Prior and posterior probability. Consider an urn that
contains a large number of coins. Not all of the coins are necessarily fair.
Let a coin be chosen randomly from the urn and tossed independently
100 times. Suppose that in the 100 tosses heads appear 55 times. What
is the probability that the coin selected is a fair coin (that is, the proba-
bility that the coin will fall heads at each toss is equal to 1/2)?
Solution: To describe the results of the experiment we write a 101-tuple
(z_1, z_2, ..., z_101). The components z_2, ..., z_101 are H or T, depending on
whether the outcome of the respective toss is heads or tails. What are the
possible values that may be assumed by the first component z_1? We
assume that there is a set of N numbers, p_1, p_2, ..., p_N, each between 0
and 1, such that any coin in the urn has as its probability of falling heads
some one of the numbers p_1, p_2, ..., p_N. Having selected a coin from the
urn, we let z_1 denote the probability that the coin will fall heads; con-
sequently, z_1 is one of the numbers p_1, ..., p_N. Now, for j = 1, 2, ..., N
let C_j be the event that the coin selected has probability p_j of falling heads,
and let B be the event that the coin selected yielded 55 heads in 100 tosses.
Let j_0 be the number, 1 to N, such that p_{j_0} = 1/2. We are now seeking
P[C_{j_0} | B], the conditional probability that the coin selected is a fair coin,
given that it yielded 55 heads in 100 tosses. In order to use (4.16) to
evaluate P[C_{j_0} | B], we require a knowledge of P[C_j] and P[B | C_j] for
j = 1, ..., N. Let us assume that each coin value is equally likely, so that
P[C_j] = 1/N. By the binomial law,

P[B \mid C_j] = \binom{100}{55}(p_j)^{55}(1 - p_j)^{45}.

Consequently,

(4.20)  P[C_{j_0} \mid B] = \frac{(1/N)\binom{100}{55}(p_{j_0})^{55}(1 - p_{j_0})^{45}}{(1/N)\sum_{j=1}^{N}\binom{100}{55}(p_j)^{55}(1 - p_j)^{45}}.

Let us next assume that N = 9 and p_j = j/10 for j = 1, 2, ..., 9. Then
j_0 = 5, and

(4.21)  P[C_5 \mid B] = \frac{\binom{100}{55}(1/2)^{100}}{\sum_{j=1}^{9}\binom{100}{55}(j/10)^{55}[(10 - j)/10]^{45}} = \frac{0.048475}{0.097664} = 0.496.
The probability P[C_5] = 1/9 is called the prior (or a priori) probability of
the event C_5; the conditional probability P[C_5 | B] = 0.496 is called the
posterior (or a posteriori) probability of the event C_5. The prior probability
is an unconditional probability that is known to us before any observations
are taken. The posterior probability is a conditional probability that is of
interest to us only if it is known that the conditioning event has occurred. ◄
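The posterior probability (4.21) can be checked numerically. A Python sketch under the example's assumptions (N = 9, p_j = j/10, uniform prior, so the priors cancel):

```python
# Posterior probability that the selected coin is fair, given 55 heads
# in 100 tosses, as in (4.21).
from math import comb

n, k = 100, 55
likelihood = {j: comb(n, k) * (j / 10) ** k * (1 - j / 10) ** (n - k)
              for j in range(1, 10)}
posterior_fair = likelihood[5] / sum(likelihood.values())
print(round(posterior_fair, 3))  # 0.496
```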
Our next example illustrates a controversial use of Bayes's theorem.
► Example 4G. Laplace's rule of succession. Consider a coin that in n
independent tosses yields k heads. What is the probability that n′ sub-
sequent independent tosses will yield k′ heads? The problem may also be
phrased in terms of drawing balls from an urn. Consider an urn that
contains white and red balls in unknown proportions. In a sample of size
n, drawn with replacement from the urn, k white balls appear. What is the
probability that a sample of size n′ drawn with replacement will contain k′
white balls? A particular case of this problem, in which k = n and k′ = n′,
can be interpreted as a simple form of the fundamental problem of
inductive inference if one formulates the problem as follows: if n indepen-
dent trials of an experiment have resulted in success, what is the probability
that n′ additional independent trials will result in success? Another
reformulation is this: if the results of n independent experiments, performed
to test a theory, agree with the theory, what is the probability that n′
additional independent experiments will agree with the theory?
Solution: To describe the results of our observations, we write an
(n + n′ + 1)-tuple (z_1, z_2, ..., z_{n+n′+1}), in which the components z_2, ...,
z_{n+1} describe the outcomes of the coin tosses which have been made and
the components z_{n+2}, ..., z_{n+n′+1} describe the outcomes of the subsequent
coin tosses. The first component z_1 describes the probability that the coin
tossed has of falling heads; we assume that there are N known numbers,
p_1, p_2, ..., p_N, which z_1 can take as its value. We have italicized this
assumption to indicate that it is considered controversial. For j = 1, 2, ...,
N let C_j be the event that the coin tossed has probability p_j of falling heads.
Let B be the event that the coin yields n heads in its first n tosses, and let A
be the event that it yields n′ heads in its subsequent n′ tosses. We are
seeking P[A | B]. Now

(4.22)  P[AB] = \sum_{j=1}^{N} P[AB \mid C_j]P[C_j] = \sum_{j=1}^{N} (p_j)^{n+n'} P[C_j],

whereas

(4.23)  P[B] = \sum_{j=1}^{N} (p_j)^n P[C_j].

Let us now assume that p_j is equal to j/N and that P[C_j] = 1/N. Then

(4.24)  P[A \mid B] = \frac{(1/N)\sum_{j=1}^{N} (j/N)^{n+n'}}{(1/N)\sum_{j=1}^{N} (j/N)^n}.
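One may evaluate (4.24) numerically. The following Python sketch is illustrative; the choice N = 10,000 is an assumption made here to suggest the limiting behavior, which for the case k = n, k′ = n′ tends to (n + 1)/(n + n′ + 1) as N grows:

```python
# Numerical evaluation of (4.24) with p_j = j/N and uniform prior 1/N.
def succession(n, n_prime, N=10_000):
    num = sum((j / N) ** (n + n_prime) for j in range(1, N + 1))
    den = sum((j / N) ** n for j in range(1, N + 1))
    return num / den

print(succession(10, 1))        # close to 11/12 = 0.9167
print((10 + 1) / (10 + 1 + 1))  # the limiting value
```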
THEORETICAL EXERCISES
4.1. An urn contains M balls, of which M_W are white (where M_W ≤ M). Let
a sample of size m (where m ≤ M) be drawn from the urn with replace-
ment [without replacement] and deposited in an empty urn. Let a sample
of size n (where n ≤ m) be drawn from the second urn without replace-
ment. Show that for k = 0, 1, ..., n the probability that the second sample
will contain exactly k white balls continues to be given by (3.2) [(3.1)] of
Chapter 2. The result shows that, as one might expect, drawing a sample of
size n from a sample of larger size is statistically equivalent to drawing a
sample of size n from the urn. An alternate statement of this theorem, and
an outline of the proof, is given in theoretical exercise 4.1 of Chapter 4.
4.2. Consider a box containing N radio tubes selected at random from the
output of a machine; the probability p that an item produced by the
machine is defective is known.
(i) Let k ≤ n ≤ N be integers. Show that the probability that n tubes
selected at random from the box will have k defectives is given by
\binom{n}{k} p^k q^{n-k}.
(ii) Suppose that m tubes are selected at random from the box and found
to be defective. Show that the probability that n tubes selected at random
from the remaining N - m tubes in the box will contain k defectives is
equal to \binom{n}{k} p^k q^{n-k}.
(iii) Suppose that m + n tubes are selected at random from the box and
tested. You are informed that at least m of the tubes are defective; show
that the probability that exactly m + k tubes are defective, where k is an
integer from 0 to n, is given by (3.13). Express in words the conclusions
implied by this exercise.
p_n = a p_{n-1} + b, \qquad n = 2, 3, \cdots
EXERCISES
4.1. Urn I contains 5 white and 7 black balls. Urn II contains 4 white and 2
black balls. Find the probability of drawing a white ball if (i) 1 urn is
selected at random, and a ball is drawn from it, (ii) the 2 urns are emptied
into a third urn from which 1 ball is drawn.
4.2. Urn I contains 5 white and 7 black balls. Urn II contains 4 white and 2
black balls. An urn is selected at random, and a ball is drawn from it.
Given that the ball drawn is white, what is the probability that urn I
was chosen?
4.3. A man draws a ball from an urn containing 4 white and 2 red balls. If the
ball is white, he does not return it to the urn; if the ball is red, he does
return it. He draws another ball. Let A be the event that the first ball
drawn is white, and let B be the event that the second ball drawn is white.
Answer each of the following statements, true or false: (i) P[A] = 2/3,
(ii) P[B] = 28/45, (iii) P[B | A] = 3/5, (iv) P[A | B] = 9/14, (v) the events A and
B are mutually exclusive, (vi) the events A and B are independent.
4.4. From an urn containing 6 white and 4 black balls, 5 balls are transferred
into an empty second urn. From it 3 balls are transferred into an empty
box. One ball is drawn from the box; it turns out to be white. What is
the probability that exactly 4 of the balls transferred from the first to the
second urn will be white?
4.5. Consider an urn containing 12 balls, of which 8 are white. Let a sample
of size 4 be drawn with replacement (without replacement). Next, let a
ball be selected randomly from the sample of size 4. Find the probability
that it will be white.
4.6. Urn I contains 6 white and 4 black balls. Urn II contains 2 white and 2
black balls. From urn I, 2 balls are transferred to urn II. A sample of
size 2 is then drawn without replacement from urn II. What is the
probability that the sample will contain exactly 1 white ball?
4.7. Consider a box containing 5 radio tubes selected at random from the
output of a machine, which is known to be 20% defective on the average
(that is, the probability that an item produced by the machine will be
defective is 0.2). Suppose that 2 tubes are selected at random from the
box and tested. You are informed that at least 1 of the tubes selected is
defective; what is the probability that both tubes will be defective?
4.8. Let the events A and C be defined as in example 4E. Let P[A | C] =
P[A^c | C^c] = R and P[C] = 0.005. What value must R have in order that
P[C | A] = 0.95? Interpret your answer.
4.13. A male rat is either doubly dominant (AA) or heterozygous (Aa); by
Mendelian properties, the probability of either being true is 1/2. The
male rat is bred to a doubly recessive (aa) female. If the male rat is
doubly dominant, the offspring will exhibit the dominant characteristic;
if heterozygous, the offspring will exhibit the dominant characteristic 1/2
of the time and the recessive characteristic 1/2 of the time. Suppose all of
3 offspring exhibit the dominant characteristic. What is the probability
that the male is doubly dominant?
4.14. Consider an urn that contains 5 white and 7 black balls. A ball is drawn
and its color is noted. It is then replaced; in addition, 3 balls of the color
drawn are added to the urn. A ball is then drawn from the urn. Find the
probability that (i) the second ball drawn will be black, (ii) both balls
drawn will be black.
4.15. Consider a sample of size 3 drawn in the following manner. One starts
with an urn containing 5 white and 7 red balls. At each trial a ball is
drawn and its color is noted. The ball drawn is then returned to the urn,
together with an additional ball of the same color. Find the probability
that the sample will contain exactly (i) 0 white balls, (ii) 1 white ball,
(iii) 3 white balls.
4.16. A certain kind of nuclear particle splits into 0, 1, or 2 new particles (which
we call offspring) with probabilities 1/4, 1/2, and 1/4, respectively, and then dies.
The individual particles act independently of each other. Given a particle,
let X_1 denote the number of its offspring, let X_2 denote the number of
offspring of its offspring, and let X_3 denote the number of offspring of
the offspring of its offspring.
(i) Find the probability that X_2 > 0.
(ii) Find the conditional probability that X_1 = 1, given that X_2 = 1.
(iii) Find the probability that X_3 = 0.
4.17. A number, denoted by X_1, is chosen at random from the set of integers
{1, 2, 3, 4}. A second number, denoted by X_2, is chosen at random from
the set {1, 2, ..., X_1}.
(i) For each integer k, 1 to 4, find the conditional probability that
X_2 = 1, given that X_1 = k.
(ii) Find the probability that X_2 = 1.
(iii) Find the conditional probability that X_1 = 2, given that X_2 = 1.
are independent of k. We then say that the trials are Markov dependent
repeated Bernoulli trials.
By using theoretical exercise 4.5, it follows from (5.6) and (5.9) that for
k = 1, 2, ..., n

(5.10)  p_k(s) = \left[ p_1(s) - \frac{1 - p(f,f)}{2 - p(s,s) - p(f,f)} \right] [p(s,s) + p(f,f) - 1]^{k-1} + \frac{1 - p(f,f)}{2 - p(s,s) - p(f,f)},

(5.11)  p_k(f) = \left[ p_1(f) - \frac{1 - p(s,s)}{2 - p(s,s) - p(f,f)} \right] [p(s,s) + p(f,f) - 1]^{k-1} + \frac{1 - p(s,s)}{2 - p(s,s) - p(f,f)}.

It is readily verified that the expressions in (5.10) and (5.11) sum to one,
as they ought.
In many problems involving Markov dependent repeated Bernoulli
trials we do not know the probability p_1(s) of success at the first trial. We
can only compute the quantities

(5.12)
p_k(s, s) = conditional probability of success at the (k + 1)st trial, given success at the first trial,
p_k(s, f) = conditional probability of failure at the (k + 1)st trial, given success at the first trial,
p_k(f, f) = conditional probability of failure at the (k + 1)st trial, given failure at the first trial,
p_k(f, s) = conditional probability of success at the (k + 1)st trial, given failure at the first trial.

Since

(5.13)  p_k(s, s) + p_k(s, f) = 1, \qquad p_k(f, f) + p_k(f, s) = 1,

it suffices to obtain formulas for p_k(s, s) and p_k(f, f).
In the same way that we obtained (5.6) we obtain

(5.16)  p_k(s, s) = \left[ 1 - \frac{1 - p(f,f)}{2 - p(s,s) - p(f,f)} \right] [p(s,s) + p(f,f) - 1]^k + \frac{1 - p(f,f)}{2 - p(s,s) - p(f,f)}.

By interchanging the roles of s and f, we obtain, similarly,

(5.17)  p_k(f, f) = \frac{1 - p(f,f)}{2 - p(s,s) - p(f,f)} [p(s,s) + p(f,f) - 1]^k + \frac{1 - p(s,s)}{2 - p(s,s) - p(f,f)}.

By using (5.13), we obtain

(5.18)  p_k(s, f) = -\frac{1 - p(s,s)}{2 - p(s,s) - p(f,f)} [p(s,s) + p(f,f) - 1]^k + \frac{1 - p(s,s)}{2 - p(s,s) - p(f,f)},

(5.19)  p_k(f, s) = -\frac{1 - p(f,f)}{2 - p(s,s) - p(f,f)} [p(s,s) + p(f,f) - 1]^k + \frac{1 - p(f,f)}{2 - p(s,s) - p(f,f)}.

Equations (5.16) to (5.19) represent the basic conclusions in the theory of
Markov dependent Bernoulli trials (in the case in which (5.9) holds).
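As a numerical check of these closed forms, the following Python sketch (the values of p(s, s) and p(f, f) are arbitrary assumptions made here) compares (5.16) with direct iteration of the underlying two-state recursion:

```python
# Compare the closed form (5.16) with direct iteration.
ps, pf = 0.7, 0.6        # illustrative values of p(s,s) and p(f,f)
D = 2 - ps - pf
r = ps + pf - 1          # the geometric ratio

def pk_ss(k):            # closed form (5.16)
    return (1 - (1 - pf) / D) * r ** k + (1 - pf) / D

p = 1.0  # at k = 0 we are at the first trial itself, a success
for k in range(1, 6):
    p = p * ps + (1 - p) * (1 - pf)  # success via s->s or f->s
    print(k, round(p, 6), round(pk_ss(k), 6))  # the two columns agree
```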
► Example 5B. Consider a communications system which transmits the
digits 0 and 1. Each digit transmitted must pass through several stages,
at each of which there is a probability p that the digit that enters will be
unchanged when it leaves. Suppose that the system consists of three
stages. What is the probability that a digit entering the system as 0 will be
(i) transmitted by the third stage as 0, (ii) transmitted by each stage as 0
(that is, never changed from 0)? Evaluate these probabilities for p = 1/3.
Solution: In observing the passage of the digit through the communica-
tions system, we are observing a 4-tuple (z_1, z_2, z_3, z_4), whose first com-
ponent z_1 is 1 or 0, depending on whether the digit entering the system is
1 or 0. For i = 2, 3, 4 the component z_i is equal to 1 or 0, depending on
whether the digit leaving the ith stage is 1 or 0. We now use the foregoing
formulas, identifying s with 1, say, and f with 0. Our basic assumption is
that

(5.20)  P(0, 0) = P(1, 1) = p.

The probability that a digit entering the system as 0 will be transmitted
by the third stage as 0 is given by

(5.21)  P_3(0, 0) = \frac{1 - P(0,0)}{2 - P(0,0) - P(1,1)} [P(0,0) + P(1,1) - 1]^3 + \frac{1 - P(1,1)}{2 - P(0,0) - P(1,1)}
= \frac{1 - p}{2 - 2p}(2p - 1)^3 + \frac{1 - p}{2 - 2p}
= \tfrac{1}{2}[1 + (2p - 1)^3].
If p = 1/3, then P_3(0, 0) = \tfrac{1}{2}[1 - (\tfrac{1}{3})^3] = 13/27. The probability that a digit
leaving the third stage is 0 is given by

(5.23)  [p_1(0) - \tfrac{1}{2}](2p - 1)^3 + \tfrac{1}{2} = (\tfrac{1}{3} - \tfrac{1}{2})(-\tfrac{1}{3})^3 + \tfrac{1}{2} = \tfrac{41}{81}.

From (5.21) and (5.23) it follows that the conditional probability that a
digit leaving the system as 0 entered the system as 0 is given by

\frac{\tfrac{1}{3} \cdot \tfrac{13}{27}}{\tfrac{41}{81}} = \tfrac{13}{41}.

P(0, 0) = P(1, 1) = 1/3.

The persons A, B, C, and D can be regarded as forming a communications
system. We are seeking the conditional probability that the digit entering
the system was 0 (which is equivalent to D being truthful), given that the
digit transmitted by the third stage was 0 (if A affirms that B denies that C
declares that D is a liar, then A is asserting that D is truthful). In view
of example 5C, the required probability is 13/41. ◄
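The computation above can be verified with a transition matrix. A Python sketch, assuming (as in the text) that each stage transmits a digit unchanged with probability 1/3:

```python
# Check of examples 5B/5C: three stages, truth probability 1/3.
from fractions import Fraction

third = Fraction(1, 3)
P = [[third, 1 - third],   # P(0,0), P(0,1)
     [1 - third, third]]   # P(1,0), P(1,1)

def matpow(M, n):
    R = [[1, 0], [0, 1]]
    for _ in range(n):
        R = [[sum(R[i][k] * M[k][j] for k in range(2)) for j in range(2)]
             for i in range(2)]
    return R

P3 = matpow(P, 3)
print(P3[0][0])  # 13/27, the three-stage probability P_3(0, 0)
p0 = third       # prior probability that the entering digit is 0
posterior = p0 * P3[0][0] / (p0 * P3[0][0] + (1 - p0) * P3[1][0])
print(posterior)  # 13/41
```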
Statistical Equilibrium. For large values of k the values of p_k(s) and
p_k(f) are approximately given by

(5.24)  p_k(s) \approx \frac{1 - p(f,f)}{2 - p(s,s) - p(f,f)}, \qquad p_k(f) \approx \frac{1 - p(s,s)}{2 - p(s,s) - p(f,f)}.

To justify (5.24), use (5.10) and (5.11) and the fact that (5.9) implies that

\lim_{k \to \infty} [p(s,s) + p(f,f) - 1]^{k-1} = 0.
EXERCISES
5.10. Suppose that people in a certain group may be classified into 2 categories
(say, city dwellers and country dwellers; Republicans and Democrats;
Easterners and Westerners; skilled and unskilled workers, and so on).
Let us consider a group of engineers, some of whom are Easterners and
some of whom are Westerners. Suppose that each person has a certain
probability of changing his status: The probability that an Easterner will
become a Westerner is 0.04, whereas the probability that a Westerner will
become an Easterner is 0.01. In the long run, what proportion of the
group (i) will be Easterners, (ii) will be Westerners, (iii) will move from East to
West in a given year, (iv) will move from West to East in a given year?
Comment on your answers.
6. MARKOV CHAINS
(6.1)
recall that A_j^{(n)} is the event that at time n the system is in state j.
One may similarly prove

(6.5)  P_m(i, j) = \sum_{k=1}^{r} P(i, k) P_{m-1}(k, j).
\[ A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1r} \\ a_{21} & a_{22} & \cdots & a_{2r} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mr} \end{bmatrix} \]

(6.11)
\[ P_m = \begin{bmatrix} P_m(1,1) & P_m(1,2) & \cdots & P_m(1,r) \\ P_m(2,1) & P_m(2,2) & \cdots & P_m(2,r) \\ \vdots & \vdots & & \vdots \\ P_m(r,1) & P_m(r,2) & \cdots & P_m(r,r) \end{bmatrix} \]

If

\[ P = \begin{bmatrix} 0 & \tfrac{1}{3} & \tfrac{2}{3} \\ \tfrac{2}{3} & 0 & \tfrac{1}{3} \\ \tfrac{1}{3} & \tfrac{2}{3} & 0 \end{bmatrix}, \]

then the chain consists of three states, since P is a 3 × 3 matrix, and the
2-step and 3-step transition probability matrices are given by

\[ P_2 = P \cdot P = \begin{bmatrix} \tfrac{4}{9} & \tfrac{4}{9} & \tfrac{1}{9} \\ \tfrac{1}{9} & \tfrac{4}{9} & \tfrac{4}{9} \\ \tfrac{4}{9} & \tfrac{1}{9} & \tfrac{4}{9} \end{bmatrix}, \qquad P_3 = P_2 \cdot P = \begin{bmatrix} \tfrac{1}{3} & \tfrac{2}{9} & \tfrac{4}{9} \\ \tfrac{4}{9} & \tfrac{1}{3} & \tfrac{2}{9} \\ \tfrac{2}{9} & \tfrac{4}{9} & \tfrac{1}{3} \end{bmatrix}. \]
In words, a Markov chain is ergodic if, as m tends to ∞, the m-step
transition probabilities P_m(i, j) tend to a limit that depends only on the
final state j and not on the initial state i. If a Markov chain is ergodic,
then after a large number of trials it achieves statistical equilibrium in the
sense that the unconditional probabilities P_n(j) tend to limits

(6.14)  \lim_{n \to \infty} P_n(j) = \pi_j,

which are the same, no matter what the values of the initial unconditional
probabilities P_1(j). To see that (6.13) implies (6.14), take the limit of both
sides of (6.7) and use the fact that \sum_{k=1}^{r} P_1(k) = 1.
(6.18)  \pi_j \ge 0 \text{ for } j = 1, 2, \cdots, r, \qquad \sum_{j=1}^{r} \pi_j = 1;

(6.19)  \pi_j = \sum_{k=1}^{r} \pi_k P(k, j).

One may verify that \pi_1 = \pi_2 = \pi_3 = 1/3 is a solution of (6.19) satisfying
(6.18). In the long run, the states 1, 2, and 3 are equally likely to be the
state of the Markov chain. ◄
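Equations (6.18) and (6.19) say that the stationary vector is a left eigenvector of P for the eigenvalue 1, normalized to sum to one. A sketch for the three-state chain above, assuming numpy is available:

```python
# Stationary probabilities as a normalized left eigenvector of P.
import numpy as np

P = np.array([[0, 1/3, 2/3],
              [2/3, 0, 1/3],
              [1/3, 2/3, 0]])
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1))])
pi /= pi.sum()
print(pi)  # [1/3, 1/3, 1/3]: each state equally likely in the long run
```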
A matrix

\[ A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1r} \\ a_{21} & a_{22} & \cdots & a_{2r} \\ \vdots & \vdots & & \vdots \end{bmatrix} \]

(6.23)
\[ P = \begin{bmatrix} q & p & 0 & 0 & 0 & 0 \\ q & 0 & p & 0 & 0 & 0 \\ 0 & q & 0 & p & 0 & 0 \\ 0 & 0 & q & 0 & p & 0 \\ 0 & 0 & 0 & q & 0 & p \\ 0 & 0 & 0 & 0 & q & p \end{bmatrix} \]
0< P(O, l)P(I, 2)P(2, 3)P(3, 4)P(4, 5)P(5, 4)P(4, 3)P(3, 2)P(2, l)P(l, 0).
142 INDEPENDENCE AND DEPENDENCE CH. 3
The chain is ergodic, since P(O, 0) > 0. To find the stationary probabilities
7To, 7TV . • • , 7T S ' we solve the system of equations:
(6.24)
\pi_0 = q\pi_0 + q\pi_1
\pi_1 = p\pi_0 + q\pi_2
\pi_2 = p\pi_1 + q\pi_3
\pi_3 = p\pi_2 + q\pi_4
\pi_4 = p\pi_3 + q\pi_5
\pi_5 = p\pi_4 + p\pi_5.
We solve these equations by successive substitution.
From the first equation we obtain

p\pi_0 = q\pi_1, \quad \text{or} \quad \pi_1 = \frac{p}{q}\pi_0.

By subtracting this result from the second equation in (6.24), we obtain

p\pi_1 = q\pi_2, \quad \text{or} \quad \pi_2 = \left(\frac{p}{q}\right)^2 \pi_0.

Similarly, we obtain

\pi_3 = \left(\frac{p}{q}\right)^3 \pi_0, \qquad \pi_4 = \left(\frac{p}{q}\right)^4 \pi_0, \qquad \pi_5 = \left(\frac{p}{q}\right)^5 \pi_0.
u_0(3) - u_0(2) = \frac{q}{p}[u_0(2) - u_0(1)] = \left(\frac{q}{p}\right)^2 c,
u_0(4) - u_0(3) = \frac{q}{p}[u_0(3) - u_0(2)] = \left(\frac{q}{p}\right)^3 c,
u_0(5) - u_0(4) = \frac{q}{p}[u_0(4) - u_0(3)] = \left(\frac{q}{p}\right)^4 c.

Therefore, there is a constant c such that (since u_0(0) = 1)

u_0(1) = 1 + c
u_0(2) = \frac{q}{p}c + 1 + c
u_0(3) = \left(\frac{q}{p}\right)^2 c + \frac{q}{p}c + 1 + c,

and so on. Since u_0(5) = 0,

0 = 1 + 5c \quad \text{if } p = q = \tfrac{1}{2}
0 = 1 + c\,\frac{1 - (q/p)^5}{1 - (q/p)} \quad \text{if } p \ne q,

so that

c = -\tfrac{1}{5} \quad \text{if } p = q = \tfrac{1}{2}
c = -\frac{1 - (q/p)}{1 - (q/p)^5} \quad \text{if } p \ne q.

Consequently,

(6.30)  u_0(i) = 1 - \frac{i}{5} \quad \text{if } p = q = \tfrac{1}{2}
= 1 - \frac{1 - (q/p)^i}{1 - (q/p)^5} \quad \text{if } p \ne q.

In particular,

u_0(3) = \tfrac{2}{5} \quad \text{if } p = q = \tfrac{1}{2}
= \frac{(q/p)^3 - (q/p)^5}{1 - (q/p)^5} \quad \text{if } p \ne q.
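Formula (6.30) is easily programmed; a minimal Python sketch (the function name is ours):

```python
# Ruin probabilities u_0(i) of (6.30) for the 5-unit game above.
def ruin(i, p, total=5):
    q = 1 - p
    if p == q:  # the fair case p = q = 1/2
        return 1 - i / total
    r = q / p
    return (r ** i - r ** total) / (1 - r ** total)

print(ruin(3, 0.5))  # 0.4, i.e. 1 - 3/5
print(ruin(3, 0.6))  # the biased case, q/p = 2/3
```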
EXERCISES
6.1. Compute the 2-step and 3-step transition probability matrices for the
Markov chains whose transition probability matrices are given in (i)–(iv).
6.2. For each Markov chain in exercise 6.1, determine whether or not (i) it is
ergodic, (ii) it has absorbing states.
6.3. Find the stationary probabilities for each of the following ergodic Markov
chains:
6.4. Find the stationary probabilities for each of the following ergodic Markov
chains:
6.5. Consider a series of independent repeated tosses of a coin that has proba-
bility p > 0 of falling heads. Let us say that at time n we are in state
s_1, s_2, s_3, or s_4, depending on whether the outcomes of tosses n - 1 and n were
(H, H), (H, T), (T, H), or (T, T). Find the transition probability matrix
P of this Markov chain. Also find P^2, P^3, P^4.
6.6. Random walk with retaining barriers. Consider a straight line on which
positions 0, 1, 2, ... , 7 are marked off. Consider a man who performs a
random walk among the positions according to the following transition
probability matrix:
\[ P = \begin{bmatrix} q & p & 0 & 0 & 0 & 0 & 0 & 0 \\ q & 0 & p & 0 & 0 & 0 & 0 & 0 \\ 0 & q & 0 & p & 0 & 0 & 0 & 0 \\ 0 & 0 & q & 0 & p & 0 & 0 & 0 \\ 0 & 0 & 0 & q & 0 & p & 0 & 0 \\ 0 & 0 & 0 & 0 & q & 0 & p & 0 \\ 0 & 0 & 0 & 0 & 0 & q & 0 & p \\ 0 & 0 & 0 & 0 & 0 & 0 & q & p \end{bmatrix} \]
Prove that the Markov chain is ergodic. Find the stationary probabilities.
6.7. Gambler's ruin. Let two players A and B have 7 cents between them.
Let A toss a coin, which has probability p of falling heads. On each toss
he wins a penny if the coin falls heads and loses a penny if the coin falls
tails. If A's initial fortune is 3 cents, what is the probability that A's fortune
will reach 0 cents before it reaches 7 cents, that is, that A will be ruined?
6.8. Consider 2 urns, I and II, each of which contains 1 white and 1 red ball.
One ball is drawn simultaneously from each urn and placed in the other
urn. Let the probabilities that after n repetitions of this procedure urn I
will contain 2 white balls, 1 white and 1 red, or 2 red balls be denoted
by p_n, q_n, and r_n, respectively. Deduce formulas expressing p_{n+1}, q_{n+1}, and
r_{n+1} in terms of p_n, q_n, and r_n. Show that p_n, q_n, and r_n tend to limiting
values as n tends to infinity. Interpret these values.
6.9. In exercise 6.8 find the most probable number of red balls in urn I after
(i) 2, (ii) 6 exchanges.
CHAPTER 4
Numerical-Valued
Random Phenomena
(iii) To ℱ belongs the union \bigcup_{n=1}^{\infty} A_n of any sequence of sets A_1, A_2, ...,
A_n, ... belonging to ℱ.
If we desire to give a precise definition of the notion of an event at this
stage in our discussion, we may do so as follows. There exists a smallest
family of sets on the real line with the properties (i), (ii), and (iii). This
family is denoted by ℬ, and any member of ℬ is called a Borel set, after
the great French mathematician and probabilist Émile Borel. Since ℬ is
the smallest family to possess properties (i), (ii), and (iii), it follows that ℬ
is contained in ℱ, the family of probabilizable sets. Thus every Borel set
is probabilizable. Since the needs of mathematical rigor are fully met by
restricting our discussion to Borel sets, in this book, by an "event"
concerning a numerical-valued random phenomenon, we mean a Borel set of
real numbers.
We sum up the discussion of this section in a formal definition.
A numerical-valued random phenomenon is a random phenomenon whose
sample description space is the set R (of all real numbers from -∞ to ∞)
on whose subsets is defined a function P[·], which to every Borel set of real
numbers (also called an event) E assigns a nonnegative real number,
denoted by P[E], according to the following axioms:
AXIOM 1. P[E] ≥ 0 for every event E.
AXIOM 2. P[R] = 1.
AXIOM 3. For any sequence of events E_1, E_2, ..., E_n, ... which is
mutually exclusive,

P\left[\bigcup_{n=1}^{\infty} E_n\right] = \sum_{n=1}^{\infty} P[E_n].
► Example 1A. Consider the random phenomenon that consists in
observing the time one has to wait for a bus at a certain downtown bus
stop. Let A be the event that one has to wait between 0 and 2 minutes,
inclusive, and let B be the event that one has to wait between 1 and 3
minutes, inclusive. Assume that P[A] = 1/2, P[B] = 1/2, P[AB] = 1/4. We can
now answer all the usual questions about the events A and B. The con-
ditional probability P[B | A] that B has occurred, given that A has occurred,
is 1/2. The probability that neither the event A nor the event B has occurred
is given by P[A^c B^c] = 1 - P[A ∪ B] = 1 - P[A] - P[B] + P[AB] = 1/4. ◄
EXERCISE
1.1. Consider the events A and B defined in example 1A. Assuming that
P[A] = P[B] = 1/2, P[AB] = 1/4, find the probability, for k = 0, 1, 2, that
(i) exactly k, (ii) at least k, (iii) no more than k of the events A and B will
occur.
Fig. 2A. Graphs of the probability density functions given in the exercises indicated.
It is necessary that f(·) satisfy (2.2); in words, the integral of f(·) from -∞
to ∞ must be equal to 1.
A function f(·) is said to be a probability density function if it satisfies (2.2)
and, in addition,* satisfies the condition

(2.3)  f(x) ≥ 0 for all x in R,

since a function f(·) satisfying (2.2) and (2.3) is the probability density
function of a unique probability function P[·], namely the probability
function with value P[E] at any event E given by (2.1). Some typical
probability density functions are illustrated in Fig. 2A.
► Example 2A. Verifying that a function is a probability density function.
Suppose one is told that the time one has to wait for a bus on a certain
street corner is a numerical-valued random phenomenon, with a probability
function specified by the probability density function f(·), given by

(2.4)  f(x) = 4x - 2x^2 - 1 for 0 < x < 2
= 0 otherwise.

The function f(·) is negative for various values of x; in particular, it is
negative for 0 < x < 1/4 (prove this statement). Consequently, it is not
possible for f(·) to be a probability density function. Next, suppose that
the probability density function f(·) is given by

(2.5)  f(x) = 4x - 2x^2 for 0 < x < 2
= 0 otherwise.

The function f(·), given by (2.5), is nonnegative (prove this statement).
However, its integral from -∞ to ∞,

\int_{-\infty}^{\infty} f(x)\,dx = \frac{8}{3},

is not equal to 1. Consequently, the function f(·), given by (2.5), is not a
probability density function. However, the function f(·), given by

f(x) = \tfrac{3}{8}(4x - 2x^2) for 0 < x < 2
= 0 otherwise,

is a probability density function. ◄
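The normalization step of example 2A can be confirmed numerically; a sketch assuming scipy is available:

```python
# Check that (3/8)(4x - 2x^2) integrates to one over (0, 2).
from scipy.integrate import quad

total, _ = quad(lambda x: 4 * x - 2 * x ** 2, 0, 2)
print(total)  # 8/3, so the constant 3/8 renormalizes it
print(quad(lambda x: (3 / 8) * (4 * x - 2 * x ** 2), 0, 2)[0])  # 1.0
```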
► Example 2B. Computing probabilities from a probability density
function. Let us consider again the numerical-valued random phenomenon,
discussed in example 1A, that consists in observing the time one has to
wait for a bus at a certain bus stop. Let us assume that the probability
function P[·] of this phenomenon may be expressed by (2.1) in terms of the
function f(·), whose graph is sketched in Fig. 2B. An algebraic formula for
f(·) can be written as follows:

f(x) = \tfrac{1}{4} for 0 ≤ x ≤ 4
= 0 otherwise.

From (2.1) it follows that if A = {x: 0 ≤ x ≤ 2} and B = {x: 1 ≤ x ≤ 3},
then

P[A] = \int_0^2 f(x)\,dx = \tfrac{1}{2}, \qquad P[B] = \int_1^3 f(x)\,dx = \tfrac{1}{2}, \qquad P[AB] = \int_1^2 f(x)\,dx = \tfrac{1}{4},

which agree with the values assumed in example 1A. ◄
► Example 2C. The lifetime of a vacuum tube. Consider the numerical-
valued random phenomenon that consists in observing the total time a
vacuum tube will burn from the moment it is first put into service. Suppose
that the probability function P[·] of this phenomenon is expressed by (2.1)
in terms of the function f(·) given by

f(x) = 0 for x < 0
= \frac{1}{1000} e^{-x/1000} for x ≥ 0.

Let E be the event that the tube burns between 100 and 1000 hours,
inclusive, and let F be the event that the tube burns more than 1000 hours.
The events E and F may be represented as subsets of the real line: E =
{x: 100 ≤ x ≤ 1000} and F = {x: x > 1000}. The probabilities of E and
F are given by

P[E] = \int_{100}^{1000} f(x)\,dx = \frac{1}{1000}\int_{100}^{1000} e^{-x/1000}\,dx = -e^{-x/1000}\Big|_{100}^{1000} = e^{-0.1} - e^{-1} = 0.537,

P[F] = \int_{1000}^{\infty} f(x)\,dx = \frac{1}{1000}\int_{1000}^{\infty} e^{-x/1000}\,dx = -e^{-x/1000}\Big|_{1000}^{\infty} = e^{-1} = 0.368. ◄
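The probabilities of example 2C reduce to two exponentials; a minimal Python check:

```python
# The two probabilities of example 2C for an exponential lifetime
# with mean 1000 hours.
from math import exp

p_E = exp(-100 / 1000) - exp(-1000 / 1000)  # P[100 <= X <= 1000]
p_F = exp(-1000 / 1000)                     # P[X > 1000]
print(round(p_E, 3), round(p_F, 3))         # 0.537 0.368
```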
For many probability functions there exists a function p(·), defined for
all real numbers x, but with value p(x) equal to 0 for all x except for a
finite or countably infinite set of values of x at which p(x) is positive, such
that from p(·) the value of P[E] can be obtained for any event E by
summation:

(2.7)  P[E] = \sum_{\substack{x \in E \\ p(x) > 0}} p(x).

In order that the sum in (2.7) may be meaningful, it suffices to impose the
condition [letting E = R in (2.7)] that

(2.8)  1 = \sum_{\substack{x \in R \\ p(x) > 0}} p(x).
EXERCISES
2.2. (i) f(x) = \frac{1}{2\sqrt{x}} for 0 < x < 1
= 0 elsewhere.
(ii) f(x) = 2x for 0 < x < 1
= 0 elsewhere.
(iii) f(x) = |x| for |x| ≤ 1
= 0 elsewhere.
* The reader should note the convention used in the exercises of this book. When a
function f(·) is defined by a single analytic expression for all x in -∞ < x < ∞, the
fact that x varies between -∞ and ∞ is not explicitly indicated.
2.3. (i) f(x) = \frac{1}{\pi\sqrt{1 - x^2}} for |x| < 1
= 0 elsewhere.
(ii) f(x) = \frac{2}{\pi}\frac{1}{\sqrt{1 - x^2}} for 0 < x < 1
= 0 elsewhere.
(iii) f(x) = \frac{1}{\pi}\frac{1}{1 + x^2}.
(iv) f(x) = \frac{1}{\pi\sqrt{3}}\left(1 + \frac{x^2}{3}\right)^{-1}.
2.4. (i) f(x) = e^{-x} for x ≥ 0
= 0 for x < 0.
(ii) f(x) = \tfrac{1}{2}e^{-|x|}.
(iii) f(x) = \frac{e^x}{(1 + e^x)^2}.
(iv) f(x) = \frac{2}{\pi}\frac{e^x}{1 + e^{2x}}.
2.5. (i) f(x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}x^2}.
(ii) f(x) = \frac{1}{2\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x - 2}{2}\right)^2}.
(iii) f(x) = \frac{1}{\sqrt{2\pi x}} e^{-x/2} for x > 0
= 0 elsewhere.
(iv) f(x) = \tfrac{1}{4} x e^{-x/2} for x > 0
= 0 elsewhere.
Show that each of the functions p(·) given in exercises 2.6 and 2.7 is a proba-
bility mass function [by showing that it satisfies (2.8)], and sketch its graph.
Hint: use freely the facts developed in the appendix to this section.
2.6. (i) p(x) = \tfrac{1}{3} for x = 0
= \tfrac{2}{3} for x = 1
= 0 otherwise.
(ii) p(x) = \binom{6}{x}\left(\tfrac{2}{3}\right)^x\left(\tfrac{1}{3}\right)^{6-x} for x = 0, 1, ..., 6
= 0 otherwise.
(iii) p(x) = \tfrac{1}{3}\left(\tfrac{2}{3}\right)^{x-1} for x = 1, 2, ...
= 0 otherwise.
(iv) p(x) = e^{-2}\frac{2^x}{x!} for x = 0, 1, 2, ...
= 0 otherwise.
2.7. (i) … otherwise.
(ii) … for x = 0, 1, 2, ...; = 0 otherwise.
(iii) … for x = 0, 1, 2, 3, 4, 5, 6; = 0 otherwise.
2.8. The amount of bread (in hundreds of pounds) that a certain bakery is
able to sell in a day is found to be a numerical-valued random pheno-
menon, with a probability function specified by the probability density
function f(·), given by
f(x) = Ax for 0 ≤ x < 5
= A(10 - x) for 5 ≤ x < 10
= 0 otherwise.
(i) Find the value of A which makes f(·) a probability density function.
(ii) Graph the probability density function.
(iii) What is the probability that the number of pounds of bread that will
be sold tomorrow is (a) more than 500 pounds, (b) less than 500 pounds,
(c) between 250 and 750 pounds?
(iv) Denote, respectively, by A, B, and C the events that the number of
pounds of bread sold in a day is (a) greater than 500 pounds, (b) less than
500 pounds, (c) between 250 and 750 pounds. Find P[A | B], P[A | C].
Are A and B independent events? Are A and C independent events?
2.9. The length of time (in minutes) that a certain young lady speaks on the
telephone is found to be a random phenomenon, with a probability
function specified by the probability density function f(·), given by
f(x) = A e^{-x/5} for x > 0
= 0 otherwise.
(i) Find the value of A that makes f(·) a probability density function.
(ii) Graph the probability density function.
(iii) What is the probability that the number of minutes that the young
lady will talk on the telephone is (a) more than 10 minutes, (b) less than
5 minutes, (c) between 5 and 10 minutes?
(iv) For any real number b, let A(b) denote the event that the young lady
talks longer than b minutes. Find P[A(b)]. Show that, for a > 0 and
b > 0, P[A(a + b) | A(a)] = P[A(b)]. In words, the conditional proba-
bility that a telephone conversation will last more than a + b minutes,
given that it has lasted at least a minutes, is equal to the unconditional
probability that it will last more than b minutes.
2.10. The number of newspapers that a certain newsboy is able to sell in a day
is found to be a numerical-valued random phenomenon, with a probability
function specified by the probability mass function p(·), given by
p(x) = Ax for x = 1, 2, ..., 50
= A(100 - x) for x = 51, 52, ..., 100
= 0 otherwise.
(i) Find the value of A that makes p(·) a probability mass function.
(ii) Sketch the probability mass function.
(iii) What is the probability that the number of newspapers that will be
sold tomorrow is (a) more than 50, (b) less than 50, (c) equal to 50,
(d) between 25 and 75, inclusive, (e) an odd number?
(iv) Denote, respectively, by A, B, C, and D the events that the number
of newspapers sold in a day is (a) greater than 50, (b) less than 50, (c) equal
to 50, (d) between 25 and 75, inclusive. Find P[A | B], P[A | C], P[A | D],
P[C | D]. Are A and B independent events? Are A and D independent
events? Are C and D independent events?
2.11. The number of times that a certain piece of equipment (say, a light switch)
operates before having to be discarded is found to be a random pheno-
menon, with a probability function specified by the probability mass
function p(·), given by
p(x) = A(\tfrac{1}{2})^x for x = 0, 1, 2, ...
= 0 otherwise.
(i) Find the value of A which makes p(·) a probability mass function.
(ii) Sketch the probability mass function.
(iii) What is the probability that the number of times the equipment will
operate before having to be discarded is (a) greater than 5, (b) an even
number (regard 0 as even), (c) an odd number?
(iv) For any real number b, let A(b) denote the event that the number of
times the equipment operates is greater than or equal to b. Find
P[A(b)]. Show that, for any integers a > 0 and b > 0, P[A(a + b) | A(a)] =
P[A(b)]. Express in words the meaning of this formula.
If (2.1) and (2.7) are to be useful expressions for evaluating the proba-
bility of an event, then techniques must be available for evaluating sums
and integrals. The purpose of this appendix is to state some of the notions
and formulas with which the student should become familiar and to collect
some important formulas that the reader should learn to use, even if he
lacks the mathematical background to justify them.
To begin with, let us note the following principle. If a function is defined
by different analytic expressions over various regions, then to evaluate an
integral whose integrand is this function one must express the integral as a
sum of integrals corresponding to the different regions of definition of the
function. For example, consider the probability density function f(·)
defined by

(2.10)  f(x) = x for 0 < x < 1
= 2 - x for 1 < x < 2
= 0 elsewhere.
To prove that f(·) is a probability density function, we need to verify that
(2.2) and (2.3) are satisfied. Clearly, (2.3) holds. Next,

\int_{-\infty}^{\infty} f(x)\,dx = \int_0^1 f(x)\,dx + \int_1^2 f(x)\,dx + 0 = \frac{x^2}{2}\Big|_0^1 + \left(2x - \frac{x^2}{2}\right)\Big|_1^2 = \frac{1}{2} + \left(2 - \frac{3}{2}\right) = 1,

and (2.2) has been shown to hold. It might be noted that the function
f(·) in (2.10) can be written somewhat more concisely in terms of the
absolute value notation:

(2.11)  f(x) = 1 - |1 - x| for 0 < x < 2
= 0 otherwise.
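The splitting principle above is exactly how one would compute the integral numerically; a sketch assuming scipy is available:

```python
# Verifying (2.2) for the piecewise density (2.10) by splitting the
# integral at the breakpoint x = 1, as the principle above prescribes.
from scipy.integrate import quad

part1, _ = quad(lambda x: x, 0, 1)
part2, _ = quad(lambda x: 2 - x, 1, 2)
print(part1 + part2)  # 1.0
```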
Next, in order to check his command of the basic techniques of integra-
tion, the reader should verify that the following formulas hold:

(2.12)  \int \frac{e^x}{(1 + e^x)^2}\,dx = \frac{-1}{1 + e^x}, \qquad \int \frac{e^x}{1 + e^{2x}}\,dx = \tan^{-1} e^x = \arctan e^x, \qquad \int e^{-x - e^{-x}}\,dx = \int e^{-e^{-x}} e^{-x}\,dx = e^{-e^{-x}}.

(2.20)  \Gamma\left(\frac{n + 1}{2}\right) = \frac{1 \cdot 3 \cdot 5 \cdots (n - 1)}{2^{n/2}}\sqrt{\pi},

since

(2.21)  \Gamma(\tfrac{1}{2}) = \sqrt{\pi}.
We prove (2.21) by showing that \Gamma(\tfrac{1}{2}) is equal to another integral of
whose value we have need. In (2.15), make the change of variable x = \tfrac{1}{2}y^2,
and let t = (n + 1)/2. Then, for any integer n = 0, 1, ..., we have the
formula

(2.22)  \Gamma\left(\frac{n + 1}{2}\right) = \frac{1}{2^{(n-1)/2}} \int_0^{\infty} y^n e^{-\frac{1}{2}y^2}\,dy.

(2.24)  \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{1}{2}uy^2}\,dy = \frac{1}{\sqrt{u}}.
Equation (2.24) may be derived as follows. Let J be the value of the integral
in (2.24). Then J^2 is a product of two single integrals. By the theorem for
the evaluation of double integrals, it then follows that

(2.25)  J^2 = \frac{1}{2\pi} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \exp\left[-\tfrac{1}{2}u(x^2 + y^2)\right]\,dx\,dy.

We now evaluate the double integral in (2.25) by means of a change of
variables to polar coordinates. Then
g(x) = \sum_{k=0}^{\infty} \frac{x^k}{k!} g^{(k)}(0),

in which g^{(k)}(0) denotes the value at x = 0 of the kth derivative g^{(k)}(x) of
g(x). Letting g(x) = e^x, we obtain

(2.28)  e^x = \sum_{k=0}^{\infty} \frac{x^k}{k!} = 1 + x + \frac{x^2}{2!} + \cdots + \frac{x^n}{n!} + \cdots, \qquad -\infty < x < \infty.

(2.30)  (1 - x)^n = \sum_{k=0}^{n} (-1)^k \binom{n}{k} x^k, \qquad -\infty < x < \infty,
which is a special case of the binomial theorem. One may deduce the
binomial theorem from (2.30) by setting x = -b/a.
We obtain an important generalization of the binomial theorem by
taking g(x) = (1 - x)^t, in which t is any real number. For any real number
t and any integer k = 1, 2, ... define the binomial coefficient

\binom{t}{k} = \frac{t(t - 1)\cdots(t - k + 1)}{k!}, \qquad \binom{t}{0} = 1.

Note that for any positive number n

\binom{-n}{k} = (-1)^k \binom{n + k - 1}{k}.

By Taylor's theorem, we obtain the important formula: for all real numbers
t and -1 < x < 1,

(2.33)  (1 - x)^t = \sum_{k=0}^{\infty} (-1)^k \binom{t}{k} x^k.

(2.34)  (1 - x)^{-n} = \sum_{k=0}^{\infty} \binom{n + k - 1}{k} x^k, \qquad |x| < 1.
THEORETICAL EXERCISES
2.1. Show that for any positive real numbers α, β, and t

(2.38)  \cdots = \left(1 + \frac{t}{\alpha}\right)^{-\beta}.
2.2. Show for any a > 0 and n = 1, 2, ...

(2.39)  2\int_0^{\infty} y^n e^{-\frac{1}{2}(y/a)^2}\,dy = (2a^2)^{(n+1)/2}\,\Gamma\left(\frac{n + 1}{2}\right).
2.3. The integral

(2.40)  B(m, n) = \int_0^1 x^{m-1}(1 - x)^{n-1}\,dx,

which converges if m and n are positive, defines a function of m and n,
called the beta function. Show that the beta function is symmetrical in its
arguments, B(m, n) = B(n, m), and may be expressed [letting x = \sin^2\theta
and x = 1/(1 + y), respectively] by

(2.41)  B(m, n) = 2\int_0^{\pi/2} \sin^{2m-1}\theta\,\cos^{2n-1}\theta\,d\theta = \int_0^{\infty} \frac{y^{n-1}}{(1 + y)^{m+n}}\,dy.

Show finally that the beta and gamma functions are connected by the
relation

(2.42)  B(m, n) = \frac{\Gamma(m)\Gamma(n)}{\Gamma(m + n)}.
Hint: By changing to polar coordinates, we have

\Gamma(m)\Gamma(n) = 4\int_0^{\infty}\int_0^{\infty} x^{2m-1}e^{-x^2}\,y^{2n-1}e^{-y^2}\,dx\,dy
= 4\int_0^{\pi/2} \cos^{2m-1}\theta\,\sin^{2n-1}\theta\,d\theta \int_0^{\infty} e^{-r^2} r^{2m+2n-1}\,dr.
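The relation (2.42) can be spot-checked numerically; a sketch assuming scipy is available:

```python
# Spot-check of B(m, n) = Gamma(m)Gamma(n)/Gamma(m + n), relation (2.42).
from scipy.special import beta, gamma

for m, n in [(1, 1), (2.5, 3.0), (0.5, 0.5)]:
    print(beta(m, n), gamma(m) * gamma(n) / gamma(m + n))  # pairs agree
```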
2.6. Prove that the integral defining the beta function converges for any real
numbers m and n such that m > 0 and n > 0.
2.7. Taylor's theorem with remainder. Show that if the function g(·) has a
continuous nth derivative in some interval containing the origin, then for
x in this interval

(2.43)  g(x) = g(0) + xg'(0) + \frac{x^2}{2!}g''(0) + \cdots + \frac{x^{n-1}}{(n - 1)!}g^{(n-1)}(0) + \frac{x^n}{(n - 1)!}\int_0^1 (1 - t)^{n-1} g^{(n)}(xt)\,dt.

Hint: Show by integration by parts that

\frac{x^k}{(k - 1)!}\int_0^1 g^{(k)}(xt)(1 - t)^{k-1}\,dt = \frac{x^{k-1}}{(k - 2)!}\int_0^1 g^{(k-1)}(xt)(1 - t)^{k-2}\,dt - \frac{x^{k-1}}{(k - 1)!}g^{(k-1)}(0).

Show also that

(2.44)  \int_0^1 g^{(n)}(xt)(1 - t)^{n-1}\,dt = \frac{1}{n}g^{(n)}(\theta x)

for some number θ in the interval 0 < θ < 1.
3. DISTRIBUTION FUNCTIONS
Equation (3.2) follows immediately from (3.1) and (2.7). If the probability
function is specified by a probability density function f(·), then the corre-
sponding distribution function F(·) for any real number x is given by

(3.3)  F(x) = \int_{-\infty}^{x} f(x')\,dx'.
Fig. 3A. Graph of a discrete distribution function F(·) and of the probability mass function p(·) in terms of which F(·) is given by (3.2).
Fig. 3B. Graph of a continuous distribution function F(·) and of the probability density function f(·) in terms of which F(·) is given by (3.3).
Fig. 3C. Graph of a mixed distribution function.
its distribution function F(·) is given by (3.3). The graph y = F(x) then
appears (Fig. 3B) as an unbroken curve. The function F(·) is continuous.
However, even more is true; the derivative F'(x) exists at all points
(except perhaps for a finite number of points) and is given by

(3.4)  F'(x) = f(x).

There exist distribution functions that are neither discrete nor continuous.
Such distribution functions are called mixed. A distribution function F(·) is
called mixed if it can be written as a linear combination of two distribution
functions, denoted by F^d(·) and F^c(·), which are discrete and continuous,
respectively, in the following way: for any real number x

(3.5)  F(x) = c_1 F^d(x) + c_2 F^c(x),

in which c_1 and c_2 are constants between 0 and 1, whose sum is one. The
distribution function F(·), graphed in Fig. 3C, is mixed, since it is of the
form (3.5), in which F^d(·) and F^c(·) are the distribution functions
graphed in Figs. 3A and 3B, respectively.
Any numerical valued random phenomenon possesses a probability
mass function p(·), defined as follows: for any real number x

(3.6)  p(x) = P[{x}].

Thus p(x) represents the probability that the random phenomenon will
have an observed value equal to x. In terms of the representation of the
probability function as a distribution of a unit mass over the real line,
p(x) represents the mass (if any) concentrated at the point x. It may be
shown that p(x) represents the size of the jump at x in the graph of the
distribution function F(·) of the numerical valued random phenomenon.
Consequently, p(x) = 0 for all x if and only if F(·) is continuous.
We now introduce the following notation. Given a numerical valued
random phenomenon, we write X to denote the observed value of the
random phenomenon. For any real numbers a and b we write P[a ≤ X ≤ b]
to mean the probability that an observed value X of the numerical valued
random phenomenon lies in the interval a to b. It is important to keep in
mind that P[a ≤ X ≤ b] represents an informal notation for P[{x: a ≤ x ≤ b}].
Some writers on probability theory call a number X determined by
the outcome of a random experiment (as is the observed value X of a
numerical valued random phenomenon) a random variable. In Chapter 7
we give a rigorous definition of the notion of random variable in terms of
the notion of function, and show that the observed value X of a numerical
valued random phenomenon can be regarded as a random variable. For
the present we have the following definition:
A quantity X is said to be a random variable (or, equivalently, X is said
to be an observed value of a numerical valued random phenomenon) if for
every real number x there exists a probability (which we denote by P[X ≤ x])
that X is less than or equal to x.
Given an observed value X of a numerical valued random phenomenon
with distribution function F(·) and probability mass function p(·), we have
the following formulas for any real numbers a and b (in which a < b):

(3.7)  P[a < X ≤ b] = P[{x: a < x ≤ b}] = F(b) - F(a)
P[a ≤ X ≤ b] = P[{x: a ≤ x ≤ b}] = F(b) - F(a) + p(a)
P[a ≤ X < b] = P[{x: a ≤ x < b}] = F(b) - F(a) + p(a) - p(b)
P[a < X < b] = P[{x: a < x < b}] = F(b) - F(a) - p(b).

To prove (3.7), define the events A, B, C, and D:

A = {X ≤ a},  B = {X ≤ b},  C = {X = a},  D = {X = b}.

Then (3.7) merely expresses the facts that (since A ⊂ B, C ⊂ A, D ⊂ B)

(3.8)  P[BA^c] = P[B] - P[A]
P[BA^c ∪ C] = P[B] - P[A] + P[C]
P[BA^c D^c ∪ C] = P[B] - P[A] + P[C] - P[D]
P[BA^c D^c] = P[B] - P[A] - P[D].
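The four formulas (3.7) are mechanical once F and p are known. A Python sketch (the function names and the die example are ours, purely illustrative):

```python
# The four interval probabilities of (3.7), given a distribution
# function F and a jump function p.
def interval_probs(F, p, a, b):
    return {
        "a < X <= b":  F(b) - F(a),
        "a <= X <= b": F(b) - F(a) + p(a),
        "a <= X < b":  F(b) - F(a) + p(a) - p(b),
        "a < X < b":   F(b) - F(a) - p(b),
    }

# Example: X uniform on {1, ..., 6} (a fair die).
F = lambda x: max(0, min(6, int(x))) / 6
p = lambda x: 1 / 6 if x in (1, 2, 3, 4, 5, 6) else 0
print(interval_probs(F, p, 2, 5))  # 1/2, 2/3, 1/2, 1/3
```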
The use of (3.7) in solving probability problems posed in terms of
distribution functions is illustrated in example 3A.
► Example 3A. Suppose that the duration in minutes of long distance
telephone calls made from a certain city is found to be a random pheno-
menon, with a probability function specified by the distribution function
F(·), given by

(3.9)  F(x) = 0 for x < 0
= 1 - \tfrac{1}{2}e^{-x/3} - \tfrac{1}{2}e^{-[x/3]} for x ≥ 0,

in which the expression [y] is defined for any real number y ≥ 0 as the
largest integer less than or equal to y. What is the probability that the
duration in minutes of a long distance telephone call is (i) more than six
minutes, (ii) less than four minutes, (iii) equal to three minutes? What is
the conditional probability that the duration in minutes of a long distance
telephone call is (iv) less than nine minutes, given that it is more than five
minutes, (v) more than five minutes, given that it is less than nine minutes?
Solution: The distribution function given by (3.9) is neither continuous
nor discrete but mixed. Its graph is given in Fig. 3D. For the sake of
brevity, we write X for the duration in minutes of a telephone call and
P[X > 6] as an abbreviation in mathematical symbols of the verbal
statement "the probability that a telephone call has a duration strictly
greater than six minutes." The intuitive statement P[X > 6] is identified
in our model with P[{x': x' > 6}], the value at the set {x': x' > 6} of the
probability function P[·] corresponding to the distribution function F(·)
given by (3.9). Consequently,

P[X > 6] = 1 - F(6) = \tfrac{1}{2}e^{-2} + \tfrac{1}{2}e^{-[2]} = e^{-2} = 0.135.
Next, the probability that the duration of a call will be less than four minutes
(or, more concisely written, P[X < 4]) is equal to F(4) - p(4), in which
p(4) is the jump in the distribution function F(·) at x = 4. A glance at the
graph of F(·), drawn in Fig. 3D, reveals that the graph is unbroken at
x = 4. Consequently, p(4) = 0, and

P[X < 4] = 1 - \tfrac{1}{2}e^{-4/3} - \tfrac{1}{2}e^{-[4/3]} = 1 - \tfrac{1}{2}e^{-4/3} - \tfrac{1}{2}e^{-1} = 0.684.
Fig. 3D. Graph of the distribution function given by (3.9).
(iii) at any point x the limit from the right, \lim_{b \to x^+} F(b), which is defined as
… , where we define p(x) as the probability that the observed value of the
random phenomenon is equal to x. Note that p(x) represents the size of
the jump in the graph of F(x) at x.
From these facts it follows that the graph y = F(x) of a typical distribu-
tion function F(·) has as its asymptotes the lines y = 0 and y = 1. The
graph is nondecreasing. However, it need not increase at every point but
rather may be level (horizontal) over certain intervals. The graph need
not be unbroken [that is, F(·) need not be continuous] at all points, but
there is at most a countable infinity of points at which the graph has a
break; at these points it jumps upward and possesses limits from the
right and the left, satisfying (3.12) and (3.13).
The foregoing mathematical properties of the distribution function of a
numerical valued random phenomenon serve to characterize completely
such functions. It may be shown that for any function possessing the first
three properties listed there is a unique set function P[·], defined on the
Borel sets of the real line, satisfying axioms 1–3 of section 1 and the con-
dition that for any finite real numbers a and b, at which a < b,

(3.14)  P[{real numbers x: a < x ≤ b}] = F(b) - F(a).
From this fact it follows that to specify the probability function it suffices
to specify the distribution function.
The fact that a distribution function is continuous does not imply that
it may be represented in terms of a probability density function by a
formula such as (3.3). If this is the case, it is said to be absolutely continuous.
There also exists another kind of continuous distribution function, called
singular continuous, whose derivative vanishes at almost all points. This
is a somewhat difficult notion to picture, and examples have been con-
structed only by means of fairly involved analytic operations. From a
practical point of view, one may act as if singular distribution functions
do not exist, since examples of these functions are rarely, if ever, encountered
in practice. It may be shown that any distribution function may be
represented in the form

(3.15)  F(x) = c_1 F^d(x) + c_2 F^{ac}(x) + c_3 F^{sc}(x),

in which F^d(·), F^{ac}(·), and F^{sc}(·) are, respectively, discrete, absolutely
continuous, and singular continuous distribution functions, and c_1, c_2, c_3
are nonnegative constants whose sum is one.
THEORETICAL EXERCISES
3.1. Show that the probability mass function p(·) of a numerical valued random
phenomenon can be positive at no more than a countable infinity of
points. Hint: For n = 2, 3, ..., define E_n as the set of points x at which
p(x) > 1/n. The size of E_n is less than n, for if it were greater than n it
would follow that P[E_n] > 1. Thus each of the sets E_n is of finite size.
Now the set E of points x at which p(x) > 0 is equal to the union E_2 ∪
E_3 ∪ ... ∪ E_n ∪ ..., since p(x) > 0 if and only if, for some integer n,
p(x) > 1/n. The set E, being a union of a countable number of sets of
finite size, is therefore proved to have at most a countable infinity of
members.
EXERCISES
lasted more than 5 minutes, (b) less than 9 minutes, given that it has
lasted more than 15 minutes?
3.11. Suppose that the time in minutes that a man has to wait at a certain
subway station for a train is found to be a random phenomenon, with a
probability function specified by the distribution function F(·), given by

F(x) = 0 for x ≤ 0
= \tfrac{1}{2}x for 0 ≤ x ≤ 1
= \tfrac{1}{2} for 1 ≤ x ≤ 2
= \tfrac{1}{4}x for 2 ≤ x ≤ 4
= 1 for x ≥ 4.

(i) Sketch the distribution function.
(ii) Is the distribution function continuous? If so, give a formula for its
probability density function.
(iii) What is the probability that the time the man will have to wait for a
train will be (a) more than 3 minutes, (b) less than 3 minutes, (c) between
1 and 3 minutes?
(iv) What is the conditional probability that the time the man will have to
wait for a train will be (a) more than 3 minutes, given that it is more than
1 minute, (b) less than 3 minutes, given that it is more than 1 minute?
3.12. Consider a numerical valued random phenomenon with distribution
function

F(x) = 0 for x ≤ 0
= \tfrac{1}{3}x for 0 < x ≤ 1
= \tfrac{1}{3} for 1 < x ≤ 2
= \tfrac{1}{6}x for 2 < x ≤ 3
= \tfrac{1}{2} for 3 < x ≤ 4
= \tfrac{1}{8}x for 4 < x ≤ 8
= 1 for 8 < x.

What is the conditional probability that the observed value of the random
phenomenon will be between 2 and 5, given that it is between 1 and 6,
inclusive?
4. PROBABILITY LAWS
It should be recalled that [x] denotes the largest integer less than or equal
to x.
Equivalently, since the distribution function is discrete, one may
describe the phenomenon by stating its probability mass function p(·),
given by

(4.3)  p(x) = … for x = 0, 1, ..., 5
= 0 otherwise.

Equations (4.1), (4.2), and (4.3) constitute equivalent representations, or
statements, of the same concept, which we call the probability law of the
random phenomenon. This particular probability law is discrete.
We next note that probability laws may be classified into families on the
basis of similar functional form. For example, consider the function
b(·; n, p), defined for any n = 1, 2, ... and 0 < p < 1 by

b(x; n, p) = \binom{n}{x} p^x (1 - p)^{n-x} for x = 0, 1, ..., n
= 0 otherwise.

For fixed values of n and p the function b(·; n, p) is a probability mass
function and thus defines a probability law. The probability laws deter-
mined by b(·; n_1, p_1) and b(·; n_2, p_2) for two different sets of values n_1, p_1
and n_2, p_2 are different. Nevertheless, the common functional form of the
two functions b(·; n_1, p_1) and b(·; n_2, p_2) enables us to treat simultaneously
the two probability laws that they determine. We call n and p parameters,
and b(·; n, p) the probability mass function of the binomial probability
law with parameters n and p.
We next list some frequently occurring discrete probability laws, to be
followed by a list of some frequently occurring continuous probability laws.
The Bernoulli probability law with parameter p, where 0 < p < 1, is
specified by the probability mass function

(4.4)  p(x) = p if x = 1
= 1 - p = q if x = 0
= 0 otherwise.

An example of a numerical valued random phenomenon obeying the
Bernoulli probability law with parameter p is the outcome of a Bernoulli
trial in which the probability of success is p, if instead of denoting success
and failure by s and f, we denote them by 1 and 0, respectively.
The binomial probability law with parameters n and p, where n = 1,
2, ..., and 0 < p < 1, is specified by the probability mass function

(4.5)  p(x) = \binom{n}{x} p^x q^{n-x} for x = 0, 1, ..., n
= 0 otherwise.

An important example of a numerical valued random phenomenon obeying
the binomial probability law with parameters n and p is the number of
successes in n independent repeated Bernoulli trials in which the probability
of success at each trial is p.
The Poisson probability law with parameter λ, where λ > 0, is specified
by the probability mass function

(4.6)  p(x) = e^{-\lambda} \frac{\lambda^x}{x!} for x = 0, 1, 2, ...
= 0 otherwise.
In section 3 of Chapter 3 it was seen that the Poisson probability law
provides under certain conditions an approximation to the binomial
probability law. In section 3 of Chapter 6 we discuss random phenomena
that obey the Poisson probability law.
The geometric probability law with parameter p, where 0 < p < 1, is specified by the probability mass function

(4.7)  p(x) = p(1 − p)^{x−1}   for x = 1, 2, ...
            = 0                otherwise.
An important example of a numerical valued random phenomenon obeying
the geometric probability law with parameter p is the number of trials
required to obtain the first success in a sequence of independent repeated
Bernoulli trials in which the probability of success at each trial is p.
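This characterization of the geometric law may be verified empirically; the following Python sketch (ours, for illustration) simulates the number of Bernoulli trials required to obtain the first success and compares the observed relative frequencies with p(1 − p)^{x−1}:

    import random

    def trials_to_first_success(p):
        # Perform independent Bernoulli trials until the first success occurs.
        n = 1
        while random.random() >= p:
            n += 1
        return n

    p, N = 0.3, 100000
    counts = {}
    for _ in range(N):
        x = trials_to_first_success(p)
        counts[x] = counts.get(x, 0) + 1
    for x in range(1, 6):
        print(x, counts.get(x, 0) / N, p * (1 - p)**(x - 1))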
The hypergeometric probability law with parameters N, n, and p (where N may be any integer 1, 2, ..., n is an integer in the set 1, 2, ..., N, and p = 0, 1/N, 2/N, ..., 1) is specified by the probability mass function, letting q = 1 − p,

(4.8)  p(x) = (Np choose x)(Nq choose n−x) / (N choose n)   for x = 0, 1, ..., n
            = 0   otherwise.
The hypergeometric probability law may also be defined by using (2.31),
for any value of p in the interval 0 < p < 1. An example of a random
phenomenon obeying the hypergeometric probability law is given by the
number of white balls contained in a sample of size n drawn without
replacement from an urn containing N balls, of which Np are white.
The negative binomial probability law with parameters r and p, where r = 1, 2, ... and 0 < p < 1, is specified by the probability mass function, letting q = 1 − p,

(4.9)  p(x) = (r + x − 1 choose x) p^r q^x   for x = 0, 1, 2, ...
            = 0                              otherwise.
An example of a random phenomenon obeying the negative binomial probability law with parameters r and p is the number of failures encountered in a sequence of independent repeated Bernoulli trials (with probability p of success at each trial) before the rth success. Note that the number of trials required to achieve the rth success is equal to r plus the number of failures encountered before the rth success.
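The same relation may be checked by simulation; in the following Python sketch (an illustration of ours) the observed relative frequency of x failures before the rth success is compared with the probability mass function (4.9):

    import math, random

    def failures_before_rth_success(r, p):
        # Count the failures observed before the r-th success.
        successes = failures = 0
        while successes < r:
            if random.random() < p:
                successes += 1
            else:
                failures += 1
        return failures

    r, p, N = 2, 0.4, 100000
    q = 1 - p
    freq = {}
    for _ in range(N):
        x = failures_before_rth_success(r, p)
        freq[x] = freq.get(x, 0) + 1
    for x in range(5):
        print(x, freq.get(x, 0) / N, round(math.comb(r + x - 1, x) * p**r * q**x, 4))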
Some important continuous probability laws are the following.
The uniform probability law over the interval a to b, where a and b are any finite real numbers such that a < b, is specified by the probability density function

(4.10)  f(x) = 1/(b − a)   for a < x < b
             = 0           otherwise.

Examples of random phenomena obeying a uniform probability law are discussed in section 5.
The normal probability law with parameters m and σ, where −∞ < m < ∞ and σ > 0, is specified by the probability density function

(4.11)  f(x) = (1/(σ√(2π))) e^{−½((x−m)/σ)²},   −∞ < x < ∞.
The Cauchy probability law with parameters α and β, where β > 0, is specified by the probability density function

(4.14)  f(x) = 1 / (πβ[1 + ((x − α)/β)²]),   −∞ < x < ∞.
Student's distribution with parameter n = 1, 2, ... (also called Student's t-distribution with n degrees of freedom) is specified by the probability density function

(4.15)  f(x) = (1/√(nπ)) (Γ[(n + 1)/2] / Γ(n/2)) (1 + x²/n)^{−(n+1)/2},   −∞ < x < ∞.
It should be noted that Student's distribution with parameter n = 1 coincides with the Cauchy probability law with parameters α = 0 and β = 1.
The χ² distribution with parameters n = 1, 2, ... and σ > 0 is specified by the probability density function

(4.16)  f(x) = (1 / ((2σ²)^{n/2} Γ(n/2))) x^{(n/2)−1} e^{−x/2σ²}   for x > 0
             = 0   for x < 0.
The Maxwell distribution with parameter α coincides with the χ distribution with parameter n = 3 and σ = α√3/2.
The F distribution with parameters m = 1, 2, ... and n = 1, 2, ... is specified by the probability density function

(4.20)  f(x) = (Γ[(m + n)/2] / (Γ(m/2) Γ(n/2))) (m/n)^{m/2} x^{(m/2)−1} / [1 + (m/n)x]^{(m+n)/2}   for x > 0
             = 0   for x < 0.
The beta probability law with parameters a and b, in which a and b are positive real numbers, is specified by the probability density function

(4.21)  f(x) = (1/B(a, b)) x^{a−1} (1 − x)^{b−1}   for 0 < x < 1
             = 0                                   elsewhere.
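That each of these functions is indeed a probability density function (that is, integrates to 1) may be checked numerically; the following Python sketch (ours; the parameter choices a = 2, b = 3 and m = 0, σ = 1 are merely illustrative) forms Riemann sums for the beta and normal densities:

    import math

    def beta_density(x, a, b):
        # f(x) = x^(a-1) (1-x)^(b-1) / B(a,b) on 0 < x < 1, as in (4.21).
        B = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
        return x**(a - 1) * (1 - x)**(b - 1) / B

    def normal_density(x, m, sigma):
        # f(x) as in (4.11).
        return math.exp(-0.5 * ((x - m) / sigma)**2) / (sigma * math.sqrt(2 * math.pi))

    n = 100000
    print(sum(beta_density((k + 0.5) / n, 2, 3) for k in range(n)) / n)
    print(sum(normal_density(-10 + 20 * (k + 0.5) / n, 0, 1) for k in range(n)) * 20 / n)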
THEORETICAL EXERCISES
4.1. The probability law of the number of white balls in a sample drawn without replacement from an urn of random composition. Consider an urn containing N balls. Suppose that the number of white balls in the urn is a numerical valued random phenomenon obeying (i) a binomial probability law with parameters N and p, (ii) a hypergeometric probability law with parameters M, N, and p. [For example, suppose that the balls in the urn constitute a sample of size N drawn with replacement (without replacement) from a box containing M balls, of which a proportion p is white.] Let a sample of size n be drawn without replacement from the urn. Show that the number of white balls in the sample obeys either a binomial probability law with parameters n and p or a hypergeometric probability law with parameters M, n, and p, depending on whether the number of white balls in the urn obeys a binomial or a hypergeometric probability law.
Hint: Establish the conditions under which the following statements are valid:

(N choose m) = (N − k choose m − k) · (N)_k / (m)_k ;

(m choose k)(N − m choose n − k) / (N choose n) = (n choose k)(N − n choose m − k) / (N choose m) ;

Σ_{m=0}^{N} (m choose k) p(m) = Σ_{m=k}^{N} (m choose k) p(m),

where p(·) denotes the probability mass function of the number of white balls in the urn.
EXERCISES
4.1. Give formulas for, and identify, the probability law of each of the following numerical valued random phenomena:
(i) The number of defectives in a sample of size 20, chosen without replacement from a batch of 200 articles, of which 5% are defective.
(ii) The number of baby boys in a series of 30 independent births, assuming
the probability at each birth that a boy will be born is 0.51.
(iii) The minimum number of babies a woman must have in order to give
birth to a boy (ignore multiple births, assume independence, and assume
the probability at each birth that a boy will be born is 0.51).
(iv) The number of patients in a group of 35 having a certain disease who
will recover if the long-run frequency of recovery from this disease is 75%
(assume that each patient has an independent chance to recover).
In exercises 4.2-4.9 consider an urn containing 12 balls, numbered 1 to 12. Further, the balls numbered 1 to 8 are white, and the remaining balls are red. Give a formula for the probability law of the numerical valued random phenomenon described.
4.2. The number of white balls in a sample of size 6 drawn from the urn without
replacement.
4.3. The number of white balls in a sample of size 6 drawn from the urn with replacement.
4.4. The smallest number occurring on the balls in a sample of size 6, drawn from the urn without replacement (see theoretical exercise 5.1 of Chapter 2).
4.5. The second smallest number occurring in a sample of size 6, drawn from the urn without replacement.
4.6. The minimum number of balls that must be drawn, when sampling without replacement, to obtain a white ball.
4.7. The minimum number of balls that must be drawn, when sampling with
replacement, to obtain a white ball.
4.8. The minimum number of balls that must be drawn, when sampling without
replacement, to obtain 2 white balls.
4.9. The minimum number of balls that must be drawn, when sampling with
replacement, to obtain 2 white balls.
= 0 otherwise.
From (5.4) it follows that the definition of a uniform probability law given by (5.1) coincides with the definition given by (4.10). (See Fig. 5A.)
Fig. 5A. Graphs of (a) the probability density function f(·) and (b) the distribution function F(·) of a uniform probability law.
Example 5A. Waiting time for a train. Between 7 A.M. and 8 A.M. trains leave a certain station at 3, 5, 8, 10, 13, 15, 18, 20, ... minutes past the hour. What is the probability that a person arriving at the station will
have to wait less than a minute for a train, assuming that the person's time of arrival at the station obeys a uniform probability law over the interval of time (i) 7 A.M. to 8 A.M., (ii) 7:15 A.M. to 7:30 A.M., (iii) 7:02 A.M. to 7:15 A.M., (iv) 7:03 A.M. to 7:15 A.M., (v) 7:04 A.M. to 7:15 A.M.?
Solution: We must first find the set B of real numbers in which the person's arrival time must lie in order for his waiting time to be less than 1 minute. One sees that B is the set of real numbers consisting of the intervals 2 to 3, 4 to 5, 7 to 8, 9 to 10, and so on. (See Fig. 5B.) The probability that the person will wait less than a minute for a train is given by P[B], which is equal to, in the various cases, (i) 24/60 = 2/5, (ii) 6/15 = 2/5, (iii) 6/13, (iv) 5/12, (v) 5/11.
Fig. 5B. The set B (indicated by shaded intervals between 0 and 60) of arrival times, in minutes after 7 A.M., for which the waiting time for a train is less than 1 minute.
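The answer to case (i) may be confirmed by simulation; the following Python sketch (ours, for illustration) draws arrival times uniformly over the hour and estimates P[B]:

    import random

    def waits_less_than_one_minute(t):
        # Trains leave at minutes 5k + 3 and 5k + 5 past 7 A.M.
        departures = [m for k in range(13) for m in (5 * k + 3, 5 * k + 5)]
        return min(d - t for d in departures if d >= t) < 1

    N = 100000
    hits = sum(waits_less_than_one_minute(random.uniform(0, 60)) for _ in range(N))
    print(hits / N)   # should be near 24/60 = 2/5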
Example 5B. The probability law of the second digit in the decimal expansion of the square root of a randomly chosen number. A number is chosen from the interval 0 to 1 by a random mechanism that obeys a uniform probability law over the interval. What is the probability that the second decimal place of the square root of the number will be the digit 3? Will be the digit k, for k = 0, 1, ..., 9?
Solution: For k = 0, 1, ..., 9 let B_k be the set of numbers on the unit interval whose square roots have a second decimal equal to the digit k. A number x belongs to B_k if and only if √x satisfies, for some m = 0, 1, ..., 9,

m + k/10 ≤ 10√x < m + (k + 1)/10

or

(1/100)(m + k/10)² ≤ x < (1/100)(m + (k + 1)/10)²,

an interval of length

(1/100)(m + (k + 1)/10)² − (1/100)(m + k/10)² = (1/10,000)(20m + 2k + 1).
Hence the probability of the set B_k is given by

P[B_k] = (1/10,000) Σ_{m=0}^{9} (20m + 2k + 1) = 0.091 + 0.002k.

In particular, P[B₃] = 0.097.
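The formula P[B_k] = 0.091 + 0.002k may likewise be checked empirically; the following Python sketch (ours) draws numbers uniformly from the unit interval and tabulates the second decimal digit of their square roots:

    import random

    N = 200000
    counts = [0] * 10
    for _ in range(N):
        x = random.random()
        k = int(100 * x**0.5) % 10   # second decimal digit of the square root
        counts[k] += 1
    for k in range(10):
        print(k, counts[k] / N, 0.091 + 0.002 * k)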
EXERCISES
5.1. The time, measured in minutes, required by a certain man to travel from his home to a train station is a random phenomenon obeying a uniform probability law over the interval 20 to 25. If he leaves his home promptly at 7:05 A.M., what is the probability that he will catch a train that leaves the station promptly at 7:28 A.M.?
5.2. A radio station broadcasts the correct time every hour on the hour between the hours of 6 A.M. and 12 midnight. What is the probability that a listener will have to wait less than 10 minutes to hear the correct time if the time at which he tunes in is distributed uniformly over (chosen randomly from) the interval (i) 6 A.M. to 12 midnight, (ii) 8 A.M. to 6 P.M., (iii) 7:30 A.M. to 5:30 P.M., (iv) 7:30 A.M. to 5 P.M.?
5.3. The circumference of a wheel is divided into 37 arcs of equal length, which
are numbered 0 to 36 (this is the principle of construction of a roulette
wheel). The wheel is twirled. After the wheel comes to rest, the point on
the wheel located opposite a certain fixed marker is noted. Assume that
the point thus chosen obeys a uniform probability law over the circum-
ference of the wheel. What is the probability that the point thus chosen will
lie in an arc (i) with a number 1 to 10, inclusive, (ii) with an odd number,
(iii) numbered O?
5.4. A parachutist lands on the line connecting 2 towns, A and B. Suppose
that the point at which he lands obeys a uniform probability law over the
line. What is the probability that the ratio of his distance from A to his
distance from B will be (i) greater than 3, (ii) equal to 3, (iii) greater than R,
where R is a given real number?
5.5. An angle θ is chosen from the interval −π/2 to π/2 by a random mechanism that obeys a uniform probability law over the interval. A line is then drawn on an (x, y)-plane through the point (0, 1) at the angle θ with the y-axis.
What is the probability, for any positive number z, that the x-coordinate of
the point at which the line intersects the x-axis will be less than z?
5.6. A number is chosen from the interval 0 to 1 by a random mechanism that
obeys a uniform probability law over the interval. What is the probability
that (i) its first decimal will be a 3, (ii) its second decimal will be a 3, (iii) its
first 2 decimals will be 3's, (iv) any specified decimal will be a 3, (v) any
2 specified decimals will be 3's?
5.7. A number is chosen from the interval 0 to 1 by a random mechanism that
obeys a uniform probability law over the interval. What is the probability
that (i) the first decimal of its square root will be a 3, (ii) the negative of its
logarithm (to the base e) will be less than 3?
(6.2)  Φ(x) = ∫_{−∞}^{x} φ(y) dy = (1/√(2π)) ∫_{−∞}^{x} e^{−½y²} dy.
Fig. 6A. Graph of the normal density function φ(·), a symmetric bell-shaped curve with maximum value 0.399 at x = 0; 50% of the area lies between −0.67 and 0.67, 68.3% between −1 and 1, 95% between −1.96 and 1.96, and 99% between −2.58 and 2.58.
Because of their close relation to normal probability laws, φ(·) is called the normal density function and Φ(·) the normal distribution function. These functions are graphed in Figs. 6A and 6B, respectively. The graph of φ(·) is a symmetric bell-shaped curve. The graph of Φ(·) is an S-shaped curve. It suffices to know these functions for positive x in order to know them for all x, in view of the relations (see theoretical exercise 6.3)

(6.3)  φ(−x) = φ(x)
(6.4)  Φ(−x) = 1 − Φ(x).

A table of Φ(x) for positive values of x is given in Table I (see p. 441).
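In modern computing environments Φ(·) is usually obtained from the closely related error function; the following Python sketch (ours; math.erf is available in the standard library) computes φ and Φ and illustrates the relation (6.4):

    import math

    def phi(x):
        # Normal density function.
        return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

    def Phi(x):
        # Normal distribution function, expressed through the error function.
        return 0.5 * (1 + math.erf(x / math.sqrt(2)))

    for x in (0.67, 1.0, 1.96, 2.58):
        print(x, round(Phi(x), 4), round(Phi(-x), 4), round(1 - Phi(x), 4))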
The function φ(x) is positive for all x. Further, from (2.24)

(6.5)  ∫_{−∞}^{∞} φ(x) dx = 1,
Fig. 6B. Graph of the normal distribution function Φ(·), an S-shaped curve increasing from 0 to 1.
If F(·) is the distribution function of the normal probability law with parameters m and σ, then

F(x) = (1/√(2π)) ∫_{−∞}^{(x−m)/σ} e^{−½y²} dy = Φ((x − m)/σ);

then, for any real numbers a and b (finite or infinite, in which a < b),

F(b) − F(a) = Φ((b − m)/σ) − Φ((a − m)/σ).
The most widely available tables of the normal distribution are the Tables of the Normal Probability Functions, National Bureau of Standards, Applied Mathematics Series 23, Washington, 1953, which tabulate

Q(x) = (1/√(2π)) e^{−½x²},   P(x) = (1/√(2π)) ∫_{−x}^{x} e^{−½y²} dy.
THEORETICAL EXERCISES
6.1. One of the properties of the normal density functions which make them convenient to work with mathematically is the following identity. Verify algebraically that for any real numbers x, m₁, m₂, σ₁, and σ₂ (among which σ₁ and σ₂ are positive)
where
(6.10)
6.2. Although it is not possible to obtain an explicit formula for the normal distribution function Φ(·) in terms of more familiar functions, it is possible
(6.11)
EXERCISES
(7.1)  B_{x₁,x₂} = {2-tuples (x₁′, x₂′): x₁′ ≤ x₁, x₂′ ≤ x₂}.

In words, B_{x₁,x₂} is the set consisting of all 2-tuples (x₁′, x₂′) whose first component x₁′ is less than or equal to the specified real number x₁ and whose second component x₂′ is less than or equal to the specified real number x₂. We are thus led to introduce the distribution function F(·, ·) of the numerical 2-tuple valued random phenomenon, which is a function of two variables, defined for all real numbers x₁ and x₂ by the equation

(7.2)  F(x₁, x₂) = P[B_{x₁,x₂}].

The quantity F(x₁, x₂) represents the probability that an observed occurrence of the random phenomenon under consideration will have as its description a 2-tuple whose first component is less than or equal to x₁ and whose second component is less than or equal to x₂. In terms of the unit mass of probability distributed over the plane of Fig. 7A, F(x₁, x₂) is equal to the weight of the probability substance lying over the "infinitely extended rectangle," which consists of all 2-tuples (x₁′, x₂′) such that x₁′ ≤ x₁ and x₂′ ≤ x₂, which corresponds to the shaded area in Fig. 7A.
The probability assigned to any rectangle in the plane may also be expressed in terms of the distribution function F(·, ·): for any real numbers a₁ and a₂ and any positive numbers h₁ and h₂

(7.3)  P[{(x₁′, x₂′): a₁ < x₁′ ≤ a₁ + h₁, a₂ < x₂′ ≤ a₂ + h₂}]
       = F(a₁ + h₁, a₂ + h₂) + F(a₁, a₂) − F(a₁ + h₁, a₂) − F(a₁, a₂ + h₂).
As in the case of numerical valued random phenomena, the most important
cases of numerical 2-tuple valued random phenomena are those in which
the probability function is specified either by a probability mass function or
a probability density function.
Given a numerical 2-tuple valued random phenomenon, we define its probability mass function, denoted by p(·, ·), as a function of two variables, defined for all real numbers x₁ and x₂ by the equation
Fig. 7A. The set R² of all 2-tuples (x₁′, x₂′) of real numbers, represented as a 2-dimensional plane on which a rectangular coordinate system has been imposed.
(7.7)
(7.8)
(7.2')
(7.3′)  P[{(x₁′, x₂′, ..., xₙ′): a₁ < x₁′ ≤ a₁ + h₁, a₂ < x₂′ ≤ a₂ + h₂, ..., aₙ < xₙ′ ≤ aₙ + hₙ}]
        = F(a₁ + h₁, a₂ + h₂, ..., aₙ + hₙ)
          − F(a₁, a₂ + h₂, ..., aₙ + hₙ) − ⋯
          − F(a₁ + h₁, ..., aₙ₋₁ + hₙ₋₁, aₙ)
          + ⋯
          + (−1)ⁿ F(a₁, a₂, ..., aₙ).
(7.6')
(7.7')
(7.8')
There are many other notions that arise in connection with numerical
n-tuple valued random phenomena, but they are best formulated in terms
of random variables and consequently are discussed in Chapter 7.
EXERCISES
7.2. An urn contains M balls, numbered 1 to M. Two balls are drawn, one after the other, with replacement (without replacement). Consider the 2-tuple valued random phenomenon (x₁, x₂), in which x₁ is the number on the first ball drawn and x₂ is the number on the second ball drawn. Find the probability mass function of this 2-tuple valued random phenomenon and show that its probability law is discrete.
7.3. Consider a square sheet of tin, 20 inches wide, that contains 10 rows and 10 columns of circular holes, each 1 inch in diameter, with centers evenly spaced at a distance 2 inches apart.
(i) What is the probability that a particle of sand (considered as a point) blown against the tin sheet will fall upon 1 of the holes and thus pass through?
(ii) What is the probability that a ball of diameter ½ inch thrown upon the sheet will pass through 1 of the holes without touching the tin sheet? Assume an appropriate uniform probability law.
CHAPTER 5
MEAN AND VARIANCE OF A PROBABILITY LAW

(1.1)  x̄ = (x₁ + x₂ + ⋯ + xₙ)/n = (1/n) Σ_{i=1}^{n} xᵢ.
The quantity x̄ is also called the arithmetic mean of the numbers x₁, x₂, ..., xₙ.
For example, consider the scores on an examination of a class of 20 students:

(1.2)  {10, 10, 10, 10, 9, 9, 9, 9, 9, 8, 8, 8, 8, 8, 7, 7, 6, 5, 5, 5}.

The average of these scores is 160/20 = 8.
Very often a set of n numbers x₁, x₂, ..., xₙ, which is to be averaged, may be described in the following way. There are k real numbers, which we may denote by x₁′, x₂′, ..., x_k′, and k integers, n₁, n₂, ..., n_k (whose sum is n), such that the set of numbers {x₁, x₂, ..., xₙ} consists of n₁ repetitions of the number x₁′, n₂ repetitions of the number x₂′, and so on, up to n_k repetitions of the number x_k′. Thus the set of scores in (1.2) may be described by the following table:

(1.3)  Scores xᵢ′                                  10  9  8  7  6  5
       Number nᵢ of students scoring the score xᵢ′  4  5  5  2  1  3

In terms of this description, the average may be computed by the formula

(1.4)  x̄ = (1/n) Σ_{i=1}^{k} nᵢ xᵢ′.

Now define

(1.5)  f(xᵢ′) = nᵢ/n,

which represents the fraction of the set of numbers {x₁, x₂, ..., xₙ} that is equal to the number xᵢ′. Then (1.4) becomes

(1.6)  x̄ = Σ_{i=1}^{k} xᵢ′ f(xᵢ′).

Next consider the set of scores described by the table

(1.7)  Scores xᵢ′                                  10  9  8  7  6  5
       Number nᵢ of students scoring the score xᵢ′  3  5  6  2  3  1

The average of this set of scores is 8, as it would have been if the scores had been

(1.8)  Scores xᵢ′                                  10  9  8  7  6  5
       Number nᵢ of students scoring the score xᵢ′  3  3  8  3  3  0
One possible measure of the spread of the data is the average of the absolute deviations from the mean x̄; in symbols,

(1.9)  Σ_{i=1}^{k} |xᵢ′ − x̄| f(xᵢ′).

The value of the expression (1.9) for the data in (1.3), (1.7), and (1.8) is equal to 1.3, 1.1, and 0.9, respectively, where in each case the mean x̄ = 8.
Another possible measure of the spread of the data is the average of the squares of the deviation from the mean x̄ of each number xᵢ′ in the set; in symbols,

(1.10)  square dispersion = Σ_{i=1}^{k} (xᵢ′ − x̄)² f(xᵢ′),
which has the values 2.7, 2.0, and 1.5 for the data in (1.3), (1.7), and (1.8), respectively.
Next, one may desire a measure for the symmetry of the distribution of the scores about their mean, for which purpose one might take the average of the cubes of the deviation of each number in the set from the mean x̄ (= 8); in symbols,

(1.11)  Σ_{i=1}^{k} (xᵢ′ − x̄)³ f(xᵢ′),

which has the values −2.7, −1.2, and 0 for the data in (1.3), (1.7), and (1.8), respectively.
From the foregoing discussion one conclusion emerges clearly. Given data {x₁, x₂, ..., xₙ}, there are many kinds of averages one can define, depending on the particular aspect of the data in which one is interested. Consequently, we cannot speak of the average of a set of numbers. Rather, we must consider some function g(x) of a real variable x; for example, g(x) = x, g(x) = (x − 8)², or g(x) = (x − 8)³. We then define the average of the function g(x) with respect to a set of numbers {x₁, x₂, ..., xₙ} as

(1.12)  (1/n) Σ_{j=1}^{n} g(xⱼ) = Σ_{i=1}^{k} g(xᵢ′) f(xᵢ′),

in which the numbers x₁′, ..., x_k′ occur in the proportions f(x₁′), ..., f(x_k′) in the set {x₁, x₂, ..., xₙ}.
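These notions may be made concrete with the examination scores of (1.2); the following Python sketch (ours) computes the averages of several functions g(x) with respect to that data and reproduces the values quoted above:

    scores = [10]*4 + [9]*5 + [8]*5 + [7]*2 + [6]*1 + [5]*3   # the data of (1.2)

    def average(g, data):
        # The average of the function g(x) with respect to a set of numbers, as in (1.12).
        return sum(g(x) for x in data) / len(data)

    m = average(lambda x: x, scores)
    print(m)                                       # 8.0, the mean
    print(average(lambda x: abs(x - m), scores))   # 1.3, as in (1.9)
    print(average(lambda x: (x - m)**2, scores))   # 2.7, as in (1.10)
    print(average(lambda x: (x - m)**3, scores))   # -2.7, as in (1.11)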
EXERCISES
In each of the following exercises find the average with respect to the data given for these functions: (i) g(x) = x; (ii) g(x) = (x − x̄)², in which x̄ is the answer obtained to question (i); (iii) g(x) = (x − x̄)³; (iv) g(x) = (x − x̄); (v) g(x) = |x − x̄|. Hint: First compute the number of times each number appears in the data.
1.1. The number of rainy days in a certain town during the month of January
for the years 1950-1959 was as follows:
Year                              1950 1951 1952 1953 1954 1955 1956 1957 1958 1959
Number of rainy days in January      8    8    9   21   16   16    9   13    9   21
1.2. Record the last digits of the last 20 telephone numbers appearing on the
first page of your local telephone directory.
1.3. Ten light bulbs were subjected to a forced life test. Their lifetimes were
found to be (to the nearest 10 hours)
850, 1090, 1150, 940, 1150, 960, 1040, 920, 1040, 960.
1.4. An experiment consists of drawing 2 balls without replacement from an urn containing 6 balls, numbered 1 to 6, and recording the sum of the 2 numbers drawn. In 30 repetitions of the experiment the sums recorded were (compare example 4A of Chapter 2)

7 9 5 8 5 7 4 6 3 5 9 11 9 4 9
11 7 10 4 8 5 6 10 9 5 7 9 10 10 3.
In words, the expectation E[g(x)], defined in (2.1), exists if and only if the infinite series defining E[g(x)] is absolutely convergent. A test for convergence of an infinite series is given in theoretical exercise 2.1.
For the case in which the probability function P[·] is specified by a probability density function f(·), we define

(2.3)  E[g(x)] = ∫_{−∞}^{∞} g(x) f(x) dx.

In words, the expectation E[g(x)] defined in (2.3) exists if and only if the improper integral defining E[g(x)] is absolutely convergent. In the case in which the functions g(·) and f(·) are continuous for all (but a finite number of values of) x, the integral in (2.3) may be defined as an improper Riemann* integral by the limit
Aₙ = (1/n)(x₁ + x₂ + ⋯ + xₙ), ⋯.
(2.11)  E[x] = (1/(σ√(2π))) ∫_{−∞}^{∞} x e^{−½((x−m)/σ)²} dx = (1/√(2π)) ∫_{−∞}^{∞} (m + σy) e^{−½y²} dy,

where we have made the change of variable y = (x − m)/σ. Now
Equation (2.19) follows from (2.18), applied first with g₁(x) = g(x) and g₂(x) = |g(x)| and then with g₁(x) = −|g(x)| and g₂(x) = g(x).
Example 2B. To illustrate the use of (2.15) to (2.19), we note that E[4] = 4, E[x² − 4x] = E[x²] − 4E[x], and E[(x − 2)²] = E[x² − 4x + 4] = E[x²] − 4E[x] + 4.
In words, the variance of a probability law is equal to its mean square minus its square mean. To prove (2.20), we write, letting m = E[x],

        E[x] = 0 · q + 1 · p = p
(2.21)  E[x²] = 0² · q + 1² · p = p

= np Σ_{k=1}^{n} (n−1 choose k−1) p^{k−1} q^{(n−1)−(k−1)} = np(p + q)^{n−1} = np.

(2.23)

Since k(k − 1)(n choose k) = n(n − 1)(n−2 choose k−2), the sum in (2.24) is equal to

n(n − 1)p² Σ_{k=2}^{n} (n−2 choose k−2) p^{k−2} q^{(n−2)−(k−2)} = n(n − 1)p²(p + q)^{n−2}.
(2.26)

= (Np / (N choose n)) Σ_{k=1}^{n} (a−1 choose k−1)(b choose n−k),

in which we have let a = Np, b = Nq. Now, letting j = k − 1 and using (2.37) of Chapter 4, the last sum written is equal to (N−1 choose n−1). Consequently,

(2.27)  E[x] = Np (N−1 choose n−1) / (N choose n) = np.

Next, we evaluate E[x²] by first evaluating E[x(x − 1)] and then using the fact that E[x²] = E[x(x − 1)] + E[x]. Now

Notice that the mean of the hypergeometric probability law is the same as that of the corresponding binomial probability law, whereas the variances differ by a factor that is approximately equal to 1 if the ratio n/N is a small number.
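This comparison may be verified directly from the probability mass functions; the following Python sketch (ours; the values N = 20, n = 5, p = 0.4 are merely illustrative) computes the mean and variance of a hypergeometric law and compares them with np and npq(N − n)/(N − 1):

    import math

    def hypergeometric_pmf(x, N, n, p):
        # (Np choose x)(Nq choose n-x) / (N choose n), as in (4.8) of Chapter 4.
        Np, Nq = round(N * p), round(N * (1 - p))
        if 0 <= x <= min(n, Np) and n - x <= Nq:
            return math.comb(Np, x) * math.comb(Nq, n - x) / math.comb(N, n)
        return 0.0

    N, n, p = 20, 5, 0.4
    mean = sum(x * hypergeometric_pmf(x, N, n, p) for x in range(n + 1))
    msq = sum(x * x * hypergeometric_pmf(x, N, n, p) for x in range(n + 1))
    print(mean, n * p)
    print(msq - mean**2, n * p * (1 - p) * (N - n) / (N - 1))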
Example 2F. The uniform probability law over the interval a to b has probability density function f(·) given by (4.10) of Chapter 4. Its mean, mean square, and variance are given by

E[x] = ∫_{−∞}^{∞} x f(x) dx = (1/(b − a)) ∫_a^b x dx = (b² − a²)/(2(b − a)) = (b + a)/2,

E[x²] = (1/(b − a)) ∫_a^b x² dx = (b³ − a³)/(3(b − a)) = (a² + ab + b²)/3,

σ² = E[x²] − E²[x] = (b − a)²/12.

Note that the variance of the uniform probability law depends only on the length of the interval, whereas the mean is equal to the mid-point of the interval. The higher moments of the uniform probability law are also easily obtained:

(2.30)  E[xⁿ] = (1/(b − a)) ∫_a^b xⁿ dx = (b^{n+1} − a^{n+1}) / ((n + 1)(b − a)).
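These moment formulas are easily confirmed by simulation; the following Python sketch (ours; the interval 2 to 5 is illustrative) estimates the first three moments of a uniform law:

    import random

    a, b, N = 2.0, 5.0, 200000
    xs = [random.uniform(a, b) for _ in range(N)]
    mean = sum(xs) / N
    msq = sum(x * x for x in xs) / N
    print(mean, (a + b) / 2)                    # the mid-point of the interval
    print(msq - mean**2, (b - a)**2 / 12)       # the variance
    print(sum(x**3 for x in xs) / N, (b**4 - a**4) / (4 * (b - a)))   # E[x^3], by (2.30)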
Example 2G. The Cauchy probability law with parameters α = 0 and β = 1 is specified by the probability density function

(2.31)  f(x) = (1/π) · 1/(1 + x²).

The mean E[x] of the Cauchy probability law does not exist, since the integral ∫_{−∞}^{∞} |x| f(x) dx = (2/π) ∫_0^∞ x/(1 + x²) dx diverges.
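The practical consequence of the nonexistence of the mean can be seen by simulation; in the following Python sketch (ours) the running averages of Cauchy observations fail to settle down to any limit, in contrast to the behavior guaranteed by the law of large numbers when the mean exists:

    import math, random

    def cauchy_sample():
        # If U is uniform on (0, 1), then tan(pi(U - 1/2)) obeys the Cauchy
        # law with parameters alpha = 0 and beta = 1.
        return math.tan(math.pi * (random.random() - 0.5))

    random.seed(1)
    total = 0.0
    for n in range(1, 100001):
        total += cauchy_sample()
        if n in (10, 100, 1000, 10000, 100000):
            print(n, total / n)   # the running means do not converge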
THEORETICAL EXERCISES
2.1. Test for convergence or divergence of infinite series and improper integrals. Prove the following statements. Let h(x) be a continuous function. If, for some real number r > 1, the limits

(2.34)  lim_{x→∞} x^r |h(x)|,   lim_{x→−∞} |x|^r |h(x)|

both exist and are finite, then

(2.35)  ∫_{−∞}^{∞} h(x) dx,   Σ_{k=−∞}^{∞} h(k)

converge absolutely; if, for some r ≤ 1, either of the limits in (2.34) exists and is not equal to 0, then the expressions in (2.35) fail to converge absolutely.
2.2. Pareto's distribution with parameters r and A, in which r and A are positive, is defined by the probability density function

(2.36)  f(x) = rA^r / x^{r+1}   for x ≥ A
             = 0                for x < A.

Show that Pareto's distribution possesses a finite nth moment if and only if n < r. Find the mean and variance of Pareto's distribution in the cases in which they exist.
2.3. "Student's" t-distribution with parameter v > 0 is defined as the con-
tinuous probability law specified by the probability density function
(2.39)  ∫_0^∞ [1 − F(x)] dx = ∫_0^∞ dx ∫_x^∞ dy f(y) = ∫_0^∞ y f(y) dy,
Fig. 2A. The mean of a probability law with distribution function F(·) is equal
to the shaded area to the right of the y-axis, minus the shaded area to the left
of the y-axis.
area I, minus area II. Although we have proved this assertion only for
the case of a continuous probability law, it holds for any probability law.
2.6. A geometrical interpretation of the higher moments. Show that the nth moment E[xⁿ] of a continuous probability law with distribution function F(·) can be expressed for n = 1, 2, ...
2.8. The square mean is less than or equal to the mean square. Show that

(2.43)  |E[x]| ≤ E[|x|] ≤ E^{1/2}[x²].

Give an example of a probability law whose mean square E[x²] is equal to its square mean.
2.9. The mean is not necessarily greater than or equal to the variance. The binomial and the Poisson are probability laws having the property that their mean m is greater than or equal to their variance σ² (show this); this circumstance has sometimes led to the belief that for the probability law of a random variable assuming only nonnegative values it is always true that m ≥ σ². Prove this is not the case by showing that m < σ² for the probability law of the number of failures up to the first success in a sequence of independent repeated Bernoulli trials.
2.10. The median of a probability law. The mean of a probability law provides
a measure of the "mid-point" of a probability distribution. Another such
measure is provided by the median of a probability law, denoted by me,
which is defined as a number such that
EXERCISES
In exercises 2.1 to 2.7, compute the mean and variance of the probability law
specified by the probability density function, probability mass function, or
distribution function given.
2.1. (i) I(x) = 2x for 0 < x < 1
=0 elsewhere.
(ii) I(x) = Ixl for Ixl -:::; 1
=0 elsewhere.
(iii)  f(x) = (8/(3√3 π)) (1 + x²/3)^{−3},   −∞ < x < ∞.
2.4. (i)  f(x) = (1/(2√(2π))) e^{−½((x−2)/2)²}
     (ii) f(x) = √(2/π) e^{−½x²}   for x > 0
              = 0                  elsewhere.
2.5. (i)  p(x) = ⅓   for x = 0
              = ⅔   for x = 1
              = 0   elsewhere.
     (iii) p(x) = (6 choose x)(½)⁶   for x = 0, 1, ..., 6
              = 0   elsewhere.
2.6. (i)  p(x) = ⅓(⅔)^{x−1}   for x = 1, 2, ...
              = 0             otherwise.
     (ii) p(x) = e^{−2} 2^x / x!   for x = 0, 1, 2, ...
              = 0             otherwise.
2.7. (i)  F(x) = 0        for x < 0
              = x²        for 0 ≤ x ≤ 1
              = 1         for x > 1.
     (ii) F(x) = 0        for x < 0
              = x^{1/2}   for 0 ≤ x ≤ 1
              = 1         for x > 1.
2.8. Compute the means and variances of the probability laws obeyed by the
numerical valued random phenomena described in exercise 4.1 of Chapter
4.
2.9. For what values of r does the probability law, specified by the following probability density function, possess (i) a finite mean, (ii) a finite variance:

f(x) = (r − 1) / (2|x|^r)   for |x| > 1
     = 0                    otherwise.
3. MOMENT-GENERATING FUNCTIONS
ψ′(t) = E[x e^{tx}]
ψ″(t) = E[x² e^{tx}]
(3.4)  ψ⁽³⁾(t) = (d³/dt³) ψ(t) = E[x³ e^{tx}].

Letting t = 0, we obtain

        ψ′(0) = E[x]
        ψ″(0) = E[x²]
(3.5)   ψ⁽³⁾(0) = E[x³].

In particular, the binomial probability law with parameters n and p has moment-generating function

(3.10)  ψ(t) = (pe^t + q)^n,

with derivatives

        ψ′(t) = npe^t (pe^t + q)^{n−1},   E[x] = ψ′(0) = np,
(3.11)  ψ″(t) = npe^t (pe^t + q)^{n−1} + n(n − 1)p² e^{2t} (pe^t + q)^{n−2},
        E[x²] = ψ″(0) = np + n(n − 1)p² = npq + n²p².
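The derivatives of ψ(t) at t = 0 may be approximated by difference quotients, which gives a quick numerical check of (3.10) and (3.11); the following Python sketch (ours; n = 10 and p = 0.3 are illustrative) does so:

    import math

    n, p = 10, 0.3
    q = 1 - p
    psi = lambda t: (p * math.exp(t) + q)**n   # the moment-generating function (3.10)

    h = 1e-4
    d1 = (psi(h) - psi(-h)) / (2 * h)              # approximates psi'(0)
    d2 = (psi(h) - 2 * psi(0) + psi(-h)) / h**2    # approximates psi''(0)
    print(d1, n * p)                        # E[x] = np
    print(d2, n * p + n * (n - 1) * p**2)   # E[x^2] = np + n(n-1)p^2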
TABLE 3A
SOME FREQUENTLY ENCOUNTERED DISCRETE PROBABILITY LAWS WITH THEIR MOMENTS AND GENERATING FUNCTIONS

Bernoulli: parameter 0 ≤ p ≤ 1;
  p(x) = p if x = 1; = q if x = 0; = 0 otherwise;
  mean m = E[x] = p; variance σ² = E[x²] − E²[x] = pq.

Binomial: parameters n = 1, 2, ... and 0 ≤ p ≤ 1;
  p(x) = (n choose x) p^x q^{n−x} for x = 0, 1, 2, ..., n; = 0 otherwise;
  mean m = np; variance σ² = npq.

Poisson: parameter λ > 0;
  p(x) = e^{−λ} λ^x / x! for x = 0, 1, 2, ...; = 0 otherwise;
  mean m = λ; variance σ² = λ.

Geometric: parameter 0 ≤ p ≤ 1;
  p(x) = pq^{x−1} for x = 1, 2, ...; = 0 otherwise;
  mean m = 1/p; variance σ² = q/p².

Negative binomial: parameters r = 1, 2, ... and 0 ≤ p ≤ 1;
  p(x) = (r + x − 1 choose x) p^r q^x for x = 0, 1, 2, ...; = 0 otherwise;
  mean m = rq/p; variance σ² = rq/p².

Hypergeometric: parameters N = 1, 2, ...; n = 1, 2, ..., N; p = 0, 1/N, 2/N, ..., 1;
  p(x) = (Np choose x)(Nq choose n−x) / (N choose n) for x = 0, 1, ..., n; = 0 otherwise;
  mean m = np; variance σ² = npq (N − n)/(N − 1).

(In each case q = 1 − p.)
TABLE 3A (Continued)
SOME FREQUENTLY ENCOUNTERED DISCRETE PROBABILITY LAWS WITH THEIR MOMENTS AND GENERATING FUNCTIONS

Binomial: ψ(t) = (pe^t + q)^n; φ(u) = (pe^{iu} + q)^n;
  μ₃ = npq(q − p); μ₄ = 3n²p²q² + npq(1 − 6pq).

Poisson: ψ(t) = e^{λ(e^t − 1)}; φ(u) = e^{λ(e^{iu} − 1)};
  μ₃ = λ; μ₄ = λ + 3λ².

Geometric: ψ(t) = pe^t / (1 − qe^t); φ(u) = pe^{iu} / (1 − qe^{iu});
  μ₃ = (q/p²)(1 + 2q/p); μ₄ = (q/p²)(1 + 9q/p²).

Negative binomial: ψ(t) = (p/(1 − qe^t))^r = (Q − Pe^t)^{−r};
  φ(u) = (p/(1 − qe^{iu}))^r = (Q − Pe^{iu})^{−r}, in which P = q/p and Q = 1/p;
  μ₃ = (rq/p²)(1 + 2q/p) = rPQ(Q + P);
  μ₄ = (rq/p²)(1 + (6 + 3r)q/p²) = 3r²P²Q² + rPQ(1 + 6PQ).

Hypergeometric: see M. G. Kendall, Advanced Theory of Statistics, Charles Griffin, London, 1948, p. 127.
TABLE 3B
SOME FREQUENTLY ENCOUNTERED CONTINUOUS PROBABILITY LAWS WITH THEIR MOMENTS AND GENERATING FUNCTIONS

Uniform over interval a to b: parameters −∞ < a < b < ∞;
  f(x) = 1/(b − a) for a < x < b; = 0 otherwise;
  mean m = E[x] = (a + b)/2; variance σ² = E[x²] − E²[x] = (b − a)²/12.

Normal: parameters −∞ < m < ∞ and σ > 0;
  f(x) = (1/(σ√(2π))) e^{−½((x−m)/σ)²};
  mean m; variance σ².
Example 3C. The Poisson probability law with parameter λ has a moment-generating function for all t,

(3.12)  ψ(t) = Σ_{k=0}^{∞} e^{tk} p(k) = e^{−λ} Σ_{k=0}^{∞} (λe^t)^k / k! = e^{−λ} e^{λe^t} = e^{λ(e^t − 1)},

with derivatives

(3.13)  ψ′(t) = e^{λ(e^t − 1)} λe^t,   E[x] = ψ′(0) = λ,
        ψ″(t) = e^{λ(e^t − 1)} λ² e^{2t} + λe^t e^{λ(e^t − 1)},   E[x²] = ψ″(0) = λ² + λ.

Consequently, the variance σ² = E[x²] − E²[x] = λ. Thus, for the Poisson probability law the mean and the variance are equal.
From (3.16) one may show that the central moments of the normal probability law are given by

(3.17)  E[(x − m)ⁿ] = 0                         if n = 3, 5, ...
                    = 1 · 3 · 5 ⋯ (n − 1) σⁿ   if n = 2, 4, ....

An alternate method of deriving (3.17) is by use of (2.22) of Chapter 4.
Example 3F. The exponential probability law with parameter λ has a moment-generating function for t < λ.
THEORETICAL EXERCISES
= 0 otherwise.
Show that the corresponding moment-generating function may be written

(3.21)  ψ(t) = Σ_{m=0}^{M} e^{tm} p(m) = Σ_{r=0}^{M} (1/r!) (e^t − 1)^r.
3.4. The first M moments of the number of matches in the problem of matching M balls in M urns coincide with the first M moments of the Poisson probability law with parameter λ = 1. Show that the factorial moment-generating function of the Poisson law with parameter λ is given by

(3.23)  E[tˣ] = e^{λ(t−1)}.

By comparing (3.22) and (3.23), it follows that the first M factorial moments and, consequently, the first M moments of the probability law of the number of matches and the Poisson probability law with parameter 1 coincide.
EXERCISES
Compute the moment-generating function, mean, and variance of the probability law specified by the probability density function, probability mass function, or distribution function given.
3.1. (i)  f(x) = e^{−x}          for x ≥ 0
              = 0                elsewhere.
     (ii) f(x) = e^{−(x−5)}      for x ≥ 5
              = 0                elsewhere.
3.2. (i)  f(x) = (1/√(2πx)) e^{−x/2}   for x > 0
              = 0                elsewhere.
     (ii) f(x) = ½ x e^{−x/2}    for x > 0
              = 0                elsewhere.
3.3. (i)  p(x) = ½(½)^{x−1}      for x = 1, 2, ...
              = 0                elsewhere.
     (ii) p(x) = e^{−2} 2^x / x!   for x = 0, 1, ...
              = 0                elsewhere.
3.4. (i)  F(x) = Φ((x − 2)/2).
     (ii) F(x) = 0               for x < 0
              = 1 − e^{−x/5}     for x ≥ 0.
3.5. Find the mean, variance, third central moment, and fourth central moment
of the number of matches when (i) 4 balls are distributed in 4 urns, 1 to
an urn, (ii) 3 balls are distributed in 3 urns, 1 to an urn.
3.6. Find the factorial moment-generating function of the (i) binomial, (ii)
Poisson, (iii) geometric probability laws and use it to obtain their means,
variances, and third and fourth central moments.
4. CHEBYSHEV'S INEQUALITY
Let Q(h) denote the probability assigned to the interval m − hσ to m + hσ; for a probability law specified by a density function f(·), with mean m and standard deviation σ,

(4.1)  Q(h) = ∫_{m−hσ}^{m+hσ} f(x) dx.
Let us compute Q(h) in certain cases. For the normal probability law with mean m and standard deviation σ

(4.2)  Q(h) = (1/(σ√(2π))) ∫_{m−hσ}^{m+hσ} e^{−½((y−m)/σ)²} dy = Φ(h) − Φ(−h).

For the uniform distribution over the interval a to b, for h < √3,

(4.4)  Q(h) = (1/(b − a)) ∫_{(b+a)/2 − h(b−a)/√12}^{(b+a)/2 + h(b−a)/√12} dx = h/√3.
For the other frequently encountered probability laws one cannot so
readily evaluate Q(h). Nevertheless, the function Q(h) is still of interest,
since it is possible to obtain a lower bound for it, which does not depend
on the probability law under consideration. This lower bound, known as
Chebyshev's inequality, was named after the great Russian probabilist
P. L. Chebyshev (1821-1894).
Chebyshev's inequality. For any distribution function F(·) and any h > 0

(4.5)  Q(h) = F(m + hσ) − F(m − hσ) ≥ 1 − 1/h².

Note that (4.5) is trivially true for h < 1, since the right-hand side is then negative.
We prove (4.5) for the case of a continuous probability law with probability density function f(·). It may be proved in a similar manner (using Stieltjes integrals, introduced in section 6) for a general distribution function. The inequality (4.5) may be written in the continuous case

(4.6)  ∫_{m−hσ}^{m+hσ} f(x) dx ≥ 1 − 1/h².

To prove (4.6), consider the inequality

(4.7)  σ² ≥ ∫_{−∞}^{m−hσ} (x − m)² f(x) dx + ∫_{m+hσ}^{∞} (x − m)² f(x) dx,

which follows since the variance σ² is equal to the sum of the two integrals on the right-hand side of (4.7), plus the nonnegative quantity ∫_{m−hσ}^{m+hσ} (x − m)² f(x) dx. Now x < m − hσ implies (x − m)² > h²σ². Similarly, x > m + hσ implies (x − m)² > h²σ². By replacing (x − m)² by these lower bounds in (4.7), we obtain

σ² ≥ h²σ² [ ∫_{−∞}^{m−hσ} f(x) dx + ∫_{m+hσ}^{∞} f(x) dx ] = h²σ² [1 − Q(h)],

from which (4.6) follows at once.
Fig. 4A. Chebyshev's lower bound, 1 − 1/h², for the probability Q(h) of an observation within h standard deviations of the mean, plotted as a function of h.
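The comparison between Q(h) and Chebyshev's lower bound may also be carried out numerically; the following Python sketch (ours) evaluates Q(h) for the normal and uniform laws, by (4.2) and (4.4), against 1 − 1/h²:

    import math

    def Q_normal(h):
        # Phi(h) - Phi(-h), expressed through the error function.
        return math.erf(h / math.sqrt(2))

    def Q_uniform(h):
        # By (4.4): h/sqrt(3) for h < sqrt(3), and 1 otherwise.
        return min(h / math.sqrt(3), 1.0)

    for h in (1.0, 1.5, 2.0, 3.0):
        bound = max(0.0, 1 - 1 / h**2)
        print(h, round(Q_normal(h), 4), round(Q_uniform(h), 4), round(bound, 4))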
EXERCISES
4.1. Use Chebyshev's inequality to determine how many times a fair coin must
be tossed in order that the probability will be at least 0.90 that the ratio
of the observed number of heads to the number of tosses will lie between
0.4 and 0.6.
4.2. Suppose that the number of airplanes arriving at a certain airport in any
20-minute period obeys a Poisson probability law with mean 100. Use
Chebyshev's inequality to determine a lower bound for the probability
that the number of airplanes arriving in a given 20-minute period will be
between 80 and 120.
4.3. Consider a group of N men playing the game of "odd man out" (that is,
they repeatedly perform the experiment in which each man independently
tosses a fair coin until there is an "odd" man, in the sense that either
exactly 1 of the N coins falls heads or exactly 1 of the N coins falls tails).
Find, for (i) N = 4, (ii) N = 8, the exact probability that the number of
repetitions of the experiment required to conclude the game will be within
2 standard deviations of the mean number of repetitions required to con-
clude the game. Compare your answer with the lower bound given by
Chebyshev's inequality.
4.4. For Pareto's distribution, defined in theoretical exercise 2.2, compute and
graph the function Q(h), for A = 1 and r = 3 and 4, and compare it with
the lower bound given by Chebyshev's inequality.
In words, (5.2) and (5.3) state that as the number n of trials tends to infinity the relative frequency of successes in n trials tends to the true probability p of success at each trial, in the probabilistic sense that any nonzero difference ε between f_n and p becomes less and less probable of observation as the number of trials is increased indefinitely.
Bernoulli proved (5.3) by a tedious evaluation of the probability in (5.3). Using Chebyshev's inequality, one can give a very simple proof of (5.3). By using the fact that the probability law of S_n has mean np and variance npq, one may prove that the probability law of f_n has mean p and variance [p(1 − p)]/n. Consequently, for any ε > 0

(5.4)  P[|f_n − p| > ε] ≤ p(1 − p) / (nε²).
Now, for any value of p in the interval 0 < p < 1

(5.5)  p(1 − p) ≤ ¼,

using the fact that 4p(1 − p) − 1 = −(2p − 1)² ≤ 0. Consequently, for any ε > 0

(5.6)  P[|f_n − p| > ε] ≤ 1/(4nε²) → 0   as n → ∞,

no matter what the true value of p. To prove (5.2), one uses (5.3) and the fact that

(5.7)  P[|f_n − p| ≤ ε] = 1 − P[|f_n − p| > ε].
It is shown in section 5 of Chapter 8 that the foregoing method of proof, using Chebyshev's inequality, permits one to prove that if x₁, x₂, ..., xₙ, ... is a sequence of independent observations of a numerical valued random phenomenon whose probability law has mean m, then for any ε > 0

(5.8)  lim_{n→∞} P[ |(x₁ + x₂ + ⋯ + xₙ)/n − m| > ε ] = 0.
(5.13)  ∫_{−K(α)}^{K(α)} φ(y) dy = α.

TABLE 5A

   α        K(α)
  0.50      0.675
  0.6827    1.000
  0.90      1.645
  0.95      1.960
  0.9546    2.000
  0.99      2.576
  0.9973    3.000
Since pq ≤ ¼ for all p, we finally obtain from (5.15) that (5.9) will hold if

(5.16)
EXERCISES
in which the limit is taken over all partitions of the interval (a, b], as the maximum length of subinterval in the partition tends to 0.
It may be shown that if F(·) is specified by a probability density function f(·), then

(6.2)  ∫_{a+}^{b} g(x) dF(x) = ∫_a^b g(x) f(x) dx,

whereas if F(·) is specified by a probability mass function p(·), then

(6.3)  ∫_{a+}^{b} g(x) dF(x) = Σ g(x) p(x),

the sum being taken over the points of positive probability mass in the interval.
The Stieltjes integral of the continuous function g(·), with respect to the distribution function F(·) over the whole real line, is defined by

(6.8)  E[g(x₁, x₂, ..., xₙ)] = ∫∫ ⋯ ∫_{Rⁿ} g(x₁, x₂, ..., xₙ) dF(x₁, x₂, ..., xₙ),

in which the integral is a Stieltjes integral over the space Rⁿ of all n-tuples (x₁, x₂, ..., xₙ) of real numbers. We shall not write out here the definition of this integral.
We note that (6.2) and (6.3) generalize. If the distribution function F(x₁, x₂, ..., xₙ) is specified by a probability density function f(x₁, x₂, ..., xₙ), so that (7.7′) of Chapter 4 holds, then

(6.9)  E[g(x₁, x₂, ..., xₙ)] = ∫_{−∞}^{∞} ⋯ ∫_{−∞}^{∞} g(x₁, ..., xₙ) f(x₁, ..., xₙ) dx₁ ⋯ dxₙ.
EXERCISES
= 0 otherwise.
(iii)
CHAPTER 6

The normal distribution function and the normal probability laws have played a significant role in probability theory since the early eighteenth century, and it is important to understand from what this significance derives.
To begin with, there are random phenomena that obey a normal
probability law precisely. One example of such a phenomenon is the
velocity in any given direction of a molecule (with mass M) in a gas at absolute temperature T (which, according to Maxwell's law of velocities, obeys a normal probability law with parameters m = 0 and σ² = kT/M,
where k is the physical constant called Boltzmann's constant). However,
with the exception of certain physical phenomena, there are not many
random phenomena that obey a normal probability law precisely. Rather,
the normal probability laws derive their importance from the fact that
under various conditions they closely approximate many other probability
laws.
The normal distribution function was first encountered (in the work of
de Moivre, 1733) as a means of giving an approximate evaluation of the
distribution function of the binomial probability law with parameters
nand p for large values of n. This fact is a special case of the famed
central limit theorem of probability theory (discussed in Chapters 8
and 10) which describes a very general class of random phenomena whose
distribution functions may be approximated by the normal distribution
function.
A normal probability law has many properties that make it easy to
manipulate. Consequently, for mathematical convenience one may often,
in practice, assume that a random phenomenon obeys a normal probability
law if its true probability law is specified by a probability density function
of a shape similar to that of the normal density function, in the sense that
it possesses a single peak about which it is approximately symmetrical.
For example, the height of a human being appears to obey a probability
law possessing an approximately bell-shaped probability density function.
Consequently, one might assume that this quantity obeys a normal
probability law in certain respects. However, care must be taken in using
this approximation; for example, it is conceivable for a normally distri-
buted random quantity to take values between -10 6 and _10100, although
the probability of its doing so may be exceedingly small. On the other
hand, no man's height can assume such a large nega.tive value. In this
sense, it is incorrect to state that a man's height is approximately distributed
in accordance with a normal probability law. One may, nevertheless,
insist on regarding a man's height as obeying approximately a normal
probability law, in order to take advantage of the computational simplicity
of the normal distribution. As long as the justification of this approxima-
tion is kept clearly in mind, there does not seem to be too much danger in employing it.
There is another sense in which a random phenomenon may approxi-
mately obey a normal probability law. It may happen that the random
phenomenon, which as measured does not obey a normal probability law,
can, by a numerical transformation of the measurement, be cast into a
random phenomenon that does obey a normal probability law. For
example, the cube root of the weight of an animal may obey a normal
probability law (since the cube root of weight may be proportional to
height) in a case in which the weight does not.
Finally, the study of the normal density function is important even for
the study of a random phenomenon that does not obey a normal probability
law, for under certain conditions its probability density function may be
expanded in an infinite series of functions whose terms involve the
successive derivatives of the normal density function.
(2.1)  Σ_{k=a}^{b} (n choose k) p^k q^{n−k} ≈ Φ((b − np + ½)/√(npq)) − Φ((a − np − ½)/√(npq)).
Before indicating the proof of this theorem, let us explain its meaning
and usefulness by the following examples.
Example 2A. Suppose that n = 6000 tosses of a fair die are made. The probability that exactly k of the tosses will result in a "three" is given by (6000 choose k)(1/6)^k (5/6)^{6000−k}. The probability that the number of tosses on which a "three" will occur is between 980 and 1030, inclusive, is given by the sum

(2.2)  Σ_{k=980}^{1030} (6000 choose k) (1/6)^k (5/6)^{6000−k},

which by (2.1) is approximately equal to Φ(30.5/28.87) − Φ(−20.5/28.87), since √(npq) = √(6000 · (1/6)(5/6)) = 28.87.
Fig. 2B. Graphs of the probability mass function p*(h) for p = ½ and n = 50 and 100.
Example 2B. In 40,000 independent tosses of a coin heads appeared 20,400 times. Find the probability that if the coin were fair one would observe in 40,000 independent tosses (i) 20,400 or more heads, (ii) between 19,600 and 20,400 heads.
Solution: Let X be the number of heads in 40,000 independent tosses of a fair coin. Then X obeys a binomial probability law with mean m = np = 20,000, variance σ² = npq = 10,000, and standard deviation σ = 100. According to the normal approximation to the binomial probability law, X approximately obeys a normal probability law with parameters m = 20,000 and σ = 100 [in making this statement we are ignoring the terms in (2.1) involving ½, which are known as a "continuity" correction]. Since 20,400 is four standard deviations more than the mean of the probability law of X, the probability is approximately 0 that one would observe a value of X more than 20,400. Similarly, the probability is approximately 1 that one would observe a value of X between 19,600 and 20,400.
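The probabilities asserted in this example may be computed explicitly from the normal distribution function; the following Python sketch (ours) does so through the error function:

    import math

    Phi = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))

    m, sigma = 20000.0, 100.0
    # (i) 20,400 or more heads: about 1 - Phi(4), practically 0.
    print(1 - Phi((20400 - m) / sigma))
    # (ii) between 19,600 and 20,400 heads: about Phi(4) - Phi(-4), nearly 1.
    print(Phi((20400 - m) / sigma) - Phi((19600 - m) / sigma))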
In order to have a convenient language in which to discuss the proof of (2.1), let us suppose that we are observing the number X of successes in n independent repeated Bernoulli trials with probability p of success at each trial. Next, to each outcome X let us compute the quantity

(2.3)  h = (X − np)/√(npq),

(2.5)  p*(h) = p(np + h√(npq)) ≈ (1/√(2πnpq)) e^{−½h²},

(2.6)  lim_{n→∞} √(2πnpq) p(np + h√(npq)) / e^{−½h²} = 1.
To prove (2.6), we first obtain an approximate expression for p(x); for k = 0, 1, ..., n, by Stirling's formula,

(n choose k) = n! / (k!(n − k)!) = (1/√(2π)) √(n/(k(n − k))) (n/k)^k (n/(n − k))^{n−k} e^R,

(2.9)  0 < r(m) < 1/(12m).
(2.11)  log_e {(np/k)^k (nq/(n − k))^{n−k}}
        = (np + h√(npq)) [−h√(q/np) + h²q/(2np) + terms in n^{−3/2}]
          + (nq − h√(npq)) [h√(p/nq) + h²p/(2nq) + terms in n^{−3/2}].

If we ignore all terms that tend to 0 as n tends to infinity in (2.11), we obtain the desired conclusion, namely (2.6).
From (2.6) one may obtain a proof of (2.1). However, in this book we give only a heuristic geometric proof that (2.6) implies (2.1). For an elementary rigorous proof of (2.1) the reader should consult J. Neyman, First Course in Probability and Statistics, New York, Henry Holt, 1950, pp. 234-242. In Chapter 10 we give a rigorous proof of (2.1) by using the method of characteristic functions.
A geometric derivation of (2.1) from (2.6) is as follows. First plot p*(h) in Fig. 2B as a function of h; note that p*(h) = 0 for all points h, except those that may be represented in the form

(2.12)  h = (k − np)/√(npq)

for some integer k = 0, 1, ..., n. Next, as in Fig. 2C, plot p*(h) by a series of rectangles of height (1/√(2π)) e^{−½h²}, centered at all points h of the form of (2.12),

which is equal to the sum of the areas of the rectangles in Fig. 2C centered at the points h of the form of (2.12), corresponding to the integers k from a to b, inclusive. Now, the sum of the areas of these rectangles is an approximating sum to the integral of the function (1/√(2π)) e^{−½h²} between the limits (a − np − ½)/√(npq) and (b − np + ½)/√(npq). We have thus obtained the approximate formula (2.1).

Fig. 2C. The normal approximation to the binomial probability law. The continuous curve represents the normal density function. The area of each rectangle represents the approximate value given by (2.5) for the value of the probability mass function p*(h) at the mid-point of the base of the rectangle.
have become available in recent years. The Tables of the Binomial Probability Distribution, National Bureau of Standards, Applied Mathematics Series 6, Washington, 1950, give 1 − F_B(x; n, p) to seven decimal places for p = 0.01 (0.01) 0.50 and n = 2 (1) 49. These tables are extended in H. G. Romig, 50-100 Binomial Tables, Wiley, New York, 1953, in which F_B(x; n, p) is tabulated for n = 50 (5) 100 and p = 0.01 (0.01) 0.50. A more extensive tabulation of F_B(x; n, p) for n = 1 (1) 50 (2) 100 (10) 200 (20) 500 (50) 1000 and p = 0.01 (0.01) 0.50 and also p equal to certain other fractional values is available in Tables of the Cumulative Binomial Probability Distribution, Harvard University Press, 1955.
The Poisson Approximation to the Binomial Probability Law. The Poisson approximation, whose proof and usefulness were indicated in section 3 of Chapter 3, states that

(2.18)  Σ_{k=0}^{K} (n choose k) p^k (1 − p)^{n−k} ≈ Σ_{k=0}^{K} e^{−np} (np)^k / k!
                                                  ≈ Φ((K − np + ½)/√(np(1 − p))) − Φ((−np − ½)/√(np(1 − p))),

where the first approximation sign in (2.18) holds if the Poisson approximation to the binomial applies and the second approximation sign holds if the normal approximation to the binomial applies.
Define, for any λ > 0 and integer n = 0, 1, ...,

(2.19)  F_P(n; λ) = Σ_{k=0}^{n} e^{−λ} λ^k / k!,

√(npq) > 4   if npq > 16.
The solution K of the inequality (2.21) can be read from tables prepared by E. C. Molina (published in a book entitled Poisson's Exponential Binomial Limit, Van Nostrand, New York, 1942), which tabulate, to six decimal places, the function

(2.24)  1 − F_P(K; λ) = Σ_{k=K+1}^{∞} e^{−λ} λ^k / k!

for about 300 values of λ in the interval 0.001 ≤ λ ≤ 100.
The value of K, determined by (2.21) and (2.22), is given in Table 2A for p = 1/30, 1/10, ½, n = 90, 900, and P₀ = 0.95, 0.99.
TABLE 2A
THE VALUES OF K DETERMINED BY (2.21) AND (2.22)
THEORETICAL EXERCISES
(2.25)
2.2. A competition problem. Suppose that m restaurants compete for the same n patrons. Show that the number of seats that each restaurant should have in order to have a probability greater than P₀ that it can serve all patrons who come to it (assuming that all the patrons arrive at the same time and choose, independently of one another, each restaurant with probability p = 1/m) is given by (2.22), with p = 1/m. Compute K for m = 2, 3, 4 and P₀ = 0.75 and 0.95. Express in words how the size of a restaurant (represented by K) depends on the size of its market (represented by n), the number of its competitors (represented by m), and the share of the market it desires (represented by P₀).
EXERCISES
2.1. In 10,000 independent tosses of a coin 5075 heads were observed. Find
approximately the probability of observing (i) exactly 5075 heads, (ii) 5075
or more heads if the coin (a) is fair, (b) has probability 0.51 of falling heads.
2.2. Consider a room in which 730 persons are assembled. For i = 1,2, ... ,
730, let Ai be the event that the ith person was born on January 1. Assume
that the events AI' . .. ,A73o are independent and that each event has
probability equal to 1/365. Find approximately the probability that
(i) exactly 2, (ii) 2 or more persons were born on January 1. Compare the
answers obtained by using the normal and Poisson approximations to the
binomial law.
2.3. Plot the probability mass function of the binomial probability law with parameters n = 10 and p = ½ against its normal approximation. In your opinion, is the approximation close enough for practical purposes?
2.4. Consider an urn that contains 10 balls, numbered 0 to 9, each of which is equally likely to be drawn; thus choosing a ball from the urn is equivalent to choosing a number 0 to 9, and one sometimes describes this experiment by saying that a random digit has been chosen. Now let n balls be chosen with replacement. Find the probability that among the n numbers thus chosen the number 7 will appear between (n − 3√n)/10 times and (n + 3√n)/10 times, inclusive, if (i) n = 10, (ii) n = 100, (iii) n = 10,000. Compute the answers exactly or by means of the normal and Poisson approximations to the binomial probability law.
2.5. Find the probability that in 3600 independent repeated trials of an experiment, in which the probability of success at each trial is p, the number of successes is between 3600p − 20 and 3600p + 20, inclusive, if (i) p = ½, (ii) p = ⅙.
2.6. A certain corporation has 90 junior executives. Assume that the probability is 1/10 that an executive will require the services of a secretary at the beginning of the business day. If the probability is to be 0.95 or greater that a secretary will be available, how many secretaries should be hired to constitute a pool of secretaries for the group of 90 executives?
2.7. Suppose that (i) 2, (ii) 3 restaurants compete for the same 800 patrons. Find the number of seats that each restaurant should have in order to have a probability greater than 95% that it can serve all patrons who come to it (assuming that all patrons arrive at the same time and choose, independently of one another, each restaurant with equal probability).
2.8. At a certain men's college the probability that a student selected at random on a given day will require a hospital bed is 1/5000. If there are 8000 students, how many beds should the hospital have so that the probability that a student will be turned away for lack of a bed is less than 1% (in other words, find K so that P[X > K] ≤ 0.01, where X is the number of students requiring beds)?
2.9. Consider an experiment in which the probability of success at each trial is p. Let X denote the number of successes in n independent trials of the experiment. Show that

P[|X − np| ≤ 1.96√(npq)] ≈ 95%.

Consequently, if p = 0.5, with probability approximately equal to 0.95 the observed number X of successes in n independent trials will satisfy the inequalities

(2.26)  0.5n − 0.98√n ≤ X ≤ 0.5n + 0.98√n.

Determine how large n should be, under the assumption that (i) p = 0.4, (ii) p = 0.6, (iii) p = 0.7, to have a probability of 5% that the observed number X of successes in the n trials will satisfy (2.26).
2.10. In his book Natural Inheritance, p. 63, F. Galton in 1889 described an apparatus known today as Galton's quincunx. The apparatus consists of a board in which nails are arranged in rows, the nails of a given row being placed below the mid-points of the intervals between the nails in the row above. Small steel balls of equal diameter are poured into the apparatus through a funnel located opposite the central pin of the first row. As they run down the board, the balls are "influenced" by the nails in such a manner that, after passing through the last row, they take up positions deviating from the point vertically below the central pin of the first row. Let us call this point x = 0. Assume that the distance between 2 neighboring pins is taken to be 1 and that the diameter of the balls is slightly smaller than 1. Assume that in passing from one row to the next the abscissa (x-coordinate) of a ball changes by either ½ or −½, each possibility having equal probability. To each opening in a row of nails, assign as its abscissa the mid-point of the interval between the 2 nails. If there is an even number of rows of nails, then the openings in the last row will have abscissas 0, ±1, ±2, .... Assuming that there are 36 rows of nails, find for k = 0, ±1, ±2, ..., ±10 the probability that a ball inserted in the funnel will pass through the opening in the last row which has abscissa k.
2.11. Consider a liquid of volume V, which contains N bacteria. Let the liquid be vigorously shaken and part of it transferred to a test tube of volume v. Suppose that (i) the probability p that any given bacterium will be transferred to the test tube is equal to the ratio of the volumes v/V and that (ii) the appearance of 1 particular bacterium in the test tube is independent of the appearance of the other N − 1 bacteria. Consequently, the number of bacteria in the test tube is a numerical valued random phenomenon obeying a binomial probability law with parameters N and p = v/V. Let m = N/V denote the average number of bacteria per unit volume. Let the volume v of the test tube be equal to 3 cubic centimeters.
(i) Assume that the volume v of the test tube is very small compared to the
volume V of liquid, so that p = v/V is a small number. In particular,
assume that p = 0.001 and that the bacterial density m = 2 bacteria per
cubic centimeter. Find approximately the probability that the number
of bacteria in the test tube will be greater than 1.
(ii) Assume that the volume v of the test tube is comparable to the volume
V of the liquid. In particular, assume that V = 12 cubic centimeters and
N = 10,000. What is the probability that the number of bacteria in the
test tube will be between 2400 and 2600, inclusive?
2.12. Suppose that among 10,000 students at a certain college 100 are red-
haired.
(i) What is the probability that a sample of 100 students, selected with
replacement, will contain at least one red-haired student?
(ii) How large a random sample, drawn with replacement, must be taken
if the probability of its containing a red-haired student is to be 0.95?
It would be more realistic to assume that the sample is drawn without
replacement. Would the answers to (i) and (ii) change if this assumption
were made? Hint: State conditions under which the hypergeometric
law is approximated by the Poisson law.
2.13. Let S be the observed number of successes in n independent repeated
Bernoulli trials with probability p of success at each trial. For each of
the following events, find (i) its exact probability calculated by use of the
binomial probability law, (ii) its approximate probability calculated by
use of the normal approximation, (iii) the percentage error involved in
using (ii) rather than (i).
(3.2)  P[k events occur in time t] ≐ \binom{n}{k} (μt/n)^k (1 − μt/n)^{n−k}.

Now (3.2) is only an approximation to the probability that k events will
occur in time t. To get an exact evaluation, we must let the number of
subintervals increase to infinity. Then (3.2) tends to (3.1), since rewriting
(3.2)

\binom{n}{k} (μt/n)^k (1 − μt/n)^{n−k} = ((μt)^k/k!) · (n(n − 1)⋯(n − k + 1)/n^k) · (1 − μt/n)^{n−k} → ((μt)^k/k!) e^{−μt}

as n → ∞.
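The limit may also be checked numerically. The following Python sketch (an addition, with the values μt = 4 and k = 3 assumed for illustration) evaluates (3.2) for increasing n against the Poisson probability (3.1):

    from math import comb, exp, factorial

    # numerical check that the binomial expression (3.2) tends to the
    # Poisson probability (3.1) as the number n of subintervals increases
    mu_t, k = 4.0, 3
    for n in (10, 100, 1000, 10000):
        p = mu_t / n
        print(n, comb(n, k) * p**k * (1 - p)**(n - k))
    print("Poisson limit:", exp(-mu_t) * mu_t**k / factorial(k))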
It should be noted that the foregoing derivation of (3.1) is not completely
rigorous. To give a rigorous proof of (3.1), one must treat the random
phenomenon under consideration as a stochastic process. A sketch of
such a proof, using differential equations, is given in section 5.
Example 3B. It is known that bacteria of a certain kind occur in water
at the rate of two bacteria per cubic centimeter of water. Assuming that
this phenomenon obeys a Poisson probability law, what is the probability
that a sample of two cubic centimeters of water will contain (i) no bacteria,
(ii) at least two bacteria?
Solution: Under the assumptions made, it follows that the number of
bacteria in a two-cubic-centimeter sample of water obeys a Poisson
probability law with parameter μt = (2)(2) = 4, in which μ denotes the
rate at which bacteria occur in a unit volume and t represents the volume of
the sample of water under consideration. Consequently, the probability
that there will be no bacteria in the sample is equal to e⁻⁴, and the proba-
bility that there will be two or more bacteria in the sample is equal to
1 − 5e⁻⁴.
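The arithmetic of example 3B can be verified with a few lines of Python (an illustrative addition):

    from math import exp

    # example 3B: Poisson law with parameter mu*t = 4
    p0 = exp(-4)              # probability of no bacteria
    p1 = 4 * exp(-4)          # probability of exactly one bacterium
    print(p0)                 # about 0.0183
    print(1 - p0 - p1)        # P[two or more] = 1 - 5e^{-4}, about 0.9084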
Number k   Poisson probability   Expected frequency   Observed frequency
0          0.6065                58.224               59
1          0.3033                29.117               27
2          0.0758                7.277                9
3          0.0126                1.210                1
over 3     0.0018                0.173                0
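The tabled values are consistent with a Poisson law of mean 0.5 fitted to 96 observations (inferred from 0.6065 = e^{−0.5} and 58.224 = 96 × 0.6065); the following Python sketch, an addition, reproduces them up to rounding:

    from math import exp, factorial

    lam, N = 0.5, 96   # inferred from the tabled values
    probs = [exp(-lam) * lam**k / factorial(k) for k in range(4)]
    for k, pk in enumerate(probs):
        print(k, round(pk, 4), round(N * pk, 3))
    p_over = 1 - sum(probs)
    print("over 3", round(p_over, 4), round(N * p_over, 3))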
The solution K of the second inequality in (3.5) can be read from Molina's
tables (E. C. Molina, Poisson's Exponential Binomial Limit, Van Nostrand,
New York, 1942). If λ is so large that the normal approximation to the
Poisson law may be used, then (3.5) may be solved explicitly for K. Since
the first sum in (3.5) is approximately equal to
THEORETICAL EXERCISES
3.1. A problem of aerial search. State conditions for the validity of the
following assertion: if N ships are distributed at random over a region
of the ocean of area A, and if a plane can search over Q square miles
of ocean per hour of flight, then the number of ships sighted by a plane
in a flight of T hours obeys a Poisson probability law with parameter
λ = NQT/A.
3.2. The number of matches approximately obeys a Poisson probability law.
Consider the number of matches obtained by distributing M balls,
numbered 1 to M, among M urns in such a way that each urn contains
exactly 1 ball. Show that the probability of exactly m matches tends to
e⁻¹(1/m!), as M tends to infinity, so that for large M the number of matches
approximately obeys a Poisson probability law with parameter 1.
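A Monte Carlo illustration (an addition; M = 20 and the trial count are arbitrary choices) of how closely the distribution of matches follows the Poisson law with parameter 1:

    import random
    from math import exp, factorial

    # matches in a random permutation of M objects, compared with
    # the Poisson law of parameter 1
    M, trials = 20, 50_000
    freq = {}
    for _ in range(trials):
        perm = list(range(M))
        random.shuffle(perm)
        m = sum(1 for i, j in enumerate(perm) if i == j)
        freq[m] = freq.get(m, 0) + 1
    for m in range(5):
        print(m, round(freq.get(m, 0) / trials, 4),
              round(exp(-1) / factorial(m), 4))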
EXERCISES
State carefully the probabilistic assumptions under which you solve the
following problems. Keep in mind the empirically observed fact that the
occurrence of accidents, errors, breakdowns, and so on, in many instances
appears to obey a Poisson probability law.
3.1. The incidence of polio during the years 1949-1954 was approximately 25
per 100,000 population. In a city of 40,000 what is the probability of
having 5 or fewer cases? In a city of 1,000,000 what is the probability of
having 5 or fewer cases? State your assumptions.
3.2. A manufacturer of wool blankets inspects the blankets by counting the
number of defects. (A defect may be a tear, an oil spot, etc.) From past
records it is known that the mean number of defects per blanket is 5.
Calculate the probability that a blanket will contain 2 or more defects.
3.3. Bank tellers in a certain bank make errors in entering figures in their
ledgers at the rate of 0.75 error per page of entries. What is the probability
that in 4 pages there will be 2 or more errors?
3.4. Workers in a certain factory incur accidents at the rate of 2 accidents per
week. Calculate the probability that there will be 2 or fewer accidents
during (i) 1 week, (ii) 2 weeks; (iii) calculate the probability that there
will be 2 or fewer accidents in each of 2 weeks.
3.5. A radioactive source is observed during 4 time intervals of 6 seconds each.
The number of particles emitted during each period is counted. If the
particles emitted obey a Poisson probability law, at a rate of 0.5 particles
emitted per second, find the probability that (i) in each of the 4 time
intervals 3 or more particles will be emitted, (ii) in at least 1 of the 4
time intervals 3 or more particles will be emitted.
3.6. Suppose that the suicide rate in a certain state is 1 suicide per 250,000
inhabitants per week.
(i) Find the probability that in a certain town of population 500,000 there
will be 6 or more suicides in a week.
(ii) What is the expected number of weeks in a year in which 6 or more
suicides will be reported in this town?
(iii) Would you find it surprising that during 1 year there were at least 2
weeks in which 6 or more suicides were reported?
3.7. Suppose that customers enter a certain shop at the rate of 30 persons an
hour.
(i) What is the probability that during a 2-minute interval either no one
will enter the shop or at least 2 persons will enter the shop?
(ii) If you observed the number of persons entering the shop during each
of 30 2-minute intervals, would you find it surprising that 20 or more of
these intervals had the property that either no one or at least 2 persons
entered the shop during that time?
3.8. Suppose that the telephone calls coming into a certain switchboard obey
a Poisson probability law at a rate of 16 calls per minute. If the switch-
board can handle at most 24 calls per minute, what is the probability,
using a normal approximation, that in 1 minute the switchboard will
receive more calls than it can handle? (Assume all lines are clear.)
3.9. In a large fleet of delivery trucks the average number inoperative on any
day because of repairs is 2. Two standby trucks are available. What is
the probability that on any day (i) no standby trucks will be needed,
(ii) the number of standby trucks will be inadequate?
3.10. Major motor failures occur among the buses of a large bus company at
the rate of 2 a day. Assuming that each motor failure requires the services
of 1 mechanic for a whole day, how many mechanics should the bus
company employ to insure that the probability is at least 0.95 that a
mechanic will be available to repair each motor as it fails? (More precisely,
find the smallest integer K such that the probability is greater than or
equal to 0.95 that K or fewer motor failures will occur in a day.)
3.11. Consider a restaurant located in the business section of a city. How many
seats should it have available if it wishes to serve at least 95% of all those
who desire its services in a given hour, assuming that potential customers
(each of whom takes at least an hour to eat) arrive in accord with the
following schemes:
(i) 1000 persons pass by the restaurant in a given hour, each of whom has
probability 1/100 of desiring to eat in the restaurant (that is, each person
passing by the restaurant enters the restaurant once in every 100 times);
(ii) persons, each of whom has probability 1/100 of desiring to eat in the
restaurant, pass by the restaurant at the rate of 1000 an hour;
(iii) persons, desiring to be patrons of the restaurant, arrive at the restaurant
at the rate of 10 an hour.
3.12. Flying-bomb hits on London. The following data (R. D. Clarke, "An
application of the Poisson distribution," Journal of the Institute of Actuaries,
Vol. 72 (1946), p. 48) give the number of flying-bomb hits recorded in
each of 576 small areas of ¼ square kilometer each in the south of
London during World War II.
Number of hits   Number of areas
0                229
1                211
2                93
3                35
4                7
5 or over        1
Using the procedure in example 3E, show that these observations are
well fitted by a Poisson probability law.
3.13. For each of the following numerical valued random phenomena state
conditions under which it may be expected to obey, either exactly or
approximately, a Poisson probability law: (i) the number of telephone
calls received at a given switchboard per minute; (ii) the number of
automobiles passing a given point on a highway per minute; (iii) the
number of bacterial colonies in a given culture per 0.01 square millimeter
on a microscope slide; (iv) the number of times one receives 4 aces per
75 hands of bridge; (v) the number of defective screws per box of 100.
It has already been seen that the geometric and negative binomial
probability laws arise in response to the following question: through how
many trials need one wait in order to achieve the rth success in a sequence
of independent repeated Bernoulli trials in which the probability of success
at each trial is p? In the same way, exponential and gamma probability
laws arise in response to the question: how long a time need one wait if
one is observing a sequence of events occurring in time in accordance with
a Poisson probability law at the rate of μ events per unit time in order to
observe the rth occurrence of the event?
Example 4A. How long will a toll collector at a toll station at which
automobiles arrive at the mean rate μ = 1.5 automobiles per minute have
to wait before he collects the rth toll for any integer r = 1, 2, …?
We now show that the waiting time to the rth event in a series of events
happening in accordance with a Poisson probability law at the rate of μ
events per unit of time (or space) obeys a gamma probability law with
parameters r and μ; consequently, it has probability density function

f(t) = (μ^r t^{r−1}/(r − 1)!) e^{−μt}  for t > 0.

The waiting time exceeds t if and only if fewer than r events occur in
time t; this is an event whose probability is

e^{−μt} {1 + μt/1! + (μt)²/2! + ⋯ + (μt)^{r−1}/(r − 1)!}.
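A numerical check (an addition, with the values μ = 1.5, r = 3, t = 2 assumed) that the gamma tail probability equals the Poisson partial sum above:

    from math import exp, factorial

    mu, r, t = 1.5, 3, 2.0
    poisson_sum = exp(-mu * t) * sum((mu * t)**k / factorial(k)
                                     for k in range(r))
    # crude Riemann sum for the gamma density over (t, t + 40)
    dx, integral, x = 1e-4, 0.0, t
    while x < t + 40:
        integral += mu**r * x**(r - 1) * exp(-mu * x) / factorial(r - 1) * dx
        x += dx
    print(round(poisson_sum, 4), round(integral, 4))   # both about 0.4232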
show that this cannot be true, since the function G(x) satisfies the inequality
|G(x)| < 2M for all x, in which M is the constant given in (4.11). To prove
this, note that G(1) = 0. Since G(x) satisfies the functional equation in
(4.10), it follows that, for any integer n, G(n) = 0 and G(n + x) = G(x) for
0 < x < 1. Thus G(x) is a function that is periodic, with period 1. By
(4.11), G(x) satisfies the inequality |G(x)| < 2M for 0 < x < 1. Being
periodic with period 1, it therefore satisfies this inequality for all x. The
proof of the theorem is now complete.
For references to the history of the foregoing theorem, and a generaliza-
tion, the reader may consult G. S. Young, "The Linear Functional
Equation," American Mathematical Monthly, Vol. 65 (1958), pp. 37-38.
EXERCISES
4.1. Consider a radar set of a type whose failure law is exponential. If radar
sets of this type have a failure rate λ = 1 set/1000 hours, find a length T of
time such that the probability is 0.99 that a set will operate satisfactorily
for a time greater than T.
4.2. The lifetime in hours of a radio tube of a certain type obeys an exponential
law with parameter (i) λ = 1000, (ii) λ = 1/1000. A company producing
these tubes wishes to guarantee them a certain lifetime. For how many
hours should the tube be guaranteed to function, to achieve a probability
of 0.95 that it will function at least the number of hours guaranteed?
4.3. Describe the probability law of the following random phenomenon: the
number N of times a fair die is tossed until an even number appears (i) for
the first time, (ii) for the second time, (iii) for the third time.
4.4. A fair coin is tossed until heads appears for the first time. What is the
probability that 3 tails will appear in the series of tosses?
4.5. The customers of a certain newsboy arrive in accordance with a Poisson
probability law at a rate of 1 customer per minute. What is the probability
that 5 or more minutes have elapsed since (i) his last customer arrived, (ii)
his next to last customer arrived?
4.6. Suppose that a certain digital computer, which operates 24 hours a day,
suffers breakdowns at the rate of 0.25 per hour. We observe that the
computer has performed satisfactorily for 2 hours. What is the probability
that the machine will not fail within the next 2 hours?
4.7. Assume that the probability of failure of a ball bearing at any revolution
is constant and equal to p. What is the probability that the ball bearing
will fail on or before the nth revolution? If p = 10⁻⁴, how many revolutions
will be reached before 10% of such ball bearings fail? More precisely, find
K so that P[X ≤ K] ≤ 0.1, where X is the number of revolutions to failure.
4.8. A lepidopterist wishes to estimate the frequency with which an unusual
form of a certain species of butterfly occurs in a particular district. He
catches individual specimens of the species until he has obtained exactly
5 butterflies of the form desired. Suppose that the total number of butter-
flies caught is equal to 25. Find the probability that 25 butterflies would
have to be caught in order to obtain 5 of a desired form, if the relative
frequency p of occurrence of butterflies of the desired form is given by
(i) P = i, (ii) p = i.
4.9. Consider a shop at which customers arrive at random at a rate of 30 per
hour. What fraction of the time intervals between successive arrivals will
be (i) longer than 2 minutes, (ii) shorter than 4 minutes, (iii) between 1 and
3 minutes?
In this section we indicate briefly how one may derive the Poisson
probability law, and various related probability laws, by means of differen-
tial equations. The process to be examined is treated in the literature of
stochastic processes under the name "birth and death."
Consider a population, such as the molecules present in a certain sub-
volume of gas, the particles emitted by a radioactive source, biological
organisms of a certain kind present in a certain environment, persons
waiting in a line (queue) for service, and so on. Let Xₜ be the size of the
population at a given time t. The probability law of Xₜ is specified by its
probability mass function,

(5.1)  p(n; t) = P[Xₜ = n],  n = 0, 1, 2, ….
A differential equation for the probability mass function of Xₜ may be
found under assumptions similar in spirit to, but somewhat more general
than, those made in deriving (3.1). In reading the following discussion
the reader should attempt to formulate explicitly for himself the assump-
tions that are being made. A rigorous treatment of this discussion is given
by W. Feller, An Introduction to Probability Theory and its Applications,
Wiley, 1957, pp. 397–411.
Let r₀(h), r₁(h), and r₂(h) be functions defined for h > 0 with the property
that

lim_{h→0} r₀(h)/h = lim_{h→0} r₁(h)/h = lim_{h→0} r₂(h)/h = 0.
Assume that the probability is r₂(h) that in the time from t to t + h the
population size will change by two or more. For n ≥ 1 the event that
X_{t+h} = n (n members in the population at time t + h) can then essentially
happen in any one of three mutually exclusive ways: (i) the population
size at time t is n and undergoes no change in the time from t to t + h;
(ii) the population size at time t is n − 1 and increases by one in the time
from t to t + h; (iii) the population size at time t is n + 1 and decreases
by one in the time from t to t + h. For n = 0, the event that X_{t+h} = 0
can happen only in ways (i) and (iii). Now let us introduce quantities
λₙ and μₙ, defined as follows: λₙh + r₁(h) for any time t and positive
value of h is the conditional probability that the population size will
increase by one in the time from t to t + h, given that the population had
size n at time t, whereas μₙh + r₀(h) is the conditional probability that
the population size will decrease by one in the time from t to t + h, given
that the population had size n at time t. In symbols, λₙ and μₙ are such
that, for any time t and small h > 0,

(5.2)  λₙh ≐ P[X_{t+h} − Xₜ = 1 | Xₜ = n],  n ≥ 0,
       μₙh ≐ P[X_{t+h} − Xₜ = −1 | Xₜ = n],  n ≥ 1;
the approximation in (5.2) is such that the difference between the two
sides of each equation tends to 0 faster than h, as h tends to O. In writing
the next equations we omit terms that tend to 0 faster than h, as h tends
to 0, since these terms vanish in deriving the differential equations in
(5.10) and (5.11). The reader may wish to verify this statement for himself.
The event (i) then has probability

(5.3)  p(n; t)(1 − λₙh − μₙh);

the event (ii) has probability

(5.4)  p(n − 1; t)λ_{n−1}h;

the event (iii) has probability

(5.5)  p(n + 1; t)μ_{n+1}h.
Consequently, one obtains for n ≥ 1

(5.6)  p(n; t + h) = p(n; t)(1 − λₙh − μₙh)
       + p(n − 1; t)λ_{n−1}h + p(n + 1; t)μ_{n+1}h.

For n = 0 one obtains

(5.7)  p(0; t + h) = p(0; t)(1 − λ₀h) + p(1; t)μ₁h.

It may be noted that if there is a maximum possible value N for the
population size, then (5.6) holds only for 1 ≤ n ≤ N − 1, whereas for
n = N one obtains

(5.8)  p(N; t + h) = p(N; t)(1 − μ_N h) + p(N − 1; t)λ_{N−1}h.
Transposing p(n; t) in (5.6), dividing by h, and letting h tend to 0, one
obtains for n ≥ 1

(5.10)  (∂/∂t) p(n; t) = −(λₙ + μₙ)p(n; t)
        + λ_{n−1}p(n − 1; t) + μ_{n+1}p(n + 1; t).

Similarly, for n = 0 one obtains

(5.11)  (∂/∂t) p(0; t) = −λ₀p(0; t) + μ₁p(1; t).
The question of the existence and uniqueness of solutions of these equations
is nontrivial and is not discussed here.
We solve these equations only in the case that

(5.12)  λ₀ = λ₁ = λ₂ = ⋯ = λₙ = ⋯ = λ,
        μ₁ = μ₂ = μ₃ = ⋯ = μₙ = ⋯ = 0,

so that, by (5.11), (∂/∂t)p(0; t) = −λp(0; t), whence p(0; t) = e^{−λt} under
the assumption that p(0; 0) = 1. Next, (5.10) becomes for n = 1

(5.15)  (∂/∂t) p(1; t) = −λp(1; t) + λp(0; t),

which has solution (under the assumption p(1; 0) = 0)

(5.16)  p(1; t) = λe^{−λt} ∫₀ᵗ e^{λt′} p(0; t′) dt′ = λt e^{−λt}.

Proceeding inductively, one obtains (assuming p(n; 0) = 0)

(5.17)  p(n; t) = ((λt)ⁿ/n!) e^{−λt},
so that the size Xₜ of the population at time t obeys a Poisson probability
law with mean λt.
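As a sketch (an addition; the truncation level N, the step size, and the value of λ are illustrative assumptions), the equations (5.10) and (5.11) with λₙ = λ and μₙ = 0 may be integrated numerically in Python and compared with (5.17):

    from math import exp, factorial

    lam, t_end, dt, N = 2.0, 1.0, 1e-4, 25
    p = [1.0] + [0.0] * N          # p(n; 0) = 1 for n = 0
    for _ in range(int(t_end / dt)):
        new = [p[0] - lam * p[0] * dt]                 # equation (5.11)
        for n in range(1, N + 1):                      # equation (5.10)
            new.append(p[n] + (-lam * p[n] + lam * p[n - 1]) * dt)
        p = new
    for n in range(5):
        print(n, round(p[n], 5),
              round(exp(-lam * t_end) * (lam * t_end)**n / factorial(n), 5))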
THEORETICAL EXERCISES
5.1. The Yule process. Consider a population whose members can (by splitting
or otherwise) give birth to new members but cannot die. Assume that the
probability is approximately equal to λh that in a short time interval of
length h a member will create a new member. More precisely, in the model
of section 5, assume that

λₙ = nλ,  μₙ = 0.

If at time 0 the population size is k, show that the probability that the
population size at time t is equal to n is given by

(5.18)  p(n; t) = \binom{n−1}{k−1} e^{−kλt} (1 − e^{−λt})^{n−k},  n ≥ k.
Show that the probability law defined by (5.18) has mean m and variance
σ² given by

(5.19)  m = k e^{λt},  σ² = k e^{2λt} (1 − e^{−λt}).
CHAPTER 7
Random Variables
probability space, for if the sample description s is known then the value
of X is known.
(1.1)  X(s) = 0 if s = (5, 6), (6, 5)
       = 1 if s = (1, 5), (1, 6), (2, 5), (2, 6), (3, 5), (3, 6), (4, 5), (4, 6),
               (5, 1), (6, 1), (5, 2), (6, 2), (5, 3), (6, 3), (5, 4), (6, 4)
       = 2 if s = (1, 2), (1, 3), (1, 4), (2, 1), (2, 3), (2, 4), (3, 1), (3, 2),
               (3, 4), (4, 1), (4, 2), (4, 3).
EXERCISE
1.1. Show that the following quantities are random variables by explaining
how they may be defined as functions on a probability space:
(i) The sum of 2 dice that are tossed independently.
(ii) The number of times a coin is tossed until a head appears for the first
time.
(iii) The second digit in the decimal expansion of a number chosen on the
unit interval in accordance with a uniform probability law.
(iv) The absolute value of a number chosen on the real line in accordance
with a normal probability law.
(v) The number of urns that contain balls bearing the same number, when
52 balls, numbered 1 to 52, are distributed, one to an urn, among 52 urns,
numbered 1 to 52.
(vi) The distance from the origin of a 2-tuple (x₁, x₂) in the plane chosen
in accordance with a known probability law, specified by the probability
density function f(x₁, x₂).
Equation (2.2) represents the definition of P_X[B]; it is clear that it embodies
the intuitive meaning of P_X[B] given above, since the function X will have
an observed value lying in the set B if and only if the observed value s of
the underlying random phenomenon is such that X(s) is in B.
Example 2A. The probability function of the number of white balls in
a sample. To illustrate the use of (2.2), let us compute the probability
function of the random variable X defined by (1.1). Assuming equally
likely descriptions on S, one determines for any set B of real numbers that
the value of P_X[B] depends on the intersection of B with the set {0, 1, 2}:

p_X(0) = 1/15,  p_X(1) = 8/15,  p_X(2) = 6/15.
(2.6)  Σ p_X(x) = 1,  the sum being taken over all points x such that p_X(x) > 0.

(2.7)  P_X[B] = P[X is in B] = Σ p_X(x),  the sum being taken over all
points x in B such that p_X(x) > 0.

(2.8)  F_X(x) = Σ p_X(x′),  the sum being taken over all points x′ ≤ x such
that p_X(x′) > 0.
Thus, for a random variable X which has a binomial distribution with
parameters n = 6 and p = ½, the distribution function may be computed
by means of (2.8). A continuous random variable X has a probability
density function f_X(·) satisfying

(2.12)  f_X(x) = (d/dx) F_X(x)

at all points x at which the derivative on the right-hand side of (2.12) exists.
Example 2D. A random variable X is said to be normally distributed
if it is continuous and if constants m and σ exist, where −∞ < m < ∞
and σ > 0, such that the probability density function f_X(·) is given by, for
any real number x,

(2.13)  f_X(x) = (1/(σ√(2π))) e^{−(x−m)²/(2σ²)}.
Then for any real numbers a and b

P[a < X ≤ b] = (1/(σ√(2π))) ∫_a^b e^{−(x−m)²/(2σ²)} dx.
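In Python the interval probability may be computed from the error function (a sketch added here; Φ(z) = (1 + erf(z/√2))/2):

    from math import erf, sqrt

    def normal_prob(a, b, m, sigma):
        # P[a < X <= b] for X normal with mean m, standard deviation sigma
        phi = lambda z: 0.5 * (1 + erf(z / sqrt(2)))
        return phi((b - m) / sigma) - phi((a - m) / sigma)

    print(normal_prob(1, 2, 1, 1))   # exercise 2.9 below: about 0.3413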
EXERCISES
In exercises 2.1 to 2.8 describe the probability law of the random variable
given.
2.1. The number of aces in a hand of 13 cards drawn without replacement
from a bridge deck.
2.2. The sum of numbers on 2 balls drawn with replacement (without
replacement) from an urn containing 6 balls, numbered 1 to 6.
2.3. The maximum of the numbers on 2 balls drawn with replacement (without
replacement) from an urn containing 6 balls, numbered 1 to 6.
2.4. The number of white balls drawn in a sample of size 2 drawn with replace-
ment (without replacement) from an urn containing 6 balls, of which 4
are white.
2.5. The second digit in the decimal expansion of a number chosen on the unit
interval in accordance with a uniform probability law.
2.6. The number of times a fair coin is tossed until heads appears (i) for the
first time, (ii) for the second time, (iii) the third time.
2.7. The number of cards drawn without replacement from a deck of 52 cards
until (i) a spade appears, (ii) an ace appears.
2.8. The number of balls in the first urn if 10 distinguishable balls are distri-
buted in 4 urns in such a manner that each ball is equally likely to be
placed in any urn.
In exercises 2.9 to 2.16 find P[1 ≤ X ≤ 2] for the random variable X
described.
2.9. X is normally distributed with parameters m = 1 and σ = 1.
2.10. X is Poisson distributed with parameter λ = 1.
2.11. X obeys a binomial probability law with parameters n = 10 and p = 0.1.
2.12. X obeys an exponential probability law with parameter λ = 1.
2.13. X obeys a geometric probability law with parameter p = i.
2.14. X obeys a hypergeometric probability law with parameters N = 100,
p = 0.1, n = 10.
2.15. X is uniformly distributed over the interval t to !.
2.16. X is Cauchy distributed with parameters α = 1 and β = 1.
In the next two sections we discuss an example that illustrates the need
to introduce various concepts concerning random variables, which will, in
turn, be presented in the course of the discussion. We begin in this section
by discussing the example in terms of the notion of a numerical valued
random phenomenon in order to show the similarities and differences
between this notion and that of a random variable.
Let us consider a commuter who is in the habit of taking a train to the
city; the time of departure from the station is given in the railroad time-
table as 7:55 A.M. However, the commuter notices that the actual time of
departure is a random phenomenon, varying between 7:55 and 8 A.M. Let
us assume that the probability law of the random phenomenon is specified
by a probability density function f₁(·); further, let us assume

(3.1)  f₁(x₁) = (2/25)(5 − x₁)  for 0 < x₁ < 5,  = 0 otherwise,

in which x₁ represents the number of minutes after 7:55 A.M. that the train
departs.
Let us suppose next that the time it takes the commuter to travel from
his home to the station is a numerical valued random phenomenon,
varying between 25 and 30 minutes. Then, if the commuter leaves his home
at 7:30 A.M. every day, his time of arrival at the station is a random
phenomenon, varying between 7:55 and 8 A.M. Let us suppose that the
probability law of this random phenomenon is specified by a probability
density function f₂(·); further, let us assume that f₂(·) is of the same
functional form as f₁(·), so that

(3.2)  f₂(x₂) = (2/25)(5 − x₂)  for 0 < x₂ < 5,  = 0 otherwise,

in which x₂ represents the number of minutes after 7:55 A.M. that the
commuter arrives at the station.
The question now naturally arises: will the commuter catch the 7:55 A.M.
train? Of course, this question cannot be answered by us; but perhaps we
can answer the question: what is the probability that the commuter will
catch the 7:55 A.M. train?
Before any attempt can be made to answer this question, we must
express mathematically as a set on a sample description space the random
event described verbally as the event that the commuter catches the train.
Further, to compute the probability of the event, a probability function on
the sample description space must be defined.
As our sample description space S, we take the space of 2-tuples (x₁, x₂)
of real numbers, where x₁ represents the time (in minutes after 7:55 A.M.)
at which the train departs from the station, and x₂ denotes the time (in
minutes after 7:55 A.M.) at which the commuter arrives at the station. The
event A that the man catches the train is then given as a set of sample
descriptions by A = {(x₁, x₂): x₁ > x₂}, since to catch the train his
arrival time x₂ must be less than the train's departure time x₁. The event A
is diagrammed in Fig. 3A.
We define next a probability function P[·] on the events in S. To do this,
we use the considerations of section 7, Chapter 4, concerning numerical
2-tuple valued random phenomena. In particular, let us suppose that the
probability function P[·] is specified by a 2-dimensional probability density
function f(·, ·). From a knowledge of f(·, ·) we may compute the
probability P[A] that the commuter will catch his train by the formula
(3.3)  P[A] = ∬_A f(x₁, x₂) dx₁ dx₂ = ∫_{−∞}^{∞} dx₁ ∫_{−∞}^{x₁} dx₂ f(x₁, x₂)
       = ∫_{−∞}^{∞} dx₂ ∫_{x₂}^{∞} dx₁ f(x₁, x₂),
in which the second and third equations follow by the usual rules of
calculus for evaluating double integrals (or integrals over the plane) by
means of iterated (or repeated) single integrals.
We next determine whether the function f(·, ·) is specified by our having
specified the probability density functions f₁(·) and f₂(·) by (3.1) and (3.2).
Fig. 3A. The event A that the man catches the train, represented
as a set of points in the (x₁, x₂)-plane.
(3.5)  F₂(x₂) = ∫_{−∞}^{x₂} dx₂′ ∫_{−∞}^{∞} dx₁′ f(x₁′, x₂′).

We next use the fact that

(3.6)  f₂(x₂) = (d/dx₂) F₂(x₂).
(3.9)
It may be verified, in view of (3.8), that f(·, ·) is a probability density
function satisfying (3.4).

We now return to the question of how to determine f(·, ·). There is one
(and, in general, only one) circumstance in which the individual probability
density functions f₁(·) and f₂(·) determine the joint probability density function
f(·, ·), namely, when the respective random phenomena, whose probability
density functions are f₁(·) and f₂(·), are independent.
We define two random phenomena as independent, letting P₁[·] and P₂[·]
denote their respective probability functions and P[·] their joint probability
function, if it holds that for all real numbers a₁, b₁, a₂, and b₂

(3.11)  P[{(x₁, x₂): a₁ < x₁ ≤ b₁, a₂ < x₂ ≤ b₂}]
        = P₁[{x₁: a₁ < x₁ ≤ b₁}] · P₂[{x₂: a₂ < x₂ ≤ b₂}].

(3.12)
(3.13)
(3.14)
(3.15)
Since f₁(·) and f₂(·) are specified by (3.1) and (3.2), respectively, the
probability P[A] that the commuter will catch his train can now be com-
puted by evaluating the integrals in (3.15). However, in the present
example there is a very special feature present that makes it possible to
evaluate P[A] without any laborious calculation.
The reader may have noticed that the probability density functions f₁(·)
and f₂(·) have the same functional form. If we define f(·) by f(x) = (2/25)(5 − x)
or 0, depending on whether 0 < x < 5 or otherwise, we find that
f₁(x) = f₂(x) = f(x) for all real numbers x. In terms of f(·), we may write
(3.15), making the change of variable x₁′ = x₂ and x₂′ = x₁ in the second
integral,

(3.16)
By adding the two integrals in (3.16), it follows that 2P[A] = 1.
We conclude that the probability P[A] that the man will catch his train is
equal to ½.
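A Monte Carlo confirmation (an addition; the inverse of the distribution function F(x) = (10x − x²)/25 is used to draw from the density f):

    import random
    from math import sqrt

    def draw():
        # inverse transform: F^{-1}(u) = 5 - 5*sqrt(1 - u) on (0, 5)
        return 5 - 5 * sqrt(1 - random.random())

    trials = 100_000
    caught = sum(draw() > draw() for _ in range(trials))  # departure > arrival
    print(caught / trials)   # close to 0.5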
EXERCISES
3.1. Consider the example in the text. Let the probability law of the train's
departure time be given by (3.1). However, assume that the man's arrival
time at the railroad station is uniformly distributed over the interval
7:55 to 8 A.M. Assume that the man's arrival time is independent of the
train's departure time. Find the probability of the event that the man
will catch the train.
3.2. Consider the example in the text. Assume that the train's departure time
and the man's arrival time are independent random phenomena, each
uniformly distributed over the interval 7:55 to 8 A.M. Find the probability
of the event that the man will catch the train.
F_Y(y) = ∬_{{(x₁, x₂): x₁ − x₂ ≤ y}} f(x₁, x₂) dx₁ dx₂.

(4.4)

(4.5)  f_Y(y) = (d/dy) F_Y(y) = ∫_{−∞}^{∞} dx₁ f(x₁, x₁ − y)

     = ∫₀^{5−y} (2/25)² (5 − x)(5 − (x + y)) dx  if 0 < y < 5,
     = 0  if y > 5.
Therefore,

(4.8)  f_Y(y) = (4|y|³ − 300|y| + 1000)/(6 · 5⁴)  if |y| < 5,
     = 0 otherwise.
Consequently, P[Y > 0] = ∫₀^∞ f_Y(y) dy = ½.
EXERCISES
4.1. Consider the random variable Y defined in the text. Find the probability
density function of Y under the assumptions made in exercise 3.1.
4.2. Consider the random variable Y defined in the text. Find its probability
density function under the assumptions made in exercise 3.2.
(5.2)
(5.3)

as the set consisting of all 2-tuples (x₁′, x₂′) whose first component x₁′ is
less than the specified real number x₁ and whose second component x₂′ is
less than the specified real number x₂. To specify the joint probability
function of X₁ and X₂, it suffices to specify the joint distribution function
F_{X₁,X₂}(·, ·) of the random variables X₁ and X₂, defined for all real numbers
x₁ and x₂ by the equation

(5.6)  F_{X₁}(x₁) = P[X₁ ≤ x₁] = P[X₁ ≤ x₁, X₂ < ∞]
       = lim_{x₂→∞} F_{X₁,X₂}(x₁, x₂) = F_{X₁,X₂}(x₁, ∞).
In terms of the probability mass distributed over the plane by the joint
distribution function F_{X₁,X₂}(·, ·), the quantity F_{X₁}(x₁) is equal to the
amount of mass in the half-plane that consists of all 2-tuples (x₁′, x₂′) that
are to the left of, or on, the line with equation x₁′ = x₁.
The function F_{X₁}(·) is called the marginal distribution function of the
random variable X₁ corresponding to the joint distribution function
F_{X₁,X₂}(·, ·). Similarly, F_{X₂}(·) is called the marginal distribution function
of X₂ corresponding to the joint distribution function F_{X₁,X₂}(·, ·).
We next define the joint probability mass function of two random
variables X₁ and X₂, denoted by p_{X₁,X₂}(·, ·), as a function of 2 variables,
with value, for any real numbers x₁ and x₂,

(5.11)  p_{X₁,X₂}(x₁, x₂) = P[X₁ = x₁, X₂ = x₂].
Next, for any real numbers a₁, b₁, a₂, b₂, such that a₁ < b₁, a₂ < b₂, one
may verify that

(5.12)  P[a₁ < X₁ ≤ b₁, a₂ < X₂ ≤ b₂] = ∫_{a₁}^{b₁} dx₁′ ∫_{a₂}^{b₂} dx₂′ f_{X₁,X₂}(x₁′, x₂′).
The joint probability density function may be obtained from the joint
distribution function by routine differentiation, since

(5.13)  f_{X₁,X₂}(x₁, x₂) = (∂²/∂x₁∂x₂) F_{X₁,X₂}(x₁, x₂)

at all 2-tuples (x₁, x₂) at which the partial derivatives on the right-hand side
of (5.13) are well defined.
If the random variables X₁ and X₂ are jointly continuous, then they are
individually continuous, with individual probability density functions for
any real numbers x₁ and x₂ given by

(5.14)  f_{X₁}(x₁) = ∫_{−∞}^{∞} f_{X₁,X₂}(x₁, x₂) dx₂,
        f_{X₂}(x₂) = ∫_{−∞}^{∞} f_{X₁,X₂}(x₁, x₂) dx₁.
In the n-dimensional case the joint probability density function is similarly
obtained by differentiation:

f_{X₁,X₂,…,Xₙ}(x₁, x₂, …, xₙ) = (∂ⁿ/(∂x₁ ∂x₂ ⋯ ∂xₙ)) F_{X₁,X₂,…,Xₙ}(x₁, x₂, …, xₙ)

at all n-tuples (x₁, x₂, …, xₙ) at which the derivative exists.
A continuous joint probability law is specified by its probability density
function: for any Borel set B of n-tuples

(5.20)  P_{X₁,X₂,…,Xₙ}[B] = ∫⋯∫_B f_{X₁,X₂,…,Xₙ}(x₁′, x₂′, …, xₙ′) dx₁′ dx₂′ ⋯ dxₙ′.
The individual (or marginal) probability law of each of the random variables
X₁, X₂, …, Xₙ may be obtained from the joint probability law. In the
continuous case, for any k = 1, 2, …, n and any fixed number x_k,
[Table: the joint probability mass function p_{X₁,X₂}(x₁, x₂) of a jointly
discrete example, together with the marginal probability mass functions
p_{X₁}(x₁) and p_{X₂}(x₂) obtained by summing its columns and rows.]
Example 5B. Jointly continuous random variables. Suppose that at
two points in a room (or on a city street or in the ocean) one measures the
intensity of sound caused by general background noise. Let X₁ and X₂
be random variables representing the intensity of sound at the two points.
Suppose that the joint probability law of the sound intensities, X₁ and X₂,
is continuous, with the joint probability density function given by

The probability that the sum of the sound intensities is less than 1 is
given by

P[X₁ + X₂ ≤ 1] = ∬_{{(x₁, x₂): x₁ + x₂ ≤ 1}} f_{X₁,X₂}(x₁, x₂) dx₁ dx₂.
Define Y as the maximum intensity; in symbols, Y = maximum
(X₁, X₂, X₃, X₄, X₅). For any positive number y the probability that Y is
less than or equal to y is given by

P[Y ≤ y] = P[X₁ ≤ y, X₂ ≤ y, …, X₅ ≤ y].
THEORETICAL EXERCISE
5.1. Multivariate distributions with given marginal distributions. Let f₁(·) and
f₂(·) be two probability density functions. An infinity of joint probability
density functions f(·, ·) exist of which f₁(·) and f₂(·) are the marginal probability
density functions [that is, such that (3.4) holds]. One method of con-
structing f(·, ·) is given by (3.9); verify this assertion. Show that another
method of constructing a joint probability density function f(·, ·), with
given marginal probability density functions f₁(·) and f₂(·), is by defining,
for a given constant a such that |a| ≤ 1,

(5.22)  f(x₁, x₂) = f₁(x₁)f₂(x₂){1 + a[2F₁(x₁) − 1][2F₂(x₂) − 1]},

in which F₁(·) and F₂(·) are the distribution functions corresponding to
f₁(·) and f₂(·), respectively. Show that the distribution function F(·, ·)
corresponding to f(·, ·) is given by

(5.23)  F(x₁, x₂) = F₁(x₁)F₂(x₂){1 + a[1 − F₁(x₁)][1 − F₂(x₂)]}.

Equations (5.22) and (5.23) are due to E. J. Gumbel, "Distributions à
plusieurs variables dont les marges sont données," C. R. Acad. Sci. Paris,
Vol. 246 (1958), pp. 2717–2720.
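As a check (an added remark) that (5.22) has the required marginals, note that ∫_{−∞}^{∞} f₂(x₂)[2F₂(x₂) − 1] dx₂ = [F₂²(x₂) − F₂(x₂)], evaluated from −∞ to ∞, equals 0; hence

∫_{−∞}^{∞} f(x₁, x₂) dx₂ = f₁(x₁){1 + a[2F₁(x₁) − 1] · 0} = f₁(x₁),

and similarly for the other marginal.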
EXERCISES
TABLE 5A
p_{X₁,X₂}(x₁, x₂)

x₂ \ x₁     0     1     2     p_{X₂}(x₂)
0           h     2h    3h    6h
1           2h    4h    6h    12h
2           3h    6h    9h    18h
3           4h    8h    12h   24h
5.7. Show that the individual probability mass functions of X₁ and X₂ may
be obtained by summing the respective columns and rows as indicated.
Are X₁ and X₂ (i) jointly discrete, (ii) individually discrete?
5.8. Find (i) P[X₁ ≤ 1, X₂ ≤ 1], (ii) P[X₁ + X₂ ≤ 1], (iii) P[X₁ + X₂ > 2].
TABLE 5B
p_{X₁,X₂}(x₁, x₂)

x₂ \ x₁     0     1     2     p_{X₂}(x₂)
0           h     4h    9h    14h
1           2h    6h    12h   20h
2           3h    8h    3h    14h
3           4h    2h    6h    12h
5.9. Find (i) P[X₁ < 2X₂], (ii) P[X₁ > 1], (iii) P[X₁ = X₂].
5.10. Find (i) P[X₁ ≥ X₂ | X₂ > 1], (ii) P[X₁² + X₂² ≤ 1].
(6.7)

(iii) if the random variables are jointly continuous, then for any real
numbers x₁, x₂, …, xₙ

(6.8)  f_{X₁,X₂,…,Xₙ}(x₁, x₂, …, xₙ) = f_{X₁}(x₁) f_{X₂}(x₂) ⋯ f_{Xₙ}(xₙ);

(iv) if the random variables are jointly discrete, then for any real numbers
x₁, x₂, …, xₙ

p_{X₁,X₂,…,Xₙ}(x₁, x₂, …, xₙ) = p_{X₁}(x₁) p_{X₂}(x₂) ⋯ p_{Xₙ}(xₙ).
THEORETICAL EXERCISES
EXERCISES
(7.4)  P[B] = Σ_{k=0}^{4} P[B | A_k] P[A_k] = Σ_{k=1}^{4} (k/4) \binom{4}{k} p^k q^{4−k},

where we have let p = 0.1587, q = 0.8413. Then

(7.5)  P[B] = p Σ_{k=1}^{4} \binom{3}{k−1} p^{k−1} q^{3−(k−1)} = p,
The event B that the distance between the two points selected is less than
⅓a is then the event [X₂ − X₁ < ⅓a]. The probability of this event is the
probability attached to the cross-hatched area in Fig. 7A. However, this
probability can be represented as the ratio of the area of the cross-hatched
triangle and the area of the shaded rectangle; thus

(7.7)  P[X₂ − X₁ < ⅓a] = (½[(⅓)a]²)/([(½)a]²) = 2/9.
Fig. 7C.  Fig. 7D.
The length X of the chord may be expressed in terms of the random
variables Y₁, Y₂, Z₁, and Z₂:

(7.9)  X = 2√(r² − Y₂²)  or  X = 2r cos Z₂.

Consequently P[X < r] = P[Y₂ > ½r√3], or P[X < r] = P[cos Z₂ < ½] =
P[Z₂ > π/3]. In both cases the required probability is equal to the ratio
of the areas of the cross-hatched regions in Figs. 7C and 7D to the areas
of the corresponding shaded regions. The first solution yields the answer

(7.10)  P[X < r] = 1 − ½√3,

whereas the second solution yields the answer

(7.11)  P[X < r] = ⅓

for the probability that the length of the chord chosen will be less than the
radius of the circle.
It should be noted that random experiments could be performed in such
a way that either (7.10) or (7.11) would be the correct probability in the
sense of the frequency definition of probability. If a disk of diameter 2r
were cut out of cardboard and thrown at random on a table ruled with
parallel lines a distance 2r apart, then one and only one of these lines
would cross the disk. All distances from the center would be equally
likely, and (7.10) would represent the probability that the chord drawn by
the line across the disk would have a length less than r. On the other hand,
if the disk were held by a pivot through a point on its edge, which point
lay upon a certain straight line, and spun randomly about this point, then
(7.11) would represent the probability that the chord drawn by the line
across the disk would have a length less than r.
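Both experiments are easily simulated; the following Python sketch (an addition, with r = 1 assumed) reproduces the two answers:

    import random
    from math import sqrt, cos, pi

    # (a) distance from center uniform on (0, r): P[X < r] = 1 - sqrt(3)/2
    # (b) angle Z2 uniform on (0, pi/2):          P[X < r] = 1/3
    r, trials = 1.0, 100_000
    disk = sum(2 * sqrt(r**2 - random.uniform(0, r)**2) < r
               for _ in range(trials))
    spin = sum(2 * r * cos(random.uniform(0, pi / 2)) < r
               for _ in range(trials))
    print(disk / trials, 1 - sqrt(3) / 2)   # about 0.134
    print(spin / trials, 1 / 3)             # about 0.333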
(7.12)  (1 − (n − 1)d/L)ⁿ.
Solution: For j = 1, 2, …, n let X_j denote the position of the jth
person. We assume that X₁, X₂, …, Xₙ are independent random
variables, each uniformly distributed over the interval 0 to L. Their joint
probability density function is then given by

(7.13)  f_{X₁,X₂,…,Xₙ}(x₁, x₂, …, xₙ) = 1/Lⁿ  for 0 < x₁, x₂, …, xₙ < L,
      = 0 otherwise.
Next, for each permutation, or ordered n-tuple chosen without replace-
ment, (i₁, i₂, …, iₙ) of the integers 1 to n, define

(7.14)  I(i₁, i₂, …, iₙ) = {(x₁, x₂, …, xₙ): x_{i₁} < x_{i₂} < ⋯ < x_{iₙ}}.

Thus I(i₁, i₂, …, iₙ) is a zone of points in n-dimensional Euclidean space.
There are n! such zones that are mutually exclusive. The union of all
zones does not include all the points in n-dimensional space, since an
n-tuple (x₁, x₂, …, xₙ) that contains two equal components does not lie
in any zone. However, we are able to ignore the set of points not included
in any of the zones, since this set has probability zero under a continuous
probability law. Now the event B that no two persons are less than a
distance d apart may be represented as the set of n-tuples (x₁, x₂, …, xₙ)
for which the distance |x_i − x_j| between any two components is greater
than d. To find the probability of B, we must first find the probability of
the intersection of B and each zone I(i₁, i₂, …, iₙ). We may represent
this intersection as follows:
(7.15)  BI(i₁, i₂, …, iₙ) = {(x₁, x₂, …, xₙ): 0 < x_{i₁} < L − (n − 1)d,
        x_{i₁} + d < x_{i₂} < L − (n − 2)d,
        x_{i₂} + d < x_{i₃} < L − (n − 3)d, …, x_{i_{n−1}} + d < x_{iₙ} < L}.
"n
Consequently,

(7.16)  P_{X₁,X₂,…,Xₙ}[BI(i₁, i₂, …, iₙ)]
      = (1/Lⁿ) ∫₀^{L−(n−1)d} dx_{i₁} ∫_{x_{i₁}+d}^{L−(n−2)d} dx_{i₂} ⋯ ∫_{x_{i_{n−1}}+d}^{L} dx_{iₙ}
      = ∫₀^{1−(n−1)d′} du₁ ∫_{u₁+d′}^{1−(n−2)d′} du₂ ⋯ ∫_{u_{n−1}+d′}^{1} duₙ,

in which we have made the change of variable u_j = x_{i_j}/L and written
d′ = d/L. One may verify by induction that integration with respect to
u_{k+1}, …, uₙ yields

(7.17)  (1/(n − k)!)(1 − (n − k)d′ − u_k)^{n−k},

so that the complete integral equals (1/n!)(1 − (n − 1)d′)ⁿ.
The probability of B is equal to the product of n! and the probability of
the intersection of B and any zone I(i₁, i₂, …, iₙ). The proof of (7.12) is
now complete.
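A Monte Carlo check of (7.12) (an addition; n = 4, L = 10, d = 1 are illustrative values, for which the formula gives 0.7⁴ = 0.2401):

    import random

    n, L, d, trials = 4, 10.0, 1.0, 100_000
    count = 0
    for _ in range(trials):
        xs = sorted(random.uniform(0, L) for _ in range(n))
        if all(xs[i + 1] - xs[i] > d for i in range(n - 1)):
            count += 1
    print(count / trials, (1 - (n - 1) * d / L)**n)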
In a similar manner one may solve the following problem.
Example 7G. Packing cylinders randomly on a rod. Consider a hori-
zontal rod of length L on which n equal cylinders, each of length c, are
distributed at random. The probability that no two cylinders will be less
than d apart is equal to, for d such that L > nc + (n − 1)d,

(7.18)  (1 − (n − 1)d/(L − nc))ⁿ.
The foregoing considerations, together with (6.2) of Chapter 2, establish
an extremely useful result.

The Random Division of an Interval or a Circle. Suppose that a straight
line of length L is divided into n subintervals by (n − 1) points chosen at
random on the line or that a circle of circumference L is divided into n
subintervals by n points chosen at random on the circle. Then the prob-
ability P_k that exactly k of the subintervals will exceed d in length is
given by

(7.19)  P_k = \binom{n}{k} Σ_{j=k}^{[L/d]} (−1)^{j−k} \binom{n−k}{n−j} (1 − jd/L)^{n−1}.
It may clarify the meaning of (7.19) to express it in terms of random
variables. Let X₁, X₂, …, X_{n−1} be the coordinates of the n − 1 points
chosen randomly on the line (a similar discussion may be given for the
circle). Then X₁, X₂, …, X_{n−1} are independent random variables, each
uniformly distributed on the interval 0 to L. Define new random variables
Y₁, Y₂, …, Y_{n−1}: Y₁ is equal to the minimum of X₁, X₂, …, X_{n−1};
Y₂ is equal to the second smallest number among X₁, X₂, …, X_{n−1}; and
so on, up to Y_{n−1}, which is equal to the maximum of X₁, X₂, …, X_{n−1}.
The random variables Y₁, Y₂, …, Y_{n−1} thus constitute a reordering of
the random variables X₁, X₂, …, X_{n−1}, according to increasing magnitude.
For this reason, the random variables Y₁, Y₂, …, Y_{n−1} are called the
order statistics corresponding to X₁, X₂, …, X_{n−1}. The random variable
Y_k, for k = 1, 2, …, n − 1, is usually spoken of as the kth smallest value
among X₁, X₂, …, X_{n−1}.
The lengths W₁, W₂, …, Wₙ of the n successive subintervals into which
the (n − 1) randomly chosen points divide the line may now be expressed:

W₁ = Y₁,  W₂ = Y₂ − Y₁, …, W_j = Y_j − Y_{j−1}, …, Wₙ = L − Y_{n−1}.
The probability P_k is the probability that exactly k of the n events
[W₁ > d], [W₂ > d], …, [Wₙ > d] will occur. To prove (7.19), one needs
only to verify that for any integer j the probability that j specified
subintervals will exceed d in length is equal to

(7.21)  (1 − jd/L)^{n−1}  if 0 ≤ j < L/d.
THEORETICAL EXERCISES
7.1. Buffon's Needle Problem. A smooth table is ruled with equidistant parallel
lines at distance D apart. A needle of length L < D is dropped on the
table. Show that the probability that it will cross one of the lines is
(2L)/(πD). For an account of some experiments made in connection
with the Buffon Needle Problem see J. V. Uspensky, Introduction to
Mathematical Probability, McGraw-Hill, New York, 1937, pp. 112–113.
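The experiment is readily simulated; a Python sketch (an addition, under the usual model in which the center-to-nearest-line distance is uniform on (0, D/2) and the needle's angle uniform on (0, π)):

    import random
    from math import sin, pi

    L, D, trials = 1.0, 2.0, 100_000
    hits = sum(random.uniform(0, D / 2) <= (L / 2) * sin(random.uniform(0, pi))
               for _ in range(trials))
    print(hits / trials, 2 * L / (pi * D))   # both about 0.318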
7.2. A straight line of unit length is divided into n subintervals by n − 1 points
chosen at random. For r = 1, 2, …, n − 1, show that the probability
that none of r specified subintervals will be less than d in length is equal to

(7.22)  (1 − rd)^{n−1}.

(7.23)  n(1 − d)^{n−1} − \binom{n}{2}(1 − 2d)^{n−1}
EXERCISES
7.1. A young man and a young lady plan to meet between 5 and 6 P.M., each
agreeing not to wait more than 10 minutes for the other. Find the
probability that they will meet if they arrive independently at random times
between 5 and 6 P.M.
7.2. Consider light bulbs produced by a machine for which it is known that the
life X in hours of a light bulb produced by the machine is a random variable
with probability density function

f_X(x) = (1/1000) e^{−x/1000}  for x > 0,
       = 0 otherwise.
Consider a box containing 100 such bulbs, selected randomly from the
output of the machine.
(i) What is the probability that a bulb selected randomly from the box will
have a lifetime greater than 1020 hours?
(ii) What is the probability that a sample of 5 bulbs selected randomly
from the box will contain (a) at least 1 bulb, (b) 4 or more bulbs with a
lifetime greater than 1020 hours?
(iii) Find approximately the probability that the box will contain between
30 and 40 bulbs, inclusive, with a lifetime greater than 1020 hours.
7.3. Six soldiers take up random positions on a road 2 miles long. What is
the probability that the distance between any two soldiers will be more
than (i) }, (ii) -}, (iii) t of a mile?
7.4. Another version of Bertrand's paradox. Let a chord be drawn at random in
a given circle. What is the probability that the length of the chord will be
greater than the side of the equilateral triangle inscribed in that circle?
7.5. A point is chosen randomly on each of 2 adjacent sides of a square. Find
the probability that the area of the triangle formed by the sides of the
square and the line joining the 2 points will be (i) less than i- of the area of
the square, (ii) greater than t of the area of the square.
7.6. Three points are chosen randomly on the circumference of a circle. What
is the probability that there will be a semicircle in which all will lie?
7.7. A line is divided into 3 subintervals by choosing 2 points randomly on the
line. Find the probability that the 3-line segments thus formed could be
made to form the sides of a triangle.
7.8. Find the probability that the roots of the equation x² + 2X₁x + X₂ = 0
will be real if (i) X₁ and X₂ are randomly chosen between 0 and 1, (ii) X₁
is randomly chosen between 0 and 1, and X₂ is randomly chosen between
−1 and 1.
7.9. In the interval 0 to 1, n points are chosen randomly. Find (i) the proba-
bility that the point lying farthest to the right will be to the right of the
number 0.6, (ii) the probability that the point lying farthest to the left
will be to the left of the number 0.6, (iii) the probability that the point
lying next farthest to the left will be to the right of the number 0.6.
7.10. A straight line of unit length is divided into 10 subintervals by 9 points
chosen at random. For any (i) number d > i, (ii) number d > -} find the
probability that none of the subintervals will exceed d in length.
(8.4)  f_{aX+b}(y) = (1/|a|) f_X((y − b)/a).

(8.5)  p_{aX+b}(y) = p_X((y − b)/a).
Next, let us consider g(x) = x². Then Y = X². For y < 0, {x: x² ≤ y}
is the empty set of real numbers. Consequently, F_{X²}(y) = 0 for y < 0.
For y > 0,

(8.8)  f_{X²}(y) = [f_X(√y) + f_X(−√y)] (1/(2√y))  for y > 0,
     = 0  for y < 0.
It may help the reader to recall the so-called chain rule for differentiation
of a function of a function, required to obtain (8.8), if we point out that
(8.12)
=0 otherwise.
Example 8B. The positive part of a random variable. Given any real
number x, we define the symbols x⁺ and x⁻ as follows:

(8.13)  x⁺ = x if x ≥ 0,  x⁺ = 0 if x < 0;
        x⁻ = 0 if x ≥ 0,  x⁻ = −x if x < 0.

Then x = x⁺ − x⁻ and |x| = x⁺ + x⁻. Given a random variable X, let
Y = X⁺. We call Y the positive part of X. The distribution function
of the positive part of X is given by

= 0 otherwise,
in which α and β are defined by (8.16).
To illustrate the use of (8.18), let us note the formula: if X is a continuous
random variable, then

(8.19)  f_{tan⁻¹X}(y) = f_X(tan y) sec² y  for |y| < π/2,
      = 0 otherwise.
To prove (8.18), we distinguish two cases: the case in which the function
y = g(x) is monotone increasing and that in which it is monotone
decreasing. In the first case the distribution function of Y for α < y < β
may be written

(8.20)  F_Y(y) = P[g(X) ≤ y] = P[X ≤ g⁻¹(y)] = F_X[g⁻¹(y)].

In the second case, for α < y < β,

(8.20′)  F_Y(y) = P[g(X) ≤ y] = P[X ≥ g⁻¹(y)] = 1 − F_X[g⁻¹(y)].
If (8.20) is differentiated with respect to y, (8.18) is obtained. We leave it
to the reader to consider the case in which y < α or y > β.
One may extend (8.18) to the case in which the derivative g'(x) is
continuous and vanishes at only a finite number of points. We leave the
proof of the following assertion to the reader.
Let y = g(x) be differentiable for all x and assume that the derivative
g′(x) is continuous and nonzero at all but a finite number of values of x.
Then, to every real number y, (i) there is a positive integer m(y) and points
x₁(y), x₂(y), …, x_{m(y)}(y) such that, for k = 1, 2, …, m(y),

(8.21)  g[x_k(y)] = y,  g′[x_k(y)] ≠ 0,

or (ii) there is no value of x such that g(x) = y and g′(x) ≠ 0; in this case
we write m(y) = 0. If X is a continuous random variable, then Y = g(X)
is a continuous random variable with a probability density function given
by

(8.22)  f_Y(y) = Σ_{k=1}^{m(y)} f_X[x_k(y)] |g′[x_k(y)]|⁻¹  if m(y) > 0,
      = 0  if m(y) = 0.
We obtain as an immediate consequence of (8.22): if X is a continuous
random variable, then

(8.23)  f_{|X|}(y) = f_X(y) + f_X(−y)  for y > 0,
      = 0  for y < 0;

(8.24)  f_{√|X|}(y) = 2y[f_X(y²) + f_X(−y²)]  for y > 0,
      = 0  for y < 0.
Equations (8.23) and (8.24) may also be obtained directly, by using the
same technique with which (8.8) was derived.
The Probability Integral Transformation. It is a somewhat surprising
fact, of great usefulness both in theory and in practice, that to obtain a
random sample of a random variable X it suffices to obtain a random
sample of a random variable U, which is uniformly distributed over the
interval 0 to 1. This follows from the fact that the distribution function
F_X(·) of the random variable X is a nondecreasing function. Consequently,
an inverse function F_X⁻¹(·) may be defined for values of y between 0 and 1:
F_X⁻¹(y) is equal to the smallest value of x satisfying the condition that
F_X(x) ≥ y.

(8.26)  F_X(X₁), F_X(X₂), …, F_X(Xₙ)

are a random sample of the random variable U = F_X(X), which is
uniformly distributed on the interval 0 to 1.
The transformation of a random variable X into a uniformly distributed
random variable U = F_X(X) is called the probability integral transformation.
It plays an important role in the modern theory of goodness-of-fit tests for
distribution functions; see T. W. Anderson and D. Darling, "Asymptotic
theory of certain goodness of fit criteria based on stochastic processes,"
Annals of Mathematical Statistics, Vol. 23 (1952), pp. 195-212.
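A Python sketch of the inverse use of this fact (an addition; the exponential law with parameter 2 is an illustrative choice):

    import random
    from math import log

    # if U is uniform on (0, 1), then X = F_X^{-1}(U) has distribution
    # function F_X; here F(x) = 1 - e^{-2x}, so F^{-1}(u) = -log(1 - u)/2
    n = 100_000
    sample = [-log(1 - random.random()) / 2 for _ in range(n)]
    print(sum(sample) / n)   # close to the exponential mean 1/2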
EXERCISES
8.1. Let X have a χ² distribution with parameters n and σ. Show that
Y = √(X/n) has a χ distribution with parameters n and σ.
8.2. The temperature T of a certain object, recorded in degrees Fahrenheit,
obeys a normal probability law with mean 98.6 and variance 2. The
temperature C measured in degrees centigrade is related to T by C =
(5/9)(T − 32). Describe the probability law of C.
(a) uniformly distributed over the interval −1 to 1, (b) normally distributed with
parameters m = 0 and σ > 0, (c) Rayleigh distributed with parameter σ.
8.17. The waveform X(t) is passed through a squaring circuit; the output Y(t)
of the squaring circuit at time t is assumed to be given by Y(t) = X²(t).
Find and sketch the probability density function of Y(t) for any time
t > 0.
8.18. The waveform X(t) is passed through a rectifier, giving as its output
Y(t) = |X(t)|. Describe the probability law of Y(t) for any time t > 0.
8.19. The waveform X(t) is passed through a half-wave rectifier, giving as its
output Y(t) = X⁺(t), the positive part of X(t). Describe the probability
law of Y(t) for any t > 0.
8.20. The waveform X(t) is passed through a clipper, giving as its output
Y(t) = g[X(t)], where g(x) = 1 or 0, depending on whether x ≥ 0 or
x < 0. Find and sketch the probability mass function of Y(t) for any
t > 0.
8.21. Prove that the function given in (8.12) is a probability density function.
Does the fact that the function is unbounded cause any difficulty?
(9.2) Fy(y) =
To begin with, let us obtain the probability law of the sum of two
jointly continuous random variables X₁ and X₂, with a joint probability
density function f_{X₁,X₂}(·, ·). Let Y = X₁ + X₂. Then

(9.3)  F_Y(y) = P[X₁ + X₂ ≤ y] = P_{X₁,X₂}[{(x₁, x₂): x₁ + x₂ ≤ y}]
     = ∬_{{(x₁, x₂): x₁ + x₂ ≤ y}} f_{X₁,X₂}(x₁, x₂) dx₁ dx₂.
If the random variables X₁ and X₂ are independent, then for any real
number y

(9.5)  f_{X₁+X₂}(y) = ∫_{−∞}^{∞} f₁(x) f₂(y − x) dx,

in which f₁(·) and f₂(·) denote the individual probability density functions
of X₁ and X₂. A function f₃(·) defined by f₃(y) = ∫_{−∞}^{∞} f₁(x) f₂(y − x) dx
is then said to be the convolution of the functions f₁(·)
and f₂(·), and in symbols we write f₃(·) = f₁(·) * f₂(·).
In terms of the notion of convolution, we may express (9.5) as follows.
The probability density function f_{X₁+X₂}(·) of the sum of two independent
continuous random variables is the convolution of the probability density
functions f_{X₁}(·) and f_{X₂}(·) of the random variables.
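A numerical illustration (an addition; two uniform densities on (0, 1) are assumed, whose convolution is the triangular density on (0, 2)):

    dx = 0.001
    f = lambda x: 1.0 if 0.0 <= x <= 1.0 else 0.0
    for y in (0.5, 1.0, 1.5):
        conv = sum(f(i * dx) * f(y - i * dx) for i in range(int(2 / dx))) * dx
        exact = y if y <= 1 else 2 - y
        print(y, round(conv, 3), exact)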
One can prove similarly that if the random variables X₁ and X₂ are
jointly discrete, then the probability mass function of their sum, X₁ + X₂,
for any real number y is given by

(9.7)  p_{X₁+X₂}(y) = Σ p_{X₁,X₂}(x, y − x) = Σ p_{X₁,X₂}(y − x, x),

in which each sum extends over all x such that the summand is positive.
In the same way that we proved (9.4) we may prove the formulas for
the probability density function of the difference, product, and quotient
of two jointly continuous random variables:

(9.8)  f_{X₁−X₂}(y) = ∫_{−∞}^{∞} f_{X₁,X₂}(y + x, x) dx,
(9.9)  f_{X₁X₂}(y) = ∫_{−∞}^{∞} f_{X₁,X₂}(y/x, x) (1/|x|) dx,
(9.10)  f_{X₁/X₂}(y) = ∫_{−∞}^{∞} f_{X₁,X₂}(yx, x) |x| dx.

Next, let Y = √(X₁² + X₂²). Then for y > 0

(9.12)  F_Y(y) = ∫₀^{2π} dθ ∫₀^{y} r dr f_{X₁,X₂}(r cos θ, r sin θ).
If X₁ and X₂ are jointly continuous, then Y is continuous, with a
probability density function obtained by differentiating (9.12) with respect
to y. Consequently,

(9.13)  f_{√(X₁²+X₂²)}(y) = y ∫₀^{2π} dθ f_{X₁,X₂}(y cos θ, y sin θ)  for y > 0,
      = 0  for y < 0;

(9.14)  f_{X₁²+X₂²}(y) = ½ ∫₀^{2π} dθ f_{X₁,X₂}(√y cos θ, √y sin θ)  for y > 0,
      = 0  for y < 0,

where (9.14) follows from (9.13) and (8.8).
The formulas given in this section provide tools for the solution of a
great many problems of theoretical and applied probability theory, as
examples 9A to 9F indicate. In particular, the important problem of
finding the probability distribution of the sum of two independent random
variables can be treated by using (9.5) and (9.7). One may prove results
such as the following:

(9.15)  if X₁ and X₂ are independent random variables, normally distributed
with means m₁ and m₂ and variances σ₁² and σ₂², respectively, then
X₁ + X₂ is normally distributed with mean m* = m₁ + m₂ and variance
σ*² = σ₁² + σ₂².

Solution: By (9.5),
By (6.9) of Chapter 4, it follows that

(1/(σ*√(2π))) ∫_{−∞}^{∞} dx exp[−½((x − m*)/σ*)²] = 1,

where m* = m₁ + m₂ and σ*² = σ₁² + σ₂².
Example 9B. The assembly of parts. It is often the case that a dimension
of an assembled article is the sum of the dimensions of several parts.
An electrical resistance may be the sum of several electrical resistances.
The weight or thickness of the article may be the sum of the weights or
thicknesses of individual parts. The probability law of the individual
dimensions may be known; what is of interest is the probability law of
the dimension of the assembled article. An answer to this question may be
obtained from (9.5) and (9.7) if the individual dimensions are independent
random variables. For example, let us consider two 10-ohm resistors
assembled in series. Suppose that, in fact, the resistances of the resistors
are independent random variables, each obeying a normal probability
law with mean 10 ohms and standard deviation 0.5 ohms. The unit,
consisting of the two resistors assembled in series, has resistance equal to
the sum of the individual resistances; therefore, the resistance of the unit
obeys a normal probability law with mean 20 ohms and standard deviation
{(0.5)² + (0.5)²}^{1/2} = 0.707 ohms. Now suppose one wishes to measure
the resistance of the unit, using an ohmmeter whose error of measurement
is a random variable obeying a normal probability law with mean 0 and
standard deviation 0.5 ohms. The measured resistance of the unit is a
random variable obeying a normal probability law with mean 20 ohms
and standard deviation √((0.707)² + (0.5)²) = 0.866 ohms.
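The conclusion of example 9B may be confirmed by simulation (a Python sketch added here):

    import random

    # two resistances, each normal (10, 0.5), plus an independent
    # normal (0, 0.5) measurement error
    trials = 100_000
    readings = [random.gauss(10, 0.5) + random.gauss(10, 0.5)
                + random.gauss(0, 0.5) for _ in range(trials)]
    mean = sum(readings) / trials
    sd = (sum((x - mean)**2 for x in readings) / trials)**0.5
    print(round(mean, 2), round(sd, 3))   # about 20 and 0.866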
f_{X₁,X₂}(y cos θ, y sin θ) = (1/(2πσ²)) e^{−½(y/σ)²}.

Consequently, for y > 0,

(9.17)  f_{√(X₁²+X₂²)}(y) = (y/σ²) e^{−½(y/σ)²},

(9.18)  f_{X₁²+X₂²}(y) = (1/(2σ²)) e^{−y/(2σ²)}.

In words, Y = √(X₁² + X₂²) has a Rayleigh distribution with parameter σ,
whereas X₁² + X₂² has a χ² distribution with parameters n = 2 and σ.
Example 9D. The probability distribution of the envelope of narrow-
band noise. A family of random variables X(t), defined for t > 0, is
said to represent a narrow-band noise voltage [see S. O. Rice, "Mathe-
matical Analysis of Random Noise," Bell System Tech. Jour., Vol. 24
(1945), p. 81] if X(t) is represented in the form

(9.19)  X(t) = X_c(t) cos ωt + X_s(t) sin ωt,

in which ω is a known frequency, whereas X_c(t) and X_s(t) are independent
normally distributed random variables with means 0 and equal variances σ².
The envelope of X(t) is then defined as

(9.20)  R(t) = [X_c²(t) + X_s²(t)]^{1/2}.

In view of example 9C, it is seen that the envelope R(t) has a Rayleigh
distribution with parameter α = σ.
Example 9E. Let U and V be independent random variables, such
that U is normally distributed with mean 0 and variance σ² and V has a χ
distribution with parameters n and σ. Show that the quotient T = U/V
has Student's distribution with parameter n.

Solution: By (9.10), the probability density function of T for any real
number y is given by

f_T(y) = K(y² + n)^{−(n+1)/2} 2^{(n−1)/2} Γ((n + 1)/2),
from which one may immediately deduce that the probability density
function of T is given by (4.15) of Chapter 4. ..
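The construction in example 9E can be checked numerically. The sketch below (an illustration added here, not from the text) assumes that a χ-distributed variable with parameters n and σ, in the book's parametrization, may be simulated as σ√(χₙ²/n); it then compares the simulated quotient with Student's density with parameter n:

    import math
    import numpy as np

    rng = np.random.default_rng(2)
    n, sigma, trials = 5, 1.5, 400_000
    u = rng.normal(0.0, sigma, trials)
    v = sigma * np.sqrt(rng.chisquare(n, trials) / n)   # assumed chi(n, sigma) sampler
    t = u / v

    def student_density(y, n):
        # Student's density with parameter n, as in (4.15) of Chapter 4
        c = math.gamma((n + 1) / 2) / (math.sqrt(n * math.pi) * math.gamma(n / 2))
        return c * (1 + y * y / n) ** (-(n + 1) / 2)

    hist, edges = np.histogram(t, bins=200, range=(-4.0, 4.0), density=True)
    mids = 0.5 * (edges[:-1] + edges[1:])
    i0 = np.argmin(np.abs(mids))
    print(hist[i0], student_density(mids[i0], n))   # approximately equal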
► Example 9F. Distribution of the range. A ship is shelling a target on an enemy shore line, firing n independent shots, all of which may be assumed to fall on a straight line and to be distributed according to the distribution function F(x) with probability density function f(x). Define the range (or span) R of the attack as the interval between the locations of the extreme shells. Find the probability density function of R.

Solution: Let X₁, X₂, …, Xₙ be independent random variables representing the coordinates locating the positions of the n shots. The range R may be written R = V − U, in which V = maximum(X₁, X₂, …, Xₙ) and U = minimum(X₁, X₂, …, Xₙ). The joint distribution function F_{U,V}(u, v) is found as follows. If u ≥ v, then F_{U,V}(u, v) is the probability that simultaneously X₁ ≤ v, …, Xₙ ≤ v; consequently,

(9.21) F_{U,V}(u, v) = [F(v)]ⁿ if u ≥ v,

since P[X_k ≤ v] = F(v) for k = 1, 2, …, n. If u < v, then F_{U,V}(u, v) is the probability that simultaneously X₁ ≤ v, …, Xₙ ≤ v but not simultaneously u < X₁ ≤ v, …, u < Xₙ ≤ v; consequently,

(9.22) F_{U,V}(u, v) = [F(v)]ⁿ − [F(v) − F(u)]ⁿ if u < v.

The joint probability density of U and V is then obtained by differentiation. It is given by

(9.23) f_{U,V}(u, v) = 0 if u ≥ v
       = n(n − 1)[F(v) − F(u)]^{n−2} f(u) f(v) if u < v.

From (9.8) and (9.23) it follows that the probability density function of the range R of n independent continuous random variables, whose individual distribution functions are all equal to F(x) and whose individual probability density functions are all equal to f(x), is given by

(9.24) f_R(x) = 0 for x < 0
       = n(n − 1) ∫_{−∞}^{∞} [F(v) − F(v − x)]^{n−2} f(v − x) f(v) dv for x > 0.
The distribution function of R is then given by

(9.25) F_R(x) = 0 if x < 0
       = n ∫_{−∞}^{∞} [F(v) − F(v − x)]^{n−1} f(v) dv if x ≥ 0. ▲
(9.29) V_g(y) = ∫ ⋯ ∫_{{(x₁, …, xₙ): g(x₁, …, xₙ) ≤ y}} dx₁ dx₂ ⋯ dxₙ.

► Example 9G. Let X₁, X₂, …, Xₙ be independent random variables, each normally distributed with mean 0 and variance 1. Then Y = √(X₁² + X₂² + ⋯ + Xₙ²) has probability density function

f_Y(y) = (1/(2^{(n−2)/2} Γ(n/2))) y^{n−1} e^{−½y²}, for y > 0,
       = 0, for y < 0,

where ∫₀^∞ y^{n−1} e^{−½y²} dy = 2^{(n−2)/2} Γ(n/2). In words, Y has a χ distribution with parameters n and σ = √n.

Solution: Define g(x₁, …, xₙ) = √(x₁² + ⋯ + xₙ²) and f_U(y) = (2π)^{−n/2} e^{−½y²}. Then (9.27) holds. Now V_g(y) is the volume within a sphere in n-dimensional space of radius y. Clearly, V_g(y) = 0 for y < 0, and for y > 0

V_g(y) = yⁿ ∫ ⋯ ∫_{{(x₁, …, xₙ): x₁² + ⋯ + xₙ² ≤ 1}} dx₁ ⋯ dxₙ.
f_E(x) = K₁ e^{−x/kT} (d/dx)V_E(x)

for some constant K₁, in which V_E(x) is the volume within the ellipsoid in 3N-dimensional space consisting of all 3N-tuples of velocities whose kinetic energy E ≤ x. One may show that

(d/dx)V_E(x) = K₂ x^{(3N/2)−1}

for some constant K₂, in the same way that V_g(y) is shown in example 9G to be proportional to yⁿ. Consequently, for x > 0

f_E(x) = x^{(3N/2)−1} e^{−x/kT} / ∫₀^∞ x^{(3N/2)−1} e^{−x/kT} dx.
for some constants σ₁ > 0, σ₂ > 0, −1 < ρ < 1, −∞ < m₁ < ∞, −∞ < m₂ < ∞, in which the function Q(·, ·) for any two real numbers x₁ and x₂ is defined by

Q(x₁, x₂) = (1/(1 − ρ²)) [((x₁ − m₁)/σ₁)² − 2ρ((x₁ − m₁)/σ₁)((x₂ − m₂)/σ₂) + ((x₂ − m₂)/σ₂)²].
THEORETICAL EXERCISES
9.10. Let X₁ and X₂ have a joint probability density function given by equation (9.31), with m₁ = m₂ = 0. Show that

(9.33) f_{X₁/X₂}(y) = σ₁σ₂√(1 − ρ²) / (π(σ₂²y² − 2ρσ₁σ₂y + σ₁²)).

If X₁ and X₂ are independent, then the quotient X₁/X₂ has a Cauchy distribution.
9.11. Use the proof of example 9G to prove that the volume Vₙ(r) of an n-dimensional sphere of radius r is given by

Vₙ(r) = π^{n/2} rⁿ / Γ((n/2) + 1).
EXERCISES
9.1. Suppose that the load on an airplane wing is a random variable X obeying
a normal probability law with mean 1000 and variance 14,400, whereas
the load Y that the wing can withstand is a random variable obeying a
normal probability law with mean 1260 and variance 2500. Assuming that
X and Y are independent, find the probability that X < Y (that the load
encountered by the wing is less than the load the wing can withstand).
In exercises 9.2 to 9.4 let X₁ and X₂ be independently and uniformly distributed over the interval 0 to 1.

9.2. Find and sketch the probability density function of (i) X₁ + X₂, (ii) X₁ − X₂, (iii) |X₁ − X₂|.

9.3. (i) Maximum(X₁, X₂), (ii) minimum(X₁, X₂).

9.4. (i) X₁X₂, (ii) X₁/X₂.

In exercises 9.5 to 9.7 let X₁ and X₂ be independent random variables, each normally distributed with parameters m = 0 and σ > 0.

9.5. Find and sketch the probability density function of (i) X₁ + X₂, (ii) X₁ − X₂, (iii) |X₁ − X₂|, (iv) (X₁ + X₂)/2, (v) (X₁ − X₂)/2.

9.6. (i) X₁² + X₂², (ii) (X₁² + X₂²)/2.
We consider only the case in which the functions g₁(x₁, x₂, …, xₙ), g₂(x₁, x₂, …, xₙ), …, gₙ(x₁, x₂, …, xₙ) have continuous first partial derivatives at all points (x₁, x₂, …, xₙ) and are such that the Jacobian determinant

J(x₁, x₂, …, xₙ) = ∂(g₁, g₂, …, gₙ)/∂(x₁, x₂, …, xₙ)

is different from 0 at all points (x₁, x₂, …, xₙ). Let C be the set of points (y₁, y₂, …, yₙ) such that the n equations

(10.3) y₁ = g₁(x₁, x₂, …, xₙ), …, yₙ = gₙ(x₁, x₂, …, xₙ)

possess at least one solution (x₁, x₂, …, xₙ). The set of equations in (10.3) then possesses exactly one solution, which we denote by

(10.4) x₁ = g₁⁻¹(y₁, y₂, …, yₙ), …, xₙ = gₙ⁻¹(y₁, y₂, …, yₙ).

The joint probability density function of Y₁, Y₂, …, Yₙ at a point (u₁, u₂, …, uₙ) may be written as the limit, as h₁, h₂, …, hₙ tend to 0, of the quotients

(10.6) (1/(h₁h₂ ⋯ hₙ)) P[u₁ < Y₁ ≤ u₁ + h₁, …, uₙ < Yₙ ≤ uₙ + hₙ],

in which the probability appearing in (10.6) is equal to

(10.7) P[(X₁, X₂, …, Xₙ) is in Dₙ],

in which

Dₙ = {(x₁, x₂, …, xₙ): u₁ < g₁(x₁, x₂, …, xₙ) ≤ u₁ + h₁, …, uₙ < gₙ(x₁, x₂, …, xₙ) ≤ uₙ + hₙ}.

Now, if (u₁, u₂, …, uₙ) does not belong to C, then for sufficiently small values of h₁, h₂, …, hₙ there are no points (x₁, x₂, …, xₙ) in Dₙ, and the probability in (10.7) is 0. From the fact that the quantities in (10.6), whose limit is being taken, are 0 for sufficiently small values of h₁, h₂, …, hₙ, it follows that f_{Y₁,Y₂,…,Yₙ}(u₁, u₂, …, uₙ) = 0 for (u₁, u₂, …, uₙ) not in C. Thus (10.8) is proved. To prove (10.5), we use the celebrated formula for change of variables in multiple integrals (see R. Courant, Differential and Integral Calculus, Interscience, New York, 1937, Vol. II, p. 253, or T. Apostol, Mathematical Analysis, Addison-Wesley, Reading, Massachusetts, 1957, p. 271) to transform the integral on the right-hand side of (10.7) to the integral

∫_{u₁}^{u₁+h₁} ⋯ ∫_{uₙ}^{uₙ+hₙ} f_{X₁,…,Xₙ}(g₁⁻¹(y₁, …, yₙ), …, gₙ⁻¹(y₁, …, yₙ)) |J(g₁⁻¹(y₁, …, yₙ), …, gₙ⁻¹(y₁, …, yₙ))|⁻¹ dy₁ ⋯ dyₙ.
In exactly the same way one may establish the following result:
where

A = cos²α/σ₁² − 2ρ (cos α sin α)/(σ₁σ₂) + sin²α/σ₂².

From (10.15) one sees that two random variables Y₁ and Y₂, obtained by a rotation of axes from jointly normally distributed random variables X₁ and X₂, are jointly normally distributed. Further, if the angle of rotation α is chosen so that

(10.17) tan 2α = 2ρσ₁σ₂/(σ₁² − σ₂²),

then Y₁ and Y₂ are uncorrelated and consequently independent.
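A short numerical illustration (added here; it assumes the decorrelating angle stated in (10.17), which was reconstructed above): rotating the axes by α makes the covariance matrix of (Y₁, Y₂) diagonal.

    import numpy as np

    s1, s2, rho = 2.0, 1.0, 0.6
    C = np.array([[s1**2, rho * s1 * s2],
                  [rho * s1 * s2, s2**2]])        # covariance matrix of (X1, X2)
    alpha = 0.5 * np.arctan2(2 * rho * s1 * s2, s1**2 - s2**2)   # angle in (10.17)
    R = np.array([[np.cos(alpha), np.sin(alpha)],
                  [-np.sin(alpha), np.cos(alpha)]])              # rotation of axes
    print(R @ C @ R.T)   # off-diagonal entries are 0 up to rounding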
THEORETICAL EXERCISES
EXERCISES
= 0 otherwise.

Find the joint probability density function of (R, Θ), in which R = √(X₁² + X₂²) and Θ = tan⁻¹(X₂/X₁). Show that, and explain why, R² is uniformly distributed but R is not.
10.3. Let X and Y be independent random variables, each uniformly distributed over the interval 0 to 1. Find the individual and joint probability density functions of the random variables R and Θ, in which R = √(X² + Y²) and Θ = tan⁻¹(Y/X).

10.4. Two voltages X(t) and Y(t) are independently and normally distributed with parameters m = 0 and σ = 1. These are combined to give two new voltages, U(t) = X(t) + Y(t) and V(t) = X(t) − Y(t). Find the joint probability density function of U(t) and V(t). Are U(t) and V(t) independent? Find P[U(t) > 0, V(t) < 0].
= undefined if P[B] = 0.

Now suppose we are given an event A and a random variable X, both defined on the same probability space. We wish to define, for any real number x, the conditional probability of the event A, given the event that the observed value of X is equal to x, denoted in symbols by P[A | X = x]. Now if P[X = x] > 0, we may define this conditional probability by (11.1). However, for any random variable X, P[X = x] = 0 for all (except, at most, a countable number of) values of x. Consequently, the conditional probability P[A | X = x] of the event A, given that X = x, must be regarded as being undefined insofar as (11.1) is concerned.

The meaning that one intuitively assigns to P[A | X = x] is that it represents the probability that A has occurred, knowing that X was observed as equal to x. Therefore, it seems natural to define
(11.3) Hₙ(x) = {x′: [x·2ⁿ]/2ⁿ < x′ ≤ ([x·2ⁿ] + 1)/2ⁿ}.

Then we define the conditional probability of the event A, given that the random variable X has an observed value equal to x, by

(11.4) P[A | X = x] = lim_{n→∞} P[A | X is in Hₙ(x)],

provided that this limit exists. In terms of the conditional probability so defined, the probability of A may be computed:

(11.5) P[A] = ∫_{−∞}^{∞} P[A | X = x] dF_X(x)
            = ∫_{−∞}^{∞} P[A | X = x] f_X(x) dx
            = Σ_{over all x such that p_X(x) > 0} P[A | X = x] p_X(x),

in which the last two equations hold if X is respectively continuous or discrete. More generally, for every Borel set B of real numbers, the probability of the intersection of the event A and the event {X is in B} that the observed value of X is in B is given by

P[A, X is in B] = ∫_B P[A | X = x] dF_X(x).
P[A | X = x] = (10 + x)/60 if 0 ≤ x ≤ 10
             = 1/3 if 10 < x ≤ 50
             = (70 − x)/60 if 50 < x ≤ 60
             = undefined if x < 0 or x > 60.

[Fig. 11A: graph of y = P[A | X = x], which is undefined for x < 0 and x > 60.]

Consequently, P[A | X = 30] = 1/3, so that the conditional probability that the young man and the young lady will meet, given that the young man arrives at 5:30 P.M., is 1/3. Further, by applying (11.5), we determine that P[A] = 11/36. ▲
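The application of (11.5) in this example can be reproduced numerically; the sketch below (not part of the text) averages P[A | X = x] over the uniform density of X on 0 to 60:

    import numpy as np

    def p_meet_given_x(x):
        # the conditional probability P[A | X = x] tabulated above
        if 0 <= x <= 10:
            return (10 + x) / 60
        if 10 < x <= 50:
            return 1 / 3
        if 50 < x <= 60:
            return (70 - x) / 60
        return 0.0   # undefined in the text; contributes nothing here

    xs = np.linspace(0.0, 60.0, 600_001)
    vals = np.array([p_meet_given_x(x) for x in xs])
    print(vals.mean(), 11 / 36)   # both approximately 0.3056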
In (11.7) we performed certain manipulations that arise frequently when
one is dealing with conditional probabilities. We now justify these
manipulations.
Consider two jointly distributed random variables X and Y. Let g(x, y)
be a Borel function of two variables. Let z be a fixed real number. Let
A = [g(X, Y) ≤ z] be the event that the random variable g(X, Y) has an observed value less than or equal to z. Next, let x be a fixed real number, and let A(x) = [g(x, Y) ≤ z] be the event that the random variable g(x, Y),
(11.17) f_{Y|X}(y | x) = (∂/∂y) F_{Y|X}(y | x).

We now prove the basic formula: if f_X(x) > 0, then

(11.19) (∂/∂x) F_{X,Y}(x, y) = F_{Y|X}(y | x) f_X(x).
× exp{−[x − m₁ − ρ(σ₁/σ₂)(y − m₂)]² / (2(1 − ρ²)σ₁²)}.

In words, the conditional probability law of the random variable X₁, given X₂ = y, is the normal probability law with parameters m = m₁ + ρ(σ₁/σ₂)(y − m₂) and σ = σ₁√(1 − ρ²). To prove (11.21), one need only verify that it is equal to the quotient f_{X₁,X₂}(x, y)/f_{X₂}(y). Similarly, one may establish the following result. ▲
► Example 11C. Let X and Y be jointly distributed random variables. Let

(11.22) R = √(X² + Y²), Θ = tan⁻¹(Y/X).

(11.24) f_Y(y) = (β^α/Γ(α)) y^{α−1} e^{−βy}, for y > 0,

(11.26) αβ^α / (β + x)^{α+1}.

The reader interested in further study of the foregoing model, as well as a number of other interesting topics, should consult J. Neyman, "The Problem of Inductive Inference," Communications on Pure and Applied Mathematics, Vol. 8 (1955), pp. 13-46. ▲
The foregoing notions may be extended to several random variables. In particular, let us consider n random variables X₁, X₂, …, Xₙ and a random variable U, all of which are jointly distributed. By suitably adapting the foregoing considerations, we may define a function

(11.28) F_{X₁,…,Xₙ,U}(x₁, …, xₙ, u)
THEORETICAL EXERCISES
11.1. Let T be a random variable, and let t be a fixed number. Define the random variable U by U = T − t and the event A by A = [T > t]. Evaluate P[A | U = x] and P[U > x | A] in terms of the distribution function of T. Explain the difference in meaning between these concepts.

11.2. If X and Y are independent Poisson random variables, show that the conditional distribution of X, given X + Y, is binomial.

11.3. Given jointly distributed random variables X₁ and X₂, prove that, for any x₂ and almost all x₁, F_{X₂|X₁}(x₂ | x₁) = F_{X₂}(x₂) if and only if X₁ and X₂ are independent.

11.4. Prove that for any jointly distributed random variables X₁ and X₂

f_{X₂}(x₂) = ∫_{−∞}^{∞} f_{X₂|X₁}(x₂ | x₁) dF_{X₁}(x₁).
EXERCISES

CHAPTER 8

Expectation of a Random Variable

Σ_{over all x such that p_X(x) > 0} x p_X(x).

On the other hand, given the Borel function g(·) and the random variable X, we can form the expectation of g(x) with respect to the probability law of X, denoted by E_X[g(x)] and defined by

E_X[g(x)] = Σ_{over all x such that p_X(x) > 0} g(x) p_X(x),
* At the end of the section we give an example that shows that (1.5) does not hold if
the integrals used to define expectations are not required to converge absolutely.
To each point y_j on the y-axis there corresponds a set of points x_j^{(1)}, x_j^{(2)}, …, at which g(x) is equal to y_j. Form the set of all such points on the x-axis that correspond to the points y₁, …, yₙ. Arrange these points in increasing order, x₀ < x₁ < ⋯ < x_m. These points divide the x-axis into subintervals. Further, it is clear upon reflection that the last sum in (1.6) is equal to

(1.7) Σ_{k=1}^{m} g(x_k) P_X[{x: x_{k−1} < x ≤ x_k}] ≈ E_X[g(x)],

Fig. 1A. With the aid of this graph of a possible function g(·), one can see that (1.5) holds.
Given a random variable X and a function g(·), we thus find two distinct notions, represented by E[g(X)] and E_X[g(x)], which nevertheless are always numerically equal. It has become customary always to use the notation E[g(X)], since this notation is the most convenient for technical manipulation. However, the reader should be aware that although we write E[g(X)] the concept in which we are really very often interested is E_X[g(x)], the expectation of the function g(x) with respect to the random variable X. Thus, for example, the nth moment of a random variable X (for any integer n) is often defined as E[Xⁿ], the expectation of the nth power of X. From the point of view of the intuitive meaning of the nth moment, however, it should be defined as the expectation E_X[xⁿ] of the function xⁿ,
in the sense that if either of these expectations exists then so does the other,
and the two are equal.
To prove (1.11) we must prove that

(1.14)

One may verify directly that the integrals on the right-hand sides of (1.13) and (1.14) are equal, as asserted by (1.11).
(1.19) (2n choose n) 2^{−2n} ∼ 1/(nπ)^{1/2},

the sign ∼ indicating that the ratio of the two sides in (1.19) tends to 1 as n tends to infinity. Consequently, (2m)P[N = 2m] ≥ K/√m for some constant K. Therefore, the infinite series in (1.18) diverges, and E[N] = ∞. ▲
To conclude this section, let us justify the fact that the integrals defining expectations are required to be absolutely convergent by showing, by example, that if the expectation of a continuous random variable X is defined by

(1.20) E[X] = lim_{a→∞} ∫_{−a}^{a} x f_X(x) dx,

then it is not necessarily true that

E[X + c] = E[X] + c.

Let X be a random variable whose probability density function is an even function, that is, f_X(−x) = f_X(x). Then, under the definition given by (1.20), the mean E[X] exists and equals 0, since ∫_{−a}^{a} x f_X(x) dx = 0 for every a.

That f(·) is a probability density function follows from the fact that

∫_{−∞}^{∞} f(x) dx = 2A Σ_{k=1}^{∞} 1/k² = 2A (π²/6) = 1.

That (1.22) holds for c > 1 follows from the fact that for k = 2², 3², …

∫_{k−1}^{k+1} u f(u) du ≥ (k − 1) ∫_{k−1}^{k+1} f(u) du = ((k − 1)/k) A > A/2.
THEORETICAL EXERCISES
1.1. The mean and variance of a linear function of a random variable. Let X be a random variable with finite mean and variance. Let a and b be real numbers. Show that

(1.24) E[aX + b] = aE[X] + b, Var[aX + b] = a² Var[X],
       σ[aX + b] = |a| σ[X], ψ_{aX+b}(t) = e^{bt} ψ_X(at).

1.2. Chebyshev's inequality for random variables. Let X be a random variable with finite mean and variance. Show that for any h > 0 and any ε > 0

P[|X − E[X]| ≥ hσ[X]] ≤ 1/h², P[|X − E[X]| ≥ ε] ≤ σ²[X]/ε².
EXERCISES
1.1. Consider a gambler who is to win 1 dollar if a 6 appears when a fair die is
tossed; otherwise he wins nothing. Find the mean and variance of his
winnings.
1.2. Suppose that 0.008 is the probability of death within a year of a man aged
35. Find the mean and variance of the number of deaths within a year
among 20,000 men of this age.
1.3. Consider a man who buys a lottery ticket in a lottery that sells 100 tickets
and that gives 4 prizes of 200 dollars, 10 prizes of 100 dollars, and 20 prizes
of 10 dollars. How much should the man be willing to pay for a ticket in
this lottery?
1.4. Would you pay 1 dollar to buy a ticket in a lottery that sells 1,000,000
tickets and gives 1 prize of 100,000 dollars, 10 prizes of 10,000 dollars, and
100 prizes of 1000 dollars?
1.5. Nine dimes and a silver dollar are in a red purse, and 10 dimes are in a
black purse. Five coins are selected without replacement from the red
purse and placed in the black purse. Then 5 coins are selected without
replacement from the black purse and placed in the red purse. The amount
of money in the red purse at the end of this experiment is a random variable.
What is its mean and variance?
1.6. St. Petersburg problem (or paradox?). How much would you be willing to pay to play the following game of chance? A fair coin is tossed by the
player until heads appears. If heads appears on the first toss, the bank
pays the player 1 dollar. If heads appears for the first time on the second
throw the bank pays the player 2 dollars. If heads appears for the first
time on the third throw the player receives 4 = 2² dollars. In general, if heads appears for the first time on the nth throw, the player receives 2^{n−1} dollars. The amount of money the player will win in this game is a random
variable; find its mean. Would you be willing to pay this amount to
play the game? (For a discussion of this problem and why it is sometimes
called a paradox see T. C. Fry, Probability and Its Engineering Uses, Van
Nostrand, New York, 1928, pp. 194-199.)
1.7. The output of a certain manufacturer (it may be radio tubes, textiles, canned goods, etc.) is graded into 5 grades, labeled A5, A4, A3, A2, and A1 (in decreasing order of quality). The manufacturer's profit, denoted by X, on an item depends on the grade of the item, as indicated in the table below. The grade of an item is random; however, the proportions of the manufacturer's output in the various grades are known and are given in the table. Find the mean and variance of X, in which X denotes the manufacturer's profit on an item selected randomly from his production.
Profit X:    $1.00    0.80    0.60    0.00    −0.60
Proportion:  [fractions in sixteenths, illegible in this copy]
1.8. Consider a person who commutes to the city from a suburb by train. He is accustomed to leaving his home between 7:30 and 8:00 A.M. The drive to the railroad station takes between 20 and 30 minutes. Assume that the departure time and length of trip are independent random variables, each uniformly distributed over their respective intervals. There are 3 trains that he can take, which leave the station and arrive in the city precisely on time. The first train leaves at 8:05 A.M. and arrives at 8:40 A.M., the second leaves at 8:25 A.M. and arrives at 8:55 A.M., the third leaves at 9:00 A.M. and arrives at 9:43 A.M.
(i) Find the mean and variance of his time of arrival in the city.
(ii) Find the mean and variance of his time of arrival under the assumption that he leaves his home between 7:30 and 7:55 A.M.
1.9. Two athletic teams play a series of games; the first team to win 4 games is the winner. Suppose that one of the teams is stronger than the other and has probability p [equal to (i) 0.5, (ii) 2/3] of winning each game, independent of the outcomes of any other game. Assume that a game cannot end in a tie. Find the mean and variance of the number of games required to conclude the series. (Use exercise 3.26 of Chapter 3.)
1.10. Consider an experiment that consists of N players independently tossing
fair coins. Let A be the event that there is an "odd" man (that is, either
exactly one of the coins falls heads or exactly one of the coins falls tails).
For r = 1, 2, ... let Xr be the number of times the experiment is repeated
until the event occurs for the rth time.
(i) Find the mean and variance of X r •
(ii) Evaluate E[X_r] and Var[X_r] for N = 3, 4, 5 and r = 1, 2, 3.
1.11. Let an urn contain 5 balls, numbered 1 to 5. Let a sample of size 3 be
drawn with replacement (without replacement) from the urn and let X be
the largest number in the sample. Find the mean and variance of X.
1.12. Let X be N(m, σ²). Find the mean and variance of (i) |X|, (ii) |X − c|, where (a) c is a given constant, (b) σ = m = c = 1, (c) σ = m = 1, c = 2.
1.13. Let X and Y be independent random variables, each N(0, 1). Find the mean and variance of √(X² + Y²).
1.14. Find the mean and variance of a random variable X that obeys the probability law of Laplace, specified by the probability density function, for some constants α and β > 0:

f_X(x) = (1/(2β)) e^{−|x−α|/β}.
We leave it to the reader to prove that the covariance is equal to the product moment, minus the product of the means; in symbols,

(2.10) Cov[X₁, X₂] = E[X₁X₂] − E[X₁]E[X₂].

The covariance derives its importance from the role it plays in the basic formula for the variance of the sum of two random variables:

(2.11) Var[X₁ + X₂] = Var[X₁] + Var[X₂] + 2 Cov[X₁, X₂].
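A brief numerical check of (2.10) and (2.11) (an added sketch; the distributions chosen are arbitrary):

    import numpy as np

    rng = np.random.default_rng(4)
    x1 = rng.normal(0.0, 1.0, 200_000)
    x2 = 0.5 * x1 + rng.normal(0.0, 1.0, 200_000)    # correlated with x1

    cov = np.mean(x1 * x2) - x1.mean() * x2.mean()   # (2.10)
    lhs = np.var(x1 + x2)
    rhs = np.var(x1) + np.var(x2) + 2 * cov          # (2.11)
    print(lhs, rhs)   # agree up to sampling error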
The moments can be read off from the power-series expansion of the moment-generating function, since formally

(2.12) ψ_{X₁,X₂}(t₁, t₂) = Σ_{j,k≥0} (t₁^j t₂^k / (j! k!)) E[X₁^j X₂^k].

In particular,

(2.13) E[X₁] = (∂/∂t₁) ψ_{X₁,X₂}(0, 0), E[X₂] = (∂/∂t₂) ψ_{X₁,X₂}(0, 0),

(2.14) E[X₁²] = (∂²/∂t₁²) ψ_{X₁,X₂}(0, 0), E[X₁X₂] = (∂²/∂t₁∂t₂) ψ_{X₁,X₂}(0, 0), E[X₂²] = (∂²/∂t₂²) ψ_{X₁,X₂}(0, 0),

(2.15) Var[X₁] = (∂²/∂t₁²) ψ_{X₁−m₁,X₂−m₂}(0, 0),

(2.16) Var[X₂] = (∂²/∂t₂²) ψ_{X₁−m₁,X₂−m₂}(0, 0),

(2.17) Cov[X₁, X₂] = (∂²/∂t₁∂t₂) ψ_{X₁−m₁,X₂−m₂}(0, 0).
in which φ(u) = (1/√(2π)) e^{−½u²} is the normal density function. Using our knowledge of the moment-generating function of a normal law, we may perform the integration with respect to the variable x₂ in the integral in (2.19). We thus determine that ψ_{X₁,X₂}(t₁, t₂) is equal to

exp[½t₂²σ₂²(1 − ρ²) + t₂m₂ − t₂ρ(σ₂/σ₁)m₁]

multiplied by the moment-generating function of X₁ evaluated at t₁ + t₂ρσ₂/σ₁. Thus, if two random variables are jointly normally distributed, their joint probability law is completely determined from a knowledge of their first and second moments, since m₁ = E[X₁], m₂ = E[X₂], σ₁² = Var[X₁], σ₂² = Var[X₂], ρσ₁σ₂ = Cov[X₁, X₂]. ▲
x J:X l'
x 2' .,. , X n (Xl' X 2 , ••• , Xn) dx l dx2 . . . dx n •
If Xl' X 2 , ••• , Xn are jointly discrete, with a joint probability mass
function PX,'X 2 • • • • , xn(Xl> X2 , ••• , x ll ) , it may be shown that
(2.25) E[g(XI' X 2 , ••• , Xn)] =
~ g(XI' x 2 , ••• ,X ,)PX" X2, •.. ,xn(xI , x 2 ,
1 ••• , xn )
over all (X, .X 2• ••• , Xn) sneh that
PXl,X2, ... , x n (X 1 ,X 2 , ...• xn»O
It may also be proved that if Xl' X 2 , ••• ,Xn and Yare random
variables, such that Y = gl(Xl> X 2 , ••. ,Xn ) for some Borel function
gl(X I, X2 , .•• ,xn ) of n real variables, then for any Borel function gO of
one real variable
(2.27)
THEORETICAL EXERCISES
E[X₁] = vE[Y] ∫_{−∞}^{∞} W₁(u) du, E[X₂] = vE[Y] ∫_{−∞}^{∞} W₂(u) du,

(2.29) Var[X₁] = vE[Y²] ∫_{−∞}^{∞} W₁²(u) du,
EXERCISES
Now suppose that we modify (3.2) and ask only that it hold for the functions g₁(x) = x and g₂(x) = x, so that

(3.4) E[X₁X₂] = E[X₁]E[X₂].

For reasons that are explained after (3.7), two random variables X₁ and X₂ which satisfy (3.4) are said to be uncorrelated. From (2.10) it follows that X₁ and X₂ satisfy (3.4), and therefore are uncorrelated, if and only if

(3.5) Cov[X₁, X₂] = 0.

For uncorrelated random variables the formula given by (2.11) for the variance of the sum of two random variables becomes particularly elegant; the variance of the sum of two uncorrelated random variables is equal to the sum of their variances. Indeed,

(3.6) Var[X₁ + X₂] = Var[X₁] + Var[X₂].

(3.12) |Cov[X₁, X₂]|² ≤ Var[X₁] Var[X₂], |Cov[X₁, X₂]| ≤ σ[X₁]σ[X₂].

We prove (3.11) as follows. Define, for any real number t, h(t) = E[(tX₁ − X₂)²] = t²E[X₁²] − 2tE[X₁X₂] + E[X₂²]. Clearly h(t) ≥ 0 for all t. Consequently, the quadratic equation h(t) = 0 has either no solutions or one solution. The equation h(t) = 0 has no solutions if and only if E²[X₁X₂] − E[X₁²]E[X₂²] < 0. It has exactly one solution if and only if E²[X₁X₂] = E[X₁²]E[X₂²]. From these facts one may immediately infer (3.11) and the sentence following it.
The inequalities given by (3.11) and (3.12) are usually referred to as
Schwarz's inequality or Cauchy's inequality.
Conditions for Independence. It is important to note the difference
between two random variables being independent and being uncorrelated.
They are uncorrelated if and only if (3.4) holds. It may be shown that
they are independent if and only if (3.2) holds for all functions g₁(·) and g₂(·) for which the expectations in (3.2) exist. More generally, theorem 3C can be proved.

THEOREM 3C. Two jointly distributed random variables X₁ and X₂ are independent if and only if each of the following equivalent statements is true:

(i) Criterion in terms of probability functions. For any Borel sets B₁ and B₂ of real numbers, P[X₁ is in B₁, X₂ is in B₂] = P[X₁ is in B₁]P[X₂ is in B₂].

(ii) Criterion in terms of distribution functions. For any two real numbers x₁ and x₂, F_{X₁,X₂}(x₁, x₂) = F_{X₁}(x₁)F_{X₂}(x₂).

(iii) Criterion in terms of expectations. For any two Borel functions g₁(·) and g₂(·), E[g₁(X₁)g₂(X₂)] = E[g₁(X₁)]E[g₂(X₂)] if the expectations involved exist.

(iv) Criterion in terms of moment-generating functions (if they exist). For any two real numbers t₁ and t₂,

(3.13) ψ_{X₁,X₂}(t₁, t₂) = ψ_{X₁}(t₁)ψ_{X₂}(t₂).
THEORETICAL EXERCISES
3.1. The standard deviation has the properties of the operation of taking the absolute value of a number: show first that for any 2 real numbers x and y, |x + y| ≤ |x| + |y|, ||x| − |y|| ≤ |x − y|. Hint: Square both sides of the inequalities. Show next that for any 2 random variables X and Y,

(3.14) σ[X + Y] ≤ σ[X] + σ[Y], |σ[X] − σ[Y]| ≤ σ[X − Y].

Give an example to prove that the variance does not satisfy similar relationships.

3.2. Show that independent random variables are uncorrelated. Give an example to show that the converse is false. Hint: Let X = sin 2πU, Y = cos 2πU, in which U is uniformly distributed over the interval 0 to 1.

3.3. Prove that if X₁ and X₂ are jointly normally distributed random variables whose correlation coefficient vanishes, then X₁ and X₂ are independent. Hint: Use example 2A.

3.4. Let α and β be the values of a and b which minimize

f(a, b) = E|X₂ − a − bX₁|².

Express α, β, and f(α, β) in terms of ρ(X₁, X₂). The random variable α + βX₁ is called the best linear predictor of X₂, given X₁ [see Section 7, in particular, (7.13) and (7.14)].

3.5. Prove that (3.9) and (3.10) hold under the conditions stated.

3.6. Let X₁ and X₂ be jointly distributed random variables possessing finite second moments. State conditions under which it is possible to find 2 uncorrelated random variables Y₁ and Y₂ which are linear combinations of X₁ and X₂ (that is, Y₁ = a₁₁X₁ + a₁₂X₂ and Y₂ = a₂₁X₁ + a₂₂X₂ for some constants a₁₁, a₁₂, a₂₁, a₂₂ and Cov[Y₁, Y₂] = 0).

3.7. Let X and Y be jointly normally distributed with mean 0, arbitrary variances, and correlation coefficient ρ. Show that

P[X ≥ 0, Y ≥ 0] = P[X ≤ 0, Y ≤ 0] = 1/4 + (1/2π) sin⁻¹ ρ.
EXERCISES
3.1. Consider 2 events A and B such that P[A] = 1/4, P[B | A] = 1/2, P[A | B] = 1/4. Define random variables X and Y: X = 1 or 0, depending on whether the event A has or has not occurred, and Y = 1 or 0, depending on whether the event B has or has not occurred. Find E[X], E[Y], Var[X], Var[Y], ρ(X, Y). Are X and Y independent?
3.2. Consider a sample of size 2 drawn with replacement (without replacement) from an urn containing 4 balls, numbered 1 to 4. Let X₁ be the smallest and X₂ be the largest among the numbers drawn in the sample. Find ρ(X₁, X₂).

3.3. Two fair coins, each with faces numbered 1 and 2, are thrown independently. Let X denote the sum of the 2 numbers obtained, and let Y denote the maximum of the numbers obtained. Find the correlation coefficient between X and Y.

3.4. Let U, V, and W be uncorrelated random variables with equal variances. Let X = U + V, Y = U + W. Find the correlation coefficient between X and Y.

3.5. Let X₁ and X₂ be uncorrelated random variables. Find the correlation ρ(Y₁, Y₂) between the random variables Y₁ = X₁ + X₂ and Y₂ = X₁ − X₂ in terms of the variances of X₁ and X₂.

3.6. Let X₁ and X₂ be uncorrelated normally distributed random variables. Find the correlation ρ(Y₁, Y₂) between the random variables Y₁ = X₁² and Y₂ = X₂².

3.7. Consider the random variables whose joint moment-generating function is given in exercise 2.6. Find ρ(X₁, X₂).

3.8. Consider the random variables whose joint moment-generating function is given in exercise 2.7. Find ρ(X₁, X₂).

3.9. Consider the random variables whose joint moment-generating function is given in exercise 2.8. Find ρ(X₁, X₂).

3.10. Consider the random variables whose joint moment-generating function is given in exercise 2.9. Find ρ(X₁, X₂).
for k ≠ j.

(4.7′)

Equations (4.1)-(4.3) are useful for finding the mean and variance of a random variable Y (without knowing the probability law of Y) if one can represent Y as a sum of random variables X₁, X₂, …, Xₙ, the means, variances, and covariances of which are known.
We now show that (4.9) can be derived by means of (4.1) and (4.3), without knowing the probability law of Sₙ. Define random variables X₁, X₂, …, Xₙ: X_k = 1 or 0, depending on whether a white ball is or is not drawn on the kth draw. Verify that (i) Sₙ = X₁ + X₂ + ⋯ + Xₙ; (ii) for k = 1, 2, …, n, X_k is a Bernoulli random variable, with mean E[X_k] = p and variance Var[X_k] = pq. However, the random variables X₁, …, Xₙ are not independent, and we need to compute their product moments E[X_jX_k] and covariances Cov[X_j, X_k] for any j ≠ k. Now, E[X_jX_k] = P[X_j = 1, X_k = 1], so that E[X_jX_k] is equal to the probability that the balls drawn on the jth and kth draws are both white, which is equal to [a(a − 1)]/[N(N − 1)]. Therefore,

Cov[X_j, X_k] = E[X_jX_k] − E[X_j]E[X_k] = a(a − 1)/(N(N − 1)) − p² = −pq/(N − 1).

Consequently,

Var[Sₙ] = npq + n(n − 1)(−pq/(N − 1)) = npq(1 − (n − 1)/(N − 1)).
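The variance formula just derived agrees with simulation; a sketch (not from the text), with arbitrary urn parameters:

    import numpy as np

    N, a, n = 20, 8, 5                 # N balls, a of them white, sample of size n
    p, q = a / N, 1 - a / N

    rng = np.random.default_rng(5)
    s = rng.hypergeometric(a, N - a, n, size=500_000)    # white balls drawn
    print(s.var(), n * p * q * (1 - (n - 1) / (N - 1)))  # agree up to sampling error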
THEORETICAL EXERCISES
4.1. Waiting times in coupon collecting. Assume that each pack of cigarettes of a certain brand contains one of a set of N cards and that these cards are distributed among the packs at random (assume that the number of packs available is infinite). Let S_N be the minimum number of packs that must be purchased in order to obtain a complete set of N cards. Show that E[S_N] = N Σ_{k=1}^{N} (1/k), which may be evaluated by using the formula (see H. Cramér, Mathematical Methods of Statistics, Princeton University Press, 1946, p. 125)

Σ_{k=1}^{N} 1/k = 0.57722 + logₑ N + 1/(2N) + R_N,

in which 0 < R_N < 1/(8N²). Verify that E[S₅₂] ≈ 236 if N = 52. Hint: For k = 0, 1, …, N − 1 let X_k be the number of packs that must be purchased after k distinct cards have been collected in order to collect the (k + 1)st distinct card. Show that E[X_k] = N/(N − k) by using the fact that X_k has a geometric distribution.
4.2. Continuation of 4.1. For r = 1, 2, …, N let S_r be the minimum number of packs that must be purchased in order to obtain r different cards. Show that

E[S_r] = N (1/N + 1/(N − 1) + 1/(N − 2) + ⋯ + 1/(N − r + 1)),

Var[S_r] = N (1/(N − 1)² + 2/(N − 2)² + ⋯ + (r − 1)/(N − r + 1)²).

4.3. Continuation of 4.1. For r preassigned cards let T_r be the minimum number of packs that must be purchased in order to obtain all r cards. Show that

E[T_r] = Σ_{k=1}^{r} N/(r − k + 1), Var[T_r] = Σ_{k=1}^{r} N(N − r + k − 1)/(r − k + 1)².
4.4. The mean and variance of the number of matches. Let S_M be the number of matches obtained by distributing, one to an urn, M balls, numbered 1 to M, among M urns, numbered 1 to M. It was shown in theoretical exercise 3.3 of Chapter 5 that E[S_M] = 1 and Var[S_M] = 1. Show this, using the fact that S_M = X₁ + ⋯ + X_M, in which X_k = 1 or 0, depending on whether the kth urn does or does not contain ball number k. Hint: Show that Cov[X_j, X_k] = (M − 1)/M² or 1/(M²(M − 1)), depending on whether j = k or j ≠ k.
4.5. Show that if X₁, …, Xₙ are independent random variables with zero means and finite fourth moments, then the third and fourth moments of the sum Sₙ = X₁ + ⋯ + Xₙ are given by

E[Sₙ³] = Σ_{k=1}^{n} E[X_k³], E[Sₙ⁴] = Σ_{k=1}^{n} E[X_k⁴] + 6 Σ_{k=1}^{n} E[X_k²] Σ_{j=k+1}^{n} E[X_j²].

(i) Show that E[S²] = σ², Var[S²] = (σ⁴/n)[(μ₄/σ⁴) − (n − 3)/(n − 1)], in which σ² = Var[X], μ₄ = E[(X − E[X])⁴]. Hint: Show that

Σ_{k=1}^{n} (X_k − E[X])² = Σ_{k=1}^{n} (X_k − X̄)² + n(X̄ − E[X])².
EXERCISES
4.4. A man with n keys wants to open his door. He tries the keys independently and at random. Let Nₙ be the number of trials required to open the door. Find E[Nₙ] and Var[Nₙ] if (i) unsuccessful keys are not eliminated from further selections, (ii) if they are. Assume that exactly one of the keys can open the door.

In exercises 4.5 and 4.6 consider an item of equipment that is composed by assembling in a straight line 4 components of lengths X₁, X₂, X₃, and X₄, respectively. Let E[X₁] = 20, E[X₂] = 30, E[X₃] = 40, E[X₄] = 60.

4.5. Assume Var[X_j] = 4 for j = 1, …, 4.
(i) Find the mean and variance of the length L = X₁ + X₂ + X₃ + X₄ of the item if X₁, X₂, X₃, and X₄ are uncorrelated.
(ii) Find the mean and variance of L if ρ(X_j, X_k) = 0.2 for 1 ≤ j < k ≤ 4.

4.6. Assume that σ[X_j] = (0.1)E[X_j] for j = 1, …, 4. Find the ratio E[L]/σ[L], called the measurement signal-to-noise ratio of the length L (see section 6), for both cases considered in exercise 4.5.
(5.2) Mₙ = Sₙ/n

is called the sample mean.

By (4.1), (4.6), and (4.7), we obtain the following expressions for the mean, variance, and moment-generating function of Sₙ and Mₙ, in terms of the mean, variance, and moment-generating function of X (assuming these exist):

(5.3) E[Sₙ] = nE[X], Var[Sₙ] = n Var[X], ψ_{Sₙ}(t) = [ψ_X(t)]ⁿ.

From (5.11) and (5.12) one obtains (5.7). Our heuristic outline of the proof of (5.5) is now complete.

Given any random variable X with finite mean and variance, we define its standardization, denoted by X*, as the random variable

(5.13) X* = (X − E[X])/σ[X].

The standardization X* is a dimensionless random variable, with mean E[X*] = 0 and variance σ²[X*] = 1.
The central limit theorem of probability theory can now be formulated: the standardization (Sₙ)* of the sum Sₙ of a large number n of independent and identically distributed random variables is approximately normally distributed. In Chapter 10 it is shown that this result may be considerably extended to include cases in which Sₙ is the sum of dependent, nonidentically distributed random variables.
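A simulation sketch of the statement (added here; the uniform summands and sample sizes are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(6)
    n, trials = 48, 200_000
    x = rng.random((trials, n))                # uniform(0,1): mean 1/2, variance 1/12
    s = x.sum(axis=1)
    s_star = (s - n * 0.5) / np.sqrt(n / 12.0) # the standardization (5.13) of S_n

    # P[S_n* <= 1] should be close to the normal value Phi(1) = 0.8413...
    print((s_star <= 1.0).mean())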
EXERCISES
5.1. Which of the following sets of evidence throws more doubt on the
hypothesis that new born babies are as likely to be boys as girls: (i) of
10,000 new born babies, 5100 are male; (ii) of 1000 new born babies, 510
are male.
5.2. The game of roulette is described in example 1D. Find the probability
that the total amount of money lost by a gambling house on 100,000
bets made by the public on an odd outcome at roulette will be negative.
5.3. As an estimate of the unknown mean E[X] of a random variable, it is customary to take the sample mean X̄ = (X₁ + X₂ + ⋯ + Xₙ)/n of a random sample X₁, X₂, …, Xₙ of the random variable X. How large a sample should one observe if there is to be a probability of at least 0.95 that the sample mean X̄ will not differ from the true mean E[X] by more than 25% of the standard deviation σ[X]?
5.4. A man plays a game in which his probability of winning or losing a dollar is 1/2. Let Sₙ be the man's fortune (that is, the amount he has won or lost) after n independent plays of the game.
(i) Find E[Sₙ] and Var[Sₙ]. Hint: Write Sₙ = X₁ + ⋯ + Xₙ, in which Xᵢ is the change in the man's fortune on the ith play of the game.
(ii) Find approximately the probability that after 10,000 plays of the game the change in the man's fortune will be between −50 and 50 dollars.
5.5. Consider a game of chance in which one may win 10 dollars or lose 1, 2, 3, or 4 dollars; each possibility has probability 0.20. How many times can this game be played if there is to be a probability of at least 95% that in the final outcome the average gain or loss per game will be between −2 and +2?
5.6. A certain gambler's daily income (in dollars) is a random variable X
uniformly distributed over the interval -3 to 3.
(i) Find approximately the probability that after 100 days of independent
play he will have won more than 200 dollars.
(ii) Find the quantity A such that the probability is greater than 95% that the gambler's winnings (which may be negative) in 100 independent days of play will be greater than A.
(iii) Determine the number of days the gambler can play in order to have
a probability greater than 95 % that his total winnings on these days will
be less than 180 dollars in absolute value.
5.7. Add 100 real numbers, each of which is rounded off to the nearest integer. Assume that each rounding-off error is a random variable uniformly distributed between −1/2 and 1/2 and that the 100 rounding-off errors are independent. Find approximately the probability that the error in the sum will be between −3 and 3. Find the quantity A such that the probability is approximately 99% that the error in the sum will be less than A in absolute value.
5.8. If each strand in a rope has a breaking strength with mean 20 pounds and standard deviation 2 pounds, and the breaking strength of a rope is the sum of the (independent) breaking strengths of all the strands, what is the probability that a rope made up of 64 strands will support a weight of (i) 1280 pounds, (ii) 1240 pounds?
5.9. A delivery truck carries loaded cartons of items. If the weight of each
carton is a random variable, with mean 50 pounds and standard deviation
5 pounds, how many cartons can the truck carry so that the probability
of the total load exceeding 1 ton will be less than 5 %? State any assump-
tions made.
5.10. Consider light bulbs, produced by a machine, whose life X in hours is a
random variable obeying an exponential probability law with a mean
lifetime of 1000 hours.
(i) Find approximately the probability that a sample of 100 bulbs selected
at random from the output of the machine will contain between 30 and
40 bulbs with a lifetime greater than 1020 hours.
(ii) Find approximately the probability that the sum of the lifetimes of 100
bulbs selected randomly from the output of the machine will be less than
110,000 hours.
5.11. The apparatus known as Galton's quincunx is described in exercise 2.10 of Chapter 6. Assume that in passing from one row to the next the change X in the abscissa of a ball is a random variable with the following probability law: P[X = 1/2] = P[X = −1/2] = 1/2 − η, P[X = 3/2] = P[X = −3/2] = η, in which η is an unknown constant. In an experiment performed with a quincunx consisting of 100 rows, it was found that 80% of the balls inserted into the apparatus passed through the 21 central openings of the last row (that is, the openings with abscissas 0, ±1, ±2, …, ±10). Determine the value of η consistent with this result.
5.12. A man invests a total of N dollars in a group of n securities, whose rates of return (interest rates) are independent random variables X₁, X₂, …, Xₙ, respectively, with means i₁, i₂, …, iₙ and variances σ₁², σ₂², …, σₙ², respectively. If the man invests N_j dollars in the jth security, then his return in dollars on this particular portfolio is a random variable R given by R = N₁X₁ + N₂X₂ + ⋯ + NₙXₙ. Let the standard deviation σ[R] of R be used as a measure of the risk involved in selecting a given portfolio of securities. In particular, let us consider the problem of distributing investments of 5500 dollars between two securities, one of which has a rate of return X₁ with mean 6% and standard deviation 1%, whereas the other has a rate of return X₂ with mean 15% and standard deviation 10%.
(i) If it is desired to hold the risk to a minimum, what amounts N₁ and N₂ should be invested in the respective securities? What is the mean and variance of the return from this portfolio?
(ii) What is the amount of risk that must be taken in order to achieve a portfolio whose mean return is equal to 400 dollars?
(iii) By means of Chebyshev's inequality, find an interval, symmetric about 400 dollars, that, with probability greater than 75%, will contain the return R from the portfolio with mean return E[R] = 400 dollars. Would you be justified in assuming that the return R is approximately normally distributed?
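Part (i) of exercise 5.12 can be explored numerically; the following sketch (not part of the text) searches over allocations in one-dollar steps, assuming the two rates of return are independent:

    import numpy as np

    total = 5500.0
    sd1, sd2 = 0.01, 0.10                 # standard deviations of the rates of return

    n1 = np.linspace(0.0, total, 5501)    # dollars invested in the first security
    n2 = total - n1
    risk = np.sqrt((n1 * sd1)**2 + (n2 * sd2)**2)   # sigma[R] for independent X1, X2
    i = risk.argmin()
    print(n1[i], n2[i], risk[i])
    # calculus gives the exact minimizer n1 = total*sd2**2/(sd1**2 + sd2**2), about 5445.5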
(6.6) P[|(X − E[X])/E[X]| ≤ b] ≥ 1 − (1/b²)(σ²[X]/E²[X]),

which is greater than 99% if b ≥ 10 σ[X]/|E[X]|. If the normal approximation applies, then

(6.8) P[|(X − E[X])/E[X]| ≤ b] ≥ 95% if b ≥ 1.96 σ[X]/|E[X]|.

In order that the observed value of X be within 10% of E[X] with probability of at least 95%, the measurement signal-to-noise ratio must satisfy

(6.9) |E[X]|/σ[X] ≥ 45 if Chebyshev's inequality applies,

(6.11) |E[X]|/σ[X] ≥ 20 if the normal approximation applies.
The measurement signal-to-noise ratio of various random variables is
given in Table 6A. One sees that for most of the random variables
given the measurement signal-to-noise ratio is proportional to the square
root of some parameter. For example, suppose the number of particles
TABLE 6A
MEASUREMENT SIGNAL-TO-NOISE RATIO OF RANDOM VARIABLES OBEYING VARIOUS PROBABILITY LAWS

Probability law of X                    E[X]         σ²[X]           (E[X]/σ[X])²

Uniform over the interval a to b        (a + b)/2    (b − a)²/12     3((a + b)/(b − a))²

Normal, with parameters m and σ         m            σ²              (m/σ)²
the identification of density with mean density loses its meaning. The "density fluctuations" in small volumes can actually be detected experimentally, inasmuch as they cause scattering of sufficiently short wavelengths. ▲
► Example 6B. The law of √n. The physicist Erwin Schrödinger has pointed out in the following statement (What is Life?, Cambridge University Press, 1945, p. 16) "... the degree of inaccuracy to be expected in any physical law, the so-called √n law. The laws of physics and physical chemistry are inaccurate within a probable relative error of the order of 1/√n, where n is the number of molecules that cooperate to bring about that law." From the law of √n Schrödinger draws the conclusion that in order for the laws of physics and chemistry to be sufficient to explain the laws governing the behavior of living organisms it is necessary that the biologically relevant processes of such an organism involve the cooperation of a very large number of atoms, for only in this case do the laws of physics become exact laws. Since one can show that there are "incredibly small groups of atoms, much too small to display exact statistical laws, which play a dominating role in the very orderly and lawful events within a living organism," Schrödinger conjectures that it may not be possible to interpret life by the ordinary laws of physics, based on the "statistical mechanism which produces order from disorder." We state here a mathematical formulation of the law of √n. If X₁, X₂, …, Xₙ are independent random variables identically distributed as a random variable X, then the sum Sₙ = X₁ + X₂ + ⋯ + Xₙ and the sample mean Mₙ = Sₙ/n have measurement signal-to-noise ratios given by

E[Sₙ]/σ[Sₙ] = E[Mₙ]/σ[Mₙ] = √n (E[X]/σ[X]).

In words, the sum or average of n repeated independent measurements of a random variable X has a measurement signal-to-noise ratio of the order of √n. ▲
► Example 6C. Can the energy of an ideal gas be both constant and a χ²-distributed random variable? In example 9H of Chapter 7 it is shown that if the state of an ideal gas is a random phenomenon whose probability law is given by Gibbs's canonical distribution, then the energy E of the gas is a random variable possessing a χ² distribution with 3N degrees of freedom, in which N is the number of particles comprising the gas. Does this mean that if a gas has constant energy its state as a point in the space of all possible velocities cannot be regarded as obeying Gibbs's canonical distribution? The answer to this question is no. From a practical point of view, there is no contradiction in regarding the energy E of the gas as being both a constant and a random variable with a χ² distribution if the number of degrees of freedom is very large, for then the measurement signal-to-noise ratio of E (which, from Table 6A, is equal to (3N/2)^{1/2}) is also very large. ▲
EXERCISES
6.1. A random variable X has an unknown mean and known variance 4. How large a random sample should one take if the probability is to be at least 0.95 that the sample mean will not differ from the true mean E[X] by (i) more than 0.1, (ii) more than 10% of the standard deviation of X, (iii) more than 10% of the true mean of X, if the true mean of X is known to be greater than 10?

6.2. Let X₁, X₂, …, Xₙ be independent normally distributed random variables with known mean 0 and unknown common variance σ². Define

Sₙ = (1/n)(X₁² + X₂² + ⋯ + Xₙ²).

Since E[Sₙ] = σ², Sₙ might be used as an estimate of σ². How large should n be in order to have a measurement signal-to-noise ratio of Sₙ greater than 20? If the measurement signal-to-noise ratio of Sₙ is greater than 20, how good is Sₙ as an estimate of σ²?

6.3. Consider a gas composed of molecules (with mass of the order of 10⁻²⁴ grams and at room temperature) whose velocities obey the Maxwell-Boltzmann law (see exercise 1.15). Show that one may assume that all the molecules move with the same velocity, which may be taken as either the mean velocity, the root mean square velocity, or the most probable velocity.
= Σ_{over all y such that p_{Y|X}(y|x) > 0} y p_{Y|X}(y | x);

the last two equations hold, respectively, in the cases in which F_{Y|X}(· | x) is continuous or discrete. From a knowledge of the conditional mean of Y, given X, the value of the mean E[Y] may be obtained:

(7.2) E[Y] = ∫_{−∞}^{∞} E[Y | X = x] dF_X(x)
           = ∫_{−∞}^{∞} E[Y | X = x] f_X(x) dx
           = Σ_{over all x such that p_X(x) > 0} E[Y | X = x] p_X(x).
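Formula (7.2) lends itself to a quick simulation check; the sketch below (added here) anticipates example 7A by drawing the urn composition X at random and then sampling without replacement:

    import numpy as np

    rng = np.random.default_rng(7)
    N, n = 10, 4
    x = rng.integers(0, N + 1, 300_000)     # white balls in the urn (illustrative law)
    y = rng.hypergeometric(x, N - x, n)     # white balls in a sample of size n
    # (7.2): E[Y] equals the average of E[Y | X = x] = n*x/N over the law of X
    print(y.mean(), (n * x / N).mean())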
► Example 7A. Sampling from an urn of random composition. Let a random sample of size n be drawn without replacement from an urn containing N balls. Suppose that the number X of white balls in the urn is a random variable. Let Y be the number of white balls contained in the sample. The conditional distribution of Y, given X, is discrete, with probability mass function for x = 0, 1, …, N and y = 0, 1, …, x given by

(7.3) p_{Y|X}(y | x) = P[Y = y | X = x] = C(x, y) C(N − x, n − y) / C(N, n),

in which C(a, b) denotes the number of combinations of a things taken b at a time.
(7.8)

Similarly,

(7.9) E[X₁ | X₂ = x₂] = α₂ + β₂x₂. ▲
The conditional mean of one random variable, given another random variable, represents one possible answer to the problem of prediction. Suppose that a prospective father of height x₁ wishes to predict the height of his unborn son. If the height of the son is regarded as a random variable X₂ and the height x₁ of the father is regarded as an observed value of a random variable X₁, then as the prediction of the son's height we take the conditional mean E[X₂ | X₁ = x₁]. The justification of this procedure is that the conditional mean E[X₂ | X₁ = x₁] may be shown to have the property that

(7.10) E[(X₂ − E[X₂ | X₁ = x₁])²] ≤ E[(X₂ − g(X₁))²]

for any function g(x₁) for which the last written integral exists. In words, (7.10) is interpreted to mean that if X₂ is to be predicted by a function g(X₁) of the random variable X₁, then the conditional mean E[X₂ | X₁ = x₁] has the smallest mean square error among all possible predictors g(X₁).

From (7.7) it is seen that in the case in which the random variables are jointly normally distributed the problem of computing the conditional mean E[X₂ | X₁ = x₁] may be reduced to that of computing the constants α₁ and β₁, for which one requires a knowledge only of the means, variances, and product moment; the constants α and β satisfy

(7.12) α + βE[X₁] = E[X₂],
       αE[X₁] + βE[X₁²] = E[X₁X₂].

Comparing (7.7) and (7.13), one sees that the best linear predictor E*[X₂ | X₁ = x₁] coincides with the best predictor, or conditional mean, E[X₂ | X₁ = x₁], in the case in which the random variables X₁ and X₂ are jointly normally distributed.

We can readily compute the mean square error of prediction achieved with the use of the best linear predictor. We have

(7.14) E[(X₂ − E*[X₂ | X₁ = x₁])²] = E[{(X₂ − E[X₂]) − β(X₁ − E[X₁])}²]
     = Var[X₂] + β² Var[X₁] − 2β Cov[X₂, X₁]
     = Var[X₂] − Cov²[X₁, X₂]/Var[X₁]
     = Var[X₂]{1 − ρ²(X₁, X₂)}.

From (7.14) one obtains the important conclusion that the closer the correlation between two random variables is to 1, the smaller is the mean square error of prediction involved in predicting the value of one of the random variables from the value of the other.
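The identities (7.12) and (7.14) are easy to verify numerically; a sketch (not from the text) with an arbitrary pair of correlated variables:

    import numpy as np

    rng = np.random.default_rng(8)
    x1 = rng.normal(0.0, 1.0, 400_000)
    x2 = 1.5 * x1 + rng.normal(0.0, 2.0, 400_000)

    beta = np.cov(x1, x2, ddof=0)[0, 1] / np.var(x1)   # solves the normal equations (7.12)
    alpha = x2.mean() - beta * x1.mean()
    mse = np.mean((x2 - (alpha + beta * x1))**2)

    rho = np.corrcoef(x1, x2)[0, 1]
    print(mse, np.var(x2) * (1 - rho**2))              # (7.14): the two agree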
The Phenomenon of "Spurious" Correlation. Given three random variables U, V, and W, let X and Y be defined by

(7.15) X = U + W, Y = V + W, or X = U/W, Y = V/W,

(or in some similar way) as functions of U, V, and W. The reader should be careful not to infer the existence of a correlation between U and V from the existence of a correlation between X and Y.
► Example 7C. Do storks bring babies? Let W be the number of women of child-bearing age in a certain geographical area, U the number of storks in the area, and V the number of babies born in the area during a specified period of time. The random variables X and Y, defined by

(7.16) X = U/W, Y = V/W,

then represent, respectively, the number of storks per woman and the number of babies born per woman in the area. If the correlation coefficient ρ(X, Y) between X and Y is close to 1, does that not prove that storks bring babies? Indeed, even if it is proved only that the correlation coefficient ρ(X, Y) is positive, would that not prove that the presence of storks in an area has a beneficial influence on the birth rate there? The reader interested in a discussion of these delightful questions would be well advised to consult J. Neyman, Lectures and Conferences on Mathematical Statistics and Probability, Washington, D.C., 1952, pp. 143-154. ▲
THEORETICAL EXERCISES
EXERCISES
7.1. Let X₁, X₂, X₃ be jointly distributed random variables with zero means, unit variances, and covariances Cov[X₁, X₂] = 0.80, Cov[X₁, X₃] = −0.40, Cov[X₂, X₃] = −0.60. Find (i) the best linear predictor of X₁, given X₂, (ii) the best linear predictor of X₃, given X₂, (iii) the partial correlation between X₁ and X₃, given X₂, (iv) the best linear predictor of X₁, given X₂ and X₃, (v) the residual variance of X₁, given X₂ and X₃, (vi) the residual variance of X₁, given X₂.
7.2. Find the conditional mean of Y, given X, if X and Y are jointly continuous random variables with a joint probability density function f_{X,Y}(x, y) vanishing except for x > 0, y > 0, and in the case in which x > 0, y > 0 given by

(i) (4/5)(x + 3y) e^{−x−2y},

(ii) y e^{−y/(1+x)} / (1 + x)⁴,

(iii) (9/2)(1 + x + y) / ((1 + x)⁴(1 + y)⁴).

7.3. Let X = cos 2πU, Y = sin 2πU, in which U is uniformly distributed on 0 to 1. Show that for |x| ≤ 1

E*[Y | X = x] = 0, E[Y | X = x] = √(1 − x²).

Find the mean square error of prediction achieved by the use of (i) the best linear predictor, (ii) the best predictor.
7.4. Let U, V, and W be uncorrelated random variables with equal variances. Let X = U ± W, Y = V ± W. Show that

ρ(X, W) = ρ(Y, W) = 1/√2, ρ(X, Y) = 0.5.
CHAPTER 9
Sums of Independent
Random Variables
Chapters 9 and 10 are much less elementary in character than the first
eight chapters of this book. They constitute an introduction to the limit
theorems of probability theory and to the role of characteristic functions
in probability theory. These chapters seek to provide a careful and
rigorous derivation of the law of large numbers and the central limit
theorem.
In this chapter we treat the problem of finding the probability law of a
random variable that arises as the sum of independent random variables.
A major tool in this study is the characteristic function of a random
variable, introduced in section 2. In section 3 it is shown that the probability
law of a random variable can be determined from its characteristic function.
Section 4 discusses some consequences of the basic result that the charac-
teristic function of a sum of independent random variables is the product
of the characteristic functions of the individual random variables. Section
5 gives the proofs of the inversion formulas stated in section 3.
variable that arises as the sum of n independent random variables X₁, X₂, …, Xₙ, whose joint probability law is known. The fundamental role played by this problem in probability theory is best described by a quotation from an article by Harald Cramér, "Problems in Probability Theory," Annals of Mathematical Statistics, Volume 18 (1947), p. 169.
During the early development of the theory of probability, the majority of
problems considered were connected with gambling. The gain of a player in a
certain game may be regarded as a random variable, and his total gain in a
sequence of repetitions of the game is the sum of a number of independent
variables, each of which represents the gain in a single performance of the game.
Accordingly, a great amount of work was devoted to the study of the probability
distributions of such sums. A little later, problems of a similar type appeared in
connection with the theory of errors of observation, when the total error was
considered as the sum of a certain number of partial errors due to mutually
independent causes. At first, only particular cases were considered; but
gradually general types of problems began to arise, and in the classical work of
Laplace several results are given concerning the general problem to study the
distribution of a sum
S" = Xl + X 2 + ... + Xn
of independent variables, when the distributions of the Xj are given. This
problem may be regarded as the very starting point of a large number of those
investigations by which the modern Theory of Probability was created. The
efforts to prove certain statements of Laplace, and to extend his results further
in various directions, have largely contributed to the introduction of rigorous
foundations of the subject, and to the development of the analytical methods.
In this chapter we discuss the methods and notions by which a precise
formulation and solution is given to the problem of addition of independent
random variables. To begin with, in this section we discuss the two most
important ways in which this problem can arise, namely in the analysis of
sample averages and in the analysis of random walks.
Sample Averages. We have defined a sample of size n of a random variable X as a set of n jointly distributed random variables X₁, X₂, …, Xₙ whose individual probability laws coincide, for k = 1, 2, …, n, with the probability law of X; in particular, the distribution function F_{X_k}(·) of X_k coincides with the distribution function F_X(·) of X. We have defined the sample as a random sample if the random variables X₁, X₂, …, Xₙ are independent.

Given a sample X₁, X₂, …, Xₙ of size n of the random variable X and any Borel function g(·) of a real variable, we define the sample average of g(·), denoted by Mₙ[g(x)], as the arithmetic mean of the values g(X₁), g(X₂), …, g(Xₙ) of the function at the members of the sample; in symbols,

(1.1) Mₙ[g(x)] = (1/n) Σ_{k=1}^{n} g(X_k).
Of special importance are the sample mean mₙ, defined by

(1.2) mₙ = (1/n) Σ_{k=1}^{n} X_k,

and the sample variance,

(1/n) Σ_{k=1}^{n} (X_k − mₙ)² = (1/n) Σ_{k=1}^{n} X_k² − ((1/n) Σ_{k=1}^{n} X_k)².
For a given function g(·) the sample average Mₙ[g(x)] is a random variable, for it is a function of the random variables X₁, X₂, …, Xₙ. The value of Mₙ[g(x)] will, in general, be different when it is computed on the basis of two different samples of size n. The sample average Mₙ[g(x)], like any other random variable, has a mean E[Mₙ[g(x)]], a variance Var[Mₙ[g(x)]], a distribution function F_{Mₙ[g(x)]}(·), a moment-generating function ψ_{Mₙ[g(x)]}(·), and, depending on whether it is a continuous or a discrete random variable, a probability density function f_{Mₙ[g(x)]}(·) or a probability mass function p_{Mₙ[g(x)]}(·). Our aim in this chapter and the next is to develop techniques for computing these quantities, both exactly and approximately, and especially to study their behavior for large sample sizes. The reader who goes on to the study of mathematical statistics will find that these techniques provide the framework for many of the concepts of statistics.
To study sample averages Mₙ[g(x)] with respect to a random sample, it suffices to consider the sum Σ_{k=1}^{n} Y_k of independent random variables Y₁, …, Yₙ, since the random variables Y₁ = g(X₁), …, Yₙ = g(Xₙ) are independent if the random variables X₁, …, Xₙ are. Thus it is seen that the study of sample averages has been reduced to the study of sums of independent random variables.
Random Walk. Consider a particle that at a certain time is located
at the point 0 on a certain straight line. Suppose that it then suffers
displacements along the straight line in the form of a series of steps,
denoted by X_1, X_2, ..., X_n, in which, for any integer k, X_k represents the
displacement suffered by the particle at the kth step. The size X_k of the
kth step is assumed to be a random variable with a known probability law.
The particle can thus be imagined as executing a random walk along the
line, its position (denoted by S_n) after n steps being the sum of the n steps
X_1, X_2, ..., X_n; in symbols, S_n = X_1 + X_2 + ... + X_n. Clearly, S_n is
a random variable, and the problem of finding the probability law of S_n
naturally arises; in other words, one wishes to know, for any integer n and
any interval a to b, the probability P[a ≤ S_n ≤ b] that after n steps the
particle will lie between a and b, inclusive.
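A minimal simulation (in Python; the uniform step law and the particular n, a, b are illustrative assumptions, not taken from the text) estimates such a probability by repeated trials:

```python
import random

def walk_position(n):
    """S_n = X_1 + ... + X_n for steps uniformly distributed on (-1, 1)."""
    return sum(random.uniform(-1.0, 1.0) for _ in range(n))

# Estimate P[a <= S_n <= b] by the relative frequency in repeated walks.
n, a, b, trials = 25, -2.0, 2.0, 20000
hits = sum(1 for _ in range(trials) if a <= walk_position(n) <= b)
print(hits / trials)
```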
The problem of random walks can be generalized to two or more
dimensions. Suppose that the particle at each stage suffers a displacement
in an (x, y) plane, and let X k and Yk denote, respectively, the change in the
x- and y-coordinates of the particle at the kth step. The position of the
particle after n steps is given by the random 2-tuple (S_n, T_n), in which
S_n = X_1 + X_2 + ... + X_n and T_n = Y_1 + Y_2 + ... + Y_n. We now
have the problem of determining the joint probability law of the random
variables Sn and Tn.
The problem of random walks occurs in many branches of physics,
especially in its 2-dimensional form. The eminent mathematical statistician,
Karl Pearson, was the first to formulate explicitly the problem of the
2-dimensional random walk. After Pearson had formulated this problem
in 1905, the renowned physicist, Lord Rayleigh, pointed out that the
problem of random walks was formally "the same as that of the com-
position of n isoperiodic vibrations of unit amplitude and of phases
distributed at random," which he had considered as early as 1880 (for this
quotation and a history of the problem of random walks, see p. 87 of
S. Chandrasekhar, "Stochastic Problems in Physics and Astronomy,"
Reviews of Modern Physics, Volume 15 (1943), pp. 1-89). Almost all
scattering problems in physics are instances of the problem of random
walks.
Example 1A. A physical example of random walk. Consider the
amplitude and phase of a radar signal that has been reflected by a cloud.
Each of the water drops in the cloud reflects a signal with a different
amplitude and phase. The return signal received by the radar system is
the resultant of all the signals reflected by each of the water drops in the
cloud; thus one sees that formally the amplitude and phase of the signal
returned by the cloud to the radar system is the sum of a (large) number of
(presumably independent) random variables.
In the study of sums of independent random variables a basic role is
played by the notion of the characteristic function of a random variable.
This notion is introduced in section 2.
It has been pointed out that the probability law of a random variable X
may be specified in a variety of ways. To begin with, either its probability
function P_X[·] or its distribution function F_X(·) may be stated. Further,
if the probability law is known to be continuous or discrete, then it may
be specified by stating either its probability density function f_X(·) or its
probability mass function p_X(·). We now describe yet another function,
denoted by φ_X(·), called the characteristic function of the random variable X,
which has the property that a knowledge of φ_X(·) serves to specify the
probability law of the random variable X. Further, we shall see that the
characteristic function has properties which render it particularly useful
for the study of a sum of independent random variables.
To begin our introduction of the characteristic function, let us note the
following fact about the probability function P_X[·] and the distribution
function F_X(·) of a random variable X. Both functions can be regarded as
the value of the expectation (with respect to the probability law of X) of
various Borel functions g(·). Thus, for every Borel set B of real numbers

(2.1)  P_X[B] = E[I_B(X)],

in which I_B(·) is a function of a real variable, called the indicator function
of the set B, with value I_B(x) at any point x given by

(2.2)  I_B(x) = 1 if x belongs to B,
              = 0 if x does not belong to B.

On the other hand, for every real number y

(2.3)  F_X(y) = E[I_y(X)],

in which I_y(·) is a function of a real variable, defined by

(2.4)  I_y(x) = 1 if x ≤ y
              = 0 if x > y.
We thus see that if one knows the expectation E[g(X)] of every bounded
Borel function g(·), with respect to the probability law of the random
variable X, one will know by (2.1) and (2.3) the probability function and
distribution function of X. Conversely, a knowledge of the probability
function or of the distribution function of X yields a knowledge of E[g(X)]
for every function g(·) for which the expectation exists. Consequently,
stating the expectation functional E_X[·] of a random variable [which is a
function whose argument is a function g(·)] constitutes another equivalent
way of specifying the probability law of a random variable.
The question arises: is there any other family of functions on the real
line in addition to those of the form of (2.2) and (2.4) such that a knowledge
of the expectations of these functions with respect to the probability law
of a random variable X would suffice to specify the probability law? We
now show that the complex exponential functions provide such a family.
(2.14)  φ_X(u) = (1/√(2π)) ∫_{−∞}^{∞} e^{iux} e^{−x²/2} dx
              = (1/√(2π)) ∫_{−∞}^{∞} Σ_{n=0}^{∞} ((iux)^n / n!) e^{−x²/2} dx
              = Σ_{n=0}^{∞} ((iu)^n / n!) (1/√(2π)) ∫_{−∞}^{∞} x^n e^{−x²/2} dx
              = Σ_{m=0}^{∞} ((iu)^{2m} / (2m)!) · ((2m)! / (2^m m!))
              = Σ_{m=0}^{∞} (−u²/2)^m / m!  =  e^{−u²/2}.
Similarly, for a random variable X that is Poisson distributed with
parameter λ,

φ_X(u) = e^{−λ} Σ_{k=0}^{∞} (λe^{iu})^k / k! = e^{−λ} e^{λe^{iu}} = e^{λ(e^{iu} − 1)}.
THEORETICAL EXERCISES
2.1. Cumulants and the log-characteristic function. The logarithm (to the base e)
of the characteristic function of a random variable X is often easy to
differentiate. Its nth derivative may be used to form the nth cumulant of
X, written K_n[X], which is defined by

(2.22)  K_n[X] = (1/i^n) (d^n/du^n) log φ_X(u) |_{u=0}.

If the nth absolute moment E[|X|^n] exists, then both φ_X(·) and log φ_X(·)
are differentiable n times and may be expanded in terms of their first n
derivatives; in particular,

(2.23)  log φ_X(u) = K_1[X](iu) + K_2[X](iu)²/2! + ... + K_n[X](iu)^n/n! + R_n(u),

in which the remainder R_n(u) is such that |u|^{−n} R_n(u) tends to 0 as |u| tends
to 0. From a knowledge of the cumulants of a probability law one may
obtain a knowledge both of its moments and its central moments. Show, by
evaluating the derivatives at t = 0 of e^{K(t)}, in which K(t) = log φ_X(t), that

(2.24)  E[X]  = K_1
        E[X²] = K_2 + K_1²
        E[X³] = K_3 + 3K_2K_1 + K_1³
        E[X⁴] = K_4 + 4K_3K_1 + 3K_2² + 6K_2K_1² + K_1⁴.

Show, by evaluating the derivatives of e^{K_m(t)}, in which K_m(t) = log φ_X(t) −
itm and m = E[X], that

        E[(X − m)²] = K_2
(2.25)  E[(X − m)³] = K_3
        E[(X − m)⁴] = K_4 + 3K_2².
2.2. The square root of sum of squares inequality. Prove that (2.7) holds by
showing that for any 2 random variables, X and Y,

(2.26)  √(E²[X] + E²[Y]) ≤ E[√(X² + Y²)].

Hint: Show, and use the fact, that √(x² + y²) − √(x₀² + y₀²) ≥ [(x − x₀)x₀
+ (y − y₀)y₀] / √(x₀² + y₀²) for real x, y, x₀, y₀ with x₀y₀ ≠ 0.
Define γ(·) as the Fourier integral (or transform) of g(·); that is, for every
real number u

(3.3)  γ(u) = (1/2π) ∫_{−∞}^{∞} e^{−iux} g(x) dx.

Then, for any random variable X the expectation E[g*(X)] may be expressed
in terms of the characteristic function φ_X(·):

(3.5)  E[g*(X)] = ∫_{−∞}^{∞} γ(u) φ_X(u) du.

Further, for the function g(·) given by

(3.8)  g(x) = 1 if a < x ≤ b,
            = 0 otherwise,

(3.9)  γ(u) = (1/2π) · (e^{−iub} − e^{−iua}) / (−iu).
THEOREM 3B. If a and b, where a < b, are finite real numbers at which
the distribution function F_X(·) is continuous, then

F_X(b) − F_X(a) = lim_{U→∞} ∫_{−U}^{U} γ(u) φ_X(u) du,

in which γ(·) is given by (3.9); further, for any real number x,

P[X = x] = lim_{U→∞} (1/2U) ∫_{−U}^{U} e^{−iux} φ_X(u) du.
* In this section we use the terminology "an absolutely continuous probability law"
for what has previously been called in this book "a continuous probability law". This
is to call the reader's attention to the fact that in advanced probability theory it is
customary to use the expression "absolutely continuous" rather than "continuous."
A continuous probability law is then defined as one corresponding to a continuous
distribution function.
then the random variable X obeys the absolutely continuous probability law
specified by the probability density function f_X(·) given for any real number x
by

(3.15)  f_X(x) = (1/2π) ∫_{−∞}^{∞} e^{−iux} φ_X(u) du.

One expresses (3.15) in words by saying that f_X(·) is the Fourier transform,
or Fourier integral, of φ_X(·).
The proof of (3.15) follows immediately from the fact that at any
continuity points x and a of F_X(·)

(3.16)  F_X(x) − F_X(a) = ∫_a^x dy (1/2π) ∫_{−∞}^{∞} e^{−iuy} φ_X(u) du.

Equation (3.16) follows from (3.6) in the same way that (3.10) followed
from (3.4). It may be proved from (3.16) that (i) F_X(·) is continuous at
every point x, (ii) f_X(x) = (d/dx)F_X(x) exists at every real number x and is
given by (3.15), (iii) for any numbers a and b, F_X(b) − F_X(a) = ∫_a^b f_X(x) dx.
From these facts it follows that F_X(·) is specified by f_X(·) and that f_X(x) is
given by (3.15).
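In the same spirit, (3.15) can be evaluated numerically (a Python sketch with arbitrary step size and cutoff; the standard normal case is used because its characteristic function was found in (2.14)):

```python
import cmath, math

def density_from_cf(x, phi, h=0.01, cutoff=40.0):
    """Approximate f_X(x) = (1/2 pi) * integral of e^{-iux} phi(u) du, as in (3.15)."""
    total = 0.0 + 0.0j
    u = -cutoff
    while u <= cutoff:
        total += cmath.exp(-1j * u * x) * phi(u) * h
        u += h
    return (total / (2.0 * math.pi)).real

phi = lambda u: math.exp(-0.5 * u * u)                    # from (2.14)
f = lambda x: math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
print(density_from_cf(0.0, phi), f(0.0))                  # the two values agree closely
```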
The inversion formula (3.15) provides a powerful method of calculating
Fourier transforms and characteristic functions. Thus, for example, from
a knowledge that

(3.17)  (sin(u/2) / (u/2))² = ∫_{−∞}^{∞} e^{iux} f(x) dx,

in which f(x) = 1 − |x| for |x| ≤ 1 and f(x) = 0 otherwise, one obtains

(3.19)  (1/2π) ∫_{−∞}^{∞} e^{−iux} (sin(x/2) / (x/2))² dx = f(u).
Similarly, from
(3.22)
On the other hand, it is clear that the characteristic function of the sum
for any real number u is given by
EXERCISES
3.2. Prove that if f₁(·) and f₂(·) are probability density functions, whose corre-
sponding characteristic functions φ₁(·) and φ₂(·) are absolutely integrable,
then

(3.24)  ∫_{−∞}^{∞} f₁(y − x) f₂(x) dx = (1/2π) ∫_{−∞}^{∞} e^{−iuy} φ₁(u) φ₂(u) du.
3.3. Show that

(3.25)  (1/2π) ∫_{−∞}^{∞} e^{−iuy} (sin(u/2) / (u/2))⁴ du = ∫_{−∞}^{∞} f(y − x) f(x) dx,

in which f(·) is the function given in (3.17).
3.4. Let Y = A cos Θ, in which A > 0 is a constant and Θ is a random variable
uniformly distributed over the interval 0 to π, so that Y has probability
density function

f_Y(y) = (1/π)(A² − y²)^{−1/2} if |y| < A; = 0 otherwise.

The characteristic function of Y may be written

(3.27)  φ_Y(u) = (1/π) ∫_0^π e^{iuA cos θ} dθ = J₀(Au),

in which J₀(·) is the Bessel function of order 0, defined for our purposes by
the integral in (3.27). Is it true or false that

(3.28)  (1/2π) ∫_{−∞}^{∞} e^{−iuy} J₀(Au) du = 1/(π√(A² − y²)) if |y| < A
                                            = 0 otherwise?
3.5. The image interference distribution. The amplitude a of a signal received
at a distance from a transmitter may fluctuate because the signal is both
directly received and reflected (reflected either from the ionosphere or the
ocean floor, depending on whether it is being transmitted through the air or
the ocean). Assume that the amplitude of the direct signal is a constant a₁
and the amplitude of the reflected signal is a constant a₂ but that the phase
difference θ between the two signals changes randomly and is uniformly
distributed over the interval 0 to π. The amplitude a of the received signal
is then given by a² = a₁² + a₂² + 2a₁a₂ cos θ. Assuming these facts, show
that the characteristic function of a² is given by

(3.29)  φ_{a²}(u) = e^{iu(a₁² + a₂²)} J₀(2a₁a₂u).

Use this result and the preceding exercise to deduce the probability density
function of a².
EXERCISES

Show that the probability density function of the sum S_n of n independent
random variables, each uniformly distributed over the interval 0 to 1, is
given by

f_{S_n}(x) = (1/(n − 1)!) Σ_{j=0}^{[x]} (−1)^j (n choose j) (x − j)^{n−1} if 0 ≤ x ≤ n
           = 0 if x < 0 or x > n,

in which [x] denotes the largest integer less than or equal to x.
Let X₁, X₂, ..., X_n be independent random variables, each normally
distributed with mean 0 and variance 1, and let S_n = X₁² + ... + X_n².
Prove that f_{S₂}(y) = ½ e^{−y/2} for y > 0; hence deduce that S_n has a χ²
distribution with n degrees of freedom.
4.5. Let X₁, X₂, ..., X_n be independent random variables, each normally
distributed with mean m and variance 1. Let S = Σ_{j=1}^{n} X_j².
(i) Find the cumulants of S.
(ii) Let T = aY_ν for suitable constants a and ν, in which Y_ν is a random
variable obeying a χ² distribution with ν degrees of freedom. Determine
a and ν so that S and T have the same means and variances. Hint: Show
that each X_j² has the characteristic function

(5.9)  (d/du) E[g(X, u)] = E[(∂/∂u) g(X, u)].
As one consequence of theorem 5C, we may deduce (2.10).
THEOREM 5D. Let g(x, u) be a Borel function of two variables such that
(5.5) will hold. If a Borel function G(·) exists such that
(1/2U) ∫_{−U}^{U} e^{−iux} φ(u) du = (1/2U) ∫_{−U}^{U} du [∫_{−∞}^{∞} e^{iu(y−x)} dF(y)]
                                  = ∫_{−∞}^{∞} dF(y) (1/2U) ∫_{−U}^{U} du e^{iu(y−x)},
in which the interchange of the order of integration is justified by theorem
5D. Now define the functions

g(y, U) = (1/2U) ∫_{−U}^{U} e^{iu(y−x)} du = sin U(y − x) / (U(y − x)) if y ≠ x
        = 1 if y = x;

g(y) = 0 if y ≠ x
     = 1 if y = x.
lim_{U→∞} (1/2U) ∫_{−U}^{U} e^{−iux} φ(u) du = lim_{U→∞} ∫_{−∞}^{∞} g(y, U) dF(y)
(5.16)  lim_{U→∞} (2/π) ∫_0^U (sin ut)/u du = 1 if t > 0
                                            = 0 if t = 0
                                            = −1 if t < 0,

in which the convergence is bounded for all U and t.
A proof of (5.16) may be sketched as follows. Define

G(a) = ∫_0^∞ e^{−au} (sin ut)/u du.

Verify that the improper integral defining G(a) converges uniformly for
a ≥ 0 and that this implies that

∫_0^∞ (sin ut)/u du = lim_{a→0+} G(a).
Now

(5.17)  ∫_0^∞ e^{−au} cos ut du = a/(a² + t²),

in which, for each a, the integral in (5.17) converges uniformly for all t.
Verify that this implies that G(a) = tan^{−1}(t/a), which, as a tends to 0,
tends to π/2 or to −π/2, depending on whether t > 0 or t < 0. The proof
of (5.16) is complete.
Now define

g(y) = −1 if y < x
     = 0 if y = x
     = 1 if y > x.

Then

(2/π) ∫_0^∞ (1/u) Im(e^{−iux} φ(u)) du = ∫_{−∞}^{∞} g(y) dF(y) = 1 − 2F(x).
(5.19)  K(z) = (1/2π) (sin(z/2) / (z/2))² = (1/π) ∫_0^1 dv (1 − v) cos vz;

(1/2π) ∫_{−U}^{U} du e^{iu(x−y)} (1 − |u|/U) = (U/2π) ∫_{−1}^{1} dv e^{ivU(x−y)} (1 − |v|)
                                            = (U/π) ∫_0^1 (1 − v) cos vU(x − y) dv.
To conclude the proof of (3.4), it suffices to show that

(5.23)

Define h(t) = [g(x + t) + g(x − t)]/2 − g*(x). From (5.24) it follows that

(5.26)

For d fixed, ∫_{|s|≥Ud} K(s) ds tends to 0 as U tends to ∞. Next, by the definition
of h(t) and g*(t), sup_{|t|≤d} |h(t)| tends to 0 as d tends to 0. Consequently, by
letting first U tend to infinity and then d tend to 0 in (5.26), it follows that
g_U(x) tends boundedly to g*(x) as U tends to ∞. The proof of (3.4) is
complete.
CHAPTER 10
Sequences
of Random Variables
Equation (1.1) may be expressed in words: for any fixed difference ε the
probability of the event that Z_n and Z differ by more than ε becomes
arbitrarily close to 0 as n tends to infinity.
Convergence in probability derives its importance from the fact that,
like convergence with probability one, and unlike convergence in mean
square, it may be considered without requiring that any moments exist.
It is immediate that if convergence in mean square holds then so does
convergence in probability; one need only consider the following form of
Chebyshev's inequality: for any ε > 0

(1.2)  P[|Z_n − Z| ≥ ε] ≤ E[|Z_n − Z|²] / ε².
The relation that exists between convergence with probability one and
convergence in probability is best understood by considering the following
characterization of convergence with probability one, which we state
without proof. Let Z₁, Z₂, ..., Z_n, ... be a sequence of jointly distributed
random variables; Z_n converges to the random variable Z with probability
one if and only if for every ε > 0

(1.3)  lim_{N→∞} P[sup_{n≥N} |Z_n − Z| > ε] = 0.
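The definitions can be made concrete by simulation (a Python sketch; the uniform summands and the particular n and ε are illustrative assumptions):

```python
import random

def freq_exceeds(n, eps, trials=5000):
    """Estimate P[|Z_n| > eps] for Z_n the mean of n independent Uniform(-1, 1) variables."""
    count = 0
    for _ in range(trials):
        z = sum(random.uniform(-1, 1) for _ in range(n)) / n
        if abs(z) > eps:
            count += 1
    return count / trials

for n in (10, 100, 1000):
    print(n, freq_exceeds(n, 0.1))   # tends toward 0, illustrating convergence in probability
```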
(1.8)

(1.9)  P[lim_{n→∞} Z_n = 0] = 1.
The fundamental empirical fact upon which are based all applications of
the theory of probability is expressed in the empirical law of large numbers,
first formulated by Poisson (in his book, Recherches sur la probabilité des
jugements, 1837):
In many different fields, empirical phenomena appear to obey a certain general
law, which can be called the Law of Large Numbers. This law states that the
ratios of numbers derived from the observation of a very large number of similar
events remain practically constant, provided that these events are governed partly
by constant factors and partly by variable factors whose variations are irregular
and do not cause a systematic change in a definite direction. Certain values of
these relations are characteristic of each given kind of event. With the increase
in length of the series of observations the ratios derived from such observations
come nearer and nearer to these characteristic constants. They could be expected
to reproduce them exactly if it were possible to make series of observations of an
infinite length.
In the mathematical theory of probability one may prove a proposition,
called the mathematical law of large numbers, that may be used to gain
insight into the circumstances under which the empirical law of large
numbers is expected to hold. For an interesting philosophical discussion
of the relation between the empirical and the mathematical laws of large
numbers and for the foregoing quotation from Poisson the reader should
consult Richard von Mises, Probability, Statistics, and Truth, second
revised edition, Macmillan, New York, 1957, pp. 104-134.
A sequence of jointly distributed random variables, X₁, X₂, ..., X_n, ...,
with finite means, is said to obey the (classical) law of large numbers if

(2.1)  Z_n = (X₁ + X₂ + ... + X_n)/n − E[X₁ + ... + X_n]/n → 0

in some mode of convergence as n tends to ∞. The sequence {X_n} is said
to obey the strong law of large numbers, the weak law of large numbers,
or the quadratic mean law of large numbers, depending on whether the
convergence in (2.1) is with probability one, in probability, or in quadratic
mean. In this section we give conditions, both for independent and
dependent random variables, for the law of large numbers to hold.
We consider first the case of independent random variables with finite
means. We prove in section 3 that a sequence of independent identically
distributed random variables obeys the weak law of large numbers if the
common mean E[X] is finite. It may be proved (see Loeve, Probability
Theory, Van Nostrand, New York, 1955, p. 243) that the finiteness of E[X]
also implies that the sequence of independent identically distributed
random variables obeys the strong law of large numbers.
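A single simulated sample path makes the strong law vivid (a Python sketch; the exponential probability law and the seed are arbitrary choices):

```python
import random

# One path of the sequence of sample means of independent exponentially
# distributed variables with mean 1; the path settles near 1.
random.seed(1)
total = 0.0
for n in range(1, 100001):
    total += random.expovariate(1.0)
    if n in (10, 100, 1000, 10000, 100000):
        print(n, total / n)
```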
In theoretical exercise 4.2 we indicate the proof of the law of large
numbers for independent, not necessarily identically distributed, random
variables with finite means: if, for some a > 0,

(2.2)  lim_{n→∞} (1/n^{1+a}) Σ_{k=1}^{n} E[|X_k − E[X_k]|^{1+a}] = 0,

then

(2.3)  plim_{n→∞} (1/n) Σ_{k=1}^{n} (X_k − E[X_k]) = 0.

Equation (2.2) is known as Markov's condition for the validity of the weak
law of large numbers for independent random variables.
In this section we consider the case of dependent random variables X_k,
with finite means (which we may take to be 0) and finite variances
σ_k² = E[X_k²]. We state conditions for the validity of the quadratic mean
law of large numbers and the strong law of large numbers, which, while not
the most general conditions that can be stated, appear to be general enough
for most practical applications. Our conditions are stated in terms of the
behavior, as n tends to ∞, of the covariance

(2.4)  C_n = E[X_n Z_n], in which Z_n = (X₁ + ... + X_n)/n.
Proof. Since E²[X_n Z_n] ≤ E[X_n²] E[Z_n²], it is clear that if the quadratic
mean law of large numbers holds and if the variances E[X_n²] are bounded
uniformly in n, then (2.8) holds. To prove the converse, we prove first the
following useful identity:

(2.9)  n² E[Z_n²] = 2 Σ_{k=1}^{n} k E[X_k Z_k] − Σ_{k=1}^{n} E[X_k²].

To prove (2.9), we write the familiar formula

(2.10)  E[(X₁ + ... + X_n)²] = Σ_{k=1}^{n} E[X_k²] + 2 Σ_{k=1}^{n} Σ_{j=1}^{k−1} E[X_k X_j]
                             = 2 Σ_{k=1}^{n} Σ_{j=1}^{k} E[X_k X_j] − Σ_{k=1}^{n} E[X_k²]
                             = 2 Σ_{k=1}^{n} k E[X_k Z_k] − Σ_{k=1}^{n} E[X_k²].

Letting first n tend to infinity and then N tend to ∞ in (2.12), we see that
(2.11) holds. The proof of theorem 2A is complete.
If it is known that C_n tends to 0 as some power of n, then we can
conclude that convergence holds with probability one.

THEOREM 2B. A sequence of jointly distributed random variables
{X_n} with zero means and uniformly bounded variances obeys the strong
law of large numbers (in the sense that P[lim_{n→∞} Z_n = 0] = 1) if positive
constants M and q exist such that for all integers n

(2.13)  |C_n| ≤ M/n^q.
(2.15)  (1/n²) Σ_{k=1}^{n} k C_k ≤ (M/n²) Σ_{k=1}^{n} k^{1−q}
                               ≤ (M/n²) ∫_1^{n+1} x^{1−q} dx
                               ≤ (4M/(2 − q)) · (1/n^q).
By (2.15) and (2.9), it follows that for some constant M′ and q > 0

E[Z_n²] ≤ M′/n^q.

Choose now any integer r such that r > 1/q and define a sequence of
random variables Z₁′, Z₂′, ..., Z_m′, ... by taking for Z_m′ the m^r-th member of
the sequence {Z_n}; in symbols,

(2.17)  Z_m′ = Z_{m^r} for m = 1, 2, ....

(2.21)

(2.22)  P[lim_{m→∞} u_m = 0] = P[lim_{m→∞} v_m = 0] = 1.
In view of theorem 1A, to show that (2.22) holds, it suffices to show that

(2.23)  Σ_{m=1}^{∞} P[|u_m| > ε] < ∞ and Σ_{m=1}^{∞} P[|v_m| > ε] < ∞.

We prove that (2.23) holds by showing that for some constants M_u and M_v

(2.24)

(2.27)  (1 + 1/m)^r − 1 ≤ (1/m) r 2^{r−1}.

(2.29)

from which one may infer the second part of (2.24). The proof of theorem
2B is now complete.
EXERCISES
P[lim_{n→∞} F_n(k) = 1/N] = 1.
2.2. The distribution of digits in the decimal expansion of a random number.
Let Y be a number chosen at random from the unit interval (that is, Y is
a random variable uniformly distributed over the interval 0 to 1). Let
X₁, X₂, ... be the successive digits in the decimal expansion of Y; that is,

Y = X₁/10 + X₂/10² + ... + X_n/10^n + ....

Prove that the random variables X₁, X₂, ... are independent discrete
random variables uniformly distributed over the integers 0 to 9. Conse-
quently, conclude that for any integer k (say, the integer 7) the relative
frequency of occurrence of k in the decimal expansion of any number Y
in the unit interval is equal to 1/10 for all numbers Y, except a set of numbers
Y constituting a set of probability zero. Does the fact that only 3's occur
in the decimal expansion of 1/3 contradict the assertion?
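The conclusion may be checked empirically (a Python sketch; since a computer cannot draw a genuinely random real number, the digits of √2 − 1 serve here as a stand-in point of the unit interval):

```python
from decimal import Decimal, getcontext

getcontext().prec = 5010
digits = str(Decimal(2).sqrt() - 1)[2:5002]   # first 5000 decimal digits of sqrt(2) - 1
print(digits.count('7') / len(digits))        # empirically near 1/10
```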
2.3. Convergence of the sample distribution function and the sample characteristic
function of dependent random variables. Let X₁, X₂, ..., X_n be a sequence
of random variables identically distributed as a random variable X. The
sample distribution function F_n(y) is defined as the fraction of observations
among X₁, X₂, ..., X_n which are less than or equal to y. The sample
characteristic function ψ_n(u) is defined by

ψ_n(u) = M_n[e^{iux}] = (1/n) Σ_{k=1}^{n} e^{iuX_k}.

Show that F_n(y) converges in quadratic mean to F_X(y) = P[X ≤ y], as
n → ∞, if and only if

(2.30)

Show that ψ_n(u) converges in quadratic mean to φ_X(u) = E[e^{iuX}] if and
only if

(2.31)

Prove that (2.30) and (2.31) hold if the random variables X₁, X₂, ... are
independent.
2.4. The law of large numbers does not hold for Cauchy distributed random
variables. Let X₁, X₂, ..., X_n be a sequence of independent identically
distributed random variables with probability density function f(x) =
[π(1 + x²)]^{−1}. Show that no finite constant m exists to which the sample
means (X₁ + ... + X_n)/n converge in probability.
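A simulation shows the failure directly (a Python sketch; a standard Cauchy variable is generated as tan(π(U − ½)) for U uniform on (0, 1)):

```python
import random, math

# Sample means of independent Cauchy variables do not settle down as n grows;
# indeed the mean of n such variables is again Cauchy distributed.
random.seed(2)
for n in (10, 1000, 100000):
    s = sum(math.tan(math.pi * (random.random() - 0.5)) for _ in range(n))
    print(n, s / n)
```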
2.5. Let {X_n} be a sequence of independent random variables identically dis-
tributed as a random variable X with finite mean. Show that for any
bounded continuous function f(·) of a real variable

(2.32)  lim_{n→∞} E[f((X₁ + ... + X_n)/n)] = f(E[X]).
Consequently, conclude that

(2.33)  lim_{n→∞} ∫_0^1 ... ∫_0^1 f((x₁ + ... + x_n)/n) dx₁ ... dx_n = f(1/2),

(2.34)  lim_{n→∞} Σ_{k=0}^{n} f(k/n) (n choose k) t^k (1 − t)^{n−k} = f(t), 0 ≤ t ≤ 1.
and read "the law of Zn converges to the law of Z" if anyone (and
consequently all) of the following equivalent statements holds:
(i) For every bounded continuous function gO of a real variable there
is convergence of the expectation E[g(Z,,)] to E[g(Z)]; that is, as n tends
to co,
(3.2) E[g(Z,,)] = J:oog(z) dFzn(z)--'}o LX'oog(Z) dFz(z) = E[g(Z)].
P z,.[{z: g(z) < y}] = Fg(z,/-Y) --'>- Fg(z)(Y) = Pz[{z: g(z) < y}]
at every real number y at which the distribution function Fg(z)(·) is
continuous.
Let us indicate briefly the significance of the most important of these
statements. The practical meaning of convergence in distribution is
expressed by (iii); the reader should compare the statement of the central
limit theorem in section 5 of Chapter 8 to see that (iii) constitutes an exact
mathematical formulation of the assertion that the probability law of Z
"approximates" that of Z_n. From the point of view of establishing in
practice that a sequence of random variables converges in distribution, one
uses (ii), which constitutes a criterion for convergence in distribution in
terms of characteristic functions. Finally, (v) represents a theoretical fact
of the greatest usefulness in applications, for it asserts that if Z_n converges
in distribution to Z, then a sequence of random variables g(Z_n), obtained
as functions of the Z_n, converges in distribution to g(Z) if the function g(·)
is continuous.
We defer the proof of the equivalence of these statements to section 5.
The Continuity Theorem of Probability Theory. The inversion formulas
of section 3 of Chapter 9 prove that there is a one-to-one correspondence
between distribution and characteristic functions; given a distribution
function F(·) and its characteristic function
(ii) for any u such that 3u²E[X²] < 1, log φ_X(u) exists and satisfies (3.8).

To show (3.8), we write [by (3.7)] that log φ_X(u) = log(1 − r), in which
r is defined by 1 − r = φ_X(u). Now |r| ≤ 3u²E[X²]/2, so that |r| < ½ if u is
such that 3u²E[X²] < 1. For any complex number r of modulus |r| < ½

log(1 − r) = −r ∫_0^1 (1/(1 − rt)) dt,

(3.12)  log(1 − r) + r = −r² ∫_0^1 (t/(1 − rt)) dt,

|log(1 − r) − (−r)| ≤ |r|² ≤ (9/4) u⁴ E²[X²],

−u² ∫_0^1 dt (1 − t) E[X²(e^{iutX} − 1)] = ((iu)³/2) ∫_0^1 dt (1 − t)² E[X³ e^{iutX}].
LEMMA 3B. In the same way that (3.7) and (3.8) are obtained, one may
obtain expansions for the characteristic function of a random variable Y
whose mean E[Y] exists:

(3.14)
Let Z be any random variable that is normally distributed with mean 0 and
variance 1. We now show that the sequence {Z_n} converges in distribution
to Z. To prove this assertion, we first write the characteristic function of
Z_n in the form

(3.19)  log φ_{Z_n}(u) = −(u²/2) + (θ/6) u³ (q² + p²)/(npq)^{1/2} + 3θ|u|³/n,

in which θ is some number such that |θ| < 1.

In view of (3.16) and (3.19), we see that for fixed u ≠ 0 and for n so
large that n > 3u²,

(3.20)  log φ_{Z_n}(u) = −(u²/2) + (θ/6) u³ (q² + p²)/(npq)^{1/2} + 3θ|u|³/n,

which tends to −u²/2 as n tends to ∞.
EXERCISES
3.1. Prove lemma 3C.
3.2. Let X₁, X₂, ..., X_n be independent random variables, each assuming
each of the values +1 and −1 with probability ½. Let Y_n = Σ_{j=1}^{n} X_j/2^j.
Find the characteristic function of Y_n and show that, as n tends to ∞, for
each u, φ_{Y_n}(u) tends to the characteristic function of a random variable Y
uniformly distributed over the interval −1 to 1. Consequently, evaluate
P[−½ < Y_n ≤ ½] and P[½ < Y_n ≤ ¾] approximately.
3.3. Let X₁, X₂, ..., X_n be independent random variables, identically distri-
buted as the random variable X. For n = 1, 2, ..., let

Z_n = (S_n − E[S_n])/σ[S_n],  S_n = X₁ + X₂ + ... + X_n.

Assuming that X is (i) binomial distributed with parameters n = 6 and
p = 1/3, (ii) Poisson distributed with parameter λ = 2, (iii) χ² distributed
with ν = 2 degrees of freedom, show for each real number u that
lim_{n→∞} log φ_{Z_n}(u) = −½u². Consequently, evaluate P[18 ≤ S₁₀ ≤ 20]
approximately.
3.4. For any integer r and 0 < p < 1 let N(r, p) denote the minimum number
of trials required to obtain r successes in a sequence of independent repeated
Bernoulli trials, in which the probability of success at each trial is p. Let
Z be a random variable χ² distributed with 2r degrees of freedom. Show
that, at each u, lim_{p→0} φ_{2pN(r,p)}(u) = φ_Z(u). State in words the meaning
of this result.
3.5. Let Z" be binomial distributed with parameters nand p = A/n. in which
A > 0 is a fixed constant. Let Z be Poisson distributed with parameter A.
For each u, show that lim q,zn(u) = q,z(u). State in words the meaning
of this result. n- 00
3.6. Let Z be a random variable Poisson distributed with parameter λ. By use
of characteristic functions, show that as λ tends to ∞
The random variables Z₁, Z₂, ..., Z_n are called the sequence of
normalized consecutive sums of the sequence X₁, X₂, ..., X_n.
That the central limit theorem is true under fairly unrestrictive conditions
on the random variables X₁, X₂, ... was already surmised by Laplace and
Gauss in the early 1800's. However, the first satisfactory conditions,
backed by a rigorous proof, for the validity of the central limit theorem
were given by Lyapunov in 1901. In the 1920's and 1930's the method of
characteristic functions was used to extend the theorem in several directions
and to obtain fairly unrestrictive necessary and sufficient conditions for its
validity in the case in which the random variables X₁, X₂, ... are indepen-
dent. More recent years have seen extensive work on extending the central
limit theorem to the case of dependent random variables.
The reader is referred to the treatises of B. V. Gnedenko and A. N.
Kolmogorov, Limit Distributions for Sums of Independent Random
Variables, Addison-Wesley, Cambridge, Mass., 1954, and M. Loeve,
Probability Theory, Van Nostrand, New York, 1955, for a definitive
treatment of the central limit theorem and its extensions.
From the point of view of the applications of probability theory, there
are two main versions of the central limit theorem that one should have at
his command. One should know conditions for the validity of the central
limit theorem in the cases in which (i) the random variables X₁, X₂, ...
are independent and identically distributed and (ii) the random variables
X₁, X₂, ... are independent but not identically distributed.
THEOREM 4A. THE CENTRAL LIMIT THEOREM FOR INDEPENDENT IDENTI-
CALLY DISTRIBUTED RANDOM VARIABLES WITH FINITE MEANS AND VARIANCES.
For n = 1, 2, ... let X_n be identically distributed as the random variable
X, with finite mean E[X] and standard deviation σ[X]. Let the sequence
{X_n} be independent, and let Z_n be defined by (4.1) or, more explicitly,

(4.3)  Z_n = ((X₁ + ... + X_n) − nE[X]) / (√n σ[X]).

Then (4.2) will hold.
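The theorem may be illustrated by simulation (a Python sketch; the uniform summands, sample sizes, and trial count are illustrative assumptions):

```python
import random, math

def standardized_sum(n):
    """Z_n of (4.3) for X uniform on (0, 1): E[X] = 1/2, sigma[X] = 1/sqrt(12)."""
    s = sum(random.random() for _ in range(n))
    return (s - 0.5 * n) / (math.sqrt(n) / math.sqrt(12.0))

# Empirical P[Z_n <= 1]; the normal value Phi(1) is about 0.8413 (Table I).
for n in (2, 10, 50):
    trials = 20000
    hits = sum(1 for _ in range(trials) if standardized_sum(n) <= 1.0)
    print(n, hits / trials)
```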
THEOREM 4B. THE CENTRAL LIMIT THEOREM FOR INDEPENDENT RANDOM
VARIABLES WITH FINITE MEANS AND (2 + δ)th CENTRAL MOMENTS, FOR SOME
δ > 0. For n = 1, 2, ... let X_n be a random variable with finite mean E[X_n]
and finite (2 + δ)th central moment μ(2 + δ; n) = E[|X_n − E[X_n]|^{2+δ}].
Let the sequence {X_n} be independent, and let Z_n be defined by (4.1).
Then (4.2) will hold if

(4.4)  lim_{n→∞} (1/σ^{2+δ}[S_n]) Σ_{k=1}^{n} μ(2 + δ; k) = 0.
It is clear that to prove (4.7) will hold we need prove only that the integral
in (4.6) tends to 0 as n tends to infinity. Define g(x, t, u) = x²(e^{itux/√n} − 1).
Then, for any M > 0,

(4.10)  |∫_0^1 dt (1 − t) E[g(X, t, u)]| ≤ σ² M|u|/√n + 2 ∫_{|x|≥M} x² dF_X(x),

which tends to 0, as we let first n tend to ∞ and then M tend to ∞. The
proof of the central limit theorem for identically distributed independent
random variables with finite variances is complete.
We next prove the central limit theorem under Lyapunov's condition.
For k = 1, 2, ..., let X_k be a random variable with mean 0, finite variance
σ_k², and (2 + δ)th central moment μ(2 + δ; k). We have the following
expansion of the logarithm of its characteristic function, for u such that
3u²σ_k² < 1:

(4.13)  log φ_{Z_n}(u) = Σ_{k=1}^{n} log φ_{X_k}(u/σ[S_n])
                      = −(u²/2) Σ_{k=1}^{n} σ_k²/σ²[S_n] + ...,

and

(σ_k/σ[S_n])⁴ ≤ (σ_k/σ[S_n])^{2+δ} ≤ μ(2 + δ; k)/σ^{2+δ}[S_n].

The proof of the central limit theorem under Lyapunov's condition is
complete.
THEORETICAL EXERCISES

4.1. Prove that the central limit theorem holds for independent random variables
X₁, X₂, ... with zero means and finite variances obeying Lindeberg's
condition: for every ε > 0

(4.14)  lim_{n→∞} (1/σ²[S_n]) Σ_{k=1}^{n} ∫_{|x|≥εσ[S_n]} x² dF_{X_k}(x) = 0.

Hint: In (4.8) let M = εσ[S_n], replacing σ√n by σ[S_n]. Obtain thereby
an estimate for E[X_k²(e^{iutX_k/σ[S_n]} − 1)]. Add these estimates to obtain
an estimate for log φ_{Z_n}(u), as in (4.13).
4.2. Prove the law of large numbers under Markov's condition. Hint: Adapt
the proof of the central limit theorem under Lyapunov's condition, using
the expansions (3.13).
4.3. Jensen's inequality and its consequences. Let X be a random variable, and
let I be a (possibly infinite) interval such that, with probability one, X takes
its values in I; that is, P[X lies in I] = 1. Let g(·) be a function of a real
variable that is twice differentiable on I and whose second derivative satisfies
g″(x) ≥ 0 for all x in I. The function g(·) is then said to be convex on I.
Show that the following inequality (Jensen's inequality) holds:

(4.15)  g(E[X]) ≤ E[g(X)].

Hint: Show by Taylor's theorem that g(x) ≥ g(x₀) + g′(x₀)(x − x₀). Let
x₀ = E[X] and take the expectation of both sides of the inequality. Deduce
from (4.15) that for any r ≥ 1 and s > 0

(4.16)  |E[X]|^r ≤ E^r[|X|] ≤ E[|X|^r],

(4.17)  E^r[|X|^s] ≤ E[|X|^{rs}].

Conclude from (4.17) that if 0 < r₁ < r₂ then

(4.18)  E^{1/r₁}[|X|^{r₁}] ≤ E^{1/r₂}[|X|^{r₂}].

In particular, conclude that

(4.19)  E[|X|] ≤ E^{1/2}[|X|²] ≤ E^{1/3}[|X|³] ≤ ....
4.4. Let {U_n} be a sequence of independent random variables, each uniformly
distributed on the interval 0 to π. Let {A_n} be a sequence of positive
constants. State conditions under which the sequence X_n = A_n cos U_n
obeys the central limit theorem.
g_d(z) = 1 if a ≤ z ≤ b
       = 1 − (a − z)/d if a − d < z < a
       = 1 − (z − b)/d if b < z < b + d
       = 0 otherwise.
The function g_d(·) is continuous and integrable. Its Fourier transform γ_d(·)
is given for any u by

Similarly, define the function g_d*(·) by

g_d*(z) = (z − a)/d if a < z < a + d
        = 1 if a + d ≤ z ≤ b − d
        = (b − z)/d if b − d < z < b
        = 0 otherwise.

By the foregoing argument, one may prove that (5.4) holds for g_d*(·).
Now, the expectations of the functions g_d(·) and g_d*(·) clearly straddle the
quantity F_{Z_n}(b) − F_{Z_n}(a):
(5.12)  g(x; ε, M) = 0 if |x| > M
                   = g(a_k) if a_{k−1} < x ≤ a_k, for some k = 1, 2, ..., K
                   = g(−M) if x = −M.
It is clear that for |x| ≤ M

(5.13)  |g(x) − g(x; ε, M)| ≤ ε.

Now

(5.14)  |∫_{−∞}^{∞} g(x) dF_n(x) − ∫_{−∞}^{∞} g(x) dF(x)| ≤ |I_n| + |J_n| + |J|,

where I_n = ∫_{−∞}^{∞} [g(x) − g(x; ε, M)] dF_n(x).

Next, we may write I_n as a sum of two integrals, one over the range
|x| ≤ M and the other over the range |x| > M. In view of (5.13), we then
have

|I_n| ≤ ε + C[1 − F_n(M) + F_n(−M)].

Similarly

|J| ≤ ε + C[1 − F(M) + F(−M)].

In view of (5.14), (5.15), and the two preceding inequalities, it follows that
THEORETICAL EXERCISES

(5.17)  lim_{M→∞} lim sup_{n→∞} ∫_{|z|≥M} |z| dF_{Z_n}(z) = 0.

Hint: To any ε > 0, choose points −∞ = z₀ < z₁ < ... < z_K = ∞, so
that F_Z(z_j) − F_Z(z_{j−1}) ≤ ε for j = 1, 2, ..., K. Verify that

sup_{−∞<z<∞} |F_{Z_n}(z) − F_Z(z)| ≤ max_{j=0,1,...,K} |F_{Z_n}(z_j) − F_Z(z_j)| + ε.
Tables
TABLE I
Area under the Normal Density Function
A table of Φ(x) = (1/√(2π)) ∫_{−∞}^{x} e^{−y²/2} dy
x 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 .5000 .5040 .5080 .5120 .5160 .5199 .5239 .5279 .5319 .5359
0.1 .5398 .5438 .5478 .5517 .5557 .5596 .5636 .5675 .5714 .5753
0.2 .5793 .5832 .5871 .5910 .5948 .5987 .6026 .6064 .6103 .6141
0.3 .6179 .6217 .6255 .6293 .6331 .6368 .6406 .6443 .6480 .6517
0.4 .6554 .6591 .6628 .6664 .6700 .6736 .6772 .6808 .6844 .6879
0.5 .6915 .6950 .6985 .7019 .7054 .7088 .7123 .7157 .7190 .7224
0.6 .7257 .7291 .7324 .7357 .7389 .7422 .7454 .7486 .7517 .7549
0.7 .7580 .7611 .7642 .7673 .7704 .7734 .7764 .7794 .7823 .7852
0.8 .7881 .7910 .7939 .7967 .7995 .8023 .8051 .8078 .8106 .8133
0.9 .8159 .8186 .8212 .8238 .8264 .8289 .8315 .8340 .8365 .8389
1.0 .8413 .8438 .8461 .8485 .8508 .8531 .8554 .8577 .8599 .8621
1.1 .8643 .8665 .8686 .8708 .8729 .8749 .8770 .8790 .8810 .8830
1.2 .8849 .8869 .8888 .8907 .8925 .8944 .8962 .8980 .8997 .9015
1.3 .9032 .9049 .9066 .9082 .9099 .9115 .9131 .9147 .9162 .9177
1.4 .9192 .9207 .9222 .9236 .9251 .9265 .9279 .9292 .9306 .9319
1.5 .9332 .9345 .9357 .9370 .9382 .9394 .9406 .9418 .9429 .9441
1.6 .9452 .9463 .9474 .9484 .9495 .9505 .9515 .9525 .9535 .9545
1.7 .9554 .9564 .9573 .9582 .9591 .9599 .9608 .9616 .9625 .9633
1.8 .9641 .9649 .9656 .9664 .9671 .9678 .9686 .9693 .9699 .9706
1.9 .9713 .9719 .9726 .9732 .9738 .9744 .9750 .9756 .9761 .9767
2.0 .9772 .9778 .9783 .9788 .9793 .9798 .9803 .9808 .9812 .9817
2.1 .9821 .9826 .9830 .9834 .9838 .9842 .9846 .9850 .9854 .9857
2.2 .9861 .9864 .9868 .9871 .9875 .9878 .9881 .9884 .9887 .9890
2.3 .9893 .9896 .9898 .9901 .9904 .9906 .9909 .9911 .9913 .9916
2.4 .9918 .9920 .9922 .9925 .9927 .9929 .9931 .9932 .9934 .9936
2.5 .9938 .9940 .9941 .9943 .9945 .9946 .9948 .9949 .9951 .9952
2.6 .9953 .9955 .9956 .9957 .9959 .9960 .9961 .9962 .9963 .9964
2.7 .9965 .9966 .9967 .9968 .9969 .9970 .9971 .9972 .9973 .9974
2.8 .9974 .9975 .9976 .9977 .9977 .9978 .9979 .9979 .9980 .9981
2.9 .9981 .9982 .9982 .9983 .9984 .9984 .9985 .9985 .9986 .9986
3.0 .9987 .9987 .9987 .9988 .9988 .9989 .9989 .9989 .9990 .9990
3.1 .9990 .9991 .9991 .9991 .9992 .9992 .9992 .9992 .9993 .9993
3.2 .9993 .9993 .9994 .9994 .9994 .9994 .9994 .9995 .9995 .9995
3.3 .9995 .9995 .9995 .9996 .9996 .9996 .9996 .9996 .9996 .9997
3.4 .9997 .9997 .9997 .9997 .9997 .9997 .9997 .9997 .9997 .9998
3.6 .9998 .9998 .9999 .9999 .9999 .9999 .9999 .9999 .9999 .9999
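Values of Φ(x) can also be computed directly (a Python sketch using the error function of the standard math module):

```python
import math

def Phi(x):
    """Area under the normal density function: Phi(x) = (1 + erf(x / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

print(round(Phi(1.00), 4), round(Phi(1.96), 4))   # 0.8413 and 0.975, matching Table I
```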
TABLE II
Binomial Probabilities
n x   .01   .05   .10   .15   .20   .25   .30   1/3   .35   .40   .45   .49   .50
2 0 .9801 .9025 .8100 .7225 .6400 .5625 .4900 .4444 .4225 .3600 .3025 .2601 .2500
  1 .0198 .0950 .1800 .2550 .3200 .3750 .4200 .4444 .4550 .4800 .4950 .4998 .5000
  2 .0001 .0025 .0100 .0225 .0400 .0625 .0900 .1111 .1225 .1600 .2025 .2401 .2500
3 0 .9703 .8574 .7290 .6141 .5120 .4219 .3430 .2963 .2746 .2160 .1664 .1327 .1250
  1 .0294 .1354 .2430 .3251 .3840 .4219 .4410 .4444 .4436 .4320 .4084 .3823 .3750
  2 .0003 .0071 .0270 .0574 .0960 .1406 .1890 .2222 .2389 .2880 .3341 .3674 .3750
  3 .0000 .0001 .0010 .0034 .0080 .0156 .0270 .0370 .0429 .0640 .0911 .1176 .1250
4 0 .9606 .8145 .6561 .5220 .4096 .3164 .2401 .1975 .1785 .1296 .0915 .0677 .0625
  1 .0388 .1715 .2916 .3685 .4096 .4219 .4116 .3951 .3845 .3456 .2995 .2600 .2500
  2 .0006 .0135 .0486 .0975 .1536 .2109 .2646 .2963 .3105 .3456 .3675 .3747 .3750
  3 .0000 .0005 .0036 .0115 .0256 .0469 .0756 .0988 .1115 .1536 .2005 .2400 .2500
  4 .0000 .0000 .0001 .0005 .0016 .0039 .0081 .0123 .0150 .0256 .0410 .0576 .0625
5 0 .9510 .7738 .5905 .4437 .3277 .2373 .1681 .1317 .1160 .0778 .0503 .0345 .0312
  1 .0480 .2036 .3280 .3915 .4096 .3955 .3602 .3292 .3124 .2592 .2059 .1657 .1562
  2 .0010 .0214 .0729 .1382 .2048 .2637 .3087 .3292 .3364 .3456 .3369 .3185 .3125
  3 .0000 .0011 .0081 .0244 .0512 .0879 .1323 .1646 .1811 .2304 .2757 .3060 .3125
  4 .0000 .0000 .0004 .0022 .0064 .0146 .0284 .0412 .0488 .0768 .1128 .1470 .1562
  5 .0000 .0000 .0000 .0001 .0003 .0010 .0024 .0041 .0053 .0102 .0185 .0283 .0312
6 0 .9415 .7351 .5314 .3771 .2621 .1780 .1176 .0878 .0754 .0467 .0277 .0176 .0156
  1 .0571 .2321 .3543 .3993 .3932 .3560 .3025 .2634 .2437 .1866 .1359 .1014 .0938
  2 .0014 .0305 .0984 .1762 .2458 .2966 .3241 .3292 .3280 .3110 .2780 .2437 .2344
  3 .0000 .0021 .0146 .0415 .0819 .1318 .1852 .2195 .2355 .2765 .3032 .3121 .3125
  4 .0000 .0001 .0012 .0055 .0154 .0330 .0595 .0823 .0951 .1382 .1861 .2249 .2344
  5 .0000 .0000 .0001 .0004 .0015 .0044 .0102 .0165 .0205 .0369 .0609 .0864 .0938
  6 .0000 .0000 .0000 .0000 .0001 .0002 .0007 .0014 .0018 .0041 .0083 .0139 .0156
7 0 .9321 .6983 .4783 .3206 .2097 .1335 .0824 .0585 .0490 .0280 .0152 .0090 .0078
  1 .0659 .2573 .3720 .3960 .3670 .3115 .2471 .2048 .1848 .1306 .0872 .0603 .0547
  2 .0020 .0406 .1240 .2097 .2753 .3115 .3177 .3073 .2985 .2613 .2140 .1740 .1641
  3 .0000 .0036 .0230 .0617 .1147 .1730 .2269 .2561 .2679 .2903 .2918 .2786 .2734
  4 .0000 .0002 .0026 .0109 .0287 .0577 .0972 .1280 .1442 .1935 .2388 .2676 .2734
  5 .0000 .0000 .0002 .0012 .0043 .0115 .0250 .0384 .0466 .0774 .1172 .1543 .1641
  6 .0000 .0000 .0000 .0001 .0004 .0013 .0036 .0064 .0084 .0172 .0320 .0494 .0547
  7 .0000 .0000 .0000 .0000 .0000 .0001 .0002 .0005 .0006 .0016 .0037 .0068 .0078
8 0 .9227 .6634 .4305 .2725 .1678 .1001 .0576 .0390 .0319 .0168 .0084 .0046 .0039
  1 .0746 .2793 .3826 .3847 .3355 .2670 .1977 .1561 .1373 .0896 .0548 .0352 .0312
  2 .0026 .0515 .1488 .2376 .2936 .3115 .2965 .2731 .2587 .2090 .1569 .1183 .1094
  3 .0001 .0054 .0331 .0839 .1468 .2076 .2541 .2731 .2786 .2787 .2568 .2273 .2188
  4 .0000 .0004 .0046 .0185 .0459 .0865 .1361 .1707 .1875 .2322 .2627 .2730 .2734
  5 .0000 .0000 .0004 .0026 .0092 .0231 .0467 .0683 .0808 .1239 .1719 .2098 .2188
  6 .0000 .0000 .0000 .0002 .0011 .0038 .0100 .0171 .0217 .0413 .0703 .1008 .1094
  7 .0000 .0000 .0000 .0000 .0001 .0004 .0012 .0024 .0033 .0079 .0164 .0277 .0312
  8 .0000 .0000 .0000 .0000 .0000 .0000 .0001 .0002 .0002 .0007 .0017 .0033 .0039
9 0 .9135 .6302 .3874 .2316 .1342 .0751 .0404 .0260 .0207 .0101 .0046 .0023 .0020
  1 .0830 .2985 .3874 .3679 .3020 .2253 .1556 .1171 .1004 .0605 .0339 .0202 .0176
  2 .0034 .0629 .1722 .2597 .3020 .3003 .2668 .2341 .2162 .1612 .1110 .0776 .0703
  3 .0001 .0077 .0446 .1069 .1762 .2336 .2668 .2731 .2716 .2508 .2119 .1739 .1641
  4 .0000 .0006 .0074 .0283 .0661 .1168 .1715 .2048 .2194 .2508 .2600 .2506 .2461
  5 .0000 .0000 .0008 .0050 .0165 .0389 .0735 .1024 .1181 .1672 .2128 .2408 .2461
  6 .0000 .0000 .0001 .0006 .0028 .0087 .0210 .0341 .0424 .0743 .1160 .1542 .1641
  7 .0000 .0000 .0000 .0000 .0003 .0012 .0039 .0073 .0098 .0212 .0407 .0635 .0703
  8 .0000 .0000 .0000 .0000 .0000 .0001 .0004 .0009 .0013 .0035 .0083 .0153 .0176
  9 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0001 .0001 .0003 .0008 .0016 .0020
10 0 .9044 .5987 .3487 .1969 .1074 .0563 .0282 .0173 .0135 .0060 .0025 .0012 .0010
   1 .0914 .3151 .3874 .3474 .2684 .1877 .1211 .0867 .0725 .0403 .0207 .0114 .0098
   2 .0042 .0746 .1937 .2759 .3020 .2816 .2335 .1951 .1757 .1209 .0763 .0495 .0439
   3 .0001 .0105 .0574 .1298 .2013 .2503 .2668 .2601 .2522 .2150 .1665 .1267 .1172
   4 .0000 .0010 .0112 .0401 .0881 .1460 .2001 .2276 .2377 .2508 .2384 .2130 .2051
   5 .0000 .0001 .0015 .0085 .0264 .0584 .1029 .1366 .1536 .2007 .2340 .2456 .2461
   6 .0000 .0000 .0001 .0012 .0055 .0162 .0368 .0569 .0689 .1115 .1596 .1966 .2051
   7 .0000 .0000 .0000 .0001 .0008 .0031 .0090 .0163 .0212 .0425 .0746 .1080 .1172
   8 .0000 .0000 .0000 .0000 .0001 .0004 .0014 .0030 .0043 .0106 .0229 .0389 .0439
   9 .0000 .0000 .0000 .0000 .0000 .0000 .0001 .0003 .0005 .0016 .0042 .0083 .0098
  10 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0001 .0003 .0008 .0010
TABLE III
Poisson Probabilities
A table of e^{−λ} λ^x / x! for λ = 0.1(0.1)2(0.2)4(1)10
λ\x    0     1     2     3     4     5     6     7     8     9    10    11    12
 2.2 .1108 .2438 .2681 .1966 .1082 .0476 .0174 .0055 .0015 .0004 .0001 .0000
 2.4 .0907 .2177 .2613 .2090 .1254 .0602 .0241 .0083 .0025 .0007 .0002 .0000
 2.6 .0743 .1931 .2510 .2176 .1414 .0735 .0319 .0118 .0038 .0011 .0003 .0001 .0000
 2.8 .0608 .1703 .2384 .2225 .1557 .0872 .0407 .0163 .0057 .0018 .0005 .0001 .0000
 3.0 .0498 .1494 .2240 .2240 .1680 .1008 .0504 .0216 .0081 .0027 .0008 .0002 .0001
 3.2 .0408 .1304 .2087 .2226 .1781 .1140 .0608 .0278 .0111 .0040 .0013 .0004 .0001
 3.4 .0334 .1135 .1929 .2186 .1858 .1264 .0716 .0348 .0148 .0056 .0019 .0006 .0002
 3.6 .0273 .0984 .1771 .2125 .1912 .1377 .0826 .0425 .0191 .0076 .0028 .0009 .0003
 3.8 .0224 .0850 .1615 .2046 .1944 .1477 .0936 .0508 .0241 .0102 .0039 .0013 .0004
 4.0 .0183 .0733 .1465 .1954 .1954 .1563 .1042 .0595 .0298 .0132 .0053 .0019 .0006
 5.0 .0067 .0337 .0842 .1404 .1755 .1755 .1462 .1044 .0653 .0363 .0181 .0082 .0034
 6.0 .0025 .0149 .0446 .0892 .1339 .1606 .1606 .1377 .1033 .0688 .0413 .0225 .0113
 7.0 .0009 .0064 .0223 .0521 .0912 .1277 .1490 .1490 .1304 .1014 .0710 .0452 .0264
 8.0 .0003 .0027 .0107 .0286 .0573 .0916 .1221 .1396 .1396 .1241 .0993 .0722 .0481
 9.0 .0001 .0011 .0050 .0150 .0337 .0607 .0911 .1171 .1318 .1318 .1186 .0970 .0728
10.0 .0000 .0005 .0023 .0076 .0189 .0378 .0631 .0901 .1126 .1251 .1251 .1137 .0948

λ\x   13    14    15    16    17    18    19    20    21    22    23    24
 5.0 .0013 .0005 .0002
 6.0 .0052 .0022 .0009 .0003 .0001
 7.0 .0142 .0071 .0033 .0014 .0006 .0002 .0001
 8.0 .0296 .0169 .0090 .0045 .0021 .0009 .0004 .0002 .0001
 9.0 .0504 .0324 .0194 .0109 .0058 .0029 .0014 .0006 .0003 .0001
10.0 .0729 .0521 .0347 .0217 .0128 .0071 .0037 .0019 .0009 .0004 .0002 .0001
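Entries of Tables II and III can be computed directly (a Python sketch using only the standard math module):

```python
import math

def binom_pmf(n, x, p):
    """Table II entries: P[X = x] for X binomial with parameters n and p."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(lam, x):
    """Table III entries: e^{-lam} lam^x / x!."""
    return math.exp(-lam) * lam**x / math.factorial(x)

print(round(binom_pmf(5, 2, 0.25), 4))   # 0.2637, as in Table II
print(round(poisson_pmf(3.0, 3), 4))     # 0.2240, as in Table III
```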
Answers to
Odd-numbered Exercises
CHAPTER 1
4.1. S = {(D, D, D), (D, D, G), (D, G, D), (D, G, G), (G, D, D), (G, D, G),
(G, G, D), (G, G, G)}, A₁ = {(D, D, D), (D, D, G), (D, G, D), (D, G, G)},
A₁A₂ = {(D, D, D), (D, D, G)}, A₁ ∪ A₂ = {(D, D, D), (D, D, G), (D, G, D),
(D, G, G), (G, D, D), (G, D, G)}.
4.3. (i), (xvi) {1, 2, 3}; (ii), (viii) {1, 2, 3, 7, 8, 9}; (iii), (iv), (vii), (xiii) {10, 11, 12};
(v), (vi), (xiv) {1, 2, 3, 7, 8, 9, 10, 11, 12}; (ix), (xii) {4, 5, 6}; (xi) {1, 2, 3, 4, 5, 6,
7, 8, 9}; (x), (xv) S.
4.5. (i) {10, 11, 12}; (ii) {1, 2, 3}; (iii) {4, 5, 6, 7, 8, 9}; (iv) φ; (v) S;
(vi) {1, 2, 3, 4, 5, 6, 7, 8, 9}; (vii) {4, 5, 6, 7, 8, 9}; (viii) φ; (ix) {10, 11, 12};
(x) {1, 2, 3, 10, 11, 12}; (xi) S; (xii) S.
5.9. N [exactly 0] = 400. N [exactly 1] = 400. N [exactly 2] = 100.
N [at least 0] = 900. N [at least 1] = 500. N [at least 2] = 100.
N [at most 0] = 400. N [at most 1] = 800. N [at most 2] = 900.
5.11. Let M, W, and C denote, respectively, the set of college graduates, males, and
married persons. Show N[M ∪ W ∪ C] = 1057 > 1000.
7.1. 12/21.
7.3. (i) 0.14, (ii) 0.07.
7.5. t.
CHAPTER 2
1.1. 450.
1.3. 10,32.
1.5. 10.
1.7. (i) 70; (ii) 2.
1.9. n = IS, r = 10.
1.11. 204, 54, lOS, 9S.
1.13. 2205.
2.1. Without replacement, (i) ,'\" (ii) H, (iii) H; with replacement, (i) H, (ii) ~!,
(iii) H.
2.3. k 2, 12 3, 11 4,10 5,9 6, S 7
with replacement l. 2
3-. la 36
.. 3\
6
3-.
3.7. (45)5/(50)5'
3.9. 0.1.
3.11. Manufacturer would prefer plan (a), consumer would prefer plan (b).
3.13. (900)5/(1000)5 == (0.9)5 = 0.59.
3.15. H.
4.1. t.
4.3. (i), (ii) '"46.; (iii) ,s.;.
4.5. (i) False, since P[ABl = t; (ii) false; (iii) true; (iv) false.
4.9. if.
4.11. (i) t; (ii) %; (ii ), (iv) t; (v) t; (vi) 0; (vii) i; (viii) undefined.
4.13. l, t.
5.3. (6)., 6" (~),G).
(;~) - (i~) - 4(i~) (i~) - (i~)
(;~) _ (i~) ; (ii) (i~) .
5.5.
(i)
CHAPTER 3
1.3. No.
1.5. (i) 0.729; (ii) 0.271; (iii) 0.028; (iv) 0.001.
1.9. Possible values for (P[A], P[B]) are (t, t) and (t, D.
2.1. (i) T; (ii) F; (iii) F; (iv) T; (v) F; (vi) T.
2.3. (i) H; (ii), (iii) H.
3.1. (i) 0.240; (ii) 0.260; (iii) 0.942; (iv) 0.932.
3.3. (i) 0.328; (ii) 0.410; (iii) 0.262.
3.5. (i) 0.133; (ii) 0.072.
3.7. (i) 0.197; (ii) 0.803; (iii) 0.544.
3.9. Choose n such that (0.90)^n < 0.01; therefore, choose n = 44.
3.11. (i) (1 − q₆₀)⁵ = 0.881; (ii) (1 − q₆₀)⁵ + 5q₆₀(1 − q₆₀)⁴ = 0.994;
(iii) (1 − q₆₀)⁵(1 − q₆₅) = 0.846; (iv) (1 − q₆₀)⁵q₆₅ = 0.035.
3.17. i.
3.19. 0.010.
3.21. (i) 0.1755; (ii) 0.5595.
3.23. (i) 0.0256; (ii) 0.0081; (iii) 0.1008.
3.25. (i) 0.3770; (ii) 9.
3.27. 0.379.
3.29. (i) 1; (ii) 4 or 5.
4.1. (i) H; (ii) t.
4.3. (i) T; (ii) F; (iii) T; (iv) T; (v) F; (vi) F.
4.5. t.
4.7. l.
4.9. Let the event that a student wears a tie, comes from the East, comes from the
Midwest, or comes from the Far West be denoted, respectively by A, B; C, D.
Then P[B I A] = H, p[e I AJ = la, P[D I A] = 3'\.
(ii)
P2 = [' H]
:
to,
t i-
[tl n
Pa = t t
t t
["! "]
-1
(iv) P z = P3 = t i- .
t t
6.3. (i), (iii) 11'1 = 11'2 = t; (ii) 1T, = t, -.t2 = ;}.
6.5. P has rows (p, q, 0, 0), (0, 0, p, q), (p, q, 0, 0), (0, 0, p, q). For n > 1 the rows
are (p², pq, pq, q²).
CHAPTER 4
2.9. (ij A = i; (iii) (a) 0.1353, (b) 0.6321, (c) 0.2326; (iv) P[A(b)] = r blS •
2.11. (i) A =~; (iii) (a) CD', (b) ii·, (c) {; (iv) P[A(b)] = (W.
3.9. (ii) [(x) = 2~~0 e-(r/50)2; (iii) (a), (b) 0.184, (c) 0.632, (d) 0; (iv) (a) 1 - e-"
(b) (e- 1 - e- 4 )/(2 - e-').
3.11. (ii) [(x) = ~- for 0 < x < 1; = t for 2 < x < 4; = 0 otherwise; (iii) (a) t,
(b) 1, (c) t; (iv) (a) t (b) ~.
4.1. (i) Hypergeometric with parameters N = 200, n = 20, p = 0.05; (ii) binomial
with parameters n = 30, p = 0.51; (iii) geometric with parameter p = 0.51;
(iv) binomial with parameters n = 35, p = 0.75.
CHAPTER 5
1.1. (i) 13; (ii) 24.4; (iii) 63.6; (iv) 0; (v) 4.4.
1.3. (i) 10 10; (ii) 9100; (iii) 63,600; (iv) 0; (v) 840.
2.1. Mean (i) t, (ii) 0, (iii) l±S; variance (i) la, (ii) !, (iii) fl.',
2.3. Mean (i) does not exist, (ii) 0, (iii) 0; variance (i) does not exist, (ii) 3, (iii) 1.
2.5. Mean (i) f, (ii) 4, (iii) 4; variance (i) t, (ii) ~, (iii) l'l'
2.7. Mean (i) S, (ii)}; variance (i) l., (ii) ,±s,
2.9. (i) r > 2; (ii) r > 3.
3.1. (i) 1/(1 - t); (ii) e 5 '/CI - t).
CHAPTER 6
2.1. (i) (a) 0.003; (b) 0.007; (ii) (a) 0.068; (b) 0.695.
2.5. (i) 0.506; (ii) 0.532.
2.7. (i) 423; (ii) 289.
3.1. 0.0671,0.000.
3.3. 0.8008.
3.11. 15.
4.1. T = 10 hours.
4.3. N - r obeys a negative binomial probability law with parameters p = t and
(i) r = 1, (ii) r = 2, (iii) r = 3.
4.5. (i) 0.0067; (ii) 0.0404.
CHAPTER 7
(ii) 4 (48 choose x−1) / [(53 − x) (52 choose x−1)] for x = 1, 2, ..., 49; = 0 otherwise.
2.9. 0.3413.
2.11. 0.5811.
2.13. t.
2.15. t.
3.1. t.
6.3. (i) Yes; (ii) yes; (iii) yes; (iv) 1 − e^{−2}; (v) yes; (vi) 0.8426;
(vii) f_{X²}(y) = (1/√(2πy)) e^{−y/2} for y > 0; = 0 otherwise;
(viii) f_{X²,Y²}(u, v) = (1/(2π√(uv))) e^{−(u+v)/2} for u, v > 0; = 0 otherwise;
(ix) yes; (x) no.
6.5. (i) True; (ii) false; (iii) true; (iv) false; (v) false.
6.7. (i) 0.125; (ii) 0.875.
7.7. t.
7.9. (i) 1 - (0.6)"; (ii) 1 - (O.4)n; (iii) (0.4)" + n(O.6)(O.4)n-l.
8.3. f_E(x) = (2/√π)(kT)^{−3/2} √x e^{−x/kT} for x > 0; = 0 otherwise;
a χ distribution with parameters n = 3 and σ = (kT/2)^{1/2}.
8.5. (1/π)(1 − x²)^{−1/2} for |x| < 1; = 0 otherwise.
8.11. (a): (i)} for 1 < y < 3; = 0 otherwise; (ii) i for -1 < y < 3; =0
1(Y - 1) -~for 1 < Y < 3;
otherwise; (b) 4 '-2- = 0 otherwise.
4y 6y'
8.13. (i) . e-~y4 for y > 0,0 otherwise; (ii) .~ e-~'i1!6 for y > 0, 0 otherwise.
vh aJ
vh
8.15. (i) [27T 3(1 - y2)l-~ L e-~Xk2 where V = sin 7TX" for Iyl ::; 1; = 0 otherwise;
k=- 00
1 1
8.17. (a) -----= for 0 < y < 1; 0 otherwise; (b) ----== e- v/2cr' for y > 0; 0 otherwise;
2Vy aV27TY
1 1 2
(c) 2a' e- Y 2a for y > 0; 0 otherwise.
1 for x = 0; <D (;) for x> 0; (c) 0 for x < 0; 1 - e- x' /2a ' for x> O.
9.1. 0.9772.
9.3. (i) 2y, 0 < V < 1; 0 otherwise. (ii) 2(1 - V), 0 < y < 1; 0 otherwise;
1
9.5. (i), (ii) Normal with mean 0, variance 2a'; (iii) --_ e -.2/4a 2 for V > 0;
aVrr
ootherwise; (iv), (v) normal with mean 0, variance ~a·.
7T) -
( 2 esc' r - 2 for 1 ~ r ~ v2; 0 otherwise; /0(8) = i see 2 e for 0 ~ 8 ~ 7T'4;
zl csc
2 e for:::4<-e-<2::: " 0 otherwise.
11.1. (i) 1; (ii), (iii), (iv) i.
11.3. (i) 0.865; (ii) 0.632; (iii) 0.368; (iv) 0.5.
11.5. (i) 0.276; Oi) 0.5; (iii) 0.2; (iv) 0.5, (v) ~¢(v/2).
CHAPTER 8
3.1. E[X) = t, E[Yl = t, Var [Xl = f., Var [Yl = t, p[X, Yl = 0; X and Yare
independent.
3.3. √2/3.
3.7. 4a - 1.
4.1. 0.8413.
4.5. E[L] = 150. (i) Var [L] = 16; (ii) Var [L] = 25.6.
5.1. (i) throws more doubt than (ii).
5.3. 62.
5.5. 25 or more.
5.7. 0.70; 7.4.
5.9. 38.
5.11. 1) = 0.10.
6.1. (i) n ::::: 1537; (ii) II ::::: 385; (iii) n ::::: 16.
6.3. E[vl/(j[v] == 10'
7.1. (i) 0.8X2: (ii) -0.6x.: (iii) t; (iv) lx. + tX3; (v) 0.35; (vi) 0.36.
7.3. (i) Var [Yj = 0.5; (ii) O.
CHAPTER 9
2.1. (i) G + lei")"; (ii) e 3Iei"-11; (iii) e l "/4(l - te i "); (iv) e3iu-19/SJU2; (v) (1 - yU)-2.
3.3. ⅔ − y² + ½|y|³ for |y| ≤ 1; 4/3 − 2|y| + y² − ⅙|y|³ for 1 ≤ |y| ≤ 2; 0 otherwise.
3.5. [π²(4a₁²a₂² − (x − a₁² − a₂²)²)]^{−1/2} for |x − a₁² − a₂²| < 2a₁a₂; 0 otherwise.
4.5. (i) kth cumulant of S is n2^{k−1}(k − 1)!(1 + km²); (ii) ν = n(1 + m²)²/(1 + 2m²),
a = (1 + 2m²)/(1 + m²).
Index
Variance, of a probability law, 205 Waiting time, exponential law of, 260
of a random variable, 346 Wallis, W. A., 256
of a sum of random variables, 211, Waugh, D. F., 119
366 Waugh, F. V., 119
Variances, table of, 218, 220, 380 World Series, 112, 353
Variation, coefficient of, 379
Venn diagrams, 13 Young, G. S., 263
Vernon, P. E., 78 Yule process, 267