Probability and Calculus
A century ago, French treatises on the theory of probability were commonly called “Le Calcul des
Probabilités”—“The Calculus of Probabilities.” The name has fallen out of fashion, perhaps due to the
potential confusion with integral and differential calculus, but it seems particularly apt for our present topic.
It suggests a system of rules that are generally useful for calculation, where the more modern “probability
theory” has a speculative connotation. Theories may be overthrown or superseded; a calculus can be used
within many theories. Rules for calculation can be accepted even when, as with probability, there may be
different views as to the correct interpretation of the quantities being calculated.
The interpretation of probability has been a matter of dispute for some time, although the terms of the
dispute have not remained constant. To say that an event (such as the occurrence of a Head in the toss
of a coin) has probability 1/2 will mean to some that if the coin is tossed an extraordinarily large number
of times, about half the results will be Heads, while to others it will be seen as a subjective assessment,
an expression of belief about the uncertainty of the event that makes no reference to an idealized (and
not realizable) infinite sequence of tosses. In this book we will not insist upon any single interpretation of
probability; indeed, we will find it convenient to adopt different interpretations at different times, depending
upon the scientific context. Probabilities may be interpreted as long run frequencies, in terms of random
samples from large populations, or as degrees of belief. While not philosophically pure, this opportunistic
approach will have the benefit of permitting us to develop a large body of statistical methodology that can
appeal to and be useful to a large number of people in quite varied situations.
We will, then, begin with a discussion of a set of rules for manipulating or calculating with probabilities,
rules which show how we can go from one assignment of probabilities to another, without prejudice to the
source of the first assignment. Usually we will be interested in reasoning from simple situations to complex
situations.
1.1 Probabilities of Events.
The rules will be introduced within the framework of what we will call an experiment. We will be
purposefully vague as to exactly what we mean by an experiment, only describing it as some process with an
observable outcome. The process may be planned or unplanned, a laboratory exercise or a passive historical
recording of facts about society. For our purposes, the important point that specifies the experiment is that
there is a set, or list, of all possible outcomes of the experiment, called the sample space and denoted S.
An event (say E) is then a set of possible outcomes, a subset of S. We shall see that usually the same
actual experiment may be described in terms of different sample spaces, depending upon the purpose of the
description.
The notation we use for describing and manipulating events is borrowed from elementary set theory. If
E and F are two events, both subsets of the same sample space S, then the complement of E (denoted E^c,
or sometimes E′) is the set of all outcomes not in E, the intersection of E and F (E ∩ F) is the set of all
outcomes in both E and F, and the union of E and F (E ∪ F) is the set of all outcomes in E or in F or in
both E and F . It is often convenient to represent these definitions, and arguments associated with them, in
terms of shaded regions of Venn diagrams, where the rectangle S is the sample space and the areas E and
F two events.
[Figure 1.1]
If E and F have no common outcomes, they are said to be mutually exclusive. Even with only these
elementary definitions, fairly complicated relationships can be described. For example, consider the event
(A ∩ B) ∪ (A ∩ B^c), where A and B are two events. Then a simple consideration of Venn diagrams shows
that in fact this describes the same set of outcomes as A:

(A ∩ B) ∪ (A ∩ B^c) = A.
[Figure 1.2]
Thus even without any notion of what the symbol P for probability may mean, we would have the identity
P((A ∩ B) ∪ (A ∩ B^c)) = P(A).
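Identities of this kind are easy to check by direct enumeration. The following Python sketch is an illustration only; the particular sample space and events are arbitrary choices:

    S = {1, 2, 3, 4, 5, 6}            # a small sample space
    A = {1, 3, 5}                     # an arbitrary event
    B = {1, 2}                        # another arbitrary event
    Bc = S - B                        # the complement of B
    assert (A & B) | (A & Bc) == A    # (A ∩ B) ∪ (A ∩ B^c) = A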
Of course, for such equations to be useful we will need to define probability. As mentioned earlier,
we avoid giving a limited interpretation to probability for the present, though we may, whatever the inter-
pretation, think of it as a measure of uncertainty. But for all interpretations, probability will have certain
properties, namely those of an additive set function. With respect to our general sample space S these are:

(1.1) For any event E in S, 0 ≤ P(E) ≤ 1, and P(S) = 1.

(1.2) If E and F are mutually exclusive events in S (that is, if E ∩ F is empty), then
P(E ∪ F) = P(E) + P(F).

(An alternative would be to measure uncertainty on some other scale, for example by the log odds,
log_e(P(E)/(1 − P(E))). This measures probability on a scale from −∞ to ∞, with 0 as a middle value (corresponding to P(E) = 1/2).
But for present purposes, the zero-to-one scale represented by P(E) is convenient.)
Together (1.1) and (1.2) imply a number of useful other properties. For example,

(1.3) Complementation: For any E in S, P(E^c) = 1 − P(E).
(1.4) General additivity: For any E and F in S (not necessarily mutually exclusive),
P(E ∪ F) = P(E) + P(F) − P(E ∩ F).
(1.5) Finite additivity: For any finite collection of mutually exclusive events E1 , E2 , . . . , En ,
P(E₁ ∪ E₂ ∪ · · · ∪ Eₙ) = P(E₁) + P(E₂) + · · · + P(Eₙ).
Properties (1.1) and (1.2) are not sufficiently strong to imply the more general version of (1.5), namely:
(1.6) Countable additivity: For any countably infinite collection of mutually exclusive events E1 , E2 , . . .
in S,
P(E₁ ∪ E₂ ∪ · · ·) = P(E₁) + P(E₂) + · · · .
For our purposes this is not a restrictive additional condition, so we shall add it to (1.1) and (1.2) as an
assumption we shall make about the probabilities we deal with. In some advanced applications, for example
where the sample space is a set of infinite sequences or a function space, there are useful probability measures
that satisfy (1.1) and (1.2) but not (1.6), however.
Probabilities may in some instances be specified by hypothesis for simple outcomes, and the probabilities
of more complex events computed from these rules. Indeed, in this chapter we shall only consider such
hypothetical probabilities, and turn to empirical questions in a following chapter. A trivial example will
illustrate.
Example 1.A: We might describe the experiment of tossing a single six-sided die by the sample space
S = {1, 2, 3, 4, 5, 6}, where the possible outcomes are the numbers on the upper face when the die comes to
rest. By hypothesis, we might suppose the die is “fair” and interpret this mathematically as meaning that
each of these six outcomes has an equal probability; P ({1}) = 1/6, P ({2}) = 1/6, etc. As mentioned earlier,
this statement is susceptible to several interpretations: it might represent your subjective willingness to bet
on #1 at 5 to 1 odds, or the fact that in an infinite sequence of hypothetical tosses, one-sixth will show #1.
But once we accept the hypothesis of equally likely faces under any interpretation, the calculations we make
are valid under that interpretation. For example, if

E = “the number thrown is odd”

and

F = “the number thrown is less than 3,”
then E = {1} ∪ {3} ∪ {5}, F = {1} ∪ {2}, and rule (1.5) implies
P(E) = P({1}) + P({3}) + P({5}) = 3/6

and

P(F) = P({1}) + P({2}) = 2/6.

Furthermore, E ∩ F = {1}, so P(E ∩ F) = 1/6. Then by rule (1.4),

P(E ∪ F) = P(E) + P(F) − P(E ∩ F) = 3/6 + 2/6 − 1/6 = 4/6.
In this simple situation, and even in more complicated ones, we have alternative ways of computing the same
quantity. Here E ∪ F = {1, 2, 3, 5} and we can also verify that P (E ∪ F ) = 4/6 from rule (1.5).
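Both routes to P(E ∪ F) are easy to confirm by counting outcomes; a minimal Python sketch, assuming the equally likely outcomes of Example 1.A:

    S = {1, 2, 3, 4, 5, 6}
    E = {1, 3, 5}                          # "the number thrown is odd"
    F = {1, 2}                             # "the number thrown is less than 3"
    P = lambda event: len(event) / len(S)  # equally likely outcomes
    print(P(E) + P(F) - P(E & F))          # rule (1.4): 0.666...
    print(P(E | F))                        # direct count, rule (1.5): 0.666...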
1.2 Conditional Probability and Independence

(1.7) Conditional probability: The probability that E occurs given F has occurred is defined to be
P(E|F) = P(E ∩ F)/P(F) if P(F) > 0.
If P (F ) = 0 we leave P (E|F ) undefined for now. Conditional probability may be thought of as relative
probability: P (E|F ) is the probability of E relative to the reduced sample space consisting of only those
outcomes in the event F . In a sense, all probabilities are conditional since even “unconditional” probabilities
are relative to the sample space S, and it is only by custom that we write P (E) instead of the equivalent
P (E|S).
The definition (1.7) is useful when the quantities on the right-hand side are known; we shall make
frequent use of it in a different form, though, when the conditional probability is given and the composite
probability P (E ∩ F ) is sought:
P(E ∩ F) = P(F)P(E|F). (1.8)
Note we need not specify P (F ) > 0 here, for if P (F ) = 0 then P (E ∩F ) = 0 and both sides are zero regardless
of what value might be specified for P (E|F ). We can see that (1.7) and (1.8) relate three quantities, any two
of which determine the third. The third version of this relationship (namely P (F ) = P (E ∩ F )/P (E|F )) is
seldom useful.
Sometimes knowing that F has occurred has no effect upon the specification of the probability of E:
(1.9) Independent events: We say events E and F in S are independent if P (E) = P (E|F ).
By simple manipulation using the previous rules, this can be expressed in two other equivalent ways:

(1.10) P(F) = P(F |E), and

(1.11) P(E ∩ F) = P(E) · P(F).
Indeed, this latter condition (which is not to be confused with the almost opposite notion of “mutually
exclusive”) is often taken as the definition of independence.
Note that independence (unlike, for example, being mutually exclusive) depends crucially upon the
values specified for the probabilities. In the previous example of the die, E and F are independent for the
given specification of probabilities. Using (1.9),
P(E|F) = P(E ∩ F)/P(F) = (1/6)/(2/6) = 1/2 = P(E).

However, if the die were strongly weighted and P({1}) = P({2}) = P({4}) = 1/3, then P(F) = 2/3, P(E) = 1/3,
and P(E ∩ F) = 1/3, so for this specification of probabilities E and F are then not independent.
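The contrast can be checked numerically; a minimal Python sketch using the two probability specifications just discussed:

    E, F = {1, 3, 5}, {1, 2}
    fair = {k: 1/6 for k in range(1, 7)}
    weighted = {1: 1/3, 2: 1/3, 3: 0, 4: 1/3, 5: 0, 6: 0}
    for spec in (fair, weighted):
        P = lambda ev, p=spec: sum(p[k] for k in ev)
        # Independent exactly when P(E ∩ F) = P(E) P(F).
        print(abs(P(E & F) - P(E) * P(F)) < 1e-12)   # True, then False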
Example 1.B: A Random Star. Early astronomers noticed many patterns in the heavens; one which
caught the attention of mathematicians in the eighteenth and nineteenth centuries was the occurrence of six
bright stars (the constellation of the Pleiades) within a small section of the celestial sphere 1◦ square. How
likely, they asked, would such a tight grouping be if the stars were distributed at random in the sky? Could the
occurrence of such a tight cluster be taken as evidence that a common cause, such as gravitational attraction,
tied the six stars together? This turns out to be an extraordinarily difficult question to formulate, much less
answer, but a simpler question can be addressed even with the few rules we have introduced, namely: Let
A be a given area on the surface of the celestial sphere that is a square, 1◦ on a side. A single star is placed
randomly on the sphere. What is the probability it lands in A? A solution requires that we specify what
“placed randomly” means mathematically. Here the sample space S is infinite, namely the points on the
celestial sphere. Specifying probabilities on such a set can be challenging. We give two different solutions.
First Solution: The star is placed by specifying a latitude and a longitude. By “placed randomly” we
may mean that the latitude and longitude are picked independently, the latitude in the range −90◦ to 90◦
with a probability of 1/180 attached to each 1◦ interval, and the longitude in the range 0◦ to 360◦ with a
probability of 1/360 attached to each 1◦ interval. This is not a full specification of the probabilities of the
points on the sphere, but it is sufficient for the present purpose. Suppose A is located at the equator (Figure
1.3). Let
E = “pick latitude in A’s range”
F = “pick longitude in A’s range”
A = “pick a point within A”.
Then A occurs exactly when both E and F occur, and since E and F are independent,

P(A) = P(E) · P(F) = (1/180) · (1/360) = 1/64800.
Second Solution: We may ask how many 1◦ squares make up the area of the sphere; if all are equally
likely, the probability of A is just the reciprocal of this number. It has been known since Archimedes that if
a sphere has radius r, the area of the surface is 4πr2 . (This can be easily remembered as following from the
“Orange Theorem”: if a spherical orange is sliced into four quarters then for each quarter the area of the
two flat juicy sides equals that of the peel. The flat juicy sides are two semicircles of total area πr2 , so the
peel of a quarter orange has area πr2 and the whole peel is 4πr2 .) Now if the units of area are to be square
degrees, then since the circumference is 2πr we will need to choose r so that 2πr = 360◦, or r = 360/(2π). Then
the area of the surface is

4πr² = 4π(360/(2π))² = 360²/π = 41253.
Each square degree being supposed equally likely, we have
P(A) = 1/41253,
which is π/2 times larger than the first solution.
Both solutions are correct; they are based upon different hypotheses. The hypotheses of the first solution
may well be characterized as “placed at random” from one point of view, but they will make it more likely
that a square degree near a pole contains the star than that one on the equator does.
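The difference between the two hypotheses can be illustrated by simulation. The following rough Monte Carlo sketch (the sample size and the choice of a target square at the equator are arbitrary) draws the star by both schemes: the first draws latitude and longitude independently and uniformly, the second draws a point uniformly over the surface area, for which the sine of the latitude is uniform:

    import math, random

    random.seed(0)
    N = 2_000_000
    hits_1, hits_2 = 0, 0
    for _ in range(N):
        # First solution: latitude and longitude independent and uniform.
        lat, lon = random.uniform(-90, 90), random.uniform(0, 360)
        hits_1 += (0 <= lat < 1) and (0 <= lon < 1)
        # Second solution: uniform over surface area (sin(latitude) uniform).
        lat = math.degrees(math.asin(random.uniform(-1, 1)))
        lon = random.uniform(0, 360)
        hits_2 += (0 <= lat < 1) and (0 <= lon < 1)
    print(hits_1 / N, 1 / 64800)   # both about 1.5e-5
    print(hits_2 / N, 1 / 41253)   # both about 2.4e-5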
The original problem is more difficult than the one we solved because we need to ask, if the approximately
1500 bright stars (of 5th magnitude or brighter) are placed “randomly” and independently on the sphere,
and we search out the square degree containing the largest number of stars, what is the chance it contains
six or more? And even this already complicated question glosses over the fact that our interest in a section
1◦ square (rather than 2◦ square, or 1◦ triangular, etc.) was determined a posteriori, after looking at the
data. We shall discuss some aspects of this problem in later chapters.
1.3 Counting
When the sample space S is finite and the outcomes are specified to be equally likely, the calculation of
probabilities becomes an exercise in counting: P (E) is simply the number of outcomes in E divided by the
number of outcomes in S. Nevertheless, counting can be difficult. Indeed, an entire branch of mathematics,
combinatorics, is devoted to counting. We will require two rules for counting, namely those for determining
the numbers of permutations and of combinations of n distinguishable objects taken r at a time. These are
(1.12) The number of ways of choosing r objects from n distinguishable objects where the order of choice
makes a difference is the number of permutations of n objects taken r at a time, given by

P_{r,n} = n!/(n − r)!.
(1.13) The number of ways of choosing r objects from n distinguishable objects where the order of choice
does not make a difference is the number of combinations of n objects taken r at a time (read “n choose r”), given by

\binom{n}{r} = C_{r,n} = n!/(r!(n − r)!).
In both cases, n! denotes n factorial, defined by n! = 1 · 2 · 3 · · · (n − 1) · n for integer n > 0, and we take
0! = 1 for convenience. Thus we have also

P_{r,n} = (1 · 2 · 3 · · · n)/(1 · 2 · · · (n − r)) = (n − r + 1) · · · (n − 1)n

and

\binom{n}{r} = P_{r,n}/r! = ((n − r + 1) · · · (n − 1)n)/(1 · 2 · 3 · · · (r − 1)r).
A variety of identities can be established using these definitions; some easily (e.g. \binom{n}{0} = 1, \binom{n}{1} = n,
\binom{n}{r} = \binom{n}{n−r}), others with more difficulty (e.g. Σ_{r=0}^{n} \binom{n}{r} = 2^n, which can, however, be directly established by
noting the lefthand side gives the number of ways any selection can be made from n objects without regard
to order, which is just the number of subsets, or 2^n).
Example 1.C: If r = 2 people are to be selected from n = 5 to be designated president and vice president
respectively, there are P_{2,5} = 20 ways the selection can be made. If, however, they are to serve as a committee
of two equals (so the committees (A, B) and (B, A) are the same committee), then there are only \binom{5}{2} = 10
ways the selection can be made.
Example 1.D: For an example of a more important type, we could ask how many binary numbers of
length n (= 15, say) are there with exactly r (= 8, say) 1’s. That is, how many possible sequences of 15
0’s and 1’s are there for which the sum of the sequence is 8? The answer is \binom{n}{r} = \binom{15}{8} = 6435, as may be
easily seen by considering the sequence as a succession of n = 15 distinguishable numbered spaces, and the
problem as one of selecting r = 8 from those 15 spaces as the locations for the 8 1’s, the order of the 1’s
being unimportant and the remaining unfilled slots to be filled in by 0’s.
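Both counts are available directly in Python's standard library; a quick check of Examples 1.C and 1.D:

    import math

    print(math.perm(5, 2))    # P_{2,5} = 20 ordered choices (president, vice president)
    print(math.comb(5, 2))    # 10 committees of two equals
    print(math.comb(15, 8))   # 6435 binary sequences of length 15 with eight 1's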
Evaluating n!, P_{r,n}, or \binom{n}{r} can be quite difficult if n is at all large. It is also usually unnecessary, due
to a very close approximation discovered about 1730 by James Stirling and Abraham De Moivre. Stirling’s
formula states that

log_e(n!) ∼ (1/2) log_e(2π) + (n + 1/2) log_e(n) − n, (1.14)

and thus

n! ∼ √(2π) n^{n+1/2} e^{−n}, (1.15)
where “∼” means that the ratio of the two sides tends to 1 as n increases. The approximations can be good
for even small n, as Table 1.1 shows.
[Table 1.1]
Stirling’s formula can be used to derive approximations to P_{r,n} and \binom{n}{r}, namely

P_{r,n} ∼ (1 − r/n)^{−(n+1/2)} (n − r)^r e^{−r}, (1.16)

and

\binom{n}{r} ∼ (1/√(2πn)) (1 − r/n)^{−(n−r+1/2)} (r/n)^{−(r+1/2)}. (1.17)
These too are reasonably accurate approximations; for example \binom{10}{5} = 252, while the approximation gives
258.37. While not needed for most purposes, there are more accurate refinements available. For example,
the bounds

√(2π) n^{n+1/2} e^{−n+1/(12n+1)} < n! < √(2π) n^{n+1/2} e^{−n+1/(12n)} (1.18)
give, for n = 5, 119.9699 < n! = 120 < 120.0026. Feller (1957, Chapter II.9) gives a proof of Stirling’s
formula and a nice discussion.
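Stirling's approximation (1.15) and the bounds (1.18) are easy to check numerically; a minimal Python sketch, reproducing among other things the n = 5 figures quoted above:

    import math

    def stirling(n):
        # (1.15): n! ~ sqrt(2*pi) * n**(n + 1/2) * exp(-n)
        return math.sqrt(2 * math.pi) * n ** (n + 0.5) * math.exp(-n)

    for n in (1, 2, 5, 10):
        low = stirling(n) * math.exp(1 / (12 * n + 1))   # lower bound of (1.18)
        high = stirling(n) * math.exp(1 / (12 * n))      # upper bound of (1.18)
        print(n, math.factorial(n), round(stirling(n), 4),
              round(low, 4), round(high, 4))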
Sometimes the outcomes of an experiment are expressed as numbers, and at other times we will be
most interested in numerical descriptions that capture only some aspects of the outcomes, even in situations
where we find it easiest to specify the probabilities of the outcomes themselves. We will use the term random
variable for such a description: a function that assigns a numerical value to each outcome in S; a real-valued
function defined on S.
Example 1.E: If a coin is tossed three times, the sample space might be described by a list of 8
three-letter words,
S = {T T T, T T H, T HT, HT T, HHT, HT H, T HH, HHH},
where HHT means that the first two tosses result in Heads, and the third in Tails. One possible random
variable is
X = #H’s in the word.
Another is
Y = #T ’s in the word.
In both cases, the possible values are 0, 1, 2, and 3.
We will call random variables whose values can be listed sequentially in this manner discrete random
variables. In such cases, once the probabilities of the values of the random variable have been specified, they
can be described rather simply, by listing them. A list of the possible values of a discrete random variable
together with the probabilities of these values is called the probability distribution of the random variable;
we shall denote the probability that the random variable X is equal to the possible value x by pX (x), or,
when there is no likely confusion, by p(x).
For the coin example, the specification of the probability distribution of the random variable X = #H’s
can be straightforward. If we assume the coin is “fair” (which we may take to mean that P (H) = 1/2 for
a single toss), and the tosses are independent, then applying the multiplication rule for independent events
(1.11) repeatedly gives us, for example,
P(HHT) = P(H) · P(H) · P(T) = (1/2)³ = 1/8,
and so the 8 points in S are equally likely. Now the event “X = 1” consists of the outcomes {HT T, T HT, T T H}
and by the additivity rule (1.5) it has probability
pX(1) = P(X = 1)
      = P({HTT, THT, TTH})
      = P(HTT) + P(THT) + P(TTH)
      = 1/8 + 1/8 + 1/8 = 3/8.
The full probability distribution is easily found:
x        0    1    2    3
pX(x)   1/8  3/8  3/8  1/8
It is often helpful to think of a probability distribution of a random variable as a distribution of a unit mass
along the real line, with pX(x) giving the mass assigned to the point x. A second, equivalent description is the
cumulative distribution function, defined by FX(x) = P(X ≤ x). Then FX(x) gives the cumulative
mass, starting from the left, up to and including that at the point x.
For the coin example the calculation is simple:
x        0    1    2    3
pX(x)   1/8  3/8  3/8  1/8
FX(x)   1/8  4/8  7/8  8/8 = 1
Graphically, we can depict pX (x) as a system of spikes, and then FX (x) is a jump function with jumps at
each possible value of x.
[Figure 1.4]
Note that the probability distribution pX (x) can be recovered from FX (x) by evaluating the sizes and
locations of the jumps. If x − 1 and x are two consecutive possible values of X, then

pX(x) = FX(x) − FX(x − 1).
The two alternative specifications are thus completely equivalent, given one the other can be found, and we
can choose between them on the grounds of convenience.
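The passage from pX(x) to FX(x) and back is a simple cumulative-sum computation; a minimal Python sketch for the coin example:

    from itertools import accumulate

    xs = [0, 1, 2, 3]
    p = [1/8, 3/8, 3/8, 1/8]        # pX(x), X = #Heads in three tosses
    F = list(accumulate(p))         # FX(x): cumulative sums 1/8, 4/8, 7/8, 1
    # Recover pX from FX through the sizes of the jumps.
    jumps = [F[0]] + [F[i] - F[i-1] for i in range(1, len(F))]
    print(F)       # [0.125, 0.5, 0.875, 1.0]
    print(jumps)   # [0.125, 0.375, 0.375, 0.125]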
The experiment of tossing a fair coin three times is a special case of a broad class of experiments of
immense use in statistics, the class of binomial experiments. These are characterized by three conditions:

(a) The experiment consists of a fixed number n of trials, which are mutually independent.

(b) Each trial results in one of only two possible outcomes:
A, called “success”, or
A^c, called “failure”.

(c) The probability of success on a single trial, P(A) = θ, is the same for all n trials (and so, of
course, is P(A^c) = 1 − θ). This probability θ is called the parameter of the experiment.
For the fair coin example, n = 3, A = H, and θ = 1/2.
The sample space S of the possible outcomes of a binomial experiment consists of a list of “words” of
length n, made up entirely of the “letters” A and A^c. These range from all successes to all failures:

AA · · · AA          (n letters; all successes)
AA · · · AA^c
AA · · · A^cA
   ·
   ·
AA^c · · · AA^c
   ·
   ·
A^cA^c · · · A^cA^c      (all failures)
There are 2 choices for each letter and thus 2 × 2 × · · · × 2 = 2^n different such words. For the coin example,
n = 3 and there are 2³ = 8 outcomes.
Since the trials are independent by hypothesis, it is easy to compute the probability of a single outcome
using the multiplication rule for independent events (1.11). For example:
For n = 2:

P(AA) = P(A) · P(A) = θ · θ = θ²
P(AA^c) = P(A)P(A^c) = θ(1 − θ).

For n = 3:

P(AA^cA) = P(A)P(A^c)P(A) = θ(1 − θ)θ = θ²(1 − θ).
In general, the probability of an outcome will only depend upon the numbers of A’s and A^c’s in the outcome.
If the word AA^c · · · AA^c consists of x A’s and n − x A^c’s, then

P(AA^c · · · AA^c) = P(A)P(A^c) · · · P(A)P(A^c) = θ(1 − θ) · · · θ(1 − θ) = θ^x (1 − θ)^{n−x}.
Now, for binomial experiments we will frequently only be interested in a numerical summary of the
outcome, the random variable
X = # successes = #A’s.
The possible values of X are 0, 1, 2, . . . , n, and its probability distribution can be found as follows: The event
“X = x” consists of exactly those outcomes with x A’s and n − x A^c’s. We have just found that each such
outcome has probability θ^x (1 − θ)^{n−x}. It remains only to determine the number, say C, of outcomes with
x A’s and n − x A^c’s; the desired probability will then be pX(x) = P(X = x) = C · θ^x (1 − θ)^{n−x}. But C
is equal to the number of binary numbers of length n with exactly x 1’s and n − x 0’s (just think of each
A as “1” and each A^c as “0”), and we have found (in Example 1.D) that this number is \binom{n}{x}. Therefore
pX(x) = \binom{n}{x} θ^x (1 − θ)^{n−x}. This probability distribution is called the Binomial Distribution, and is sometimes
given a special symbol that shows its dependence upon n and θ explicitly:

b(x; n, θ) = \binom{n}{x} θ^x (1 − θ)^{n−x} for x = 0, 1, 2, . . . , n.
Figure 1.5 illustrates some examples, for n = 8. The parameters n and θ determine the distribution; for each
integer n ≥ 1 and 0 ≤ θ ≤ 1, we have a different distribution. The Binomial distribution is thus an example
of what is called a parametric family of probability distributions.
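The Binomial probabilities are simple to compute; a minimal Python sketch (writing b(x, n, theta) for the probabilities as above):

    from math import comb

    def b(x, n, theta):
        # b(x; n, theta) = C(n, x) theta^x (1 - theta)^(n - x)
        return comb(n, x) * theta**x * (1 - theta)**(n - x)

    print([b(x, 3, 0.5) for x in range(4)])     # [0.125, 0.375, 0.375, 0.125]
    print(sum(b(x, 8, 0.3) for x in range(9)))  # 1.0, up to rounding error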
The trials that make up a binomial experiment are often called Bernoulli trials, after the Swiss mathe-
matician Jacob Bernoulli (1654–1705) who was instrumental in the early study of this experiment. Bernoulli
trials can be conducted in manners other than that we have discussed; the most important of these is where
rather than conduct a fixed number n of trials, the trials are conducted until a fixed number of successes r
have been observed. Because this is a sort of reversal of the original scheme, it is called the negative binomial
experiment. For example, if r = 1, the trials are conducted until the first success, and the sample space
consists of “words” of increasing length with a single A at the end: A, Ac A, Ac Ac A, Ac Ac Ac A, etc.
Let Z be the number of failures (that is, the number of A^c’s) observed before the rth success. For r = 1
the probability distribution of Z is easy to compute. For example, we will have Z = 3 only for
the outcome A^cA^cA^cA, and since P(A^cA^cA^cA) = (1 − θ)³θ we have pZ(3) = (1 − θ)³θ. More generally,
pZ(z) = (1 − θ)^z θ for z = 0, 1, 2, . . .. Note that Z is a discrete random variable with a countably infinite
number of possible values.
To find the probability distribution of Z in general, we can reason analogously to the way we found the
binomial distribution. The sample space S will consist of words with r A’s, each word ending with an A
(since the experiment terminates with the rth success). The outcomes corresponding to Z = z will be those
with r A’s and z A^c’s, and each of these will have probability θ^r (1 − θ)^z. To find the probability distribution
of Z we need only find the number C of outcomes in S with Z = z; for then pZ(z) = Cθ^r (1 − θ)^z. But
C is the number of “words” of length r + z ending in A, with exactly z A^c’s. This is the same as the
number of “words” of length r + z − 1 with exactly z A^c’s and no restrictions on the last letter, namely
C = \binom{r+z−1}{z} = \binom{r+z−1}{r−1}. We have therefore found the
(1.21) Negative Binomial Distribution: The probability distribution of the number of failures Z before the
rth success in a series of Bernoulli trials with probability of success θ is
nb(z; r, θ) = \binom{r+z−1}{r−1} θ^r (1 − θ)^z for z = 0, 1, 2, . . .
            = 0 otherwise.
This distribution is sometimes called the Pascal distribution, after an early programming language
that will continue to compile until the first bug is encountered. The special case where r = 1, namely
pZ(z) = θ(1 − θ)^z, for z = 0, 1, 2, . . . , is called the geometric distribution. We shall later see that the
negative binomial distribution has a close relationship to another important discrete distribution, the Poisson
distribution.
[Figure 1.6]
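A corresponding sketch of the Negative Binomial probabilities; since Z has countably many possible values, the check that they total 1 must be truncated:

    from math import comb

    def nb(z, r, theta):
        # nb(z; r, theta) = C(r + z - 1, r - 1) theta^r (1 - theta)^z
        return comb(r + z - 1, r - 1) * theta**r * (1 - theta)**z

    print([nb(z, 1, 0.5) for z in range(4)])       # geometric: 0.5, 0.25, 0.125, ...
    print(sum(nb(z, 3, 0.4) for z in range(200)))  # ≈ 1.0 (truncated sum)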
The Binomial and Negative Binomial distributions are more closely related than the mere fact that both
involve Bernoulli trials would suggest. Let
B(x; n, θ) = P(X ≤ x)
NB(z; r, θ) = P(Z ≤ z)
be their respective cumulative distribution functions. Then a little reflection tells us that if we were com-
puting X and Z from the same series of trials, we would have X ≥ r if and only if Z ≤ n − r. Since
P(X ≥ r) = 1 − P(X ≤ r − 1), this means

NB(n − r; r, θ) = 1 − B(r − 1; n, θ),
and so one set of probabilities can be computed from the other. For example, tables of the binomial
distribution can be used to find Negative Binomial probabilities. The Binomial distribution enjoys certain
symmetry properties. In particular

b(x; n, θ) = b(n − x; n, 1 − θ).

This relation allows the computation of Binomial (and hence Negative Binomial) probabilities using a table
of the Binomial distribution for 0 < θ ≤ 1/2.
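The relation between the two cumulative distribution functions can be verified numerically; the sketch below checks it for one arbitrary choice of n, r, and θ:

    from math import comb

    def B(x, n, theta):    # Binomial cumulative distribution function
        return sum(comb(n, k) * theta**k * (1 - theta)**(n - k)
                   for k in range(x + 1))

    def NB(z, r, theta):   # Negative Binomial cumulative distribution function
        return sum(comb(r + k - 1, r - 1) * theta**r * (1 - theta)**k
                   for k in range(z + 1))

    n, r, theta = 10, 4, 0.3
    print(NB(n - r, r, theta))       # the two printed values agree
    print(1 - B(r - 1, n, theta))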
A random variable is called continuous if its possible values form an interval, and hence cannot be listed
sequentially.
Example 1.F: Consider the spinner, a disc with a pointer rotating freely around the center point, pointing
at the edge of the disc which is labelled continuously from a to b.
[Figure 1.7]
If the pointer is spun and allowed to come to rest at an (in some sense random) point X, then the sample
space S is the interval {x : a ≤ x < b} and X is a random variable whose possible values are the numbers in
this interval.
Because the values of a continuous random variable cannot be listed, their probabilities cannot be
listed, and another device is used to describe the probability distribution. In a direct extension to the
interpretation of discrete probability distributions as mass distributions, continuous probability distributions
will be described by probability density functions, nonnegative functions which give the probabilities of an
interval through the area under the function over the interval. Mathematically, since areas are given by
integrals, we will define fX (x) (or f (x) if no confusion arises) to be the probability density function of the
continuous random variable X if for any numbers c and d, with c < d,
P(c < X ≤ d) = ∫_c^d fX(x) dx.
[Figure 1.8]
It will necessarily be true of probability density functions that

(i) fX(x) ≥ 0 for all x, and

(ii) ∫_{−∞}^{∞} fX(x) dx = 1.
Indeed, any function satisfying (i) and (ii) may be considered as the probability density of a continuous
random variable. Note that the values of fX (x) do not themselves give probabilities (they may even exceed
1), though we can think heuristically of fX (x)dx (= height fX (x) times base dx) as the probability X falls
in an infinitesimal interval at x:
P (x < X ≤ x + dx) = fX (x)dx
[Figure 1.9]
It is frequently helpful to think of the density function fX (x) as describing the upper boundary of a sheet
of unit mass resting upon the line of possible values, the area under that boundary over an interval being
equal to the mass over that interval.
One consequence of using probability densities to describe distributions is that individual points are
assigned probability zero:
P (X = c) = 0 for any c.
The area or mass exactly over each single point must be considered to be zero, or contradictions would ensue,
as we shall see. As a consequence, for continuous random variables we have, for any c < d,

P(c < X ≤ d) = P(c ≤ X ≤ d) = P(c ≤ X < d) = P(c < X < d).

Example 1.F (continued): For the spinner, coming to rest “at random” may be taken to mean that the
probability density is constant over the interval of possible values:

fX(x) = 1/(b − a) for a ≤ x < b (1.25)
      = 0 otherwise.

This is called the Uniform distribution on the interval from a to b.
[Figure 1.10]
Clearly the total area under fX (x) is 1, and the area or probability over any subinterval (c, d) is (d−c)/(b−a),
proportional to the length d − c of the subinterval. The numbers a and b are the parameters of this
distribution. If we ask what probability could be assigned to any single number c, we see it must be smaller
than that assigned to the interval c ≤ x < c + ε, for any ε > 0, that is, smaller than P(c ≤ X < c + ε) =
ε/(b − a). But no positive number fits that description, and we are forced by the limitations of our number
system to take P (X = c) = 0. This will not cause difficulties or contradictions as long as we follow our rules
and only insist that probabilities be countably additive: having the probability P (a ≤ X < b) = 1, yet each
P (X = c) = 0 does not contradict the additivity rule (1.6) since there are uncountably many c’s between a
and b.
Similarly to the discrete case, we define the cumulative distribution function of a continuous random
variable X by
FX(x) = P(X ≤ x) = ∫_{−∞}^x fX(u) du.
FX (x) is thus the area under fX (x) to the left of x. As before, FX (x) is a nondecreasing
[Figure 1.11]
function, though it is no longer a jump function. It gives an alternative way to describe continuous distri-
butions. The fundamental theorem of calculus holds that

(d/dx) ∫_{−∞}^x fX(u) du = fX(x),

so

(d/dx) FX(x) = fX(x),
and we may find the probability density function from the cumulative distribution function as well as vice
versa.
Example 1.G: The Exponential Distribution. Consider the experiment of burning a lightbulb until failure.
Let X be the time until failure; X may be considered a continuous random variable with possible values
{x : 0 ≤ x < ∞}. In order to specify a class of possible probability distributions for X, we would expect
to have the probability of survival beyond time t, P (X > t), decreasing as t → ∞. One class of decreasing
functions which also have P(X > 0) = 1 are the exponentially decreasing functions P(X > t) = C^t, where
0 < C < 1. Equivalently, writing e^{−θ} for C, where θ > 0 is a fixed parameter, we have

P(X > t) = e^{−θt} for t ≥ 0,

and

FX(t) = P(X ≤ t) = 1 − e^{−θt} for t ≥ 0
      = 0 for t < 0.
The corresponding probability density function is found by differentiation:

fX(t) = θe^{−θt} for t ≥ 0 (1.27)
      = 0 for t < 0.
This is called the Exponential (θ) distribution; θ is a parameter of the distribution (for each θ > 0 we get
a different distribution). When we come to discuss the Poisson process we shall see how the Exponential
distribution arises in that context from more natural and less arbitrary assumptions as a common failure
time distribution or waiting time distribution.
[Figure 1.12]
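The Exponential distribution is easily explored numerically; the sketch below (with an arbitrary θ) checks that a numerical derivative of FX recovers the density (1.27), and shows one way to draw a sample by inverting FX, an idea met again below:

    import math, random

    theta = 2.0
    F = lambda t: 1 - math.exp(-theta * t)       # FX(t), t >= 0
    f = lambda t: theta * math.exp(-theta * t)   # the density (1.27), t >= 0
    t, h = 0.7, 1e-6
    print((F(t + h) - F(t - h)) / (2 * h))       # numerical derivative of FX
    print(f(t))                                  # agrees with the density
    # One draw from the Exponential(theta) distribution, by inverting FX:
    print(-math.log(1 - random.random()) / theta)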
In the calculus of probabilities, it is common to specify a probability distribution for a situation where
that task is simple, and to then reason to a more complicated one. We may choose to describe the sample
space so that the outcomes are equally likely and then deduce the distribution of a random variable whose
possible values are not equally likely. Or we may use ideas of conditional probability to break a complicated
framework down into a number of simple steps, perhaps even into independent trials. In the case of the
binomial experiment, both devices were adopted. Another common route, one which will be particularly
useful in statistical applications, is to go from the distribution of one random variable, say X, whose distri-
bution is easily specified or has previously been determined, to that of another random variable which is a
transformation or function of X, say Y = h(X).
Example 1.H: Suppose X is the time to failure of a lightbulb (or an electronic component), and that we
believe X to have an Exponential (θ) distribution with density (1.27),

fX(x) = θe^{−θx} for x ≥ 0
      = 0 otherwise.

Upon failure, we plan to replace the lightbulb with a second, similar one. The probability that the first
survives beyond time t is P(X > t) = e^{−θt}; the probability the second survives longer than the first is then
Y = e^{−θX}, a random variable that depends upon the time to failure of the first. What is the distribution of
Y?
Example 1.I: Suppose a fair coin is tossed three times, and you receive $2 for every Head. The number
of Heads, X, has a Binomial distribution; what is the distribution of your winnings, Y = 2X? Or what if
you receive Y = X²?
To begin to address the general question of finding the distribution of a transformation Y = h(X) of a
random variable X, consider first the case where h is a strictly monotone function, at least over the range of
possible values of X. This restriction will ensure that each value of Y could have come from only one possible
X, and the ideas will be easier to explain in that case. For example, h(X) = 2X + 3 is strictly monotone,
while h(X) = X 2 is not, although it will be allowed in the present discussion if X takes no negative values,
since it is strictly monotone for nonnegative x.
Example 1.J: The extremely useful transformation h(X) = loge (X) is strictly monotone, though only
defined for X > 0. We can see what happens to a probability distribution under transformation by looking at
this one special case. Figure 1.14 illustrates the effect of this transformation upon the X-scale: it compresses
the upper end of the scale by pulling large values down, while spreading out the scale for small values. The
1-14
gap between X’s of 5 and 6 (namely, 1 X-unit) is narrowed to that between Y ’s of 1.61 and 1.79 (.18 Y -units),
and the gap between X’s of .2 and 1.2 (also 1 X-unit) is expanded to that between Y ’s of -1.61 and .18 (1.79
Y -units). Figure 1.15 illustrates the effect of this transformation upon two probability distributions, one
discrete and one continuous. The effect in the discrete case is particularly easy to describe: as the scale is
warped by the transformation, the locations of the spikes are changed accordingly, but their heights remain
unchanged. In the continuous case, something different occurs. Since the total area that was between 5 and
6 on the X-scale must now fit between 1.61 and 1.79 on the Y -scale, the height of the density over this part
of the Y -scale must be increased. Similarly, the height of the density must be decreased over the part of the
Y -scale where the scale is being expanded, to preserve areas there. The result is a dramatic change in the
appearance of the density. Our object in the remainder of this section is to describe precisely how this can
be done.
If Y = h(X) is a strictly monotone transformation of X, then we can solve for X in terms of Y , that is,
find the inverse transformation X = g(Y ). Given Y = y, the function g “looks back” to see which possible
value x of X produced that value y; it was x = g(y). If Y = h(X) = 2X + 3, then X = g(Y ) = (Y − 3)/2.
If Y = h(X) = log_e(X), for X > 0, then X = g(Y) = e^Y. If Y = h(X) = X², for X > 0, then
X = g(Y) = +√Y.
In terms of this inverse relationship, the solution for the discrete case is immediate. If pX (x) is the
probability distribution function of X, then the probability distribution function of Y is
pY(y) = P(Y = y)
      = P(h(X) = y) (1.28)
      = P(X = g(y))
      = pX(g(y)).
That is, for each value y of Y , simply “look back” to find the x that produced y, namely x = g(y), and
assign y the same probability that had been previously assigned to that x, namely pX (g(y)).
Example 1.K: If X has the Binomial distribution of Example 1.E; that is, pX(x) = \binom{3}{x}(0.5)³ for
x = 0, 1, 2, 3, and Y = X², what is the distribution of Y? Here g(y) = +√y, and so pY(y) = pX(√y) =
\binom{3}{√y}(0.5)³ for √y = 0, 1, 2, 3 (or y = 0, 1, 4, 9). For all other y’s, pY(y) = pX(√y) = 0. That is,

pY(y) = 1/8 for y = 0
      = 3/8 for y = 1
      = 3/8 for y = 4
      = 1/8 for y = 9
      = 0 otherwise.
[Figure 1.16]
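The “look back” recipe (1.28) amounts to relabeling the support; a minimal Python sketch of Example 1.K:

    from math import comb

    pX = {x: comb(3, x) * 0.5**3 for x in range(4)}   # Binomial(3, 1/2), Example 1.E
    h = lambda x: x * x                               # Y = X^2, monotone for x >= 0
    pY = {h(x): p for x, p in pX.items()}             # pY(y) = pX(g(y)), rule (1.28)
    print(pY)   # {0: 0.125, 1: 0.375, 4: 0.375, 9: 0.125}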
In the continuous case, an additional step is required, the rescaling of the density to compensate for the
compression or expansion of the scale and match corresponding areas. For this reason it is not true that
fY (y) = fX (g(y)), where g is the inverse transformation, but instead
fY(y) = fX(g(y)) · |dg(y)/dy|. (1.29)

The rescaling factor |dg(y)/dy| = |g′(y)| is called the Jacobian of the transformation in advanced calculus, and
it is precisely the compensation factor needed to match areas. When |g′(y)| is small, x = g(y) is changing
slowly as y changes (for example, for y near 0 in Figures 1.14 and 1.15), and we scale down. When g(y)
changes rapidly with y, |g 0 (y)| is large (for example, for y near 6 in Figures 1.14 and 1.15), and we scale up.
It is easy to verify that this is the correct factor: simply compute P (Y ≤ a) in two different ways.
First,

P(Y ≤ a) = ∫_{−∞}^a fY(y) dy, (1.30)

by the definition of fY(y). Second, supposing for a moment that h(x) (and hence g(y) also) is monotone
increasing, we have

P(Y ≤ a) = P(h(X) ≤ a) = P(X ≤ g(a)) = ∫_{−∞}^{g(a)} fX(x) dx. (1.31)
Differentiating both (1.30) and (1.31) with respect to a gives fY(y) = fX(g(y)) g′(y). If h(x) and g(y) are
monotone decreasing, the result is the same, but with −g′(y) as the compensation factor; the factor |g′(y)|
covers both cases, and gives us (1.29).
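Formula (1.29) can also be checked by simulation. The following rough Monte Carlo sketch takes the arbitrarily chosen case of X Exponential(1) and Y = log_e(X), and compares the relative frequency of a small interval with the area predicted by (1.29):

    import math, random

    # X ~ Exponential(1), Y = log X; then g(y) = e^y and |g'(y)| = e^y.
    def fY(y):
        x = math.exp(y)                       # g(y)
        return math.exp(-x) * math.exp(y)     # fX(g(y)) |g'(y)|, from (1.29)

    random.seed(1)
    ys = [math.log(random.expovariate(1.0)) for _ in range(200_000)]
    a, b = -0.5, -0.4
    print(sum(a < y <= b for y in ys) / len(ys))   # relative frequency of (a, b]
    print(fY(-0.45) * (b - a))                     # ≈ area under fY over (a, b]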
Example 1.H (Continued). Let X be the time to failure of the first lightbulb, and Y the probability
that the second bulb burns longer than the first. Y depends on X, and is given by Y = h(X) = e^{−θX}. The
random time X has density

fX(x) = θe^{−θx} for x ≥ 0
      = 0 otherwise.

Now log_e(Y) = −θX, and the inverse transformation is X = g(Y) = −log_e(Y)/θ. Both h(x) and g(y)
are monotone decreasing. The inverse g(y) is only defined for 0 < y, but the only possible values of Y are
0 < y ≤ 1. We find g′(y) = −(1/θ) · (1/y), and

|g′(y)| = 1/(θy) for y > 0.
Then

fY(y) = fX(g(y)) |g′(y)|,

and, noting that fX(g(y)) = 0 for y ≤ 0 or y > 1, we have

fY(y) = θe^{−θ(−log_e(y)/θ)} · 1/(θy) = θ · y · 1/(θy) = 1 for 0 < y ≤ 1
      = 0 otherwise.

That is, Y has the Uniform (0, 1) distribution.

Example 1.L: The Probability Integral Transformation. The result of Example 1.H is a special case of a
general fact: for any continuous random variable X with cumulative distribution function FX(x), both
Y = FX(X) and Z (= 1 − Y) have Uniform (0, 1) distributions. Example 1.H concerned Z for the special case of the Exponential
(θ) distribution. Because FX (x) is the integral of the probability density function fX (x), the transformation
h(x) = FX (x) has been called the probability integral transformation. To find the distribution of Y = h(X),
we need to differentiate g(y) = F_X^{−1}(y), defined to be the inverse cumulative distribution function, the
function that for each y, 0 < y < 1, gives the value of x for which FX(x) = y. [Figure 1.17] (For continuous
random variables X with densities, FX(x) is continuous and such an x will exist for all 0 < y < 1. For more
general random variables, F_X^{−1}(y) can be defined as F_X^{−1}(y) = infimum{x : FX(x) ≥ y}.) The derivative of
g(y) = F_X^{−1}(y) can be found by implicit differentiation:
y = FX(x),

so

1 = (d/dy) FX(x) = fX(x) · (dx/dy)

by the chain rule, and so

dx/dy = 1/fX(x),

or, with x = g(y) = F_X^{−1}(y),

g′(y) = (d/dy) F_X^{−1}(y) = 1/fX(F_X^{−1}(y)). (1.32)
But then

fY(y) = fX(g(y)) · |g′(y)|
      = fX(F_X^{−1}(y)) · 1/fX(F_X^{−1}(y)) for 0 < y < 1
      = 1 for 0 < y < 1
      = 0 otherwise,
the Uniform (0, 1) distribution. The fact that Z = 1 − Y also has this distribution can be shown by repeating
this derivation with h(x) = 1 − FX (x), or more simply by transforming the distribution of Y by h(y) = 1 − y,
whose inverse g(z) = 1 − z has |g′(z)| = 1.
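A quick simulation illustrates Example 1.L, again with the Exponential distribution (the choice of θ and the sample size are arbitrary): the transformed values Y = FX(X) should spread evenly over (0, 1):

    import math, random

    theta = 0.5
    random.seed(2)
    xs = [random.expovariate(theta) for _ in range(100_000)]
    ys = [1 - math.exp(-theta * x) for x in xs]      # Y = FX(X)
    counts = [0] * 10                                # occupancy of the tenths of (0, 1)
    for y in ys:
        counts[min(int(y * 10), 9)] += 1
    print([round(c / len(ys), 3) for c in counts])   # each entry ≈ 0.1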
Thus far we have considered only strictly monotone transformations h(x). A full treatment for nonmono-
tone transformations is possible, but it is cumbersome, and not necessary for our anticipated applications.
The following example, involving a transformation that can be broken down into two transformations mono-
tone over different ranges, captures all of the important ideas for even more general cases.
Example 1.M. The Standard Normal and Chi-square (1 d.f.) distributions. Suppose X is a continuous
random variable with probability density function fX (x) defined for all x, −∞ < x < ∞, by
φ(x) = (1/√(2π)) e^{−x²/2}. (1.33)
This distribution is called the Standard Normal distribution, and it is sufficiently important that the symbol
φ(x) is reserved for its density and Φ(x) for its cumulative distribution function. [Figure 1.18]. The cu-
mulative distribution function cannot be written in closed form in terms of simple functions, but it can be
evaluated numerically and is tabled at the back of the book for x ≥ 0. By the symmetry and continuity of
the distribution, this range is sufficient, since

P(X ≤ −x) = P(X ≥ x) = 1 − P(X ≤ x),

or

FX(−x) = 1 − FX(x),

or

Φ(−x) = 1 − Φ(x) for all x.

Now consider the transformation Y = h(X) = X². It is not monotone over the full range of X, but it is
monotone for x < 0 and for x ≥ 0 separately, with the two inverse “branches” g1(y) = −√y and g2(y) = +√y
for y > 0. Treating the two branches separately and adding the results gives

fY(y) = fX(g1(y)) · |g1′(y)| + fX(g2(y)) · |g2′(y)|. (1.34)
Why does this work? In essence, for each y > 0 it recognizes that y could have come from either of two
different x’s, so we “look back” to both, namely x = g1 (y) and x = g2 (y). Heuristically, the probability
appropriate to a small interval of width dy at y will be the sum of those found from the two separate branches
(Figure 1.19).
For our example, the range of y is y > 0, and we find

fX(g1(y)) = (1/√(2π)) e^{−(−√y)²/2} = (1/√(2π)) e^{−y/2},

fX(g2(y)) = (1/√(2π)) e^{−(√y)²/2} = (1/√(2π)) e^{−y/2},

g1′(y) = −1/(2√y)

and

g2′(y) = 1/(2√y),

so

|g1′(y)| = |g2′(y)| = 1/(2√y),

and

fY(y) = (1/√(2π)) e^{−y/2} · 1/(2√y) + (1/√(2π)) e^{−y/2} · 1/(2√y) for y > 0
      = (1/√(2πy)) e^{−y/2} for y > 0
      = 0 for y ≤ 0. (1.35)
We shall encounter this density later; it is called the Chi-square distribution with 1 degree of freedom, a name
that will seem a bit less mysterious later.
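A rough simulation check of (1.35): square a large sample of standard Normal draws and compare the relative frequency of a small, arbitrarily chosen interval with the area predicted by the Chi-square (1 d.f.) density:

    import math, random

    random.seed(3)
    ys = [random.gauss(0.0, 1.0) ** 2 for _ in range(200_000)]    # Y = X^2
    f = lambda y: math.exp(-y / 2) / math.sqrt(2 * math.pi * y)   # (1.35), y > 0
    a, b = 0.5, 0.6
    print(sum(a < y <= b for y in ys) / len(ys))   # relative frequency of (a, b]
    print(f(0.55) * (b - a))                       # ≈ area under fY over (a, b]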
Example 1.N. Linear change of scale. A common and mathematically simple example of a transformation
of a random variable is a linear change of scale. The random variable X may be measured in inches; what is
the distribution of Y = 2.54X, the same quantity measured in centimeters? Or if X is measured in degrees
Fahrenheit, Y = (X − 32◦ )/1.8 is measured in degrees Celsius. The general situation has
Y = aX + b, (1.36)
where a and b are constants. For any a ≠ 0, h(x) = ax + b is a monotone transformation, with inverse
g(y) = (y − b)/a, g′(y) = 1/a, and

|g′(y)| = 1/|a|.

We then have, for any continuous random variable X,

fY(y) = fX((y − b)/a) · 1/|a|. (1.37)
Example 1.M (continued): The Normal (µ, σ²) Distribution. A special case of Example 1.N will be of great
use later, namely where X has a standard normal distribution, and Y is related to X by a linear change of
scale
Y = σX + µ, σ > 0. (1.39)
Then Y has what we will call the Normal (µ, σ²) distribution with density
fY(y) = φ((y − µ)/σ) · (1/σ) (1.40)
      = (1/(√(2π) σ)) e^{−(y−µ)²/(2σ²)}, for −∞ < y < ∞.
[Figure 1.20]
This might be called “general” Normal distribution, as a contrast to the “standard” Normal distribution.
Actually, it is of course a parametric family of densities, with parameters µ and σ. When we encounter this
family of distributions next we shall justify referring to µ as the mean and σ as the standard deviation of
the distribution.
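The density (1.40) is just the change-of-scale rule (1.37) applied to φ; a minimal sketch confirming that the two forms in (1.40) agree at an arbitrarily chosen point:

    import math

    phi = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)   # (1.33)

    def normal_pdf(y, mu, sigma):
        return phi((y - mu) / sigma) / sigma       # (1.40), first form

    y, mu, sigma = 1.3, 1.0, 2.0
    closed = (math.exp(-(y - mu)**2 / (2 * sigma**2))
              / (math.sqrt(2 * math.pi) * sigma))  # (1.40), second form
    print(normal_pdf(y, mu, sigma), closed)        # equal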