Probability Theory
MTH 2102
Lecture Notes
BY
Daniel Nkwata Katongole
2025
Table of Contents
Table of Contents 2
1 Probability 4
1.1 Introduction to Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1.1 A Statistical ( random) experiment . . . . . . . . . . . . . . . . . . . 4
1.1.2 Features or characteristics of a statistical experiment . . . . . . . . 4
1.1.3 Example of statistical experiments . . . . . . . . . . . . . . . . . . . 5
1.1.4 Sample or Event space. . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.1.5 Examples of random experiments and their sample spaces . . . . . 5
1.1.6 Types of sample spaces . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.1.7 An Event . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.1.8 Operation of events . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2 Mutually Exclusive Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.3 Probability of an event . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.3.1 The case of countably finite sample space . . . . . . . . . . . . . . . 12
1.3.2 The classical definition . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.3.3 The Relative frequency definition . . . . . . . . . . . . . . . . . . . . 14
1.3.4 The subjective definition . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.3.5 The Kolmogorov definition/Axiomatic definition . . . . . . . . . . . 15
1.4 Dependent (Conditional Probability) events. . . . . . . . . . . . . . . . . . 19
1.5 Independent events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.6 Total probability and Bayes’ Theorem . . . . . . . . . . . . . . . . . . . . . 24
2 RANDOM VARIABLES 28
2.1 Types of quantitative Random variables . . . . . . . . . . . . . . . . . . . . 29
2.2 The Probability Distributions/ Probability Functions . . . . . . . . . . . . . 29
2.3 Discrete Probability Distributions/Functions . . . . . . . . . . . . . . . . . . 29
2.4 The cumulative mass function/Distribution function . . . . . . . . . . . . . 31
2.5 Properties of a discrete random variable . . . . . . . . . . . . . . . . . . . . 33
2.5.1 Expectation of a random variable E(X) . . . . . . . . . . . . . . . . . 33
2.6 Expectation of a function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.7 Variance of a discrete random variable Var(x) . . . . . . . . . . . . . . . . . 35
2.8 Mode of a discrete random Variable . . . . . . . . . . . . . . . . . . . . . . 36
2.9 Median of a discrete random Variable . . . . . . . . . . . . . . . . . . . . . 36
3.3 Mean and Variance of a Binomial random variable . . . . . . . . . . . . . . 40
3.3.1 The Expected value of a binomial random variable X . . . . . . . . 40
3.3.2 Variance, Var(X) of a Binomial random variable X . . . . . . . . . . 41
3.4 The Negative Binomial distribution (The waiting time or Pascal distribution) 46
3.4.1 The mean and variance of X . . . . . . . . . . . . . . . . . . . . . . . 47
3.4.2 Remark . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.4.3 Mean and variance of X ∽ b(x; 1, P) . . . . . . . . . . . . . . . . . . . 49
3.5 The mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.6 Hyper geometric distribution . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.6.1 Mean and Variance . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.7 The Poisson distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.7.1 Properties / postulates of a Poisson process . . . . . . . . . . . . . . 53
3.7.2 Example of Poisson processes; . . . . . . . . . . . . . . . . . . . . . 54
3.7.3 The Mean and Variance of a Poisson random variable . . . . . . . . 54
3.7.4 Verifying that P(x; λ)is a pmf . . . . . . . . . . . . . . . . . . . . . . 56
3.7.5 Mode (most likely value) of the Poisson distribution . . . . . . . . 57
3.7.6 Computing probabilities . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.7.7 Poisson approximation of the Binomial distribution . . . . . . . . . 59
Probability
1.1 Introduction to Probability
1.1.1 A Statistical ( random) experiment
A statistical experiment is a chance/random/nondeterministic/stochastic process whose particular outcome cannot be predicted with certainty, even though all its possible outcomes may be known. Only after the experiment has been carried out is its result known; thus we cannot know the particular outcome of an experiment/process until it is performed.
1.1.2 Features or characteristics of a statistical experiment
ii When the experiment is performed repeatedly, the individual outcomes seem to occur in a haphazard manner. However, on repeating the experiment a large number of times, a definite pattern or regularity of outcomes appears.
iii Although the particular outcome to occur cannot be stated with certainty, the set of
all possible outcomes can be described (Sample / event space).
• Each possible outcome can be predicted or specified in advance but the particular
one to occur may not be known with certainty.
1.1.3 Example of statistical experiments
1 A coin toss;
Each individual element of an event space resulting from a random experiment is called
a sample outcome or a point but generally an outcome. A sample point appears at most
once in a set. When we repeat a random experiment several times, we call each particular
performance a trial.
• If the sample points can be listed but the number of sample points is infinite (endless
list of outcomes), then the sample space is said to be a countably infinite sample
space. Example 5 above.
Countably finite or countably infinite sample spaces are generally called discrete (discontinuous) sample spaces, while uncountably infinite sample spaces are said to be continuous sample spaces.
1.1.7 An Event
An event is the occurrence of one or a combination of two or more sample points of a sample space. An event is part (a subset) of the sample space, hence an event is also a set. If it contains only one sample point it is said to be a simple or indecomposable event. If the event contains two or more sample points, it is called a compound or decomposable event because it can be decomposed into simple events.
Example
Consider rolling a die once. The sample space is given by:
S = {1, 2, 3, 4, 5, 6} (1.1)
The set
E = {obtaining a 4}, =⇒ E = {4} (1.2)
is a simple event, whereas an event containing two or more sample points, such as E = {obtaining an even number} = {2, 4, 6}, is a compound event.
2 Union of events:
• ∈: is a member of
• ∋: contains as a member,
• ⊂: is a subset of
• ⊆: is a subset of or equal to
• ⊇: is a super-set of or equal to
• {}: These are curly brackets used for a set, ( for binding an operation)
Remarks
The operations on events follow the algebra of sets. The following are some of the rules of events. Let S be the universal set (the sample space).
1 Identity laws
• A∪∅=A
• A∪S=S
• A∩∅=∅
• A∩S=A
2 Commutative laws
• A∪B=B∪A
• A∩B=B∩A
3 Associative laws
• (A ∪ B) ∪ C = A ∪ (B ∪ C)
• (A ∩ B) ∩ C = A ∩ (B ∩ C)
4 Distributive laws
• A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
• A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
5 Complementary laws
• (Ac )c = A
• Sc = ∅
• ∅c = S
• (A ∪ Ac ) = S
• A ∩ Ac =∅
6 De-Morgan’s laws
• (A ∩ B)c = Ac ∪ Bc
• (A ∪ B)c = Ac ∩ Bc
• (A1 ∩ A2 ∩ A3 ∩ ... ∩ An)c = A1c ∪ A2c ∪ A3c ∪ ... ∪ Anc = ∪_{i=1}^{n} Aic
7 Idempotent law
• A∩A=A
• A∪A=A
A∩B=∅ (1.5)
or
P(A ∩ B) = 0 (1.6)
such that
P(A ∪ B) = P(A) + P(B) (1.7)
If this is so, then A and B are also said to be disjoint events: A and B have no common elements. Generally, for n disjoint events Ai; i = 1, 2, ..., n,
P(∪_{i=1}^{n} Ai) = P(A1) + P(A2) + ... + P(An) = Σ_{i=1}^{n} P(Ai) (1.8)
(A ∪ B) = S (1.9)
or
P(A ∪ B) = P(S) = 1 (1.10)
then A and B are said to be mutually exhaustive events. Note that mutual exclusiveness does not imply mutual exhaustiveness: two disjoint events need not cover S (for example, {1} and {2} when a die is rolled satisfy A ∩ B = ∅ but A ∪ B ≠ S). Likewise, mutual exhaustiveness does not necessarily imply mutual exclusiveness: if A ∪ B = S or P(A ∪ B) = P(S) = 1, it is not necessarily true that A ∩ B = ∅ or P(A ∩ B) = 0.
Example
If A and B are mutually exclusive and mutually exhaustive events, test whether the following pairs of events are also mutually exclusive;
Solution
From the contingency table
B Bc
A A∩B A ∩ Bc
Ac Ac ∩ B Ac ∩ Bc
(A ∩ B) = ∅ (1.13)
and
(A ∪ B) = S (1.14)
or
P(A ∩ B) = 0 (1.15)
and
P(A ∪ B) = 1 (1.16)
Now
B = (A ∩ B) ∪ (Ac ∩ B) (1.17)
or
Ac = (Ac ∩ B) ∪ (Ac ∩ Bc ) (1.18)
P(B) = P(A ∩ B) + P(Ac ∩ B) (1.19)
P(Ac ∩ B) = P(B) − P(A ∩ B) (1.20)
P(Ac ∩ B) = P(B) − 0 (1.21)
P(Ac ∩ B) = P(B) ≠ 0 (1.22)
Hence Ac and B are not mutually exclusive although events A and B are mutually
exhaustive
(b) P(A ∩ Bc )
Solution
From the contingency table
B Bc
A A∩B A ∩ Bc
Ac Ac ∩ B Ac ∩ Bc
Bc = (A ∩ Bc) ∪ (Ac ∩ Bc) (1.31)
or
Ac = (Ac ∩ B) ∪ (Ac ∩ Bc) (1.32)
P(Ac) = P((Ac ∩ B) ∪ (Ac ∩ Bc)) (1.33)
P(Bc) = P(A ∩ Bc) + P(Ac ∩ Bc) (1.34)
P(Ac ∩ Bc) = P(Bc) − P(A ∩ Bc) (1.35)
P(Ac ∩ Bc) = P(Bc) − P(A only) (1.36)
P(Ac ∩ Bc) = P((A ∪ B)c) (1.37)
P(Ac ∩ Bc) = 1 − P(A ∪ B) (1.38)
P(Ac ∩ Bc) = 1 − [P(A) + P(B) − P(A ∩ B)] (1.39)
P(Ac ∩ Bc) = 1 − [P(A) + P(B)] = 1 − 1 = 0 (1.40)
Hence Ac and Bc are mutually exclusive.
P(A) is a value/number between 0 and 1 inclusive, i.e. 0 ≤ P(A) ≤ 1, that shows how likely an event A is to occur. If P(A) = 0, the event is a null (impossible) event; if P(A) is close to 0, it is very unlikely that the event A occurs; if P(A) is close to 1, A is very likely to occur; and if P(A) = 1, then the event is certain to occur. Below are some of the alternative definitions/interpretations of the probability of an event, P(A);
The weights wi can be established using the conditions 0 ≤ wi ≤ 1 and Σ_S wi = 1.
Example 1
Suppose that a fair dice is rolled once, find the probability that the outcome is an even
number.
Solution
w1 + w2 + w3 + w4 + w5 + w6 = 1
Since the die is fair, all the outcomes are equally likely, and therefore
w1 = w2 = w3 = w4 = w5 = w6 = w, so 6w = 1 and w = 1/6.
Let A be the event that an even number appears, A = {2, 4, 6}.
Hence,
P(A) = Σ_{i∈A} wi = w + w + w = 3w = 3 × (1/6) = 0.5
Example 2
A die is loaded such that an odd number is three times as likely to appear on top as an even number. What is the probability that, on a toss of the die, a number less than 3 appears?
Solution
Finite sample space
{1, 2, 3, 4, 5, 6}
Let w be the weight of an even number; then the weight of an odd number is 3w, such that 0 ≤ wi ≤ 1 and Σ_{i=1}^{6} wi = 1. This implies that
3w + w + 3w + w + 3w + w = 1
12w = 1
or
w = 1/12
Let A be the event that a number less than 3 appears, A = {1, 2}. Hence,
P(A) = Σ_{i∈A} wi = 3w + w = 4w
or
P(A) = 1/3
Or, if the weight of an odd number is w, the weight of an even number is (1/3)w, such that 0 ≤ wi ≤ 1 and Σ_{i=1}^{6} wi = 1. This implies that
w + (1/3)w + w + (1/3)w + w + (1/3)w = 4w = 1
w = 1/4
Let A be the event that a number less than 3 appears, A = {1, 2}. Hence,
P(A) = Σ_{i∈A} wi = w + (1/3)w = (4/3)w = (4/3) × (1/4) = 1/3
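As a quick numerical check of this example, the weights can be enumerated directly (a minimal Python sketch using the weights derived above; nothing here is part of the original derivation):

from fractions import Fraction

# Weights: each odd number is three times as likely as each even number.
w = Fraction(1, 12)                      # weight of an even number
weights = {1: 3*w, 2: w, 3: 3*w, 4: w, 5: 3*w, 6: w}

assert sum(weights.values()) == 1        # the weights form a valid pmf

# Event A: a number less than 3 appears.
P_A = sum(p for x, p in weights.items() if x < 3)
print(P_A)                               # 1/3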
Disadvantages of this definition/interpretation
P(A) = n(A)/n(S) = n/N, N ≠ 0 (1.42)
Proof
Let S = {a1, a2, ..., aN}. Since the outcomes are equally likely and S is finite, we assign the same weight w to each ai such that 0 ≤ w ≤ 1 and Σ_S w = 1, i.e.
Nw = 1 =⇒ w = 1/N (1.43)
If n of the N outcomes correspond to event A, then A = {a1, a2, ..., an}, so that
P(A) = Σ_A w = w + w + ... + w = nw = n × (1/N) = n/N (1.44)
Remark
The word fair is used to convey the fact that the possible outcomes are equally likely. This
is the most commonly used definition of probability of an event. It is a particular case of
the first definition when the outcomes are equally likely.
Limitations
This definition fails if;
Example
If records show that 516 out of the 600 aeroplanes from Entebbe to Nairobi arrive on time, what is the probability that any one aeroplane from Entebbe to Nairobi will not arrive on time?
Solution
Let L be the event that an aeroplane does not arrive on time; the observed number of late arrivals is 600 − 516 = 84, so
P(L) = lim_{N→∞} n/N ≈ 84/600 = 0.14
Limitations
• Repeating an experiment exactly under the same conditions is impossible even with
the same equipment.
• We are not told how large the number of trials N should be in order to be considered "sufficiently large".
• Superstitions
• Lotteries
1. Axiom 1:
For any event A, P(A) ≥ 0 and P(A) ≤ 1, i.e. 0 ≤ P(A) ≤ 1. This axiom indicates that P(A) cannot be negative. The smallest value for P(A) is zero, and if P(A) = 0 then the event A will never happen/occur.
2. Axiom 2:
Probability of the sample space S is P(S) = 1. This axiom states that the probability of
the whole sample space is absolutely 1 (100%). S is a sure event to occur. The reason
for this is that the sample space S contains all possible outcomes of the random
experiment. Thus, the outcome of each trial always belongs to S, i.e., the event S
always occurs and P(S) = 1. At any trial, any outcome must be part of the sample
space.
3. Axiom 3:
If A1 , A2 , ......An is a sequence of n disjoint or mutually exclusive events in the sample
space S, then the probability of occurrence of at least one of them i.e their union is
given by
P(A1 ∪ A2 ∪ ... ∪ An) = P(∪_{i=1}^{n} Ai) = P(A1) + P(A2) + ... + P(An) = Σ_{i=1}^{n} P(Ai) (1.46)
This axiom is probably the most interesting one. The basic idea is that if some events are disjoint (i.e., there is no overlap between them), then the probability of their union must be the sum of their probabilities. Another way to think about this is to imagine the probability of a set as the area of that set. If several sets are disjoint, then the total area of their union is the sum of the individual areas.
Kolmogorov’s three axioms have been used to prove various theorems in Probability
theory such as;
Figure 1.1
P(A ∪ Ac ) = P(S) = 1
3. If A ⊆ S,then P(A) ≤ 1.
From
P(A) ≤ P(S) = 1
then
P(A) ≤ 1 (1.55)
Figure 1.2
5. P(A ∪ B) = P(A) + P(B) − P(A ∩ B), where A and B are any two (not necessarily mutually exclusive) events in a sample space S. This is called the additive rule.
Figure 1.3
A ∪ B = A ∪ (B only) (1.60)
A ∪ B = (A only) ∪ B (1.61)
Using Eq. 1.59,
P(A ∪ B) = P((A only) ∪ (A ∩ B) ∪ (B only)) (1.62)
P(A ∪ B) = P(A only) + P(A ∩ B) + P(B only) − 0 (1.63)
since (A only), (A ∩ B) and (B only) are mutually exclusive. But
P(A only) = P(A) − P(A ∩ B) (1.64)
and
P(B only) = P(B) − P(A ∩ B) (1.65)
So
P(A ∪ B) = P(A) + P(B) − P(A ∩ B) (1.66)
Or
P(A ∪ B) − P(B) = P(A) − P(A ∩ B) (1.67)
Or
P(A − B) = P(A only) = P(A) − P(A ∩ B) (1.68)
And
P(B only) = P(B) − P(A ∩ B) (1.69)
Note
In all the definitions above, the probability of an event, P(A), is a number.
P(A/B) = P(A ∩ B)/P(B) (1.71)
This implies that
P(A ∩ B) = P(A/B) × P(B) (1.72)
Similarly, P(B/A), the conditional probability of event B given that event A has occurred, is defined by
P(B/A) = P(A ∩ B)/P(A) (1.73)
This implies that
P(B ∩ A) = P(B/A) × P(A) (1.74)
Since (A ∩ B) = (B ∩ A) by the commutative law, then;
The equations Eq. 1.72 and Eq. 1.74 are called the multiplicative rules of probability. Generally, if A1, A2, ..., An are events in some sample space S such that P(Ai) ≠ 0, P(Ai ∩ Aj) ≠ 0 and P(∩_{i=1}^{n} Ai) ≠ 0, then the multiplicative rule becomes
P(∩_{i=1}^{n} Ai) = P(A1) × P(A2/A1) × P(A3/A1 ∩ A2) × ... × P(An/A1 ∩ A2 ∩ A3 ∩ ... ∩ An−1) (1.76)
For n = 3,
P(∩_{i=1}^{3} Ai) = P(A1) × P(A2/A1) × P(A3/A1 ∩ A2) (1.77)
Example
1 The probability that a plane departs on time is 0.83, the probability that it arrives on time is 0.92, and the probability that it both departs and arrives on time is 0.78. Find the probability that the plane (i) arrives on time given that it departed on time, and (ii) departed on time given that it arrived on time.
Solution
Let D=Plane departs on time P(D) = 0.83, If A =Plane arrives on time P(A) = 0.92,
P(D ∩ A) = 0.78 = P(A ∩ D)
i)
P(A/D) = P(A ∩ D)/P(D) = 0.78/0.83 = 0.94
ii)
P(D/A) = P(D ∩ A)/P(A) = 0.78/0.92 = 0.85
2 An urn contains 20 mangoes of which 5 are bad. If 2 mangoes are selected at random
from the urn in succession without replacement of the first mango, find the probability
that both mangoes are bad.
Solution
Let M1 = the event that the first mango picked is bad and M2 = the event that the second mango picked is bad. Then
P(M1 ∩ M2) = P(M1) × P(M2/M1) = (5/20) × (4/19) = 1/19
And
P(B ∩ A) = P(B) × P(A/B) (1.81)
More generally, from Eq. 1.80,
∩ B Bc
A A∩B A ∩ Bc
Ac Ac ∩ B Ac ∩ Bc
1. (Ac ∩ B)
P(Ac ∩ B) = P(B) − P(A ∩ B) (1.83)
P(Ac ∩ B) = P(B) − P(A) × P(B) = P(B)(1 − P(A)) (1.84)
P(Ac ∩ B) = P(B) × P(Ac) (1.85)
2. (A ∩ Bc)
P(A) = P(A ∩ B) + P(A ∩ Bc) (1.86)
P(A ∩ Bc) = P(A) − P(A) × P(B) (1.87)
P(A ∩ Bc) = P(A)[1 − P(B)] = P(A) × P(Bc) (1.88)
3. (Ac ∩ Bc)
P(Ac) = P(Ac ∩ B) + P(Ac ∩ Bc) (1.89)
P(Ac ∩ Bc) = P(Ac) − P(Ac ∩ B) (1.90)
P(Ac ∩ Bc) = P(Ac) − P(Ac) × P(B) = P(Ac)[1 − P(B)] = P(Ac) × P(Bc) (1.91)
Examples
a) Without replacement?
b) With replacement?
a) Without replacement (dependent case):
P(D ∩ D ∩ D) = (5/20) × (4/19) × (3/18) = 1/114
b) With replacement (independent case):
P(D ∩ D ∩ D) = P(D) × P(D) × P(D) = (5/20) × (5/20) × (5/20) = 1/64
3 A box contains 6 black ball and 4 white balls. Two balls are selected from the box
without replacement. Find the probability that;
a) Both balls are black
b) Both balls are of the same colors
c) Both balls are of different colors
Solution
Given n(B) = 6, n(W) = 4 and n(T) = 10, therefore P(B) = 6/10 and P(W) = 4/10. Without replacement implies the dependent case.
a) Both balls black
P(B ∩ B) = (6/10) × (5/9) = 1/3
b) Balls of the same color
P((B ∩ B) ∪ (W ∩ W)) = P(B ∩ B) + P(W ∩ W) = (6/10) × (5/9) + (4/10) × (3/9) = 7/15
c) Different colors
P((B ∩ W) ∪ (W ∩ B)) = P(B ∩ W) + P(W ∩ B) = 1 − P(same color) = 1 − 7/15 = 8/15
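The three answers above can be verified by enumerating the ordered draws (a small illustrative sketch in Python with exact fractions; the box contents are those of this example):

from fractions import Fraction
from itertools import permutations

# 6 black (B) and 4 white (W) balls; draw two without replacement.
box = ['B'] * 6 + ['W'] * 4
draws = list(permutations(range(10), 2))          # ordered pairs of distinct balls
prob = Fraction(1, len(draws))                    # each ordered pair is equally likely

def P(event):
    return sum(prob for d in draws if event([box[i] for i in d]))

print(P(lambda c: c == ['B', 'B']))               # 1/3   (both black)
print(P(lambda c: c[0] == c[1]))                  # 7/15  (same color)
print(P(lambda c: c[0] != c[1]))                  # 8/15  (different colors)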
4 A bag contains 4 white buttons and 3 black ones and a second bag contains 3 white
and 5 black buttons. One button is picked at random from the second bag and placed
unseen into the first bag. What is the probability that a button drawn from the first bag
is white?
5 Two baskets A and B contain 8 white and 4 blue balls, and 5 white and 10 blue balls, respectively. A ball is drawn at random from A and transferred into B. If a ball is then drawn at random from B, find the probability that it will be white.
7 A coin is tossed three times. Find the probability of getting 2 tails and 1 head if the coin
is;
(a) Fair
(b) Such that a head is twice as likely to occur as the tail.
Solution
The sample space is S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}.
a) If the coin is fair, then all the outcomes in the sample space are equally likely, each with probability P = 1/8. The event with 2 tails and 1 head is A = {HTT, THT, TTH}. Then
P(A) = 1/8 + 1/8 + 1/8 = 3/8
b) Here, the outcomes occur independently but are not equally likely. Let w be the weight (probability) of a tail; then the weight of a head is 2w, such that 0 ≤ w ≤ 1 and Σ_S W = 1.
For HHH,
P(HHH) = P(H) × P(H) × P(H) = 2w × 2w × 2w = 8w^3
For HHT,
P(HHT) = P(H) × P(H) × P(T) = 2w × 2w × w = 4w^3
For HTH,
P(HTH) = P(H) × P(T) × P(H) = 2w × w × 2w = 4w^3
For THH,
P(THH) = P(T) × P(H) × P(H) = w × 2w × 2w = 4w^3
For HTT,
P(HTT) = P(H) × P(T) × P(T) = 2w × w × w = 2w^3
For THT,
P(THT) = P(T) × P(H) × P(T) = w × 2w × w = 2w^3
For TTH,
P(TTH) = P(T) × P(T) × P(H) = w × w × 2w = 2w^3
For TTT,
P(TTT) = P(T) × P(T) × P(T) = w × w × w = w^3
But
Σ_S W = 8w^3 + 3(4w^3) + 3(2w^3) + w^3 = 27w^3 = 1
so w^3 = 1/27 and w = 1/3. Therefore
P(A) = P(HTT) + P(THT) + P(TTH) = (2/3) × (1/3) × (1/3) + (1/3) × (2/3) × (1/3) + (1/3) × (1/3) × (2/3) = 2/9
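A short enumeration confirms the value 2/9 (a sketch assuming P(H) = 2/3 and P(T) = 1/3 per toss, as derived above):

from fractions import Fraction
from itertools import product

p = {'H': Fraction(2, 3), 'T': Fraction(1, 3)}   # per-toss probabilities

total = Fraction(0)
for outcome in product('HT', repeat=3):          # the 8 outcomes HHH, HHT, ...
    prob = p[outcome[0]] * p[outcome[1]] * p[outcome[2]]
    if outcome.count('T') == 2:                  # event A: 2 tails and 1 head
        total += prob
print(total)                                     # 2/9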
Figure 1.4
Let A1, A2, ..., An be mutually exclusive and exhaustive events in S (a partition of S) with P(Ai) ≠ 0, and let B be any event in S such that P(B) ≠ 0.
B can be expressed as a union of the n mutually exclusive events (Ai ∩ B). Thus,
P(B) = Σ_{i=1}^{n} P(Ai ∩ B) = Σ_{i=1}^{n} P(Ai) × P(B/Ai)
This is called the law of total probability. For some event Ai, then
P(Ai/B) = P(Ai ∩ B)/P(B) = [P(Ai) × P(B/Ai)] / [Σ_{j=1}^{n} P(Aj) × P(B/Aj)]
This is called the famous Reverend Thomas Bayes' Theorem.
Example
Members of some consultancy firm rent cars from three rental agencies; 60% from agency
A1 , 30% from agency A2 , and 10% from agency A3 . If 9% of the cars from A1 need a
tune-up, 20% of the cars from A2 need a tune-up and 6% from A3 need a tune-up.
(a) What is the probability that a rental car delivered to the firm will need a tune-up?
(b) If a rental car delivered to the firm needed a tune-up, what is the probability that it came from agency A2?
Solution
Let T represent an event that a car needs a tune-up service, then; Given P(A1 ) = 0.6,
P(A2 ) = 0.3,P(A3 ) = 0.1, P(T/A1 ) = 0.09,P(T/A2 ) = 0.2 and P(T/A3 ) = 0.06
(a)
P(T) = P(A1 ∩ T) + P(A2 ∩ T) + P(A3 ∩ T)
P(T) = P(A1) × P(T/A1) + P(A2) × P(T/A2) + P(A3) × P(T/A3)
P(T) = 0.6 × 0.09 + 0.3 × 0.2 + 0.1 × 0.06 = 0.054 + 0.06 + 0.006 = 0.12
(b)
P(A2/T) = P(A2 ∩ T)/P(T), where P(A2 ∩ T) = P(A2) × P(T/A2)
P(A2/T) = [P(A2) × P(T/A2)] / P(T) = (0.3 × 0.2)/0.12 = 0.5
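Both computations can be reproduced directly from the law of total probability and Bayes' Theorem (a minimal Python sketch using the figures from this example):

priors = {'A1': 0.6, 'A2': 0.3, 'A3': 0.1}         # P(Ai): which agency the car came from
likelihood = {'A1': 0.09, 'A2': 0.20, 'A3': 0.06}  # P(T/Ai): probability it needs a tune-up

# (a) Law of total probability: P(T) = sum of P(Ai) * P(T/Ai)
P_T = sum(priors[a] * likelihood[a] for a in priors)
print(P_T)                                         # approximately 0.12

# (b) Bayes' Theorem: P(A2/T) = P(A2) * P(T/A2) / P(T)
print(priors['A2'] * likelihood['A2'] / P_T)       # approximately 0.5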
1 In a production process, there are two different machines M1 and M2 . 20% and 80%
of the items are produced by M1 and M2 respectively. It has been established that 50%
of the items produced by M1 and 8% of the items produced by M2 are defective. If an
item is selected at random, find;
2 Two boxes one with 8 black and 2 white balls and the other with 3 black and 7 white
balls are placed on a table. One box is chosen at random and a ball is picked from it.
3 Three professors have been nominated for the post of Dean. Their respective probabilities of being elected are 0.3, 0.5 and 0.4. If professor A is elected, the probability that the faculty will have new computers is 0.8; if B is elected, the probability is 0.1; and if C is elected, it is 0.3.
Find the probabilities that the professors A,B and C are elected if the faculty gets new
computers.
RANDOM VARIABLES
A random variable:
A variable is a characteristic being studied that can assume a prescribed set of values. It
can be Qualitative (Nominal & ordinal) or quantitative (Discrete & Continuous).
Example
Consider tossing a fair coin once, twice and three times. And in every case, count the
“number of heads” X obtained;
Remarks
• In the above experiments, we were not interested in the individual outcomes but
instead some variable X (the number of heads) whose values were determined by
all the outcomes of the random experiment
• The variable X assumes different real values x for the respective random trials
• The prescribed set of values assumed by the variable X forms the domain of the variable.
Therefore, a variable whose value is real and determined by all the sample outcomes of the experiment is called a random variable (chance variable/stochastic variable). In these particular experiments, a random variable X is a function that assigns a real value/number x to each possible outcome in the sample space.
2.1 Types of quantitative Random variables
1 A discrete random variable X is one that assumes a countable number of values. It is defined over a countably finite or countably infinite sample space, and so assumes a finite or countably infinite number of values. Examples;
• Number of heads in the above example
• Number of colored bulbs in a house
• Number of chairs in a library
2 A continuous random variable is defined over an uncountably infinite sample space. It assumes values on the whole or part of an interval of the real line; in other words, it assumes values associated with the points on the number line. Examples include weights, heights, racing times, etc.
X = x      0     1     2
P(X = x)   1/4   2/4   1/4
This is a probability distribution, and since P(X = x) ≥ 0 for all x = 0, 1, 2 and Σ_{all x} P(X = x) = 1, this distribution is a pmf.
X = x      0     1     2
P(X = x)   1/4   2/4   1/4
X = x      1     2     3     4     5     6
P(X = x)   1/6   1/6   1/6   1/6   1/6   1/6
+ 1 2 3 4 5 6
1 2 3 4 5 6 7
2 3 4 5 6 7 8
3 4 5 6 7 8 9
4 5 6 7 8 9 10
5 6 7 8 9 10 11
6 7 8 9 10 11 12
X = x      2     3     4     5     6     7     8     9     10    11    12
P(X = x)   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
Exercise
e) Given
P(X = x) = { x/25, x = 1, 2, 3, 4, 5;  0, otherwise }   (2.3)
i Is P(X = x) a pmf?
ii Find P(1 ≤ x ≤ 3)
iii P(x ≥ 2)
iv Find P(2 ≤ x ≤ 5)
f) Given a pmf
P(X = x) = { cx, x = 1, 2, 3, 4;  0, otherwise }   (2.4)
F(x) = P(X ≤ x) = P(−∞ < X ≤ x) = Σ_{s ≤ x} P(X = s) (2.5)
F(x) specifies the probability that an observed value of the random variable X will not exceed x. Such a function is called a cmf or simply a distribution function. F(x) has the following properties;
ii F(x) ≥ 0; it is non-negative
iii F(∞) = lim_{x→∞} F(x) = 1
iv If a and b are constants such that a ≤ b, then F(a) = P(X ≤ a) ≤ P(X ≤ b) = F(b). This implies that F(a) ≤ F(b).
Examples
(a) Given P(X = x) = (5 − x)/10 for x = 1, 2, 3, 4 (and 0 elsewhere), find F(x).
Solution
For all values of x < 1,
F(x) = 0
For x = 1,
F(1) = P(X ≤ 1) = (5 − 1)/10 = 4/10
For x = 2,
F(2) = P(X ≤ 2) = P(X = 1) + P(X = 2) = 4/10 + 3/10 = 7/10
For x = 3,
F(3) = P(X ≤ 3) = P(X = 1) + P(X = 2) + P(X = 3) = 4/10 + 3/10 + 2/10 = 9/10
For x = 4,
F(4) = P(X ≤ 4) = P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4) = 4/10 + 3/10 + 2/10 + 1/10 = 1
Hence
F(x) = { 0, x < 1;  4/10, 1 ≤ x < 2;  7/10, 2 ≤ x < 3;  9/10, 3 ≤ x < 4;  1, x ≥ 4 }
If graphed, F(x) gives a step graph
(b) Given
P(X = x) = { x/10, x = 1, 2, 3, 4;  0, elsewhere }   (2.7)
Find F(x).
For all values of x < 1,
F(x) = 0
For x = 1,
F(1) = P(X ≤ 1) = 1/10
Example
Given
P(X = x) = { x/25, x = 1, 2, 3, 4, 5;  0, elsewhere }   (2.9)
Find E(X).
Solution
From the definition of expectation,
E(X) = Σ_{all x} x P(X = x) = Σ_{x=1}^{5} x P(X = x) = 1 × P(X = 1) + 2 × P(X = 2) + 3 × P(X = 3) + 4 × P(X = 4) + 5 × P(X = 5)
Simplifying,
E(X) = 1 × (1/25) + 2 × (2/25) + 3 × (3/25) + 4 × (4/25) + 5 × (5/25) = 55/25 = 2.2
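The same expectation can be computed directly from the pmf (a small illustrative sketch with exact fractions):

from fractions import Fraction

pmf = {x: Fraction(x, 25) for x in range(1, 6)}  # P(X = x) = x/25, x = 1,...,5
assert sum(pmf.values()) == 1

E_X = sum(x * p for x, p in pmf.items())
print(E_X)                                       # 11/5, i.e. 55/25 = 2.2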
Assume
g(x) = ax + c (2.11)
Then
• E(ax + c) = aE(x) + c
Example
(a) Given
X = x      0     1     2
P(X = x)   1/4   2/4   1/4
Find E(2x + 5).
Solution
E(x) = Σ_{all x} x P(x) = 1
E(2x + 5) = 2E(x) + 5 = 2 × 1 + 5 = 7
(b) For the pmf P(X = x) = x/25, x = 1, 2, 3, 4, 5 above, E(x) = 55/25, so
E(5x + 7) = 5E(x) + 7 = 5 × (55/25) + 7 = 18
And
σ_X = √Var(X) (2.16)
is the standard deviation of X (NOT the standard error). If g(x) is a function of a discrete random variable X, then
Var(g(x)) = E[g(x) − E(g(x))]^2 (2.17)
In particular, if g(x) = ax + c, then;
• Var(c) = 0 (when a = 0, the variance of a constant is zero)
• Var(ax) = a^2 Var(x)
• Var(ax + c) = a^2 Var(x)
Proofs exist
Examples
(a) Given
X = x      0     1     2
P(X = x)   1/4   2/4   1/4
Find Var(X).
Solution
E(x) = Σ_{all x} x P(x) = 1
Var(X) = Σ_{all x} x^2 P(X = x) − [Σ_{all x} x P(X = x)]^2
Var(X) = 0^2 × (1/4) + 1^2 × (2/4) + 2^2 × (1/4) − 1^2 = 1/2
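The arithmetic is easy to check with a few lines (sketch, using the table above):

from fractions import Fraction

pmf = {0: Fraction(1, 4), 1: Fraction(2, 4), 2: Fraction(1, 4)}

E_X  = sum(x * p for x, p in pmf.items())          # E(X)   = 1
E_X2 = sum(x * x * p for x, p in pmf.items())      # E(X^2) = 3/2
print(E_X2 - E_X**2)                               # Var(X) = 1/2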
(b) Given
P(X = x) = { x/25, x = 1, 2, 3, 4, 5;  0, elsewhere }   (2.18)
Find Var(X).
(c) Find Var(5x + 7) in (b) above.
Example
In
X = x      0     1     2
P(X = x)   1/4   2/4   1/4
• The probability of Success, P remains constant for each trial and 0 ≤ P ≤ 1 and the
probability of failure, q = (1 − P) also remains constant.
• The binomial distribution b(xi ; n, p) is a function of a discrete random variable X,
hence b(x; n, p) is a discrete probability distribution
• The term
nCx p^x q^(n−x) (3.3)
is the general term in the binomial expansion of (p + q)^n or (p + (1 − p))^n. Hence;
(p + q)^n = nC0 p^n q^0 + nC1 p^(n−1) q^1 + nC2 p^(n−2) q^2 + ... + nCn p^0 q^n = Σ_{x=0}^{n} nCx p^x q^(n−x) (3.4)
Or
(q + P)^n = nC0 q^n P^0 + nC1 q^(n−1) P^1 + nC2 q^(n−2) P^2 + ... + nCn q^0 P^n = Σ_{x=0}^{n} nCx q^x P^(n−x) (3.5)
Hence
Σ_{x=0}^{n} b(x; n, p) = b(0; n, p) + b(1; n, p) + b(2; n, p) + ... + b(n; n, p) (3.7)
• Since (P + q) = 1, then
Σ_{x=0}^{n} nCx p^x q^(n−x) = 1
and also, summing up all possible probability outcomes, Σ b(x; n, p) = 1. Each individual term satisfies b(xi; n, p) ≥ 0.
• The value of individual terms:b(0; n, p), b(1; n, p), b(2; n, p), ....b(n; n, p) can be got from
the binomial tables.
• And we can now talk of: P(X = r), P(X < r), P(X > r), P(X ≤ r), P(X ≥ r), P(a ≤ X ≤ b)
e.t.c. whose values can be got from the binomial probability tables.
• nCx = n!/((n − x)! x!) (3.8)
• If there is only one Bernoulli trial (n = 1), then the Binomial random variable X ∽ b(x; n, P) reduces to a Bernoulli random variable with parameter P, written X ∽ Ber(P).
Proof
E(X) = Σ_{x=0}^{n} x P(X = x) = Σ_{x=0}^{n} x nCx P^x q^(n−x) = 0 + Σ_{x=1}^{n} x nCx P^x q^(n−x) (3.14)
E(X) = Σ_{x=1}^{n} x nCx P^x q^(n−x) (3.15)
E(X) = Σ_{x=1}^{n} x × [n!/(x!(n − x)!)] P^x q^(n−x) (3.16)
Writing y = x − 1 and m = n − 1, this reduces to
E(X) = nP Σ_{y=0}^{m} mCy P^y q^(m−y) = nP × 1 = nP (3.20)
since the last sum is the total probability of a b(y; m, P) distribution.
Proof
Similarly, writing y = x − 2 and m = n − 2,
E(X(X − 1)) = n(n − 1)P^2 × Σ_{y=0}^{m} mCy P^y q^(m−y) = n(n − 1)P^2 (3.41)
Reading from probability tables, P(X = r) = Σ_{x=0}^{r} b(x; n, p) − Σ_{x=0}^{r−1} b(x; n, p)
(b) When fewer than r successes result out of n independent trials: P(X < r) = P(X ≤ r − 1) = Σ_{x=0}^{r−1} b(x; n, p) = 1 − P(X ≥ r)
(c) When at most r successes result out of n independent trials: using the cumulative Binomial distribution, P(X ≤ r) = Σ_{x=0}^{r} b(x; n, p) = 1 − P(X > r)
(d) When at least r successes result out of n independent trials: P(X ≥ r) = Σ_{x=r}^{n} b(x; n, p) = 1 − P(X < r)
(f) When between a1 and a2 successes, exclusive (a < x < b), result out of n independent trials:
P(a < x < b) = P(a + 1 ≤ x ≤ b − 1) = P(X ≤ b − 1) − P(X ≤ a) = Σ_{x=0}^{b−1} b(x; n, p) − Σ_{x=0}^{a} b(x; n, p)
Example 1
The probability that a patient recovers from a rare blood disease is 0.4. If 10 people are
known to have contracted the disease, find the probability that;
Solutions
(a) Exactly 5 survive? Given that P = 0.4, q = 1 − 0.4 = 0.6, n = 10, X = r = 5, compute
P(X = r) = b(x; n, p) = nCr p^r q^(n−r)
P(X = 5) = b(5; 10, 0.4) = 10C5 × 0.4^5 × 0.6^(10−5) = 0.2007
Or, using tables (summations),
P(X = r) = Σ_{x=0}^{r} b(x; n, p) − Σ_{x=0}^{r−1} b(x; n, p) = Σ_{x=0}^{5} b(x; 10, 0.4) − Σ_{x=0}^{4} b(x; 10, 0.4) = 0.8338 − 0.6331 = 0.2007
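Exact binomial probabilities such as these can also be evaluated without tables (a short Python sketch using the numbers of this example):

from math import comb

def binom_pmf(x, n, p):
    """b(x; n, p) = nCx * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 10, 0.4
print(binom_pmf(5, n, p))                          # about 0.2007 (exactly 5 survive)
print(sum(binom_pmf(x, n, p) for x in range(6)))   # P(X <= 5), about 0.8338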
Example 2 A multiple-choice quiz has 15 questions, each with four possible answers of which only one is correct. Determine the probability that sheer guessing yields;
Solution
Example 3
A newly married couple wishes to produce 5 children, what is the probability that;
Solution
Example 4
12 objective questions, of which each has 4 alternatives in which only one is correct, were
given to a candidate. Find the probability that the candidate gets at most 7 correct.
Solution
P(X ≤ 7) = Σ_{x=0}^{7} b(x; 12, 1/4) = 0.9972
Note
1 E(X) and Var(X) can also be got using the moments method and probability generating
functions (See PT ahead)
Z = [x − E(X)] / √Var(X) = (x − np)/√(npq) = (x − np)/√(np(1 − p)) as n → ∞ (3.50)
i.e. as n → ∞,
b(x; n, p) ≈ f(x) = [1/√(2πnp(1 − p))] e^{−(x − np)^2 / [2np(1 − p)]} (3.51)
Example
The probability that a patient recovers from T.B. is 0.4. If 15 people contracted the disease, find the probability that from 7 to 9 people survive.
Solution
• Using the Binomial approach;
P(a ≤ x ≤ b) = Σ_{x=0}^{b} b(x; n, p) − Σ_{x=0}^{a−1} b(x; n, p)
P(7 ≤ x ≤ 9) = Σ_{x=0}^{9} b(x; 15, 0.4) − Σ_{x=0}^{6} b(x; 15, 0.4)
P(7 ≤ x ≤ 9) = 0.9662 − 0.6098 = 0.3564
Figure 3.1
We normally write X ∽ b(x; k, P) to mean that the random variable X follows a Negative
binomial distribution with parameters k = number of successes and P =probability of
success on each trial.
Figure 3.2
(a) The 2nd person will be the first to catch the disease
3.4.2 Remark
In the negative binomial distribution if k = 1, the resulting distribution is that of the
Geometric distribution with a parameter p whose pmf is
P(X = x) = b(x; 1, P) = { (x−1)C(1−1) P^1 q^(x−1), x = 1, 2, 3, ...;  0, elsewhere }   (3.55)
which simplifies to
P(X = x) = { P q^(x−1), x = 1, 2, 3, ...;  0, elsewhere }   (3.56)
In some situations where the binomial applies, we may be interested in the number of the trial on which the first success occurs. For this to happen, the xth trial must be preceded by x − 1 failures. The probability that the first success occurs on the xth trial is given by the Geometric distribution defined above.
Example 2
If the probability is 0.6 that a person exposed to a certain contagious disease will catch it, find the probability that the 2nd person exposed will be the first to catch the disease.
Solution
This is a negative binomial (geometric) process, where the random variable X is the number of the trial on which the first person catches the disease. P = 0.6, k = 1, x = 2, (x − k) = (2 − 1) = 1.
P(X = x) = (x−1)C(k−1) p^k q^(x−k)
P(X = 2) = 1C0 × 0.6^1 × 0.4^1 = 0.24
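A quick check of this geometric computation (sketch; the pmf form is the one defined in Eq. 3.56):

def geometric_pmf(x, p):
    """P(X = x) = p * q^(x - 1): first success occurs on trial x."""
    return p * (1 - p)**(x - 1)

print(geometric_pmf(2, 0.6))     # about 0.24: the 2nd person is the first to catch it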
Hence,
E(X) = p Σ_{x=1}^{∞} x q^(x−1) = p × 1/(1 − q)^2 = p × 1/p^2 = 1/p (3.63)
And
Var(X) = E(x^2) − (E(x))^2 (3.64)
Var(X) = E(x^2) − (1/p)^2 (3.65)
Using the identity
E(x(x − 1)) = E(x^2) − E(x) = E(x^2) − 1/p (3.66)
E(x^2) = E(x(x − 1)) + 1/p (3.67)
Var(X) = E(x(x − 1)) + 1/p − (1/p)^2 (3.68)
Since
1/(1 − q)^2 = Σ_{x=0}^{∞} x q^(x−1) = 1 + 2q + 3q^2 + ... (3.69)
d/dq [1/(1 − q)^2] = d/dq [Σ_{x=0}^{∞} x q^(x−1)] = 2/(1 − q)^3 = Σ_{x=0}^{∞} x(x − 1) q^(x−2) (3.70)
So
E(x(x − 1)) = Σ_{x=1}^{∞} x(x − 1) p q^(x−1) = pq Σ_{x=1}^{∞} x(x − 1) q^(x−2) (3.71)
E(x(x − 1)) = pq × 2/(1 − q)^3 (3.72)
E(x(x − 1)) = 2pq/p^3 (3.73)
Var(X) = 2q/p^2 + 1/p − 1/p^2 = q/p^2 (3.74)
Generally, if we adopt the nomenclature success and failure to describe the two groups/categories, we denote the number of successes by k and thus the number of failures is (N − k). Suppose we are interested in the probability of getting x successes in n trials (the sample size) after selecting n of N elements without replacement from the population; then there are kCx ways of selecting x of the k successes and (N − k)C(n − x) ways to select (n − x) of the (N − k) failures.
Figure 3.4
Therefore, there are kCx × (N − k)C(n − x) ways to select x successes and (n − x) failures from a group of N elements, taking a sample of n at a time, of which k of the N are successes and the rest (N − k) are failures.
We write X ∽ h(x : n, N, k) to mean that the random variable x follows a hyper geo-
metric distribution with parameters n, N and k. However, if the sampling is done with
replacement, then the x successes in n trials follow a binomial distribution.
Solution
(a) 2 of them will be females. This is a hypergeometric process where X is the number of female committee members selected for training, assuming values x = 0, 1, 2, 3, ... Given: N = 8, k = 3, n = 5, x = 2.
Required is P(X = 2):
P(X = x) = [kCx × (N − k)C(n − x)] / NCn = [3C2 × 5C3] / 8C5 = (3 × 10)/56 = 30/56 ≈ 0.536 (3.78)
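The hypergeometric probability can be evaluated directly (a small Python sketch with the parameters N = 8, k = 3, n = 5, x = 2 from this example):

from math import comb

def hypergeom_pmf(x, N, k, n):
    """h(x; n, N, k) = kCx * (N-k)C(n-x) / NCn."""
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)

print(hypergeom_pmf(2, 8, 3, 5))    # 30/56, about 0.536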
If this is so, then we write X ∽ P(x; λ) to mean that the random variable X follows a Poisson distribution with parameter λ, the average number of successes occurring in a given time interval or a specified region, and e = 2.71828...
(1) The Poisson process applies in situations where we expect a fixed number of successes
or counts per unit of time or space or some other kind of unit.
E(X) = λe^{−λ} [λ^0/0! + λ^1/1! + λ^2/2! + λ^3/3! + λ^4/4! + ...] (3.86)
E(X) = λe^{−λ} [1 + λ/1! + λ^2/2! + λ^3/3! + λ^4/4! + ...] (3.87)
E(X) = λe^{−λ} e^{λ} (3.88)
E(X) = λ (3.89)
The variance,
Var(X) = E(X − E(X))^2 = E(x^2) − (E(x))^2 (3.90)
Var(X) = E(x^2) − λ^2 (3.91)
But
E(x^2) = Σ_{x=0}^{∞} x^2 P(X = x) (3.92)
E(x^2) = Σ_{x=0}^{∞} x^2 λ^x e^{−λ}/x! (3.93)
E(x^2) = Σ_{x=1}^{∞} x λ × λ^(x−1) e^{−λ}/(x − 1)! (3.94)
Let y = x − 1:
E(x^2) = Σ_{y=0}^{∞} (y + 1) λ × λ^y e^{−λ}/y! (3.95)
E(x^2) = Σ_{y=0}^{∞} y λ × λ^y e^{−λ}/y! + Σ_{y=0}^{∞} λ × λ^y e^{−λ}/y! (3.96)
E(x^2) = λ Σ_{y=0}^{∞} y λ^y e^{−λ}/y! + λ Σ_{y=0}^{∞} λ^y e^{−λ}/y! = λ × λ + λ × 1 = λ^2 + λ (3.97)
Hence Var(X) = λ^2 + λ − λ^2 = λ.
P(x; λ) = λ^x e^{−λ}/x! (3.118)
is a pmf.
P(X = r) = P(r; λ) = λ^r e^{−λ}/r! (3.119)
P(X ≤ r) = 1 − P(X > r) = 1 − Σ_{x=r+1}^{∞} λ^x e^{−λ}/x! (3.123)
P(a1 < x < a2) = P(x < a2) − P(x ≤ a1) = P(X ≤ a2 − 1) − P(X ≤ a1) (3.125)
P(X = 3) = P(3; 5) = 5^3 e^{−5}/3! = 0.1404 (3.128)
(ii) Less than 2 bouncing cheques
P(X < r) = P(X ≤ r − 1) = Σ_{x=0}^{r−1} λ^x e^{−λ}/x! (3.129)
P(X < 2) = P(X ≤ 1) = Σ_{x=0}^{1} 5^x e^{−5}/x! (3.130)
Or, reading P(X ≥ 2) from the cumulative tables,
P(X < 2) = 1 − P(X ≥ 2) (3.131)
P(X < 2) = 1 − 0.9596 = 0.0404 (3.132)
(iii) One or more bouncing cheques
P(X ≥ r) = Σ_{x=r}^{∞} P(x; λ) = 1 − P(X < r) = 1 − P(X ≤ r − 1) (3.133)
P(X ≥ 1) = 1 − P(X < 1) = 1 − P(X ≤ 0) = 1 − 5^0 e^{−5}/0! = 1 − e^{−5} = 0.9933 (3.134)
P(X ≤ 3) = P(0; 5) + P(1; 5) + ... + P(3; 5) = Σ_{x=0}^{3} P(x; 5) = 0.1512 (3.137)
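These Poisson probabilities are easy to reproduce numerically (a minimal Python sketch with λ = 5, the mean used in this example):

from math import exp, factorial

def poisson_pmf(x, lam):
    """P(x; lambda) = lambda^x * e^(-lambda) / x!"""
    return lam**x * exp(-lam) / factorial(x)

lam = 5
print(poisson_pmf(3, lam))                            # about 0.1404
print(sum(poisson_pmf(x, lam) for x in range(2)))     # P(X < 2), about 0.0404
print(1 - poisson_pmf(0, lam))                        # P(X >= 1), about 0.9933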
becomes
b(x; n, p) = nCx (λ/n)^x (1 − λ/n)^(n−x) (3.139)
b(x; n, p) = [n!/((n − x)! x!)] (λ/n)^x (1 − λ/n)^(n−x) (3.140)
b(x; n, p) = [n(n − 1)(n − 2)...(n − (x − 1))(n − x)! / ((n − x)! x!)] (λ/n)^x (1 − λ/n)^(n−x) (3.141)
b(x; n, p) = [n(n − 1)(n − 2)...(n − (x − 1)) / x!] (λ^x/n^x) (1 − λ/n)^(n−x) (3.142)
b(x; n, p) = [n(n − 1)(n − 2)...(n − (x − 1)) / (n · n · n ... n · x!)] λ^x (1 − λ/n)^(n−x) (3.143)
b(x; n, p) = [n/n × (n − 1)/n × (n − 2)/n × ... × (n − (x − 1))/n] (λ^x/x!) (1 − λ/n)^(n−x) (3.144)
b(x; n, p) = 1 × (1 − 1/n) × (1 − 2/n) × ... × (1 − (x − 1)/n) × (λ^x/x!) × (1 − λ/n)^n × (1 − λ/n)^(−x) (3.145)
If n → ∞, keeping x and λ constant, then each factor (1 − i/n) → 1, (1 − λ/n)^(−x) → 1 and (1 − λ/n)^n → e^{−λ}, so
P(λ) = { λ^x e^{−λ}/x!, x = 0, 1, 2, ...;  0, elsewhere }   (3.146)
Example 1
Suppose on the average, one person in every 1000, is an alcoholic, find the probability
that a random sample of 8000 people will yield fewer than 7 alcoholics.
Solution
n = 8000, P = 1/1000 = 0.001, x < 7
This is a binomial process where the random variable X is the number of alcoholics. But since n = 8000 is large and P → 0, use the Poisson approximation with
λ = np = 8000 × 0.001 = 8
P(X < 7) = P(0; 8) + P(1; 8) + P(2; 8) + ... + P(6; 8) = Σ_{x=0}^{6} P(x; 8) = 0.3134 (3.148)
Or
P(X < 7) = 1 − P(X ≥ 7) = 1 − Σ_{x=7}^{∞} P(x; 8) = 1 − 0.6866 = 0.3134 (3.149)
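The quality of the approximation can be checked by comparing the exact binomial value with the Poisson value (a sketch using n = 8000, p = 0.001 from this example):

from math import comb, exp, factorial

n, p = 8000, 0.001
lam = n * p                                              # lambda = 8

exact  = sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(7))
approx = sum(lam**x * exp(-lam) / factorial(x) for x in range(7))
print(exact, approx)                                     # both about 0.313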
Example 2
If the probability is 0.002 that any police officer attending a parade on a hot day will suffer from heat exhaustion, what is the probability that 20 of the 5000 police officers attending the parade will suffer from heat exhaustion?
Solution
This is a pure binomial process, where the random variable X is the number of police officers who suffer from heat exhaustion.
Since n = 5000 is large, P = 0.002 is small, and λ = np = 5000 × 0.002 = 10 > 5 is of moderate size and constant, we could also use the Poisson approximation.
(g) In a situation where the Poisson applies, if successes occur at a mean rate of α per unit of time or per unit region, then the number of successes in an interval of t units of time (or t units of the specified region) is also a Poisson random variable with mean λ = αt, i.e.
P(x; λ = αt) = (αt)^x e^{−αt}/x! (3.152)
Example
A certain type of carpet has on average 2 defects per square meter. What is the probability that a 3 square meter carpet of this type will have 3 or more defects?
Solution
This is a Poisson process where X is the number of defects per 3 m^2. Since the mean number of defects per 1 m^2 is α = 2, the mean number of defects per 3 m^2 is λ = αt = 2 × 3 = 6 defects. Required: P(X ≥ 3).
(iii) f(x) ≥ 0
(iv) ∫_{a}^{b} f(x) dx = 1
(v) Since P(X = x) = 0, then P(a < x < b) = P(a ≤ x < b) = P(a < x ≤ b) = P(a ≤ x ≤ b) = ∫_{a}^{b} f(x) dx
Conditions (iii) and (iv) are necessary and sufficient for f(x) to be a pdf.
Examples
1 Given a pdf
f(x) = { k e^{−3x}, x > 0;  0, elsewhere }
Find k.
Solution
∫_{a}^{b} f(x) dx = 1
∫_{0}^{∞} k e^{−3x} dx = 1
k ∫_{0}^{∞} e^{−3x} dx = k lim_{t→∞} [−e^{−3x}/3]_{0}^{t} = k/3 = 1
This implies that k = 3, and therefore
f(x) = { 3e^{−3x}, x > 0;  0, elsewhere }
2 Given
f(x) = { kx(x − 2), 0 < x < 2;  0, elsewhere }
Find k and hence evaluate P(x < 1.5).
Solution
∫_{0}^{2} kx(x − 2) dx = 1; solving shows that k = −3/4.
Therefore
f(x) = { −(3/4) x(x − 2), 0 < x < 2;  0, elsewhere }
Hence P(x < 1.5) = P(0 < x < 1.5) = −∫_{0}^{1.5} (3/4) x(x − 2) dx = 27/32
Therefore
F(x) = { 1 − e^{−3x}, x > 0;  0, elsewhere }
And
P(0.5 ≤ x ≤ 1) = F(1) − F(0.5) = (1 − e^{−3×1}) − (1 − e^{−3×0.5}) = 0.173
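The value 0.173 can be confirmed directly from F(x) = 1 − e^(−3x) (a two-line sketch):

from math import exp

F = lambda x: 1 - exp(-3 * x)        # cdf of f(x) = 3 e^(-3x), x > 0
print(F(1) - F(0.5))                 # about 0.173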
(3) The mode (most likely value) of the function is such that f ′ (x) = 0 and f ”(x) < 0
Example
f(x) = { −(3/4) x(x − 2), 0 < x < 2;  0, elsewhere }
f′(x) = −3x/2 + 6/4 = 0
implying that x = 1.
And
f″(x) = −3/2 < 0
(4) The median m of X is defined by ∫_{−∞}^{m} f(x) dx = 0.5, where m is the median.
Example
f(x) = { 2x, 0 < x < 1;  0, elsewhere }
∫_{−∞}^{m} f(x) dx = [x^2]_{0}^{m} = m^2 = 0.5
m = 1/√2 (4.2)
∫_{α}^{β} 1/(β − α) dx = [1/(β − α)] ∫_{α}^{β} dx = [1/(β − α)] [x]_{α}^{β} = (β − α)/(β − α) = 1 (4.4)
Its graph is a horizontal line of height 1/(β − α) over the interval [α, β].
Or
N(xi; µ, δ^2) = { [1/√(2πδ^2)] e^{−(1/2)((xi − µ)/δ)^2}, −∞ ≤ x ≤ ∞, −∞ ≤ µ ≤ ∞, and δ > 0;  0, elsewhere }   (4.17)
Note
(2) The normal distribution is fully specified by µ and δ^2. If you know these two parameters, then you know everything about the Normal distribution. And almost all natural phenomena can be approximated by the normal distribution.
• ∀x, f(x) ≥ 0
• ∫_{−∞}^{∞} [1/(δ√(2π))] e^{−(1/2)((xi − µ)/δ)^2} dx = 1
Changing the variables of this integral from x and y to the polar co-ordinates r and θ, by letting x = r cos θ, y = r sin θ, then x^2 + y^2 = r^2, where dx dy = r dr dθ.
I^2 = ∫_{−∞}^{∞} ∫_{−∞}^{∞} e^{−(1/2)(x^2 + y^2)} dx dy = ∫_{0}^{2π} ( ∫_{0}^{∞} e^{−(1/2)r^2} r dr ) dθ (4.24)
I^2 = ∫_{0}^{2π} [−e^{−(1/2)r^2}]_{0}^{∞} dθ (4.25)
I^2 = ∫_{0}^{2π} dθ = [θ]_{0}^{2π} = 2π (4.26)
Therefore
I = √(2π) (4.27)
(5) The curve attains its maximum value at x = µ, where f(µ) = 1/(δ√(2π)).
(6) The curve is symmetric about the vertical axis through the mean x = µ = Σx/N.
Let y = x − µ, then dy = dx
E(x) = [1/(δ√(2π))] ∫_{−∞}^{∞} y e^{−(1/2)(y/δ)^2} dy + µ (4.39)
E(x) = [1/(δ√(2π))] ∫_{−∞}^{∞} y e^{−(1/2)(y/δ)^2} dy + µ = [−δ^2/(δ√(2π))] [e^{−y^2/(2δ^2)}]_{−∞}^{∞} + µ = [δ^2/(δ√(2π))][0 − 0] + µ = µ (4.40)
Therefore
E(x) = µ (4.41)
The variance
Var(x) = E(x^2) − [E(x)]^2 (4.42)
Var(x) = E[x − E(x)]^2 (4.43)
Var(x) = ∫_{−∞}^{∞} (x − µ)^2 [1/(δ√(2π))] e^{−(1/2)((x − µ)/δ)^2} dx (4.44)
Let y = (x − µ)/δ, implying x = µ + δy; it can be shown that dx = δ dy
Var(x) = ∫_{−∞}^{∞} (µ + δy − µ)^2 [1/(δ√(2π))] e^{−(1/2)y^2} δ dy (4.45)
Which simplifies to
Var(x) = [δ^2/√(2π)] ∫_{−∞}^{∞} y^2 e^{−(1/2)y^2} dy (4.46)
Integrating by parts, let u = y, du/dy = 1
(11) If X ∽ N(µ, δ^2), X can be transformed into a new random variable Z with mean µ = 0 and variance δ^2 = 1. Z is called the standard or normalised random variable:
Z = (xi − E(xi))/δ = (xi − µ)/δ, with E(Z) = 0 and Var(Z) = 1, implying that Z ∽ N(0, 1).
φ(z) = P(Z ≤ z) gives the area under the curve up to z, and the inverse gives z: φ^{−1}(p) = invnormal(p) = z. 1 − φ(z) gives the area under the curve above z.
(15) Most statistical tables give values of φ(z) = P(Z ≤ z) for different values of z. In most problems that call for the assumption that a random variable X ∽ N(µ, δ^2), we usually end up requiring the evaluation of the integral
P(x1 ≤ X ≤ x2) = [1/(δ√(2π))] ∫_{x1}^{x2} e^{−(1/2)((x − µ)/δ)^2} dx (4.55)
and this is tedious! Therefore, it is consoling to note that for any random variable X ∽ N(µ, δ^2),
P(x1 ≤ X ≤ x2) = P((x1 − µ)/δ ≤ (x − µ)/δ ≤ (x2 − µ)/δ) (4.56)
P(x1 ≤ X ≤ x2) = P((x1 − µ)/δ ≤ Z ≤ (x2 − µ)/δ) (4.57)
P(x1 ≤ X ≤ x2) = P(z1 ≤ Z ≤ z2) = [1/√(2π)] ∫_{z1}^{z2} e^{−(1/2)z^2} dz (4.58)
P(x1 ≤ X ≤ x2) = P(Z ≤ z2) − P(Z ≤ z1) (4.59)
Values are read from the standard normal tables.
Example 1 Given that X ∽ N(50, 100), evaluate;
Solution
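Standard-normal areas can also be computed without tables via the error function (a Python sketch for X ∽ N(50, 100); the bounds 45 and 62 are purely illustrative, since the parts of this example are not listed here):

from math import erf, sqrt

def phi(z):
    """Standard normal cdf, P(Z <= z)."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 50, 10                      # X ~ N(50, 100), so delta = sqrt(100) = 10
x1, x2 = 45, 62                         # illustrative bounds (not from the notes)
z1, z2 = (x1 - mu) / sigma, (x2 - mu) / sigma
print(phi(z2) - phi(z1))                # P(45 <= X <= 62), about 0.576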
(16) Normal approximation to the Binomial. When the number of trials n is large (n → ∞) and the probability of success P is close to 0.5, the Normal distribution is used to approximate the Binomial process, with µ = nP, δ = √(npq) = √(np(1 − p)) and
Z = (x − E(x))/√Var(x) = (x − np)/√(npq) = (x − np)/√(np(1 − p))
To approximate the Binomial random variable using the Normal distribution, we make continuity corrections by adding or subtracting the correction term 0.5, as in the mnemonics below. This is done with a small error.
(a < x < b) e.g. (5 < x < 8) = (5.5 < x < 7.5) (4.61)
(a < x ≤ b) e.g. (5 < x ≤ 8) = (5.5 < x < 8.5) (4.62)
(a ≤ x < b) e.g. (5 ≤ x < 8) = (4.5 < x < 7.5) (4.63)
(x < b) e.g. (x < 8) = (x < 7.5) (4.64)
(x > a) e.g. (x > 8) = (x > 8.5) (4.65)
(17) Normal approximation to the Poisson distribution. When the number of trials n is large (n → ∞) and λ > 20, the Normal distribution can also be used to approximate the Poisson process, with µ = λ and δ = √λ, such that Z = (x − E(x))/√Var(x) = (x − λ)/√λ.
Respect the continuity corrections above.
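A short sketch of the continuity correction in practice (the values n = 100, p = 0.5 and the event 45 ≤ x ≤ 55 are illustrative assumptions, not taken from the notes):

from math import erf, sqrt, comb

def phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p = 100, 0.5
mu, sigma = n * p, sqrt(n * p * (1 - p))

# Exact binomial P(45 <= X <= 55)
exact = sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(45, 56))

# Normal approximation with continuity correction: (44.5 < x < 55.5)
approx = phi((55.5 - mu) / sigma) - phi((44.5 - mu) / sigma)
print(exact, approx)                     # both about 0.728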
∫_{0}^{∞} [β^α x^(α−1) e^{−βx} / Γ(α)] dx = 1 (4.72)
∫_{0}^{∞} β^α x^(α−1) e^{−βx} dx = Γ(α) (4.73)
The equation
Γ(α) = ∫_{0}^{∞} β^α x^(α−1) e^{−βx} dx (4.74)
When β = 1,
Γ(α) = ∫_{0}^{∞} x^(α−1) e^{−x} dx = ∫_{0}^{∞} t^(α−1) e^{−t} dt (4.75)
Hence
f(t) = { t^(α−1) e^{−t} / Γ(α), 0 < t < ∞;  0, elsewhere }   (4.79)
And G(x; α, β) can serve as a pdf and If X ∽ G(x; α, β) then E(x) = αβ, Var(x) = β2 , Also
letting t = βx , dt = β1 dx
Z ∞ x α−1 e− βx
Z ∞ −t α−1
e t β 1
dt = dx (4.84)
0 Γ(α) 0 Γ(α) β
Z ∞ xα−1 1 α−1 e− βx
Z ∞ −t α−1
e t β 1
dt = dx (4.85)
0 Γ(α) 0 Γ(α) β
Z ∞ −t α−1 Z ∞ α−1 − x
e t x e β 1
dt = dx (4.86)
0 Γ(α) 0 Γ(α) βα