Chapter 2-Probability Distribution
Topic outline
Reading Text:
Jakob Bernoulli, a Swiss mathematician, published an article on the binomial theorem. The
binomial probability distribution is one of the most widely used discrete probability
distributions and has many applications. It is used to find the probability that an outcome
will occur x times in n experiments (trials).
$P(x) = {}_{n}C_{x}\, P^{x} Q^{\,n-x}$; $Q = 1 - P$
Note:
x is the random variable defined as the number of successes
n-x is the number of failures
C denotes combination
n is the number of trials or sample size
P is the probability of success on each trial; 1-P or Q is the probability of failure
n and P are the two binomial parameters required to find the probability of x
successes in n trials (the random sample size) of a binomial experiment. Binomial probability
tables can also be used to find binomial probabilities for a limited range of n and P.
Example 9
1. Construct the binomial probability distribution for number of heads for the
binomial experiment of tossing 3 coins.
Given: Success=number of heads, n=3, P=0.5, Q=0.5;
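The probability of exactly 0 (no) heads in tossing 3 coins, P(x=0) = ?
$P(x=0) = {}_{3}C_{0}(0.5)^{0}(0.5)^{3} = 1 \times 1 \times 0.125 = 0.125$
Likewise, $P(x=1) = {}_{3}C_{1}(0.5)^{1}(0.5)^{2} = 3 \times 0.5 \times 0.25 = 0.375$
$P(x=2) = {}_{3}C_{2}(0.5)^{2}(0.5)^{1} = 3 \times 0.25 \times 0.5 = 0.375$
$P(x=3) = {}_{3}C_{3}(0.5)^{3}(0.5)^{0} = 1 \times 0.125 \times 1 = 0.125$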
Note: For example, 3C2 = 3 gives the number of ways (arrangements) of getting 2 heads
in 3 tosses, i.e. THH, HTH, HHT.
Table 9.1 The probability distribution of the number of heads for the binomial
experiment of tossing 3 coins
Xi (# of heads)   0       1       2       3       Total
P(Xi)             0.125   0.375   0.375   0.125   ∑P(Xi) = 1
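To check Table 9.1, here is a minimal Python sketch of the binomial formula. The function name `binomial_pmf` is our own illustration, not a library API; `math.comb(n, x)` (Python 3.8+) plays the role of nCx.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """Binomial probability P(x) = nCx * P**x * Q**(n - x), with Q = 1 - P."""
    q = 1 - p
    return comb(n, x) * p**x * q**(n - x)

# Reproduce Table 9.1: number of heads when tossing 3 fair coins (n = 3, P = 0.5)
n, p = 3, 0.5
table = {x: binomial_pmf(x, n, p) for x in range(n + 1)}
for x, prob in table.items():
    print(f"P(X={x}) = {prob:.3f}")
print(f"Sum of P(Xi) = {sum(table.values()):.3f}")
```

Changing `n` and `p` gives the distribution for any other binomial experiment, for example Exercise 9, #2 (n = 9, P = 0.9).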
2. The probability that a randomly chosen sales prospect will make a purchase is
0.20. If a sales representative calls on fifteen prospects, find the probability that:
a. Fewer than two sales are made
Given: n=15, P=0.2, Q=0.8, P(x<2)=?
$P(x<2) = P(x=0) + P(x=1) = {}_{15}C_{0}(0.2)^{0}(0.8)^{15} + {}_{15}C_{1}(0.2)^{1}(0.8)^{14}$
= 0.0352 + 0.1319 = 0.1671
b. At least two sales are made
P(x≥2)=P(x=2,3,…15)=1-P(x<2)=1-[P(x=0)+P(x=1)]=1-0.1671=0.8329
(refer to a)
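The same calculation can be sketched in Python by summing individual binomial probabilities; `binomial_cdf` is a hypothetical helper name, and the complement rule gives part b.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binomial_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k): add the binomial probabilities for x = 0, 1, ..., k."""
    return sum(binomial_pmf(x, n, p) for x in range(k + 1))

n, p = 15, 0.2
fewer_than_two = binomial_cdf(1, n, p)   # a. P(x < 2) = P(x = 0) + P(x = 1)
at_least_two = 1 - fewer_than_two        # b. complement rule: P(x >= 2) = 1 - P(x < 2)
print(f"P(x < 2) = {fewer_than_two:.4f}, P(x >= 2) = {at_least_two:.4f}")
```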
Exercise 9
1. A manufacturer of window frames knows from long experience that 5% of the
production will have some type of minor defect that will require an adjustment.
What is the probability that in a sample of 20 window frames:
a. None will need adjustment?
b. At least one will need adjustment?
c. More than two will need adjustment?
2. In a recent study, 90% of the homes in Bole area were found to have large-
screen TVs. In a sample of nine homes, what is the probability that:
a. All (exactly) nine have large-screen TVs?
b. Less than five have large-screen TVs?
c. More than five have large-screen TVs?
d. Develop a binomial probability distribution for the problem
Synopsis
A binomial probability distribution or experiment:
o It has n trials that are identical, independent and random
o The probability of success (P) remains the same for every trial
o Random sampling is with replacement from a large population
o The binomial parameters are n and P; P(x) is the probability of x successes in n
trials (sample size):
o $P(x) = {}_{n}C_{x}\, P^{x} Q^{\,n-x}$; $Q = 1 - P$
Compute the expected value (mean), variance and standard deviation for the
binomial probability distribution
Topic outline
Reading Text:
The mean (μ), the variance (σ2), and the standard deviation (σ) of a binomial
distribution are computed in a “shortcut” fashion as follows respectively:
$\mu = nP$; $\sigma^{2} = nPQ$; $\sigma = \sqrt{nPQ}$
Example 10
1. Find the mean, variance and standard deviation for the binomial probability
distribution in Example 7 and 8 of Session 7 and 8: n=3, P=0.5, Q=0.5
$E(X) = \mu = nP = 3 \times 0.5 = 1.5$ heads; $\sigma^{2} = nPQ = 3 \times 0.5 \times 0.5 = 0.75$; $\sigma = \sqrt{0.75} = 0.87$ heads
2. The probability that a randomly chosen sales prospect will make a purchase is
0.20. If a sales representative calls on fifteen prospects, find the expected
number of sales (as a long-run average), the variance, and the standard
deviation associated with making calls on 15 prospects.
$\mu = nP = 15 \times 0.2 = 3$ sales
$\sigma^{2} = nPQ = 15 \times 0.2 \times 0.8 = 2.4$
$\sigma = \sqrt{2.4} = 1.55$ sales
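A short sketch of the shortcut formulas, applied to both parts of Example 10 (the helper name `binomial_summary` is illustrative only):

```python
from math import sqrt

def binomial_summary(n: int, p: float) -> tuple[float, float, float]:
    """Shortcut formulas: mean = nP, variance = nPQ, standard deviation = sqrt(nPQ)."""
    q = 1 - p
    mean = n * p
    variance = n * p * q
    return mean, variance, sqrt(variance)

print(binomial_summary(3, 0.5))    # coins: mean 1.5, variance 0.75, sd about 0.87
print(binomial_summary(15, 0.2))   # sales: mean 3, variance 2.4, sd about 1.55
```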
Exercise 10
1. A university has found that 20% of its students withdraw without completing the
first semester course. Assume that 6 students have registered for the course this
semester.
a. What is the probability that two or fewer will withdraw?
b. What is the probability that exactly four will withdraw?
c. What is the expected number of withdrawals?
d. What is the variance for the number of withdrawals?
e. What is the standard deviation for the number of withdrawals?
2. If 5% of new automobiles require warranty service within the first year, and X
sells 8 automobiles in a month:
a. What is the probability that none of these automobiles requires warranty
service?
b. Find the probability that exactly two automobiles require warranty service?
c. Compute the mean and standard deviation of this probability distribution.
Synopsis
o $\mu = nP$; $\sigma^{2} = nPQ$; $\sigma = \sqrt{nPQ}$
Wrap up Discussion Questions:
Topic outline
Reading Text:
When sampling is done without replacement of each sampled item taken from a finite
population of items, the binomial process does not apply because there is a systematic
change in the probability of success as items are removed from the population.
Hence, the hypergeometric probability distribution is instead used.
For example, when a random sample of 2 men is chosen for a committee from a population
of 10 men without replacement, each man has a 1/10 chance of being selected in the first
draw, and each remaining man has a 1/9 chance of being selected in the second draw; the
trials are therefore dependent.
The formula for the hypergeometric distribution is:
$P(x) = \dfrac{{}_{S}C_{x} \cdot {}_{N-S}C_{n-x}}{{}_{N}C_{n}}$
N = size of the population
n = size of the sample
S = # of successes in the population (N)
N - S = # of failures in the population (N)
x = # of successes in the sample of size n
n - x = # of failures in the sample of size n
Note: Thus, choosing the successes (x out of S) can be done in SCx ways, and choosing the
failures (n - x out of N - S) in (N-S)C(n-x) ways; since both are chosen at the same time, we
multiply the two terms. In total, a sample of size n can be chosen randomly and without
replacement from a population of size N in NCn ways.
Mean: $\mu = E(X) = n\dfrac{S}{N}$ and Variance: $\sigma^{2} = n\dfrac{S}{N}\left(1 - \dfrac{S}{N}\right)\dfrac{N-n}{N-1}$
Note: as the population is finite, the variance is multiplied by the finite population
correction factor $\dfrac{N-n}{N-1}$.
Example 11
Note that sampling is done without replacement and the sample size of 5 is 10% of the
population, which is greater than the 5% requirement. Given: N = 50; n = 5; S = 10.
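The question for part 1 is not shown above, so rather than guess which x it asks for, this sketch tabulates the whole hypergeometric distribution for the given N = 50, n = 5, S = 10. The function name `hypergeometric_pmf` is our own; `math.comb` supplies the combinations.

```python
from math import comb

def hypergeometric_pmf(x: int, N: int, n: int, S: int) -> float:
    """P(x) = [SCx * (N-S)C(n-x)] / NCn: x successes in a sample of n drawn without replacement."""
    return comb(S, x) * comb(N - S, n - x) / comb(N, n)

N, n, S = 50, 5, 10   # values given in Example 11
dist = {x: hypergeometric_pmf(x, N, n, S) for x in range(n + 1)}
for x, prob in dist.items():
    print(f"P(X={x}) = {prob:.4f}")
```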
2. Suppose 180 of the 200 shipping orders that an inspector examined have authorized
credit approval. What are the mean and the variance of the number of orders with credit
approval in a sample of 40 randomly chosen orders?
Given: n=40, N=200, S=180
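Mean: $\mu = E(X) = n\dfrac{S}{N} = 40\left(\dfrac{180}{200}\right) = 36$; and
Variance: $\sigma^{2} = n\dfrac{S}{N}\left(1 - \dfrac{S}{N}\right)\dfrac{N-n}{N-1} = 40\left(\dfrac{180}{200}\right)\left(\dfrac{20}{200}\right)\left(\dfrac{160}{199}\right) = 2.8945$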
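The mean and variance in part 2, including the finite population correction factor, can be sketched as follows (the function name is an assumption for illustration):

```python
def hypergeometric_mean_variance(N: int, n: int, S: int) -> tuple[float, float]:
    p = S / N                      # proportion of successes in the population
    fpc = (N - n) / (N - 1)        # finite population correction factor
    mean = n * p
    variance = n * p * (1 - p) * fpc
    return mean, variance

mean, var = hypergeometric_mean_variance(N=200, n=40, S=180)
print(mean, round(var, 4))
```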
Exercise 11
Synopsis
The hypergeometric probability distribution is useful for determining the
probability of a number of occurrences when sampling without replacement from a
finite population (n/N > 5%).
It counts the number of successes (x) in n selections, without replacement, from
a population of N elements, S of which are successes and (N - S) of which are
failures.
$P(x) = \dfrac{{}_{S}C_{x} \cdot {}_{N-S}C_{n-x}}{{}_{N}C_{n}}$
Mean: $\mu = E(X) = n\dfrac{S}{N}$ and Variance: $\sigma^{2} = n\dfrac{S}{N}\left(1 - \dfrac{S}{N}\right)\dfrac{N-n}{N-1}$
What are the conditions that must be satisfied to apply the hypergeometric
probability distribution?
Compare and contrast the hypergeometric and binomial probability distributions.
How do we find probabilities in a hypergeometric experiment?
How do we compute the mean and variance of a hypergeometric distribution?
What must n/N be for the population to be considered small (finite) enough, or the
sample large enough?
Topic outline
This distribution is named after the French mathematician Siméon Denis Poisson (1781-1840).
It is a discrete distribution. It expresses the probability of a number of
events/occurrences (successes) occurring in a fixed interval of time, space (area,
volume, distance) etc. For example, the random variable of interest might be the
number of arrivals at a foreign exchange service desk at a bank in one hour, the
number of accidents in 100 km of a highway, or the number of Walia Ibexes observed
in 2 hectares of the Semen National Park.
Properties (Assumptions) of Poisson Probability Distribution (Experiment)
The probability of an occurrence (success) is the same for any two intervals
of equal length and is proportional to the length of the interval.
The number of occurrences (successes) in any interval is random, and
independent of the number of occurrences (successes) in any other interval.
The Poisson parameter is the mean number of occurrences (successes) per interval,
denoted by µ or λ; it remains constant for intervals of equal length. The probability
of x occurrences is computed as follows:
$P(x) = \dfrac{\mu^{x} e^{-\mu}}{x!}$
Poisson probabilities for any x can easily be obtained from tables given µ or λ
(Note: tables are available only for a limited range of means)
Example 12
$= \dfrac{\mu^{x} e^{-\mu}}{x!} = 2 \times 0.1353 = 0.2707$ (here µ = 2, so $e^{-2} = 0.1353$)
b. There will be three arrivals during any five-minute period (interval).
Since the interval of the mean changes from one minute to five minutes, adjust the
mean accordingly; it becomes µ = 10.
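As a sketch, the Poisson formula in Python, with the mean rescaled to the five-minute interval as in part b. We assume, from the worked answer to part a (e^-2 = 0.1353), a mean of 2 arrivals per minute; the function name `poisson_pmf` is our own.

```python
from math import exp, factorial

def poisson_pmf(x: int, mu: float) -> float:
    """P(x) = mu**x * e**(-mu) / x!"""
    return mu**x * exp(-mu) / factorial(x)

mu_per_minute = 2                      # assumed from part a
mu_five_minutes = mu_per_minute * 5    # rescale the mean to the new interval: mu = 10
print(f"P(x = 3) in five minutes = {poisson_pmf(3, mu_five_minutes):.4f}")
```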
Exercise 12
Poisson probabilities are computed as follows: $P(x) = \dfrac{\mu^{x} e^{-\mu}}{x!}$
What are the conditions that must be satisfied to apply the Poisson probability
distribution?
Identify the Poisson parameter
How do we find Poisson probabilities?
Next Session’s Assignment:
Attempt exercise 12, #1, 2, and 3
Read about the Poisson approximation of a binomial probability distribution.
Topic 13: Discrete Probability Distribution: Poisson Approximation
of the Binomial Probability Distribution
Topic outline
What are the conditions that must be satisfied to solve binomial probability
problems using Poisson probability distribution?
How are variance and standard deviation calculated for Poisson distribution?
Reading Text:
A hotel has 50 rooms. If 2% of the hotel rooms are not occupied on a given day, what is
the probability that fewer than 2 rooms will not be occupied? (Apply the Poisson
approximation of the binomial distribution and use the table.)
Since n ≥ 20 and P ≤ 5% are satisfied, the binomial probability problem can be solved using
the Poisson probability table by finding the mean (µ): µ = n*P = 50*0.02 = 1.
For the Poisson probability distribution, the mean and variance are both equal to µ, and the
standard deviation is equal to the square root of µ, i.e.
$\sigma^{2} = \mu$
$\sigma = \sqrt{\mu}$
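A sketch comparing the Poisson approximation with the exact binomial answer for the hotel problem, assuming the question asks for P(x < 2) = P(0) + P(1) with n = 50 and P = 0.02. The helper names are hypothetical.

```python
from math import comb, exp, factorial, sqrt

def binomial_pmf(x: int, n: int, p: float) -> float:
    return comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x: int, mu: float) -> float:
    return mu**x * exp(-mu) / factorial(x)

n, p = 50, 0.02
mu = n * p                                             # Poisson mean used in the approximation
approx = sum(poisson_pmf(x, mu) for x in range(2))     # P(x < 2) = P(0) + P(1)
exact = sum(binomial_pmf(x, n, p) for x in range(2))
print(f"Poisson approx: {approx:.4f}, exact binomial: {exact:.4f}")
print(f"variance = {mu}, standard deviation = {sqrt(mu)}")   # sigma^2 = mu, sigma = sqrt(mu)
```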
Exercise 13
Note: Find the following binomial probabilities using Poisson (if possible)
Topic outline
Reading Text:
Unlike discrete random variables, a continuous random variable assumes any value in
one or more intervals. The possible values that a continuous random variable can
assume are infinite and uncountable. For example, the time taken by a worker to finish a
job, the time taken to complete an examination, the amount of milk in a pot, the prices of
houses, the flight time of an airplane from Gondar to Addis Ababa, the weight of a person
and the life of a battery are all continuous random variables. Because the number of
values contained in any interval is infinite, the possible number of values that a
continuous random variable can assume is also infinite. Moreover, we cannot count these
values.
Because there are an infinite number of possible fractional measurements, one cannot
list every possible value with a corresponding probability. Instead, a probability density
function is defined. This mathematical expression gives the function of x, represented
by the symbol f(x), for any designated value of the random variable x. The plot for such
a function is called a probability curve, and the area between any two points under the
curve indicates the probability of a value between these two points occurring by
chance.
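To illustrate the idea that probability is an area under f(x), this sketch numerically integrates a density with the midpoint rule. The standard normal density is assumed here purely as an example of f(x); the helper names are ours, and in practice probability tables (or a statistics library) are used instead of integrating by hand.

```python
from math import exp, pi, sqrt
from typing import Callable

def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Normal density f(x), used here only as an example of a continuous density."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

def area_under_curve(f: Callable[[float], float], a: float, b: float, steps: int = 10_000) -> float:
    """Approximate P(a < X < b) as the area under f between a and b (midpoint rule)."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

print(round(area_under_curve(normal_pdf, -1, 1), 4))   # area within one unit of the mean
```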
Synopsis
A continuous random variable assumes any value in one or more intervals. The
possible values that a continuous random variable can assume are infinite and
uncountable.
The area between any two points under the probability curve indicates the
probability of a value between these two points occurring by chance.
Probability tables of several standard continuous probability distributions are
used to find the probability of the range of values of a continuous random
variable
There are many types of continuous probability distribution that are identified by
different names and properties
Normal probability distribution is the most widely used of all
Topic outline
Reading Text:
The normal probability distribution is the most commonly used of all probability
distributions; it is important in statistical inference for three distinct reasons:
1. The measurements obtained in many random processes (naturally and in
industry) are known to follow this distribution. Some examples are the IQs,
weights, and heights of a large number of people and the variations in
dimensions of a large number of parts produced by a machine.
2. Normal probabilities can often be used to approximate other probability
distributions, such as the binomial and Poisson distributions.
3. Distributions of statistics such as the sample mean and the sample proportion are
approximately normal when the sample size is large, regardless of the distribution
of the parent population.
The characteristics of the normal probability distribution
[Figure: the bell-shaped normal curve, with tails extending from -∞ to +∞]
Synopsis
The normal distribution is the most commonly used of all probability distributions
in statistical analysis.
It is bell-shaped, symmetrical and moderately peaked, and its tails are asymptotic to
the horizontal axis.
There is an unlimited family of normal distributions, each defined by two
parameters: its mean (µ) and standard deviation (σ).
What are the parameters that define normal distribution? How do they uniquely
define a normal distribution?
List and describe the major features of a normal probability distribution
Why is the normal distribution the most widely used of all probability distributions
in statistical analysis?
Read about the standard normal (z) distribution and how any other normal distribution
is converted to this form.
Topic outline
Briefly explain the characteristics of the standard normal distribution and its use
Reading Text:
Any value x from a normally distributed population can be converted into the equivalent
standard normal value z: $z = \dfrac{x - \mu}{\sigma}$
A z value restates the original value x in terms of the number of units of the standard
deviation by which the original value differs from the mean of the distribution. A
negative value of z would indicate that the original value x was below the value of the
mean, and a positive value of z would indicate the original value x was above its mean.
The standard normal (z) distribution (table) indicates proportions of area for various
intervals of values for the standard normal probability distribution, with the lower
boundary of the interval always being at the mean. Converting designated values of the
variable x into standard normal values allows finding normal probabilities using the z
table easily, and it makes it unnecessary to use the complex method of integrating the
density function shown in the previous section.
Using Tables:
Split the z value into two parts: the integer digit and the first decimal digit (for example,
1.2 for z = 1.28) are located in the leftmost column, and the second decimal digit (0.08 in
this example) in the top row. The entry at their intersection in the body of the table is
the corresponding area (probability) between 0 and +z.
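The entries in a z table can be reproduced with the error function from Python's standard library, since the area between 0 and z equals 0.5·erf(z/√2). This is a sketch for checking table look-ups, not a replacement for the table used in this course; `z_table_area` is a hypothetical name.

```python
from math import erf, sqrt

def z_table_area(z: float) -> float:
    """Area under the standard normal curve between 0 and z, as listed in a z table.

    By symmetry, the area between 0 and -z equals the area between 0 and z.
    """
    return 0.5 * erf(abs(z) / sqrt(2))

for z in (0.50, 1.00, 1.28, 1.50):
    print(f"z = {z:.2f}: area = {z_table_area(z):.4f}")
```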
Example 16
1. Find the area (probability) under the standard normal curve by using the z table:
Note: For each of the questions below, locate the area under the curve in Figure 16.1
for the given range of z values.
[Figure 16.1 The Standard Normal Distribution (µ = 0)]
Exercise 16
Synopsis
Explain the use and features of the standard normal (z) distribution
Topic outline
Explain how normal probabilities are calculated using the z distribution for
any interval of values of normal random variable
Reading Text:
The standard normal distribution is very useful for determining probabilities for any
normally distributed random variable. The basic procedure is to find the z value for a
particular value of the random variable based on the mean and standard deviation of its
distribution.
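The z value can be found using the conversion formula: $z = \dfrac{x - \mu}{\sigma}$. Then,
using the z value, we can use the standard normal distribution to find the required probability.
P(100 < x < 130) = ?
Convert x to z: $z_{1} = \dfrac{100 - 120}{20} = -1.00$ and $z_{2} = \dfrac{130 - 120}{20} = 0.50$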
P (-1.00<z<0.50) = 0.3413+0.1915=0.5328
Note: From the z table, the area in question under the curve can be found as follows:
the area between 0 and -1 is 0.3413, which is the same as the area between 0 and 1; and
the area between 0 and 0.50 is 0.1915.
P(140<x<150)=P(1.00<z<1.50)=0.4332-0.3413=0.0919
Note: the values are already converted to the z form above; to find the area in
question, we take the difference between the area between 0 and 1.50 (0.4332) and
the area between 0 and 1.00 (0.3413).
e. If there are 50 examinees, how many will have IQ score below 100 approximately?
P(x<100)=P(z<-1)= 0.5-0.3413=0.1587
0.1587 x 50≈ 8 examinees
Note: the area between 0 and -1 is 0.3413, and the left half of the curve is 0.5; thus
the shaded area below z = -1 is the difference 0.5 - 0.3413 = 0.1587. To find the
number of examinees with an IQ below 100, we multiply this probability by the
number of examinees.
Note: as 120 is the mean, its value in z form is 0, and the upper limit of the range
converts to z = 4.00. The area designated by this range covers the lion's share of the right
half of the normal curve, so the answer is almost one half. We cannot say exactly 0.5,
because the indicated area extends only as far as z = 4.00, while the z values extend
infinitely on both sides.
g. Practically almost all of the IQ score (99.72%) is between which two scores?
µ±3σ=120±3*20=120±60=60-180
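The IQ answers above can be checked with `statistics.NormalDist` from Python's standard library (3.8+). The question statement is not shown here; we assume, from the worked conversions, µ = 120 and σ = 20. Exact values may differ from the table answers in the last digit because the table rounds z.

```python
from statistics import NormalDist

iq = NormalDist(mu=120, sigma=20)   # assumed parameters, inferred from the worked answers

p_100_130 = iq.cdf(130) - iq.cdf(100)   # P(100 < x < 130)
p_140_150 = iq.cdf(150) - iq.cdf(140)   # P(140 < x < 150)
p_below_100 = iq.cdf(100)               # P(x < 100)
p_60_180 = iq.cdf(180) - iq.cdf(60)     # mu +/- 3 sigma

print(round(p_100_130, 4), round(p_140_150, 4), round(p_below_100, 4), round(p_60_180, 4))
print("examinees below 100 (out of 50):", round(p_below_100 * 50))
```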
Exercise 17
2. The service life of tyres for heavy-duty trucks follows the normal
distribution with mean 50,000 km and standard deviation 5,000 km.
a. Find the probability that a tyre will last:
i. between 47,000 km and 60,000 km
ii. at least 48,500 km
iii. more than 48,000 km
iv. at most 60,000 km
v. above 70,000 km
b. What are the median and modal service life kilometers of tyres?
The probability for any range of values of a normal random variable can be found from the
standard normal (z) distribution table after converting the values with the formula $z = \dfrac{x - \mu}{\sigma}$
How is probability found for any range of values of normal random variable from
the z distribution (table)?
Topic outline
How should the normal probability be used to find unknown value of x given
probability (area)?
Reading Text:
So far, we have found probabilities for a given range of values of a normal random
variable by converting (transforming) x to z. Conversely, this section discusses the
inverse transformation from z to x, given a probability. A further application of the
normal distribution involves finding the value of the observation x when the area or
probability (percent) above or below the observation is given. To find the unknown x
(the value of the normal random variable), follow the steps below:
Draw pictures of the normal distribution in question and of the standard normal
distribution.
Shade the area corresponding to the desired probability.
From the table of the standard normal distribution, find the z value or values
corresponding to the area or probability.
Use the inverse transformation from z to x to get the value(s) of the original random
variable as follows: $x = \mu + z\sigma$
Example 18
At Wollega University, an analysis of the final test scores for Managerial Statistics
reveals that the scores follow the normal probability distribution. The mean of the
distribution is 75 and the standard deviation is 8.
Required:
a. If the instructor wants to award an A to students whose scores are in the highest
10%, above which score will students get an A?
b. Find the limits of the central/average 50% of scores obtained by the students.
c. If students with the lowest 2% of scores are given an F, below which score is an F
assigned?
d. 70% of the scores are below which point?
e. 95.44% of the scores are between which two points?
Solution: Given: µ=75, σ=8;
a. Area (probability) = 10% = 0.1. To get the corresponding z value from the table:
0.5 - 0.1 = 0.4000; z0.4000 = 1.28
[Figure: normal curve with the upper 10% tail shaded to the right of the unknown X, at z = 1.28]
X=µ+zσ=75+1.28*8=75+10.24=85.24
b. Area (probability)=0.5 at the center; z0.25=±0.67
X=µ+zσ=75±0.67*8=75±5.36=69.64-80.36
c. Area (probability)=0.02; 0.5-0.02=0.4800; z0.4800=-2.05
[Figure: lower-tail area of 0.02 to the left of z = -2.05, with area 0.4800 between z = -2.05 and 0]
X=µ+zσ=75-2.05*8=75-16.4=58.6
d. Area (probability)=0.7 (0.5+0.2); z0.2000=0.52
X=µ+zσ=75+0.52*8=75+4.16=79.16
e. For a normal distribution 95.44% of the distribution is within µ±2σ; thus,
75±2*8=75±16=the scores are between 59 and 91
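For Example 18, the inverse transformation x = µ + zσ corresponds to `NormalDist.inv_cdf` in Python's standard library, which takes the cumulative area below x (so the top 10% corresponds to 0.90). This is a sketch; results differ slightly from the table answers because `inv_cdf` uses unrounded z values (for example 1.2816 instead of 1.28).

```python
from statistics import NormalDist

scores = NormalDist(mu=75, sigma=8)

a = scores.inv_cdf(0.90)                              # a. top 10%: 90% of scores lie below
b = (scores.inv_cdf(0.25), scores.inv_cdf(0.75))      # b. central 50%
c = scores.inv_cdf(0.02)                              # c. lowest 2% (F)
d = scores.inv_cdf(0.70)                              # d. 70% of scores lie below
e = (scores.inv_cdf(0.0228), scores.inv_cdf(0.9772))  # e. central 95.44%

print(f"a. {a:.2f}")
print(f"b. {b[0]:.2f} to {b[1]:.2f}")
print(f"c. {c:.2f}")
print(f"d. {d:.2f}")
print(f"e. {e[0]:.2f} to {e[1]:.2f}")
```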
Exercise 18
1. The average age of a person married for the first time is 26 years. Assume the
ages of the first marriages have a normal distribution with a standard deviation
of 4 years.
a. Before what age do 90% of people get married for the first time?
b. If the 20% of people who marry for the first time at the latest ages are given
a counseling service by a certain NGO, what is the minimum age that qualifies
for the service?
c. Find the limits of the ages of the central/average 50% of people getting
married for the first time.
d. The 5% of people who marry for the first time at the earliest ages are
found below which age?
e. Practically almost all people getting married for the first time do so
between which two ages?
2. Individuals adopt new products, innovations or ideas at different times and rates.
People can be classified into 5 adopter categories as shown in Figure 18.1. As
shown by the normal curve, after a slow start, an increasing number of people
adopt the new product. The number of adopters reaches a peak and then drops
off as fewer non-adopters remain. As successive groups of consumers adopt the
innovation (the rising curve), it eventually reaches its saturation level. Innovators
are defined as the first 2.5 percent of buyers to adopt a new idea; the early
adopters are the next 13.5 percent; and so forth. Suppose the mean adoption
time for a new product is 1800 days with a standard deviation of 200 days,
answer the following based on Figure 18.1.
a. Innovators will adopt the new product as late as how many days?
b. Laggards will adopt the new product after how many days?
c. The middle majority (early majority and late majority together) are
expected to adopt the new product within upper and lower limits of how
many days?
d. Early adopters in the population are expected to start adopting the new
product between which two days?
e. Indicate how many standard deviations each group's adoption time is
from the mean adoption time.
f. 68.26% of the population adopt the new product between which two
days?
[Figure 18.1 Adopter categories A to E under a normal curve of adoption time]
Explain how the inverse transformation is used to find unknown value of random
variable (x) given probability.
Find unknown mean and standard deviation for a normal distribution given
two arbitrary values and corresponding area under the normal curve
Topic outline
Sometimes the mean and the standard deviation of a normal distribution may not be
given. In such situations the given probability of the two values of normal random
variables (x1 and x2) are used to compute the mean and the standard deviation. From
the given probabilities, find the corresponding z values, formulate the following pair
of equations, and solve them simultaneously to find the unknown mean and standard
deviation:
$x_{1} = \mu + z_{1}\sigma$
$x_{2} = \mu + z_{2}\sigma$
N.B.: the mean and standard deviation are the same for the pair of equations because
the values and probabilities belong to the same normal distribution.
Example 19
The heights of soldiers are normally distributed. If 11.51% of the soldiers are taller than
70.4 inches and 9.68% are shorter than 65.4 inches, find the mean & standard
deviation for the data of heights of soldiers.
From area (probability) of 0.1151 on the right extreme of a normal curve, 0.5-
0.1151=0.3849; Z0.3849=1.20;
From area (probability) of 0.0968 on the left extreme of a normal curve, 0.5-
0.0968=0.4032, Z0.4032=-1.30
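The remaining step of Example 19 is to solve 70.4 = µ + 1.20σ and 65.4 = µ - 1.30σ simultaneously. A minimal sketch (the helper name `solve_mean_sd` is our own):

```python
def solve_mean_sd(x1: float, z1: float, x2: float, z2: float) -> tuple[float, float]:
    """Solve x1 = mu + z1*sigma and x2 = mu + z2*sigma simultaneously."""
    sigma = (x1 - x2) / (z1 - z2)   # subtracting the equations eliminates mu
    mu = x1 - z1 * sigma            # back-substitute into the first equation
    return mu, sigma

mu, sigma = solve_mean_sd(x1=70.4, z1=1.20, x2=65.4, z2=-1.30)
print(f"mean = {mu:.2f} inches, standard deviation = {sigma:.2f} inches")
```

Subtracting the second equation from the first gives 5 = 2.5σ, so σ = 2 and µ = 70.4 - 1.20 × 2 = 68 inches.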
Exercise 19
1. Construction time for a certain building is normally distributed with unknown mean
and unknown variance. We do know that 75% of the time construction takes at most
12 months, and 45% of the time it takes at most 10 months. Find the mean and
standard deviation.
2. The annual sales are normally distributed with unknown mean and unknown
standard deviation. 40% of the time sales are more than 480,000 and 10% of
the time sales are more than 500,000. What are the mean and the standard
deviation?
Synopsis
Unknown mean and standard deviation can be found for a normal probability
distribution given 2 values of a normal random variable and related probabilities
by solving the following equations simultaneously: $x_{1} = \mu + z_{1}\sigma$ and $x_{2} = \mu + z_{2}\sigma$
Explain how unknown mean and standard deviation could be found given
probabilities and related values of normal random variable.
Topic outline
State the conditions when the binomial and Poisson probability distributions
are better approximated by the normal probability distribution
Reading Text:
Rule: Suppose ‘a’ represents certain discrete value; adjustment using the continuity
correction factor is made as follows:
for x > a, add 0.5; thus it will be x > a+0.5
for x ≤ a, add 0.5; thus it will be x ≤ a+0.5
for x < a, subtract 0.5; thus it will be x< a-0.5
for x ≥ a, subtract 0.5; thus it will be x ≥ a-0.5
Using the binomial approach, find the mean µ = np and the standard deviation
$\sigma = \sqrt{npq}$; finally, find the z score and compute the probability.
Example 20
1. It is believed that 30% of all stolen cars are recovered and returned to the
owners. In a month when 200 cars are stolen,
a. What is the expected number of cars that are recovered and returned?
µ=n.p=200*0.3=60
b. What is the variance and standard deviation of the cars that are recovered
and returned?
Variance = n.p.q = 200*0.3*0.7 = 42; S.dev. = $\sqrt{42}$ = 6.48
c. What is the probability (applying normal approximation to the
binomial) that:
i. Fewer than 65 cars will be recovered and returned to their owners?
Note: n = 200, P = 0.3, Q = 0.7; thus n > 20 and np, nq ≥ 5 are satisfied, and
the binomial problem can be approximated using the normal distribution.
P(x<65); adjust using the continuity correction factor (subtract 0.5): P(x<65-0.5)
= P(x<64.5); convert to z: z = (64.5-60)/6.48 = 0.69
P(z<0.69) = 0.5+0.2549 = 0.7549
ii. More than 75 cars will be recovered and returned to their owners?
P(x>75); adjust: P(x>75+0.5) = P(x>75.5); z = (75.5-60)/6.48 = 2.39
P(z>2.39)=0.5-0.4916=0.0084
iii. Exactly 70 cars recovered and returned to their owners?
P(x=70) is the same as P(x≥70 and x≤70); adjust P(x≥70-0.5 and
x≤70+0.5)
P(69.5≤x≤70.5)=P(1.47≤z≤1.62)=0.4474-0.4292=0.0182
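A sketch of the normal approximation with the continuity correction for Example 20, using `statistics.NormalDist` from Python's standard library. Small differences from the table answers come from rounding z to two decimals in the table.

```python
from math import sqrt
from statistics import NormalDist

n, p = 200, 0.3
mu = n * p                        # 60
sigma = sqrt(n * p * (1 - p))     # sqrt(42), about 6.48
approx = NormalDist(mu, sigma)

# Continuity correction: x < a -> x < a - 0.5; x > a -> x > a + 0.5; x = a -> a - 0.5 to a + 0.5
fewer_than_65 = approx.cdf(64.5)
more_than_75 = 1 - approx.cdf(75.5)
exactly_70 = approx.cdf(70.5) - approx.cdf(69.5)
print(round(fewer_than_65, 4), round(more_than_75, 4), round(exactly_70, 4))
```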
When the mean of a Poisson distribution is relatively large, the normal probability
distribution can be used to approximate it. Rule: the approximation is good when µ ≥ 10.
2. The average number of calls for service received by a machine repair shop
per 8-hour shift is 10. What is the probability that more than 15 calls will be
received during a randomly selected 8-hour shift?
µ = 10; for the Poisson distribution, standard deviation = $\sqrt{\mu} = \sqrt{10}$ = 3.16
P(x>15); adjust: P(x>15.5); convert to z: z = (15.5-10)/3.16 = 1.74
P(z>1.74) = 0.5-0.4591 = 0.0409
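The Poisson case can be sketched the same way, with σ = √µ. For comparison, the code also sums the exact Poisson probabilities; this comparison is our addition, not part of the original example, and shows how close the approximation is at µ = 10.

```python
from math import exp, factorial, sqrt
from statistics import NormalDist

mu = 10                                  # mean calls per 8-hour shift
approx = NormalDist(mu, sqrt(mu))        # Poisson variance equals its mean
p_more_than_15 = 1 - approx.cdf(15.5)    # continuity correction: x > 15 -> x > 15.5

# Exact Poisson tail for comparison: 1 - P(x <= 15)
exact = 1 - sum(mu**x * exp(-mu) / factorial(x) for x in range(16))
print(f"normal approx: {p_more_than_15:.4f}, exact Poisson: {exact:.4f}")
```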
Exercise 20
1. For an airline, each seat on a flight is occupied 80% of the time. If a particular
airplane has 100 seats:
a. What is the expected number of occupied seats?
b. What is the variance and standard deviation of the occupied seats?
c. What is the probability (applying normal approximation to the binomial)
that:
i. More than 85 seats will be occupied
ii. At most 88 seats will be occupied
iii. Between 76 and 82 seats will be occupied
iv. Exactly 88 seats will be occupied
2. Patients arrive at a hospital at an average rate of 25 per day. What is the probability
that more than 22 patients will arrive in a day? Assume arrivals follow a Poisson
distribution.
Synopsis
Binomial probability distribution can best be approximated by the normal
probability distribution when np≥5 and nq≥5; and n>20
Poisson probability distribution can best be approximated by the normal
probability distribution when µ ≥10
o Continuity correction factor (adding or subtracting 0.5) is used to adjust
discrete values into continuous forms
o Using the mean and standard deviation of each distribution, the z value is computed
and, hence, the probability is found