Unit-2 - Random Variables and Probability Distributions - Jan2025
Continuous random variable: If the random variable X can assume an infinite, uncountable
set of values, it is said to be a continuous random variable.
For example, the age of students admitted into the Mac Data Science program, or the height or
weight of students in the School of Science, are all continuous random variables.
Generally, discrete random variables represent counted data, while continuous random
variables represent measured data.
Probability distribution
The set of all possible values of a random variable and their associated
probabilities is called a probability distribution.
Probability Mass Function (PMF)
A probability mass function (PMF) is a function that describes the probability distribution of a
discrete random variable.
It assigns probabilities to each possible outcome of the random variable, such that the sum of
all probabilities is equal to 1.
The PMF is typically denoted by P(X = x_i), where X is the random variable and x_i is one of
its possible values.
The value of the PMF at x_i gives the probability that X takes on the value x_i.
For example, suppose we toss a fair coin twice, and let X be the number of heads that come
up. Then X can take on the values 0, 1, or 2. The PMF of X is given by:
x          0     1     2
P(X = x)   1/4   1/2   1/4
This PMF tells us that the probability of getting no heads is 1/4, the probability of getting one
head is 1/2, and the probability of getting two heads is 1/4.
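One way to check a PMF like this is to enumerate the equally likely outcomes and count heads. The Python sketch below (Python is our choice here; exact fractions come from the standard `fractions` module) rebuilds the two-toss PMF, assuming both tosses are fair and independent.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 4 equally likely outcomes of two tosses: HH, HT, TH, TT
outcomes = list(product("HT", repeat=2))

# Count how many outcomes give each number of heads
counts = Counter(seq.count("H") for seq in outcomes)

# P(X = x) = (outcomes with x heads) / (total outcomes)
pmf = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}
print(pmf)
```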
Probability Mass Function
• 0 ≤ p_i ≤ 1 for every i, and
• $\sum_i p_i = 1$
Probability Mass Function
[Figure: bar chart of a PMF f(x), with bar heights 0.5, 0.3 and 0.2]
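Cumulative distribution function: $F(x) = P(X \le x)$ and $F(x - 1) = P(X \le x - 1)$, so for an
integer-valued X, $P(X = x) = F(x) - F(x - 1)$.
$\text{Mean} = \sum x\,p(x)$
$\text{Variance} = \sum x^2 p(x) - (\text{Mean})^2$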
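The CDF, mean and variance formulas above can be sketched in Python for the two-toss PMF (X = number of heads). The helper name `cdf` is ours, not part of any library; exact fractions avoid rounding.

```python
from fractions import Fraction

# PMF of X = number of heads in two tosses of a fair coin
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def cdf(pmf, x):
    """F(x) = P(X <= x): add the PMF over all values not exceeding x."""
    return sum(p for v, p in pmf.items() if v <= x)

mean = sum(x * p for x, p in pmf.items())                    # sum of x p(x)
variance = sum(x**2 * p for x, p in pmf.items()) - mean**2    # sum of x^2 p(x) - Mean^2

# For an integer-valued X, P(X = x) = F(x) - F(x - 1)
recovered = {x: cdf(pmf, x) - cdf(pmf, x - 1) for x in pmf}
print(mean, variance)
```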
Problem: You draw 5 cards from a standard deck of 52 cards without replacement. Let X
denote the number of aces in your hand. Find the probability mass function describing the
distribution of X.
We have $\binom{52}{5}$ ways of picking 5 cards from the deck of 52.
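Counting hands this way leads to the hypergeometric PMF $P(X = x) = \binom{4}{x}\binom{48}{5-x} / \binom{52}{5}$ for x = 0, 1, …, 4. A minimal Python sketch, assuming a standard deck with 4 aces and 48 other cards (the function name `aces_pmf` is illustrative):

```python
from math import comb

def aces_pmf(x):
    """P(X = x): choose x of the 4 aces and 5 - x of the 48 non-aces."""
    return comb(4, x) * comb(48, 5 - x) / comb(52, 5)

pmf = {x: aces_pmf(x) for x in range(5)}
for x, p in pmf.items():
    print(x, round(p, 4))
```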
Let X be a continuous random variable taking values on the interval [a, b].
A function f(x) is said to be the probability density function (PDF) of the continuous random variable X if it satisfies
the following properties:
• $f(x) \ge 0$ for all x (f(x) is not itself the probability P(X = x), which is zero for a continuous variable)
• $P(a \le X \le b) = \int_a^b f(x)\,dx = 1$
The PDF is defined as the derivative of the cumulative distribution function (CDF) of the random variable.
In other words, if F(x) is the CDF of a continuous random variable X, then the PDF of X is given by:
$f(x) = \frac{d}{dx}F(x)$
The value of the PDF at any point x gives the rate at which the CDF changes at that point. The area under
the PDF curve between any two points gives the probability that the random variable falls within that interval.
Probability Density Function
Find $P\left(X \le \tfrac{2}{3} \mid X > \tfrac{1}{3}\right)$.
Solution:
Note:
1. For a continuous random variable, the probability at a point is always zero, i.e.,
P(X = c) = 0 for every single value c. Hence, for a continuous random variable we always
talk of probabilities over an interval and not at a point (which is always zero).
2. In particular, for an interval with end points c and d, we have P(X = c) = 0 and P(X = d) = 0.
3. Therefore P(c ≤ X ≤ d) = P(c < X ≤ d) = P(c ≤ X < d) = P(c < X < d). Hence, for a continuous
random variable it does not matter whether one or both end points of the interval (c, d) are
included.
[Figure: PDF f(x) plotted over 49.5 ≤ x ≤ 50.5]
Probability Density Function
Problem: Check whether the following is a valid PDF.
f(x) = 1.5 − 6(x − 50.0)^2 for 49.5 ≤ x ≤ 50.5
f(x) = 0, elsewhere
Solution: f(x) ≥ 0 on the given range (it equals 0 at both end points and 1.5 at x = 50.0), so the PDF
is valid if the total area under it over the range is 1.
$\int_{49.5}^{50.5} \left(1.5 - 6(x - 50.0)^2\right)dx = \left[1.5x - 2(x - 50.0)^3\right]_{49.5}^{50.5}$
$= \left[1.5 \times 50.5 - 2(50.5 - 50.0)^3\right] - \left[1.5 \times 49.5 - 2(49.5 - 50.0)^3\right]$
$= 75.5 - 74.5 = 1.0$
• Since f(x) ≥ 0 and the total probability in the specified range is equal to 1, the above density
function is a valid PDF.
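The same check can be done numerically. A Python sketch using composite Simpson's rule (our choice of method; it is exact for a quadratic integrand up to rounding), applied to the PDF centred at 50.0 as in the integration above:

```python
def f(x):
    """Candidate PDF: 1.5 - 6(x - 50)^2 on [49.5, 50.5], 0 elsewhere."""
    return 1.5 - 6 * (x - 50.0) ** 2 if 49.5 <= x <= 50.5 else 0.0

def simpson(g, a, b, n=1000):
    """Composite Simpson's rule with an even number n of subintervals."""
    h = (b - a) / n
    inner = sum((4 if i % 2 else 2) * g(a + i * h) for i in range(1, n))
    return (g(a) + g(b) + inner) * h / 3

area = simpson(f, 49.5, 50.5)
# Spot-check non-negativity on a fine grid over the support
nonnegative = all(f(49.5 + i / 1000) >= 0 for i in range(1001))
print(round(area, 6), nonnegative)
```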
Probability Density Function
Find the probability that a metal cylinder has a diameter between 49.8 and 50.1 mm.
• $f(x) = \frac{dF(x)}{dx}$
• $P(a < X \le b) = P(X \le b) - P(X \le a) = F(b) - F(a)$
• $P(a \le X \le b) = P(a < X \le b)$
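Applying $P(a < X \le b) = F(b) - F(a)$ to the cylinder question gives the sketch below. It assumes the diameter follows the PDF $1.5 - 6(x - 50.0)^2$ on [49.5, 50.5] from the previous problem (the question does not restate it), and it also checks numerically that $f(x) = dF(x)/dx$ at x = 50.

```python
def F(x):
    """CDF for the PDF f(x) = 1.5 - 6(x - 50)^2 on [49.5, 50.5]."""
    def G(t):
        # An antiderivative of f: 1.5t - 2(t - 50)^3
        return 1.5 * t - 2 * (t - 50.0) ** 3
    if x <= 49.5:
        return 0.0
    if x >= 50.5:
        return 1.0
    return G(x) - G(49.5)   # shift so that F(49.5) = 0

p = F(50.1) - F(49.8)       # P(49.8 < X <= 50.1) = F(b) - F(a)

# f(x) = dF(x)/dx: a central difference at x = 50 should be close to f(50) = 1.5
h = 1e-6
slope = (F(50.0 + h) - F(50.0 - h)) / (2 * h)
print(round(p, 3), round(slope, 4))
```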
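Mean of a continuous random variable X is given by
$E(X) = \mu = \int_{-\infty}^{\infty} x f(x)\,dx$
and its variance by
$\mathrm{Var}(X) = \sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu^2$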
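Both variance formulas should agree. A Python sketch, again assuming the cylinder PDF $1.5 - 6(x - 50.0)^2$ on [49.5, 50.5] and using Simpson's rule (an illustrative choice) for the integrals; by symmetry the mean is 50, and the variance works out to 0.05 mm².

```python
def f(x):
    """Cylinder-diameter PDF: 1.5 - 6(x - 50)^2 on [49.5, 50.5], 0 elsewhere."""
    return 1.5 - 6 * (x - 50.0) ** 2 if 49.5 <= x <= 50.5 else 0.0

def simpson(g, a, b, n=1000):
    """Composite Simpson's rule with an even number n of subintervals."""
    h = (b - a) / n
    inner = sum((4 if i % 2 else 2) * g(a + i * h) for i in range(1, n))
    return (g(a) + g(b) + inner) * h / 3

mu = simpson(lambda x: x * f(x), 49.5, 50.5)                          # E(X)
var_def = simpson(lambda x: (x - mu) ** 2 * f(x), 49.5, 50.5)         # integral of (x - mu)^2 f(x)
var_short = simpson(lambda x: x ** 2 * f(x), 49.5, 50.5) - mu ** 2    # E(X^2) - mu^2
print(round(mu, 6), round(var_def, 6), round(var_short, 6))
```

The first form avoids subtracting two large, nearly equal numbers, so it is usually the safer one numerically.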
Problems:
1. Check whether the following is a probability distribution.
X 0 1 2 3 4 5
P(X = x) = p(x) 0.12 0.26 0.28 0.19 0.13 0.07
2. For an experiment of simultaneous toss of three fair coins, obtain probability distribution
for the random variable representing number of heads in three tosses.
3. Two dice are rolled at random. Obtain the probability distribution of the sum of the
numbers on them.
4. Five defective mobile phones were accidentally mixed with twenty good ones and by
looking at them it is not possible to differentiate between them. Four phones are drawn
at random from the lot. Find the probability distribution of X, the number of defective
phones.
Mathematical Expectation of a Random Variable
• Mathematical expectation is an important concept in summarizing
characteristics of probability distributions.
• Mathematical expectation of a discrete random variable with PMF P(X = x_i) = p_i is defined
as:
$E(X) = \sum_i x_i p_i$
• Mathematical expectation of a continuous random variable with PDF f(x) is defined as:
$E(X) = \int x f(x)\,dx$
• The expected value of a random variable is also called the mean of the random variable.
• The expected value E(X) is a weighted mean of X, where the weights are the probabilities p_i.
8. Var(X + b) = Var(X)
9. Covariance between X and Y is: Cov(X, Y) = E(XY) − E(X)E(Y)
Example: The number of drugs prescribed for each of the 100 patients who visited a
hospital is summarized below. Find the mean and standard
deviation.
6. A car rental company has two types of cars available for rent: economy and luxury.
The probability that a customer chooses an economy car is 0.65, and the probability
that they choose a luxury car is 0.35. The rental price for an economy car is Rs 4000
per day, and the rental price for a luxury car is Rs 7000 per day. What is the average
rental price per day?
7. A casino offers a game where you can bet Rs 1000 on a single roll of a die. If the die
shows a 1 or 2, you win Rs 2000. If it shows a 3, 4, or 5, you win Rs 1000. If it shows
a 6, you lose your bet. What is your expected return on a Rs 1000 bet?
8. A study has shown that the probability distribution of X, the number of customers in
line (including the one being served, if any) at a checkout counter in a department
store, is given by P(X = 0) = 0.2, P(X = 1) = 0.25, P(X = 2) = 0.15, P(X = 3) = 0.25, P(X ≥ 4) = 0.15.
Consider a newly arriving customer to the checkout line. (a) What is the probability
that this customer will not have to wait behind anyone? (b) What is the probability
that this customer will have to wait behind at least one customer? (c) On average, the
newly arriving customer will have to wait behind how many other customers?
A discrete probability distribution is the probability distribution of a discrete
random variable.
• The Bernoulli distribution has only two possible outcomes and a single trial.
• The probability of heads can be taken as p and that of tails as (1 − p) (the
probabilities of mutually exclusive events that together cover all possible outcomes
must sum to one).
• Binomial distribution describes the possible number of times that a particular event
will occur in a sequence of observations or experiments or trials.
• The event is coded as binary: it either occurs or does not occur.
• The binomial distribution is used when a researcher is interested in the occurrence
of an event.
• For example, in a clinical trial, a patient may survive or die. The researcher studies
the number of survivors, and not how long the patient survives after treatment.
• Another example is whether an electronic item is working or not. Here, the binomial
distribution describes the number of items in working condition, and not how long
they worked.
• Other situations in which binomial distributions arise are quality control, public
opinion surveys, medical research, insurance problems, etc.
The binomial distribution is the probability distribution of a random variable X, defined as the
number of successes in n trials. The probability mass function (pmf) is
$P(X = x; n, p) = \binom{n}{x} p^x q^{n-x}$ for x = 0, 1, 2, …, n
= 0, otherwise
Mean = E(X) = np
Variance = V(X) = npq
where p = probability of success
q = 1 − p = probability of failure
x = number of successes
n = number of trials
X = random variable representing number of successes
$\binom{n}{x} = \frac{n!}{x!(n-x)!}$ = the number of combinations of n items taken x at a time.
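A small Python sketch of the pmf, checking that its mean and variance match np and npq. The values n = 10 and p = 0.3 are illustrative only, and `binom_pmf` is our helper name.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x; n, p) = C(n, x) p^x q^(n - x), and 0 outside 0..n."""
    if not 0 <= x <= n:
        return 0.0
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 10, 0.3   # illustrative values, not from the notes
probs = [binom_pmf(x, n, p) for x in range(n + 1)]
mean = sum(x * px for x, px in enumerate(probs))
var = sum(x**2 * px for x, px in enumerate(probs)) - mean**2
print(round(mean, 6), round(var, 6))   # compare with np = 3 and npq = 2.1
```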
1. There are only two possible outcomes (success or failure) for each trial.
• When p < 0.5, it is skewed to the right (positively skewed); when p > 0.5, it is
skewed to the left (negatively skewed); when p = 0.5, it is symmetric.
• For the binomial distribution with parameters n and p, the variance cannot exceed
n/4, because npq = np(1 − p) and p(1 − p) ≤ 1/4.
Example: If a coin is tossed 4 times, then we may obtain 0, 1, 2, 3, or 4 heads. We
may also obtain 4, 3, 2, 1, or 0 tails, but these outcomes are equivalent to 0, 1, 2, 3, or
4 heads. What is the probability of getting a maximum of two heads?
The sample space = { TTTT, TTTH, TTHT, THTT, HTTT, TTHH, THTH, THHT, HHTT,
HTHT, HTTH, THHH, HTHH, HHTH, HHHT, HHHH }
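Let X, the number of heads, be the random variable; X follows a binomial distribution with
n = 4 and p = 1/2.
$P(X \le 2) = \sum_{x=0}^{2} \binom{n}{x} p^x (1 - p)^{n-x}$
x   Number of outcomes   P(X = x)
0   1                    1/16
1   4                    4/16
2   6                    6/16
Hence P(X ≤ 2) = 1/16 + 4/16 + 6/16 = 11/16.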
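The sample-space count and the binomial formula can be cross-checked in a few lines of Python (exact fractions; the variable names are ours):

```python
from fractions import Fraction
from itertools import product
from math import comb

# Enumerate the 16 equally likely sequences of 4 tosses
seqs = ["".join(s) for s in product("HT", repeat=4)]
favourable = sum(1 for s in seqs if s.count("H") <= 2)
p_enum = Fraction(favourable, len(seqs))

# Same probability from the binomial formula with n = 4, p = 1/2
p_formula = sum(comb(4, x) * Fraction(1, 2) ** 4 for x in range(3))
print(p_enum, p_formula)
```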
It is said to be a positively skewed distribution if the tail is on the right, and a
negatively skewed distribution if the tail is on the left.
Case-2: If (n + 1)p is not an integer, then there exists a unique mode, namely the integral part
of (n + 1)p.
For example, if (n + 1)p = 67.33, which is not an integer, the unique mode is 67, the integral
part of (n + 1)p.
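A brute-force check of the Case-2 rule in Python: find the x with the largest binomial probability and compare it with the integral part of (n + 1)p. The values n = 10, p = 0.3 are illustrative; the helper `binomial_mode` (our name) raises an error when (n + 1)p is an integer, since Case-2 does not apply then. Floating-point products such as (n + 1)p may not be exact, so treat this as a sketch.

```python
from math import comb, floor

def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binomial_mode(n, p):
    """Case-2 rule: if (n + 1)p is not an integer, the mode is its integral part."""
    m = (n + 1) * p
    if m != int(m):
        return floor(m)
    raise ValueError("(n + 1)p is an integer, so Case-2 does not apply")

n, p = 10, 0.3                       # illustrative values: (n + 1)p = 3.3
brute = max(range(n + 1), key=lambda x: binom_pmf(x, n, p))
print(binomial_mode(n, p), brute)
```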
1. The binomial variate X lies within the range {0, 1, 2, 3, 4, 5, 6}, provided that
P(X = 2) = 9P(X = 4).
a) Find the parameter p of the binomial variate X.
b) Compute P(X > 4) and P(3 ≤ X ≤ 5)
2. A student obtained the following answer to a problem: mean = 3.4 and variance = 4.2
for a binomial distribution. Are the results correct?
3. In a binomial distribution with 7 independent trials, the probabilities of 3 and 4
successes are found to be 0.2903 and 0.1935 respectively. Find the probability
of success p.
4. An insurance agent sells life insurance policies to four men, all of identical age
and good health. According to the actuarial tables, the probability that a man of
this particular age will be alive 25 years hence is 4/5. Find the probability that in
25 years (a) all four men will be alive, (b) at least one man will be alive, (c) at
least 3 men will be alive, (d) at most three men will be alive.
5. Past data reveals that 65% of IT innovation startup companies reported a profit in their first year of
operation. If a sample of 20 new IT startups is selected, what is the probability that (a) exactly 12
startups report a profit, (b) at most 10 startups report a profit, (c) at least 14 startups report a
loss, in the first year?
6. Suppose 45 customers ordered consumer goods from an e-commerce portal. Order history says that
about 85% of orders are delivered correctly. Find the probability that exactly 20 customers' orders
were delivered correctly.
7. A random sample of 1,000 patients was examined to study severe effects due to medication.
The probability/proportion of having a severe effect is 1/5. Find the probability of the medication
having severe effects (i) in 150 patients, (ii) in at least 180 patients, and (iii) in at most 220
patients.
8. An appliance repairman services five laptops on site each day. One-third of the service calls
require installation of a particular part. The repairman has only one such part on his truck today.
(a) Find the probability that the one part will be enough today, that is, that at most one laptop he
services will require installation of this particular part. (b) Find the minimum number of such
parts he should take with him each day in order that the probability that he has enough for the
day's service calls is at least 95%.
9. It is known that approximately 10% of a population is hospitalized at least once during a
year. If 10 people in a population are interviewed, what is the probability that you will
find:
a) All have been hospitalized at least once during the year?
b) 50% have been hospitalized?
c) At least 3 have been hospitalized?
d) Exactly 3 have been hospitalized?
10. A hospital administrator has noticed that on 42 of the past 60 days the first surgery
scheduled in the operating rooms at the hospital has started at least 30 minutes late.
The administrator is not pleased and has issued a memo encouraging the hospital staff
to make every effort to begin surgeries on time. The administrator decides to monitor the
operating room performance for the next 10 days. Assuming that the memo had no effect,
calculate (i) the probability that the first surgery will start at least 30 minutes late on 4 of
the next 10 days, and (ii) the probability that on three or fewer of the 10 days the first
scheduled surgery will begin at least 30 minutes late.
11. Suppose we have 10 balls in a bowl; 3 of the balls are red and the rest are blue.
The experiment of drawing a ball with replacement is repeated 20 times, and getting
a red ball is defined as a success. What is the probability that the number of red
balls drawn would be: (i) five, (ii) eight, (iii) less than six, (iv) greater than 17? Also
compute the mean and standard deviation.
12. There are 10 multiple-choice questions, each with 5 choices, only one of which is
correct. A student is attempting to guess the answers. What is the probability that
the student will get: (i) exactly 6 questions correct, (ii) at least 8 questions correct,
and (iii) at most 5 questions correct?
13. Suppose that 20% of all devices manufactured by a company fail the life test. If
15 devices are randomly selected and tested, find the probability that: (i) at most 8
devices fail the test, (ii) exactly 7 fail, (iii) at least 11 fail, and (iv) between 4 and 7,
inclusive, fail.
If a public radio station solicits contributions during a late-night program, how many listeners
should it expect to call in if n = 1,000 are listening and p = 0.15? How large is the SD of the
number who call in?
Fitting of Binomial distribution
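• To fit any theoretical distribution, we should know its parameters and probability distribution.
Here, n = 5 and N = Σf = 100.
Mean = Σfx / Σf = 225/100 = 2.25 = np ⇒ 5p = 2.25 ⇒ p = 0.45; q = 1 − p = 0.55.
Use p and q to derive the probabilities theoretically; multiplying each probability by N = 100 gives
the expected frequency.
x | f  | fx | P(X = x)                                    | Expected frequency
0 | 12 | 0  | $\binom{5}{0}(0.45)^0(0.55)^5 = 0.050$       | 5.0
1 | 20 | 20 | $\binom{5}{1}(0.45)^1(0.55)^4 = 0.206$       | 20.6
2 | 25 | 50 | $\binom{5}{2}(0.45)^2(0.55)^3 = 0.337$       | 33.7
3 | 22 | 66 | $\binom{5}{3}(0.45)^3(0.55)^2 = 0.276$       | 27.6
4 | 16 | 64 | $\binom{5}{4}(0.45)^4(0.55)^1 = 0.113$       | 11.3
5 | 5  | 25 | $\binom{5}{5}(0.45)^5(0.55)^0 = 0.018$       | 1.8
Total | 100 | 225 |                                      | 100.0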
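The fitting steps can be sketched in Python with the frequencies from the table (standard library only; variable names are ours):

```python
from math import comb

x_vals = [0, 1, 2, 3, 4, 5]
freq = [12, 20, 25, 22, 16, 5]
n = 5
N = sum(freq)                                          # total frequency
mean = sum(x * f for x, f in zip(x_vals, freq)) / N    # sample mean
p = mean / n                                           # np = mean  =>  p = mean / n
q = 1 - p
expected = [N * comb(n, x) * p**x * q ** (n - x) for x in x_vals]
print([round(e, 1) for e in expected])
```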
Description of parameters
• x is a vector of numbers.
• p is a vector of probabilities.
• n is the number of observations.
• size is the number of trials.
• prob is the probability of success of each trial.
dbinom() function
• dbinom(x, size, prob) gives the probability mass P(X = x) at each point (R calls this the
density).
Example: Ram makes 65% of his free-throw attempts. If he shoots 16 free throws,
what is the probability that he makes exactly 12?
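As a cross-check of the dbinom() calculation outside R, here is a Python stand-in; `dbinom_py` is a hypothetical name that only mimics R's `dbinom(x, size, prob)` for the binomial pmf.

```python
from math import comb

def dbinom_py(x, size, prob):
    """Stand-in for R's dbinom(): P(X = x) for X ~ Binomial(size, prob)."""
    return comb(size, x) * prob**x * (1 - prob) ** (size - x)

p12 = dbinom_py(12, 16, 0.65)   # Ram: exactly 12 of 16 free throws
print(round(p12, 4))
```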
• pbinom(q, size, prob) gives the cumulative probability P(X ≤ q); that is, pbinom returns the
area to the left of (and including) a given value q in the binomial distribution.
Example: Abhinav flips a fair coin 9 times. What is the
probability that the coin lands on heads a maximum of 6 times?
#find the probability of at most 6 successes during 9 trials where the
#probability of success on each trial is 0.5
pbinom(6, size = 9, prob = 0.5)
Example: Mary flips a fair coin 9 times. What is the probability that the
coin lands on heads more than 5 times?
#find the probability of more than 5 successes during 9 trials where the
#probability of success on each trial is 0.5
pbinom(5, size = 9, prob = 0.5, lower.tail = FALSE)
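A Python stand-in for `pbinom(q, size, prob)`, assuming q is a whole number, reproduces both coin examples with exact fractions. `pbinom_py` is a hypothetical name, not an R or Python library function.

```python
from fractions import Fraction
from math import comb

def pbinom_py(q, size, prob):
    """Stand-in for R's pbinom(q, size, prob): P(X <= q)."""
    return sum(comb(size, x) * prob**x * (1 - prob) ** (size - x) for x in range(q + 1))

half = Fraction(1, 2)                      # exact arithmetic for a fair coin
at_most_6 = pbinom_py(6, 9, half)          # Abhinav: P(X <= 6)
more_than_5 = 1 - pbinom_py(5, 9, half)    # Mary: P(X > 5) = 1 - P(X <= 5)
print(at_most_6, float(at_most_6))
print(more_than_5, float(more_than_5))
```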
Examples:
▪ # of accidents that occur on a highway during a 1-week period
▪ # of customers visiting a bank during a 1-hour interval
▪ # of laptops sold at an electronics store during a given week
▪ # of breakdowns of a washing machine per year
▪ # of Emergency Department visits by an infant during the first year of life
▪ # of road accidents at a particular junction on a highway
▪ # of white blood cells found in a cubic centimeter of blood
Conditions
Consider the # of breakdowns of a washing machine per month
• Each breakdown is called an occurrence
• Occurrences are random in that they do not follow any pattern
(unpredictable)
• Occurrence is always considered with respect to an interval (time or
distance)
The Poisson distribution is based on four assumptions. We will use the term
"interval" to refer to either a time interval or an area, depending on the context of
the problem.
1. The probability of observing a single event over a small interval is approximately
proportional to the size of that interval.
2. The probability of two events occurring in the same narrow interval is negligible.
3. The probability of an event within a certain interval does not change over
different intervals.
Solution:
(i) x = 4 and λ = 6, then $p(4) = \frac{e^{-6}6^4}{4!} = 0.1339$.
(ii) x = 10 and λ = 12, then $p(10) = \frac{e^{-12}12^{10}}{10!} = 0.1048$.
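The two values can be checked with a direct Python implementation of the Poisson pmf (the helper name `poisson_pmf` is ours):

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """p(x) = e^(-lambda) lambda^x / x!"""
    return exp(-lam) * lam**x / factorial(x)

p4 = poisson_pmf(4, 6)      # part (i)
p10 = poisson_pmf(10, 12)   # part (ii)
print(round(p4, 4), round(p10, 4))
```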
Problem: The number of failures of a testing instrument from contamination particle on the
product is a Poisson random variable with a mean of 0.04 failures per hour.
a) What is the probability that the instrument does not fail in an 8-hour shift?
b) What is the probability of at least one failure in one 24-hour day?
Solution:
Here λ=0.04 per hour
a) Let X = number of failures in 8 hours. Then, X has a Poisson distribution with λ=8*0.04=0.32
$P(X = x) = \frac{e^{-\lambda}\lambda^x}{x!}$ ⇒
Probability that the instrument does not fail = $P(X = 0) = \frac{e^{-0.32}(0.32)^0}{0!} = 0.726$
0!
b) Let Y = number of failures in 24 hours. Then, Y has a Poisson distribution with λ=0.04*24=0.96
Probability of at least one failure =
$P(Y \ge 1) = 1 - P(Y = 0) = 1 - \frac{e^{-0.96}(0.96)^0}{0!} = 0.6171$
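The key step is scaling the hourly rate to the length of the interval before applying the pmf. A Python sketch of both parts:

```python
from math import exp

rate_per_hour = 0.04
lam_8h = rate_per_hour * 8      # lambda for an 8-hour shift
lam_24h = rate_per_hour * 24    # lambda for a 24-hour day

p_no_failure_8h = exp(-lam_8h)            # P(X = 0) = e^(-lambda)
p_at_least_one_24h = 1 - exp(-lam_24h)    # P(Y >= 1) = 1 - P(Y = 0)
print(round(p_no_failure_8h, 3), round(p_at_least_one_24h, 4))
```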
Problems
1. A car rental agency has seven cars which it hires out day by day. The number of
demands for a car on each day is distributed as a Poisson variate with mean 5.
Calculate the proportion of days on which (i) none of the cars is used, and (ii) some
demand is refused.
2. In a certain factory making optical lenses, there is a small chance 1/250 for any
one lens to be defective. The lenses are supplied in packets of 10. Use the Poisson
distribution to calculate the approximate number of packets containing: (a) no
defective, (b) one defective, (c) two defective, and (d) four defective lenses
respectively in a consignment of 20,000 packets.
3. A manufacturer of electronic chips knows that 3% of his product is defective. If he
sells chips in boxes of 100, and guarantees that not more than 5 chips will be
defective, what is the probability that a box will fail to meet the guaranteed quality?
Problem: It was found that the average breakdown rate of a particular part of the
diagnostic system is 2 per day. Assuming Poisson process, the hospital management
wants to determine the need for additional maintenance resource to cope with the
breakdowns in the machine shop.
If the hospital works 7 days per week, what is the probability of getting: a) exactly 10,
and b) more than 15 breakdowns in a week?
Solution:
Average number of breakdowns/week = λ = 2 × 7 = 14
(i) Probability of exactly 10 breakdowns in a week = $P(X = 10) = \frac{e^{-14}14^{10}}{10!} = 0.0663$
(ii) Probability of more than 15 breakdowns in a week:
$P(X > 15) = 1 - P(X \le 15) = 1 - \sum_{x=0}^{15} \frac{e^{-14}14^{x}}{x!} = 0.3306$
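Part (ii) needs the sum of 16 Poisson terms, which is tedious by hand. A Python sketch (the helper name `poisson_cdf` is ours):

```python
from math import exp, factorial

def poisson_cdf(k, lam):
    """P(X <= k) = sum of e^(-lambda) lambda^x / x! for x = 0..k."""
    return sum(exp(-lam) * lam**x / factorial(x) for x in range(k + 1))

lam = 2 * 7                      # 2 breakdowns/day * 7 days
p_exactly_10 = exp(-lam) * lam**10 / factorial(10)
p_more_than_15 = 1 - poisson_cdf(15, lam)
print(round(p_exactly_10, 4), round(p_more_than_15, 4))
```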
Problem: The average number of errors due to a particular bug occurring in a
minute is 0.0004. (i) What is the probability that no error will occur in 25
minutes? (ii) How long would the program need to run to ensure that there
will be a 97% chance that an error will show up to highlight this bug?
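Solution: (i) In 25 minutes, λ = 25 × 0.0004 = 0.01, so $P(X = 0) = e^{-0.01} \approx 0.990$.
(ii) In t minutes, λ = 0.0004t. We need $P(X \ge 1) = 1 - e^{-0.0004t} \ge 0.97$, i.e., $e^{-0.0004t} \le 0.03$,
so $t \ge -\ln(0.03)/0.0004 \approx 8766$ minutes ≈ 146 hours ≈ 6.1 days.
That is, the program needs to run for more than 6 days to ensure that there will be a
97% chance that an error will show up to highlight the bug.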
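Both parts can be computed directly; part (ii) solves $1 - e^{-0.0004t} \ge 0.97$ for t. A Python sketch:

```python
from math import exp, log

rate = 0.0004                        # errors per minute
p_none_25 = exp(-rate * 25)          # (i) lambda = 0.01 in 25 minutes

# (ii) smallest t with P(at least one error) = 1 - e^(-rate t) >= 0.97
t_minutes = -log(1 - 0.97) / rate
t_days = t_minutes / (60 * 24)
print(round(p_none_25, 4), round(t_minutes), round(t_days, 2))
```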
Problems on Poisson distribution:
1. Births in a hospital occur randomly at an average rate of 1.8 births per hour. What is the
probability of observing (i) 4 births in a given hour, (ii) 5 births in a given hour, and (iii)
at least 2 births in a given hour at the hospital?
2. An insurance company receives on average 3 claims every day. Find the probability that it will
receive: a) 4 claims on any given day, b) at least 2 claims a day, c) it receives 5 claims in two
days, and d) a maximum of 8 claims in two days
3. In the average year, there are 6.5 accidents occurring at a particular junction on the national
highway. Find the probability that: (a) at most 6 accidents occur in a year, (b) at least 10
accidents occur in a year, and (c) more than 8 accidents will occur at this junction during the
year 2022.
4. The number of bone fracture patients entering a hospital emergency room in one day has a
Poisson distribution with mean 6.8 patients. The hospital manager has decided to allocate
emergency room resources that are sufficient to comfortably cope with up to ten bone fracture
patients per day. What is the probability that on a given day these resources will be inadequate?
5. Phone calls arrive at the rate of 48 per hour at the reservation desk for a 5-star
hotel. Compute the probability of receiving:
6. During the period of time that a local university takes phone-in registrations, calls
come in at the rate of one every two minutes.
a. What is the expected number of calls in one hour?
b. What is the probability of three calls in five minutes?
c. What is the probability of no calls in a five-minute period?
7. If Y is a Poisson variable such that
P(Y = 2) = 9P(Y = 4) + 90 P(Y = 6), find the mean and variance of Y.
8. A video recording company has discovered that the number of defects on records appears to follow a
Poisson distribution with a mean equal to 0.7.
(i) What is the probability that a record selected at random will have defects between 3 and 5?
(ii) If the company sets a policy that all video records sold to customers must not have any defects,
what per cent of its records production will not be made available for sales because of defects?
9. A computing system manager states that the rate of interruptions to the internet service is 0.4 per
week. Use the Poisson distribution to find the probability of
(a) two interruptions in 3 weeks
(b) at least two interruptions in 4 weeks
(c) at most one interruption in 10 weeks.
10. A consulting engineer receives, on average, 0.8 requests per week. If the number of requests
follows a Poisson process, find the probability that
(a) in a given week, there will be at least 2 requests;
(b) in a given 4-week period, there will be at most 3 requests.
[Figure: sample Poisson distribution with λ = 2, P(X) plotted against X]
Poisson approximation to Binomial Distribution
• If the number of Bernoulli trials n is more than 20 and the probability of success p is less
than 0.05, the binomial probabilities may be well approximated by a Poisson random variable
with λ = np instead of the binomial distribution.
• When n ≥ 100 and np < 10, the approximation will generally be excellent.
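To see the approximation at work, the sketch below compares the exact binomial probability with the Poisson value for illustrative values n = 200, p = 0.01, x = 3 (not taken from the notes), where n ≥ 100 and np = 2 < 10:

```python
from math import comb, exp, factorial

n, p, x = 200, 0.01, 3     # illustrative values only
binom = comb(n, x) * p**x * (1 - p) ** (n - x)
lam = n * p                # Poisson parameter lambda = np
poisson = exp(-lam) * lam**x / factorial(x)
print(round(binom, 4), round(poisson, 4), round(abs(binom - poisson), 4))
```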
Problems on Poisson approximation to the binomial distribution
1. If 2 percent of the credit card users are defaulters, use the Poisson approximation
to the binomial distribution to determine the probability that 3 of 400 users of credit
card are defaulters.
2. Records show that the probability is 0.00007 that a car will have a flat tire while
crossing a certain bridge. Use the Poisson distribution to approximate the binomial
probabilities that, among 20,000 cars crossing this bridge, (a) exactly two will have
a flat tire; (b) at most two will have a flat tire.
Fitting of a Poisson Distribution
1) For the given data, compute the mean using the formula $\text{Mean} = \frac{\sum f_i x_i}{\sum f_i}$ and equate it
to the mean of the Poisson distribution, λ.
2) Compute probabilities of various values of the random variable X by using the p.m.f.
$P(X = x) = \frac{e^{-\lambda}\lambda^x}{x!}$
3) Each probability obtained in step (2) is multiplied by N (= Σf_i, the total
frequency) to get expected frequencies.
Fit a Poisson distribution for the following data.
x   Frequency (f)
0   56
1   156
2   132
3   92
4   37
5   22
6   4
7   0
8   1
Total 500
$\text{Mean} = \frac{\sum f_i x_i}{\sum f_i} = \frac{986}{500} = 1.972$, so λ = 1.972.
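The three fitting steps can be sketched in Python for this data. The frequency 4 for x = 6 is inferred from the totals Σf = 500 and Σfx = 986 rather than read directly from the table.

```python
from math import exp, factorial

x_vals = [0, 1, 2, 3, 4, 5, 6, 7, 8]
freq = [56, 156, 132, 92, 37, 22, 4, 0, 1]
N = sum(freq)                                          # step 3 multiplier
lam = sum(x * f for x, f in zip(x_vals, freq)) / N    # step 1: lambda = mean
expected = [N * exp(-lam) * lam**x / factorial(x) for x in x_vals]   # steps 2 and 3
for x, f, e in zip(x_vals, freq, expected):
    print(x, f, round(e, 1))
```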
Problem: The number of defects per unit in a sample of 300 units of manufactured product was
found as follows:
Number of defects   Number of units
0                   40
1                   60
2                   100
3                   50
4                   30
5                   20
1. The simplest and handiest way is to see if the variance is roughly equal to
the mean for your Poisson data.
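Applying this rough check to the defects table, a Python sketch follows. It uses the population variance (dividing by N), which is our assumption; a ratio near 1 is consistent with a Poisson model but is not a formal test.

```python
defects = [0, 1, 2, 3, 4, 5]
units = [40, 60, 100, 50, 30, 20]
N = sum(units)
mean = sum(x * f for x, f in zip(defects, units)) / N
var = sum(f * (x - mean) ** 2 for x, f in zip(defects, units)) / N
print(round(mean, 4), round(var, 4), round(var / mean, 3))
```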
The exact shape of the Normal distribution depends on the mean and the
standard deviation of the distribution.
• Skewed: shows that the distribution lacks symmetry; used to denote data that are
thin at one end and accumulated at the other end
• Absence of symmetry
• Extreme values or ‘tail’ in one side of a distribution
• Positively- or right-skewed vs. negatively- or left-skewed
[Figure: two example distributions, y plotted against x for x from 0 to 20]
Skewness
Generally, if Mean > Mode, the skewness is positive.
Coefficient of Skewness
$S_k = \frac{3(\mu - M_d)}{\sigma}$
where μ is the mean,
M_d is the median,
S_k is the coefficient of skewness, and
σ is the standard deviation.
Alternatively, $S_k = \frac{\sum_{i=1}^{T}(x_i - \bar{x})^3 / T}{\sigma^3}$, where T is the number of observations.
[Figure: leptokurtic, mesokurtic and platykurtic curves]
Measuring Kurtosis
Coefficient of Kurtosis, $K = \frac{\sum_{i=1}^{T}(x_i - \bar{x})^4 / T}{S^4}$, where T is the number of observations
and S is the standard deviation.
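The coefficients above can be computed with Python's standard `statistics` module. This sketch assumes σ and S are the population standard deviation (dividing by T), and the two data sets are made up for illustration; the function names are ours.

```python
from statistics import mean, median, pstdev

def pearson_skewness(data):
    """Sk = 3(mean - median) / sigma."""
    return 3 * (mean(data) - median(data)) / pstdev(data)

def moment_skewness(data):
    """Sk = [sum (x_i - xbar)^3 / T] / sigma^3, with T = number of observations."""
    xbar, s, T = mean(data), pstdev(data), len(data)
    return sum((x - xbar) ** 3 for x in data) / T / s**3

def kurtosis(data):
    """K = [sum (x_i - xbar)^4 / T] / S^4 (equal to 3 for a normal curve)."""
    xbar, s, T = mean(data), pstdev(data), len(data)
    return sum((x - xbar) ** 4 for x in data) / T / s**4

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]
print(moment_skewness(symmetric), kurtosis(symmetric))
print(pearson_skewness(right_skewed), moment_skewness(right_skewed))
```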
[Figure: normal curves with the same σ centred at µ = 50, at a smaller µ = 40, and at a larger µ = 60]
Normal Distribution with Different Values for σ
[Figure: normal curves with the same µ and a smaller or larger σ]
Although the distribution remains symmetric, the distribution becomes
flatter if we increase the standard deviation.
Three Common Areas Under Normal Curves
[Figure: about 99.7% of the area lies between µ − 3σ and µ + 3σ, leaving about 0.15% in each tail]
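The familiar areas within 1σ, 2σ and 3σ of the mean can be computed with Python's `statistics.NormalDist` (Python 3.8+); the dictionary name `areas` is ours.

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal: mu = 0, sigma = 1

# P(mu - k sigma < X < mu + k sigma) is the same for every normal distribution
areas = {k: Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)}
for k, inside in areas.items():
    print(k, round(100 * inside, 2), round(100 * (1 - inside) / 2, 3))   # % inside, % in each tail
```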
Normal Distribution – Calculating Probabilities
• Rather than create a different table for every normal distribution
(with different means and standard deviations), we can calculate a standardized
normal distribution, called Z:
$z = \frac{x - \mu}{\sigma}$
• A z-score gives the number of standard deviations that a value x is
above the mean (a negative z means that x is below the mean).
$f(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}, \quad -\infty < z < +\infty$
where z = (X − µ)/σ is a standard normal variable,
µ = mean, and
σ = standard deviation of the original normal distribution.
Standard Normal Distribution
• Standard normal probability tables give the total area under the Z curve
between 0 and any point on the positive Z-axis.
• Since the curve is symmetric, the area under the curve between 0 and z is the same
for z and −z, so areas for negative z-values are read from the same table.
Uses of z-table
For x = 24 with µ = 20 and σ = 4, the corresponding z = (24 − 20)/4 = 1 implies that the
value x = 24 lies 1 standard deviation above its mean of 20.
Table Lookup of a Standard Normal Probability
P(0 ≤ Z ≤ 1) = 0.3413
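A Python sketch of this table lookup using `statistics.NormalDist`; `area_0_to_z` is our helper that returns the area between 0 and z, as tabulated.

```python
from statistics import NormalDist

Z = NormalDist()

def area_0_to_z(z):
    """Area between 0 and z, as in a standard normal table (same for z and -z)."""
    return abs(Z.cdf(z) - 0.5)

mu, sigma, x = 20, 4, 24
z = (x - mu) / sigma
print(z, round(area_0_to_z(z), 4))
```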
For X = 550:
$Z = \frac{X - \mu}{\sigma} = \frac{550 - 494}{100} = 0.56$
Applying the Z Formula
For X = 300:
$Z = \frac{X - \mu}{\sigma} = \frac{300 - 494}{100} = -1.94$
For X = 600:
$Z = \frac{X - \mu}{\sigma} = \frac{600 - 494}{100} = 1.06$
$P(300 \le X \le 600) = 0.4738 + 0.3554 = 0.8292$
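The same probability without table rounding, assuming (as the z computations suggest) X ~ N(µ = 494, σ = 100):

```python
from statistics import NormalDist

X = NormalDist(mu=494, sigma=100)
p = X.cdf(600) - X.cdf(300)     # P(300 < X < 600)
print(round(p, 4))
```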
Probability as Area Under the Curve
The total area under the curve is 1.0, and the curve is symmetric, so half is above the mean
and half is below.
[Figure: normal curve centred at µ, with area 0.5 on each side of the mean]
P(−∞ < X < ∞) = 1.0
By symmetry, P(Z < a) = P(Z > −a).
A graph of the standardized normal curve (mean μ = 0 and standard deviation σ = 1) is
shown below:
[Figure: normal curve f(X) with the area P(a ≤ X ≤ b) between a and b shaded]
Problem:
Find the values of z that correspond to the probabilities of (i)
0.3517; (ii) 0.2536 using the standard normal distribution table.
Solution:
(i) Since 0.3517 falls between 0.3508 and 0.3531,
corresponding to z = 1.04 and z = 1.05, and 0.3517 is closer to
0.3508 than 0.3531, we choose z = 1.04.
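The reverse lookup can also be done with the inverse CDF: a table area A between 0 and a positive z corresponds to a cumulative probability 0.5 + A. A Python sketch (the dictionary name `z_values` is ours):

```python
from statistics import NormalDist

Z = NormalDist()
# Table area A between 0 and z  ->  cumulative probability 0.5 + A
z_values = {area: Z.inv_cdf(0.5 + area) for area in (0.3517, 0.2536)}
for area, z in z_values.items():
    print(area, round(z, 4), round(z, 2))
```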
Solution:
(a) We need to find the specific transaction value, x, such that above this value the top 15%
of high-spending customers are found.
Step-1: Draw the normal curve, showing the transaction value, x, that identifies the area
corresponding to the 15% of highest spenders.
Step-2: Using the z-table, find the z-value that corresponds to an area of 0.15 in the
right/top tail of the standard normal distribution. To use the z-table, the
appropriate area to read off is 0.5 − 0.15 = 0.35 (i.e., the middle area). The
closest z-value is 1.04.
Step-3: Find the x-value associated with the z-value identified in Step 2.
Substitute z = 1.04, µ = 2440 and σ = 64 into the z transformation $z = \frac{x - \mu}{\sigma}$,
and solve for x:
$1.04 = \frac{x - 2440}{64} \Rightarrow x = 2506.56$
Thus the highest-spending 15% of store customers spend at least INR 2506.56
per transaction.
(b) This requires that a specific transaction value, x, be identified such that the area below
this value is 30%, which represents the lowest spending 30% of store customers.
Step-1: Sketch the normal curve, showing the transaction value, x, that identifies the area
corresponding to the 30% of lowest spenders.
Step-2: From the z-table, find the z-value that corresponds to an area of 0.30 in the bottom
tail of the standard normal distribution. To use the z-table, the area that must be found is
0.5 − 0.30 = 0.20 (i.e., the middle area). The closest z-value read off the z-table is 0.52.
However, since the required x-value is below the mean, the z-value is negative. Hence
z = −0.52.
Step-3: Find the x-value associated with the z-value identified in Step 2. Substitute
z = −0.52, µ = 2440 and σ = 64 into the z transformation and solve for x:
$-0.52 = \frac{x - 2440}{64} \Rightarrow x = 2406.72$
Thus the lowest-spending 30% of store customers spend at most INR 2406.72 per transaction.
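Both cutoffs can be obtained from the inverse normal CDF, assuming spending ~ N(2440, 64) as in the solution. The results differ slightly from the table-based answers because the table rounds z to 1.04 and −0.52.

```python
from statistics import NormalDist

spend = NormalDist(mu=2440, sigma=64)
top_15_cutoff = spend.inv_cdf(1 - 0.15)    # value exceeded by the top 15% of spenders
bottom_30_cutoff = spend.inv_cdf(0.30)     # value below which the lowest 30% fall
print(round(top_15_cutoff, 2), round(bottom_30_cutoff, 2))
```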
Solution:
Let X be the monthly sale of laptops. X follows normal distribution with mean 2700 and SD 300.
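We have to find the values x1 and x2, symmetric about the mean, such that P(x1 < X < x2) = 0.95,
or $P\left(\frac{x_1 - \mu}{\sigma} < \frac{X - \mu}{\sigma} < \frac{x_2 - \mu}{\sigma}\right) = 0.95 \Rightarrow P(z_1 < Z < z_2) = 0.95$
So $z_1 = -1.96 \Rightarrow \frac{x_1 - 2700}{300} = -1.96 \Rightarrow x_1 = 2112$
and $z_2 = 1.96 \Rightarrow \frac{x_2 - 2700}{300} = 1.96 \Rightarrow x_2 = 3288$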
Hence, the retailer is 95% sure that sales in any given month will be between 2112 and 3288 units.
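A small Python helper for symmetric probability intervals reproduces this result; `central_interval` is our illustrative name, and it splits the leftover probability equally between the two tails.

```python
from statistics import NormalDist

def central_interval(mu, sigma, coverage):
    """Symmetric interval (x1, x2) with P(x1 < X < x2) = coverage."""
    d = NormalDist(mu, sigma)
    tail = (1 - coverage) / 2
    return d.inv_cdf(tail), d.inv_cdf(1 - tail)

x1, x2 = central_interval(2700, 300, 0.95)
print(round(x1), round(x2))
```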
Class exercise:
1. The cricket scores for a university team were normally distributed with a mean of 52
runs and a standard deviation of 7. Find the probability that a randomly selected
player scored less than 48 runs.
2. The golf scores for a school team were normally distributed with a mean of 68 and a
standard deviation of 3. Find the probability that a golfer scored between 66 and
70.
3. The amount of fuel consumed by an aircraft on a flight between Hyderabad and
Delhi is a normally distributed random variable X with mean µ = 4.7 tons and
standard deviation σ = 0.5 tons. Carrying too much fuel is inefficient as it slows the
plane. If too little fuel is loaded on the plane, an emergency landing may be
necessary. What should be the amount of fuel to load so that there is 0.99
probability that the plane will arrive at its destination without emergency landing?
4. The demand for gasoline at a service station is normally distributed with mean
12,809 litres per day and standard deviation 49 litres. Find two values that will give
a symmetric 80% probability interval and a symmetric 98% probability interval for the amount of
gasoline demanded daily.
Standard Normal Distribution (z) - Finding Probabilities for z-values in MS-Excel
For example, P(z > 1.82) is the complement of the cumulative probability from −∞ up to z =
1.82. Thus to find P(z > 1.82), type =1-NORMSDIST(1.82) to give 0.0344.
Normal Distribution – Finding Probabilities for x-values in MS-Excel
The values x, mean (µ), and standard_dev (σ) are to be specified to use this function.
Cumulative = TRUE specifies that cumulative normal probabilities are computed.
This function computes normal probabilities directly for an x random variable with a given
mean, µ, and standard deviation, σ, without first having to convert to a z-value.
For example, for X ~ N(45, 8) (µ = 45, σ = 8), P(40 < x < 51) can be found by subtracting the two cumulative
probabilities P(x < 51) and P(x < 40).
Thus to find P(40 < x < 51), we use =NORMDIST(51, 45, 8, TRUE)-NORMDIST(40, 45, 8, TRUE) to
get 0.5074.
For example, P(x < 48) is found by evaluating NORMDIST(48, 45, 8, TRUE) to give 0.64617.
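For readers working outside Excel, the sketch below mirrors these calculations with Python's `statistics.NormalDist`. The function names `normsdist` and `normdist_cumulative` are hypothetical stand-ins for the Excel functions, not a Python API.

```python
from statistics import NormalDist

def normsdist(z):
    """Like Excel NORMSDIST(z): cumulative standard normal probability."""
    return NormalDist().cdf(z)

def normdist_cumulative(x, mean, sd):
    """Like Excel NORMDIST(x, mean, sd, TRUE): cumulative normal probability."""
    return NormalDist(mean, sd).cdf(x)

p_above = 1 - normsdist(1.82)                                          # P(z > 1.82)
p_between = normdist_cumulative(51, 45, 8) - normdist_cumulative(40, 45, 8)   # P(40 < x < 51)
p_below = normdist_cumulative(48, 45, 8)                               # P(x < 48)
print(round(p_above, 4), round(p_between, 4), round(p_below, 5))
```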
Standard Normal Distribution – Finding z-values using MS-Excel
The NORMSINV (i.e. standard normal inverse) function finds the z-value associated with the
cumulative probability up to z.
=NORMSINV(cumulative probability).
For example, to find k such that P(0 < z < k) = 0.3461, the cumulative area up to k must be used
in the Excel function.
The cumulative area is 0.5 + 0.3461 = 0.8461.
NORMSINV(0.8461) gives 1.01985. Thus k = 1.02.
To find k such that P(z > k) = 0.8051, the cumulative area of the complement up to k must be
used in the Excel function.
The cumulative area up to k is 1 − 0.8051 = 0.1949.
NORMSINV(0.1949) gives −0.85998. Thus k = −0.86.
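Both NORMSINV examples first convert the stated area into a cumulative area up to k. A minimal Python sketch shows that step using `NormalDist().inv_cdf`, the standard-library counterpart of NORMSINV.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal

# P(0 < z < k) = 0.3461  ->  cumulative area up to k is 0.5 + 0.3461
k1 = z.inv_cdf(0.5 + 0.3461)

# P(z > k) = 0.8051  ->  cumulative area up to k is 1 - 0.8051
k2 = z.inv_cdf(1 - 0.8051)

print(round(k1, 2), round(k2, 2))
```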
Normal distribution – Finding x-limits in MS-Excel
The NORMINV (i.e. normal inverse) function, =NORMINV(probability, mean, standard_dev), finds the
x-limit associated with a given cumulative probability for a normal distribution with a specified
mean µ and standard deviation σ.
Thus k = 223.0204.
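In Python, `NormalDist(mu, sigma).inv_cdf(p)` returns the same x-limit as NORMINV. The parameters of the worked example that gives k = 223.0204 are not shown here, so the sketch below uses an assumed illustration instead: the fuel data from problem 3 (µ = 4.7 tons, σ = 0.5 tons, cumulative probability 0.99).

```python
from statistics import NormalDist

# Assumed illustration: problem 3's fuel data, X ~ N(4.7, 0.5) tons.
fuel = NormalDist(mu=4.7, sigma=0.5)

# Equivalent of =NORMINV(0.99, 4.7, 0.5): the load x with P(X <= x) = 0.99.
x_limit = fuel.inv_cdf(0.99)
print(round(x_limit, 3))
```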
Practice problems:
1. For each z-value below, find the area between the mean and z.
a) z = 1.17 b) z = −0.85 c) z = 2.07 d) z = −1.37
2. Determine: (a) P(0 < z < 1.46) (b) P(−2.3 < z < 0) (c) P(z > 1.82)
(d) P(−2.1 < z < 1.32) (e) P(1.24 < z < 2.075)
3. If scores are normally distributed with a mean of 30 and a standard deviation
of 5, what percent of the scores are: (a) greater than 30? (b) greater than 37?
(c) between 28 and 34?
4. The cost of treatment per patient for a certain medical problem was modeled
by one insurance company as a normal random variable with mean Rs. 8,500
and standard deviation Rs. 1,750. What is the probability that the treatment
cost of a patient is less than Rs. 9,000, based on this model?
5. Suppose the length of stay in a hospital following coronary artery bypass surgery
is normally distributed with a mean of 14 days and a standard deviation of 3
days. What proportion of patients will have a length of stay:
(A) less than 10 days?
(B) greater than 10 days?
(C) between 12 days and 18 days?
6. A city hospital conducted a cognitive abilities test for patients recently diagnosed
with Parkinson’s disease. The average and standard deviation of scores were 64
and 7 respectively. (a) If a randomly selected patient scores 58 on the test, what
is this patient's percentile rank? (b) If another patient scores 66 on the test, what
percent of individuals would receive a higher score?
7. The lifetime of a certain brand of washing machine is normally distributed
with mean and standard deviation equal to 3.4 and 1.6 years respectively.
(a) If this type of washing machine is guaranteed for one year, what
percentage of original sales will require replacement if they fail within the
guarantee period? (b) What percentage of these washing machines is likely
to be operating: (i) after 4 years? (ii) after 5.5 years? (c) If the manufacturer
of these washing machines wants to ensure that no more than 5% of these
machines will be replaced within a guarantee period, what new guarantee
period should they choose?
8. The service time of the first service of a BMW is normally distributed, with a
mean of 70 minutes and a variance of 81 minutes² (a standard deviation of 9 minutes). (a) If a customer brings
her BMW in for its first service, what is the probability that the car will be
ready within one hour? (b) What is the probability that the job will take
more than an hour and a half? (c) What percentage of first services will be
completed between 50 and 60 minutes? (d) The BMW dealer has a policy to
give its customers a 15% discount on the cost of the first service if the
service is not completed within 80 minutes. From a sample of
120 customers who brought their BMWs in for their first service, how many
are likely to receive the 15% discount? (e) If the BMW dealer wants to
ensure that no more than 5% of all first services will take longer than
80 minutes, what should the mean service time be?
9. A coffee-dispensing machine used in cafeterias is set to dispense coffee with
an average fill of 230 ml and a standard deviation of 10 ml per cup. Assume
that the volume dispensed is normally distributed. (a) For a randomly
selected cup dispensed by the machine, what is the probability that: (i) the
cup is filled to more than 235 ml? (ii) the cup is filled to between 235 ml and
245 ml? (iii) the cup is less than 220 ml full? (b) If the company supplying the
coffee machines wants only 15% of cups to exceed a given fill level, what
level of fill (in ml) does this correspond to? (c) What must the mean fill level
(in ml) be set to in order to ensure that no more than 10% of cups are filled
to less than 220 ml?
10. Assume that the life of a particular brand of car battery is normally
distributed with a mean of 28 months and a standard deviation of 4 months.
(a) For a randomly selected battery, what is the probability that it will last
between 30 and 34 months?
(b) What is the probability that a randomly selected battery will fail within
two years of the date of purchase?
(c) By what time period will 60% of all batteries of this make fail?
(d) If a guarantee period is to be set, how many months would it have to be
to replace no more than 5% of batteries of this make?
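Several of these problems (7(c), 8(e), 9(c), 10(d)) run the normal distribution backwards: a tail probability is fixed and a limit or mean is solved for. Below is a minimal sketch of the mean-setting variant. As an assumed illustration it uses the figures of problem 8(e), with σ = 9 minutes (from the stated variance of 81) taken as unchanged when the mean is shifted. The requirement P(X > 80) = 0.05 means (80 − µ)/σ must equal the 0.95 quantile of z.

```python
from statistics import NormalDist

sigma = 9    # problem 8: variance 81, so sigma = 9 minutes (assumed unchanged)
limit = 80   # minutes
tail = 0.05  # at most 5% of first services may take longer than the limit

# P(X > limit) = tail  <=>  (limit - mu) / sigma = z quantile at 1 - tail
z = NormalDist().inv_cdf(1 - tail)
mu = limit - z * sigma
print(round(mu, 2))
```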
Fitting of Normal Distribution
• To fit a normal distribution to the given data, we first calculate the mean μ and standard
deviation σ from the given data.
Class Frequency
10-20 9
20-30 16
30-40 25
40-50 12
50-60 9
60-70 4
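Class    Frequency (fi)    Mid value of class (mi)    fi·mi    fi·mi²
10-20    9                 15                         135      2025
20-30    16                25                         400      10000
30-40    25                35                         875      30625
40-50    12                45                         540      24300
50-60    9                 55                         495      27225
60-70    4                 65                         260      16900
Total    75                                           2705     111075

$\text{Mean} = \dfrac{\sum f_i m_i}{\sum f_i} = \dfrac{2705}{75} \approx 36.1$
$\text{Variance} = \dfrac{\sum f_i m_i^2}{\sum f_i} - \text{Mean}^2 = \dfrac{111075}{75} - \left(\dfrac{2705}{75}\right)^2 \approx 180.2$
$\text{SD} = \sqrt{180.2} \approx 13.4$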
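The grouped-data arithmetic above can be checked with a short Python sketch. It also carries out the usual next step of fitting: computing expected class frequencies as n·[F(upper limit) − F(lower limit)] under N(mean, SD). That step is a standard approach assumed here and is not shown in this section. The tails below 10 and above 70 are not assigned to any class, so the expected frequencies total slightly less than 75. Some texts merge the tails into the end classes instead.

```python
import math
from statistics import NormalDist

# Class limits and frequencies from the table above.
lower = [10, 20, 30, 40, 50, 60]
upper = [20, 30, 40, 50, 60, 70]
freq = [9, 16, 25, 12, 9, 4]

mid = [(a + b) / 2 for a, b in zip(lower, upper)]  # class mid values m_i
n = sum(freq)                                      # sum of f_i = 75
mean = sum(f * m for f, m in zip(freq, mid)) / n
variance = sum(f * m * m for f, m in zip(freq, mid)) / n - mean ** 2
sd = math.sqrt(variance)

# Expected frequency for each class under the fitted N(mean, sd):
# n * [F(upper limit) - F(lower limit)].
fitted = NormalDist(mean, sd)
expected = [n * (fitted.cdf(b) - fitted.cdf(a)) for a, b in zip(lower, upper)]

print(f"mean = {mean:.1f}, variance = {variance:.1f}, SD = {sd:.1f}")
for a, b, f, e in zip(lower, upper, freq, expected):
    print(f"{a}-{b}: observed {f}, expected {e:.2f}")
```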
Problem: Fit a normal distribution for the following data pertaining to end-
semester marks in Probability and statistics (P&S) subject obtained by B.Tech.
students.