Unit-2 - Random Variables and Probability Distributions - Jan2025

The document discusses random variables, their types (discrete and continuous), and probability distributions, including probability mass functions (PMF) and probability density functions (PDF). It explains the concepts of mean and variance for both discrete and continuous random variables, as well as mathematical expectation and its properties. Additionally, it provides examples and problems related to calculating probabilities and distributions.

Uploaded by

vbollam27

Unit-II

Random Variables and Probability Distributions
Random variable
When the values of a particular characteristic arise as a result of
chance factors, so that they cannot be exactly predicted in advance,
the variable is called a random variable.

Examples of random variables include:

• adult height,
• weight of a newborn baby,
• insurance amount claimed by insured persons,
• number of students who scored an A+ grade, etc.

Values resulting from measurement procedures are often referred
to as observations or measurements.
Examples of Random Variables

Experiment | Outcome | Random variable | Observations or measurements of the random variable
• Stock 50 Christmas trees | Number of Christmas trees sold | X = number of Christmas trees sold | 0, 1, 2, ..., 50
• Inspect 600 items | Number of acceptable items | Y = number of acceptable items | 0, 1, 2, ..., 600
• Send out 5,000 sales letters | Number of respondents | Z = number of respondents | 0, 1, 2, ..., 5,000
• Build an apartment building | % of building completed | R = % of building completed | 0 ≤ R ≤ 100
• Test the lifetime of a bulb | Length of time the bulb lasts, up to 80,000 minutes | S = time the bulb burns | 0 ≤ S ≤ 80,000
Random Variables for Outcomes that are not numbers

Experiment | Outcome | Random variable | Range of the random variable
• Students respond to a questionnaire | Strongly agree (SA), Agree (A), Neutral (N), Disagree (D), Strongly disagree (SD) | X = 5 if SA, 4 if A, 3 if N, 2 if D, 1 if SD | 1, 2, 3, 4, 5
• One machine is inspected | Defective, Not defective | Y = 0 if defective, 1 if not defective | 0, 1
• Consumers respond to how they like a product | Good, Average, Poor | Z = 3 if good, 2 if average, 1 if poor | 1, 2, 3
Types of random variables
Discrete random variable
Continuous Random Variable
Discrete random variable: If the random variable X assumes only a finite or countably
infinite set of values, it is known as a discrete random variable.
For example, the number of students who scored above 70 marks in a test, the number of students
in a college, the number of perished mangoes in a basket of mangoes, and the number of
accidents taking place on a busy road are all discrete random variables.

Continuous random variable: If the random variable X can assume an uncountably infinite
set of values (for example, any value in an interval), it is said to be a continuous random variable.
For example, the age of students admitted into Mac Data science program and the height or weight of
students in the School of Science are continuous random variables.
Generally, discrete random variables represent counted data, while continuous random
variables represent measured data.
Probability distribution

The set of all possible values of a random variable and their associated
probabilities is called a probability distribution.
Probability Mass Function (PMF)

A probability mass function (PMF) is a function that describes the probability distribution of a
discrete random variable.

It assigns probabilities to each possible outcome of the random variable, such that the sum of
all probabilities is equal to 1.

The PMF is typically denoted by P(X = xi), where X is the random variable and xi is one of
its possible values.

The value of the PMF at xi gives the probability that X takes on the value xi.

For example, suppose we toss a fair coin twice, and let X be the number of heads that come
up. Then X can take on the values 0, 1, or 2. The PMF of X is given by:

P(X=0) = 1/4; P(X=1) = 1/2; P(X=2) = 1/4

This PMF tells us that the probability of getting no heads is 1/4, the probability of getting one
head is 1/2, and the probability of getting two heads is 1/4.
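
The two-toss PMF above can be reproduced by listing every equally likely outcome and counting heads. The following is a minimal Python sketch; the function name `pmf_number_of_heads` is illustrative and not part of the original notes.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def pmf_number_of_heads(n_tosses):
    """PMF of X = number of heads in n_tosses fair-coin tosses, by enumeration."""
    outcomes = list(product("HT", repeat=n_tosses))  # all equally likely outcomes
    counts = Counter(outcome.count("H") for outcome in outcomes)
    return {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

pmf = pmf_number_of_heads(2)
for x, p in pmf.items():
    print(f"P(X={x}) = {p}")
```

Exact fractions are used so that the check "probabilities sum to 1" is not affected by rounding.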
Probability Mass Function

• A set of probability values, P(X = x_i) = p_i, one for each of the
values x_i taken by the discrete random variable X

• 0 ≤ p_i ≤ 1 and

• $\sum_i p_i = 1$
Probability Mass Function

• Example 1: Machine Breakdowns

– P(cost = 50) = 0.3, P(cost = 200) = 0.2, P(cost = 350) = 0.5

– 0.3 + 0.2 + 0.5 = 1

x_i   50    200   350
p_i   0.3   0.2   0.5

[Figure: bar chart of f(x) against Cost ($), with bars of height 0.3, 0.2 and 0.5 at costs 50, 200 and 350]


Cumulative Distribution Function (c.d.f)

The cumulative distribution function, F(x), is defined as:

F(x) = P(X ≤ x)

For a discrete random variable X that takes integer values:

F(x − 1) = P(X ≤ x − 1)

F(x) − F(x − 1) = P(X = x) = p(x)


Mean and Variance of Discrete random variable

The mean of a discrete random variable X is:

$\text{Mean} = \sum_x x\,p(x)$

The variance of a discrete random variable X is:

$\text{Variance} = \sum_x x^2\,p(x) - (\text{Mean})^2$
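
As an illustration of these two formulas, the sketch below applies them to the machine-breakdown cost distribution from Example 1 (costs 50, 200, 350 with probabilities 0.3, 0.2, 0.5). The helper name `mean_and_variance` is an assumption made for this example.

```python
from fractions import Fraction

def mean_and_variance(pmf):
    """Return (mean, variance) of a discrete PMF given as {value: probability}."""
    mean = sum(x * p for x, p in pmf.items())
    second_moment = sum(x * x * p for x, p in pmf.items())
    return mean, second_moment - mean ** 2

# Machine-breakdown costs from Example 1, stored as exact fractions.
cost_pmf = {50: Fraction(3, 10), 200: Fraction(2, 10), 350: Fraction(5, 10)}
mean, variance = mean_and_variance(cost_pmf)
print(float(mean), float(variance))
```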
Problem: You draw 5 cards from a standard deck of 52 cards without replacement. Let X
denote the number of aces in your hand. Find the probability mass function describing the
distribution of X.

Solution: The random variable X can take on 5 possible values, namely X = 0, 1, 2, 3, 4.

We have $\binom{52}{5}$ ways of picking 5 cards from the deck of 52.

There are 4 aces and 48 non-aces in the deck.

The probabilities are calculated as follows:

$P(X = x) = \dfrac{\binom{4}{x}\binom{48}{5-x}}{\binom{52}{5}}, \quad x = 0, 1, 2, 3, 4$
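
The counting argument (choose x of the 4 aces and 5 − x of the 48 non-aces, out of all 5-card hands) can be sketched in Python as follows. The function `aces_pmf` and its parameter names are illustrative assumptions, not part of the original solution.

```python
from fractions import Fraction
from math import comb

def aces_pmf(hand_size=5, aces=4, deck_size=52):
    """PMF of the number of aces in a hand drawn without replacement."""
    non_aces = deck_size - aces
    total_hands = comb(deck_size, hand_size)
    return {
        x: Fraction(comb(aces, x) * comb(non_aces, hand_size - x), total_hands)
        for x in range(min(aces, hand_size) + 1)
    }

pmf = aces_pmf()
for x, p in pmf.items():
    print(f"P(X={x}) = {float(p):.6f}")
```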


Probability Density Function

Let X be a continuous random variable taking values on the interval [a, b].

A function f(x) is said to be the probability density function of the continuous random variable X, if it satisfies
the following properties :

f(x) ≥ 0 for all x in [a, b]

$P(a \le X \le b) = \int_a^b f(x)\,dx = 1$

Note that f(x) is a density; it is not the probability P(X = x).

The PDF is defined as the derivative of the cumulative distribution function (CDF) of the random variable.
In other words, if F(x) is the CDF of a continuous random variable X, then the PDF of X is given by:

$f(x) = \dfrac{d}{dx}F(x)$
The value of the PDF at any point x gives the rate at which the CDF changes at that point. The area under
the PDF curve between any two points gives the probability that the random variable falls within that interval.
Probability Density Function

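Let X be a continuous random variable with probability density function f(x) = 4x³, 0 ≤ x ≤ 1.

Find $P\left(X \le \tfrac{2}{3} \mid X > \tfrac{1}{3}\right)$.

Solution: The CDF is F(x) = x⁴ for 0 ≤ x ≤ 1, so

$P\left(X \le \tfrac{2}{3} \mid X > \tfrac{1}{3}\right) = \dfrac{P(1/3 < X \le 2/3)}{P(X > 1/3)} = \dfrac{F(2/3) - F(1/3)}{1 - F(1/3)} = \dfrac{16/81 - 1/81}{80/81} = \dfrac{3}{16}$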
Note:
1. For a continuous random variable, the probability at a point is always zero, i.e.,
P(X = c) = 0 for every single value c. Hence, for a continuous random
variable, we always talk of probabilities over an interval and not at a point.

2. In particular, for the end points c and d of an interval,
we have P(X = c) = 0 and P(X = d) = 0.

3. P(c ≤ X ≤ d) = P(c < X ≤ d) = P(c ≤ X < d) = P(c < X < d). Hence, for a continuous
random variable, it does not matter whether one or both end points of the interval (c, d) are
included.

However, this result is not true for discrete random variables.


Probability Density Function

Suppose that the diameter of a metal cylinder has a p.d.f.

f(x) = 1.5 − 6(x − 50.0)² for 49.5 ≤ x ≤ 50.5

f(x) = 0, elsewhere

[Figure: plot of f(x) against x over 49.5 ≤ x ≤ 50.5]
Probability Density Function
Problem: Check whether the following is a valid PDF.
f(x) = 1.5 − 6(x − 50.0)² for 49.5 ≤ x ≤ 50.5
f(x) = 0, elsewhere
Solution: Here f(x) ≥ 0 on the given range, so the PDF is valid if the total probability over the range is 1.

$\int_{49.5}^{50.5} \left(1.5 - 6(x - 50.0)^2\right)dx = \left[1.5x - 2(x - 50.0)^3\right]_{49.5}^{50.5}$

$= \left[1.5 \times 50.5 - 2(50.5 - 50.0)^3\right] - \left[1.5 \times 49.5 - 2(49.5 - 50.0)^3\right]$

$= 75.5 - 74.5 = 1.0$
• Since the total probability in the specified range is equal to 1, the above density
function is a valid PDF.
Probability Density Function

Find the probability that a metal cylinder has a diameter between 49.8 and 50.1 mm.

This can be calculated as follows:

$\int_{49.8}^{50.1} \left(1.5 - 6(x - 50.0)^2\right)dx = \left[1.5x - 2(x - 50.0)^3\right]_{49.8}^{50.1}$

$= \left[1.5 \times 50.1 - 2(50.1 - 50.0)^3\right] - \left[1.5 \times 49.8 - 2(49.8 - 50.0)^3\right]$

$= 75.148 - 74.716 = 0.432$

The probability that a metal cylinder has a diameter between 49.8 and 50.1 mm is 0.432.

[Figure: plot of f(x) against x, with 49.5, 49.8, 50.1 and 50.5 marked on the x-axis]
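
The two integrals for the metal-cylinder density can be checked numerically. This is a small sketch that assumes the density is centred at 50.0, as in the integrals above; the function names are illustrative.

```python
def antiderivative(x):
    """Antiderivative of f(x) = 1.5 - 6(x - 50.0)^2, valid on [49.5, 50.5]."""
    return 1.5 * x - 2.0 * (x - 50.0) ** 3

def prob_between(lo, hi):
    """P(lo <= X <= hi), clipping the interval to the support [49.5, 50.5]."""
    lo, hi = max(lo, 49.5), min(hi, 50.5)
    return antiderivative(hi) - antiderivative(lo) if hi > lo else 0.0

total = prob_between(49.5, 50.5)   # should be 1 for a valid PDF
p = prob_between(49.8, 50.1)       # probability asked for in the problem
print(round(total, 6), round(p, 6))
```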
Cumulative Distribution Function

• The cumulative distribution function of a continuous random variable is:

$F(x) = P(X \le x) = \int_{-\infty}^{x} f(y)\,dy$

$f(x) = \dfrac{dF(x)}{dx}$

$P(a < X \le b) = P(X \le b) - P(X \le a) = F(b) - F(a)$

$P(a \le X \le b) = P(a < X \le b)$
The mean of a continuous random variable X is given by

$E(X) = \mu = \int_{-\infty}^{\infty} x f(x)\,dx$

The variance of a continuous random variable X is given by

$\mathrm{Var}(X) = \sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu^2$
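
As a rough numerical illustration of these integrals, the sketch below approximates the mean and variance of the metal-cylinder density f(x) = 1.5 − 6(x − 50.0)² on [49.5, 50.5] with a midpoint rule. The integration helper and the step count are assumptions chosen for this example; any standard quadrature routine would do.

```python
def pdf(x):
    """Metal-cylinder density, assumed centred at 50.0 and zero outside [49.5, 50.5]."""
    return 1.5 - 6.0 * (x - 50.0) ** 2 if 49.5 <= x <= 50.5 else 0.0

def integrate(g, lo, hi, steps=100_000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    width = (hi - lo) / steps
    return width * sum(g(lo + (i + 0.5) * width) for i in range(steps))

mean = integrate(lambda x: x * pdf(x), 49.5, 50.5)
variance = integrate(lambda x: (x - mean) ** 2 * pdf(x), 49.5, 50.5)
print(round(mean, 4), round(variance, 4))
```

By symmetry the mean is 50.0, and the exact variance works out to 0.05, which the approximation should match closely.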
Problems:
1. Check whether the following is a probability distribution.
X 0 1 2 3 4 5
P(X = x) = p(x) 0.12 0.26 0.28 0.19 0.13 0.07

2. For an experiment of simultaneous toss of three fair coins, obtain probability distribution
for the random variable representing number of heads in three tosses.

3. Two dice are rolled at random. Obtain the probability distribution of the sum of the
numbers on them.

4. Five defective mobile phones were accidentally mixed with twenty good ones and by
looking at them it is not possible to differentiate between them. Four phones are drawn
at random from the lot. Find the probability distribution of X, the number of defective
phones.
Mathematical Expectation of a Random Variable
• Mathematical expectation is an important concept in summarizing
characteristics of probability distributions.

• Mathematical expectation, also known as expected value, is a concept in
probability theory that represents the average value of a random variable over
many trials.

• It is calculated by multiplying each possible outcome of the random variable by
its probability of occurrence and then summing the results.

• The mathematical expectation of a discrete random variable X is denoted by E(X) and is
calculated as:

E(X) = Σ(x * P(X = x))

where x is a possible value of random variable X, and
P(X = x) is the probability of X taking on the value x.
Mathematical Expectation of a Random Variable
• For example, consider rolling a fair six-sided die. The random variable X
represents the outcome of the roll, and the possible values of X are {1,
2, 3, 4, 5, 6}.
• The probability of rolling each value is 1/6, since there are six equally
likely outcomes.
• The expected value of X = E(X) = (1 * 1/6) + (2 * 1/6) + (3 * 1/6) + (4 *
1/6) + (5 * 1/6) + (6 * 1/6) = 3.5
• This means that, on average, we can expect to roll a value of 3.5 over
many rolls of the die.
• Mathematical expectation is a useful concept in many areas of
mathematical sciences and engineering.
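
The die calculation above can be written as a short Python sketch. The `expectation` helper is an illustrative name, and exact fractions are used so the result is 7/2 rather than a rounded decimal.

```python
from fractions import Fraction

def expectation(pmf):
    """E(X) = sum of x * P(X = x) over all values x of a discrete PMF."""
    return sum(x * p for x, p in pmf.items())

# Fair six-sided die: each face has probability 1/6.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}
expected_roll = expectation(die_pmf)
print(float(expected_roll))
```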
Mathematical Expectation of a Random Variable

• Mathematical expectation of a discrete random variable with PMF P(X = x_i) = p_i is defined
as:

$E(X) = \sum_i p_i x_i$

• Mathematical expectation of a continuous random variable with PDF f(x) is defined as:

$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$

• The expected value of a random variable is also called the mean of the random variable.

• The expected value E(X) is a weighted mean of X, where the weights are the probabilities p_i.

• Mathematical expectation is a useful concept in many areas of mathematics, statistics, and
engineering. It is used to calculate probabilities, make predictions, and optimize
decision-making under uncertainty.
Properties of Mathematical Expectation

1. If X and Y are two random variables, then E(X + Y) = E(X) + E(Y).

2. If X and Y are independent random variables, then E(XY) = E(X)·E(Y).

3. If X and Y are random variables and a and b are constants, then
E(aX + bY) = aE(X) + bE(Y).

4. E(aX + b) = aE(X) + b, where a and b are constants.

5. If X ≥ 0, then E(X) ≥ 0.

6. Variance of X is given by:
Var(X) = E(X²) − [E(X)]², where, for a discrete X, $E(X^2) = \sum_x x^2 P(X = x)$

7. Var(aX) = a² Var(X)

8. Var(X + b) = Var(X)

9. Covariance between X and Y is: Cov(X, Y) = E(XY) − E(X)·E(Y)
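
Properties 4, 7 and 8 can be spot-checked on a concrete distribution. The sketch below uses a fair die and arbitrary constants a = 3 and b = 10 (both chosen only for illustration); it is a numerical check on one example, not a proof.

```python
from fractions import Fraction

def expectation(pmf, g=lambda x: x):
    """E[g(X)] for a discrete PMF given as {value: probability}."""
    return sum(g(x) * p for x, p in pmf.items())

def variance(pmf, g=lambda x: x):
    """Var[g(X)] = E[g(X)^2] - (E[g(X)])^2."""
    mean = expectation(pmf, g)
    return expectation(pmf, lambda x: g(x) ** 2) - mean ** 2

die = {face: Fraction(1, 6) for face in range(1, 7)}
a, b = 3, 10  # illustrative constants

lhs_mean = expectation(die, lambda x: a * x + b)   # E(aX + b)
rhs_mean = a * expectation(die) + b                # aE(X) + b
var_scaled = variance(die, lambda x: a * x)        # Var(aX)
var_shifted = variance(die, lambda x: x + b)       # Var(X + b)
print(lhs_mean, rhs_mean, var_scaled, var_shifted)
```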
Example: The number of drugs prescribed for each of the 100 patients who visited a
hospital is summarized below. Find the mean and standard deviation.

Solution: Let X be the random variable representing the number of drugs
prescribed for each patient.

# drugs (X)   Number of patients/prescriptions
5             40
4             30
3             20
2             10
1             0

Dividing each frequency by the total of 100 patients gives the probabilities P(x_i).

$E(X) = \sum_{i=1}^{5} x_i P(x_i) = x_1 P(x_1) + x_2 P(x_2) + x_3 P(x_3) + x_4 P(x_4) + x_5 P(x_5) = 4$

$\text{Variance} = \sum_{i=1}^{5} [x_i - E(X)]^2 P(x_i)$

= (5 − 4)²(0.4) + (4 − 4)²(0.3) + (3 − 4)²(0.2) + (2 − 4)²(0.1) + (1 − 4)²(0)
= (1)²(0.4) + (0)²(0.3) + (−1)²(0.2) + (−2)²(0.1)
= 0.4 + 0.0 + 0.2 + 0.4
= 1.0

The standard deviation is
σ = √Variance = √1 = 1
Problems:
1. A fair coin is flipped four times. Let X be the number of heads obtained. Calculate E(X) and variance of
X.
2. Consider a random variable with the following probability distribution: P(X=0)= 0.1, P(X=1)=0.2,
P(X=2)=0.3, P(X=3)=0.35, P(X=4)=0.05. a. Find P(X≤2), b. Find P(1<X≤3), c. Find P(X>0), d. Find
P(X>3|X>2), e. Find expected value and standard deviation of X.
3. A start-up company can make a profit of Rs. 20 million with a probability of 0.45 or have a loss of Rs. 15
million with a probability of 0.55. What is the expected profit or loss?
4.An enterprising young man devises a game of chance in which he proposes to let the participant cast
a fair die and then receive a payment according to the following schedule: If the event A = {1, 2, 3}
occurs, he receives Rs 100; if B = {4, 5} occurs, he receives Rs. 200; and if C = {6} occurs, he receives Rs
300. Construct a probability distribution for the payoffs and find the average and standard deviation
of payoffs.
5. If the probability that the value of a certain stock will remain the same is 0.46, the probabilities that its
value will increase by Re. 0.50 or Re. 1.00 per share are respectively 0.17 and 0.23, and the probability
that its value will decrease by Re. 0.25 per share is 0.14, what is the expected gain per share?
Problems:

6. A car rental company has two types of cars available for rent: economy and luxury.
The probability that a customer chooses an economy car is 0.65, and the probability
that they choose a luxury car is 0.35. The rental price for an economy car is Rs 4000
per day, and the rental price for a luxury car is Rs 7000 per day. What is the average
rental price per day?

7. A casino offers a game where you can bet Rs 1000 on a single roll of a die. If the die
shows a 1 or 2, you win Rs 2000. If it shows a 3, 4, or 5, you win Rs. 1000. If it shows
a 6, you lose your bet. What is your expected return on a Rs 1000 bet?

8. A study has shown that the probability distribution of X, the number of customers in
line (including the one being served, if any) at a checkout counter in a department
store, is given by P(X=0)=0.2, P(X=1)=0.25, P(X=2)=0.15, P(X=3)=0.25; P(X≥4)=0.15.
Consider a newly arriving customer to the checkout line. (a) What is the probability
that this customer will not have to wait behind anyone? (b) What is the probability
that this customer will have to wait behind at least one customer? (c) On average, the
newly arriving customer will have to wait behind how many other customers?
A discrete probability distribution is the probability distribution of a discrete
random variable.

E.g.: Binomial distribution, Poisson distribution
Bernoulli Distribution

• The Bernoulli distribution is the easiest distribution to understand and can be
used as a starting point to derive more complex distributions.

• This distribution has only two possible outcomes and a single trial.

• A simple example is a single toss of a biased or unbiased coin.

• The probability of heads can be taken as p and the probability of tails as (1 − p),
since the probabilities of mutually exclusive events that encompass all possible
outcomes must sum to one.

• Coding heads as 1 and tails as 0, the mean and variance of the Bernoulli
distribution are p and p(1 − p) respectively.
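
The stated mean p and variance p(1 − p) follow directly from the two-point PMF. The sketch below checks this for an arbitrary, illustrative value p = 3/10, coding success as 1 and failure as 0 (an assumption of this example).

```python
from fractions import Fraction

def bernoulli_pmf(p):
    """PMF of a Bernoulli variable: 1 (success) with probability p, else 0."""
    return {0: 1 - p, 1: p}

def mean_and_variance(pmf):
    mean = sum(x * q for x, q in pmf.items())
    return mean, sum(x * x * q for x, q in pmf.items()) - mean ** 2

p = Fraction(3, 10)  # illustrative success probability
mean, variance = mean_and_variance(bernoulli_pmf(p))
print(mean, variance)
```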
[Figure: Bernoulli distribution for a fair coin]

BINOMIAL DISTRIBUTION

• Binomial distribution describes the possible number of times that a particular event
will occur in a sequence of observations or experiments or trials.
• The event is coded binary, it may or may not occur.
• The binomial distribution is used when a researcher is interested in the occurrence
of an event.
• For example, in a clinical trial, a patient may survive or die. The researcher studies
the number of survivors, and not how long the patient survives after treatment.
• Another example is whether an electronic item is working or not. Here, the binomial
distribution describes the number of items in working condition, and not how long
they worked.
• Other situations in which binomial distributions arise are quality control, public
opinion surveys, medical research, insurance problems, etc.
The binomial distribution is the probability distribution of a random variable X, defined as the
number of successes in n trials. The probability mass function (pmf) is

$P(X = x; n, p) = \binom{n}{x} p^x q^{n-x}$ for x = 0, 1, 2, …, n
= 0, otherwise

Mean = E(X) = np
Variance = V(X) = npq

where p = probability of success
q = 1 − p = probability of failure
x = number of successes
n = number of trials
X = random variable representing number of successes

$\binom{n}{x} = \dfrac{n!}{x!\,(n - x)!}$ = the number of combinations of n items taken x at a time.

n and p are the parameters of the binomial distribution.
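
A direct translation of the pmf into Python is sketched below, together with a numerical check that the probabilities sum to 1 and that the mean and variance match np and npq. The values n = 10 and p = 0.3 are illustrative only.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p); zero outside 0..n."""
    if not 0 <= x <= n:
        return 0.0
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.3  # illustrative parameters
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]
mean = sum(x * px for x, px in enumerate(probs))
variance = sum(x * x * px for x, px in enumerate(probs)) - mean ** 2
print(round(sum(probs), 6), round(mean, 6), round(variance, 6))
```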


A binomial experiment should satisfy the following properties:

1. There are only two possible outcomes (success or failure) for each trial.

2. A Bernoulli (success–failure) experiment is performed n times, where n is a
(non-random) constant.

3. Outcomes from different trials are independent.

4. The probability of success p on each trial is a constant; the probability of failure is
q = 1 − p.
Properties of the binomial distribution:

• For the binomial distribution, mean = np and variance = npq.

• The binomial distribution is symmetric when p = 0.5 and skewed when p ≠ 0.5.

• When p < 0.5, it is skewed to the right (positively skewed); when p > 0.5, it is
skewed to the left (negatively skewed).

• For the binomial distribution with parameters n and p, the variance cannot exceed
n/4.
Example: If a coin is tossed 4 times, then we may obtain 0, 1, 2, 3, or 4 heads. We
may also obtain 4, 3, 2, 1, or 0 tails, but these outcomes are equivalent to 0, 1, 2, 3, or
4 heads. What is the probability of getting a maximum of two heads?
The sample space = { TTTT, TTTH, TTHT, THTT, HTTT, TTHH, THTH, THHT, HHTT,
HTHT, HTTH, THHH, HTHH, HHTH, HHHT, HHHH }
Let X = number of heads be the random variable; X follows a binomial distribution.

The probability of getting a head (i.e., success) = ½. Here, n = 4; p = 0.5

X   Frequency   P(X = x)
0   1           1/16
1   4           4/16
2   6           6/16
3   4           4/16
4   1           1/16

P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2)

$P(X \le 2) = \sum_{x=0}^{2} \binom{n}{x} p^x (1 - p)^{n-x}$

$P(X \le 2) = \binom{4}{0} 0.5^0 (1 - 0.5)^{4-0} + \binom{4}{1} 0.5^1 (1 - 0.5)^{4-1} + \binom{4}{2} 0.5^2 (1 - 0.5)^{4-2}$

P(X ≤ 2) = 0.6875
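
The same answer can be obtained by brute force from the sample space, which is a useful cross-check on the formula. A minimal Python sketch:

```python
from fractions import Fraction
from itertools import product

# All 16 equally likely outcomes of four tosses, e.g. "HTTH".
sample_space = ["".join(tosses) for tosses in product("HT", repeat=4)]

# Outcomes with at most two heads.
favourable = [outcome for outcome in sample_space if outcome.count("H") <= 2]

p_at_most_two = Fraction(len(favourable), len(sample_space))
print(p_at_most_two, float(p_at_most_two))
```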
Problem: It has been claimed that in 70% of all solar-heat
installations, the utility bill is reduced by at least one-third. Find
the probabilities that the utility bill will be reduced by at least
one-third in (a) four of five installations, (b) at least four of five
installations, and (c) at most two of five installations.
Shape of binomial probability curve

A distribution is said to be positively skewed if the tail is on the right,
and negatively skewed if the tail is on the left.

[Figure: binomial probability curves for n = 5 and p = 0.3, n = 5 and p = 0.5, n = 5 and p = 0.7]

Symmetric binomial distribution with p = 0.5
Negatively skewed binomial distribution with p > 0.5
Positively skewed binomial distribution with p < 0.5
Example: In binomial distribution, X is a binomial variate with n= 100, p= 2/3, and P(X=k) is
maximum. Find the value of k.

Solution: When P(X = k) is maximum, k is the mode.

There are two cases for the mode of the binomial distribution.
Case-1: If (n + 1)p is an integer, then there are two modes, namely (n + 1)p and
(n + 1)p − 1.

Case-2: If (n + 1)p is not an integer, then there is a unique mode, which is the
integral part of (n + 1)p.

Given that n = 100, p = 2/3,

(n + 1)p = (100 + 1)(2/3) = 202/3 = 67.33

Since 67.33 is not an integer, the unique mode is 67, the integral
part of (n + 1)p. Hence k = 67.
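
The rule for the mode can be checked by brute force: compute P(X = k) for every k and keep the maximisers. The sketch below uses exact fractions so that ties (Case-1) are detected reliably; the function name `binomial_modes` is illustrative.

```python
from fractions import Fraction
from math import comb, floor

def binomial_modes(n, p):
    """Return every k in 0..n that maximises P(X = k) for X ~ Binomial(n, p)."""
    probs = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
    best = max(probs)
    return [k for k, pk in enumerate(probs) if pk == best]

modes = binomial_modes(100, Fraction(2, 3))       # Case-2: unique mode
tied_modes = binomial_modes(5, Fraction(1, 2))    # Case-1: (n + 1)p = 3 is an integer
print(modes, floor(101 * Fraction(2, 3)), tied_modes)
```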
1. The binomial variate X lies within the range {0, 1, 2, 3, 4, 5, 6}, provided that
P(X = 2) = 9P(X = 4).
a) Find the parameter p of the binomial variate X.
b) Compute P(X > 4) and P(3 ≤ X ≤ 5)
2. A student obtained the following answer to a problem given to him/her. Mean =
3.4 and variance = 4.2 for a binomial distribution. Are the results correct?
3. In a binomial distribution with 7 independent trials, the probabilities of 3 and 4
successes are found to be 0.2903 and 0.1935 respectively. Find the probability
of success p.
4.An insurance agent sells life insurance policies to four men, all of identical age
and good health. According to the actuarial tables, the probability that a man of
this particular age will be alive 25 years hence is 4/5. Find the probability that in
25 years (a) all four men will be alive, (b) at least one man will be alive, (c) at
least 3 men will be alive, (d) at most three men will be alive.
4.Past data reveals that 65% of IT innovation startup companies reported profit in their first year of
operation. If a sample of 20 new IT startups is selected, what is the probability of (a) exactly 12
startups generate profit, (b) at most 10 startups report profit, (c) minimum of 14 startups report
loss, in the first year.

5.Suppose there were 45 customers who ordered consumer goods from an e-commerce portal,
find the probability that 20 customers’ orders were delivered correctly by the restaurant? Order
history says that about 85% orders were delivered correctly by the restaurant.

6.A random sample of 1,000 patients were examined to study severe effects due to medication.
The probability/proportion of having a severe effect is 1/5. Find the probability of the medication
having severe effects (i) in 150 patients, (ii) in at least 180 patients, and (iii) at most 220
patients

7. An appliance repairman services five laptops on site each day. One-third of the service calls
require installation of a particular part. The repairman has only one such part on his truck today.
(a) Find the probability that the one part will be enough today, that is, that at most one laptop he
services will require installation of this particular part. (b) Find the minimum number of such
parts he should take with him each day in order that the probability that he has enough for the
day's service calls is at least 95%.
8. It is known that approximately 10% of a population is hospitalized at least once during a
year. If 10 people in a population are interviewed, what is the probability that you will
find:
a) All have been hospitalized at least once during the year?
b) 50% have been hospitalized?
c) At least 3 have been hospitalized?
d) Exactly 3 have been hospitalized?

9. A hospital administrator has noticed that on 42 of the past 60 days the first surgery
scheduled in the operating rooms at the hospital has started at least 30 minutes late.
The administrator is not pleased and has issued a memo encouraging the hospital staff
to make every effort to begin surgeries on time. The administrator decides to monitor the
operating room performance for the next 10 days. Assume that the memo had no effect,
calculate (i) the probability that the first surgery will start at least 30 minutes late on 4 of
the next 10 days. (ii) the probability that on three or fewer of the 10 days the first
scheduled surgery will begin at least 30 minutes late.
10.Suppose we have 10 balls in a bowl, 3 of the balls are red and the rest are blue.
The experiment of drawing a ball with replacement is repeated 20 times, and getting
a red ball is defined as a success. What is the probability that the number of red
balls drawn would be: (i) Five (ii) eight (iii) less than six (iv) Greater than 17. Also
compute mean and standard deviation.

11.There are 10 multiple choice questions each with 5 choices, only one of which is
correct. The student is attempting to guess the answers. What is the probability that
the student will get: (i) exactly 6 questions correct, (ii) at least 8 questions answered
correctly, and (iii) at most 5 questions answered correct.

12. Suppose that 20% of all devices manufactured by a company fail in the life test. If
15 devices are randomly selected for the test, find the probability that: (i) at most 8
devices fail the test, (ii) exactly 7 fail, (iii) at least 11 fail, and (iv)
between 4 and 7, inclusive, fail.
If a public radio station solicits contributions during a late-night program, how many listeners
should it expect to call in if n = 1,000 are listening and p= 0.15? How large is the SD of the
number who call in?
Fitting of Binomial distribution

• To fit any theoretical distribution, we should know its parameters and probability distribution.

• Parameters of the binomial distribution are n and p.

• Once n and p are known, binomial probabilities for different random events and the
corresponding expected frequencies can be computed.

• From the given data we can get n by inspection.

• The mean of the binomial distribution is equal to np, hence we can estimate p as p = mean / n.

• Using these n and p, one can fit the binomial distribution.
• That is, fitting a binomial distribution for a given data means finding expected frequencies
using parameters of the binomial distribution.
Example: Fit a binomial distribution for the following data:

x       Frequency (f)   fx
0       12              0
1       20              20
2       25              50
3       22              66
4       16              64
5       5               25
Total   100             225

Mean = Σfx / Σf = 225 / 100 = 2.25

For the binomial distribution, mean = np. Here, n = 5.

np = 2.25
5p = 2.25 ⇒ p = 0.45; q = 1 − p = 0.55

Use p and q to derive probabilities theoretically.

x       P(X = x)                                      Expected frequency = N × P(X = x)
0       $\binom{5}{0}(0.45)^0(0.55)^5$ = 0.050        5
1       $\binom{5}{1}(0.45)^1(0.55)^{5-1}$ = 0.206    20.6
2       $\binom{5}{2}(0.45)^2(0.55)^{5-2}$ = 0.337    33.7
3       $\binom{5}{3}(0.45)^3(0.55)^{5-3}$ = 0.276    27.6
4       $\binom{5}{4}(0.45)^4(0.55)^{5-4}$ = 0.113    11.3
5       $\binom{5}{5}(0.45)^5(0.55)^{5-5}$ = 0.019    1.9
Total   1.00                                          100
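
The fitting procedure in this example (estimate p from the sample mean, then compute expected frequencies) can be sketched in Python as follows. Taking n as the largest observed value of x is an assumption standing in for "n by inspection".

```python
from math import comb

# Observed frequencies from the example: {x: f}
observed = {0: 12, 1: 20, 2: 25, 3: 22, 4: 16, 5: 5}

n = max(observed)                 # assumption: n read off as the largest x
N = sum(observed.values())        # total frequency
mean = sum(x * f for x, f in observed.items()) / N
p = mean / n                      # estimate of p from mean = np

# Expected frequency for each x is N * P(X = x).
expected = {x: N * comb(n, x) * p ** x * (1 - p) ** (n - x) for x in observed}
for x, e in expected.items():
    print(x, observed[x], round(e, 1))
```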


Problem: Fit a binomial distribution for the following data:
Binomial distribution in R
There are FOUR built-in functions for the binomial distribution in R.
➢ dbinom(x, size, prob) - probability mass function, P(X = x)

➢ pbinom(x, size, prob) - cumulative distribution function, P(X ≤ x)

➢ qbinom(p, size, prob) - quantile function (inverse of the cumulative distribution function)

➢ rbinom(n, size, prob) - random number generation

Description of parameters
• x is a vector of numbers.
• p is a vector of probabilities.
• n is number of observations.
• size is the number of trials.
• prob is the probability of success of each trial.
dbinom() function
• The dbinom(x, size, prob) function gives the probability mass at
each point x.

• That is, dbinom finds the probability of getting a certain
number of successes (x) in a certain number of
trials (size) where the probability of success on each trial is
fixed (prob).

Example: Ram makes 65% of his free-throw attempts. If he shoots 16 free throws,
what is the probability that he makes exactly 12?

#find the probability of 12 successes during 16 trials where the probability of
#success on each trial is 0.65
dbinom(12,16,0.65)
[1] 0.1553473
pbinom() function
• The function pbinom(x, size, prob) returns the value of the cumulative
distribution function (cdf) of the binomial distribution for a given
value x, number of trials (size) and probability of success on each trial
(prob).

• That is, pbinom returns P(X ≤ x), the total probability at or to the left of a given
value x in the binomial distribution.
Example: Abhinav flips a fair coin 9 times. What is the
probability that the coin lands on heads a maximum of 6 times?
#find the probability of at most 6 successes during 9 trials where the
#probability of success on each trial is 0.5

pbinom(6, size=9, prob=.5)

[1] 0.9101562
pbinom() function
• If we need to find the area to the right of a given value x, we
can add the argument lower.tail = FALSE as follows:
pbinom(x, size, prob, lower.tail = FALSE)

Example: Mary flips a fair coin 9 times. What is the probability that the
coin lands on heads more than 5 times?

#find the probability of more than 5 successes during 9 trials where the
#probability of success on each trial is 0.5

pbinom(5, size=9, prob=.5, lower.tail=FALSE)


[1] 0.2539063

What is the probability that the coin lands on heads a maximum of 5 times?
pbinom(5, size=9, prob=.5)
[1] 0.7460937
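
For readers who want to cross-check the R outputs above without R, the following Python sketch re-implements the two calculations with the standard library. The helper names `dbinom` and `pbinom` and the `lower_tail` argument are hypothetical Python stand-ins that mirror the R calls; they are not R itself and not part of any Python library.

```python
from math import comb

def dbinom(x, size, prob):
    """P(X = x) for X ~ Binomial(size, prob); mirrors R's dbinom for scalar x."""
    return comb(size, x) * prob ** x * (1 - prob) ** (size - x)

def pbinom(x, size, prob, lower_tail=True):
    """P(X <= x), or P(X > x) when lower_tail is False."""
    cdf = sum(dbinom(k, size, prob) for k in range(0, x + 1))
    return cdf if lower_tail else 1.0 - cdf

exactly_12 = dbinom(12, 16, 0.65)                      # free-throw example
at_most_6 = pbinom(6, 9, 0.5)                          # coin example, at most 6 heads
more_than_5 = pbinom(5, 9, 0.5, lower_tail=False)      # coin example, more than 5 heads
print(round(exactly_12, 7), round(at_most_6, 7), round(more_than_5, 7))
```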
Poisson Distribution
• Binomial distribution starts with an experiment consisting of trials
• Poisson distribution is based on the number of outcomes occurring during a given time
interval or in a specified region.
• The Poisson distribution arises when you count a number of events across time or over an
area.

Examples:
▪ # of accidents that occur on a highway during a 1-week period
▪ # of customers visiting a bank during a 1-hour interval
▪ # of laptops sold at an electronics store during a given week
▪ # of breakdowns of a washing machine per year
▪ # of Emergency Department visits by an infant during the first year of life,
▪ # of road accidents at a particular junction on a highway.
▪ # of white blood cells found in a cubic centimeter of blood.
Conditions
Consider the # of breakdowns of a washing machine per month
• Each breakdown is called an occurrence
• Occurrences are random in that they do not follow any pattern
(unpredictable)
• Occurrence is always considered with respect to an interval (time or
distance)
The Poisson distribution is based on four assumptions. We will use the term
"interval" to refer to either a time interval or an area, depending on the context of
the problem.
1. The probability of observing a single event over a small interval is approximately
proportional to the size of that interval.

2. The probability of two events occurring in the same narrow interval is negligible.

3. The probability of an event within a certain interval does not change over
different intervals.

4. The probability of an event in one interval is independent of the probability of an
event in any other non-overlapping interval.
Probability function of Poisson distribution

The probability mass function of the Poisson distribution is given by

$P(X = x; \lambda) = \dfrac{e^{-\lambda} \lambda^x}{x!}, \quad x = 0, 1, 2, \ldots$

where e is a constant approximately equal to 2.7183.

λ > 0 is called the parameter of the Poisson distribution.

The mean and variance of the Poisson distribution are equal.

Mean = Variance = λ
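
The pmf and the "mean equals variance" property can be illustrated numerically. The sketch below uses an arbitrary rate λ = 2.5 and truncates the infinite sum at x = 59, where the remaining tail is negligible; both choices are assumptions made for the example.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam)."""
    return exp(-lam) * lam ** x / factorial(x)

lam = 2.5                      # illustrative rate
support = range(60)            # truncation point; the tail beyond it is negligible
total = sum(poisson_pmf(x, lam) for x in support)
mean = sum(x * poisson_pmf(x, lam) for x in support)
variance = sum(x * x * poisson_pmf(x, lam) for x in support) - mean ** 2
print(round(total, 6), round(mean, 6), round(variance, 6))
```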
Problem: If a bank receives on average 6 bad cheques per day, what is the
probability that it will receive: (i) four bad cheques on any given day? (ii)
10 bad cheques over any two consecutive days?

Solution:
(i) x = 4 and λ = 6, so $p(4) = \frac{e^{-6}\,6^{4}}{4!} = 0.1339$.

In R program: dpois(x, λ)= dpois(4,6)=0.13385


In Excel: pois.dist(x, λ,0)= pois.dist(4,6,0)= 0.13385


(ii) Over two days λ = 2 × 6 = 12 and x = 10, so $p(10) = \frac{e^{-12}\,12^{10}}{10!} = 0.1048$.
Problem: The number of failures of a testing instrument caused by contamination particles on the
product is a Poisson random variable with a mean of 0.04 failures per hour.
a) What is the probability that the instrument does not fail in an 8-hour shift?
b) What is the probability of at least one failure in one 24-hour day?

Solution:
Here λ=0.04 per hour
a) Let X = number of failures in 8 hours. Then X has a Poisson distribution with λ = 8 × 0.04 = 0.32, and
$$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!}$$
Probability that the instrument does not fail: $P(X = 0) = \frac{e^{-0.32}\,0.32^{0}}{0!} = 0.726$

b) Let Y = number of failures in 24 hours. Then Y has a Poisson distribution with λ = 0.04 × 24 = 0.96.
Probability of at least one failure: $P(Y \ge 1) = 1 - P(Y = 0) = 1 - \frac{e^{-0.96}\,0.96^{0}}{0!} = 0.6171$
Problems
1. A car rental agency has seven cars which it hires out day by day. The number of
demands for a car on each day is distributed as a Poisson variate with mean 5.
Calculate the proportion of days on which (i) none of the cars is used (ii) some
demand is refused.
2. In a certain factory making optical lenses, there is a small chance 1/250 for any
one lens to be defective. The lenses are supplied, in packets of 10. Use Poisson
distribution to calculate the approximate number of packets containing: (a) no
defective, (b) one defective, (c) two defective, and (d) four defective lenses
respectively in a consignment of 20,000 packets.
3. A manufacturer of electronic chips knows that 3% of his product is defective. If he
sells chips in boxes of 100, and guarantees that not more than 5 chips will be
defective, what is the probability that a box will fail to meet the guaranteed quality ?
Problem: It was found that the average breakdown rate of a particular part of the
diagnostic system is 2 per day. Assuming Poisson process, the hospital management
wants to determine the need for additional maintenance resource to cope with the
breakdowns in the machine shop.

If the hospital works 7 days per week, what is the probability of getting: a) exactly 10,
and b) more than 15 breakdowns in a week?

Solution:
Average number of breakdowns per week = λ = 2 × 7 = 14
a) Probability of exactly 10 breakdowns in a week: $P(X = 10) = \frac{e^{-14}\,14^{10}}{10!} = 0.0663$
b) Probability of more than 15 breakdowns in a week:
$$P(X > 15) = 1 - P(X \le 15) = 1 - \sum_{x=0}^{15} \frac{e^{-14}\,14^{x}}{x!} = 0.3306$$
Problem: The average number of errors due to a particular bug occurring in a
minute is 0.0004. (i) What is the probability that no error will occur in 25
minutes? (ii) How long would the program need to run to ensure that there
will be a 97% chance that an error will show up to highlight this bug?

Solution: λ = 0.0004 per minute, so λ = 25 × 0.0004 = 0.01 per 25-minute interval.

(i) P(no error will occur in 25 min) = P(X = 0) = $e^{-0.01}$ = 0.9901, i.e. a 99.01%
chance of getting zero errors in 25 minutes.
Equivalently, there is a 0.99% chance that an error will show up in the first 25 minutes.
(ii) Being 97% sure of catching the bug means P(X ≥ 1) ≥ 0.97, or equivalently P(X = 0) ≤ 0.03.
P(no occurrence in k minutes) = $e^{-0.0004k}$
To be 97% sure, we have to choose k such that $1 - e^{-0.0004k} \ge 0.97$, i.e. $k \ge -\ln(0.03)/0.0004$
⇒ k > 8766 minutes, or about 6.1 days.

That is, the program needs to run for a little more than 6 days to ensure that there will be a
97% chance that an error will show up to highlight the bug.
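The run time in part (ii) comes from solving 1 − e^(−0.0004k) ≥ 0.97 for k. A short Python sketch of that arithmetic, with variable names chosen here purely for illustration:

```python
import math

rate_per_minute = 0.0004   # average errors per minute (from the problem)
target = 0.97              # required chance of seeing at least one error

# (i) probability of no error in 25 minutes
p_no_error_25 = math.exp(-rate_per_minute * 25)

# (ii) smallest k with 1 - exp(-rate * k) >= target
k_minutes = -math.log(1 - target) / rate_per_minute
k_days = k_minutes / (60 * 24)

print(f"P(no error in 25 min) = {p_no_error_25:.4f}")
print(f"k = {k_minutes:.0f} minutes = {k_days:.2f} days")
```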
Problems on Poisson distribution:
1. Births in a hospital occur randomly at an average rate of 1.8 births per hour. What is the
probability of observing (i) 4 births in a given hour (ii) observing 5 births in a given hour? (iii)
observing more than or equal to 2 births in a given hour at the hospital?

2. An insurance company receives on average 3 claims every day. Find the probability that it will
receive: a) 4 claims on any given day, b) at least 2 claims a day, c) it receives 5 claims in two
days, and d) a maximum of 8 claims in two days

3. In an average year, there are 6.5 accidents occurring at a particular junction on the national
highway. Find the probability that: (a) at most 6 accidents occur in a year, (b) at least 10
accidents occur in a year, and (c) more than 8 accidents will occur at this junction during the
year 2022.

4. The number of bone fracture patients entering to a hospital emergency room in one day has a
Poisson distribution with mean 6.8 patients. The hospital manager has decided to allocate
emergency room resources that are sufficient to comfortably cope with up to ten bone fracture
patients per day. What is the probability that on a given day these resources will be inadequate?
5. Phone calls arrive at the rate of 48 per hour at the reservation desk for a 5-star
hotel. Compute the probability of receiving:

a. three calls in a 5-minute interval of time.


b. exactly 10 calls in 15 minutes.
c. At least 12 calls in 20 minutes.
d. A maximum of 7 calls in 10 minutes.
e. 10 to 18 calls in 45 minutes.

6. During the period of time that a local university takes phone-in registrations, calls
come in at the rate of one every two minutes.
a. What is the expected number of calls in one hour?
b. What is the probability of three calls in five minutes?
c. What is the probability of no calls in a five-minute period?
7. If Y is a Poisson variable such that
P(Y = 2) = 9P(Y = 4) + 90 P(Y = 6), find the mean and variance of Y.
8. A video recording company has discovered that the number of defects on records appears to follow a
Poisson distribution with a mean equal to 0·7.
(i) What is the probability that a record selected at random will have defects between 3 and 5?
(ii) If the company sets a policy that all video records sold to customers must not have any defects,
what per cent of its records production will not be made available for sales because of defects?
9. A computing system manager states that the rate of interruptions to the internet service is 0.4 per
week. Use the Poisson distribution to find the probability of
(a) two interruptions in 3 weeks
(b) at least two interruptions in 4 weeks
(c) at most one interruption in 10 weeks.

10.A consulting engineer receives, on average, 0.8 requests per week. If the number of requests
follows a Poisson process, find the probability that
(a) in a given week, there will be at least 2 requests;
(b) in a given 4-week period, there will be at most 3 requests.
[Figure: sample Poisson distribution with λ = 2, showing P(X) plotted against X]
Poisson approximation to Binomial Distribution

When n is large and p is small, binomial probabilities are often approximated by
$$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!}, \quad x = 0, 1, 2, \ldots \quad \text{with } \lambda = np$$
Poisson Approximation for Binomial Distribution

• If the number of Bernoulli trials is more than 20, i.e. n > 20, and the probability
of success is less than 0.05, the probability of x successes may be well
approximated by the Poisson distribution instead of the binomial distribution.

• When n ≥ 100 and np < 10, the approximation will generally be excellent.
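To see how close the approximation is, exact binomial probabilities can be compared with Poisson probabilities for λ = np. The sketch below uses hypothetical values n = 200 and p = 0.01 (chosen only for illustration, not taken from the slides) and requires Python 3.8 or later for `math.comb`.

```python
import math

def binomial_pmf(x: int, n: int, p: float) -> float:
    """Exact binomial probability of x successes in n trials."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x: int, lam: float) -> float:
    """Poisson probability of x events with parameter lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

n, p = 200, 0.01   # hypothetical: large n, small p
lam = n * p        # Poisson parameter used for the approximation

rows = [(x, binomial_pmf(x, n, p), poisson_pmf(x, lam)) for x in range(6)]
max_abs_diff = max(abs(b - q) for _, b, q in rows)

for x, b, q in rows:
    print(f"x={x}  binomial={b:.5f}  poisson={q:.5f}")
print(f"largest absolute difference: {max_abs_diff:.5f}")
```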
Problems on Poisson approximation to the binomial
distribution
1. If 2 percent of the credit card users are defaulters, use the Poisson approximation
to the binomial distribution to determine the probability that 3 of 400 users of credit
card are defaulters.

2. Records show that the probability is 0.00007 that a car will have a flat tire while
crossing a certain bridge. Use the Poisson distribution to approximate the binomial
probabilities that, among 20,000 cars crossing this bridge, (a) exactly two will have
a flat tire; (b) at most two will have a flat tire.
Fitting of a Poisson Distribution

To fit a Poisson distribution to given observed data (a frequency distribution), the procedure is given below:

1) For the given data, compute the mean using the formula $\text{Mean} = \frac{\sum f_i x_i}{\sum f_i}$ and equate it
to the mean of the Poisson distribution, λ.
2) Compute the probabilities of the various values of the random variable X by using the p.m.f.
$$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!}$$
3) Each probability obtained in step (2) is multiplied by N (= Σfi = the total
frequency) to get the expected frequencies.
Fit a Poisson distribution for the following data.

x    Frequency (f)
0    56
1    156
2    132
3    92
4    37
5    22
6    4
7    0
8    1

$\text{Mean} = \frac{\sum f_i x_i}{\sum f_i} = \frac{986}{500} = 1.972$, so λ = 1.972.

Use the λ value and estimate probabilities P(X = x)
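The three fitting steps can be carried out for the table above in a few lines. This is a minimal Python sketch under the assumption that the observed frequencies are stored in a dictionary; the variable names are illustrative.

```python
import math

# Observed frequency distribution from the table: x -> f
observed = {0: 56, 1: 156, 2: 132, 3: 92, 4: 37, 5: 22, 6: 4, 7: 0, 8: 1}

# Step 1: estimate lambda by the sample mean
N = sum(observed.values())
lam = sum(x * f for x, f in observed.items()) / N

# Steps 2 and 3: Poisson probabilities and expected frequencies N * P(X = x)
expected = {
    x: N * math.exp(-lam) * lam ** x / math.factorial(x)
    for x in observed
}

print(f"N = {N}, lambda = {lam:.3f}")
for x in observed:
    print(f"x={x}  observed={observed[x]:3d}  expected={expected[x]:7.2f}")
```

The expected frequencies sum to slightly less than N because the Poisson distribution also assigns a small probability to values above 8.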
Problem: The number of defects per unit in a sample of 300 units of manufactured product was
found as follows:

Number of defects    Number of units
0 40
1 60
2 100
3 50
4 30
5 20

Fit a poisson distribution for the data.


How to check whether a given set of data follows Poisson distribution?

There are 2 empirical ways of checking for a Poisson distribution.

1. The simplest and handiest way is to see if the variance is roughly equal to
the mean for your Poisson data.

2. A histogram of the Poisson data should be skewed right, though the
skewness becomes less pronounced as the mean increases.
Which discrete distribution to be used
based on mean and variance?
If you are trying to decide which of the discrete distributions to use to describe an
uncertain quantity and you know only the mean and variance, then you can choose
between the following three distributions based on whether the variance is less than,
equal to, or greater than the mean.
• For binomial distribution, mean is greater than variance.
• For Poisson distribution, mean is equal to variance.
• For negative binomial distribution, mean is less than variance.
Normal Distribution
• The normal distribution is the most used statistical distribution.
• Most of the data relating to economic and business statistics or even
in social, physical sciences, and engineering disciplines conform to
this distribution.
• Normal distribution is also known as Gaussian distribution (Gaussian
Law of Errors) after Karl Friedrich Gauss (1777-1855), who used this
distribution to describe the theory of accidental errors of
measurements involved in the calculation of orbits of heavenly
bodies.
• Normality is important in statistical inference.
Normal Distribution
A continuous random variable X follows a Normal distribution if its
probability density function (pdf) is
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^{2}/(2\sigma^{2})}, \qquad -\infty < x < +\infty$$

where π is a constant approximately equal to 3.14159.


µ is the mean and σ is the standard deviation of normal distribution.

The normal distribution is specified in terms of two parameters: µ and σ.

The exact shape of the Normal distribution depends on the mean and the
standard deviation of the distribution.

The standard deviation is a measure of spread and indicates the amount
of departure of the values from the mean.
Measures of Shape
• Symmetrical – the right half is a mirror image of the left half.

• Skewed – shows that the distribution lacks symmetry; used to denote the
data is thin at one end, and accumulated at the other end
• Absence of symmetry
• Extreme values or ‘tail’ in one side of a distribution
• Positively- or right-skewed vs. negatively- or left-skewed
[Figure: two density curves (y against x, for x from 0 to 20) illustrating skewed distributions]
Skewness
Generally, If Mean > Mode, the skewness is positive.

If Mean < Mode, the skewness is negative.

If Mean = Mode, the skewness is zero.


Coefficient of Skewness
Coefficient of Skewness (Sk) - compares the mean and median in light of the magnitude
of the standard deviation.

$$S_k = \frac{3(\mu - M_d)}{\sigma}$$
where µ is the mean;
Md is the median;
Sk is the coefficient of skewness;
σ is the standard deviation.

Alternatively, $$S_k = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^{3}/n}{\sigma^{3}}$$
where n is the number of observations.
Coefficient of Skewness

• If Sk < 0, the distribution is negatively skewed (skewed to the left).

• If Sk = 0, the distribution is symmetric (not skewed).

• If Sk is close to 0, it’s almost symmetric.

• If Sk > 0, the distribution is positively skewed (skewed to the right).


Kurtosis
• Peakedness of a distribution
• Leptokurtic: high and thin
• Mesokurtic: normal in shape
• Platykurtic: flat and spread out

[Figure: leptokurtic, mesokurtic and platykurtic curves compared]
Measuring Kurtosis

Coefficient of Kurtosis, $$K = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^{4}/n}{S^{4}}$$

where x̄ is the sample mean,
S = sample standard deviation,
n = number of observations in the sample.

If K = 3, the data is mesokurtic
If K > 3, the data is leptokurtic
If K < 3, the data is platykurtic.
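The skewness and kurtosis coefficients can be computed directly from raw data. The sketch below is one possible Python implementation; the function names and the two small data sets are assumptions made for illustration, and the standard deviation is taken with divisor n (the slides do not say which divisor is intended).

```python
import math
from statistics import median

def shape_coefficients(data):
    """Return (moment skewness, kurtosis K) using divisor n throughout."""
    n = len(data)
    mean = sum(data) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in data) / n)
    skew = (sum((x - mean) ** 3 for x in data) / n) / sd ** 3
    kurt = (sum((x - mean) ** 4 for x in data) / n) / sd ** 4
    return skew, kurt

def pearson_skewness(data):
    """Sk = 3 (mean - median) / standard deviation (divisor n)."""
    n = len(data)
    mean = sum(data) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in data) / n)
    return 3 * (mean - median(data)) / sd

symmetric = [1, 2, 3, 4, 5]       # hypothetical symmetric data
right_skewed = [1, 1, 1, 2, 10]   # hypothetical data with a long right tail

print(shape_coefficients(symmetric))
print(shape_coefficients(right_skewed), pearson_skewness(right_skewed))
```

For the symmetric data the skewness is zero and K is below 3 (platykurtic); the second data set gives positive skewness under both definitions.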
Normal Distribution with Different Values for µ
[Figure: three normal curves with the same σ, centred at µ = 40 (smaller µ), µ = 50 and µ = 60 (larger µ)]
Normal Distribution with Different Values for σ
[Figure: normal curves with the same µ, one with smaller σ and one with larger σ]
Although the distribution remains symmetric, the distribution becomes
flatter if we increase the standard deviation.
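Three Common Areas Under Normal Curves
• Between µ − 1σ and µ + 1σ: 68% of the area, with 16% in each tail
• Between µ − 2σ and µ + 2σ: 95.4% of the area, with 2.3% in each tail
• Between µ − 3σ and µ + 3σ: 99.7% of the area, with 0.15% in each tail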
Normal Distribution – Calculating Probabilities
• Rather than create a different table for every normal distribution
(with different means and standard deviations), we can convert to a standardized
normal distribution, called Z:

$$z = \frac{x - \mu}{\sigma}$$
• A z-score gives the number of standard deviations that a value x lies
above (z > 0) or below (z < 0) the mean.

• The Z distribution is a normal distribution with a mean of 0 and a standard
deviation of 1.
Standard Normal Distribution (SND)
The SND is a normal distribution with a mean of 0 and a variance of 1.
The probability density function of SND is given by,

$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^{2}/2}, \qquad -\infty < z < +\infty$$
where z =(X-µ)/σ, is a standard normal variable

X = original normal variable

µ = mean
σ = standard deviation of the original normal distribution
Standard Normal Distribution

• Standard normal (Z distribution) probability values can be found in most
statistics textbooks or on the internet.

• Alternatively, these probabilities can also be calculated using MS-Excel or R.

• Standard normal probability tables give the total area under the Z curve
between 0 and any point on the positive Z-axis.

• Since the curve is symmetric, the area under the curve between 0 and Z is
the same whether Z is positive or negative.
Uses of z-table

The z-table is used to find the required normal probabilities.

The standard normal (z) probability distribution is also used to
find values of a random variable associated with given
probabilities.
Z Table
The meaning of the z-value
A z-value measures how far (in standard deviation terms) an x-value lies from its
mean, µ.

For example, for x = 14 with µ = 20 and σ = 4, the corresponding z = −1.5 implies
that the value of x = 14 lies 1.5 standard deviations below (due to the negative sign)
the mean of 20.

Similarly, for x = 20 with µ = 20 and σ = 4, the corresponding z = 0 implies that
the value of x = 20 lies at its mean of 20 (i.e. zero deviation from its mean).

And for x = 24 with µ = 20 and σ = 4, the corresponding z = 1 implies that the
value of x = 24 lies 1 standard deviation above its mean of 20.
Table Lookup of a Standard Normal Probability

P(0 ≤ Z ≤ 1) = 0.3413

Z      0.00     0.01     0.02
0.00   0.0000   0.0040   0.0080
0.10   0.0398   0.0438   0.0478
0.20   0.0793   0.0832   0.0871
1.00   0.3413   0.3438   0.3461
1.10   0.3643   0.3665   0.3686
1.20   0.3849   0.3869   0.3888
Applying the Z Formula

X is normally distributed with µ = 485 and σ = 105

P(485 ≤ X ≤ 600) = P(0 ≤ Z ≤ 1.10) = 0.3643

For X = 485, $Z = \frac{X - \mu}{\sigma} = \frac{485 - 485}{105} = 0$
For X = 600, $Z = \frac{X - \mu}{\sigma} = \frac{600 - 485}{105} = 1.10$

Z      0.00     0.01     0.02
0.00   0.0000   0.0040   0.0080
0.10   0.0398   0.0438   0.0478
1.00   0.3413   0.3438   0.3461
1.10   0.3643   0.3665   0.3686
1.20   0.3849   0.3869   0.3888
Applying the Z Formula
X is normally distributed with µ = 494 and σ = 100

P(X ≤ 550) = P(Z ≤ 0.56) = 0.7123

For X = 550, $Z = \frac{X - \mu}{\sigma} = \frac{550 - 494}{100} = 0.56$

0.5 + 0.2123 = 0.7123
Applying the Z Formula

X is normally distributed with µ = 494, and σ = 100


P( X > 700) = P( Z > 2.06) = .0197
For X = 700
$Z = \frac{X - \mu}{\sigma} = \frac{700 - 494}{100} = 2.06$

0.5 − 0.4803 = 0.0197
Applying the Z Formula

X is normally distributed with µ = 494, and σ = 100


P (300 ≤ X ≤ 600) = P (−1.94 ≤ Z ≤ 1.06) = .8292

For X = 300, $Z = \frac{X - \mu}{\sigma} = \frac{300 - 494}{100} = -1.94$

For X = 600, $Z = \frac{X - \mu}{\sigma} = \frac{600 - 494}{100} = 1.06$

0.4738 + 0.3554 = 0.8292
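The three table-based results for µ = 494 and σ = 100 can be cross-checked in software. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the variable names are illustrative.

```python
from statistics import NormalDist

X = NormalDist(mu=494, sigma=100)

# cdf(x) is the cumulative probability P(X <= x)
p_at_most_550 = X.cdf(550)
p_above_700 = 1 - X.cdf(700)
p_300_to_600 = X.cdf(600) - X.cdf(300)

print(f"P(X <= 550)        = {p_at_most_550:.4f}")
print(f"P(X > 700)         = {p_above_700:.4f}")
print(f"P(300 <= X <= 600) = {p_300_to_600:.4f}")
```

Small differences from the table answers in the fourth decimal place are possible, because the table method rounds z to two decimals.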
Probability as
Area Under the Curve
The total area under the curve is 1.0, and the curve is
symmetric, so half is above the mean, half is below

P(−∞ < X < µ) = 0.5
P(µ < X < ∞) = 0.5
P(−∞ < X < ∞) = 1.0
By symmetry of the standard normal curve:

P(Z > −a) = P(Z < a)

P(−a < z < −b) = P(b < z < a)
Percentage of the Areas Under the Standard Normal Curve

For the standardized normal curve (mean µ = 0 and standard deviation σ = 1), the areas are:

P(−1 < z < 1) = 0.6827 = 68.27%

P(−2 < z < 2) = 0.9545 = 95.45%

P(−3 < z < 3) = 0.9973 = 99.73%
Finding Normal Probabilities

Probability is measured by the area under the curve f(X) between a and b:

P(a ≤ X ≤ b) = P(a < X < b) = P(X < b) − P(X < a)

(Note that the probability of any individual value is zero)
Problem:
Find the values of z that correspond to the probabilities of (i)
0.3517; (ii) 0.2533 using the standard normal distribution table.

Solution:
(i) Since 0.3517 falls between 0.3508 and 0.3531,
corresponding to z = 1.04 and z = 1.05, and 0.3517 is closer to
0.3508 than to 0.3531, we choose z = 1.04.

(ii) Since 0.2533 falls midway between 0.2517 and 0.2549,
corresponding to z = 0.68 and z = 0.69, we choose z = 0.685.
1. Find the missing values for the following probabilities for the standard
normal distribution using the z-tables: (i) P(z < ?) = 0.9147 (ii) P(z > ?) =
0.5319 (iii) P(0 < z < ?) = 0.4015 (iv) P(? < z < 0) = 0.4803 (v) P(? < z) = 0.0985
(vi) P(z < ?) = 0.2517 (vii) P(? < z) = 0.6331

2. Given that x (= flight times between cities A and B) follows a normal
distribution with a mean (µ) of 64 minutes and a standard deviation (σ) of
2.5 minutes, use the standard normal z-table to find: (i) P(x < 62)
(ii) P(x > 67.4) (iii) P(59.6 < x < 62.8) (iv) P(x > ?) = 0.1026
(v) P(x > ?) = 0.9772 (vi) P(60.2 < x < ?) = 0.6652
Problem:
Suppose that the purchase value of transactions (say X) of a leading textiles store follows
normal distribution with a mean of INR 2440 and a standard deviation of INR 64. (a) What is
the minimum purchase value of transactions for the highest-spending 15% of textile store
customers? (b) What purchase value of transactions separates the lowest-spending 30% of
store customers from the remaining customers?

Solution:
(a) We need to find the specific transaction value, x, such that above this value the top 15%
of high-spending customers are found.

Step-1: Draw the normal curve, showing the transaction value, x, that identifies the area
corresponding to the 15% of highest spenders.
Step-2: Using z-table, find z-value that corresponds to an area of 0.15 in the
right/top tail of the standard normal distribution. To use the z-table, the
appropriate area to read off is 0.5 – 0.15 = 0.35 (i.e. the middle area). The
closest z-value is 1.04.

Step-3: Find the x-value associated with the identified z-value in Step 2.

Substitute z = 1.04, µ = 2440 and σ = 64 into the z transformation $z = \frac{x - \mu}{\sigma}$,
and solve for x.

$$1.04 = \frac{x - 2440}{64} \Rightarrow x = 2506.56$$
Thus the highest-spending 15% of store customers spend at least INR 2506.56
per transaction.
(b) This requires that a specific transaction value, x, be identified such that the area below
this value is 30%, which represents the lowest spending 30% of store customers.

Step-1: Sketch the normal curve, showing the transaction value, x, that identifies the area
corresponding to the 30% of lowest spenders.

Step-2: From the z-table, find the z-value that corresponds to an area of 0.30 in the bottom
tail of the standard normal distribution. To use the z-table, the area that must be found is 0.5
– 0.30 = 0.20 (i.e. the middle area). The closest z-value read off the z-table is 0.52.
However, since the required z-value is below its mean, the z-value will be negative. Hence
z = −0.52.
Step-3: Find the x-value associated with the identified z-value in Step 2. Substitute
z = −0.52, µ = 2440 and σ = 64 into the z transformation and solve for x.

$$-0.52 = \frac{x - 2440}{64} \Rightarrow x = 2406.72$$

Thus the lowest-spending 30% of store customers spend a maximum of INR
2406.72 per transaction.
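Both cut-off values in this problem are percentiles of N(2440, 64), so they can also be obtained from an inverse cumulative distribution function. A minimal Python sketch with `statistics.NormalDist` (variable names are illustrative):

```python
from statistics import NormalDist

spend = NormalDist(mu=2440, sigma=64)

# (a) Top 15% of spenders: cumulative area below the cut-off is 0.85
top_15_cutoff = spend.inv_cdf(0.85)

# (b) Lowest 30% of spenders: cumulative area below the cut-off is 0.30
bottom_30_cutoff = spend.inv_cdf(0.30)

print(f"Top 15% spend at least INR {top_15_cutoff:.2f}")
print(f"Lowest 30% spend at most INR {bottom_30_cutoff:.2f}")
```

The results differ from the table answers by a few tenths of a rupee, because the table method rounds z to two decimal places (1.04 and −0.52).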
Problem: The monthly sales of laptops at a big retailer are assumed to be approximately normally
distributed with mean 2700 units and standard deviation 300 units. Determine the lower and upper limits
for laptop sales during a month with 95% probability.

Solution:
Let X be the monthly sales of laptops. X follows a normal distribution with mean 2700 and SD 300.
We have to find the values x1 and x2 such that P(x1 < X < x2) = 0.95

or $P\left(\frac{x_1 - \mu}{\sigma} < \frac{X - \mu}{\sigma} < \frac{x_2 - \mu}{\sigma}\right) = 0.95 \Rightarrow P(z_1 < Z < z_2) = 0.95$

We know that P(−1.96 < Z < 1.96) = 0.95

So $z_1 = -1.96 \Rightarrow \frac{x_1 - 2700}{300} = -1.96 \Rightarrow x_1 = 2112$
and $z_2 = 1.96 \Rightarrow \frac{x_2 - 2700}{300} = 1.96 \Rightarrow x_2 = 3288$
Hence, the retailer can be 95% sure that sales in any given month will be between 2112 and 3288 units.
Class exercise:
1. The cricket scores for a university team were normally distributed with a mean of 52
runs and a standard deviation of 7. Find the probability that a randomly selected
player scored less than 48 runs.
2. The golf scores for a school team were normally distributed with a mean of 68 and a
standard deviation of three. Find the probability that a golfer scored between 66 and
70.
3. The amount of fuel consumed by an aircraft on a flight between Hyderabad and
Delhi is a normally distributed random variable X with mean µ = 4.7 tons and
standard deviation σ = 0.5 tons. Carrying too much fuel is inefficient as it slows the
plane. If too little fuel is loaded on the plane, an emergency landing may be
necessary. What should be the amount of fuel to load so that there is 0.99
probability that the plane will arrive at its destination without emergency landing?
4. The demand for gasoline at a service station is normally distributed with mean
12,809 litres per day and standard deviation 49 litres. Find two values that will give
a symmetric 0.80 probability interval and 98% probability interval for the amount of
gasoline demanded daily.
Standard Normal Distribution (z) - Finding Probabilities for z-values in MS-Excel

The NORMSDIST function computes cumulative normal probabilities for z-values.

=NORMSDIST(z), where z = specified z-value
This function computes the cumulative probability from −∞ up to the z-value.
For example, P(0 < z < 1.46) = P(z < 1.46) − P(z < 0) (where P(z < 0) = 0.5).
Thus to find P(0 < z < 1.46) in Excel, type =NORMSDIST(1.46)-NORMSDIST(0) to give
0.4279.
For example, P(−2.3 < z < 0) = P(z < 0) − P(z < −2.3).
Thus to find P(−2.3 < z < 0) in Excel, type
=NORMSDIST(0)-NORMSDIST(-2.3) to give 0.48928.

For example, P(z > 1.82) is the complement of the cumulative probability from −∞ up to z =
1.82. Thus to find P(z > 1.82), type =1-NORMSDIST(1.82) to give 0.0344.
Normal Distribution – Finding Probabilities for x-values in MS-Excel

The NORMDIST function computes cumulative normal probabilities for x-values.

=NORMDIST(x, mean, standard_dev, cumulative)

The values x, mean (µ), and standard_dev (σ) are to be specified to use this function.
Cumulative = TRUE specifies that cumulative normal probabilities are computed.

This function computes normal probabilities directly for an x random variable with a given
mean, µ, and standard deviation, σ, without first having to convert to a z-value.

For example, for X normally distributed with mean 45 and standard deviation 8, P(40 < x < 51)
can be found by subtracting the two cumulative probabilities P(x < 51) and P(x < 40).

Thus to find P(40 < x < 51), we use =NORMDIST(51, 45, 8, TRUE)-NORMDIST(40, 45, 8, TRUE) to
get 0.5074.

For example, P(x < 48) is found by evaluating NORMDIST(48, 45, 8, TRUE) to give 0.64617.
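For readers working outside Excel, the same cumulative probabilities can be reproduced with `statistics.NormalDist` from the Python standard library. This is a sketch of an equivalent calculation, not part of the Excel workflow described above; `cdf` plays the role of NORMSDIST (for the standard normal) and NORMDIST with cumulative = TRUE.

```python
from statistics import NormalDist

Z = NormalDist()                 # standard normal: mean 0, sd 1
X = NormalDist(mu=45, sigma=8)   # the x-variable used in the NORMDIST examples

p_0_to_146 = Z.cdf(1.46) - Z.cdf(0)      # about 0.4279
p_m23_to_0 = Z.cdf(0) - Z.cdf(-2.3)      # about 0.48928
p_above_182 = 1 - Z.cdf(1.82)            # about 0.0344
p_40_to_51 = X.cdf(51) - X.cdf(40)       # about 0.5074
p_below_48 = X.cdf(48)                   # about 0.64617

for name, value in [("P(0<z<1.46)", p_0_to_146), ("P(-2.3<z<0)", p_m23_to_0),
                    ("P(z>1.82)", p_above_182), ("P(40<x<51)", p_40_to_51),
                    ("P(x<48)", p_below_48)]:
    print(f"{name} = {value:.5f}")
```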
Standard Normal Distribution – Finding z-values using MS-Excel
The NORMSINV (i.e. standard normal inverse) function finds the z-value associated with the
cumulative probability up to z.

=NORMSINV(cumulative probability).

For example, to find k such that P(0 < z < k) = 0.3461, the cumulative area up to k must be used
in the Excel function.
The cumulative area is 0.5 + 0.3461 = 0.8461.
NORMSINV(0.8461) gives 1.01985. Thus k = 1.02.
To find k such that P(z > k) = 0.8051, the cumulative area of the complement up to k must be
used in the Excel function.
The cumulative area up to k is 1.0 – 0.8051 = 0.1949.
NORMSINV(0.1949) gives −0.85998. Thus k = −0.86.
Normal distribution – Finding x-limits in MS-Excel
The NORMINV (i.e. normal inverse) function finds the x-limit associated with a
normal probability distribution with a specified mean µ and standard deviation σ.

The NORMINV(cumulative probability, mean, standard_dev) function identifies the
x-value directly (for a given mean, µ, and standard deviation, σ) without first having
to identify a z-value and then convert it back to an x-value.

For example, for a normally distributed random variable with mean = 250 and
standard deviation = 40, if we have to find k such that the cumulative area up to
k is 0.25, we use:
=NORMINV(0.25,250,40) to give 223.0204.

Thus k = 223.0204.
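The inverse look-ups performed by NORMSINV and NORMINV have a direct counterpart in Python's `statistics.NormalDist.inv_cdf`. The sketch below repeats the three examples; it is offered as an equivalent calculation under that assumption, with illustrative variable names.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

# P(0 < z < k) = 0.3461  ->  cumulative area up to k is 0.5 + 0.3461
k1 = Z.inv_cdf(0.5 + 0.3461)

# P(z > k) = 0.8051  ->  cumulative area up to k is 1 - 0.8051
k2 = Z.inv_cdf(1.0 - 0.8051)

# Cumulative area 0.25 for a normal variable with mean 250 and sd 40
k3 = NormalDist(mu=250, sigma=40).inv_cdf(0.25)

print(f"k1 = {k1:.5f}, k2 = {k2:.5f}, k3 = {k3:.4f}")
```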
Practice problems:
1. For the numbers below find the area between the mean and the z
a) z = 1.17 b) z = -.85 c) z = 2.07 d) z = -1.37
2. Determine: (a) P(0 < z < 1.46) (b) P(−2.3 < z < 0) (c) P(z > 1.82)
(d) P(−2.1 < z < 1.32) (e) P(1.24 < z <2.075)
3. If scores are normally distributed with a mean of 30 and a standard deviation
of 5, what percent of the scores is: (a) greater than 30? (b) greater than 37?
(c) between 28 and 34?
4. The cost of treatment per patient for a certain medical problem was modeled
by one insurance company as a normal random variable with mean Rs. 8500
and standard deviation Rs.1750. What is the probability that the treatment
cost of a patient is less than Rs. 9,000, based on this model?
5. Suppose the length of stay in a hospital following coronary artery bypass surgery
is normally distributed with a mean of 14 days and a standard deviation of 3
days. What proportion of patients will have a length of stay:
(A) less than 10 days?
(B) greater than 10 days?
(C) is between 12 days and 18 days?

6. A city hospital conducted a cognitive abilities test for patients recently diagnosed
with Parkinson’s disease. The average and standard deviation of scores were 64
and 7 respectively. (a) If a randomly selected patient scores 58 on the test, what
is this patient's percentile rank? (b) If another patient scores 66 on the test, what
percent of individuals would receive a higher score?
7. The lifetime of a certain brand of washing machine is normally distributed
with mean and standard deviation equal to 3.4 and 1.6 years respectively.
(a) If this type of washing machine is guaranteed for one year, what
percentage of original sales will require replacement if they fail within the
guarantee period? (b) What percentage of these washing machines is likely
to be operating:(i) after 4 years? (ii) after 5.5 years? (c) If the manufacturer
of these washing machines wants to ensure that no more than 5% of these
machines will be replaced within a guarantee period, what new guarantee
period should they choose?
8. The service time of the first service of a BMW is normally distributed, with a
mean of 70 minutes and a variance of 81 minutes. (a) If a customer brings
her BMW in for its first service, what is the probability that the car will be
ready within one hour? (b) What is the probability that the job will take
more than an hour and a half? (c) What percentage of first services will be
completed between 50 and 60 minutes? (d) The BMW dealer has a policy to
give its customers a 15% discount on the cost of the first service if the
service is not completed within 80 minutes. From a sample of
120 customers who brought their BMWs in for its first service, how many
are likely to receive the 15% discount? (e) If the BMW dealer wants to
ensure that no more than 5% of all first services will take longer than
80 minutes, what should the mean service time be?
9. A coffee-dispensing machine used in cafeterias is set to dispense coffee with
an average fill of 230 ml and a standard deviation of 10 ml per cup. Assume
that the volume dispensed is normally distributed. (a) For a randomly
selected cup dispensed by the machine, what is the probability that: (i) the
cup is filled to more than 235 ml? (ii) the cup is filled to between 235 ml and
245 ml? (iii) the cup is less than 220 ml full? (b) If the company supplying the
coffee machines wants only 15% of cups to exceed a given fill level, what
level of fill (in ml) does this correspond to? (c) What must the mean fill level
(in ml) be set to in order to ensure that no more than 10% of cups are filled
to less than 220 ml?
10. Assume that the life of a particular brand of car battery is normally
distributed with a mean of 28 months and a standard deviation of 4 months.
(a) For a randomly selected battery, what is the probability that it will last
between 30 and 34 months?
(b) What is the probability that a randomly selected battery will fail within
two years of the date of purchase?
(c) By what time period will 60% of all batteries of this make fail?
(d) If a guarantee period is to be set, how many months would it have to be
to replace no more than 5% of batteries of this make?
Fitting of Normal Distribution
• To fit a normal distribution to the given data, we first calculate the mean μ and standard
deviation σ from the given data.

• Then the normal curve fitted to the given data is given by
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^{2}/(2\sigma^{2})}, \qquad -\infty < x < +\infty$$

• To calculate the expected normal frequencies, we first find the standard normal variates
corresponding to the lower limits of each of the class intervals, i.e., we compute
zi = (xi − μ)/σ, where xi = the lower limit of the i-th class interval.
• Then the areas under the normal curve to the left of the ordinate z = zi, say ϕ(zi), are
computed from the standard normal tables.
• Finally, the areas of the successive class intervals are obtained by subtraction, viz.,
Δϕ(zi) = ϕ(zi+1) − ϕ(zi), (i = 1, 2, ...), and on multiplying these areas by N = total
frequency, we get the expected normal frequencies.
Problem: Fit a normal distribution for the following data

Class Frequency
10-20 9
20-30 16
30-40 25
40-50 12
50-60 9
60-70 4
Fitting a normal distribution
Class    Frequency (fi)    Mid value of class (mi)    fi mi    fi mi²
10-20    9                 15                         135      2025
20-30    16                25                         400      10000
30-40    25                35                         875      30625
40-50    12                45                         540      24300
50-60    9                 55                         495      27225
60-70    4                 65                         260      16900
Total    75                                           2705     111075

$\text{Mean} = \frac{\sum f_i m_i}{\sum f_i} = \frac{2705}{75} = 36.1$
$\text{Variance} = \frac{\sum f_i m_i^{2}}{\sum f_i} - \text{Mean}^{2} = \frac{111075}{75} - \left(\frac{2705}{75}\right)^{2} = 180.2$
SD = 13.4
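With the mean and SD in hand, the expected normal frequencies follow from the areas between successive class limits. The sketch below is one way to carry out the fitting procedure in Python; it uses `statistics.NormalDist` in place of the standard normal tables, and the variable names are illustrative.

```python
import math
from statistics import NormalDist

# (lower limit, upper limit, observed frequency) for each class
classes = [(10, 20, 9), (20, 30, 16), (30, 40, 25),
           (40, 50, 12), (50, 60, 9), (60, 70, 4)]

N = sum(f for _, _, f in classes)
mean = sum(f * (lo + hi) / 2 for lo, hi, f in classes) / N
variance = sum(f * ((lo + hi) / 2) ** 2 for lo, hi, f in classes) / N - mean ** 2
sd = math.sqrt(variance)

fitted = NormalDist(mu=mean, sigma=sd)

# Expected frequency = N * (area under the fitted curve within the class)
expected = [N * (fitted.cdf(hi) - fitted.cdf(lo)) for lo, hi, _ in classes]

print(f"mean = {mean:.1f}, variance = {variance:.1f}, sd = {sd:.1f}")
for (lo, hi, f), e in zip(classes, expected):
    print(f"{lo}-{hi}: observed = {f:2d}, expected = {e:5.1f}")
```

The expected frequencies add up to less than N because the fitted curve also places some area below 10 and above 70.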
Fitting a normal distribution
Problem: Fit a normal distribution for the following data pertaining to end-semester
marks in the Probability and Statistics (P&S) subject obtained by B.Tech.
students.

Marks in P&S    Number of students
0-5             2
5-10            7
10-15           14
15-20           28
20-25           12
25-30           7
Total           70
