
3-Statistical Learning - Distributions

Here are the steps to solve this problem: a) The average stay is 28 months with a standard deviation of 4.8 months. To get the bonus, the stay must exceed 36 months. Z score = (36 - 28)/4.8 = 8/4.8 ≈ 1.67. From the normal distribution table, the probability of a value exceeding Z ≈ 1.67 is about 0.0478. With 50 new nurses, the expected number getting the bonus is 50 * 0.0478 ≈ 2.39, i.e. about 2. The bonus amount is 1,00,000 per nurse, so the expected total bonus is about 2,39,000 (2,00,000 if the count is rounded to 2 nurses). b) If the bonus is given for a stay exceeding 24 months, the average stay is still 28 months. New Z score

Uploaded by

Rajeev Soni
Copyright
© All Rights Reserved

Statistical Learning

PGPAIML
Outline

1. Introduction to Distributions
2. Binomial Distribution
3. Poisson Distribution
4. Normal Distribution
5. Summary of applications of different distributions
What is a Probability Distribution?
• In precise terms, a probability distribution is a complete listing of the
values a random variable can take, along with the probability of each value.
A real-life example is the distribution of machine breakdowns in a
manufacturing unit.
• The random variable in this example is the number of machine breakdowns,
and its values are the breakdown counts that can occur.
• The probability corresponding to each value is the relative frequency with
which that number of breakdowns occurs.
• The probability distribution for this example is constructed from the actual
breakdown pattern observed over a period of time. Statisticians call this the
“observed distribution” of breakdowns.
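As a minimal sketch of how an observed distribution is built, the snippet below counts breakdown figures and divides each count by the number of days observed. The `daily_breakdowns` list is hypothetical data invented for illustration, not data from the source.

```python
from collections import Counter

# Hypothetical data: number of machine breakdowns recorded on each of 10 days.
# These figures are invented for illustration; they are not from the source.
daily_breakdowns = [0, 1, 2, 1, 0, 3, 1, 2, 1, 0]

counts = Counter(daily_breakdowns)
n_days = len(daily_breakdowns)

# Observed distribution: each value of the random variable (breakdowns per day)
# mapped to its relative frequency of occurrence.
observed_distribution = {value: counts[value] / n_days for value in sorted(counts)}

for value, prob in observed_distribution.items():
    print(f"{value} breakdown(s) per day: P = {prob:.2f}")
```

The probabilities sum to 1, as they must for any probability distribution.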
Probability Distributions - Example

15-Feb-20
Example Problem

• Fashion Trends Online (FTO) is an e-commerce company that sells women's
apparel. It is observed that about 10% of its customers return the items
they purchased, for various reasons (such as size, color and material
mismatch). On a particular day, 20 customers purchased items from FTO.
Calculate:
1. The probability that exactly 5 customers will return the items.
2. The probability that at most 5 customers will return the items.
3. The probability that more than 5 customers will return the items they
purchased.
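A minimal sketch in Python, assuming each customer returns items independently with probability 0.10, so the number of returns X follows a binomial distribution with n = 20. It uses only the standard library (`math.comb`, Python 3.8+); `binom_pmf` is a hypothetical helper written for this example.

```python
from math import comb

def binom_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for a binomial random variable with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 20, 0.10  # 20 customers, 10% chance that a customer returns items

p_exactly_5 = binom_pmf(5, n, p)                             # 1. P(X = 5)
p_at_most_5 = sum(binom_pmf(x, n, p) for x in range(0, 6))   # 2. P(X <= 5)
p_more_than_5 = 1 - p_at_most_5                              # 3. P(X > 5)

print(f"P(X = 5)  = {p_exactly_5:.4f}")
print(f"P(X <= 5) = {p_at_most_5:.4f}")
print(f"P(X > 5)  = {p_more_than_5:.4f}")
```

This gives roughly 0.032, 0.989 and 0.011. If SciPy is installed, `scipy.stats.binom.pmf(5, 20, 0.1)` and `scipy.stats.binom.cdf(5, 20, 0.1)` should return the same first two values.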
Example Problem

• A bank issues credit cards to customers under the Master Card scheme.
Based on past data, the bank has found that 60% of all accounts pay on
time after receiving the bill. If a sample of 7 accounts is selected at random from the
current database, construct the probability distribution of accounts paying on
time.
Binomial Distribution
• The Binomial Distribution is a widely used probability distribution of a discrete
random variable.
• It plays a major role in quality control and quality assurance.
Manufacturing units use the binomial distribution for defect analysis.
• Reducing the number of defectives using the proportion-defective control chart
(p chart) is an accepted practice in manufacturing organizations.
• The binomial distribution is also used in service organizations such as banks and
insurance companies to estimate the proportion of customers who are
satisfied with the service quality.
Conditions for Applying Binomial Distribution
(Bernoulli Process)
• Trials are independent and random.
• There is a fixed number of trials (n trials).
• Each trial has only two outcomes, designated success or
failure.
• The probability of success is the same throughout the n trials.
Binomial Probability Function
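The formula on this slide did not survive extraction. The standard binomial probability function, restated here for n trials with success probability p, is:

```latex
P(X = x) = \binom{n}{x}\, p^{x} (1-p)^{n-x}, \qquad
\binom{n}{x} = \frac{n!}{x!\,(n-x)!}, \qquad x = 0, 1, \dots, n
```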
Example for Binomial Distribution
• A bank issues credit cards to customers under the Master Card
scheme. Based on past data, the bank has found that 60% of all
accounts pay on time after receiving the bill. If a sample of 7 accounts is
selected at random from the current database, construct the
binomial probability distribution of accounts paying on time.

• Solution using Python?


Solution

• This problem can be structured as a Bernoulli process in which an account paying on time is
taken as a success and an account not paying on time is taken as a failure. The random variable x
is the number of accounts (out of 7) paying on time, so it can take the values 0, 1, 2, 3, 4, 5, 6, 7.
You need to prepare a table of x and P(x) for all values of x. Performing these calculations by hand
with the binomial probability function for x = 0, 1, 2, 3, 4, 5, 6, 7 is very tedious.

• The best option is to use Microsoft Excel to calculate the binomial probabilities, both for
individual values and cumulatively. This facility is available under the "Paste Function"
option. The function takes the arguments (x, n, p, 0 or 1), where x is the number of successes,
n is the number of trials, and p is the probability of success in each trial. The last argument,
0 or 1, is a logical switch: if you enter 0, the function returns the individual probability
value; if you enter 1, it returns the cumulative probability value.
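The spreadsheet workflow above can be mirrored in Python. This sketch uses only the standard library and builds the full table for n = 7 and p = 0.6: `pmf` plays the role of entering 0 (individual probabilities) and `cdf` the role of entering 1 (cumulative probabilities). The variable names are illustrative.

```python
from itertools import accumulate
from math import comb

n, p = 7, 0.60  # 7 accounts sampled; 60% of accounts pay on time

# Individual probabilities P(X = x): the counterpart of entering 0 in Excel
pmf = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]

# Cumulative probabilities P(X <= x): the counterpart of entering 1
cdf = list(accumulate(pmf))

print(" x    P(x)   P(X<=x)")
for x in range(n + 1):
    print(f"{x:2d}  {pmf[x]:.4f}  {cdf[x]:.4f}")
```

The last cumulative value is 1, which is a quick check that the table covers every possible outcome.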
Mean and Standard Deviation of the Binomial Distribution
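The formulas for this slide were lost in extraction. The standard results for a binomial distribution, applied here to the credit-card example (n = 7, p = 0.6), are:

```latex
\mu = np, \qquad \sigma = \sqrt{np(1-p)}

\mu = 7 \times 0.6 = 4.2, \qquad
\sigma = \sqrt{7 \times 0.6 \times 0.4} = \sqrt{1.68} \approx 1.296
```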
Example

If, on average, 6 customers arrive every two minutes at a bank during busy
working hours, a) what is the probability that exactly four customers arrive in a
given minute? b) What is the probability that more than three customers will arrive
in a given minute?

How to solve this problem?


Poisson Distribution
• Poisson Distribution is another discrete distribution which also plays a
major role in quality control in the context of reducing the number of
defects per standard unit.
• Examples include the number of defects per item, the number of defects per
transformer produced, the number of defects per 100 m² of cloth, etc.
• Other real-life examples include 1) the number of cars arriving
at a highway check post per hour; 2) the number of customers
visiting a bank per hour during the peak business period.
Poisson Process
• The probability of exactly one success in a small interval of fixed size
(length, area, time and the like) is constant.
• The number of successes in any one interval is independent of the
number of successes in any other non-overlapping interval.
• The probability of more than one success in a sufficiently small interval is effectively 0.
Poisson Probability Function
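The formula itself did not survive extraction. The standard Poisson probability function, where λ is the average number of occurrences per interval, is shown below; both the mean and the variance of a Poisson distribution equal λ.

```latex
P(X = x) = \frac{e^{-\lambda}\,\lambda^{x}}{x!}, \qquad x = 0, 1, 2, \dots
\qquad E[X] = \operatorname{Var}(X) = \lambda
```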
Example – Poisson Distribution
• If, on average, 6 customers arrive every two minutes at a bank
during busy working hours, a) what is the probability that
exactly four customers arrive in a given minute? b) What is the
probability that more than three customers will arrive in a given
minute?

• Work out using Python program


Example – Poisson Distribution
If, on average, 6 customers arrive every two minutes at a bank during busy
working hours, a) what is the probability that exactly four customers arrive in a
given minute? b) What is the probability that more than three customers will arrive
in a given minute?

6 customers arrive every two minutes. Therefore, 3 customers arrive every
minute, which implies λ = 3.
P(X = 4) = ?
P(X > 3) = 1 - P(X ≤ 3) = ?
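A short Python sketch of this calculation using only the standard library; `poisson_pmf` is a hypothetical helper written for this example, and λ = 3 per minute follows from rescaling 6 arrivals per two minutes.

```python
from math import exp, factorial

def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for a Poisson random variable with mean lam."""
    return exp(-lam) * lam**x / factorial(x)

lam = 6 / 2  # 6 arrivals per two minutes -> 3 arrivals per minute

p_exactly_4 = poisson_pmf(4, lam)                          # (a) P(X = 4)
p_at_most_3 = sum(poisson_pmf(x, lam) for x in range(4))   # P(X <= 3)
p_more_than_3 = 1 - p_at_most_3                            # (b) P(X > 3)

print(f"P(X = 4) = {p_exactly_4:.4f}")
print(f"P(X > 3) = {p_more_than_3:.4f}")
```

This gives P(X = 4) ≈ 0.168 and P(X > 3) ≈ 0.353.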
Example Problem

The mean weight of a morning breakfast cereal pack is 0.295 kg with a standard
deviation of 0.025 kg. The weight of a pack is a random variable that follows a normal
distribution.

a) What is the probability that a pack weighs less than 0.280 kg?

b) What is the probability that a pack weighs more than 0.350 kg?

c) What is the probability that a pack weighs between 0.260 kg and 0.340 kg?
Normal Distribution
Properties of Normal Distribution
Standard Normal Distribution
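The slide content here was an image. For reference, the standard normal density with mean μ and standard deviation σ, and the standardization to Z used in the worked examples that follow, are:

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}}
       \exp\!\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right),
\qquad -\infty < x < \infty

Z = \frac{X - \mu}{\sigma} \sim N(0, 1)
```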
Example Problem
• The mean weight of a morning breakfast cereal pack is 0.295 kg with
a standard deviation of 0.025 kg. The weight of a pack is a random
variable that follows a normal distribution.

a) What is the probability that a pack weighs less than 0.280 kg?
b) What is the probability that a pack weighs more than 0.350 kg?
c) What is the probability that a pack weighs between 0.260 kg and
0.340 kg?
Solution a)
Solution b)
Solution c)
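The worked solutions for (a) to (c) were not recoverable. As a hedged sketch, the same probabilities can be computed with the standard library's `statistics.NormalDist` (Python 3.8+), which works directly in kilograms instead of looking up Z values in a table.

```python
from statistics import NormalDist

weight = NormalDist(mu=0.295, sigma=0.025)  # pack weight in kg

p_a = weight.cdf(0.280)                      # (a) P(X < 0.280)
p_b = 1 - weight.cdf(0.350)                  # (b) P(X > 0.350)
p_c = weight.cdf(0.340) - weight.cdf(0.260)  # (c) P(0.260 < X < 0.340)

print(f"a) {p_a:.4f}  b) {p_b:.4f}  c) {p_c:.4f}")
```

This gives approximately 0.274 for (a) (Z = -0.6), 0.014 for (b) (Z = 2.2) and 0.883 for (c) (Z from -1.4 to 1.8).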
Example Problem1

A company produces bolts of nominal length 10 mm for its customers. The lengths of the bolts produced
are normally distributed with a mean of 10.01 mm and a standard deviation of
0.06 mm.
(a) What is the probability that a bolt produced will be longer than 10.2 mm?

(b) The sales team is negotiating with a new customer who has more stringent
quality requirements. The new customer requires bolts to be between 9.9 and
10.15 mm long. What is the probability that a bolt produced by the current process will
be acceptable to the new customer?

(c) What is the length below which 99% of the bolts produced will fall?
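A sketch for all three parts, again assuming Python 3.8+ and `statistics.NormalDist`; its `inv_cdf` method returns the length below which the stated fraction of bolts falls, which answers (c).

```python
from statistics import NormalDist

length = NormalDist(mu=10.01, sigma=0.06)  # bolt length in mm

p_longer = 1 - length.cdf(10.2)                     # (a) P(X > 10.2)
p_acceptable = length.cdf(10.15) - length.cdf(9.9)  # (b) P(9.9 < X < 10.15)
length_99 = length.inv_cdf(0.99)                    # (c) 99th percentile

print(f"a) {p_longer:.5f}  b) {p_acceptable:.4f}  c) {length_99:.4f} mm")
```

Approximate results: (a) about 0.0008 (Z ≈ 3.17), (b) about 0.957, and (c) about 10.15 mm (Z ≈ 2.326).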
Example Problem2

ABC hospital recruits nurses frequently to manage high attrition among the
nursing staff. Not all job offers from ABC hospital are accepted. Based on the past
recruitment data, it was estimated that only 70% of offers rolled out by ABC
hospital are accepted.
a. If 10 offers are made, what is the probability that more than 5 and less than 8
candidates will accept the offer from ABC hospital?
b. During December 2019, ABC requires 14 new nurses to manage attrition. What
should be the number of offers made by ABC hospital so that the average
number of nurses accepting the offer is 14?
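Treating each offer as an independent trial with acceptance probability 0.7, the number of acceptances X follows a binomial distribution. The sketch below assumes "more than 5 and less than 8" means X = 6 or X = 7, and uses the binomial mean np for part (b). `binom_pmf` is a hypothetical helper.

```python
from math import comb

def binom_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for a binomial random variable."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

p_accept = 0.70  # probability that an offer is accepted

# (a) "more than 5 and less than 8" of 10 offers means X = 6 or X = 7
p_6_or_7 = binom_pmf(6, 10, p_accept) + binom_pmf(7, 10, p_accept)

# (b) binomial mean is n * p, so n = 14 / p; round() absorbs floating-point noise
offers_needed = round(14 / p_accept)

print(f"a) {p_6_or_7:.4f}  b) {offers_needed} offers")
```

This gives P(X = 6 or 7) ≈ 0.467 and n = 20 offers.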
Example Problem3

According to a survey on the use of smartphones in India, smartphone users
spend 68 minutes a day on average sending messages, and the corresponding
standard deviation is 12 minutes. Assume that the time spent sending messages
follows a normal distribution.
a. What proportion of smartphone users spend more than 90 minutes
sending messages daily?
b. What proportion of users spend less than 20 minutes?
c. What proportion of users spend between 50 minutes and 100
minutes?
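A hedged sketch of the three proportions using `statistics.NormalDist` (Python 3.8+), with the stated mean of 68 minutes and standard deviation of 12 minutes.

```python
from statistics import NormalDist

minutes = NormalDist(mu=68, sigma=12)  # daily minutes spent sending messages

p_over_90 = 1 - minutes.cdf(90)                    # (a) P(X > 90)
p_under_20 = minutes.cdf(20)                       # (b) P(X < 20)
p_50_to_100 = minutes.cdf(100) - minutes.cdf(50)   # (c) P(50 < X < 100)

print(f"a) {p_over_90:.4f}  b) {p_under_20:.6f}  c) {p_50_to_100:.4f}")
```

This gives roughly (a) 0.033, (b) 0.00003 (Z = -4) and (c) 0.929.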
Example Problem4

At ABC hospital, nurses are given an additional bonus of INR 1,00,000 if they stay
for more than 36 months with ABC hospital. The length of stay of nurses follows a
normal distribution with a mean of 28 months and a standard
deviation of 4.8 months. Calculate:
a. The expected number of nurses who will be given the bonus, and the value of the
bonus that will be paid, if 50 new nurses join ABC hospital in the current
month.
b. What additional amount will be paid if ABC hospital changes the policy to give
the bonus when the stay exceeds 24 months? What assumptions are
made in this case?
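A sketch for both parts, assuming (as part b implies) that the distribution of stay, with mean 28 months and standard deviation 4.8 months, does not change when the policy changes. It computes expected values, so the nurse counts are not rounded; it uses `statistics.NormalDist` (Python 3.8+).

```python
from statistics import NormalDist

stay = NormalDist(mu=28, sigma=4.8)  # length of stay in months
new_nurses = 50
bonus_per_nurse = 100_000  # INR 1,00,000

# (a) bonus paid for stays longer than 36 months
p_36 = 1 - stay.cdf(36)
expected_nurses_36 = new_nurses * p_36
expected_bonus_36 = expected_nurses_36 * bonus_per_nurse

# (b) policy changed to stays longer than 24 months (same distribution assumed)
p_24 = 1 - stay.cdf(24)
expected_nurses_24 = new_nurses * p_24
expected_bonus_24 = expected_nurses_24 * bonus_per_nurse

additional_amount = expected_bonus_24 - expected_bonus_36

print(f"a) {expected_nurses_36:.2f} nurses, INR {expected_bonus_36:,.0f}")
print(f"b) {expected_nurses_24:.2f} nurses, additional INR {additional_amount:,.0f}")
```

Under these assumptions, about 2.39 nurses (an expected bonus of about INR 2,39,000) qualify at 36 months and about 39.88 nurses qualify at 24 months, so the additional amount is roughly INR 37,49,000.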
