3-Statistical Learning - Distributions
PGPAIML
Outline
1. Introduction to Distributions
2. Binomial Distribution
3. Poisson Distribution
4. Normal Distribution
5. Summary of applications of different distributions
What is a Probability Distribution
• In precise terms, a probability distribution is a complete listing of the values
a random variable can take, along with the corresponding probability of each
value. A real-life example is the pattern of machine breakdowns in a
manufacturing unit.
• The random variable in this example is the number of machine breakdowns,
which can take various values.
• The probability corresponding to each value is the relative frequency with
which that number of breakdowns occurs.
• The probability distribution for this example is constructed from the actual
breakdown pattern observed over a period of time. Statisticians call this the
“observed distribution” of breakdowns.
Probability Distributions - Example
15-Feb-20
Example Problem
• A bank issues credit cards to customers under the scheme of Master Card.
Based on past data, the bank has found that 60% of all accounts pay on time
following the bill. If a sample of 7 accounts is selected at random from the
current database, construct the probability distribution of the number of
accounts paying on time.
Binomial Distribution
• The Binomial Distribution is a widely used probability distribution of a discrete
random variable.
• It plays a major role in quality control and quality assurance functions.
Manufacturing units use the binomial distribution for defective analysis.
• Reducing the number of defectives using the proportion defective control chart
(p chart) is an accepted practice in manufacturing organizations.
• The binomial distribution is also used in service organizations such as banks and
insurance corporations to get an idea of the proportion of customers who are
satisfied with the service quality.
Conditions for Applying Binomial Distribution
(Bernoulli Process)
• Trials are independent and random.
• There is a fixed number of trials (n trials).
• Each trial has only two outcomes, designated as success or
failure.
• The probability of success is constant throughout the n trials.
Binomial Probability Function
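For n independent trials, each with success probability p, the probability of exactly x successes takes the standard form below. The symbols x, n, and p follow the definitions used in the example that comes next.

```latex
P(X = x) = \binom{n}{x}\, p^{x} (1 - p)^{n - x}, \qquad x = 0, 1, \dots, n
```

Here \binom{n}{x} = n! / (x!(n - x)!) counts the ways of choosing which x of the n trials are successes.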
Example for Binomial Distribution
• A bank issues credit cards to customers under the scheme of Master
Card. Based on past data, the bank has found that 60% of all
accounts pay on time following the bill. If a sample of 7 accounts is
selected at random from the current database, construct the
Binomial Probability Distribution of the number of accounts paying on time.
• This problem can be structured as a Bernoulli Process in which an account paying on time is
taken as a success and an account not paying on time is taken as a failure. The random variable x
is the number of accounts paying on time, which can take the values 0, 1, 2, 3, 4, 5, 6, 7. You need to
prepare a table containing x and P(x) for all values of x, performing the calculations with the
Binomial Probability Function.
• A convenient option is to use Microsoft Excel to calculate the binomial probabilities, both for
individual values and for the cumulative position. This facility is available under the option
"Paste Function". The form of the function is BINOMDIST(x, n, p, 0 or 1), where x is the number of
successes, n is the number of trials, and p is the probability of success in each trial. The last
argument, 0 or 1, is a logical flag. If you enter 0, Excel returns the individual probability
P(X = x); if you enter 1, it returns the cumulative probability P(X ≤ x).
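The same table can also be built outside Excel. The following is a minimal Python sketch using only the standard library; the helper name binomial_pmf and the variable names are illustrative choices, not part of the original material.

```python
from math import comb

n = 7      # number of accounts sampled
p = 0.60   # probability that an account pays on time


def binomial_pmf(x, n, p):
    """Return P(X = x) for n independent trials with success probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)


# Each row holds x, the individual probability P(X = x),
# and the cumulative probability P(X <= x).
table = []
cumulative = 0.0
for x in range(n + 1):
    prob = binomial_pmf(x, n, p)
    cumulative += prob
    table.append((x, prob, cumulative))

for x, prob, cum in table:
    print(f"{x}  {prob:.4f}  {cum:.4f}")
```

The individual column corresponds to entering 0 as the last argument in Excel, and the cumulative column to entering 1. The most likely outcome is x = 4, with probability of roughly 0.29.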
Mean and Standard Deviation of the Binomial
Distribution
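For a binomial random variable with n trials and success probability p, the standard results for the mean and standard deviation are shown below. The numerical values apply them to the credit card example (n = 7, p = 0.6).

```latex
\mu = np = 7 \times 0.6 = 4.2, \qquad
\sigma = \sqrt{np(1 - p)} = \sqrt{7 \times 0.6 \times 0.4} = \sqrt{1.68} \approx 1.296
```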
Example
If on an average, 6 customers arrive every two minutes at a bank during the busy
hours of working, a) what is the probability that exactly four customers arrive in a
given minute? b) What is the probability that more than three customers will arrive
in a given minute?
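The arrivals question above can be worked as a Poisson problem. This is a sketch under the assumption that arrivals follow a Poisson distribution with a constant rate, so that 6 arrivals per two minutes corresponds to a mean of 3 per minute. The helper poisson_pmf is an illustrative name.

```python
from math import exp, factorial

rate_per_two_minutes = 6
lam = rate_per_two_minutes / 2   # assumed mean arrivals in one minute


def poisson_pmf(k, lam):
    """Return P(X = k) for a Poisson variable with mean lam."""
    return exp(-lam) * lam**k / factorial(k)


# a) exactly four customers in a given minute
p_exactly_four = poisson_pmf(4, lam)

# b) more than three customers = 1 - P(X <= 3)
p_more_than_three = 1 - sum(poisson_pmf(k, lam) for k in range(4))

print(f"P(X = 4) = {p_exactly_four:.4f}")
print(f"P(X > 3) = {p_more_than_three:.4f}")
```

Under these assumptions, part a) comes to roughly 0.168 and part b) to roughly 0.353.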
The mean weight of a morning breakfast cereal pack is 0.295 kg with a standard
deviation of 0.025 kg. The random variable weight of the pack follows a normal
distribution.
a) What is the probability that the pack weighs less than 0.280 kg?
b) What is the probability that the pack weighs more than 0.350 kg?
c) What is the probability that the pack weighs between 0.260 kg and 0.340 kg?
Normal Distribution
Properties of Normal Distribution
Standard Normal Distribution
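A normal random variable X with mean μ and standard deviation σ is converted to the standard normal variable Z by the usual transformation:

```latex
Z = \frac{X - \mu}{\sigma}, \qquad Z \sim N(0, 1)
```

Z has mean 0 and standard deviation 1, so probabilities for any normal variable can be read from a single standard normal table once the value of interest is converted to a z-score.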
Example Problem
• The mean weight of a morning breakfast cereal pack is 0.295 kg with
a standard deviation of 0.025 kg. The random variable weight of the
pack follows a normal distribution.
a) What is the probability that the pack weighs less than 0.280 kg?
b) What is the probability that the pack weighs more than 0.350 kg?
c) What is the probability that the pack weighs between 0.260 kg and
0.340 kg?
Solution a)
Solution b)
Solution c)
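All three parts of the cereal pack problem can be checked numerically. The sketch below uses NormalDist from the Python standard library (Python 3.8 or later) in place of a z-table; the variable names are illustrative.

```python
from statistics import NormalDist

# Pack weight in kg: mean 0.295, standard deviation 0.025
weight = NormalDist(mu=0.295, sigma=0.025)

# a) P(X < 0.280), z = (0.280 - 0.295) / 0.025 = -0.6
p_less_than_280 = weight.cdf(0.280)

# b) P(X > 0.350), z = (0.350 - 0.295) / 0.025 = 2.2
p_more_than_350 = 1 - weight.cdf(0.350)

# c) P(0.260 < X < 0.340), z from -1.4 to 1.8
p_between = weight.cdf(0.340) - weight.cdf(0.260)

print(f"a) {p_less_than_280:.4f}")
print(f"b) {p_more_than_350:.4f}")
print(f"c) {p_between:.4f}")
```

The results are roughly 0.274 for a), 0.014 for b), and 0.883 for c), which should agree with a z-table lookup to table precision.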
Example Problem1
A company produces bolts with a nominal length of 10 mm for its customers. The
lengths of the bolts produced are normally distributed with a mean of 10.01 mm
and a standard deviation of 0.06 mm.
(a) What is the probability that a bolt produced will be longer than 10.2 mm?
(b) The sales team is negotiating with a new customer who has more stringent
quality requirements. The new customer requires that bolts be between 9.9 mm and
10.15 mm. What is the probability that a bolt produced by the current process will
be acceptable to the new customer?
(c) What is the length below which 99% of the bolts produced will fall?
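One way to work this problem is sketched below, again with NormalDist from the Python standard library. Part (c) uses the inverse of the cumulative distribution function; the variable names are illustrative.

```python
from statistics import NormalDist

# Bolt length in mm: mean 10.01, standard deviation 0.06
length = NormalDist(mu=10.01, sigma=0.06)

# (a) P(X > 10.2)
p_longer = 1 - length.cdf(10.2)

# (b) P(9.9 < X < 10.15)
p_acceptable = length.cdf(10.15) - length.cdf(9.9)

# (c) the length below which 99% of bolts fall (99th percentile)
length_99 = length.inv_cdf(0.99)

print(f"(a) {p_longer:.5f}")
print(f"(b) {p_acceptable:.4f}")
print(f"(c) {length_99:.3f} mm")
```

Part (a) is below 0.001, part (b) is roughly 0.957, and part (c) is about 10.15 mm.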
Example Problem2
ABC hospital recruits nurses frequently to manage high attrition among the
nursing staff. Not all job offers from ABC hospital are accepted. Based on the past
recruitment data, it was estimated that only 70% of offers rolled out by ABC
hospital are accepted.
a. If 10 offers are made, what is the probability that more than 5 and fewer than 8
candidates will accept the offer from ABC hospital?
b. During December 2019, ABC requires 14 new nurses to manage attrition. What
should be the number of offers made by ABC hospital so that the average
number of nurses accepting the offer is 14?
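This problem fits the binomial setting, assuming each candidate decides independently with the same 70% acceptance probability. A minimal sketch follows; binomial_pmf is an illustrative helper name.

```python
from math import comb

p_accept = 0.70   # probability that an offer is accepted
n_offers = 10


def binomial_pmf(x, n, p):
    """Return P(X = x) for n independent trials with success probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)


# a) more than 5 and fewer than 8 acceptances means X = 6 or X = 7
p_six_or_seven = sum(binomial_pmf(x, n_offers, p_accept) for x in (6, 7))

# b) the mean of a binomial is n * p, so solve n * 0.70 = 14 for n
required_nurses = 14
offers_needed = round(required_nurses / p_accept)

print(f"a) {p_six_or_seven:.4f}")
print(f"b) {offers_needed} offers")
```

Part a) comes to roughly 0.467, and part b) gives 20 offers.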
Example Problem3
According to a survey on the use of smart phones in India, smart phone users
spend 68 minutes a day on average sending messages, and the corresponding
standard deviation is 12 minutes. Assume that the time spent sending messages
follows a normal distribution.
a. What proportion of smart phone users spend more than 90 minutes
sending messages daily?
b. What proportion of users spend less than 20 minutes?
c. What proportion of users spend between 50 minutes and 100
minutes?
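The three proportions can be computed as sketched below, using NormalDist from the Python standard library; the variable names are illustrative.

```python
from statistics import NormalDist

# Daily messaging time in minutes: mean 68, standard deviation 12
usage = NormalDist(mu=68, sigma=12)

# a. P(X > 90)
p_over_90 = 1 - usage.cdf(90)

# b. P(X < 20), four standard deviations below the mean
p_under_20 = usage.cdf(20)

# c. P(50 < X < 100)
p_50_to_100 = usage.cdf(100) - usage.cdf(50)

print(f"a. {p_over_90:.4f}")
print(f"b. {p_under_20:.6f}")
print(f"c. {p_50_to_100:.4f}")
```

Part a. is roughly 0.033, part b. is close to zero (about 0.00003), and part c. is roughly 0.929.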
Example Problem4
At ABC hospital, nurses are given an additional bonus of INR 1,00,000 if they stay
with the hospital for more than 36 months. The length of stay of nurses follows a
normal distribution with a mean of 28 months and a standard deviation of
4.8 months. Calculate:
a. The expected number of nurses who will be given the bonus, and the total value
of the bonus paid, if 50 new nurses join ABC hospital in the current
month.
b. What additional amount will be paid if ABC hospital changes the policy so that
the bonus is given when the stay exceeds 24 months? What assumptions are
made in this case?
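A possible way to work this problem is sketched below. It assumes that the 50 nurses' lengths of stay are independent draws from the same normal distribution, and that changing the bonus threshold does not change how long nurses stay. Both are modelling assumptions, not facts given in the problem. The helper expected_bonus is an illustrative name.

```python
from statistics import NormalDist

# Length of stay in months: mean 28, standard deviation 4.8
stay = NormalDist(mu=28, sigma=4.8)
bonus = 100_000      # INR 1,00,000 per qualifying nurse
new_nurses = 50


def expected_bonus(threshold_months):
    """Return (expected number of nurses paid, expected total payout in INR)."""
    p_qualify = 1 - stay.cdf(threshold_months)
    expected_count = new_nurses * p_qualify
    return expected_count, expected_count * bonus


# a. current policy: stay must exceed 36 months
count_36, payout_36 = expected_bonus(36)

# b. revised policy: stay must exceed 24 months
count_24, payout_24 = expected_bonus(24)
additional = payout_24 - payout_36

print(f"a. {count_36:.2f} nurses, INR {payout_36:,.0f}")
print(f"b. {count_24:.2f} nurses, additional INR {additional:,.0f}")
```

Under these assumptions, about 2.4 of the 50 nurses are expected to earn the bonus at 36 months, against about 39.9 at 24 months, so the additional expected payout is roughly INR 37.5 lakh.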