Green Belt Training Program Session 2
Program - Lote
AGENDA
• Probability distribution
– Normal Distribution
– Binomial Distribution
– Poisson Distribution
• Z value exercise
• Sample vs population
• Visual Exploratory Data Analysis
– Histogram, Box plot, Multivariate charts
• Central Limit Theorem
Six Sigma
Probability
• Probability is the chance that an event happens.
Examples: tossing a coin, having a breakdown, getting a batch with
the required purity.
Total chances – 6 (outcomes 1, 2, 3, 4, 5, 6)
[Figure: probability chart; vertical axis marked 0.25, 0.5, 0.75; horizontal axis marked 0, 1, 2]
Types of distributions
• Binomial distribution
• Poisson distribution
• Normal Distribution
• Weibull distribution
Binomial distribution
• Binomial distribution describes discrete
data – situations where there can be only
two results in a random experiment.
Examples are:
– Pass or failure, Compliance or non-
compliance, yes or no
• Probability of r successes in n trials: $P(r) = \frac{n!}{r!\,(n-r)!}\, p^r q^{n-r}$
• Measures of central tendency and
dispersion, for the binomial distribution
– p=Probability of success
– q=Probability of failure = (1-p)
– Mean = np
– Standard deviation = $\sqrt{npq}$
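Poisson distribution
• Probability of x occurrences: $P(x) = \frac{\lambda^x e^{-\lambda}}{x!}$, where $\lambda$ is the mean number of occurrences
When n>20, that is, when the number of observations is large (and the
probability p of each occurrence is small), the Poisson distribution with
$\lambda = np$ becomes a very good approximation of the binomial distribution.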
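The two probability formulas, and how closely they agree, can be sketched in a few lines of Python. This is only an illustration: the values n = 100 and p = 0.02 are hypothetical and do not come from the course material, and the function names are chosen here for readability.

```python
from math import comb, exp, factorial

def binomial_pmf(r, n, p):
    """P(r successes in n trials) = n!/(r!(n-r)!) * p^r * q^(n-r)."""
    q = 1.0 - p
    return comb(n, r) * p**r * q**(n - r)

def poisson_pmf(x, lam):
    """P(x occurrences) = lam^x * e^(-lam) / x!."""
    return lam**x * exp(-lam) / factorial(x)

# Hypothetical illustration values (not from the source slides).
n, p = 100, 0.02
lam = n * p  # Poisson mean matched to the binomial mean np

# Print r, the binomial probability and the Poisson approximation side by side.
for r in range(5):
    print(r, round(binomial_pmf(r, n, p), 4), round(poisson_pmf(r, lam), 4))
```

With a large n and a small p, the two columns should differ only in the third decimal place or so.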
Weibull Distribution
• Weibull distribution describes
continuous data. This distribution
varies its shape depending upon a
parameter “b”. This can be
employed in conditions where:
– Wear rates and failure rates of
equipment are calculated
– Tensile strength or electrical
resistance measurements are
taken
• For “b” values of around 3 to 4, the Weibull
distribution takes approximately the shape of the
normal distribution
Which distribution will you use?
1. You are a marketing manager, your
average success in getting an order
during a visit is 40%. Out of 5 visits what
is the chance that you get at least 2
orders?
2. The CEO visits the plant on an average
3 times a year. What is the probability
that he will come more than once in the
next month?
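One possible way to work the two questions, as a sketch rather than the official answer key: question 1 is treated as binomial (an order is either won or not on each visit) and question 2 as Poisson (a count of events over time). The assumption that the visits are spread evenly over the year, so that the monthly mean is 3/12, is made here and is not stated in the exercise.

```python
from math import comb, exp

# Question 1: binomial, n = 5 visits, p = 0.4 chance of an order per visit.
n, p = 5, 0.4
p_at_most_1 = sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in (0, 1))
p_at_least_2 = 1 - p_at_most_1

# Question 2: Poisson. Assumption: 3 visits a year spread evenly,
# so the mean for one month is lam = 3 / 12 = 0.25 visits.
lam = 3 / 12
p_zero_or_one = exp(-lam) * (1 + lam)  # P(0) + P(1)
p_more_than_once = 1 - p_zero_or_one   # "more than once" means 2 or more

print(f"Q1: P(at least 2 orders) = {p_at_least_2:.4f}")
print(f"Q2: P(more than one visit) = {p_more_than_once:.4f}")
```

Under these assumptions the chance of at least 2 orders is about 0.66, and the chance of more than one visit in a month is about 0.03.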
Normal Distribution
• Normal distribution also describes
continuous data. The normal
distribution enjoys a very prominent
position in statistics, since:
– The Normal distribution is a
very useful sampling distribution
– It has been statistically shown
that it can explain several
naturally occurring phenomena.
• Measures of central tendency and
dispersion, for the normal
distribution
– Mean = $\mu = \frac{\sum x_i}{n}$
• (Describes the location)
– Standard deviation = $\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{n}}$
• (Describes the spread)
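As a quick illustration of the two measures, the sketch below computes the mean and the standard deviation (dividing by n, as in the formula for the normal distribution) for a small made-up data set. The five weights are hypothetical values chosen only for the example.

```python
from math import sqrt

def mean(values):
    """Sum of the observations divided by their count."""
    return sum(values) / len(values)

def std_dev(values):
    """Square root of the average squared deviation from the mean (divides by n)."""
    mu = mean(values)
    return sqrt(sum((x - mu) ** 2 for x in values) / len(values))

# Hypothetical pouch weights in gms (not from the source slides).
data = [498, 502, 495, 505, 500]
print("mean:", mean(data))
print("standard deviation:", round(std_dev(data), 3))
```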
The last take on distributions….
• For large sample sizes (large mean), the Poisson distribution
tends to a Normal distribution.
• For large sample sizes, the Binomial distribution tends to
a Normal distribution too.
• Hence, for most phenomena on which we collect
data, we should assume a normal distribution
and check out the normality using some tests
which are shown subsequently.
• Weibull distribution should be used for wear
rates and failure rates data in our projects.
The z-transform
Why know it?
The total area under the normal curve is 1 which is split into
two symmetrical halves of 0.5 each.
Some tables (including MS Excel) give the cumulative area under the curve from the left tip
to the specified Z value.
Calculating the z-value - Using the z-table…1
We have a training program designed to teach greenbelt
to some candidates. The time required to be spent for
grasping the tools well follows a normal curve. A study of
past participants indicates that the best program has a
mean duration of 500 hours and a standard deviation of
100 hrs.
1) Can we find out the probability that a participant selected
at random will take more than 500 hrs?
[Figure: normal curve with 500 and 650 marked]
3) What is the probability that a candidate selected at random
will take more than 700 hrs?
$z_{700} = \frac{700 - 500}{100} = 2$
Look up 2 in the z-table, you get a value of
0.4772, but that is the area between the
mean and that point, hence the shaded area
(to the right of 700, on a curve centred at 500)
is 0.5-0.4772 which is 0.0228
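The same lookup can be reproduced without a printed table. The sketch below uses Python's standard `math.erf` to get the cumulative area from the left tip, then converts it to the "mean to z" area that the table in this example lists. The helper name `phi` is chosen here for illustration.

```python
from math import erf, sqrt

def phi(z):
    """Cumulative area under the standard normal curve from the left tip to z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

mean_hours, sd_hours = 500.0, 100.0
z_700 = (700.0 - mean_hours) / sd_hours

area_mean_to_z = phi(z_700) - 0.5   # what a "mean to z" table lists
area_right_tail = 1.0 - phi(z_700)  # probability of taking more than 700 hrs

print("z =", z_700)
print("area between mean and z:", round(area_mean_to_z, 4))
print("area beyond z:", round(area_right_tail, 4))
```

A table that gives the cumulative area from the left (as MS Excel does) corresponds to `phi(z)` directly, so the right tail is 1 minus that value rather than 0.5 minus the table value.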
Exercise on Z value
• You are the manager of the plant and you are
told that for 1,000,000 pouches of product
filled, the mean weight is 500 gms and the
standard deviation is 10 gms.
• What percentage of units are…
1. Greater than 500 gms
2. Between 485 gms & 500 gms
3. Between 500 gms & 505 gms
4. Between 490 gms & 520 gms
5. Less than 475 gms
6. More than 530 gms
Exercise on Z value - Solution
• You are the manager of the formulations plant and you are told
that for 1,000,000 pouches of product filled, the mean weight is
500 gms and the standard deviation is 10 gms.
• How many units are…
1. Greater than 500 gms (0.5 X 10^6 = 500,000)
2. Between 485 gms & 500 gms (0.4332 X 10^6 = 433,200)
3. Between 500 gms & 505 gms (0.1915 X 10^6 = 191,500)
4. Between 490 gms & 520 gms (0.8185 X 10^6 = 818,500)
5. Less than 475 gms (0.0062 X 10^6 = 6,200)
6. More than 530 gms (0.0013 X 10^6 = 1,300)
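The six answers can be cross-checked with a short script. This is a sketch using only the mean (500 gms), standard deviation (10 gms) and pouch count given in the exercise; the helper names are chosen here. Because the script does not round intermediate table values, a result may differ from the table-based figure in the fourth decimal place (item 4, for instance).

```python
from math import erf, sqrt

MEAN, SD, UNITS = 500.0, 10.0, 1_000_000

def phi(z):
    """Cumulative area under the standard normal curve from the left tip to z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def fraction_between(low=None, high=None):
    """Fraction of pouches weighing between low and high (None means no limit)."""
    lower = 0.0 if low is None else phi((low - MEAN) / SD)
    upper = 1.0 if high is None else phi((high - MEAN) / SD)
    return upper - lower

cases = {
    "1. Greater than 500 gms": fraction_between(low=500),
    "2. Between 485 gms & 500 gms": fraction_between(485, 500),
    "3. Between 500 gms & 505 gms": fraction_between(500, 505),
    "4. Between 490 gms & 520 gms": fraction_between(490, 520),
    "5. Less than 475 gms": fraction_between(high=475),
    "6. More than 530 gms": fraction_between(low=530),
}

for label, frac in cases.items():
    print(f"{label}: {frac:.4f} -> {frac * UNITS:,.0f} units")
```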
Sampling
Why know it?
• Statistical procedures help us
make inferences about the
population based on sampling
fundamentals.
[Figure: curve with LSL, T and USL marked, accompanied by a formula in Z, n and X]
Central Limit Theorem
What is the Central Limit Theorem?
The four propositions of the Central Limit Theorem
• The mean of the means of all possible samples from a
population will be equal to the population mean
• The standard deviation of the averages of all possible
samples will be less than the standard deviation of the
population
• As the sample size increases, the standard deviation of
the averages of all possible samples decreases.
• As the sample size increases, the sampling distribution
of the means approaches the normal distribution
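The first three propositions can be checked exactly on a small population by listing every possible sample. The sketch below assumes a hypothetical population of six values and sampling with replacement, neither of which comes from the slides. It does not test the fourth proposition (the shape of the distribution).

```python
from itertools import product
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def pop_std(values):
    """Standard deviation dividing by the number of values."""
    mu = mean(values)
    return sqrt(sum((x - mu) ** 2 for x in values) / len(values))

# Hypothetical population (not from the source slides).
population = [1, 2, 3, 4, 5, 6]
mu, sigma = mean(population), pop_std(population)

def sample_means(n):
    """Means of all possible samples of size n, drawn with replacement."""
    return [mean(sample) for sample in product(population, repeat=n)]

print("population mean:", mu, "population std dev:", round(sigma, 4))
for n in (1, 2, 4):
    means = sample_means(n)
    # Columns: sample size, mean of means, std dev of means, sigma / sqrt(n)
    print(n, round(mean(means), 4), round(pop_std(means), 4), round(sigma / sqrt(n), 4))
```

The last two columns should agree for every sample size, and both should shrink as n grows, while the mean of the means stays at the population mean.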
Estimating Sampling Distribution standard deviation from
Population Standard deviation
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
The practical aspect of all this is that if you want to improve the precision of any
test, increase the sample size.
So, if you want to reduce measurement error (for example) to determine a better
estimate of a true value, increase the sample size. The resulting error will be
reduced by a factor of $1/\sqrt{n}$.
The same goes for any significance testing. Increasing the sample size will
reduce the error in a similar manner.
For a finite population where N is known
and where n/N>1/20,
the formula for calculation is
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}$
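A minimal sketch of both calculations, assuming the finite population correction is applied only when n/N exceeds 1/20 as stated above. The function name `standard_error` and the numbers used are hypothetical.

```python
from math import sqrt

def standard_error(sigma, n, N=None):
    """Standard deviation of sample means for samples of size n.

    If the population size N is given and n/N > 1/20, the finite
    population correction sqrt((N - n) / (N - 1)) is applied.
    """
    se = sigma / sqrt(n)
    if N is not None and n / N > 1 / 20:
        se *= sqrt((N - n) / (N - 1))
    return se

sigma = 10.0  # hypothetical population standard deviation

print(standard_error(sigma, 25))          # sample of 25
print(standard_error(sigma, 100))         # four times the sample size
print(standard_error(sigma, 100, N=500))  # same sample, finite population of 500
```

Quadrupling the sample size halves the standard error, which is the precision gain described above. The finite population correction reduces it a little further.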
Different terminologies used in Sample, Population and Sampling Distribution
                     Sample    Population    Sampling distribution
Standard deviation   s         $\sigma$      $\sigma_{\bar{x}} = \sigma/\sqrt{n}$
Sample size          n         N
[Figure: a 4 km by 2 km area containing points numbered 1 to 10]
How do we apply this learning to statistics?
Given below is the sampling
distribution of means. We have already
seen how 68.26% of all means lie
within +/- 1 Sigma from the mean of
the means.
Range within which the mean of the sample will lie    % of means covered
+/- 1 Sigma    68.26%
+/- 2 Sigma    95.46%
+/- 3 Sigma    99.73%
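These percentages can be recomputed from the normal curve itself. The sketch below uses the standard identity that the area within +/- k sigma of the mean equals erf(k/sqrt(2)). The function name `coverage` is chosen here. Values computed this way may differ in the last digit from figures built up from a rounded z-table.

```python
from math import erf, sqrt

def coverage(k):
    """Fraction of a normal distribution lying within +/- k sigma of its mean."""
    return erf(k / sqrt(2.0))

for k in (1, 2, 3):
    print(f"+/- {k} Sigma: {100 * coverage(k):.2f}%")
```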
Confidence limits
Depending on how far from the
mean I am willing to walk (in terms of
sigma), my chances of including the
population mean in my range
increase…
For convenience the confidence limits
are expressed as follows…
Distance walked along the normal curve    % of means covered in the range    Distance walked along the normal curve    % of means covered in the range