Green Belt Training Program Session 2

Uploaded by

Bhaskar Godbole

Six Sigma Green Belt Training

Program - Lote
AGENDA
• Probability distribution
– Normal Distribution
– Binomial Distribution
– Poisson Distribution
• Z value exercise
• Sample vs population
• Visual Exploratory Data Analysis
– Histogram, Box plot, Multivariate charts
• Central Limit Theorem
Six Sigma
Probability
• Probability is the chance of an event happening.
Examples: tossing a coin, having a breakdown, getting a batch with
the required purity.

Probability is the ratio of the number of favourable outcomes to the
total number of possible outcomes.

The probability of a head when tossing a coin is 1/2


Probability Distribution

• The listing of all probable outcomes with
their probabilities

When rolling a die

All probable outcomes – 1, 2, 3, 4, 5, 6

Total number of outcomes – 6

When rolling a die once

Chances that number 5 will come – 1

Total chances – 6

Probability of getting number 5 = 1/6


Similarly, the probabilities of getting 1, 2, 3, 4, 5 and 6 are 1/6, 1/6, 1/6, 1/6, 1/6 and 1/6.

Total probability = 1/6 + 1/6 + 1/6 + 1/6 + 1/6 + 1/6 = 1

Lesson 1 = the total probability cannot be more than 1

This listing of all probabilities is called a probability distribution.

Outcome        1     2     3     4     5     6
Probability   1/6   1/6   1/6   1/6   1/6   1/6

Probability Distribution of Rolling a Die
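The die example can be reproduced in a few lines of Python. This is only a sketch, assuming a fair six-sided die; the names `faces` and `distribution` are illustrative and not part of the training material.

```python
from fractions import Fraction

# Assumption: a fair six-sided die, so every face is equally likely.
faces = [1, 2, 3, 4, 5, 6]
distribution = {face: Fraction(1, len(faces)) for face in faces}

# List every outcome with its probability (the probability distribution).
for face, prob in distribution.items():
    print(f"P({face}) = {prob}")

# The probabilities of all outcomes must add up to exactly 1.
total_probability = sum(distribution.values())
print("Total probability =", total_probability)
```

Exact fractions are used instead of decimals so that the total comes out as exactly 1 without rounding error.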


How are they constructed?
• Probability distributions are constructed by plotting the
probabilities of all probable outcomes in the form of a graph.
• The X axis indicates all the probable outcomes and the Y axis
indicates the probability of that particular outcome.
• Example: A bag contains 1 red ball and 3 black balls, and one ball
is drawn at random. The different possible outcomes are
– The probability of choosing 0 red balls is 3/4
– The probability of choosing 1 red ball is 1/4
– The probability of choosing 2 red balls is 0

[Figure: bar chart with the number of red balls (0, 1, 2) on the X axis and the probability (0 to 0.75) on the Y axis]
Types of distributions
• Binomial distribution
• Poisson distribution
• Normal Distribution
• Weibull distribution
Binomial distribution
• Binomial distribution describes discrete
data – situations where there can be only
two results in a random experiment.
Examples are:
– Pass or failure, compliance or non-compliance, yes or no
• Probability of r successes in n trials: $P(r) = \frac{n!}{r!\,(n-r)!}\, p^{r} q^{n-r}$
• Measures of central tendency and
dispersion for the binomial distribution
– p = Probability of success
– q = Probability of failure = (1 - p)
– Mean = np
– Standard deviation = $\sqrt{npq}$

Important conditions for applying the binomial distribution

– Each trial has only two outcomes
– The probability of the outcome of any trial remains fixed over time.
– The trials are statistically independent, i.e. the outcome of one trial does not
influence the outcome of any other trial.
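A minimal Python sketch of the binomial formula, using the standard results mean = np and standard deviation = sqrt(npq). The values n = 10 and p = 0.1 are hypothetical (for example, 10 units inspected with a 10% defect rate) and do not come from the slides.

```python
from math import comb, sqrt

def binomial_pmf(r, n, p):
    """Probability of exactly r successes in n independent trials."""
    q = 1 - p
    return comb(n, r) * p**r * q**(n - r)

# Hypothetical illustration: 10 trials, 10% chance of "success" per trial.
n, p = 10, 0.1
probabilities = [binomial_pmf(r, n, p) for r in range(n + 1)]

mean = n * p
std_dev = sqrt(n * p * (1 - p))

print(f"P(0 successes) = {probabilities[0]:.4f}")
print(f"P(1 success)   = {probabilities[1]:.4f}")
print(f"mean = {mean:.2f}, standard deviation = {std_dev:.4f}")
```

The probabilities for r = 0 to n add up to 1, which is a quick sanity check on any hand calculation.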
Poisson distribution
• Poisson distribution also describes discrete data
– situations where the random variable can take
integer values. Examples are:
– Number of accidents

• Probability of x occurrences: $P(x) = \frac{\lambda^{x} e^{-\lambda}}{x!}$

• Measures of central tendency and dispersion for
the Poisson distribution
– Mean = $\lambda$ = Number of occurrences per interval of time
– Standard deviation = $\sqrt{\lambda}$ (the variance equals the mean)

When n > 20 and the probability of success p is small, the Poisson
distribution with mean np becomes a very good approximation of the
binomial distribution.
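The approximation can be checked numerically. The sketch below compares the two distributions for hypothetical values n = 100 and p = 0.02 (so the Poisson mean is np = 2); these numbers are chosen only for illustration.

```python
from math import comb, exp, factorial

def poisson_pmf(x, lam):
    """Probability of exactly x occurrences when the mean count is lam."""
    return lam**x * exp(-lam) / factorial(x)

def binomial_pmf(r, n, p):
    """Probability of exactly r successes in n independent trials."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)

# Hypothetical case: many trials, small probability of success.
n, p = 100, 0.02
lam = n * p

print(" x  binomial  poisson")
for x in range(6):
    print(f"{x:2d}  {binomial_pmf(x, n, p):.4f}    {poisson_pmf(x, lam):.4f}")

# Largest difference between the two distributions over all outcomes.
max_gap = max(abs(binomial_pmf(x, n, p) - poisson_pmf(x, lam)) for x in range(n + 1))
print(f"largest difference = {max_gap:.4f}")
```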
Weibull Distribution
• Weibull distribution describes
continuous data. This distribution
varies its shape depending upon a
shape parameter “b”. This can be
employed in conditions where:
– Wear rates and failure rates of
equipment are calculated
– Tensile strength or electrical
resistance measurements are
taken
• For “b” values of about 3 to 4, the Weibull
distribution takes a shape close to that of the
normal distribution
Which distribution will you use?
1. You are a marketing manager, your
average success in getting an order
during a visit is 40%. Out of 5 visits what
is the chance that you get at least 2
orders?
2. The CEO visits the plant on an average
3 times a year. What is the probability
that he will come more than once in the
next month?
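One possible way to work the two questions, offered as a sketch rather than the official answer: question 1 is treated as binomial (5 independent visits, p = 0.4) and question 2 as Poisson, assuming the 3 visits a year are spread evenly so the monthly rate is 3/12 = 0.25.

```python
from math import comb, exp

# Question 1 (assumed binomial): 5 visits, 40% chance of an order per visit.
visits, p_order = 5, 0.4
p_fewer_than_2 = sum(
    comb(visits, r) * p_order**r * (1 - p_order) ** (visits - r) for r in range(2)
)
p_at_least_2 = 1 - p_fewer_than_2

# Question 2 (assumed Poisson): 3 visits a year -> 0.25 visits a month on average.
lam = 3 / 12
# P(more than one visit) = 1 - P(0 visits) - P(1 visit)
p_more_than_once = 1 - exp(-lam) * (1 + lam)

print(f"P(at least 2 orders in 5 visits)   = {p_at_least_2:.4f}")
print(f"P(more than 1 CEO visit in a month) = {p_more_than_once:.4f}")
```

In both cases it is easier to subtract the few unwanted outcomes from 1 than to add up all the wanted ones.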
Normal Distribution
• Normal distribution also describes
continuous data. The normal
distribution enjoys a very prominent
position in statistics, since:
– The Normal distribution is a
very useful sampling distribution
– It has been statistically shown
that it can explain several
naturally occurring phenomena.
• Measures of central tendency and
dispersion for the normal
distribution
– Mean = $\mu = \frac{\sum x_i}{n}$
• (Describes the location)
– Standard deviation = $\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{n}}$
• (Describes the spread)
The last take on distributions….
• For a large number of trials with a small probability of
success, the Binomial distribution tends to a Poisson distribution.
• For large sample sizes, the Binomial distribution tends to
a Normal distribution too.
• Hence, for most phenomena on which we collect
data, we should assume a normal distribution
and check out the normality using some tests
which are shown subsequently.
• Weibull distribution should be used for wear
rates and failure rates data in our projects.
The z-transform
Why know it?

•To find out how we can use the
normal curve to make predictions
about phenomena occurring around
us

What will we learn?

•What is the z-value?


•How to calculate z-value?
•How to interpret z-value?
Calculating data from the Normal
distribution - The z-transform
• Suppose we establish that the
underlying probability distribution
is normal. What do we do with it?
• Each point on this bell shaped
curve is explained by a parameter
called the “z” value.
• $Z = \frac{x - \mu}{\sigma}$
where,
x = value of the random variable
$\mu$ = mean
$\sigma$ = standard deviation
• Z value is a measure of the
number of standard deviations
that the value is away from the
mean

So, what’s the use?
• Since the area under a standard
normal curve is 1, the z value at a
point can tell us the probability of
a value falling beyond (or before) that point.
• And that helps us in predicting
what values can occur at 1, 2 or 3
standard deviations.
• This concept forms the backbone
of all the statistical theory that we
are to go into.
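The value of area from the Z table will
tell us four pieces of information
1. The value of A can be found from the table
by looking at Z.
2. The area B is the same as that of A
3. The area C is 0.5 - A
4. The area D is the same as that of C

[Figure: normal curve split into four areas; A and B lie between the mean and Z on either side, C and D are the two tails]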
The total area under the normal curve is 1 which is split into
two symmetrical halves of 0.5 each.

Some tables (including MS Excel) give the cumulative area under the curve from the left tip
to the specified Z value.
Calculating the z-value - Using the z-
table…1
We have a training program designed to teach Green Belt
skills to some candidates. The time required for
grasping the tools well follows a normal curve. A study of
past participants indicates that the best program has a
mean duration of 500 hours and a standard deviation of
100 hrs.
1) What is the probability that a participant selected
at random will take more than 500 hrs?

$\mu$ = 500 hrs
$\sigma$ = 100 hrs

Half the area under the normal curve
is beyond 500 hrs. Since the total area
is 1, the area beyond 500 hrs is
0.5. Hence, the probability of such a
participant is 0.5
Calculating the z-value - Using the z-
table…2
2) What is the probability that a candidate selected at random
will take between 500 and 650 hrs?
$z_{650} = \frac{650 - 500}{100} = 1.5$
Look up 1.5 in the z-table, you get a
value of 0.4332, which means that the
probability that a candidate will take
between 500 and 650 hrs is 0.4332

[Figure: normal curve with the area between 500 and 650 marked]

3) What is the probability that a candidate selected at random
will take more than 700 hrs?
$z_{700} = \frac{700 - 500}{100} = 2$
Look up 2 in the z-table, you get a value of
0.4772, but that is the area between the
mean and that point, hence the shaded area
is 0.5 - 0.4772, which is 0.0228

[Figure: normal curve with the area beyond 700 shaded]
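The z-table lookups above can be cross-checked in Python. This sketch builds the cumulative normal area from the standard library's `math.erf`; the helper name `normal_cdf` is illustrative. Note that it returns the area from the far left up to z, so 0.5 is subtracted to get the "mean to z" area that the slide's table gives.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mean_hours, sd_hours = 500, 100

# z-values: number of standard deviations away from the mean
z_650 = (650 - mean_hours) / sd_hours
z_700 = (700 - mean_hours) / sd_hours

# Area between the mean (z = 0) and z = 1.5
p_500_to_650 = normal_cdf(z_650) - normal_cdf(0)
# Tail area beyond z = 2
p_over_700 = 1 - normal_cdf(z_700)

print(f"z(650) = {z_650}, P(500 < X < 650) = {p_500_to_650:.4f}")
print(f"z(700) = {z_700}, P(X > 700)       = {p_over_700:.4f}")
```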
Exercise on Z value
• You are the manager of the plant and you are
told that for 1,000,000 pouches of product
filled, the mean weight is 500 gms and the
standard deviation is 10 gms.
• What percentage of units are…
1. Greater than 500 gms
2. Between 485 gms & 500 gms
3. Between 500 gms & 505 gms
4. Between 490 gms & 520 gms
5. Less than 475 gms
6. More than 530 gms
Exercise on Z value - Solution
• You are the manager of the formulations plant and you are told
that for 1,000,000 pouches of product filled, the mean weight is
500 gms and the standard deviation is 10 gms.
• How many units are…
1. Greater than 500 gms (0.5 X 10^6 = 500,000)
2. Between 485 gms & 500 gms (0.4332 X 10^6 = 433,200)
3. Between 500 gms & 505 gms (0.1915 X 10^6 = 191,500)
4. Between 490 gms & 520 gms (0.8185 X 10^6 = 818,500)
5. Less than 475 gms (0.0062 X 10^6 = 6,200)
6. More than 530 gms (0.0013 X 10^6 = 1,300)
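The six answers can be verified with a short script. This is a sketch under the exercise's stated assumptions (normal distribution, mean 500 g, standard deviation 10 g); the helper names are illustrative. Small differences from the slide values come from the four-decimal rounding of the z-table.

```python
from math import erf, inf, sqrt

MEAN_G, SD_G, TOTAL_UNITS = 500, 10, 1_000_000

def normal_cdf(x, mean=MEAN_G, sd=SD_G):
    """Area under the normal curve to the left of x."""
    return 0.5 * (1 + erf((x - mean) / (sd * sqrt(2))))

def fraction_between(low, high):
    """Fraction of pouches weighing between low and high grams."""
    return normal_cdf(high) - normal_cdf(low)

questions = {
    "Greater than 500 g": fraction_between(500, inf),
    "Between 485 g and 500 g": fraction_between(485, 500),
    "Between 500 g and 505 g": fraction_between(500, 505),
    "Between 490 g and 520 g": fraction_between(490, 520),
    "Less than 475 g": fraction_between(-inf, 475),
    "More than 530 g": fraction_between(530, inf),
}

for label, fraction in questions.items():
    print(f"{label}: {fraction:.4f} -> about {fraction * TOTAL_UNITS:,.0f} units")
```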
Sampling
Why know it?

•Statistical procedures help us
make inferences about the
population based on sampling
fundamentals.

•Bad sampling means bad results.

What will we learn?

•Sample vs. Population


•Types of Sampling
- Random
- Systematic
- Stratified
•Sample size calculations
Mithai Shop-Sampling
• When we go to a mithai shop, most of us taste a little
bite of the mithai. Why do we sample the mithai?
• We assume that the entire mithai is made from one base
and so if we sample a bit, we can be reasonably sure
about the taste of the entire half a kg or one kg of mithai
that we want to buy.
• We take a small sample because the mithai wala won’t
allow us to taste the entire mithai beforehand.
• We do similar sampling every day in our lives.
Population Vs. Sample (Certainty Vs. Uncertainty)

• A sample is just a subset of all possible
values

[Figure: a sample shown as a small subset inside the population]

Populations could be vibration of ID
fan, total orders processed etc.

• Since the sample does not contain all the
possible values, there is some uncertainty
about the population. Hence any statistics,
such as mean and standard deviation,
are just estimates of the true
population parameters.
Frequently used sampling techniques
• Random Sampling
– Every item in the population or process has an equal chance of being
selected for counting
– Done by assigning computer generated random numbers to items
being surveyed
• Systematic Sampling
– Data samples are taken at certain intervals (every hour or every 20th
item)
– Beware of bias introduced due to hidden patterns while using this
method
• Stratified sampling
– Divides the population into groups or strata for sampling based on
characteristics of the group
– Random or systematic sampling can then be used to collect data
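The three techniques can be sketched with Python's standard `random` module. Everything here is hypothetical: a population of 100 numbered items, a sample of 10, and two strata called `shift_A` and `shift_B`.

```python
import random

random.seed(42)  # fixed seed so the sketch gives the same result every run

# Hypothetical population: 100 items numbered 1 to 100.
population = list(range(1, 101))
sample_size = 10

# Random sampling: every item has an equal chance of being selected.
random_sample = random.sample(population, sample_size)

# Systematic sampling: every 10th item, starting from a random position.
interval = len(population) // sample_size
start = random.randrange(interval)
systematic_sample = population[start::interval]

# Stratified sampling: split into groups, then sample each group
# in proportion to its share of the population.
strata = {"shift_A": population[:60], "shift_B": population[60:]}
stratified_sample = []
for name, items in strata.items():
    k = round(sample_size * len(items) / len(population))
    stratified_sample.extend(random.sample(items, k))

print("random:    ", sorted(random_sample))
print("systematic:", systematic_sample)
print("stratified:", sorted(stratified_sample))
```

Proportional allocation is only one way to size the strata; it is used here because it keeps the overall sample representative of the group sizes.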
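Sample Size

$n = \left( \frac{Z_{\alpha/2}\,\sigma}{\Delta} \right)^{2}$

where $\Delta$ is the acceptable difference between the sample mean $\bar{X}$ and the population mean $\mu$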
Central Limit Theorem
What is the Central Limit Theorem?

Central Limit Theorem


For almost all populations, the sampling distribution
of the mean can be approximated closely by a normal
distribution, provided the sample size is sufficiently
large.

Normal
The four propositions of Central Limit
Theorem
• The mean of the means of all possible samples from a
population will be equal to population mean
• The standard deviation of averages of all possible
samples will be less than the standard deviation of the
population
• As the sample size increases the standard deviation of
the averages of all possible samples decreases.
• As the sample size increases, the sampling distribution
of the means approaches normal distribution
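The propositions can be observed in a small simulation. This is a sketch, not a proof: the population is a hypothetical skewed (exponential) set of 20,000 values with a mean of about 10, and the sample sizes 2, 10 and 40 are arbitrary.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical, clearly non-normal population (exponential, mean about 10).
population = [random.expovariate(1 / 10) for _ in range(20_000)]
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

def sample_means(sample_size, repeats=2_000):
    """Draw many samples of the given size and return the mean of each."""
    return [
        statistics.fmean(random.choices(population, k=sample_size))
        for _ in range(repeats)
    ]

results = {n: sample_means(n) for n in (2, 10, 40)}

print(f"population: mean = {pop_mean:.2f}, sd = {pop_sd:.2f}")
for n, means in results.items():
    print(
        f"n = {n:2d}: mean of means = {statistics.fmean(means):.2f}, "
        f"sd of means = {statistics.stdev(means):.2f}, "
        f"sigma / sqrt(n) = {pop_sd / n ** 0.5:.2f}"
    )
```

The mean of the sample means stays close to the population mean, while the standard deviation of the means shrinks as the sample size grows and tracks sigma / sqrt(n).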
Estimating Sampling Distribution standard deviation from
Population Standard deviation

$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

The practical aspect of all this is that if you want to improve the precision of any
test, increase the sample size.
So, if you want to reduce measurement error (for example) to determine a better
estimate of a true value, increase the sample size. The resulting error will be
reduced by a factor of $\frac{1}{\sqrt{n}}$.
The same goes for any significance testing. Increasing the sample size will
reduce the error in a similar manner.
For a finite population where N is known
and where n/N > 1/20, the formula for calculation is

$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}$
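A small helper can apply both versions of the standard error: sigma / sqrt(n) in general, multiplied by the finite population factor sqrt((N - n) / (N - 1)) when N is known and n/N > 1/20. The function name `standard_error` and the example numbers (sigma = 10, n = 30, N = 200) are illustrative assumptions.

```python
from math import sqrt

def standard_error(sigma, n, population_size=None):
    """Standard deviation of the sampling distribution of the mean.

    Applies the finite population correction only when the population
    size is known and the sample is more than 1/20 of the population.
    """
    se = sigma / sqrt(n)
    if population_size is not None and n / population_size > 1 / 20:
        se *= sqrt((population_size - n) / (population_size - 1))
    return se

# Hypothetical numbers: sigma = 10, sample of 30
print(f"large population: {standard_error(10, 30):.4f}")
print(f"N = 200:          {standard_error(10, 30, population_size=200):.4f}")

# Quadrupling the sample size halves the standard error.
print(f"n = 120:          {standard_error(10, 120):.4f}")
```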
Different terminologies used in Sample ,
Population and Sampling Distribution

Parameter             Sample       Sampling Distribution                     Population

Mean                  $\bar{x}$    $\bar{\bar{x}}$                           $\mu$

Standard Deviation    s            $\sigma_{\bar{x}} = \sigma / \sqrt{n}$    $\sigma$

Size                  n                                                      N

Use - Estimation of population from Sample Acceptance


Confidence Intervals and levels
Why know it?

•The understanding of confidence
intervals helps us to understand
limitations in our estimates on the
population.

•It helps us to quickly and
efficiently screen data for
significance

What will we learn?

•Usage of confidence intervals and


levels
Confidence Intervals

[Figure: ten numbered points (1 to 10), with distances of 2 km and 4 km marked]
How do we apply this learning to statistics?
Given below is the sampling
distribution of means. We have already
seen how 68.26% of all means lie
within +/- 1 Sigma from the mean of
the means.

[Figure: sampling distribution of means, marking the range within which the mean of the sample will lie]

Depending on how far from the
mean I am willing to walk (in terms of
sigma), my chances of including the
population mean in my range
increase…

Distance walked along the normal curve    % of means covered in the range

+ / - 1 Sigma                             68.26%

+ / - 2 Sigma                             95.44%

+ / - 3 Sigma                             99.73%
Confidence limits
Depending on how far from the
mean I am willing to walk (in terms of
sigma), my chances of including the
population mean in my range
increase…

Distance walked along the normal curve    % of means covered in the range

+ / - 1 Sigma                             68.26 %

+ / - 2 Sigma                             95.44 %

+ / - 3 Sigma                             99.73 %

For convenience the confidence limits
are expressed as follows…

Distance walked along the normal curve    % of means covered in the range

+ / - 1.65 Sigma                          90 %

+ / - 1.96 Sigma                          95 %

+ / - 2.57 Sigma                          99 %


Example for confidence
intervals
• The maintenance head wants to know the
average life of bulbs in the factory.
• He knows from earlier experience that the
standard deviation is 10 months.
• He takes a sample of 30 bulbs and finds that
the average life of the bulbs is 36 months
• What is the range of average life of the bulbs
at each of the following confidence levels…
– 90%
– 95%
– 99%
Example for confidence intervals - solution

• The maintenance head wants to know the average


life of bulbs in the factory.
• He knows from earlier experience that the standard
deviation is 10 months.
• He takes a sample of 30 bulbs and finds that the
average life of the bulbs is 36 months
• What is the range of average life of the bulbs at
each of the following confidence levels…
• SE = 10 / sqrt (30) = 1.825
– 90% (32.99 to 39.01)
– 95% (32.42 to 39.58)
– 99% (31.31 to 40.69)
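The three ranges can be reproduced with a short function. This is a sketch using the multipliers quoted in this deck (1.65, 1.96 and 2.57); the names `Z_VALUES` and `confidence_interval` are illustrative.

```python
from math import sqrt

# Multipliers for each confidence level, as used in this deck.
Z_VALUES = {90: 1.65, 95: 1.96, 99: 2.57}

def confidence_interval(sample_mean, sigma, n, level):
    """Return (lower, upper) limits for the population mean."""
    standard_error = sigma / sqrt(n)
    margin = Z_VALUES[level] * standard_error
    return sample_mean - margin, sample_mean + margin

# Bulb example: sigma = 10 months, n = 30 bulbs, sample mean = 36 months
for level in (90, 95, 99):
    low, high = confidence_interval(36, 10, 30, level)
    print(f"{level}%: {low:.2f} to {high:.2f} months")
```

A higher confidence level gives a wider range: to be more certain of including the true mean, more ground has to be covered.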
1) Example for Confidence interval – RND
trials
• The RND team has taken trials for implementation of
the new process. The average yield achieved in 5
batches is as follows…
– 92.3%
– 91.5%
– 92.0%
– 91.8%
– 91.7%
• We know that the standard deviation of the yield of this
process is 0.5%
• Estimate the limits within which the yield of the new
process will lie with 90% confidence level
2) Example for Confidence interval – Lab
test
• While approving the new vendor for DMAPT, the lab has
checked 5 “samples” from the vendors, which show the
purity of THPI as…
– 99.01%
– 98.99%
– 99.02%
– 98.98%
– 98.80%
• We know that the standard deviation of the
purity of the THPI is 0.1%
• The vendor claims that his THPI always has an average
purity of 99.25%.
• Can his claim be verified at 99% confidence level?
3) Example for Confidence interval – Gland
packing
• You have used Haathi chaap gland packing for 10
reactors and the life of the gland has been as
follows…
– 200 days, 205 days, 220 days, 250 days, 180
days, 183 days, 190 days, 192 days, 196 days,
204 days
• You are told that the standard deviation of the gland
packing life is 20 days
• You are now planning to replace all your gland
packings across your plant with this brand and you
need to estimate the average life of this brand with
95% certainty
4) Example for Confidence interval – Qty of
coal
• The plant manager wants to estimate the
quantity of coal required for generation in the
year. A random sample of 10 weeks over the
last 5 years gives a mean usage of 11,400
tons a week
• He knows that the standard deviation is 700
tons a week
• What will be his average weekly consumption
in the next year at 95% confidence level?
1) Solution
• X bar = 91.86
• Sigma X bar = 0.5 / sqrt (5) = 0.22
• X bar + 1.65 X 0.22 = 91.86 + 1.65 X
0.22 = 92.223
• X bar – 1.65 X 0.22 = 91.86 – 1.65 X
0.22 = 91.497
2) Solution
• X bar = 98.96
• Sigma X bar = 0.1 / sqrt (5) = 0.0447
• 98.96 + 2.57 X 0.0447 = 99.07
• 98.96 – 2.57 X 0.0447 = 98.85
• Since 99.25 lies outside the confidence
limits, we reject the claim of the vendor
3) Solution
• X bar = 202
• Sigma X bar = 20 / sqrt (10) = 6.32
• 202 + 1.96 X 6.32 = 214.39 days
• 202 – 1.96 X 6.32 = 189.61 days
4) Solution
• X bar = 11400
• Sigma X bar = 700 / sqrt (10) = 221.36
• 11400 + 1.96 X 221.36 = 11833.87
• 11400 – 1.96 X 221.36 = 10966.13
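The four solutions can be checked from the raw data. This sketch computes the sample mean from the listed readings and carries the unrounded standard error through, so results may differ in the second decimal place from hand calculations that round the standard error first. The helper name `interval_from_data` is illustrative.

```python
from math import sqrt
from statistics import fmean

def interval_from_data(data, sigma, z):
    """Confidence limits for the mean, given readings and a known sigma."""
    mean = fmean(data)
    margin = z * sigma / sqrt(len(data))
    return mean - margin, mean + margin

# 1) RND trials: yield in %, sigma = 0.5, 90% confidence (z = 1.65)
yield_low, yield_high = interval_from_data([92.3, 91.5, 92.0, 91.8, 91.7], 0.5, 1.65)

# 2) Lab test: purity in %, sigma = 0.1, 99% confidence (z = 2.57)
purity_low, purity_high = interval_from_data([99.01, 98.99, 99.02, 98.98, 98.80], 0.1, 2.57)
claim_supported = purity_low <= 99.25 <= purity_high

# 3) Gland packing: life in days, sigma = 20, 95% confidence (z = 1.96)
gland_days = [200, 205, 220, 250, 180, 183, 190, 192, 196, 204]
gland_low, gland_high = interval_from_data(gland_days, 20, 1.96)

# 4) Coal: only the mean of 10 weeks is given, so use it directly.
coal_margin = 1.96 * 700 / sqrt(10)
coal_low, coal_high = 11400 - coal_margin, 11400 + coal_margin

print(f"1) yield:  {yield_low:.2f} to {yield_high:.2f} %")
print(f"2) purity: {purity_low:.2f} to {purity_high:.2f} %, claim supported: {claim_supported}")
print(f"3) gland:  {gland_low:.1f} to {gland_high:.1f} days")
print(f"4) coal:   {coal_low:.0f} to {coal_high:.0f} tons a week")
```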
Exercise 1
• Phorate is produced with a standard deviation
of 1.6% purity.
• A random sample of 64 batches is selected
from the output and this sample has a mean
purity of 90%.
• The customer will reject the batch if it is either
less than 88% or more than 92%.
• Does the 95% confidence interval for the true
mean purity of all the batches produced
ensure acceptance by customer?
Phorate …Solution
• X bar = 90
• $\sigma$/sqrt(n) = 1.6/sqrt(64) = 0.2
• To find the limits of mean of the population
– X bar - Z*$\sigma$/sqrt(n) <= $\mu$ <= X bar + Z*$\sigma$/sqrt(n)
Z for 95% confidence interval is 1.96
• 90 - 1.96*0.2 = 89.61
• 90 + 1.96*0.2 = 90.39

• The limits of mean are 89.61 and 90.39 at
95% confidence level. Both limits lie within the
customer's acceptance range of 88% to 92%.
