Six Sigma: Statistics: By: - Hakeem-Ur-Rehman
SIX SIGMA: STATISTICS
By: -
Hakeem–Ur–Rehman
MS-TQM, M.I.O.M(Operations Research)
Certified Six Sigma Black Belt (Singapore)
Lead Auditor ISO 9001 (UK)
IQTM–PU
WHAT IS STATISTICS?
1. Collecting Data
– e.g., Survey
2. Presenting Data
– e.g., Charts & Tables
3. Characterizing Data
– e.g., Average
Data Analysis and Decision-Making
Sample (S as in Sample & Statistic)
Portion of Population
Parameter
Summary Measure about Population
Statistic
Summary Measure about Sample
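As a minimal sketch of the parameter/statistic distinction, the following computes a population mean (a parameter) and the mean of a random sample drawn from that population (a statistic). The population values, the seed, and the sample size of 30 are all hypothetical, chosen only for illustration.

```python
import random
import statistics

# Hypothetical population: weights (in grams) of 1,000 produced units.
random.seed(1)  # fixed seed so the illustration is repeatable
population = [random.gauss(500, 10) for _ in range(1000)]

# Parameter: summary measure about the whole population.
population_mean = statistics.fmean(population)

# Statistic: summary measure about a sample (a portion of the population).
sample = random.sample(population, 30)
sample_mean = statistics.fmean(sample)

print(f"Parameter (population mean): {population_mean:.2f}")
print(f"Statistic (sample mean):     {sample_mean:.2f}")
```

The two values are close but not identical; the statistic changes from sample to sample, while the parameter is fixed.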
TYPES OF DATA
Attribute Data (Qualitative)
Is always binary; there are only two possible values (0, 1)
1. Yes, No
2. Success, Failure
3. Go, No Go
4. Pass, Fail
Variable Data (Quantitative)
Discrete (Count) Data:
Can be categorized in a classification and is based on counts.
1. Number of defects
2. Number of defective units
3. Number of Customer Returns
Continuous Data:
Can be measured on a scale and has decimal subdivisions that are
meaningful
1. Time, Pressure
2. Money
3. Material feed rate
DISCRETE & CONTINUOUS VARIABLES
DISCRETE VARIABLE | POSSIBLE VALUES FOR THE VARIABLE
The number of defective needles in boxes of 100 diabetic syringes | 0, 1, 2, 3, …, 100
The number of individuals in groups of 30 with a Type–A Personality | 0, 1, 2, 3, …, 30
The number of surveys returned out of 300 mailed in a customer satisfaction study | 0, 1, 2, 3, …, 300
CONTINUOUS VARIABLE | POSSIBLE VALUES FOR THE VARIABLE
The length of prison time served for individuals convicted | All the real numbers between ‘a’ and ‘b’, where ‘a’ is the smallest amount of time served and ‘b’ is the largest
The household income for households with incomes less than or equal to $30,000 | All the real numbers between ‘a’ and $30,000, where ‘a’ is the smallest household income in the population
DEFINITIONS OF SCALED DATA
The nature of the data and how it is represented determine which types of statistical tests
are possible.
1. NOMINAL SCALE:
“Numbers representing nominal data can be used only to classify or categorize”;
Data consists of Names, Labels, or categories.
A player with number 30 is not more of anything than a player with number 15,
and is certainly not twice whatever number 15 is.
Few examples of Nominal Data are:
Sex, Religion, Geographic Location, Place of Birth, employee ID Numbers
etc.
2. ORDINAL SCALE:
“Ordinal Level data measurement is higher than the nominal level. In addition to
the nominal level capabilities, Ordinal level measurement can be used to rank or
order objects”.
Because they only categorize people or objects, or rank items, Nominal and
Ordinal data are non–metric data and are sometimes referred to as qualitative
data.
EXAMPLES:
“AUTOMOBILE SIZES” Subcompact, compact, intermediate, full size,
luxury
“PRODUCT RATING” Poor, Good, Excellent
“CUSTOMER SATISFACTION” Very poor, Poor, Neither good nor bad, Good,
Excellent.
DEFINITIONS OF SCALED DATA (Cont…)
3. INTERVAL SCALE:
“The distances between consecutive numbers have meaning and the data are always
numerical”.
FOR EXAMPLE, when measuring temperature (in Fahrenheit), the distance from 30 to 40
is the same as the distance from 70 to 80. The interval between values is interpretable.
EXAMPLE:
IQ Scores of students in Black Belt Training:
100 … (the difference between scores is measurable and has meaning, but
a difference of 20 points between 100 and 120 does not indicate that one
student is 1.2 times as intelligent as the other)
4. RATIO SCALE:
“Data that can be ranked and for which all arithmetic operations including division can
be performed. (Division by Zero is of course excluded) Ratio level data has an
absolute zero and a value of zero indicates a complete absence of the characteristic of
interest”.
FOR EXAMPLE,
Grams of fat consumed per adult in Pakistan
0 … (if person A consumes 25 grams of fat and person B consumes 50
grams, we can say that person B consumes twice as much fat as person
A. If person C consumes zero grams of fat per day, we can say there is
a complete absence of fat consumed on that day. Note that a ratio is
interpretable and an absolute zero exists.)
OTHER EXAMPLES:
Production cycle time, work measurement time, number of trucks sold, number
of employees, etc.
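The practical difference between the interval and ratio scales can be checked with a change of units. The sketch below is illustrative only: the temperatures 40 and 80 are made-up values, the fat amounts reuse the 25 g and 50 g from the example above, and the helper function names are hypothetical. A ratio of Fahrenheit readings changes when the same temperatures are expressed in Celsius, whereas a ratio of weights does not change when grams are converted to ounces.

```python
def fahrenheit_to_celsius(f):
    """Convert a Fahrenheit temperature to Celsius."""
    return (f - 32) * 5 / 9


def grams_to_ounces(g):
    """Convert grams to ounces (1 oz = 28.349523125 g)."""
    return g / 28.349523125


# Interval scale: zero is arbitrary, so ratios depend on the unit chosen.
ratio_f = 80 / 40
ratio_c = fahrenheit_to_celsius(80) / fahrenheit_to_celsius(40)

# Ratio scale: an absolute zero exists, so ratios survive a change of unit.
ratio_g = 50 / 25
ratio_oz = grams_to_ounces(50) / grams_to_ounces(25)

print(f"80 F vs 40 F: ratio {ratio_f:.1f} in Fahrenheit, {ratio_c:.1f} in Celsius")
print(f"50 g vs 25 g: ratio {ratio_g:.1f} in grams, {ratio_oz:.1f} in ounces")
```

Because "twice as hot" depends on the unit, it is not a meaningful statement for interval data, while "twice as much fat" holds in any unit.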
DEFINITIONS OF SCALED DATA (Cont…)
[Table: Type of Data | Operator | Description | Examples]
STATISTICAL METHODS
Descriptive Statistics
Inferential Statistics
DESCRIPTIVE STATISTICS
1. Involves
– Collecting Data
– Presenting Data
– Characterizing Data
2. Purpose
– Describe Data
[Figure: bar chart of $ (0 to 50) by quarter, Q1 to Q4]
X̄ = 30.5, S² = 113
INFERENTIAL STATISTICS
1. Involves
– Estimation
– Hypothesis Testing
2. Purpose
– Make Decisions About Population Characteristics
DESCRIPTIVE ANALYSIS OF
QUALITATIVE DATA
QUALITATIVE DATA
QUANTITATIVE DATA
Center: Mean, Median, Mode, Geometric Mean, Harmonic Mean, Trimmed Mean
Important Points: Median, Quartiles, Deciles, Percentiles
Variation: Range, Inter-Quartile Range, Variance, Standard Deviation
Distribution: Skewness, Kurtosis
MINITAB: AN INTRODUCTION
BEGINNING AND ENDING A MINITAB SESSION:
To start a Minitab session from the menu, select
Start > All Programs > MINITAB 15 English > MINITAB 15 English
To exit Minitab, select
File > Exit
When you first enter Minitab, the screen will appear as in the figure:
[Figure: Minitab startup screen showing the Session window]
Trimmed Mean is a:
Compromise between the MEAN and MEDIAN
1. The Trimmed Mean is calculated by eliminating a specified percentage of the
smallest and largest observations from the data set and then calculating the
average of the remaining observations.
2. Useful for data with potential extreme values.
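The calculation described above can be sketched as follows. This is an illustration under stated assumptions: the data set is hypothetical, the helper name `trimmed_mean` is made up, the percentage is removed from each end of the sorted data, and the number of observations removed per end is rounded down (statistical packages may round differently).

```python
import statistics


def trimmed_mean(values, percent):
    """Mean after dropping `percent` % of the observations from EACH end.

    Assumption: the count removed per end is rounded down.
    """
    ordered = sorted(values)
    cut = int(len(ordered) * percent / 100)
    if 2 * cut >= len(ordered):
        raise ValueError("percent is too large: no observations would remain")
    kept = ordered[cut:len(ordered) - cut]
    return statistics.fmean(kept)


# Hypothetical data with one extreme value (95).
data = [12, 14, 15, 15, 16, 17, 18, 19, 20, 95]

print(statistics.fmean(data))    # mean            -> 24.1
print(statistics.median(data))   # median          -> 16.5
print(trimmed_mean(data, 10))    # 10% trimmed mean -> 16.75
```

The extreme value pulls the mean up to 24.1, while the trimmed mean (16.75) stays close to the median (16.5).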
MODE:
Mode - the most frequently occurring value in a data set
Applicable to all levels of data measurement (nominal, ordinal, interval, and
ratio)
Can be used to determine what categories occur most frequently
Bimodal – In a tie for the most frequently occurring value, two modes are listed
Multimodal -- Data sets that contain more than two modes
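Because the mode applies even to nominal data, it can be found by counting categories. A small sketch using Python's standard `statistics.multimode`, with hypothetical defect categories:

```python
from statistics import multimode

# Hypothetical nominal data: defect categories logged at inspection.
defects = ["scratch", "dent", "scratch", "crack", "dent", "stain"]

# multimode returns every value tied for the highest frequency,
# in the order first encountered.
modes = multimode(defects)

if len(modes) == 1:
    label = "unimodal"
elif len(modes) == 2:
    label = "bimodal"
else:
    label = "multimodal"

print(modes, label)  # ['scratch', 'dent'] bimodal
```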
MEASURES OF VARIATION
RANGE:
The difference between the largest and the smallest values
in a set of data
Advantage – easy to compute
Disadvantage – is affected by extreme values
INTER–QUARTILE RANGE:
Inter-quartile Range - range of values between the first and
third quartile
Range of the “middle half”; middle 50%
Inter-quartile Range – used in the construction of box and
whisker plots
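STANDARD DEVIATION:
$S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$
VARIANCE:
$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$ (the square of S)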
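The four measures of variation can be computed with Python's standard library, as in this minimal sketch. The data set is hypothetical. `statistics.variance` and `statistics.stdev` are the sample versions (divisor n - 1), and `statistics.quantiles` uses its default "exclusive" method; other quartile conventions can give slightly different values.

```python
import statistics

# Hypothetical sample of 7 measurements.
data = [4, 7, 8, 10, 12, 13, 16]

data_range = max(data) - min(data)            # largest minus smallest

q1, q2, q3 = statistics.quantiles(data, n=4)  # first, second, third quartile
iqr = q3 - q1                                 # spread of the middle 50%

variance = statistics.variance(data)          # sample variance, S^2
std_dev = statistics.stdev(data)              # sample standard deviation, S

print(f"Range = {data_range}")                # Range = 12
print(f"IQR   = {iqr}")                       # IQR   = 6.0
print(f"S^2   = {variance:.3f}")              # S^2   = 16.333
print(f"S     = {std_dev:.3f}")               # S     = 4.041
```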
SHAPE OF THE DISTRIBUTION
Skewness: indicator used in distribution analysis as a sign of asymmetry and
deviation from a normal distribution.
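One common way to quantify skewness is the moment coefficient, the third central moment divided by the 1.5 power of the second central moment. The sketch below assumes that definition (statistical packages often apply a small-sample adjustment, so their reported values may differ slightly) and uses hypothetical data; the function name `skewness` is illustrative.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (no small-sample adjustment)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


symmetric = [1, 2, 3, 4, 5]              # mirror image around 3
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]  # long tail to the right

print(skewness(symmetric))      # 0.0 for perfectly symmetric data
print(skewness(right_skewed))   # positive: tail extends to the right
```

A value near zero suggests symmetry, a positive value a longer right tail, and a negative value a longer left tail.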
Do the following to get the tools where you need them. Click on Tools >
Customize, then the Toolbars tab.
In the dialog box that opens, check and uncheck as needed so that it matches the
figure below.
WHAT IS A HISTOGRAM?
4. Determine the class width by dividing the range (R) by the number of classes
(K) and rounding up.
THE HISTOGRAM
Open Bears.MTW
You will create a frequency histogram of the variable Age.
THE HISTOGRAM (Cont…)
CONTROLLING HISTOGRAMS:
What you get in this case is a histogram with 10 classes.
To get the right number of classes, get into the "X Scale" editing dialog box and click on
the "Binning" tab.
For "Interval Type" click on "Cutpoint" and for "Interval Definition" click on "Number of
intervals:" and change it to 6. Now click "OK".
This graph still does not conform to standards because the class width and class
boundaries were not calculated according to rules. To get what we want, we must
define the class boundaries (what Minitab calls "cut points") ourselves.
The minimum value of the data is 8 and the maximum is 177. Our formula for the class
width with 6 classes is (177–8)/6 = 28.17 (approximately), which rounds up to 29. (Remember: always
round up unless the division yields an integer.) If we choose 8 as the lowest class limit,
then the lowest class boundary will be 7.5, and the rest will be 36.5, 65.5, 94.5, 123.5,
152.5 and 181.5.
Now get back into the "Binning" dialog box, click on "Midpoint/Cutpoint positions:",
delete the existing cutpoints then enter the first 2 class boundaries listed above into
the box (separate with spaces, not commas) and click "OK".
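The class width and class boundaries used above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic, using the minimum (8), maximum (177), and number of classes (6) quoted in the text; the variable names are illustrative.

```python
import math

minimum, maximum = 8, 177  # smallest and largest values quoted above
classes = 6

# Class width: range divided by the number of classes, rounded up.
width = math.ceil((maximum - minimum) / classes)

# With 8 as the lowest class limit, the lowest boundary sits half a unit below it;
# each further boundary is one class width higher.
boundaries = [minimum - 0.5 + i * width for i in range(classes + 1)]

print(width)       # 29
print(boundaries)  # [7.5, 36.5, 65.5, 94.5, 123.5, 152.5, 181.5]
```

Six classes need seven boundaries, and the last boundary (181.5) lies above the maximum value, so every observation falls in some class.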
THE HISTOGRAM (Cont…)
EXERCISE:
The data in C:\Program Files\Minitab 15\English\Sample Data\Grades.MTW
consists of verbal and math SAT scores and corresponding
GPAs.
i. Create a frequency histogram with 7 classes of
the verbal SAT scores.
ii. Create a relative frequency histogram with 7
classes of the verbal SAT scores.
iii. Create a frequency polygon with 7 classes of the
verbal SAT scores.
BOX & WHISKER’S PLOT
Use a Box & Whisker’s Plot to
assess and compare
distribution characteristics
such as median, range, and
symmetry, and to identify
outliers.
A minimum of 10
observations should be
included in generating the
Box Plot.
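The quantities a box plot displays can be computed directly. The sketch below uses hypothetical scores and assumes the common convention that points more than 1.5 times the inter-quartile range beyond the quartiles are flagged as outliers; the function name `box_plot_summary` is illustrative, and quartiles come from `statistics.quantiles` with its default method.

```python
import statistics


def box_plot_summary(values):
    """Return the quartiles, fences, and outliers that a box plot displays."""
    ordered = sorted(values)
    q1, median, q3 = statistics.quantiles(ordered, n=4)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr   # assumed 1.5 x IQR outlier rule
    upper_fence = q3 + 1.5 * iqr
    outliers = [x for x in ordered if x < lower_fence or x > upper_fence]
    return {
        "min": ordered[0], "q1": q1, "median": median, "q3": q3,
        "max": ordered[-1], "iqr": iqr, "outliers": outliers,
    }


# Hypothetical data: 11 observations, one unusually large.
scores = [12, 14, 15, 15, 16, 17, 18, 19, 20, 22, 40]
summary = box_plot_summary(scores)
print(summary)
```

Here the quartiles are 15, 17 and 20, so the fences are 7.5 and 27.5 and the value 40 is flagged as an outlier.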
BOX & WHISKER’S PLOT USING MINITAB
CONSTRUCTING BOX PLOT (One Y):
You want to examine the overall
durability of your carpet products.
Samples of the carpet products are
placed in four homes and you
measure durability after 60 days.
Create a Box Plot to examine the
distribution of durability scores.
Open worksheet Carpet.mtw
Choose Graph > Boxplot
Under One Y, choose Simple. Click OK
In Variable, enter Durability. Click OK
BOX & WHISKER’S PLOT USING MINITAB
Constructing Box Plot: (One Y–with Groups)
EXERCISE # 2:
The following data represent the percentage of calories that come from fat
for burgers and chicken items from a sample of fast food chains.
BURGER
43 51 48 47 51 50 55 55 59 57
CHICKEN
60 54 53 57 57 46 45 56 57
Construct Box & Whisker’s Plots to analyze the data.
PROBABILITY
DISTRIBUTION OF DATA
The data-generating process is known as the
Distribution of the Data.
For Example:
In the manufacturing sector, measurements
such as length, diameter, etc. usually follow a
NORMAL Distribution
In the service sector, say banks, customer
waiting time follows an EXPONENTIAL
Distribution
In the service sector, say banks, the number of
customers arriving follows a POISSON Distribution
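For the two service-sector cases, probabilities follow from the standard Poisson and exponential formulas. The sketch below is illustrative: the arrival rate and waiting-time rate are hypothetical numbers, not values from any bank, and the function names are made up.

```python
import math


def poisson_pmf(k, rate):
    """P(exactly k arrivals in a period) when arrivals average `rate` per period."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def exponential_cdf(t, rate):
    """P(waiting time <= t) for an exponential distribution with mean 1 / rate."""
    return 1 - math.exp(-rate * t)


arrival_rate = 4   # hypothetical: 4 customers arrive per 10-minute period
waiting_rate = 0.5  # hypothetical: mean waiting time of 1 / 0.5 = 2 minutes

p_no_arrivals = poisson_pmf(0, arrival_rate)      # equals e^-4
p_wait_le_2 = exponential_cdf(2, waiting_rate)    # equals 1 - e^-1

print(f"P(no arrivals in a period)  = {p_no_arrivals:.4f}")
print(f"P(wait of 2 minutes or less) = {p_wait_le_2:.4f}")
```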
NORMAL DISTRIBUTION
Characteristics of the normal distribution:
Continuous distribution - Line does not break
Symmetrical distribution - Each half is a mirror of the other half
Asymptotic to the horizontal axis - it does not touch the x axis and goes on
forever
Unimodal - means the values mound up in only one portion of the graph
Area under the curve = 1; total of all probabilities = 1
Normal distribution is characterized by the mean and the Std Dev
Values of μ and σ produce a normal distribution
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$
Where:
μ = mean of X
σ = standard deviation of X
π = 3.14159 . . .
e = 2.71828 . . .
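The normal density can be evaluated directly from its two parameters. The sketch below implements f(x) = 1 / (σ√(2π)) · exp(−½((x − μ)/σ)²) by hand and compares it with Python's built-in `statistics.NormalDist`; the values μ = 100 and σ = 15 are hypothetical.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Normal density: 1 / (sigma * sqrt(2*pi)) * exp(-0.5 * ((x - mu) / sigma)**2)."""
    coefficient = 1 / (sigma * math.sqrt(2 * math.pi))
    exponent = -0.5 * ((x - mu) / sigma) ** 2
    return coefficient * math.exp(exponent)


mu, sigma = 100, 15  # hypothetical mean and standard deviation

# The hand-written formula and the library agree.
print(normal_pdf(110, mu, sigma))
print(NormalDist(mu, sigma).pdf(110))

# Symmetry: points equally far from the mean have the same density.
print(normal_pdf(90, mu, sigma), normal_pdf(110, mu, sigma))
```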
STANDARD NORMAL DISTRIBUTION
A normal distribution with
a mean of zero (μ = 0), and
a standard deviation of one (σ = 1)
Z Formula
standardizes any normal
distribution
$Z = \frac{X - \mu}{\sigma}$
Z Score
computed by the Z Formula
the number of standard
deviations which a value
is away from the mean
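A short sketch of the Z formula, Z = (X − μ)/σ, in use. The mean, standard deviation, and observed value are hypothetical; `NormalDist()` with no arguments is Python's standard normal distribution (mean 0, standard deviation 1).

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Number of standard deviations that x lies away from the mean."""
    return (x - mu) / sigma


mu, sigma = 50.0, 4.0  # hypothetical process mean and standard deviation
x = 58.0               # hypothetical observed value

z = z_score(x, mu, sigma)
p_below = NormalDist().cdf(z)  # area under the standard normal curve left of z

print(z)                  # 2.0
print(round(p_below, 4))  # 0.9772
```

The same probability is obtained from the original distribution, `NormalDist(50, 4).cdf(58)`, which is the point of standardizing: any normal distribution can be read off the single standard normal curve.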
NORMALITY TEST FROM
GRAPHIC SUMMARY OF DATA
Open the worksheet CRANKSH.MTW
NORMALITY TEST:
In an operating engine, parts of the crankshaft move up
and down. AtoBDist is the distance (in mm) from the
actual (A) position of a point on the crankshaft to a
baseline (B) position. To ensure production quality, a
manager took five measurements each working day in a
car assembly plant, from September 28 through October
15, and then ten per day from the 18th through the 25th.
You wish to see if these data follow a normal
distribution, so you use a Normality test.
Open the worksheet CRANKSH.MTW
SCATTER PLOT
What is the relationship between X and Y in the plot?
SCATTER PLOT
EXAMPLE: You are interested in how well your
company's camera batteries are meeting customers'
needs. Market research shows that customers become
annoyed if they have to wait longer than 5.25 seconds
between flashes.
You collect a sample of batteries that have been in use
for varying amounts of time and measure the voltage
remaining in each battery immediately after a flash
(VoltsAfter), as well as the length of time required for
the battery to be able to flash again (flash recovery time,
FlashRecov). Create a scatter plot to examine the
results. Include a reference line at the critical flash
recovery time of 5.25 seconds.
Open the worksheet BATTERIES.MTW