Six Sigma: Statistics: By: - Hakeem-Ur-Rehman

QUALITY TOOLS & TECHNIQUES
SIX SIGMA: STATISTICS

By: Hakeem–Ur–Rehman
MS-TQM, M.I.O.M (Operations Research)
Certified Six Sigma Black Belt (Singapore)
Lead Auditor ISO 9001 (UK)
IQTM–PU
WHAT IS STATISTICS?

1. Collecting Data
– e.g., Survey
2. Presenting Data
– e.g., Charts & Tables
3. Characterizing Data
– e.g., Average

Why? Data → Analysis → Decision-Making


KEY TERMS
• Population (Universe)
– All Items of Interest
• Sample
– Portion of Population
• Parameter
– Summary Measure about Population
• Statistic
– Summary Measure about Sample

Memory aid: P in Population & Parameter; S in Sample & Statistic
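The distinction between a parameter and a statistic can be sketched in a few lines of Python. The population below is simulated and purely hypothetical (the shaft-diameter framing, the lot size of 1,000 and the sample size of 30 are assumptions made for illustration, not values from these slides).

```python
import random
import statistics

# Hypothetical population: diameters (mm) of every unit in a lot of 1,000.
rng = random.Random(42)  # fixed seed so the sketch is repeatable
population = [rng.gauss(25.0, 0.2) for _ in range(1000)]

# Parameter: a summary measure computed from ALL items of interest.
mu = statistics.fmean(population)

# Statistic: the same summary measure computed from a portion of the population.
sample = rng.sample(population, 30)
x_bar = statistics.fmean(sample)

print(f"parameter (population mean) = {mu:.3f}")
print(f"statistic (sample mean)     = {x_bar:.3f}")
```

The two values are close but not identical; the sample mean changes from sample to sample, while the population mean is fixed.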
TYPES OF DATA
• Attribute Data (Qualitative)
– Is always binary; there are only two possible values (0, 1)
1. Yes, No
2. Success, Failure
3. Go, No Go
4. Pass, Fail
• Variable Data (Quantitative)
Discrete (Count) Data:
– Can be categorized in a classification and is based on counts.
1. Number of defects
2. Number of defective units
3. Number of Customer Returns
Continuous Data:
– Can be measured on a scale; it has decimal subdivisions that are meaningful
1. Time, Pressure
2. Money
3. Material feed rate
DISCRETE & CONTINUOUS VARIABLES

DISCRETE VARIABLE | POSSIBLE VALUES FOR THE VARIABLE
The number of defective needles in boxes of 100 diabetic syringes | 0, 1, 2, 3, …, 100
The number of individuals in groups of 30 with a Type–A Personality | 0, 1, 2, 3, …, 30
The number of surveys returned out of 300 mailed in a customer satisfaction study | 0, 1, 2, 3, …, 300

CONTINUOUS VARIABLE | POSSIBLE VALUES FOR THE VARIABLE
The length of prison time served for individuals convicted | All the real numbers between ‘a’ and ‘b’, where ‘a’ is the smallest amount of time served and ‘b’ is the largest.
The household income for households with incomes less than or equal to $30,000 | All the real numbers between ‘a’ and $30,000, where ‘a’ is the smallest household income in the population.
DEFINITIONS OF SCALED DATA
Understanding the nature of data and how to represent it can affect the types of statistical tests possible.
1. NOMINAL SCALE:
• “Numbers representing nominal data can be used only to classify or categorize”; the data consist of names, labels, or categories.
• A player with number 30 is not more of anything than a player with number 15, and is certainly not twice whatever number 15 is.
• A few examples of Nominal Data are:
– Sex, Religion, Geographic Location, Place of Birth, Employee ID Numbers, etc.
2. ORDINAL SCALE:
• “Ordinal level data measurement is higher than the nominal level. In addition to the nominal level capabilities, ordinal level measurement can be used to rank or order objects”.
• Whether they categorize people or objects or rank items, Nominal and Ordinal data are non–metric data and are sometimes referred to as qualitative data.
• EXAMPLES:
– “AUTOMOBILE SIZES” → Subcompact, compact, intermediate, full size, luxury
– “PRODUCT RATING” → Poor, Good, Excellent
– “CUSTOMER SATISFACTION” → Very poor, Poor, Neither good nor bad, Good, Excellent.
DEFINITIONS OF SCALED DATA (Cont…)
3. INTERVAL SCALE:
• “The distances between consecutive numbers have meaning and the data are always numerical”.
• FOR EXAMPLE, when measuring temperature (in Fahrenheit), the distance from 30 to 40 is the same as the distance from 70 to 80. The interval between values is interpretable.
• EXAMPLE:
– IQ scores of students in Black Belt Training: the difference between scores is measurable and has meaning, but a difference of 20 points between 100 and 120 does not indicate that one student is 1.2 times more intelligent.
4. RATIO SCALE:
• “Data that can be ranked and for which all arithmetic operations including division can be performed. (Division by zero is of course excluded.) Ratio level data has an absolute zero and a value of zero indicates a complete absence of the characteristic of interest”.
• FOR EXAMPLE,
– Grams of fat consumed per adult in Pakistan: if person A consumes 25 grams of fat and person B consumes 50 grams, we can say that person B consumes twice as much fat as person A. If person C consumes ZERO grams of fat per day, we can say there is a complete absence of fat consumed on that day. Note that a ratio is interpretable and an absolute zero exists.
• OTHER EXAMPLES:
– Production cycle time, work measurement time, number of trucks sold, number of employees, etc.
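The difference between the interval and ratio scales can be checked numerically. The sketch below reuses the Fahrenheit and grams-of-fat examples from the definitions above; the conversion to Celsius and to ounces is an added illustration, not part of the original slides. On an interval scale, equal differences stay equal after a change of units but ratios do not; on a ratio scale, ratios survive the change of units.

```python
def fahrenheit_to_celsius(f):
    """Convert a Fahrenheit temperature to Celsius."""
    return (f - 32) * 5 / 9

# Interval scale: the two 10-degree differences remain equal after conversion...
diff_c_low = fahrenheit_to_celsius(40) - fahrenheit_to_celsius(30)
diff_c_high = fahrenheit_to_celsius(80) - fahrenheit_to_celsius(70)

# ...but the ratio depends on the unit, so "twice as hot" has no meaning.
ratio_f = 80 / 40
ratio_c = fahrenheit_to_celsius(80) / fahrenheit_to_celsius(40)

# Ratio scale: 50 g vs 25 g of fat is "twice as much" in any unit.
GRAMS_PER_OUNCE = 28.3495
ratio_grams = 50 / 25
ratio_ounces = (50 / GRAMS_PER_OUNCE) / (25 / GRAMS_PER_OUNCE)

print(diff_c_low, diff_c_high)    # equal differences
print(ratio_f, ratio_c)           # ratios disagree between units
print(ratio_grams, ratio_ounces)  # ratios agree between units
```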
DEFINITIONS OF SCALED DATA (Cont…)
TYPE OF DATA | OPERATOR | DESCRIPTION | EXAMPLES
Nominal | =, ≠ | Categories | Types of defects, types of colors
Ordinal | <, > | Rankings | Severity of defects: critical, major, minor
Interval | +, - | Differences but no absolute zero | Temperature of a ship
Ratio | / | Absolute zero | Pressure, Speed


STATISTICAL METHODS

Statistical Methods
• Descriptive Statistics
• Inferential Statistics
DESCRIPTIVE STATISTICS

1. Involves
– Collecting Data
– Presenting Data
– Characterizing Data
2. Purpose
– Describe Data

[Figure: bar chart of $ (0 to 50) by quarter, Q1 to Q4, annotated with X̄ = 30.5 and S² = 113]
INFERENTIAL STATISTICS

1. Involves
– Estimation
– Hypothesis Testing
2. Purpose
– Make Decisions About Population Characteristics
DESCRIPTIVE ANALYSIS OF QUALITATIVE DATA

QUALITATIVE DATA
• TABLES: One-Way Table, Two-Way Table, …, N-Way Table
• GRAPHS: Bar Chart, Pie Chart, Multiple Bar Chart, Component Bar Chart
• NUMBERS: Percentages
DESCRIPTIVE ANALYSIS OF QUANTITATIVE DATA

QUANTITATIVE DATA
• TABLES: Frequency Distribution, Stem and Leaf Plot
• GRAPHS: Histogram, Box and Whisker’s Plot
• NUMBERS:
– Center: Mean, Median, Mode, Geometric Mean, Harmonic Mean, Trimmed Mean
– Important Points: Median, Quartiles, Deciles, Percentiles
– Variation: Range, Inter-Quartile Range, Variance, Standard Deviation
– Distribution: Skewness, Kurtosis
MINITAB: AN INTRODUCTION
BEGINNING AND ENDING A MINITAB SESSION:
• To start a Minitab session from the menu, select
– Start → All Programs → MINITAB 15 English → MINITAB 15 English
• To exit Minitab, select
– File → Exit

When you first enter Minitab, the screen will appear as in the figure:
[Figure: Minitab screen showing the SESSION WINDOW and the DATA WINDOW]

• The session window contains comments, tables, descriptive summaries, and inferential statistics.
• The data window consists of all the data and variable names.
• Graph windows contain high resolution graphs.
DESCRIPTIVE ANALYSIS USING MINITAB
• In the Minitab Data folder, open the worksheet Pulse.mtw
• Conduct Descriptive Analysis on the pulse1 data.
MEASURES OF LOCATION
• Mean is:
– The average of a group of numbers
– Applicable for interval and ratio data
– Not applicable for nominal or ordinal data
– Affected by each value in the data set, including extreme values
– Computed by summing all values in the data set and dividing the sum by the number of values in the data set
SAMPLE: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
POPULATION: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
• Stat → Basic Statistics → Display Descriptive Statistics
– Select Statistics (and choose appropriate measures)
– Select Graphs → Histogram of data, with normal curve

[Session window output: Descriptive Statistics: Pulse1]

MEASURES OF LOCATION
• Median is:
– The middle value in an ordered array of numbers.
– For an array with an odd number of terms, the median is the middle number
– For an array with an even number of terms, the median is the average of the middle two numbers

• Trimmed Mean is a:
– Compromise between the MEAN and MEDIAN
1. The Trimmed Mean is calculated by eliminating a specified percentage of the smallest and largest observations from the data set and then calculating the average of the remaining observations.
2. Useful for data with potential extreme values.

• MODE:
– The most frequently occurring value in a data set
– Applicable to all levels of data measurement (nominal, ordinal, interval, and ratio)
– Can be used to determine what categories occur most frequently
– Bimodal: in a tie for the most frequently occurring value, two modes are listed
– Multimodal: data sets that contain more than two modes
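As a rough illustration of the four measures of location, the sketch below uses Python's standard `statistics` module on a small set of hypothetical pulse readings (invented for this example, not taken from Pulse.mtw). The `trimmed_mean` helper is an assumption of this sketch: it drops the stated percentage from each end and rounds the number of dropped values down; statistical packages may use a different trimming convention.

```python
import statistics

def trimmed_mean(values, percent_each_end):
    """Mean after dropping `percent_each_end` % of observations from EACH tail."""
    ordered = sorted(values)
    k = int(len(ordered) * percent_each_end / 100)  # values dropped per tail
    kept = ordered[k:len(ordered) - k] if k else ordered
    return statistics.fmean(kept)

# Hypothetical pulse readings with one extreme value (140).
pulse = [62, 64, 66, 68, 68, 70, 72, 74, 76, 140]

mean_value = statistics.fmean(pulse)        # pulled upward by the extreme value
median_value = statistics.median(pulse)     # average of the two middle values
modes = statistics.multimode(pulse)         # list of all most-frequent values
trimmed_value = trimmed_mean(pulse, 10)     # drops 62 and 140

print(mean_value, median_value, modes, trimmed_value)
```

The extreme reading moves the mean well above the median, while the trimmed mean stays close to the median, which is the compromise described above.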
MEASURES OF VARIATION
• RANGE:
– The difference between the largest and the smallest values in a set of data
– Advantage: easy to compute
– Disadvantage: is affected by extreme values
• INTER–QUARTILE RANGE:
– Range of values between the first and third quartile
– Range of the “middle half”; middle 50%
– Used in the construction of box and whisker plots
• STANDARD DEVIATION (sample):
– $S = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$, where $\bar{x}$ is the sample mean and $n$ is the number of observations

• VARIANCE:
– $S^2$ = square of S
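The measures of variation can be sketched with the standard `statistics` module. The data values are hypothetical. `statistics.quantiles` with its default "exclusive" method is used for the quartiles; quartile conventions differ slightly between packages, so the inter-quartile range from another tool may not match exactly.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements

data_range = max(data) - min(data)             # largest minus smallest value
q1, q2, q3 = statistics.quantiles(data, n=4)   # quartiles (default "exclusive" method)
iqr = q3 - q1                                  # spread of the middle 50%
s = statistics.stdev(data)                     # sample standard deviation (n - 1 divisor)
s_squared = statistics.variance(data)          # sample variance, the square of s

print(f"range = {data_range}, IQR = {iqr}")
print(f"S = {s:.4f}, S^2 = {s_squared:.4f}")
```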
SHAPE OF THE DISTRIBUTION
Skewness: indicator used in distribution analysis as a sign of asymmetry and deviation from a normal distribution.

• Skewness > 0: Right skewed distribution; most values are concentrated on the left of the mean, with extreme values to the right.
• Skewness < 0: Left skewed distribution; most values are concentrated on the right of the mean, with extreme values to the left.
• Skewness = 0: mean = median; the distribution is symmetrical around the mean.

Kurtosis: indicator used in distribution analysis as a sign of flattening or "peakedness" of a distribution.

• Kurtosis > 3: Leptokurtic distribution, sharper than a normal distribution, with values concentrated around the mean and thicker tails. This means a high probability for extreme values.
• Kurtosis < 3: Platykurtic distribution, flatter than a normal distribution with a wider peak. The probability for extreme values is less than for a normal distribution, and the values are spread wider around the mean.
• Kurtosis = 3: Mesokurtic distribution; the normal distribution, for example.
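A minimal sketch of how these two shape indicators can be computed, assuming the simple moment-based definitions (third and fourth central moments divided by powers of the second). This version of kurtosis equals 3 for a normal distribution, matching the thresholds above; many packages instead report "excess" kurtosis (this value minus 3) and apply small-sample corrections, so their output may differ. Both data sets are hypothetical.

```python
def central_moment(values, k):
    """k-th central moment: average of (x - mean)**k over all values."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** k for v in values) / len(values)

def skewness(values):
    """Moment-based skewness: m3 / m2**1.5 (0 for symmetric data)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def kurtosis(values):
    """Moment-based kurtosis: m4 / m2**2 (3 for a normal distribution)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2

symmetric = [1, 2, 3, 4, 5]            # hypothetical, symmetric about 3
right_skewed = [1, 1, 2, 2, 3, 10]     # hypothetical, long tail to the right

print(skewness(symmetric), kurtosis(symmetric))   # skewness 0; kurtosis below 3 (platykurtic)
print(skewness(right_skewed))                     # positive: right skewed
```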
INTRODUCTION TO GRAPHING

The purpose of Graphing is to:
1. Identify the shape of the distribution of the data
2. Locate the Average, Spread and Outliers of the Distribution
3. Compare the shapes and variation of different variables
4. Observe the trends, drifts and shifts in the collected data

Here we will discuss …
• Histogram
• Box Plots (Box & Whisker’s Plot)
INTRODUCTION TO GRAPHING (Cont…)
When you start Minitab 15, if your toolbars do not look like the figure below, do the following to get the tools where you need them:
• Click on Tools → Customize → Toolbars tab.
• In the dialog box that opens, check and uncheck as needed so that it matches the figure below.
WHAT IS A HISTOGRAM?

• A histogram is a summary graph showing how many of the measured data points fall within each class interval.
• WHAT QUESTIONS DOES THE ‘HISTOGRAM’ ANSWER?
– What distribution (center, variation and shape) does the data have?
– Does the data look symmetric or is it skewed to the left or right?
– Does the data contain outliers?
– Is the process within specification limits?
GUIDELINES FOR
CONSTRUCTING A HISTOGRAM
1. Determine the number of data points in the data set. Call this number ‘n’.
2. Determine the range, R, of the values in the data set.
3. Determine the number of classes; there are no set rules; however, there are some rules of thumb that can be used.
a) # of Classes = 1 + 3.3 log(n)
b) The logarithm (base 2) rule.
– # of Classes = K = [log₂ n] + 1 = [(log n) / (log 2)] + 1
c) The following table [Goal 88] gives a range of classes.
– # of Classes = K =

4. Determine the class width by dividing the range (R) by the number of classes (K) and rounding up.
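The rules of thumb in steps 3 and 4 can be sketched as small Python functions. Two readings are assumptions of this sketch: `log` in rule (a) is taken as the base-10 logarithm with the result rounded to the nearest whole class, and the square brackets in rule (b) are read as "integer part". The function names are illustrative only.

```python
import math

def classes_rule_a(n):
    """Rule (a): 1 + 3.3 * log10(n), rounded to the nearest whole class."""
    return round(1 + 3.3 * math.log10(n))

def classes_rule_b(n):
    """Rule (b): [log2(n)] + 1, reading [.] as the integer part."""
    return math.floor(math.log2(n)) + 1

def class_width(data_range, k):
    """Step 4: range divided by number of classes, rounded up to a whole number."""
    return math.ceil(data_range / k)

n = 50  # e.g. a data set with 50 observations
print(classes_rule_a(n), classes_rule_b(n))
print(class_width(169, 6))  # a range of 169 split into 6 classes
```

The two rules need not agree; they only suggest a reasonable starting point for the number of classes.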
THE HISTOGRAM
• Open Bears.MTW
• You will create a frequency histogram of the variable Age.
THE HISTOGRAM (Cont…)
• CONTROLLING HISTOGRAMS:
– What you get in this case is a histogram with 10 classes.
– To get the right number of classes, open the "X Scale" editing dialog box and click on the "Binning" tab.
– For "Interval Type" click on "Cutpoint", and for "Interval Definition" click on "Number of intervals:" and change it to 6. Now click "OK".
– This graph still does not conform to standards because the class width and class boundaries were not calculated according to the rules. To get what we want, we must define the class boundaries (what Minitab calls "cutpoints") ourselves.
– The minimum value of the data is 8 and the maximum is 177. Our formula for the class width with 6 classes is (177–8)/6 = 28.17, which rounds up to 29. (Remember: always round up unless the quotient is already an integer.) If we choose 8 as the lowest class limit, then the lowest class boundary will be 7.5, and the rest will be 36.5, 65.5, 94.5, 123.5, 152.5 and 181.5.
– Now get back into the "Binning" dialog box, click on "Midpoint/Cutpoint positions:", delete the existing cutpoints, then enter the class boundaries listed above into the box (separate with spaces, not commas) and click "OK".
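The class boundaries for the Age histogram can be reproduced with a short calculation. This is a sketch under the assumptions stated in the text: integer-valued data, so the first boundary sits half a unit below the minimum, and a class width rounded up to a whole number. The function name is hypothetical.

```python
import math

def class_boundaries(minimum, maximum, k):
    """Return the k + 1 class boundaries (cutpoints) for integer-valued data."""
    width = math.ceil((maximum - minimum) / k)  # (177 - 8) / 6 = 28.17 -> 29
    first = minimum - 0.5                       # boundary half a unit below the lowest limit
    return [first + i * width for i in range(k + 1)]

cutpoints = class_boundaries(8, 177, 6)
print(cutpoints)

# Space-separated form, as the dialog box expects spaces rather than commas.
print(" ".join(str(c) for c in cutpoints))
```

Six classes need seven boundaries, and the last boundary (181.5) lies above the maximum of 177 because the width was rounded up.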
THE HISTOGRAM (Cont…)
• EXERCISE:
The data in C:\Program Files\Minitab 15\English\Sample Data\Grades.MTW consist of verbal and math SAT scores and corresponding GPAs.
i. Create a frequency histogram with 7 classes of the verbal SAT scores.
ii. Create a relative frequency histogram with 7 classes of the verbal SAT scores.
iii. Create a frequency polygon with 7 classes of the verbal SAT scores.
BOX & WHISKER’S PLOT
Use a Box & Whisker’s Plot to assess and compare distribution characteristics such as median, range, and symmetry, and to identify outliers.
A minimum of 10 observations should be included in generating the Box Plot.
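The numbers behind a box plot can be sketched as follows. The data are hypothetical, and the 1.5 × IQR rule for flagging outliers is an assumption of this sketch (it is the common Tukey convention; the slides do not state which rule is used). Quartiles come from `statistics.quantiles` with its default method, so other software may give slightly different values.

```python
import statistics

def box_plot_summary(values):
    """Quartiles, whisker ends and outliers for one variable (1.5 x IQR fences)."""
    ordered = sorted(values)
    q1, median, q3 = statistics.quantiles(ordered, n=4)
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in ordered if lower_fence <= v <= upper_fence]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "whisker_low": inside[0],     # smallest value that is not an outlier
        "whisker_high": inside[-1],   # largest value that is not an outlier
        "outliers": [v for v in ordered if v < lower_fence or v > upper_fence],
    }

# Hypothetical sample of 11 observations with one extreme value.
scores = [7, 8, 9, 10, 10, 11, 12, 12, 13, 14, 30]
summary = box_plot_summary(scores)
print(summary)
```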
BOX & WHISKER’S PLOT USING MINITAB
CONSTRUCTING BOX PLOT (One Y):
• You want to examine the overall durability of your carpet products. Samples of the carpet products are placed in four homes and you measure durability after 60 days. Create a Box Plot to examine the distribution of durability scores.
• Open worksheet Carpet.mtw
• Choose Graph → Boxplot
• Under One Y, choose Simple. Click OK.
• In Variable, enter Durability. Click OK.
BOX & WHISKER’S PLOT USING MINITAB
Constructing Box Plot: (One Y–with Groups)

You want to assess the durability of four experimental carpet products. Samples of the carpet products are placed in four homes and you measure durability after 60 days. Create a box plot with median labels and color-coded boxes to examine the distribution of durability for each carpet product.

Open the worksheet CARPET.MTW
BOX & WHISKER’S PLOT USING MINITAB
Constructing Box Plot: (One Y–with Groups) (Cont…)

Interpreting the results:
• Median durability is highest for Carpet 4 (19.75). However, this product also demonstrates the greatest variability, with an inter-quartile range of 9.855. In addition, the distribution is negatively skewed, with at least one durability measurement of about 10.
• Carpets 1 and 3 have similar median durabilities (13.52 and 12.895, respectively). Carpet 3 also exhibits the least variability, with an inter-quartile range of only 2.8925.
• Median durability for Carpet 2 is only 8.625. This distribution and that of Carpet 1 are positively skewed, with inter-quartile ranges of about 5-6.
BOX & WHISKER’S PLOT
EXERCISE # 1:
A random sample of 50 observations on the mileage per gallon of a particular brand of gasoline is shown:
33.2 29.1 34.5 32.6 30.7 34.9 30.2 31.8 30.8 33.5
29.4 32.2 33.6 30.4 31.9 32.8 26.8 29.2 31.8 27.4
36.5 38.1 30.0 29.5 36.0 31.5 27.4 30.4 28.4 31.8
29.8 34.6 32.3 28.2 27.5 28.8 28.4 27.7 27.8 30.5
28.5 28.5 27.5 28.6 29.1 26.9 34.2 28.5 34.8 30.5
Develop a Box & Whisker’s Plot to analyze the data.

EXERCISE # 2:
The following data represent the percentage of calories that come from fat for burgers and chicken items from a sample of fast food chains.
BURGER
43 51 48 47 51 50 55 55 59 57
CHICKEN
60 54 53 57 57 46 45 56 57
Construct Box & Whisker’s Plots to analyze the data.
PROBABILITY DISTRIBUTION OF DATA
The data-generating process of the data is known as the Distribution of the Data.
For Example:
• In the manufacturing sector, measurements such as length, diameter, etc. usually follow the NORMAL Distribution
• In the service sector, say banks, the customer waiting time follows the EXPONENTIAL Distribution
• In the service sector, say banks, the number of customers arriving follows the POISSON Distribution
NORMAL DISTRIBUTION
• Characteristics of the normal distribution:
– Continuous distribution: the line does not break
– Symmetrical distribution: each half is a mirror of the other half
– Asymptotic to the horizontal axis: it does not touch the x axis and goes on forever
– Unimodal: the values mound up in only one portion of the graph
– Area under the curve = 1; total of all probabilities = 1
• Normal distribution is characterized by the mean and the Std Dev
• Values of μ and σ produce a normal distribution

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$$

Where:
μ = mean of X
σ = standard deviation of X
π = 3.14159 . . .
e = 2.71828 . . .
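The density formula can be evaluated directly. The sketch below implements it as written and numerically checks two of the listed characteristics (symmetry and unit area); the choice of μ = 100 and σ = 15 and the integration step are arbitrary illustration values.

```python
import math

def normal_pdf(x, mu, sigma):
    """Normal density f(x) for mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

mu, sigma = 100.0, 15.0  # illustrative values

# Symmetry: points equally far from the mean have the same density.
left, right = normal_pdf(mu - 10, mu, sigma), normal_pdf(mu + 10, mu, sigma)

# Area under the curve: simple Riemann sum over mu +/- 8 sigma.
step = 0.01
n_steps = int(16 * sigma / step)
area = sum(normal_pdf(mu - 8 * sigma + i * step, mu, sigma) for i in range(n_steps)) * step

print(left, right)
print(round(area, 6))  # very close to 1
```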
STANDARD NORMAL DISTRIBUTION

• A normal distribution with
– a mean of zero (μ = 0), and
– a standard deviation of one (σ = 1)
• Z Formula
– standardizes any normal distribution
• Z Score
– computed by the Z Formula
– the number of standard deviations which a value is away from the mean

$$Z = \frac{X - \mu}{\sigma}$$
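A short sketch of the Z formula in use. The numbers (a mean of 100 and a standard deviation of 15) are illustrative assumptions; `statistics.NormalDist` from the standard library is used only to show that a standardized value can be looked up against the standard normal distribution.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies away from the mean."""
    return (x - mu) / sigma

mu, sigma = 100.0, 15.0      # illustrative mean and standard deviation
z = z_score(130.0, mu, sigma)

# Probability of a value at or below x, computed two equivalent ways.
p_original = NormalDist(mu, sigma).cdf(130.0)   # on the original scale
p_standard = NormalDist(0, 1).cdf(z)            # on the standard normal scale

print(z)
print(p_original, p_standard)
```

Both probabilities agree, which is the point of standardizing: any normal distribution can be handled with the single standard normal distribution.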
NORMALITY TEST FROM GRAPHIC SUMMARY OF DATA
Open the worksheet CRANKSH.MTW

• If Sk < 0, the distribution is negatively skewed (skewed to the left).
• If Sk = 0, the distribution is symmetric (not skewed).
• If Sk > 0, the distribution is positively skewed (skewed to the right).
• If the ‘P’ value is > alpha, the data is Normal; otherwise it is Not-Normal.
• The value of Skewness shows the data is not normal.
• The P-Value is less than 5% (the value of Alpha, meaning the level of significance), which shows the data is not normal.
NORMALITY TEST (Cont…)
• NORMALITY TEST:
o Generate a normal probability plot and perform a hypothesis test to examine whether or not the observations follow a normal distribution. For the normality test, the hypotheses are:
o Ho: Data follow a normal distribution vs. H1: Data do not follow a normal distribution
o If the ‘P’ value is > alpha, accept (that is, fail to reject) the Null Hypothesis (Ho)
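The decision rule above can be sketched in Python with an Anderson–Darling test written from scratch. This is an illustrative sketch, not Minitab's implementation: the p-value uses a commonly quoted piecewise approximation (usually attributed to D'Agostino and Stephens) whose constants are reproduced here as an assumption, so small differences from software output are to be expected. Both data sets are synthetic: evenly spaced quantiles of a normal and of an exponential distribution.

```python
import math
from statistics import NormalDist, fmean, stdev

def anderson_darling_normal(data):
    """Return (A-squared, approximate p-value) for Ho: data follow a normal distribution."""
    x = sorted(data)
    n = len(x)
    mean, sd = fmean(x), stdev(x)  # mean and std dev estimated from the sample
    std_normal = NormalDist()
    cdf = [std_normal.cdf((v - mean) / sd) for v in x]
    s = sum((2 * i + 1) * (math.log(cdf[i]) + math.log(1 - cdf[n - 1 - i]))
            for i in range(n))
    a2 = -n - s / n
    a2_adj = a2 * (1 + 0.75 / n + 2.25 / n ** 2)  # small-sample adjustment
    # Piecewise p-value approximation (constants are an assumption of this sketch).
    if a2_adj >= 0.6:
        p = math.exp(1.2937 - 5.709 * a2_adj + 0.0186 * a2_adj ** 2)
    elif a2_adj > 0.34:
        p = math.exp(0.9177 - 4.279 * a2_adj - 1.38 * a2_adj ** 2)
    elif a2_adj > 0.2:
        p = 1 - math.exp(-8.318 + 42.796 * a2_adj - 59.938 * a2_adj ** 2)
    else:
        p = 1 - math.exp(-13.436 + 101.14 * a2_adj - 223.73 * a2_adj ** 2)
    return a2, min(p, 1.0)

ALPHA = 0.05  # level of significance

# Synthetic illustration data: 50 evenly spaced quantiles of each distribution.
normal_like = [NormalDist().inv_cdf((i + 0.5) / 50) for i in range(50)]
skewed = [-math.log(1 - (i + 0.5) / 50) for i in range(50)]  # exponential quantiles

for name, sample in (("normal_like", normal_like), ("skewed", skewed)):
    a2, p = anderson_darling_normal(sample)
    verdict = "fail to reject Ho (treat as Normal)" if p > ALPHA else "reject Ho (Not-Normal)"
    print(f"{name}: A2 = {a2:.3f}, p = {p:.4f} -> {verdict}")
```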

NORMALITY TEST:
• In an operating engine, parts of the crankshaft move up and down. AtoBDist is the distance (in mm) from the actual (A) position of a point on the crankshaft to a baseline (B) position. To ensure production quality, a manager took five measurements each working day in a car assembly plant, from September 28 through October 15, and then ten per day from the 18th through the 25th.
• You wish to see if these data follow a normal distribution, so you use the Normality test.
• Open the worksheet CRANKSH.MTW
NORMALITY TEST (Cont…)

INTERPRETING THE RESULTS:
• The graphical output is a plot of normal probabilities versus the data. The data depart from the fitted line most evidently in the extremes, or distribution tails.
• The Anderson–Darling test’s p-value indicates that, at α levels greater than 0.022, there is evidence that the data do not follow a normal distribution.
• There is a slight tendency for these data to be lighter in the tails than a normal distribution because the smallest points are below the line and the largest point is just above the line.
• A distribution with heavy tails would show the opposite pattern at the extremes.
SCATTER PLOT
WHAT IS A SCATTER PLOT?
A scatter plot is a graphical presentation of any possible relationship between two variables, which may or may not be dependent, by means of a simple X-Y plot.
SCATTER PLOT
What is the relationship between the X and Y Plot?

SCATTER PLOT
EXAMPLE: You are interested in how well your
company's camera batteries are meeting customers'
needs. Market research shows that customers become
annoyed if they have to wait longer than 5.25 seconds
between flashes.
You collect a sample of batteries that have been in use
for varying amounts of time and measure the voltage
remaining in each battery immediately after a flash
(VoltsAfter), as well as the length of time required for
the battery to be able to flash again (flash recovery time,
FlashRecov). Create a scatter plot to examine the
results. Include a reference line at the critical flash
recovery time of 5.25 seconds.
Open the worksheet BATTERIES.MTW
SCATTER PLOT

INTERPRETING THE RESULTS:
• As expected, the lower the voltage in a battery after a flash, the longer the flash recovery time tends to be.
• The reference line helps to illustrate that there were many flash recovery times greater than 5.25 seconds.
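The two observations above can also be checked numerically. The values below are hypothetical stand-ins invented for this sketch (they are not the contents of BATTERIES.MTW); the variable names follow the worksheet columns VoltsAfter and FlashRecov. The sample covariance is used only to show the direction of the relationship, and the 5.25-second reference line is applied as a simple threshold.

```python
REFERENCE_LINE = 5.25  # critical flash recovery time in seconds

# Hypothetical (VoltsAfter, FlashRecov) pairs, invented for illustration.
volts_after = [1.45, 1.40, 1.32, 1.25, 1.18, 1.10, 1.02, 0.95]
flash_recov = [3.9, 4.2, 4.6, 5.0, 5.4, 5.9, 6.5, 7.1]

def sample_covariance(xs, ys):
    """Sample covariance; a negative value means y tends to fall as x rises."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)

direction = sample_covariance(volts_after, flash_recov)

# Points that would plot above the reference line on the scatter plot.
too_slow = [(v, t) for v, t in zip(volts_after, flash_recov) if t > REFERENCE_LINE]

print(f"covariance = {direction:.4f}")
print(f"{len(too_slow)} of {len(flash_recov)} recovery times exceed {REFERENCE_LINE} s")
```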
QUESTIONS
