
Data and Basic Statistics

Introduction To Data

1
Six Sigma Breakthrough Steps

Define
  Step 1 - Select Output Characteristic
  - Identify Process Input/Output Variables
Measure
  Step 2 - Define Performance Standards
  Step 3 - Validate Measurement System
  Step 4 - Establish Process Capability
Analyze
  Step 5 - Define Performance Objectives
  Step 6 - Identify Variation Sources
  Step 7 - Screen Potential Causes
Improve
  Step 8 - Discover Variable Relationships
  Step 9 - Establish Operating Tolerances
  Step 10 - Validate Measurement System
Control
  Step 11 - Determine Process Capability
  Step 12 - Implement Process Controls

2
Basic Data: Questions to Answer

• What is data?
• What are the different types of data?
• Why is continuous data better?
• What is a data collection plan?
• What is a rational subgroup?

3
Data

An individual fact or a
collection of facts about
something is called DATA

Information about Something

4
Types of Data

• Attribute (Discrete) Data (Qualitative)
  − Categories
  − Yes, No
  − Go, No go
  − Pass / Fail
  − Good / Defective
  − Computer equipment failures, number of defects
  − Attribute data is not capable of being meaningfully subdivided into more precise increments
• Variable (Continuous) Data (Quantitative)
  − Continuous data
    > Decimal places show absolute distance between numbers
    > Time, finance charges, length, width

5
Discrete Vs. Continuous Data

DISCRETE DATA EXAMPLES
[Images: a shipping order with quantities and totals, a pass/fail inspection tag, a go/no-go gage, an electrical circuit test]

CONTINUOUS DATA EXAMPLES
[Images: temperature read from a thermometer, a caliper measurement, elapsed time]

6
Discrete Vs. Continuous Data

• To obtain the same level of understanding regarding a process:
  − Discrete data provides sparse information
  − Continuous data is rich with information

7
Categories of Scales

Categories of scale / Description / Examples

• Discrete - Nominal: unrelated categories which represent membership or non-membership.
  Description: grouping / sorting; yes / no, pass / fail
  Examples: categories, labels

• Discrete - Ordinal: ordered categories with no information about distance between categories.
  Description: ranking; scale of 1 to 10
  Examples: 1st, 2nd, 3rd; relative height; alphabetic order; 1 < 2 < 3 < 4

• Variable - Interval: ordered categories of equal distance, but no absolute zero point.
  Description: most common scale; continuous data
  Examples: dial indicator transfer gage; air speed indicator

• Variable - Ratio: ordered categories of equal distance, but with an absolute zero point.
  Description: continuous data
  Examples: calipers; ruler

First take away: the difference between Discrete/Attribute and Variable/Continuous.
Second take away: the difference between Discrete/Nominal and Discrete/Ordinal. Discrete/Ordinal can sometimes use some continuous data tools.

8
What about this?

• Given: I make 3,000 to 4,000 parts a shift, 3 shifts a day, and I use a Pass/Fail visual inspection.
  – Question: Am I stuck using discrete data?
• Maybe not: I could calculate a % defective per shift and, using that as my measurement, get 15 "continuous" readings per week (see the sketch after this slide).
  – If you have such data, talk to your Black Belt about the limitations/cautions of using such data.

9
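As a rough illustration of the idea above, here is a minimal Python sketch that turns per-shift pass/fail counts into a % defective reading. The shift labels and counts are made-up placeholders, not data from the course.

```python
# Minimal sketch: converting pass/fail counts into a per-shift % defective reading.
# The shift totals below are hypothetical placeholders.
shift_counts = [
    ("Mon-1", 3200, 41),   # (shift label, parts inspected, parts failed)
    ("Mon-2", 3550, 37),
    ("Mon-3", 4010, 52),
]

for label, inspected, failed in shift_counts:
    pct_defective = 100.0 * failed / inspected
    print(f"{label}: {pct_defective:.2f}% defective")
```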
Or This?

• I have production which I can sort at the end


of the line into 10 categories, with 1 being the
best and 10 being scrap.
– Question: Can I use continuous tools?
– Yes, but with caution and be aware that you only
have 10 divisions
• Guidelines:
– Have at least 10 divisions.
– Have the “distance” between each
division/category be as consistent as possible.

10
Or this?

• I am measuring if we are shipping on time.


– I could use a discrete measurement for each lot:
Was it on time or not?
– BUT I can, in many cases, use variable data:
measure days or hours early/late.

11
Data & Statistics

Important:
• DATA, by itself, DOES NOT provide
information.
• You have to MANIPULATE the data for it
to give information.
• We use STATISTICS to manipulate data.

12
Statistical Techniques

• There are statistical techniques to cover


all combinations of data types.
Inputs and outputs can each be Discrete (Attribute) or Continuous (Variable):

• Discrete inputs, discrete outputs: Chi-square, Proportions Test
• Discrete inputs, continuous outputs: Analysis of Variance, T-Tests
• Continuous inputs, discrete outputs: Discriminant Analysis, Logistic Regression
• Continuous inputs, continuous outputs: Correlation, Simple Regression

These are some of the statistical techniques used to drive process improvement; they are the ones you will be exposed to.
13
Where Do We Get Our Data?

• To do real projects, we need real data.


• Real data can be messy, ugly, and hard to
find.
• One primary tool is to think out ahead of time
what you think you need, and create a clean,
easily understood plan and perhaps a form to
collect your data.

14
Data Collection Plan Questions

• What do you want to know about the process?


• How do you want to see what it is that you want to
know?
• What type of tool will generate what it is that you
need to see?
• What type of data are required of the selected tool?
• What are the likely causes of variation in the
process? (X’s)
• Are there cycles in the process?
• Who will be collecting the data?

15
Data Collection Plan Questions (continued)

• How long do you need to collect data to capture a


true picture?
• How will you test your measurement system?
• Are the operational definitions detailed enough?
• How will you display the data?
• Is data available? If not, what will you prepare for
data collection sheets?
• Where could data collections occur?
• What are your collection plans?

16
Data Collection Plan Model

[Model: Plan Activities → Answer Critical Questions → Execute Data Collection Plan]

17
Rational Subgroups

A rational subgroup is a group of data, usually taken


together at one time, that includes ONLY short term
variation

• Rational Subgroups represent Short term data. Do not collect


a subgroup over a significant process “event” such as operator
change, tool change, material coil change, start up cycles,
before and after lunch, etc.

• If you take data over these conditions, it may contain and hide
special or assignable causes that should be attributed directly
to that special cause.

• Rational subgroups need to be taken on standard production


only, not special cases such as production trials.
18
Basic Data Questions to Answer

• What is data?
• What are the different types of data?
• Why is continuous data better?
• What is a data collection plan?
• What is a rational subgroup?

19
Basic Data Questions Summary

• Data is an individual fact or a collection of facts


about a topic of interest.
• The two types of data are Attribute and Variable.
• Attribute data is primarily qualitative. Variable
data is quantitative.
• Variable continuous data is better because it is
rich with information. Discrete data gives much
less information.

20
Basic Data Questions Summary

• A data collection plan is an outline for execution


of gathering necessary data regarding your
project.
• A rational subgroup is a group of data, usually
taken together at one time, that includes ONLY
short term variation.

21
Basic Data Lessons Learned

• A well thought out and executed data collection


plan ensures the correct data has been recorded
and all causes of variation have been observed.
• Statistical significance is validated when reliable
data is used as the source for analysis.
• The data collection plan must be clearly
understood by all data collectors.
• A trial or pilot data collection exercise is a must.

22
Basic Data: Deliverables

• A well thought out Data Collection Plan for


gathering data regarding your project problem.
• Entire team has the same understanding as to
why, where, when, who and how to collect the
data.
• Collection of reliable data allowing forward
progress of the project.
• Source data which can be manipulated to provide
the information needed about the process.

23
Data Collection Exercise

• You know what your project is going to improve.


• If I ask you to measure how good you are now, can you answer the following questions?
– Is it variable data or discrete?
– If variable, is it or can it be collected by a subgroup?
– If Discrete, can you tell me how many bad OUT OF
HOW MANY?
– Is data presently collected?
– Is data on a paper form, electronic, or other?
– Is it easy to get the data?
– Do you know the units of measure?
24
Data and Basic Statistics
Basic Statistics

25
Six Sigma Breakthrough Steps

Define
  Step 1 - Select Output Characteristic
  - Identify Process Input/Output Variables
Measure
  Step 2 - Define Performance Standards
  Step 3 - Validate Measurement System
  Step 4 - Establish Process Capability
Analyze
  Step 5 - Define Performance Objectives
  Step 6 - Identify Variation Sources
  Step 7 - Screen Potential Causes
Improve
  Step 8 - Discover Variable Relationships
  Step 9 - Establish Operating Tolerances
  Step 10 - Validate Measurement System
Control
  Step 11 - Determine Process Capability
  Step 12 - Implement Process Controls

26
Basic Data: Questions to Answer

• What are statistics?


• What are the measures of central tendency?
• What are the measures used for variation?
• Why do we concern ourselves with stability?
• What is a distribution, a “normal” distribution?
• Why is the “area under the curve” important?
• How does Z-bench differ from Cpk?
• What links a sample to the population?
27
Statistics

• Statistics is the organization, analysis, and


interpretation of data.
– Yards per carry
– Earned run average
– Miles per gallon
• Statistics is how we make sense out of
hundreds or thousands of bits of individual
data.

28
Statistics – Benefits of Plotting the Data

STATISTICS ARE TOOLS. Like any other tool


they can be misused, resulting in misleading,
distorted, or incorrect conclusions. It is not
sufficient to be able to do the computations. One
must also be able to make the correct
interpretations. An important analysis tool for
statistical support is to ALWAYS PLOT THE DATA.

29
Variability, Centering, and Stability

• Variability
– How much does a process vary? We all know
that every process has some movement, not
every piece will come out “exactly” the same.

– The most common measure of variation is the


Standard Deviation (σ). We will key on this.

– Other measures of variation are:


• Range (Maximum-minimum values)
• Variance
• Sum of the squares

30
Variability Measures - Formulas

• Range: the numerical distance between the highest and the lowest values in a data set.
  Range = max − min

• Variance (σ², s²): the average squared deviation of each individual data point from the mean.
  s² = Σ (Xᵢ − X̄)² / (n − 1), with the sum taken over i = 1 to n

• Standard Deviation (σ, s): the square root of the variance. The most commonly used measurement to quantify variability.
  s = √[ Σ (Xᵢ − X̄)² / (n − 1) ]

Computers do all the hard work
31
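For reference, a minimal Python sketch of the three variability measures defined above, using the standard library's statistics module; the sample values happen to be Data Set 1 from the central tendency exercise a few slides later.

```python
import statistics

# Sketch of the three variability measures: range, variance, standard deviation.
data = [5, 6, 4, 5, 5, 7, 4, 7, 6, 3, 3]

data_range = max(data) - min(data)      # Range = max - min
variance = statistics.variance(data)    # s^2, divides by (n - 1)
std_dev = statistics.stdev(data)        # s, square root of the variance

print(data_range, round(variance, 3), round(std_dev, 3))
```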
Variability Exercise

• You will be broken up into 3 teams (3 shifts) and


each given your production by day for last week.

• The instructor will ask each team to calculate its


variation using the range as the measuring method.

• The instructor will then calculate the overall variation


of the class, again using the range. Why is the range
for all 3 shifts greater than any one shift?

• The instructor will then provide the standard deviation


for the same data.
32
Components of Variation

• Common Cause:
– This is the normal “bouncing around” of any process.
– This is what we saw within each of the 3 teams/shifts.
– To reduce this type of variation, we usually need to
modify the process or technology.
• Special or Assignable Cause:
– This is the variation due to an “assignable” input such
as each shift using different targets, change of
material vendors, using tools past their replacement
point.
– This is what we saw between the 3 teams/shifts.
– To reduce this type of variation, we usually need to
develop and enforce better controls for our process.
33
Variability, Centering, and Stability

• Centering – Measures of Central Tendency
  – Where is the process located? Where is its "average"?
  – The most common measure of central tendency is the mean (µ, pronounced "mu"), frequently called X-bar.
    > This is the traditional arithmetic average: add up all the values and divide by n.
    > X̄ = ( Σ xᵢ ) / n, with the sum taken over i = 1 to n
  – Other measures of central tendency are:
    > Median: reflects the 50% rank, the center number after a set of numbers has been sorted. Does not necessarily include all values in the calculation and is "robust" to extreme scores.
    > Mode: the most frequently occurring value in a data set.

34
Measures of Central Tendency - Exercise
• Calculate the Mean, Median and Mode for the each data set
shown below. Use the space provided in the chart for your
answers.
Index Data Set 1 Data Set 2 Data Set 3
A 5 3 9
B 6 6 1
C 4 3 1
D 5 4 8
E 5 3 1
F 7 4 6
G 4 16 10
H 7 4 1
I 6 5 7
J 3 3 1
K 3 4 10

Statistic Data Set 1 Data Set 2 Data Set 3


Mean
Median
Mode

35
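A possible way to check your exercise answers, assuming Python is available: the statistics module computes all three measures directly. multimode is used because a data set can have more than one mode (Data Set 2 has a tie between 3 and 4).

```python
import statistics

# Sketch: mean, median, and mode(s) for the three exercise data sets.
data_sets = {
    "Data Set 1": [5, 6, 4, 5, 5, 7, 4, 7, 6, 3, 3],
    "Data Set 2": [3, 6, 3, 4, 3, 4, 16, 4, 5, 3, 4],
    "Data Set 3": [9, 1, 1, 8, 1, 6, 10, 1, 7, 1, 10],
}

for name, values in data_sets.items():
    mean = statistics.mean(values)
    median = statistics.median(values)
    modes = statistics.multimode(values)   # all most-frequent values
    print(f"{name}: mean={mean:.2f}, median={median}, mode(s)={modes}")
```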
Variability, Centering, and Stability

• Stability (process must be stable before it


can be improved)
– How does the process perform over time?
– Stability is represented by a constant mean
and predictable variability over time.
If a process is stable, its variation will be centered around
a stable and constant average. The process might or might not
be good, BUT you can predict what it will make and you can
measure improvements you make to it.
NO process can be measured (baselined) until it has stabilized.
How can you define the capability of a process that can be
different from one day to the next?
36
Stability Questions
[Two run charts over roughly 25 observations: "Run Chart for meters" (values between about 23 and 26 meters) and "Run Chart for Feet" (values between about 25 and 45 feet)]

What are the approximate means of these two processes?


If this data represents this week’s data, what are the means of
the two processes likely to be next week?

37
No Stability = “all bets are off”

• If you were betting on


the dice and all of a
sudden, the pair of dice
rolled a “17”, what
would you do?
• Could you comfortably
predict that the next
rolls would be between
2 and 12 as you had
previously believed?
• Once you have lost
stability, all predictive
ability is gone.
Something fishy is
going on.
38
Statistics – General Exercise

• Machines A, B, and C make identical products (range charts in control),


the target value for each product output variable is 100 mm.
1. Which machines exhibit(s) variation?
2. Where is each machine centered?
3. Which machines are predictable over time?
4. Which machines have special cause variation?
5. Which machine would you want making your product for today’s
order?
6. Which machine would probably be easiest to fix?
[Three control charts: "X-bar Chart for Machine A", "X-bar Chart for Machine B", and "X-bar Chart for Machine C", each plotting sample means over about 20 samples; centerlines shown at X̄ = 100.7, 115.0, and 101.0]

39
Statistics - Mini Road Map

• Follow these steps when you are using data to improve


a process. Always do step 1 (stability) first, Step 2
(Variation) is usually done before step 3 (centering).
• 1) Determine if process is stable. If not, identify and
remove causes (X’s) of instability (obvious non-random
variation).

• 2) Estimate the magnitude of the total variability. Is it


acceptable regarding specification limits? If not, identify the
sources of variability and eliminate or reduce their influence
on the process.
• 3) Determine the location of the process mean. If not on
target, identify (X’s) which affect the mean and determine
optimal settings to achieve target value.

40
Variation Is the Enemy

• You are in class this morning and the


temperature in the class room is 50 deg F.
Are you comfortable?
• OK, so I turn up the temperature to 90 deg F.
Are you comfortable now?
• What's your problem? You had an average temperature of 70 deg F for the day.

Customers feel the variation


More than the Mean

41
Shaft and Bushing Example

If you make to the engineer's targets of 1.000" and 1.002", you have .001" clearance per side (.002" total).

Specifications:
  Bushing 1.002" -.000"/+.005"
  Shaft 1.000" -.005"/+.000"

[Diagram: rotating shaft inside a bushing, with the clearance labeled]
  IDbush - ODshaft = Clearance
  1.002" - 1.000" = .002" (total)

Exactly to Nominal = OK

42
Example Continued
• If you use the entire tolerance, you could have a shaft
of .995”, and a bushing of 1.007” for a clearance per
side of .006” (.012” total).

• Take Away: Variation is the enemy; the increased slop causes rattle and premature wear (but it is within tolerance and therefore good?).

[Diagram: rotating shaft inside a bushing, with the enlarged clearance (slop) labeled]
  IDbush - ODshaft = Clearance
  1.007" - .995" = .012" (total)

Within Tolerance, But 6X The Desired Clearance

43
Distributions

• We can describe the behavior of any


process or system by plotting multiple data
points for the same variable over time, across
products, on different machines, etc.
• The accumulation of these data can be
viewed as a Distribution of Values and
represented by dot plots, histograms, or
“smoothed” distributions.

44
Dotplot & Histogram

[Dotplot and histogram of the same GPM data, values spanning roughly 49.0 to 51.0 GPM]

45
Smoothed (Normal) Distribution

• A smoothed distribution or “normal distribution”


assumption plot provides an approximation of how the data
might look if an infinite number of data points were collected.


46
Normal Distribution

• Most processes in the world result in a Normal


Distribution.
– Most of the values are near the center of the process.
– As you get further away from the mean or center, you
get less and less points from that process.
• To COMPLETELY define a Normal Distribution,
you only need two pieces of information.
– You need to know where the center of the distribution is
located. We will use the mean to do this.
– You need to know how wide the distribution is. The
width is the variation, how far can points vary from the
center. We will use the standard deviation for this.

47
The Normal Distribution - Properties

Let's look at two properties of the Normal Distribution:

(1): We have already stated that a normal distribution can be described completely by knowing only the mean and standard deviation.
(2): The areas under sections of the smoothed curve can be used to estimate the cumulative probability of a certain "event" occurring.

We will concentrate on understanding the first item. We will come back to the second item later.

48
The Normal Distribution – Property 1

• It is obvious that if we know the mean or center of


a process, we can locate the center of our Normal
Distribution.
• But how does knowing the standard deviation let
us know how to finish the drawing of the normal or
bell curve?
+/- 1 standard deviation covers 68.26% of all events
+/- 2 standard deviation covers 95.44% of all events
+/- 3 standard deviation covers 99.73% of all events
Also the point of inflection is at 1 standard deviation

49
Normal Curve Probabilities
[Standard normal curve; x-axis: number of standard deviations from the mean (-4 to +4); y-axis: probability of sample value. Shaded bands show 68% of values within ±1σ, 95% within ±2σ, and 99.73% within ±3σ; the point of inflection is at ±1σ.]

NOTE: even though it looks like the curve ends at +/- 3 sigma, it actually continues.

50
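A small sketch, assuming SciPy is available, that reproduces the coverage figures quoted above from the standard normal cumulative distribution function.

```python
from scipy.stats import norm

# Sketch: verifying the +/- 1, 2, 3 sigma coverage figures from the slide.
for k in (1, 2, 3):
    coverage = norm.cdf(k) - norm.cdf(-k)   # area between -k and +k std deviations
    print(f"+/-{k} sigma: {coverage * 100:.2f}%")
# Prints approximately 68.27%, 95.45%, 99.73%
```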
Normal Curve - Exercise 1

What is the mean?


What is the standard deviation?
[Normal curve; x-axis in inches: 2, 4, 6, 8, 10, 12, 14, 16, 18; a second axis marks standard deviations from the mean, -3σ to +3σ]

51
Normal Curve - Exercise 2

What is the mean?


What is the standard deviation?

[Normal curve; x-axis: -4, -3, -2, -1, 0, 1, 2, 3, 4]

52
Normal Curve - Exercise 3

What is the mean?


What is the standard deviation?

[Normal curve; x-axis: -8, -5, -2, 1, 4, 7, 10, 13, 16]

53
Normal Curve - Exercise 4

What is the mean?


What is the standard deviation?

[Normal curve; x-axis: 4, 5.5, 7, 8.5, 10, 11.5, 13, 14.5, 16]

54
Normal Curve - Exercise 5

Given the mean of 20 and a standard deviation of 5


Fill in the blanks.

[Blank normal curve with unlabeled axes, to be filled in]

55
How do I know if my data is Normal?

• We can test whether a given data set can be described as "normal" with a test called a Normal Probability Plot. If a distribution is close to normal, the normal probability plot will be a straight line.
• Study the Normal Probability Plot and the Histogram below.
  – Does the line represent a "normal" distribution of data?
  – Does the histogram look like a normal or bell curve?
[Histogram and normal probability plot of 500 data points (C1): average 70, std dev 10; Anderson-Darling normality test A-squared = 0.418, p-value = 0.328]
56
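A hedged sketch of how such a plot and test could be produced in Python, assuming NumPy, SciPy, and matplotlib are available; the data here are simulated with the same mean (70) and standard deviation (10) shown on the slide, not the course data set.

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# Sketch: normal probability plot plus an Anderson-Darling normality check
# on simulated data (mean 70, std dev 10, n = 500).
rng = np.random.default_rng(seed=1)
data = rng.normal(loc=70, scale=10, size=500)

stats.probplot(data, dist="norm", plot=plt)  # points near a straight line suggest normality
result = stats.anderson(data, dist="norm")
print("A-squared:", round(result.statistic, 3))
plt.show()
```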
Normal Probability Plots (continued)

• What are your conclusion(s) for the Histograms and


Normal Probability Plots shown below?
[Three data sets with histograms and normal probability plots:
  - Positive Skewed Distribution (C2): average 70, std dev 10, Anderson-Darling A-squared = 46.447, p-value = 0.000
  - Negative Skewed Distribution (C3): average 70, std dev 10, Anderson-Darling A-squared = 43.953, p-value = 0.000
  - Mystery Distribution: average 100, std dev 32.3849, Anderson-Darling A-squared = 27.108, p-value = 0.000]

57
Stop

58
Z Transformation
Z transformation takes a normal distribution and translates it to a normalized distribution with a mean of zero and a standard deviation of 1.0.

Z = (X − µ) / σ

Let's assume a process with Mu = 10 and Std Dev = 2, and an upper tolerance (USL) of 13.

Question 1: if my tolerance is 13, how many inches away am I from my mean? (X scale: units are inches)
Question 2: if my std deviation is 2, how many std deviations is my tolerance from my mean? (Z scale: units are standard deviations)

Z = (13 − 10) / 2 = 1.5
59
Z Transformation - Exercise
Z = (X − µ) / σ. First question: what is the mean and standard deviation? (X scale: units are inches, 4 to 16; Z scale: units are standard deviations, -3 to +3)

Fill in the missing values:

X      Z
?      1
10     ?
6      ?
?      -3
?      1.5
?      -2.25
13     ?
15.5   ?
?      4
?      -4
60
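One way to work the exercise, assuming the same process as the previous slide (mean 10, standard deviation 2); this is a small illustrative sketch, not part of the course material.

```python
# Sketch: converting between X (inches) and Z (standard deviations)
# for a process with mean 10 and standard deviation 2.
mu, sigma = 10, 2

def x_to_z(x):
    return (x - mu) / sigma

def z_to_x(z):
    return mu + z * sigma

print(x_to_z(10), x_to_z(6), x_to_z(13), x_to_z(15.5))    # 0.0 -2.0 1.5 2.75
print(z_to_x(1), z_to_x(-3), z_to_x(1.5), z_to_x(-2.25))  # 12 4 13.0 5.5
```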
The Normal Distribution – Property 2

Let's look at two properties of the Normal Distribution:

(1): We have already stated that a normal distribution can be described completely by knowing only the mean and standard deviation.
(2): The areas under sections of the smoothed curve can be used to estimate the cumulative probability of a certain "event" occurring.

We will now discuss the second item.

61
Probability

The relationships between samples and


populations most often are described in terms of
probability. Probability is the link that lets one
predict population behavior based on a sample.
For an independent variable, the probability is expressed as
a real number between (0) and (1) that defines the likelihood
of a particular outcome compared to all possible outcomes.
For (6) sided dice: P (roll=6) = 1/6 = 0.1666
For a coin: P (flip=head) = 1/2 = 0.500
The sum of the probabilities for all remaining cases is equal to 1 minus the probability of that particular outcome.
62
Normal Curve Probabilities

Example: 68% of the points will be between plus and minus one standard deviation.

[Standard normal curve; x-axis: number of standard deviations from the mean (-4 to +4); y-axis: probability of sample value; shaded bands show 68%, 95%, and 99.73% coverage]

Important concept: The area under the standard normal curve is 1.000
63
Probability - Exercise

Given what we know from the previous slide, answer the following. For each X value, find Z, the % of area to the right of X, and the % of area to the left of X:

X: 10, 12, 14, 16, 8, 6, 4

(X scale: units are inches, 4 to 16; Z scale: units are standard deviations, -3 to +3; same process as before, mean 10 and std dev 2)
64
Probability – Exercise method

For X = 12, Z = +1. The area under the curve to the right of X is 16%, and the area under the curve to the left is 16% + 68% = 84%.

ALTERNATE: from the mean to +1 sigma is half of the 68% that lies between ±1 sigma. Add this 34% to the 50% that lies to the left of the mean to get 84%.

(X scale: units are inches; Z scale: units are standard deviations; the 16% tails lie beyond ±1 sigma)
65
Z Table

• Instructor will pass out and demonstrate use


of a Z- table.

• This is how you can find the area under the


curve for any point.

• Excel and Minitab also do this very quickly.

66
Z-Table Exercises

• Find the area under the curve to the right and


to the left of each of the following Z values:
1.1, 2.4, 3.2, 0.45, -2.2, -1.75

• Given a process with a mean of 20 and a


standard deviation of 4, find the area under
the curve to the right and to the left of each of
the following X values:
22, 26, 20, 18, 14

67
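If software is preferred over a paper Z-table, a sketch like the following (assuming SciPy is available) performs the same lookups as the exercise.

```python
from scipy.stats import norm

# Sketch: Z-table lookups done in software instead of a paper table.
for z in (1.1, 2.4, 3.2, 0.45, -2.2, -1.75):
    left = norm.cdf(z)                      # area to the left of z
    right = norm.sf(z)                      # area to the right of z
    print(f"Z={z:>5}: left={left:.4f}, right={right:.4f}")

# Second part: a process with mean 20 and standard deviation 4.
for x in (22, 26, 20, 18, 14):
    z = (x - 20) / 4
    print(f"X={x}: Z={z:+.2f}, left={norm.cdf(z):.4f}, right={norm.sf(z):.4f}")
```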
Z Transformation - Use

The Standardized Z Transformation: Z = (X − µ) / σ

Suppose the diameters of shafts are normally distributed with a mean of 45 and a standard deviation of 1. The customer-derived upper specification limit is 47.5. What is the DPMO for this process?

Z = (47.5 − 45) / 1 = 2.5

From a Z-table, the probability that a shaft is less than 47.5 is 99.37%, and the probability of a defect (to the right of the USL) is 1 − .9937 = .0063, or 0.63%.

DPMO = .0063 x 1,000,000 = 6,300

Knowing the distribution and the specification limits allows the prediction of capability!
68
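A minimal sketch, assuming SciPy, that reproduces this shaft-diameter calculation. Note the exact tail area is about 0.0062 (roughly 6,200 DPMO); the slide's rounded table lookup reports 0.0063 (6,300 DPMO).

```python
from scipy.stats import norm

# Sketch: DPMO for the shaft-diameter example (mean 45, std dev 1, USL 47.5).
mu, sigma, usl = 45, 1, 47.5

z = (usl - mu) / sigma              # 2.5
p_defect = norm.sf(z)               # area to the right of the USL, ~0.0062
dpmo = p_defect * 1_000_000

print(f"Z = {z}, P(defect) = {p_defect:.4f}, DPMO = {dpmo:.0f}")
```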
Z Transformation DPMO Calculation

Let's assume a process with µ = 10 and Std Dev = 2, and a USL of 13.

Z = (13 − 10) / 2 = 1.5

Question: if my tolerance is 13, what % of my production is defective (the red area under the curve beyond the USL)?

Answer: use a Z table or Minitab for Z = 1.5; the probability of a defect is 6.68%.

69
Z Transformation
DPMO Calculation For a Lower Spec
Same process: µ = 10 and Std Dev = 2, with an LSL of 8.

Z = (8 − 10) / 2 = −1

Question: if my LSL is 8, what % of my production is defective (the green area under the curve below the LSL)?

Answer: use a Z table or Minitab for Z = −1; the probability of a defect is 15.87%.

70
Z Transformation
DPMO Calculation Z bench
Same process (µ = 10, Std Dev = 2) with an LSL of 8 and a USL of 13.

Question: if my USL is 13 and my LSL is 8, what % of my production is defective (the red and green areas under the curve)?

Answer: use a Z table or Minitab for Z = 1.5 and Z = −1 and add the probabilities of defects on both sides: the probability of a defect below the LSL is 15.87%, and the probability of a defect past the USL is 6.68%.

71
Z Transformation
Z Bench Calculation for Combined Defects
Question: P.USL = 6.68%, P.LSL = 15.87%, P.Total = 22.55%. If I threw all my defects on one side, how many standard deviations would fit between the mean and the line where the defects start?

Answer: use a Z table or Minitab for p = .2255; find Z = 0.75. The total probability of a defect, 22.55%, sits to the right of 11.5 on the X scale (0.75 on the Z scale).
72
Z Transformation
Z Bench Calculation for Combined Defects
The total probability of a defect is 22.55% (the area under the curve to the right of 11.5).

Z-bench is 0.75: you can fit +0.75 standard deviations between the mean and the point of interest where the defects start. From a Z table or Minitab, the area to the right of Z = 0.75 is 22.55%.
73
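A short sketch, assuming SciPy, of the Z-bench calculation built up over the last few slides (mean 10, std dev 2, LSL 8, USL 13).

```python
from scipy.stats import norm

# Sketch: combining defects from both spec limits into a single Z-bench value.
mu, sigma, lsl, usl = 10, 2, 8, 13

p_low = norm.cdf((lsl - mu) / sigma)    # defects below the LSL, ~15.87%
p_high = norm.sf((usl - mu) / sigma)    # defects above the USL, ~6.68%
p_total = p_low + p_high                # ~22.55%

z_bench = norm.isf(p_total)             # Z that puts all defects on one side, ~0.75
print(f"P(total defect) = {p_total:.4f}, Z-bench = {z_bench:.2f}")
```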
Z Bench versus Cpk and Ppk

[Curve with the LSL, the mean of 10, and the USL of 13 marked]

Cpk and Ppk take into account only those defects associated with the closest spec limit. Z bench takes into account all of the defects.

74
Population Vs Sample

• Population: An entire group of objects that have been, or will


be, made containing a characteristic of interest.
• Sample: The group of objects actually measured in a
statistical study which is usually a subset of the population of
interest.
"Population Parameters": µ = mean, σ = standard deviation
"Sample Statistics": X-bar = mean, s = standard deviation

[Illustration: a population of a million marbles, from which a sample of 100 marbles is drawn]

Sample Statistics Approximate Population Parameters


75
Population Vs Sample

• Population:
– Are all the parts that are.
– Are difficult and expensive to measure because of volume
• Sample:
– Is a small subset of the population
– Is selected randomly to best represent population
– After a change to a process, a new sample can be easily taken and
used to determine if an improvement has truly been made
• Note:
  – In actual usage it is common for the population parameters (σ = population standard deviation and µ = population mean) to be substituted for the sample statistics (s = standard deviation of a sample and X-bar = sample mean).

Samples are Windows to Seeing the Population


76
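A small sketch, assuming NumPy is available, of the marbles illustration above: a large simulated "population" and a random sample of 100 drawn from it, with ddof chosen so the sample uses the (n − 1) formula for s.

```python
import numpy as np

# Sketch: sample statistics approximate population parameters.
rng = np.random.default_rng(seed=7)
population = rng.normal(loc=50, scale=5, size=1_000_000)   # the "million marbles"
sample = rng.choice(population, size=100, replace=False)   # the 100-marble sample

print("population: mu =", round(population.mean(), 3),
      " sigma =", round(population.std(ddof=0), 3))         # population formula (divide by n)
print("sample: x-bar =", round(sample.mean(), 3),
      " s =", round(sample.std(ddof=1), 3))                  # sample formula (divide by n-1)
```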
Exercise

• Procedure:
– Set up the catapult and keep all conditions except
ammo type fixed for this exercise.
– Pull back angle must be selected and fixed at
approximately half way back.
– Select and use a ping pong ball and a die or different
ball for the two types of ammo.
– Launch five test fires with each type to estimate range.
– Two inspectors silently record distance values for 20
launches of each ammo type using same operator.
– Perform appropriate analysis as listed on following
pages.
– Input measured distance by averaging the prerecorded
data from the two different inspectors (data sheet on
next page).
77
Exercise - Data Sheet
Date: ____    Line/team: ____
Operator: ____    Angle: ____
Inspector: ____
Columns: Test Fire # | Ping Pong | Die | Notes (special cause)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
78
Exercise - Deliverables

1. Using the computer, the instructor will calculate


the mean, median, range and standard deviation
for each ammo. Then repeat for one ball
evaluating the differences between the two
inspectors.

2. Using data for just one ball use computer to


generate a histogram of the total data distribution
(Graphical summary)

3. Any conclusions with respect to ball type?


Inspector?

79
Basic Data Lessons Learned

• Understand the mean and standard deviation of a data distribution.
• The mean and the standard deviation are most
commonly used in process improvement efforts
because:
• The mean reflects the influence of all data values
• The standard deviation best quantifies the process
variability.
• Understand the normal distribution and how the area
under the curve can be interpreted as % defective.

80
Basic Data Lessons Learned

• Understand the typical sequence of process


improvement based on obtained data values:
• stabilizing the process is first;
• elimination of non-normal variation is second;
• reduction of normal variation is third;
• centering the process is fourth.

81
Basic Data Questions to Answer

• What are statistics?


• What are the measures of central tendency?
• What are the measures used for variation?
• Why do we concern ourselves with stability?
• What is a distribution, a “normal” distribution?
• Why is the “area under the curve” important?
• How does Z-bench differ from Cpk?
• What links a sample to the population?
82
Basic Data Answers Summary

• Statistics is the organization, analysis, and


interpretation of data.
• Measures of central tendency of data include the
mean, median and mode.
• Measures of variation of data include the range,
variance and standard deviation.
• If a process is not stable, no information can be
obtained from it. We can not baseline it as we
can calculate neither an accurate mean nor an
accurate standard deviation.
83
Basic Data Answers Summary (continued)

• A distribution is an accumulation of data, usually


viewed as a plot, which shows the process variability
and stability over time. A normal distribution is a data
plot with a constant mean and predictable variability
over time. It is a bell shaped curve.
• Given a normal distribution and spec limits, the area
under the curve is used to predict the % defective. It
is also used to provide metrics such as Z-bench or
Cpk.
• Z-bench takes into account defects on both sides
(upper and lower spec violations), while Cpk and Ppk
look at only the worst of the two sides.
84
Basic Data Answers Summary (continued)

• The probability or likelihood of an outcome occurring


is the link that allows one to predict population
behavior based on sample data. In other terms, the
sample mean and standard deviation are used to
predict the population mean and standard deviation.

85
