08 GB DMAIC Basic Statistics Part 3
BASIC STATISTICS
Part 3
Define
Measure
Analyze
Improve
Control
GB DMAIC - Basic Statistics Part 3 Version 2.0 5/2002 7.1
g GE Global Research
Learning Objectives
Statistics Fundamentals
Sample :
A small fraction (subset) of the population which the experimenter
chooses for study in order to make some statement about the
population.
Inference :
The conclusion drawn about the population based on the study of the sample.
            Population    Sample
Mean        $\mu$         $\bar{x}$
Variance    $\sigma^2$    $s^2$
Called a    Parameter     Statistic
Types of Estimators
Point Estimator :
• An estimate of a population parameter expressed as a single number
Interval Estimator :
• An estimate of a population parameter expressed as a range of values
[Figure: sample averages and their grand average]
The central limit theorem states that, for large values of n, the distribution of the sample mean will have approximately a normal distribution, even though the individual data points may be non-normal.
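Any distribution of x (population mean and variance): $\mu_x$, $\sigma_x^2$
Normal distribution of $\bar{x}$ (mean and variance of sample means): $\mu_{\bar{x}} = \mu_x$, $\sigma_{\bar{x}}^2 = \sigma_x^2 / n$
For n > 30, the sample mean tends to be normally distributed.
Z distribution (mean 0, standard deviation 1): $z = \dfrac{\bar{x} - \mu_x}{\sigma_x / \sqrt{n}}$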
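The central limit theorem can be checked with a small simulation. The sketch below assumes an exponential parent population (mean 1, standard deviation 1), a sample size of 36, and 5000 repeated samples; all three choices are illustrative and do not come from the slides.

```python
import random
import statistics

def sample_means(n, trials, rng):
    """Draw `trials` samples of size n from an exponential population
    (mean 1, standard deviation 1) and return the list of sample means."""
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(trials)]

rng = random.Random(0)   # fixed seed so the sketch is repeatable
n = 36                   # illustrative sample size (n > 30)
means = sample_means(n, 5000, rng)

# The mean of the sample means should sit near the population mean (1.0),
# and their standard deviation near sigma / sqrt(n) = 1/6.
mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)
print(round(mean_of_means, 3), round(sd_of_means, 3))
```

Even though the exponential population is strongly skewed, a histogram of `means` would look close to a bell curve.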
[Figure: parent population with its mean and SDEV; the central 95% region is marked]
Confidence Intervals
• Means
• Standard Deviations
• Process Capability
A confidence interval is an estimated range of values, calculated from a given set of sample data, which is likely to include an unknown population parameter.
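How confident are you that the interval around $\bar{x}$ will contain $\mu$?
Confidence interval: $\bar{x} \pm z_c \, \hat{\sigma} / \sqrt{n}$
The shaded area under the curve is the confidence level, $1 - \alpha$. The remaining area is $\alpha$.
Use $\alpha$ to arrive at $z_c$.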
[Figure: population distribution shown as a standard normal probability curve with the central 95% marked]
In this example we know the true mean $\mu$ and true standard deviation $\sigma$ of the total population.
• Suppose you are told only the value of $\sigma$ of the population and are asked to estimate the value of $\mu$ from a single sample value, X, that is randomly selected from the population.
• We are 95% confident that the value of the unknown mean, $\mu$, lies “somewhere” in the interval:
$X - 1.96\,\sigma \le \mu \le X + 1.96\,\sigma$
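As a rough illustration of the known-sigma case, the sketch below computes a two-sided interval around an observed value. The function name `z_interval` and the example numbers (X = 10.0, sigma = 2.0) are hypothetical; the critical value comes from Python's standard normal distribution rather than a table.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(center, sigma, n=1, confidence=0.95):
    """Two-sided interval center +/- z_c * sigma / sqrt(n) for a known sigma.
    With n = 1 this is the single-observation case described above."""
    alpha = 1.0 - confidence
    z_c = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # about 1.96 for 95%
    half_width = z_c * sigma / sqrt(n)
    return center - half_width, center + half_width

# Hypothetical numbers: one observation X = 10.0 from a population with sigma = 2.0
low, high = z_interval(10.0, 2.0)
print(round(low, 2), round(high, 2))
```

Passing a larger `n` together with a sample average shrinks the interval by a factor of sqrt(n).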
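$f(t, \nu) = \dfrac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \dfrac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}$ for $-\infty < t < \infty$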
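The Student's t density can be evaluated directly to see how it compares with the standard normal curve. This is a minimal sketch using the standard textbook form of the density; the function names are illustrative.

```python
import math

def t_pdf(t, df):
    """Student's t probability density with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + t * t / df) ** (-(df + 1) / 2)

def normal_pdf(z):
    """Standard normal N(0,1) probability density."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

# Density at the center (t = 0) and in the tail (t = 3) for several df values
for df in (1, 2, 5, 10):
    print(df, round(t_pdf(0.0, df), 4), round(t_pdf(3.0, df), 4))
print("N(0,1)", round(normal_pdf(0.0), 4), round(normal_pdf(3.0), 4))
```

Low degrees of freedom give a lower peak and heavier tails than N(0,1); as df grows, the t curve approaches the normal curve.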
[Figure: Student's t-distribution, probability density function versus t (from -4 to 4), for df = 1, 2, 5 and 10, compared with N(0,1)]
[Figure: z and t distributions side by side (from -4 to 4), each marked with its mean, SDEV and central 95% region]
Do not use the z distribution for n < 30.
$t = \dfrac{\bar{X} - \mu}{S / \sqrt{n}}$        Compare to        $Z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}$
[Figure: t-distribution with the central 95% lying between t = -4.30 and t = +4.30]
Confidence Intervals
• Means
• Standard Deviations
• Process Capability
$f(x) = \dfrac{x^{\nu/2 - 1}\, e^{-x/2}}{2^{\nu/2}\,\Gamma(\nu/2)}$ for $x > 0$
[Figure: Chi-square distribution, probability density function versus chi-square (from 0 to 30), for df = 1, 2, 5 and 10]
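Chi-squared distribution
If X is normally distributed, then $s^2$ is $\chi^2$ distributed:
$Z = \dfrac{X - \mu}{\sigma}$        $\chi^2 = \dfrac{(n-1)\,s^2}{\sigma^2}$, with df = n - 1
e.g. $Z_{(1-\alpha)} = 1.645$ for $\alpha$ = 5%        e.g. $\chi^2_{(1-\alpha),\,df} = 9.488$ for n = 5, $\alpha$ = 5%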
The $\chi^2$ distribution is used to find the confidence interval for variance.
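A confidence interval for the variance follows from the relation $\chi^2 = (n-1)s^2/\sigma^2$. The sketch below is a simplified illustration, not a general-purpose routine: it assumes an even number of degrees of freedom, for which the chi-square cumulative distribution has a closed form, and finds quantiles by bisection. All function names and the example value $s^2 = 1.0$ are hypothetical.

```python
import math

def chi2_cdf_even_df(x, df):
    """Chi-square CDF, valid only for even df (closed-form series)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles even, positive df")
    if x <= 0:
        return 0.0
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return 1.0 - math.exp(-half) * total

def chi2_quantile_even_df(p, df):
    """Value x such that CDF(x) = p, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf_even_df(hi, df) < p:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf_even_df(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def variance_interval(s2, n, confidence=0.95):
    """Two-sided confidence interval for the population variance."""
    alpha = 1.0 - confidence
    df = n - 1
    lower = df * s2 / chi2_quantile_even_df(1.0 - alpha / 2.0, df)
    upper = df * s2 / chi2_quantile_even_df(alpha / 2.0, df)
    return lower, upper

# Reproduces the table value quoted above for n = 5 (df = 4), alpha = 5%
print(round(chi2_quantile_even_df(0.95, 4), 3))
# Hypothetical sample: s^2 = 1.0 from n = 5 observations
var_low, var_high = variance_interval(1.0, 5)
print(round(var_low, 3), round(var_high, 3))
```

The interval is not symmetric around $s^2$ because the chi-square distribution is skewed.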
Confidence Intervals
• Means
• Standard Deviations
• Process Capability
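n = number of samples used to estimate the mean and STDEV
Short Term / Long Term
[Figure: process distribution with defect probabilities P(d)LSL and P(d)USL beyond the specification limits]
$Z \pm 1.96 \left[ \dfrac{1}{n} + \dfrac{Z^2}{2(n-1)} \right]^{1/2}$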
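The interval for the process Z value can be evaluated numerically. This sketch assumes the expression above is read as a two-sided 95% interval, Z plus or minus 1.96 times the bracketed term; the function name and the example values (Z = 3, n = 30) are hypothetical.

```python
from math import sqrt

def z_capability_interval(z, n, z_crit=1.96):
    """Approximate interval for a process Z value estimated from n samples:
    z +/- z_crit * sqrt(1/n + z^2 / (2 (n - 1)))."""
    half_width = z_crit * sqrt(1.0 / n + z * z / (2.0 * (n - 1)))
    return z - half_width, z + half_width

# Hypothetical case: Z = 3.0 estimated from n = 30 samples
z_low, z_high = z_capability_interval(3.0, 30)
print(round(z_low, 2), round(z_high, 2))
```

The width shrinks as n grows, which shows why capability estimated from a small sample should be quoted with caution.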
Hypothesis Testing
All Potential “X”s → Vital Few “X”s
Ho
The starting point for a hypothesis test is the “null”
hypothesis - Ho. Ho is the hypothesis of sameness, or
no difference.
Example: The population mean equals the test mean.
Ha
The second hypothesis is Ha - the “alternative”
hypothesis. It represents the hypothesis of
difference.
Example: The population mean does not equal the
test mean.
• You usually want to show that there is a difference (Ha).
• Start by assuming equality (Ho).
• If the data show they are not equal, then they must be different (Ha).
                         Truth: Ho             Truth: Ha
Verdict: Ho (Set Free)   Innocent, Set Free    Guilty, Set Free
Verdict: Ha (Jailed)     Innocent, Jailed      Guilty, Jailed
                        Truth: Ho                  Truth: Ha
Decision: Accept Ho     Correct                    Type II Error ($\beta$)
Decision: Accept Ha     Type I Error ($\alpha$)    Correct

$1 - \beta$ = Chance of detecting a specified change in the population (with the given sample) if the difference is actually there to detect. Also called “power of the test.” In some respects, …
… to simultaneously commit a Type I and Type II decision error. In short, either an alpha or beta decision error can be made, but not both.
$1 - \alpha$ = Confidence that an observed outcome in the sample is “real” (i.e., the outcome is not due to random sampling error and, therefore, reflects the true state of affairs in the population).
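The two error rates can be estimated by simulation. The sketch below assumes a two-sided z test with known sigma, a sample size of 10, and a true shift of one sigma for the "Ha is true" case; these settings and the function name `rejection_rate` are illustrative only.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def rejection_rate(true_mean, test_mean=0.0, sigma=1.0, n=10,
                   alpha=0.05, trials=4000, seed=1):
    """Fraction of simulated samples in which a two-sided z test rejects Ho."""
    rng = random.Random(seed)
    z_c = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    rejections = 0
    for _ in range(trials):
        xbar = fmean(rng.gauss(true_mean, sigma) for _ in range(n))
        z = (xbar - test_mean) / (sigma / sqrt(n))
        if abs(z) > z_c:
            rejections += 1
    return rejections / trials

# Ho true: every rejection is a Type I error, so the rate should be near alpha.
type_1_rate = rejection_rate(true_mean=0.0)
# Ha true (mean shifted by one sigma): the rejection rate estimates the power, 1 - beta.
power = rejection_rate(true_mean=1.0)
print(type_1_rate, power)
```

Any single test falls into only one column of the table above: the Type I rate applies when Ho is true, and the power (or its complement, the Type II rate) applies when Ha is true.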
Hypothesis Testing
Example:
$\hat{\mu}_{old} = 10.0$
$\hat{\mu}_{new} = 11.1$
$\hat{\sigma}_{test} = \sqrt{\hat{\sigma}_{old}^2 + \hat{\sigma}_{new}^2}$
[Figure: t-distribution for 10 degrees of freedom (based upon 2 samples of 6 each), t from -5 to 5. The observed difference of 1.1 (11.1 - 10.0) corresponds to t = 1.88.]
The p-value
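The p-value is the probability, assuming Ho is true, of getting a result at least as extreme as the one observed. It is the risk of making a Type I Error if we reject Ho on this evidence.
Unless there is an exception based on engineering judgment, we will set the acceptance level of a Type I error at $\alpha$ = 0.05.
Thus, any p-value less than 0.05 means we accept the alternative hypothesis.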
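A p-value is a tail area under the reference distribution. As a sketch, the code below integrates the Student's t density numerically (Simpson's rule) to get the tail area beyond t = 1.88 with 10 degrees of freedom, the values from the earlier example. The slides do not say whether that test is one-sided or two-sided, so both areas are computed; the function names are illustrative.

```python
import math

def t_pdf(t, df):
    """Student's t probability density with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + t * t / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Area to the right of t (t >= 0): 0.5 minus the integral of the
    density over [0, t], computed with Simpson's rule (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 0.5 - total * h / 3.0

one_sided_p = t_upper_tail(1.88, 10)
two_sided_p = 2.0 * one_sided_p
print(round(one_sided_p, 4), round(two_sided_p, 4))
```

Each value is then compared with the chosen alpha of 0.05 to decide whether to accept Ha.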
[Figure: Truth versus Decision matrix; the p-value belongs to the cell where Ho is true but Ha is accepted]
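[Figure: t-distribution for df = 4. The central 95% between LCL and UCL (t = -2.776 to t = +2.776) is the confidence interval (rejection region for Ha), labelled "Same Process". Each tail holds an $\alpha/2$ = 2.5% risk and is labelled "Different Process".]
$\text{LCL} = \bar{x} - t_{\alpha/2}\,\hat{\sigma}/\sqrt{n}$        $\text{UCL} = \bar{x} + t_{\alpha/2}\,\hat{\sigma}/\sqrt{n}$
$\text{LCL} = \bar{x} - 2.776\,\hat{\sigma}/\sqrt{5}$        $\text{UCL} = \bar{x} + 2.776\,\hat{\sigma}/\sqrt{5}$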
We are 95% confident that the true population mean lies within the given confidence interval. If we observe a sample average greater than UCL or less than LCL, we may conclude that such an event would occur only 5 times out of 100 by random chance (sampling variation).
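The LCL and UCL formulas can be applied to a small sample as follows. This sketch uses the t value of 2.776 quoted above for df = 4 (n = 5); the five data values are made up purely for illustration.

```python
from math import sqrt
from statistics import mean, stdev

T_CRIT_DF4 = 2.776   # 95% two-sided t value for df = 4, as quoted above

def t_limits(sample, t_crit=T_CRIT_DF4):
    """Return (LCL, UCL) = xbar -/+ t * s / sqrt(n).
    The default t_crit is only valid for samples of 5 values."""
    n = len(sample)
    xbar = mean(sample)
    half_width = t_crit * stdev(sample) / sqrt(n)
    return xbar - half_width, xbar + half_width

# Made-up sample of n = 5 measurements
lcl, ucl = t_limits([10.0, 11.0, 9.0, 10.0, 10.0])
print(round(lcl, 3), round(ucl, 3))
```

For any other sample size, `t_crit` would need to be looked up for the matching degrees of freedom.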
Hypothesis Testing
$F = \dfrac{\hat{\sigma}_1^2}{\hat{\sigma}_2^2}$
(Ronald A. Fisher)
[Figure: Fisher's F distribution, probability density function versus F (from 0 to 1), for F(1,1), F(1,2), F(2,1) and F(10,1)]
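F distribution
If X is normally distributed, then $s_1^2 / s_2^2$ is F distributed:
$Z = \dfrac{X - \mu}{\sigma}$        $F = \dfrac{\chi_1^2 / \nu_1}{\chi_2^2 / \nu_2} = \dfrac{s_1^2}{s_2^2}$ when $\sigma_1^2 = \sigma_2^2$, with $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$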
The F distribution gives p-values for two-sample variance comparisons.
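The F statistic itself is simple to compute from two samples. The sketch below returns the variance ratio and the two degrees of freedom; the samples are made up, and turning F into a p-value would still need an F table or a statistics package, which is not shown here.

```python
from statistics import variance

def f_ratio(sample_1, sample_2):
    """Return (F, df1, df2), where F = s1^2 / s2^2 and df = n - 1 for each sample."""
    f = variance(sample_1) / variance(sample_2)
    return f, len(sample_1) - 1, len(sample_2) - 1

# Made-up samples from two processes
f_value, df1, df2 = f_ratio([1.0, 3.0, 5.0, 7.0], [2.0, 3.0, 4.0])
print(round(f_value, 3), df1, df2)
```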
Hypothesis Testing
Example:
Variation associated with machine #1 (group #1)
Variation associated with machine #2 (group #2)
$SST = \sum_{j=1}^{c} \sum_{i=1}^{n_j} (X_{ij} - \bar{\bar{X}})^2$
$\bar{\bar{X}} = \dfrac{\sum_{j=1}^{c} \sum_{i=1}^{n_j} X_{ij}}{n}$ is called the overall or grand mean
Between-Group Variation (Sum of Squares)
$SSB = \sum_{j=1}^{c} n_j (\bar{X}_j - \bar{\bar{X}})^2$
Within-Group Variation (Sum of Squares)
$SSW = \sum_{j=1}^{c} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)^2$
18 11.84 111.95
Home Work
Group   Values     $n_j$   $\bar{X}_j$   $SSW_j = \sum_{i} (X_{ij} - \bar{X}_j)^2$
1       1  2  3
2       2  3  4
3       3  4  5
4       4  5  6
5       5  6  7
$\bar{\bar{X}} = \dfrac{\sum_{j=1}^{c} \sum_{i=1}^{n_j} X_{ij}}{n}$
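The three sums of squares can be computed with a few lines of code. The sketch below uses two made-up groups rather than the homework data, and the function name `sums_of_squares` is illustrative.

```python
def sums_of_squares(groups):
    """Return (SST, SSB, SSW) for a list of groups of measurements."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    # Total variation: every value against the grand mean
    sst = sum((x - grand_mean) ** 2 for x in all_values)
    # Between-group variation: each group mean against the grand mean, weighted by n_j
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group variation: every value against its own group mean
    ssw = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return sst, ssb, ssw

# Made-up data: two groups of three values each
sst, ssb, ssw = sums_of_squares([[1, 2, 3], [5, 6, 7]])
print(sst, ssb, ssw)   # SST equals SSB + SSW
```

The same function can be applied to the homework table to check a hand calculation.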
Hypothesis Testing
[Figure: two levels (level 1 and level 2) separated by a gap, the delta. The variation split: total variation is made up of between-group variation (signal) and within-group variation (noise).]
One-Way ANOVA
Example Data
             Values                                   Group mean
Machine #1   9.64  11.08  9.88  8.43  10.41  11.04    10.08
One-Way ANOVA
Assume the same parent population (one mean, one SDEV)
Machine # 1: 9.64, 11.0,…
Machine # 2: 8.18, 11.99,…
Machine # 3: 14.91, 13.25,…
$S^2$ estimate = 55.45 / 2 = 27.73 (df = 2)
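$\bar{x}_i = \dfrac{1}{n_i} \sum_{j} x_{ij}$        $\bar{\bar{x}} = \dfrac{\sum_{i} n_i \bar{x}_i}{\sum_{i} n_i}$
$SSB = \sum_{i} n_i (\bar{x}_i - \bar{\bar{x}})^2$        $SST = \sum_{i} \sum_{j} (x_{ij} - \bar{\bar{x}})^2$
$\hat{\sigma}_{LT} = \sqrt{\dfrac{SST}{\sum_{i} n_i - 1}}$        $\hat{\sigma}_{ST} = \sqrt{\dfrac{SSW}{\sum_{i} (n_i - 1)}}$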
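Short-term and long-term standard deviations can be estimated from grouped data as sketched below. This assumes the reading that the long-term sigma uses SST over the total count minus one and the short-term sigma uses SSW over the pooled within-group degrees of freedom; the data and function name are made up.

```python
from math import sqrt

def sigma_short_and_long_term(groups):
    """Return (sigma_ST, sigma_LT) from a list of subgroups.
    sigma_LT = sqrt(SST / (N - 1)); sigma_ST = sqrt(SSW / sum(n_i - 1))."""
    all_values = [x for g in groups for x in g]
    n_total = len(all_values)
    grand_mean = sum(all_values) / n_total
    sst = sum((x - grand_mean) ** 2 for x in all_values)
    ssw = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    sigma_lt = sqrt(sst / (n_total - 1))
    sigma_st = sqrt(ssw / sum(len(g) - 1 for g in groups))
    return sigma_st, sigma_lt

# Made-up data: two subgroups whose means differ
sigma_st, sigma_lt = sigma_short_and_long_term([[1, 2, 3], [5, 6, 7]])
print(round(sigma_st, 3), round(sigma_lt, 3))
```

When the subgroup means drift apart, the long-term estimate grows while the short-term estimate stays the same.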
More on Hypothesis
Equality      =    Simple
Inequality    >    Directional / One-sided         Composite
              <    Directional / One-sided         Composite
              ≠    Non-directional / Two-sided     Composite
One Sample Hypothesis for Mean One Sample Hypothesis for Variance
0= 0=
a=
a=
Two Sample Hypothesis for Mean Two Sample Hypothesis for Variance
0= 0=
a=
a=
Hypothesis Test