08 GB DMAIC Basic Statistics Part 3

DMAIC Basic Statistics Part 3

Uploaded by

dinesh.munaswamy

g GE Global Research

Green Belt DMAIC Workshop

BASIC STATISTICS
Part 3

Define
Measure
Analyze
Improve
Control

GB DMAIC - Basic Statistics Part 3 Version 2.0 5/2002 7.1

Learning Objectives

• Understand the terms sample and population
• Understand the terms parameter and statistic
• Understand the terms point estimator and interval estimator
• Understand the Central Limit Theorem
• Understand confidence intervals
• Understand what a hypothesis is
• Understand various distributions
• Understand ANOVA

Statistics Fundamentals


Population and Sample

Universe / Population :
The set of all potential observations about which the experimenter
wishes to make some general statement.

Sample :
A small fraction (subset) of the population which the experimenter
chooses for study in order to make some statement about the
population.

Inference :
The conclusion drawn about the population based on the study of
the sample.

• Inferences about the population have to be drawn on the basis of a sample
• The inferences must be drawn under uncertainty

Parameter and Statistic

              Population       Sample
              (Parameter)      (Statistic)
Mean          $\mu$            $\bar{x}$
Variance      $\sigma^2$       $s^2$

A statistic is used as the estimate of the corresponding parameter in order to draw inferences.

Types of Estimators

Point Estimator :
• Estimates a population parameter as a single number
[Figure: population distribution with the point estimator marked]

Interval Estimator :
• Estimates a population parameter as a range
• Specified as Point Estimator $\pm$ Error
[Figure: population distribution with the interval estimator marked]

Central Limit Theorem


The Central Limit Theorem Also Applies
to Non-Normal Parent Populations

[Figure: distributions of individual measurements (top) and the corresponding distributions of averages, with n measurements in each average (bottom); the averages $\bar{X}$ cluster around the grand average $\bar{\bar{X}}$]

There are "n" samples in each subgroup.

The central limit theorem states that, for large values of n, the distribution of the sample mean will be approximately normal, even though the individual data points may be non-normal.
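
The statement above can be checked with a small simulation. This is only a sketch under assumptions that are not in the slides: the parent population is taken to be exponential with mean 1 (a strongly skewed, non-normal shape), the subgroup size is n = 30, and skewness is used as a rough indicator of how far a distribution is from normal.

```python
import random
import statistics

def sample_means(n, trials, seed=0):
    """Return `trials` averages, each over n draws from an exponential parent (mean 1)."""
    rng = random.Random(seed)
    return [statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
            for _ in range(trials)]

def skewness(values):
    """Third standardized moment; it is 0 for a symmetric (e.g. normal) distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

individuals = sample_means(n=1, trials=20000)   # n = 1: the raw measurements
averages = sample_means(n=30, trials=20000)     # n = 30: subgroup averages

print("skewness of individuals:", round(skewness(individuals), 2))
print("skewness of averages   :", round(skewness(averages), 2))
print("std dev of averages    :", round(statistics.pstdev(averages), 3))
print("sigma / sqrt(n)        :", round(1 / 30 ** 0.5, 3))
```

The averages should come out far less skewed than the individual values, with a spread close to $\sigma / \sqrt{n}$.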


Central Limit Theorem

Any distribution of x, with population mean $\mu_x$ and variance $\sigma_x^2$
leads to a normal distribution of $\bar{x}$, with mean and variance of the sample means:

$\mu_{\bar{x}} = \mu_x$        $\sigma_{\bar{x}}^2 = \sigma_x^2 / n$

For n > 30, the sample mean tends to be normally distributed.

$z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu_x}{\sigma_x / \sqrt{n}}$

[Figure: Z distribution, centred on z = 0 with standard deviation 1]

Central Limit Theorem – illustrated with
a normally distributed parent population

Parent Population
Mean = $\mu$
SDEV = $\sigma$
[Figure: normal curve of the parent population with the central 95% region marked]

Sampling Distribution of the Means
Mean = $\mu$
SDEV = $\sigma_{SD} = \sigma / \sqrt{n}$

Randomly select "n" samples from our parent population and take the mean.
Do this for all possible combinations of "n".
The Central Limit Theorem tells us that the distribution of these means
will be normal and have the same mean as the parent, with the $\sigma_{SD}$
value shown above.

Confidence Intervals
• Means

• Standard Deviations

• Process Capability

Confidence Interval : Concept

A confidence interval is an interval estimator. It is the estimated
range of values which is likely to include an unknown population
parameter, the estimated range being calculated from a given set of sample
data.

[Figure: intervals from $\bar{x} - \Delta$ to $\bar{x} + \Delta$ drawn for repeated samples around the true mean $\mu$; one interval is marked "Does not contain $\mu$"]

How confident are you that $\bar{x} \pm \Delta$ will contain $\mu$?

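Confidence Interval

[Figure: standard normal curve with the central area between $-z_c$ and $+z_c$ shaded]

The shaded area under the curve is the confidence level, $1 - \alpha$.
The remaining area is $\alpha$.

Use $\alpha$ to arrive at $z_c$, then $\Delta = z_c \hat{\sigma} / \sqrt{n}$

$(\bar{x} - \Delta,\ \bar{x} + \Delta)$ is the confidence interval for $\mu$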
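
The recipe on this slide can be sketched in a few lines. The function name and the numbers in the usage lines (mean 50, standard deviation 4, n = 64) are hypothetical and chosen only for illustration; `NormalDist` comes from the Python standard library.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma_hat, n, confidence=0.95):
    """Two-sided confidence interval for the mean using the z distribution."""
    alpha = 1.0 - confidence
    z_c = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # critical value for alpha/2 in each tail
    delta = z_c * sigma_hat / sqrt(n)               # half-width of the interval
    return x_bar - delta, x_bar + delta, z_c

# Hypothetical numbers, not taken from the slides
low, high, z_c = z_confidence_interval(x_bar=50.0, sigma_hat=4.0, n=64)
print(f"z_c = {z_c:.3f}, interval = ({low:.2f}, {high:.2f})")
```

For a 95% confidence level the critical value is the familiar 1.96.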


Confidence Intervals for Means

Population Distribution
Standard Normal Probability Curve
[Figure: normal curve with the central 95% region marked]

In this example we know the true mean and true standard deviation
of the total population.

• 95% of the population lies between the $\mu \pm 1.96\sigma$ limits shown

• Randomly select a sample from this population

• 95% of the time the sample that you select
will have a value, X, in the range

$\mu - 1.96\sigma < X < \mu + 1.96\sigma$

(or $X = \mu \pm 1.96\sigma$, 95% of the time)

• Now look at it another way

• Suppose you are told only the $\sigma$ value of the population and are asked
to estimate the value of $\mu$ from a sample with value, X, that is randomly
selected from the population.

• 95% of the time we are confident that the value of the unknown mean, $\mu$,
lies "somewhere" in the interval:

$X - 1.96\sigma < \mu < X + 1.96\sigma$

or we estimate that $\mu = X \pm 1.96\sigma$ at the 95% level of confidence

Confidence Intervals for Means

Real Data

What about data collected in the real world?

• Typically we select a number, n, of samples and determine the mean.

• We also obtain an estimate, S, of the standard deviation from the n samples.

How do we use this limited data to

• Estimate the true population mean?

• Determine our level of confidence?

We turn to the Student t-distribution

Confidence Intervals for Means

Student’s t-distribution

The t-distribution probability density function takes into
consideration the uncertainties inherent in our estimates of the
mean and standard deviation for a finite sample. The shape of the
curve depends upon the degrees of freedom, $\nu$.

William Sealy Gosset, "Student"

$f(t, \nu) = \frac{\Gamma\left(\frac{\nu + 1}{2}\right)}{\sqrt{\pi \nu}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu + 1}{2}}$   for $-\infty < t < \infty$

[Figure: Student's t-distribution. Probability density function against t (from -4 to 4) for df = 1, 2, 5 and 10, compared with N(0,1)]
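
To see how the curve changes with the degrees of freedom, the density can be evaluated directly. This is a minimal sketch using the standard textbook form of the Student t density; the function names are illustrative.

```python
from math import exp, gamma, pi, sqrt

def t_pdf(t, nu):
    """Student's t probability density with nu degrees of freedom."""
    coeff = gamma((nu + 1) / 2) / (sqrt(pi * nu) * gamma(nu / 2))
    return coeff * (1 + t * t / nu) ** (-(nu + 1) / 2)

def normal_pdf(z):
    """Standard normal density N(0,1), for comparison."""
    return exp(-z * z / 2) / sqrt(2 * pi)

# Peak height (t = 0) and tail height (t = 3) for the curves plotted on the slide
for nu in (1, 2, 5, 10):
    print(f"df={nu:<3} f(0)={t_pdf(0.0, nu):.4f}  f(3)={t_pdf(3.0, nu):.4f}")
print(f"N(0,1) f(0)={normal_pdf(0.0):.4f}  f(3)={normal_pdf(3.0):.4f}")
```

Low degrees of freedom give a lower peak and heavier tails; as df grows the curve approaches N(0,1).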


Confidence Intervals for Means

Central Limit Theorem

Parent Population
Mean = $\mu$
SDEV = $\sigma$
[Figure: parent population with the central 95% region marked, and the sampling distribution of the means]

Confidence Interval and t distribution

Use z distribution if n > 30                Use t distribution if n <= 30

$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$      Note the Difference      $t = \frac{\bar{X} - \mu}{s / \sqrt{n}}$,  df = n - 1

[Figure: z distribution (left) and Student's t-distribution for df = 1, 2, 5 and 10 with N(0,1) (right). Probability density function against z and t, from -4 to 4]

The t distribution is:

• Symmetric about its mean t = 0

• More tail-heavy than the z distribution

Do not use the z distribution for n < 30

Confidence Intervals for Means

Student’s t-distribution

Using a definition analogous to the "Z" statistic,
we define a "t" statistic based upon our "real data" t-distribution:

$t = \frac{\bar{X} - \mu}{S / \sqrt{n}}$      Compare to      $Z = \frac{X - \mu}{\sigma}$

[Figure: Student's t-distribution. Probability density function against t (from -4 to 4) for df = 1, 2, 5 and 10, compared with N(0,1)]

Confidence Intervals for Means

Student’s t-distribution
[Figure: t-distributions with the central 95% region marked. For n = 6 (df = 5) the limits are t = -2.57 and t = +2.57; for n = 3 (df = 2) the limits are t = -4.30 and t = +4.30]

$\mu = \bar{X} \pm t\,S / \sqrt{n}$        degrees of freedom = n - 1

As the number of samples decreases,
our confidence interval gets larger.

Confidence Intervals for Means

Student’s t-distribution

You have just received a report from the Analytical
Laboratory informing you that your sample was found
to contain 97 ppb Sodium with a STDEV of 2 ppb.

You want to estimate the true population mean at a
95% confidence level. Being a well-educated
Green Belt, you go back to the analyst to learn how
many samples were used in the analysis. The analyst
can't remember exactly but is sure that either 3 or 6
samples were used.

For 3 samples: degrees of freedom = 3 - 1 = 2 and t = 4.30

$\mu$ = 97 $\pm$ 4.30*2/1.73 = 97 $\pm$ 5 ppb

95% Confidence Interval 92 ---- 97 ---- 102

For 6 samples: degrees of freedom = 6 - 1 = 5 and t = 2.57

$\mu$ = 97 $\pm$ 2.57*2/2.45 = 97 $\pm$ 2 ppb

95% Confidence Interval 95 ---- 97 ---- 99
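
The two cases can be reproduced with a short script. This is a sketch only: the critical t values are the ones quoted above for df = 2 and df = 5, and the function name is illustrative.

```python
from math import sqrt

def t_interval(mean, stdev, n, t_crit):
    """Confidence interval for the mean: mean +/- t * S / sqrt(n)."""
    half = t_crit * stdev / sqrt(n)
    return mean - half, mean + half

# Critical values for 95% confidence as quoted in the example, keyed by sample count
T_CRIT_95 = {3: 4.30, 6: 2.57}

for n, t_crit in T_CRIT_95.items():
    low, high = t_interval(mean=97.0, stdev=2.0, n=n, t_crit=t_crit)
    print(f"n={n}: df={n - 1}, interval = {low:.2f} to {high:.2f} ppb")
```

Doubling the number of samples from 3 to 6 more than halves the width of the interval, because both t and $1/\sqrt{n}$ shrink.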


Example on Confidence Interval for Mean

The time of the last boarding call of an airplane to New York, recorded on
various days chosen at random in a year, is as follows:
10:20am, 10:22am, 10:31am, 10:47am, 10:25am
11:39am, 10:13am, 10:18am, 10:38am, 10:26am
Find at what time a tele-checked person should reach the airport
in order not to miss the plane, with 95% confidence.

Reference: 10:00            Confidence Level: 0.95
Degrees of Freedom: 9       tc: 2.262159

Time of Arrival     Difference w.r.t. 10:00
10:20               0:20
10:22               0:22
10:31               0:31
10:47               0:47
10:25               0:25
11:39               1:39
10:13               0:13
10:18               0:18
10:38               0:38
10:26               0:26

Average Difference: 0:33        Sample Standard Deviation: 0:24
Half interval: 0:17
Mean Arrival Time: 10:33        Confidence Interval: 10:15 to 10:50
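
The worksheet can be recomputed as follows. This sketch uses the ten times as tabulated (expressed in minutes after the 10:00 reference) and the critical value tc = 2.262159 given for 9 degrees of freedom; results are printed in decimal minutes, so they may differ by up to a minute from the whole-minute values shown in the table.

```python
from math import sqrt
from statistics import mean, stdev

# Minutes after the 10:00 reference, as tabulated
minutes_after_10 = [20, 22, 31, 47, 25, 99, 13, 18, 38, 26]
T_C = 2.262159                          # critical t for 95% confidence, df = 9

n = len(minutes_after_10)
avg = mean(minutes_after_10)            # average difference
s = stdev(minutes_after_10)             # sample standard deviation (n - 1 divisor)
half = T_C * s / sqrt(n)                # half interval

print(f"average difference : {avg:.1f} min")
print(f"sample std dev     : {s:.1f} min")
print(f"half interval      : {half:.1f} min")
print(f"interval           : {avg - half:.1f} to {avg + half:.1f} min after 10:00")
```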


Confidence Intervals
• Means

• Standard Deviations

• Process Capability

Confidence Intervals for Standard Deviation
Helmert's $\chi^2$-distribution

The probability density function of the variance ($\sigma^2$) for finite samples
also has a dependence upon the number of degrees of freedom, $\nu$.

Friedrich Robert Helmert

$f(x, \nu) = \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\, x^{\nu/2 - 1}\, e^{-x/2}$   for $x > 0$

[Figure: Chi Square Distribution. Probability density function against chi-square (from 0 to 30) for df = 1, 2, 5 and 10]

Chi-squared distribution

If X is normally distributed, then $s^2$ is $\chi^2$ distributed:

$Z = \frac{X - \mu}{\sigma}$        $\chi^2 = \frac{(n - 1)\,s^2}{\sigma^2}$,  df = n - 1

e.g. $Z_{(1-\alpha)} = 1.645$ for $\alpha = 5\%$        e.g. $\chi^2_{(1-\alpha),\,df} = 9.488$ for n = 5, $\alpha = 5\%$

[Figure: z distribution (left, from -4 to 4) and Chi Square Distribution for df = 1, 2, 5 and 10 (right, from 0 to 30). Probability density function against z and chi-square]

The $\chi^2$ distribution is used to find the confidence interval for the variance.
Confidence Intervals for Standard Deviation
Helmert's $\chi^2$-distribution

95% Confidence Interval

Because of the shape of the $\chi^2$-distribution, the
confidence intervals determined for standard
deviation estimates are not symmetric.

Example: Using 16 samples we compute s = 1.66

95% Confidence Interval for S

1.23 -------- 1.66 ------------------ 2.57

Fortunately, as we will see later, programs such as
Minitab will calculate the confidence intervals for us.
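
For readers who want to check the example by hand, the interval follows from $\chi^2 = (n-1)s^2/\sigma^2$. In this sketch the two chi-square critical values for 15 degrees of freedom (27.488 and 6.262) are standard table values supplied here as an assumption; they do not appear on the slide.

```python
from math import sqrt

def stdev_interval(s, n, chi2_upper, chi2_lower):
    """Confidence interval for sigma from (n-1)s^2/sigma^2 ~ chi-square with n-1 df."""
    df = n - 1
    low = sqrt(df * s ** 2 / chi2_upper)    # dividing by the larger quantile gives the lower limit
    high = sqrt(df * s ** 2 / chi2_lower)   # dividing by the smaller quantile gives the upper limit
    return low, high

# Assumed table values for df = 15 (97.5% and 2.5% quantiles)
CHI2_0975_DF15 = 27.488
CHI2_0025_DF15 = 6.262

low, high = stdev_interval(s=1.66, n=16,
                           chi2_upper=CHI2_0975_DF15, chi2_lower=CHI2_0025_DF15)
print(f"95% interval for sigma: {low:.2f} to {high:.2f}")
```

The upper limit sits much further from s than the lower one, which is the asymmetry described above.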


Confidence Intervals
• Means

• Standard Deviations

• Process Capability


How much confidence do we have in our
Process Capability Z-scores?

95% Confidence Interval
(assumes normal distribution)

$Z \pm 1.96 \left[\frac{1}{n} + \frac{Z^2}{2(n - 1)}\right]^{1/2}$

n = # samples used to estimate Mean and STDEV (Short Term or Long Term)

[Figure: process distribution with the defect probabilities P(d)LSL and P(d)USL marked beyond the specification limits]

Problem: I have calculated Z = 2.56 for a process capability study
in which I used 75 samples.

Question: What are the 95% confidence interval limits for
my estimated value of Z?
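
One way to work the problem is to evaluate the interval formula directly. The sketch below assumes the formula reads $Z \pm 1.96\,[1/n + Z^2/(2(n-1))]^{1/2}$; the function name is illustrative.

```python
from math import sqrt

def z_score_interval(z, n, z_crit=1.96):
    """Approximate confidence interval for a process capability Z-score."""
    half = z_crit * sqrt(1.0 / n + z * z / (2.0 * (n - 1)))
    return z - half, z + half

low, high = z_score_interval(z=2.56, n=75)
print(f"95% interval for Z: {low:.2f} to {high:.2f}")
```

Under that reading, 75 samples leave an uncertainty of roughly half a Z unit on either side of the estimate.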


Hypothesis Testing

• t-test for means

• F-test for standard deviations

• ANOVA test for means


Hypothesis Testing

Hypothesis Tests as an alternative method for determining difference

Confidence intervals give a range of plausible
values for a population value (parameter).

Hypothesis tests determine if an apparent
difference is real or could be due to chance. We
can quantify our level of confidence that the
difference is real.

[Figure: funnel narrowing all potential "X"s down to the vital few "X"s]

Hypothesis Testing

Defining the Hypotheses

Ho
The starting point for a hypothesis test is the "null"
hypothesis - Ho. Ho is the hypothesis of sameness, or
no difference.
Example: The population mean equals the test mean.

Ha
The second hypothesis is Ha - the "alternative"
hypothesis. It represents the hypothesis of
difference.
Example: The population mean does not equal the
test mean.

• You usually want to show that there is a difference (Ha).
• Start by assuming equality (Ho).
• If the data show they are not equal, then they must be different (Ha).

Evaluation of Decision Error

Four possible outcomes that determine whether a
decision is correct or in error:

Ho: Person is innocent.
Ha: Person is guilty.

                           Truth: Ho (Innocent)     Truth: Ha (Guilty)
Verdict: Ho (Set Free)     Innocent, Set Free       Guilty, Set Free
Verdict: Ha (Jailed)       Innocent, Jailed         Guilty, Jailed

© 1994 Dr. Mikel J. Harry V3.0

Evaluation of Decision Error

                 Truth: Ho                    Truth: Ha
Accept Ho        Correct Decision             Type II Error ($\beta$)
Accept Ha        Type I Error ($\alpha$)      Correct Decision

$1 - \beta$ = Chance of detecting a specified change in the population
(with the given sample) if the difference is actually there to
detect. Also called "power of the test." In some respects,
it is the likelihood of detecting beneficial change.

$1 - \alpha$ = Confidence that an observed outcome in the
sample is "real" (i.e., the outcome is not due to random
sampling error and, therefore, reflects the true state-of-
affairs in the population).

Note: It is not possible to simultaneously
commit a Type I and Type II decision error.
In short, either an alpha or beta decision error
can be made, but not both.

Hypothesis Testing

• t-test for means

• F-test for standard deviations

• ANOVA test for means


Hypothesis Testing of Means

Example:

We have made changes to a process to shift the mean.
We have taken 6 samples of both the original process
and the new process and have estimates for the means
and STDEVs.

$\hat{\mu}_{old} = 10.0$        $\hat{\sigma}_{old} = 0.85$
$\hat{\mu}_{new} = 11.1$        $\hat{\sigma}_{new} = 1.14$

The means and STDEVs are observed to differ.

Questions:

Are the differences statistically significant?
We can use a hypothesis test to determine this.

Are the differences practically significant?
You and the team have to make this decision.

Hypothesis Test of Means – p-value

1 Sided t-test Example

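Our Null Hypothesis: $\hat{\mu}_{old} = \hat{\mu}_{new}$

The distribution of $\hat{\mu}_{new} - \hat{\mu}_{old}$ will have a
standard deviation given by:

$\hat{\sigma}_{test} = \sqrt{\hat{\sigma}_{old}^2 + \hat{\sigma}_{new}^2}$

where each $\hat{\sigma}$ on the right-hand side is the standard deviation of the
corresponding estimated mean (the sample STDEV divided by $\sqrt{n}$).

If our null hypothesis is true, then this distribution
should have a mean value = 0

[Figure: distribution of $\hat{\mu}_{new} - \hat{\mu}_{old}$ centred on 0 with standard deviation $\hat{\sigma}_{test}$]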

Hypothesis Test of Means – p-value

1 Sided t-test Example

t-distribution for 10 degrees of freedom
(based upon 2 samples of 6 each)

Observed difference: 1.1 (11.1 – 10.0), which corresponds to t = 1.88

[Figure: t-distribution plotted from -5 to 5 with the area beyond t = 1.88 shaded]

Area beyond "t = 1.88" is 0.045

Our "p-value" = 0.045

0.045 < 0.05, therefore we say that we accept the
alternate hypothesis at the 95% confidence level
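
The t value and p-value can be approximated from the summary statistics of the example (means 10.0 and 11.1, STDEVs 0.85 and 1.14, 6 samples each). This is a sketch under two assumptions that are not spelled out on the slides: the standard deviation of the difference is taken as $\sqrt{(s_{old}^2 + s_{new}^2)/n}$, and the tail area is obtained by numerically integrating the t density with Simpson's rule. With these rounded inputs the computed t comes out near 1.89, slightly above the 1.88 quoted, presumably because the slide worked from unrounded data.

```python
from math import gamma, pi, sqrt

def t_pdf(t, nu):
    """Student's t probability density with nu degrees of freedom."""
    coeff = gamma((nu + 1) / 2) / (sqrt(pi * nu) * gamma(nu / 2))
    return coeff * (1 + t * t / nu) ** (-(nu + 1) / 2)

def upper_tail_area(t, nu, steps=2000):
    """Area beyond t (for t >= 0): 0.5 minus the Simpson-rule integral from 0 to t."""
    h = t / steps                       # steps must be even for Simpson's rule
    total = t_pdf(0.0, nu) + t_pdf(t, nu)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, nu)
    return 0.5 - total * h / 3

n = 6
mean_old, sd_old = 10.0, 0.85
mean_new, sd_new = 11.1, 1.14

se_diff = sqrt(sd_old ** 2 / n + sd_new ** 2 / n)   # assumed form of sigma_test
t_stat = (mean_new - mean_old) / se_diff
df = 2 * (n - 1)                                    # 10 degrees of freedom
p_value = upper_tail_area(t_stat, df)

print(f"t = {t_stat:.2f}, df = {df}, one-sided p-value = {p_value:.3f}")
```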


The p-value


The p-value is the probability, assuming Ho is true, of seeing
a difference at least as large as the one observed. It is
therefore the risk of making a Type I Error if we accept Ha
on the strength of the sample.

Unless there is an exception based on
engineering judgment, we will set an
acceptance level of a Type I error at
$\alpha$ = 0.05.

Thus, any p-value less than 0.05
means we accept the alternative
hypothesis.

                 Truth: Ho      Truth: Ha
Accept Ha        p-value

Two-Sided Use of the t Distribution

[Figure: distribution of sampling averages for df = 4, centred on $\bar{x}$. The central 95% between LCL (t = -2.776) and UCL (t = +2.776) is labelled "Same Process" and "Confidence Interval (Rejection Region for Ha)"; each tail, labelled "Different Process" and "Risk", has area $\alpha/2$ = 2.5%]

$LCL = \bar{x} - t_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$        $UCL = \bar{x} + t_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$

$LCL = \bar{x} - 2.776\,\frac{\sigma}{\sqrt{5}}$        $UCL = \bar{x} + 2.776\,\frac{\sigma}{\sqrt{5}}$

There is 95% certainty that the true population mean will be contained within the
given confidence interval. If we observe a sampling average greater than UCL or less
than LCL, we may conclude that such an event could only occur 5 times out of 100 by
random chance (sampling variations).

© 1994 Dr. Mikel J. Harry V3.0

Hypothesis Testing

• t-test for means

• F-test for standard deviations

• ANOVA test for means


Hypothesis Test for Standard Deviations

The ratio between two variances
follows a non-symmetric distribution
called the F-distribution that depends
upon the degrees of freedom, $\nu$.

$F = \frac{\hat{\sigma}_1^2}{\hat{\sigma}_2^2}$

Ronald A Fisher

The experimentally determined F-value is
evaluated with respect to the appropriate distribution and a p-value
(area beyond the F-value) is determined, as is done in
the t-test.

[Figure: Fisher's F Distribution. Probability density function against F (from 0 to 1) for F(1,1), F(1,2), F(2,1) and F(10,1)]

F distribution

If X is normally distributed, then $s_1^2 / s_2^2$ is F distributed:

$Z = \frac{X - \mu}{\sigma}$        $F_{\nu_1, \nu_2} = \frac{\chi^2_{\nu_1} / \nu_1}{\chi^2_{\nu_2} / \nu_2} = \frac{s_1^2}{s_2^2}$

$df_1 = \nu_1 = n_1 - 1$        $df_2 = \nu_2 = n_2 - 1$

e.g. $Z_{(1-\alpha)} = 1.645$ for $\alpha = 5\%$

e.g. $F_{(1-\alpha),\,df_1,\,df_2} = 6.388$ for $n_1 = n_2 = 5$, $\alpha = 5\%$

[Figure: z distribution (left, from -4 to 4) and Fisher's F Distribution for F(1,1), F(1,2), F(2,1) and F(10,1) (right, from 0 to 1). Probability density function against z and F]

• The F-distribution is not symmetric (it is skewed to the right)

The F distribution gives p-values for
two-sample variance comparisons.

Hypothesis Testing

• t-test for means

• F-test for standard deviations

• ANOVA test for means


ANOVA - Analysis of Variance

The ANOVA test for means uses two tools:
• ANOVA – The analysis of variance
• The F-test for differences between
standard deviations

We will first describe the basics of
ANOVA, as this is a very powerful tool
that is used in many statistical
analyses.

Introduction to Analysis of Variance

ANOVA

Basic Assumption: Our data is normal

Example:

We are making precision spacing blocks with
a target dimension of 10 mm, and we
have gathered data for the output of our factory.

The standard deviation (spread in the data) is
higher than we would like.

We want to analyze our data for the purpose of
identifying sources of variation in our factory
that might be responsible and which we can correct.

After some work we have discovered that the blocks
are made on 3 different machines.

After more work, we are able to identify which parts
were made on each of the three machines.

Introduction to Analysis of Variance

ANOVA
[Figure: total variation of products coming from the factory, split into the variation associated with machine #1 (group #1), machine #2 (group #2) and machine #3 (group #3)]

Original Total Data Set from Factory

9.64 8.18 14.91 11.08 11.99 13.25 9.88 17.75 14.95
8.43 11.67 12.99 10.41 10.61 14.66 11.04 9.09 12.59

Data sorted by Machine (Group)

Machine #1    9.64  11.08   9.88   8.43  10.41  11.04
Machine #2    8.18  11.99  17.75  11.67  10.61   9.09
Machine #3   14.91  13.25  14.95  12.99  12.59  14.66

Introduction to Analysis of Variance

ANOVA

Breaking down total variance

SST = SSB + SSW

SST = Total Variation (Sum of Squares)

SSB = Between-Group Variation (Sum of Squares)

SSW = Within-Group Variation (Sum of Squares)


Introduction to Analysis of Variance

ANOVA

Total Variation (Sum of Squares)

$SST = \sum_{j=1}^{c} \sum_{i=1}^{n_j} (X_{ij} - \bar{\bar{X}})^2$

$X_{ij}$ = an individual data point: the i-th observation
in the group (or level) j.

$\bar{\bar{X}} = \frac{1}{n} \sum_{j=1}^{c} \sum_{i=1}^{n_j} X_{ij}$ is called the overall or grand mean

$n_j$ = the number of data points (observations) within a given
group j.

n = the total number of data points (observations) in all
of the groups combined. ($n = n_1 + n_2 + n_3 + \dots + n_c$)

c = the number of groups

Introduction to Analysis of Variance

ANOVA

Between-Group Variation
(Sum of Squares)

$SSB = \sum_{j=1}^{c} n_j (\bar{X}_j - \bar{\bar{X}})^2$

$\bar{\bar{X}}$ = the overall or grand mean (see SST definition)

$\bar{X}_j$ = the sample mean of group j.

$n_j$ = the number of data points (observations) within a given
group j.

c = the number of groups

Introduction to Analysis of Variance

ANOVA

Within-Group Variation
(Sum of Squares)

$SSW = \sum_{j=1}^{c} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)^2$

$X_{ij}$ = an individual data point: the i-th observation
in the group (or level) j.

$\bar{X}_j$ = the sample mean of group j.

Introduction to Analysis of Variance

ANOVA

Analyzing Variance in Our Factory Data

Subgroup:
Group   Values                                   Size $n_j$   Average $\bar{X}_j$   Sum of Squares $SSW_j$
1        9.64  11.08   9.88   8.43  10.41  11.04   6            10.08                 4.99
2        8.18  11.99  17.75  11.67  10.61   9.09   6            11.55                 56.97
3       14.91  13.25  14.95  12.99  12.59  14.66   6            13.89                 5.66

where $SSW_j = \sum_i (X_{ij} - \bar{X}_j)^2$

Overall:
Size n = 18
Average $\bar{\bar{X}} = \frac{1}{n} \sum_{j=1}^{c} \sum_{i=1}^{n_j} X_{ij}$ = 11.84
Sum of Squares $SST = \sum_{j=1}^{c} \sum_{i=1}^{n_j} (X_{ij} - \bar{\bar{X}})^2$ = 111.95

Standard Deviation:
Overall: $\hat{\sigma} = \sqrt{\frac{SST}{\sum_j n_j - 1}}$ = 2.57
Pooled: $\hat{\sigma}_{Pool} = \sqrt{\frac{SSW}{\sum_j (n_j - 1)}}$
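
The sums of squares defined on the preceding slides can be recomputed from the raw factory data. This is a minimal sketch using the 18 values of the original data set grouped by machine; variable names are illustrative, and the results should land close to the tabulated figures (small differences in the last digit come from rounding).

```python
from math import sqrt
from statistics import fmean

groups = {
    "Machine #1": [9.64, 11.08, 9.88, 8.43, 10.41, 11.04],
    "Machine #2": [8.18, 11.99, 17.75, 11.67, 10.61, 9.09],
    "Machine #3": [14.91, 13.25, 14.95, 12.99, 12.59, 14.66],
}

all_values = [x for values in groups.values() for x in values]
grand_mean = fmean(all_values)
group_means = {name: fmean(values) for name, values in groups.items()}

# Within-group sum of squares, per group and in total
ssw_by_group = {name: sum((x - group_means[name]) ** 2 for x in values)
                for name, values in groups.items()}
ssw = sum(ssw_by_group.values())

# Between-group and total sums of squares
ssb = sum(len(values) * (group_means[name] - grand_mean) ** 2
          for name, values in groups.items())
sst = sum((x - grand_mean) ** 2 for x in all_values)

sigma_overall = sqrt(sst / (len(all_values) - 1))
sigma_pooled = sqrt(ssw / sum(len(values) - 1 for values in groups.values()))

for name in groups:
    print(f"{name}: mean = {group_means[name]:.2f}, SSW_j = {ssw_by_group[name]:.2f}")
print(f"SSW = {ssw:.2f}, SSB = {ssb:.2f}, SST = {sst:.2f}")
print(f"overall sigma = {sigma_overall:.2f}, pooled sigma = {sigma_pooled:.2f}")
```

The check SST = SSB + SSW holds to floating-point precision, which is a quick way to catch a data-entry slip.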


Introduction to Analysis of Variance

ANOVA

Variation Source                    Sum of Squares    % Contribution
Machine #1 Variation                5.0               4.5
Machine #2 Variation                57.0              50.9
Machine #3 Variation                5.7               5.1
SSW - Within-Group Variation        67.6              60.4
SSB - Between-Group Variation       44.3              39.6
SST - Total Variation               112.0             100.0

Assuming for now that these differences are
statistically significant, where might be the
opportunities to reduce variation in our overall
process (i.e., the factory)?

Introduction to Analysis of Variance

ANOVA

Home Work

Group   Values    $n_j$    $\bar{X}_j$    $SSW_j = \sum_i (X_{ij} - \bar{X}_j)^2$
1       1 2 3
2       2 3 4
3       3 4 5
4       4 5 6
5       5 6 7

Overall:
Size n =
Average $\bar{\bar{X}} = \frac{1}{n} \sum_{j=1}^{c} \sum_{i=1}^{n_j} X_{ij}$ =
Sum of Squares $SST = \sum_{j=1}^{c} \sum_{i=1}^{n_j} (X_{ij} - \bar{\bar{X}})^2$ =

Standard Deviation:
Overall: $\hat{\sigma} = \sqrt{\frac{SST}{\sum_j n_j - 1}}$ =
Pooled: $\hat{\sigma}_{Pool} = \sqrt{\frac{SSW}{\sum_j (n_j - 1)}}$ =

Variation Source                    Sum of Squares    % Contribution
Group 1
Group 2
Group 3
Group 4
Group 5
SSW - Within-Group Variation
SSB - Between Group Variation
SST - Total Variation


Hypothesis Testing

• t-test for means

• F-test for standard deviations

• ANOVA test for means


What is Analysis of Variance?

• A technique used to determine the statistical
significance of the relationship between a
dependent variable ("Y") and single or multiple
independent variable(s) ("X"s) that have been
organized into two or more discrete groups or
levels.

• A procedure that determines whether or not
the means of the responses at each level are
drawn from the same population. (Are they
different?)

• A way to screen for potential Vital Few "X"s

ANOVA is used for continuous "Y" data
with discrete / continuous "X" levels

The Concept of ANOVA
A tool to compare several means
(for continuous response data!)

[Figure: two distributions, level 1 (Current) and level 2 (New Process). The gap between their averages (the delta) is the between-group variation (signal); the spread inside each level is the within-group variation (noise); together they make up the total variation]

ANOVA determines if the variation between the averages of
the levels is greater than could reasonably be expected
from the variation that occurs within the levels
... that's how it got its name

Is the signal between greater
than the noise within?

Variation Split

[Figure: total variation split into between-group variation (signal, the delta between the level averages) and within-group variation (noise)]

ANOVA calculates the ratio of:

$\frac{\text{Average } SS_{between}}{\text{Average } SS_{within}} = \frac{\text{signal}}{\text{noise}}$

SS = Sum of Squares (a measure of variation)

Why not simply use the t-test?

In the Analyze Phase, you learned how to use the "t-test"
statistic to compare two sample averages for difference.
(Remember the "two sample" t-test?)

Example: Insurance Costs Project

How would you compare the average regional insurance
costs? Are the costs different between the five regions?

Regional Operation Insurance Costs ($K)

            Region A   Region B   Region C   Region D   Region E
              763       1,335       596       3,742      1,632
            4,365       1,262     1,448       1,833      5,078
            2,144         217     1,183         375      3,010
            1,998       4,100     3,200       2,010        671
            5,412       2,948       630         743      2,145
              957       3,210       942         867      4,063
            1,286         867     1,285       1,233      1,232
              311       3,744       128       1,072      1,456
              863       1,635       844       3,105      2,735
            1,499         643     1,683       1,767        767
Average:    1,960       1,996     1,194       1,675      2,279

Problems with Multiple Comparisons

Using t-Tests

Problems with all possible "two sample" t-tests:

• We would have to make 10 separate
comparisons to test each pair of averages.
(AB, AC, AD, AE, BC, BD, BE, CD, CE, DE)
• Even if all average costs were equal, each test carries
a 5% chance that we would reject Ho and
conclude that the pair of averages is not
equal. When this test procedure is repeated ten
times, the risk of incorrectly concluding that at
least one pair of averages is different would be
very high (much greater than 5%).

ANOVA gives a single hypothesis test to compare all
five averages at one time.

Analysis of Variance (ANOVA) allows us to
make all ten comparisons at one time, and
controls the overall $\alpha$ risk…
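
A rough feel for how large the combined risk becomes can be had from the sketch below. It assumes, purely for illustration, that the ten tests are independent; in reality the pairs share data, so the figure is only indicative.

```python
from itertools import combinations

regions = "ABCDE"
pairs = ["".join(pair) for pair in combinations(regions, 2)]

alpha = 0.05                                      # Type I risk of a single t-test
# Chance of at least one false alarm across all tests, assuming independence
family_risk = 1 - (1 - alpha) ** len(pairs)

print(len(pairs), "comparisons:", ", ".join(pairs))
print(f"risk of at least one false difference: {family_risk:.1%}")
```

Under that assumption the overall risk is about 40%, eight times the 5% intended for a single test.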


One-Way ANOVA

One way ANOVA is used to compare the means from
three or more sample sets to determine if there is
evidence that at least one of the means is different.

Example Data
              Values                                     Group mean
Machine #1     9.64  11.08   9.88   8.43  10.41  11.04   10.08
Machine #2     8.18  11.99  10.56  10.23  10.41  10.21   10.26
Machine #3    14.91  13.25  14.95  12.99  12.59  14.66   13.89
Grand mean                                               11.41

In an ANOVA analysis one:

• Assumes that the samples come from the same
"normal" parent population

• Assumes that the "within sample" variation is the
same for all groups.
The standard deviation estimates are "pooled"

• Computes the F-value and determines the p-value.

• If p < 0.05 we accept the alternate hypothesis
(there is a difference)

One-Way ANOVA

Assume Same Parent Population
Mean = $\mu$
SDEV = $\sigma$

                   Machine #1        Machine #2        Machine #3
Values             9.64, 11.08, …    8.18, 11.99, …    14.91, 13.25, …
$S^2$ estimates    1.00              1.49              1.13

Pooled estimate of $S_P^2$ = 1.21, df = 15

Sampling Distribution of the Means
Mean = $\mu$
SDEV = $S_{SD} = \sigma / \sqrt{n}$

Means: 10.08, 10.26, 13.89

$S_P^2$ estimate from the means = 55.45/2 = 27.73, df = 2

$F = 27.73 / 1.21 \approx 22.9$, to be compared with the critical value $F(\alpha, \nu_1, \nu_2)$ = 3.68

p < 0.05, so reject the Null Hypothesis
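
The F-value for this example can be recomputed from the raw machine data listed on the One-Way ANOVA example slide. In this sketch the critical value 3.68 for F at the 95% level with (2, 15) degrees of freedom is a standard table value supplied here as an assumption; variable names are illustrative.

```python
from statistics import fmean, variance

machines = {
    "Machine #1": [9.64, 11.08, 9.88, 8.43, 10.41, 11.04],
    "Machine #2": [8.18, 11.99, 10.56, 10.23, 10.41, 10.21],
    "Machine #3": [14.91, 13.25, 14.95, 12.99, 12.59, 14.66],
}

k = len(machines)                                     # number of groups
all_values = [x for values in machines.values() for x in values]
n_total = len(all_values)
grand_mean = fmean(all_values)

# Within-group (pooled) variance estimate, df = n_total - k
pooled_var = sum((len(v) - 1) * variance(v) for v in machines.values()) / (n_total - k)

# Between-group variance estimate, df = k - 1
msb = sum(len(v) * (fmean(v) - grand_mean) ** 2 for v in machines.values()) / (k - 1)

f_value = msb / pooled_var
F_CRIT_95 = 3.68                                      # assumed table value for F(0.95; 2, 15)

print(f"between-group estimate = {msb:.2f} (df = {k - 1})")
print(f"pooled estimate        = {pooled_var:.2f} (df = {n_total - k})")
print(f"F = {f_value:.1f}, reject Ho: {f_value > F_CRIT_95}")
```

The between-group estimate is more than twenty times the pooled within-group estimate, so the null hypothesis of equal machine means is rejected at the 95% level.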


Example of ANOVA (Homework)

Subgroup:
Group   Values    Size $n_i$    Average $\bar{x}_i = \frac{1}{n_i} \sum_j x_{ij}$    Sum of Squares Within $SSW_i = \sum_j (x_{ij} - \bar{x}_i)^2$
1       1 2 3
2       2 3 4
3       3 4 5
4       4 5 6
5       5 6 7

No. of Groups m =                SSW = $\sum_i SSW_i$ =

Overall:
Size $\sum_i n_i$ =
Average $\bar{\bar{x}} = \frac{1}{\sum_i n_i} \sum_i \sum_j x_{ij}$ =
Sum of Squares Between $SSB = \sum_i n_i (\bar{x}_i - \bar{\bar{x}})^2$ =
Sum of Squares Total $SST = \sum_i \sum_j (x_{ij} - \bar{\bar{x}})^2$ =

Standard Deviation:
Overall: $\hat{\sigma}_{LT} = \sqrt{\frac{SST}{\sum_i n_i - 1}}$ =
Pooled: $\hat{\sigma}_{ST} = \sqrt{\frac{\sum_i SSW_i}{\sum_i (n_i - 1)}}$ =

Sum of Squares Variation    DoF        Mean Sum of Squares Variation    F
SSB =                       m-1 =      MSB =
SSW =                       mn-m =     MSW =
SST =                       mn-1 =

At 95% confidence level, F-Critical =        How is it related with F?

What is the conclusion?

• What is the relation between SST, SSW and SSB ?
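
The worksheet can be checked with a short script once it has been attempted by hand. This is a sketch only: it fills in the ANOVA table for the five homework groups, and the critical value 3.48 for F at the 95% level with (4, 10) degrees of freedom is a standard table value supplied here as an assumption.

```python
from statistics import fmean

groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6], [5, 6, 7]]

m = len(groups)                       # number of groups
n = len(groups[0])                    # observations per group (equal sizes)
all_values = [x for g in groups for x in g]
grand_mean = fmean(all_values)

ssw = sum((x - fmean(g)) ** 2 for g in groups for x in g)
ssb = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
sst = sum((x - grand_mean) ** 2 for x in all_values)

msb = ssb / (m - 1)                   # mean square between, df = m - 1
msw = ssw / (m * n - m)               # mean square within, df = mn - m
f_value = msb / msw
F_CRIT_95 = 3.48                      # assumed table value for F(0.95; 4, 10)

print(f"SSB = {ssb:.1f} (df {m - 1}), SSW = {ssw:.1f} (df {m * n - m}), SST = {sst:.1f} (df {m * n - 1})")
print(f"MSB = {msb:.2f}, MSW = {msw:.2f}, F = {f_value:.2f}")
print("F exceeds F-critical:", f_value > F_CRIT_95)
```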


Relation      Symbol    Type of Hypothesis
Equality      =         Simple
Inequality    >         Directional / One sided          Composite
              <         Directional / One sided          Composite
              ≠         Non-Directional / Two sided      Composite

One Sample Hypothesis for Mean              One Sample Hypothesis for Variance
H0 : $\mu = \mu_0$                          H0 : $\sigma^2 = \sigma_0^2$
Ha : $\mu \neq \mu_0$                       Ha : $\sigma^2 \neq \sigma_0^2$

Two Sample Hypothesis for Mean              Two Sample Hypothesis for Variance
H0 : $\mu_1 = \mu_2$                        H0 : $\sigma_1^2 = \sigma_2^2$
Ha : $\mu_1 \neq \mu_2$                     Ha : $\sigma_1^2 \neq \sigma_2^2$

Multi-Sample Hypothesis for Mean            Multi-Sample Hypothesis for Variance
H0 : $\mu_1 = \mu_2 = \dots = \mu_n$        H0 : $\sigma_1^2 = \sigma_2^2 = \dots = \sigma_n^2$
Ha : At least one not equal                 Ha : At least one not equal

Hypotheses of Means & tests

One Sample Hypothesis for Mean
Hypothesis: $H_0: \mu = \mu_0$, $H_a: \mu \neq \mu_0$
Test: z-test if sample size > 30; 1 sample t-test if sample size < 30

Two Sample Hypothesis for Mean
Hypothesis: $H_0: \mu_1 = \mu_2$, $H_a: \mu_1 \neq \mu_2$
Test: 2 sample t-test

Multi-Sample Hypothesis for Mean
Hypothesis: $H_0: \mu_1 = \mu_2 = \dots = \mu_n$, $H_a$: At least one not equal
Test: ANOVA

What are these tests?
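As a rough illustration of what the mean tests compute, the sketch below evaluates the z statistic, the one-sample t statistic, and the pooled two-sample t statistic from their textbook definitions. The function names and the data are hypothetical, the z-test assumes a known population standard deviation, and the two-sample version assumes equal variances; the slide does not specify either choice.

```python
import math
import statistics

def one_sample_z(sample, mu0, sigma):
    """z statistic for H0: mu = mu0, assuming the population sigma is known."""
    return (statistics.mean(sample) - mu0) / (sigma / math.sqrt(len(sample)))

def one_sample_t(sample, mu0):
    """t statistic and degrees of freedom for H0: mu = mu0."""
    n = len(sample)
    t = (statistics.mean(sample) - mu0) / (statistics.stdev(sample) / math.sqrt(n))
    return t, n - 1

def two_sample_pooled_t(a, b):
    """Pooled-variance t statistic and degrees of freedom for H0: mu1 = mu2."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2

# Hypothetical data for illustration only
group_a = [1, 2, 3]
group_b = [5, 6, 7]

z_one = one_sample_z(group_a, mu0=2, sigma=1.0)
t_one, df_one = one_sample_t(group_a, mu0=2)
t_two, df_two = two_sample_pooled_t(group_a, group_b)

print(f"z = {z_one:.3f}")
print(f"one-sample t = {t_one:.3f} (df = {df_one})")
print(f"two-sample t = {t_two:.3f} (df = {df_two})")
```

Each statistic is then compared with the critical value of its reference distribution (normal or t with the stated degrees of freedom) at the chosen significance level.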


Hypotheses of Variances & tests

One Sample Hypothesis for Variance
Hypothesis: $H_0: \sigma^2 = \sigma_0^2$, $H_a: \sigma^2 \neq \sigma_0^2$
Test: Chi squared

Two Sample Hypothesis for Variance
Hypothesis: $H_0: \sigma_1^2 = \sigma_2^2$, $H_a: \sigma_1^2 \neq \sigma_2^2$
Test: F test

Multi-Sample Hypothesis for Variance
Hypothesis: $H_0: \sigma_1^2 = \sigma_2^2 = \dots = \sigma_n^2$, $H_a$: At least one not equal
Test: Homogeneity of Variance

What are these tests?
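For the first two variance tests, the test statistics follow from the sample variance alone. The sketch below uses the textbook forms $\chi^2 = (n-1)s^2/\sigma_0^2$ and $F = s_1^2/s_2^2$; the function names and data are hypothetical, and the multi-sample homogeneity test is left out because the slide does not say which procedure is meant.

```python
import statistics

def chi_squared_statistic(sample, sigma0_sq):
    """Chi-squared statistic and degrees of freedom for H0: sigma^2 = sigma0^2."""
    n = len(sample)
    return (n - 1) * statistics.variance(sample) / sigma0_sq, n - 1

def variance_ratio_f(a, b):
    """F statistic s_a^2 / s_b^2 with numerator and denominator degrees of freedom."""
    return statistics.variance(a) / statistics.variance(b), len(a) - 1, len(b) - 1

# Hypothetical data for illustration only
sample_1 = [1, 2, 3, 4, 5]      # sample variance 2.5
sample_2 = [2, 4, 6, 8, 10]     # sample variance 10

chi2, chi2_df = chi_squared_statistic(sample_1, sigma0_sq=1.0)
f_stat, df_num, df_den = variance_ratio_f(sample_2, sample_1)

print(f"chi-squared = {chi2} (df = {chi2_df})")
print(f"F = {f_stat} (df = {df_num}, {df_den})")
```

Both statistics assume approximately normal data; they are compared with chi-squared and F critical values at the stated degrees of freedom.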
