Anova Zoom

The document discusses analysis of variance (ANOVA) and provides an example. It analyzes the effect of different hardwood concentrations (5%, 10%, 15%, 20%) on the tensile strength of paper bags. Samples were taken from each concentration and their tensile strengths measured. The sums of squares for treatment, error, and total are calculated, along with their corresponding degrees of freedom. ANOVA is used to test if hardwood concentration has an effect on tensile strength.

Uploaded by

Minh Khánh

HCMC University of Technology

Probability and Statistics
Dung Nguyen

Anova

Outline
1 Anova
2 Post-hoc analysis
3 Further discussion


1 Anova

The ANalysis Of VAriance (ANOVA): the analysis of quantitative responses from experimental units, used to decide whether the mean response differs between groups. Typical questions:
1 The effects of (five) different brands of gasoline on automobile engine operating efficiency (mpg).
2 The effects of the presence of (four) different sugar solutions (glucose, sucrose, fructose, and a mixture of the three) on bacterial growth.
3 Whether hardwood concentration in pulp (%) has an effect on the tensile strength of bags made from the pulp.
4 Whether the color density of fabric specimens depends on the amount of dye used.


Example 1
A manufacturer of paper used for making grocery bags is interested in improving the product's tensile strength. Product engineering believes that tensile strength is a function of the hardwood concentration in the pulp and that the range of hardwood concentrations of practical interest is between 5% and 20%. A team of engineers responsible for the study decides to investigate four levels of hardwood concentration: 5%, 10%, 15%, and 20%. They decide to make up six test specimens at each concentration level by using a pilot plant. All 24 specimens are tested on a laboratory tensile tester in random order.

Hardwood concentration   Tensile strength
 5%                       7   8  15  11   9  10
10%                      12  17  13  18  19  15
15%                      14  18  19  17  16  18
20%                      19  25  22  23  18  20
Hardwood concentration   Tensile strength          Sum   Average
 5%                       7   8  15  11   9  10     60    10.00
10%                      12  17  13  18  19  15     94    15.67
15%                      14  18  19  17  16  18    102    17.00
20%                      19  25  22  23  18  20    127    21.17
All                                                383    15.96

The levels of the factor are called treatments.
The measurements within each treatment are called observations or replicates.
Balanced design (every treatment has the same number of replicates, as here) vs. unbalanced design (the numbers of replicates differ).
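The Sum and Average columns of the table can be reproduced with a few lines of Python. This is only an illustrative sketch: the names `data`, `group_sums`, `group_means` and `is_balanced` are chosen here and do not come from the slides.

```python
# Hardwood data from Example 1: concentration (%) -> six tensile strengths.
data = {
    5: [7, 8, 15, 11, 9, 10],
    10: [12, 17, 13, 18, 19, 15],
    15: [14, 18, 19, 17, 16, 18],
    20: [19, 25, 22, 23, 18, 20],
}

# Per-treatment sums and means.
group_sums = {level: sum(xs) for level, xs in data.items()}
group_means = {level: sum(xs) / len(xs) for level, xs in data.items()}

# Grand sum and grand mean over all 24 observations.
n_total = sum(len(xs) for xs in data.values())
grand_sum = sum(group_sums.values())
grand_mean = grand_sum / n_total

# A design is balanced when every treatment has the same number of replicates.
is_balanced = len({len(xs) for xs in data.values()}) == 1

for level in data:
    print(f"{level:>3}%  sum={group_sums[level]:>4}  mean={group_means[level]:.2f}")
print(f"grand sum={grand_sum}  grand mean={grand_mean:.2f}  balanced={is_balanced}")
```

The grand mean is 383/24, which rounds to 15.96.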
Target
Effect of "Hardwood concentration" on "Tensile strength".
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$ versus $H_1: \exists\, i \ne k$ such that $\mu_i \ne \mu_k$.


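General layout with $I$ groups, group $i$ having $N_i$ observations:

Group 1   $x_{11}$  $x_{12}$  ...  $x_{1N_1}$
Group 2   $x_{21}$  $x_{22}$  ...  $x_{2N_2}$
...       ...       ...       ...  ...
Group I   $x_{I1}$  $x_{I2}$  ...  $x_{IN_I}$

$H_0: \mu_1 = \mu_2 = \dots = \mu_I$ versus $H_1: \exists\, i \ne k$ such that $\mu_i \ne \mu_k$.
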

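$X_1, \dots, X_I$ and $\bar{X}_1, \dots, \bar{X}_I$: sample sums and sample means of the groups.
$X$ and $\bar{X}$: grand sum and grand mean, with $N = N_1 + \dots + N_I$.

$$X_i = \sum_{j=1}^{N_i} X_{ij}, \quad \bar{X}_i = \frac{1}{N_i}\sum_{j=1}^{N_i} X_{ij}, \quad X = \sum_{i=1}^{I}\sum_{j=1}^{N_i} X_{ij}, \quad \bar{X} = \frac{1}{N}\sum_{i=1}^{I}\sum_{j=1}^{N_i} X_{ij}.$$

                                                    Sum     Average
Group 1   $x_{11}$  $x_{12}$  ...  $x_{1N_1}$       $X_1$   $\bar{X}_1$
Group 2   $x_{21}$  $x_{22}$  ...  $x_{2N_2}$       $X_2$   $\bar{X}_2$
...       ...       ...       ...  ...              ...     ...
Group I   $x_{I1}$  $x_{I2}$  ...  $x_{IN_I}$       $X_I$   $\bar{X}_I$
All                                                 $X$     $\bar{X}$
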

Assumptions and Objective
The $I$ population or treatment distributions are all normal with the same variance $\sigma^2$:
$$X_{ij} \sim N(\mu_i, \sigma^2), \quad E(X_{ij}) = \mu_i, \quad V(X_{ij}) = \sigma^2.$$
Equivalently,
$$X_{ij} = \mu_i + \epsilon_{ij}, \quad \epsilon_{ij} \sim N(0, \sigma^2).$$
$H_0: \mu_1 = \mu_2 = \dots = \mu_I$ versus $H_1: \exists\, i \ne k$ such that $\mu_i \ne \mu_k$.

The concept of Anova
Two data sets have the same sample means, $\bar{x}_1 = 0$ and $\bar{x}_2 = 10$. In each case, are $\mu_1$ and $\mu_2$ the same or different?

Data set (a): small variation within groups
Group   Responses                                       Av.
I        1   -1    2   -2    1.5  -1.5   0.5  -0.5       0
II      11    9   12    8   11.5   8.5  10.5   9.5      10

Data set (b): large variation within groups
Group   Responses                                       Av.
I        5   -5   10  -10    7.5  -7.5   2.5  -2.5       0
II      15    5   20    0   17.5   2.5  12.5   7.5      10

[Figure: dot plots of data sets (a) and (b) around the group means 0 and 10]

The gap between the group means is the same in both data sets. What changes is the spread within each group, and that spread decides how convincing the gap is.
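The comparison can be made concrete by computing, for each data set, the variation between groups, the variation within groups, and the ratio of their mean squares (the F statistic of the ANOVA table). This is a small sketch: the function name `one_way_f` is hypothetical, and the sixth Group II value of the second data set is taken as 2.5 so that the stated average of 10 holds.

```python
def one_way_f(groups):
    """Return (SSTr, SSE, F) for a list of groups of observations."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group variation: each observation contributes (group mean - grand mean)^2.
    sstr = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group variation: each observation's deviation from its own group mean.
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    f = (sstr / (len(groups) - 1)) / (sse / (n_total - len(groups)))
    return sstr, sse, f


tight = [[1, -1, 2, -2, 1.5, -1.5, 0.5, -0.5],
         [11, 9, 12, 8, 11.5, 8.5, 10.5, 9.5]]
# Assumption: the sixth Group II value is 2.5, so that the group average is 10.
spread = [[5, -5, 10, -10, 7.5, -7.5, 2.5, -2.5],
          [15, 5, 20, 0, 17.5, 2.5, 12.5, 7.5]]

for name, groups in (("tight", tight), ("spread", spread)):
    sstr, sse, f = one_way_f(groups)
    print(f"{name:>6}: SSTr={sstr:.1f}  SSE={sse:.1f}  F={f:.2f}")
```

Both data sets give the same between-group sum of squares (400), but the within-group sum of squares grows from 30 to 750, so the ratio drops from about 186.7 to about 7.5.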
Treatment sum of squares
The treatment SOS measures the variation between groups:
$$SSTr = \sum_{i=1}^{I}\sum_{j=1}^{N_i} (\bar{X}_i - \bar{X})^2.$$
[Figure: the group means 10.00, 15.67, 17.00, 21.17 compared with the grand mean]

Error sum of squares
The error SOS measures the variation within groups:
$$SSE = \sum_{i=1}^{I}\sum_{j=1}^{N_i} (X_{ij} - \bar{X}_i)^2.$$
[Figure: the observations compared with their group means 10.00, 15.67, 17.00, 21.17]

Total sum of squares
The total SOS measures the total variation:
$$SST = \sum_{i=1}^{I}\sum_{j=1}^{N_i} (X_{ij} - \bar{X})^2.$$
[Figure: the observations compared with the grand mean]

Sums of squares
$$SSTr = \sum_{i}\sum_{j} (\bar{X}_i - \bar{X})^2, \quad SSE = \sum_{i}\sum_{j} (X_{ij} - \bar{X}_i)^2, \quad SST = \sum_{i}\sum_{j} (X_{ij} - \bar{X})^2.$$
[Figure: three panels for the hardwood data, (a) SSTr, (b) SSE, (c) SST]

For the hardwood data the group means are 10.00, 15.67, 17.00 and 21.17, and the grand mean is 383/24 = 15.96 (means are shown rounded; the results use the unrounded values).

Each group contributes 6 identical terms to SSTr:
$$SSTr = 6(10.00 - 15.96)^2 + 6(15.67 - 15.96)^2 + 6(17.00 - 15.96)^2 + 6(21.17 - 15.96)^2 = 382.7917.$$

$$SSE = (7 - 10.00)^2 + \dots + (10 - 10.00)^2 + (12 - 15.67)^2 + \dots + (15 - 15.67)^2 + (14 - 17.00)^2 + \dots + (18 - 17.00)^2 + (19 - 21.17)^2 + \dots + (20 - 21.17)^2 = 130.1667.$$

$$SST = (7 - 15.96)^2 + \dots + (10 - 15.96)^2 + (12 - 15.96)^2 + \dots + (15 - 15.96)^2 + (14 - 15.96)^2 + \dots + (18 - 15.96)^2 + (19 - 15.96)^2 + \dots + (20 - 15.96)^2 = 512.9583.$$
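The three sums of squares can be checked directly from their definitions. The snippet below is a minimal sketch with variable names chosen here for illustration.

```python
# Hardwood data: one list of six tensile strengths per concentration level.
groups = [
    [7, 8, 15, 11, 9, 10],     # 5%
    [12, 17, 13, 18, 19, 15],  # 10%
    [14, 18, 19, 17, 16, 18],  # 15%
    [19, 25, 22, 23, 18, 20],  # 20%
]

n_total = sum(len(g) for g in groups)
grand_mean = sum(sum(g) for g in groups) / n_total
means = [sum(g) / len(g) for g in groups]

# SSTr: one (group mean - grand mean)^2 term per observation.
sstr = sum((m - grand_mean) ** 2 for g, m in zip(groups, means) for _ in g)
# SSE: deviation of each observation from its own group mean.
sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
# SST: deviation of each observation from the grand mean.
sst = sum((x - grand_mean) ** 2 for g in groups for x in g)

print(f"SSTr={sstr:.4f}  SSE={sse:.4f}  SST={sst:.4f}")
```

The unrounded means are used throughout, which is why the totals match the four-decimal values quoted above.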
Degrees of freedom
For the hardwood data ($I = 4$ groups, $N = 24$ observations):

Sum of squares                                                   df
$SSTr = \sum\sum (\bar{X}_i - \bar{X})^2 = 382.7917$             4 - 1 = 3
$SSE = \sum\sum (X_{ij} - \bar{X}_i)^2 = 130.1667$               24 - 4 = 4 × 5 = 20
$SST = \sum\sum (X_{ij} - \bar{X})^2 = 512.9583$                 24 - 1 = 23
The Anova identity
SST = SSTr + SSE and df(SST) = df(SSTr) + df(SSE).

Sum of squares      df      Definition                                  Computation
Total (SST)         N - 1   $\sum_{i,j} (X_{ij} - \bar{X})^2$           $\sum_{i,j} X_{ij}^2 - \frac{X^2}{N}$
Treatment (SSTr)    I - 1   $\sum_{i,j} (\bar{X}_i - \bar{X})^2$        $\sum_{i} \frac{X_i^2}{N_i} - \frac{X^2}{N}$
Error (SSE)         N - I   $\sum_{i,j} (X_{ij} - \bar{X}_i)^2$         SST - SSTr
For the hardwood data (group sums 60, 94, 102, 127; grand sum $X = 383$; $N = 24$):

$$SST = 7^2 + 8^2 + \dots + 20^2 - \frac{383^2}{24} = 512.9583.$$
$$SSTr = \frac{1}{6}\left(60^2 + 94^2 + 102^2 + 127^2\right) - \frac{383^2}{24} = 382.7917.$$
$$SSE = 512.9583 - 382.7917 = 130.1667.$$
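The computational formulas need only the raw sum of squares, the group sums, and the grand sum. A short sketch (the variable names, including `correction` for the term $X^2/N$, are illustrative):

```python
groups = [
    [7, 8, 15, 11, 9, 10],
    [12, 17, 13, 18, 19, 15],
    [14, 18, 19, 17, 16, 18],
    [19, 25, 22, 23, 18, 20],
]

n_total = sum(len(g) for g in groups)
grand_sum = sum(sum(g) for g in groups)
correction = grand_sum ** 2 / n_total  # X^2 / N

# SST = sum of squared observations minus the correction term.
sst = sum(x * x for g in groups for x in g) - correction
# SSTr = sum over groups of (group sum)^2 / N_i minus the correction term.
sstr = sum(sum(g) ** 2 / len(g) for g in groups) - correction
# SSE follows from the Anova identity.
sse = sst - sstr

print(f"SST={sst:.4f}  SSTr={sstr:.4f}  SSE={sse:.4f}")
```

This route avoids computing any deviations, which is convenient for hand calculation and gives the same values as the definitions.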
The statistic
The mean square for treatment: MSTr = SSTr / df(SSTr) = SSTr / (I - 1).
The mean square for error: MSE = SSE / df(SSE) = SSE / (N - I).
Consider the following statistic:
$$F = \frac{MSTr}{MSE} = \frac{SSTr/(I-1)}{SSE/(N-I)}.$$
If $H_0$ is true then $F \sim F(I-1, N-I)$.
ANOVA Table

Source of variation   Df      Sum of squares   Mean square          F
Treatment             I - 1   SSTr             MSTr = SSTr/(I-1)    MSTr/MSE
Error                 N - I   SSE              MSE = SSE/(N-I)
Total                 N - 1   SST

Rejection region: $F \ge F_{\alpha, I-1, N-I}$.

ANOVA Table (hardwood data)

Source of variation   Df   Sum of squares   Mean square   F
Treatment              3   382.79           127.60        19.60
Error                 20   130.17             6.51
Total                 23   512.96

Rejection region: $F \ge F_{0.05,3,20} = 3.10$. Since 19.60 ≥ 3.10, reject $H_0$: hardwood concentration affects the mean tensile strength.
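The whole table can be assembled from the raw data in one helper. This is a sketch only: `anova_table` is a hypothetical name, and the critical value 3.10 is copied from the rejection region on the slide rather than computed.

```python
def anova_table(groups):
    """One-way ANOVA table entries for a list of groups (lists of numbers)."""
    n_total = sum(len(g) for g in groups)
    n_groups = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    sstr = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_tr, df_e = n_groups - 1, n_total - n_groups
    mstr, mse = sstr / df_tr, sse / df_e
    return {"df": (df_tr, df_e), "ss": (sstr, sse), "ms": (mstr, mse), "F": mstr / mse}


hardwood = [
    [7, 8, 15, 11, 9, 10],
    [12, 17, 13, 18, 19, 15],
    [14, 18, 19, 17, 16, 18],
    [19, 25, 22, 23, 18, 20],
]

F_CRIT = 3.10  # critical value quoted on the slide for df (3, 20)
table = anova_table(hardwood)
reject_h0 = table["F"] >= F_CRIT

print(f"df={table['df']}  MSTr={table['ms'][0]:.2f}  MSE={table['ms'][1]:.2f}")
print(f"F={table['F']:.2f}  reject H0: {reject_h0}")
```

Computed from unrounded mean squares, F is about 19.61; the slide's 19.60 corresponds to dividing the rounded values 127.60 and 6.51.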


Example 2
Consider the following computer output for a balanced design.

Source of variation   Df   Sum of squares   Mean square   F
Treatment              ?   ?                39.1          ?
Error                  ?   396.8            ?
Total                 19   514.2

Fill in the missing information in the ANOVA table and make a conclusion about differences in the factor-level means.

Solution
SSTr = SST - SSE = 514.2 - 396.8 = 117.4, so df(SSTr) = 117.4/39.1 ≈ 3 and df(SSE) = 19 - 3 = 16.
MSE = 396.8/16 = 24.8 and F = 39.1/24.8 = 1.5766.

Source of variation   Df   Sum of squares   Mean square   F
Treatment              3   117.4            39.1          1.5766
Error                 16   396.8            24.8
Total                 19   514.2

$F_{0.05,3,16} \approx 3.24$ and 1.5766 < 3.24, so we fail to reject $H_0$.
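The fill-in steps for Example 2 follow mechanically from the Anova identity. A brief sketch, with illustrative variable names and the critical value 3.24 copied from the solution:

```python
# Known entries from the computer output of Example 2.
total_df, total_ss = 19, 514.2
error_ss = 396.8
ms_treatment = 39.1

# Anova identity: SST = SSTr + SSE.
ss_treatment = total_ss - error_ss
# MSTr = SSTr / df(SSTr), so the degrees of freedom are the (integer) ratio.
df_treatment = round(ss_treatment / ms_treatment)
df_error = total_df - df_treatment
ms_error = error_ss / df_error
f_stat = ms_treatment / ms_error

F_CRIT = 3.24  # F_{0.05,3,16} as quoted in the solution
reject_h0 = f_stat >= F_CRIT

print(f"SSTr={ss_treatment:.1f}  df=({df_treatment}, {df_error})")
print(f"MSE={ms_error:.1f}  F={f_stat:.4f}  reject H0: {reject_h0}")
```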


Example 3
Consider the following computer output for a balanced experiment. The factor was tested over four levels. Fill in the missing information in the ANOVA table and make a conclusion.

Source of variation   Df   Sum of squares   Mean square   F
Treatment              ?   ?                330.4716      4.42
Error                  ?   ?                ?
Total                 31   ?

Solution
Four levels give df(SSTr) = 4 - 1 = 3 and df(SSE) = 31 - 3 = 28.
SSTr = 3 × 330.4716 = 991.4148, MSE = 330.4716/4.42 = 74.76733, SSE = 28 × 74.76733 = 2093.485.

Source of variation   Df   Sum of squares   Mean square   F
Treatment              3    991.4148        330.4716      4.42
Error                 28   2093.485          74.76733
Total                 31   3084.9

$F_{0.05,3,28} \approx 2.95$ and 4.42 > 2.95, so we reject $H_0$.
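Example 3 runs the same relations in the other direction, starting from the F value. A minimal sketch with illustrative variable names; the critical value 2.95 is the tabulated $F_{0.05,3,28}$.

```python
# Known entries from the computer output of Example 3.
levels = 4
total_df = 31
ms_treatment = 330.4716
f_stat = 4.42

df_treatment = levels - 1
df_error = total_df - df_treatment
ss_treatment = df_treatment * ms_treatment
# F = MSTr / MSE, so MSE = MSTr / F.
ms_error = ms_treatment / f_stat
ss_error = df_error * ms_error
ss_total = ss_treatment + ss_error

F_CRIT = 2.95  # F_{0.05,3,28}
reject_h0 = f_stat >= F_CRIT

print(f"df=({df_treatment}, {df_error})  SSTr={ss_treatment:.4f}")
print(f"MSE={ms_error:.5f}  SSE={ss_error:.3f}  SST={ss_total:.1f}")
print(f"reject H0: {reject_h0}")
```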


2 Post-hoc analysis

Confidence Intervals
Confidence interval on a treatment mean, using the standard deviation $s_i$ of group $i$ only:
$$\mu_i = \bar{X}_i \pm t_{\alpha/2, N_i - 1}\, \frac{s_i}{\sqrt{N_i}}.$$

H.C.   Tensile strength          Sum   Av.     SD
 5%     7   8  15  11   9  10     60   10.00   2.8284
10%    12  17  13  18  19  15     94   15.67   2.8047
15%    14  18  19  17  16  18    102   17.00   1.7889
20%    19  25  22  23  18  20    127   21.17   2.6394
All                              383   15.96

At the 95% level ($t_{0.025,5} = 2.571$, $N_i = 6$):
µ1 = 10.00 ± 2.9682.   µ3 = 17.00 ± 1.8773.
µ2 = 15.67 ± 2.9434.   µ4 = 21.17 ± 2.7699.
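The per-group margins of error can be reproduced as follows. This is a sketch under one stated assumption: the intervals are 95% intervals with 5 degrees of freedom per group, so the t quantile is the standard table value $t_{0.025,5} = 2.5706$, which is hard-coded rather than computed.

```python
import math
import statistics

groups = {
    "5%": [7, 8, 15, 11, 9, 10],
    "10%": [12, 17, 13, 18, 19, 15],
    "15%": [14, 18, 19, 17, 16, 18],
    "20%": [19, 25, 22, 23, 18, 20],
}

T_CRIT = 2.5706  # assumed: t_{0.025, 5} from a standard t table

margins = {}
for label, xs in groups.items():
    mean = statistics.mean(xs)
    sd = statistics.stdev(xs)  # sample standard deviation (divisor N_i - 1)
    margins[label] = T_CRIT * sd / math.sqrt(len(xs))
    print(f"{label:>4}: {mean:.2f} +/- {margins[label]:.4f}  (SD={sd:.4f})")
```

Each interval uses only its own group's spread, so the margins differ from group to group even though the design is balanced.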

Confidence Intervals
Under the equal-variance assumption, $\sigma^2 \approx MSE$.
Confidence interval on a treatment mean:
$$\mu_i = \bar{X}_i \pm t_{\alpha/2, N-I}\, se_i, \quad se_i = \sqrt{\frac{MSE}{N_i}}.$$
$$MOE = t_{0.025,20}\, se = 2.086 \sqrt{6.51/6} = 2.1728.$$
µ1 = 10.00 ± 2.1728.   µ3 = 17.00 ± 2.1728.
µ2 = 15.67 ± 2.1728.   µ4 = 21.17 ± 2.1728.
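With the pooled estimate, one margin of error serves all four groups. A minimal sketch, taking $t_{0.025,20} = 2.086$ from the slide and computing MSE from the raw data (variable names are illustrative):

```python
import math

groups = [
    [7, 8, 15, 11, 9, 10],
    [12, 17, 13, 18, 19, 15],
    [14, 18, 19, 17, 16, 18],
    [19, 25, 22, 23, 18, 20],
]

n_total = sum(len(g) for g in groups)
means = [sum(g) / len(g) for g in groups]

# Pooled variance estimate: MSE = SSE / (N - I).
sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
mse = sse / (n_total - len(groups))

T_CRIT = 2.086  # t_{0.025, 20} as used on the slide
n_i = len(groups[0])  # balanced design: 6 replicates per group
moe = T_CRIT * math.sqrt(mse / n_i)

for m in means:
    print(f"{m:.2f} +/- {moe:.4f}")
```

Using the unrounded MSE gives a margin of about 2.1726; the slide's 2.1728 comes from the rounded value 6.51.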
Confidence intervals for the differences in means
$$\mu_i - \mu_k = (\bar{X}_i - \bar{X}_k) \pm LSD_{ik}, \quad LSD_{ik} = t_{\alpha/2, N-I} \sqrt{\frac{MSE}{N_i} + \frac{MSE}{N_k}}.$$
For the hardwood data:
$$LSD = t_{0.025,20} \sqrt{\frac{MSE}{6} + \frac{MSE}{6}} = 2.086 \sqrt{2(6.51)/6} = 3.07.$$

µ1 - µ2 = (10.00 - 15.67) ± 3.07.
µ2 - µ3 = (15.67 - 17.00) ± 3.07.
µ3 - µ4 = (17.00 - 21.17) ± 3.07.
Multiple Comparisons
The comparisons among the observed treatment averages are as follows (LSD = 3.07):
4 vs. 1 = 21.17 - 10.00 = 11.17 > 3.07
4 vs. 2 = 21.17 - 15.67 = 5.50 > 3.07
4 vs. 3 = 21.17 - 17.00 = 4.17 > 3.07
3 vs. 1 = 17.00 - 10.00 = 7.00 > 3.07
3 vs. 2 = 17.00 - 15.67 = 1.33 < 3.07
2 vs. 1 = 15.67 - 10.00 = 5.67 > 3.07
Every pair of treatment means differs significantly except treatments 2 and 3 (10% and 15%).
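The six pairwise comparisons can be enumerated programmatically. This is a sketch of the LSD procedure as described above, with $t_{0.025,20} = 2.086$ taken from the slides and variable names chosen for illustration.

```python
import math
from itertools import combinations

groups = {
    1: [7, 8, 15, 11, 9, 10],     # 5%
    2: [12, 17, 13, 18, 19, 15],  # 10%
    3: [14, 18, 19, 17, 16, 18],  # 15%
    4: [19, 25, 22, 23, 18, 20],  # 20%
}

means = {i: sum(xs) / len(xs) for i, xs in groups.items()}
n_total = sum(len(xs) for xs in groups.values())
sse = sum((x - means[i]) ** 2 for i, xs in groups.items() for x in xs)
mse = sse / (n_total - len(groups))

T_CRIT = 2.086  # t_{0.025, 20} as used on the slide

not_significant = []
for i, k in combinations(sorted(groups), 2):
    # LSD for the pair (i, k); identical for all pairs in a balanced design.
    lsd = T_CRIT * math.sqrt(mse / len(groups[i]) + mse / len(groups[k]))
    diff = abs(means[i] - means[k])
    if diff <= lsd:
        not_significant.append((i, k))
    print(f"{k} vs. {i}: |diff|={diff:.2f}  LSD={lsd:.2f}  significant={diff > lsd}")
```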
3 Further discussion
Anova vs t-test
Random-effect models
RCBD

Anova vs t-test
Can Anova replace t-test?
Can multiple t-test replace Anova?

Comparison variable   T-TEST                                   ANOVA
Definition            Compares the means of two population     Compares the means of more than two
                      groups.                                  population groups.
Error                 Less likely to commit an error.          Has more error risks.
Test                  Two-sided or one-sided test.             One-sided test, because a variance
                                                               cannot be negative.
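On the first question, one known link between the two tests is that with exactly two groups the one-way ANOVA statistic equals the square of the pooled (equal-variance) two-sample t statistic. The sketch below checks this numerically on the two-group data set with small spread from the "concept of Anova" slides; the variable names are illustrative.

```python
import math

g1 = [1, -1, 2, -2, 1.5, -1.5, 0.5, -0.5]
g2 = [11, 9, 12, 8, 11.5, 8.5, 10.5, 9.5]

n1, n2 = len(g1), len(g2)
m1, m2 = sum(g1) / n1, sum(g2) / n2
ss1 = sum((x - m1) ** 2 for x in g1)
ss2 = sum((x - m2) ** 2 for x in g2)

# Pooled two-sample t statistic (equal variances assumed).
pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
t_stat = (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# One-way ANOVA F statistic for the same two groups (I = 2).
grand_mean = (sum(g1) + sum(g2)) / (n1 + n2)
sstr = n1 * (m1 - grand_mean) ** 2 + n2 * (m2 - grand_mean) ** 2
f_stat = (sstr / (2 - 1)) / ((ss1 + ss2) / (n1 + n2 - 2))

print(f"t={t_stat:.4f}  t^2={t_stat ** 2:.4f}  F={f_stat:.4f}")
```

Because F uses the squared difference, it cannot distinguish the direction of the difference, which matches the one-sided rejection region noted in the table.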
Random-Effect Models
In Montgomery's book, he describes a single-factor experiment involving the random-effects model in which a textile manufacturing company weaves a fabric on a large number of looms. The company is interested in loom-to-loom variability in tensile strength. To investigate this variability, a manufacturing engineer selects four looms at random and makes four strength determinations on fabric samples chosen at random from each loom.

Loom   Tensile strength
1      98  97  99  96
2      91  90  93  92
3      96  95  97  95
4      95  96  99  98


Solution

Loom   Tensile strength     Sum    Average
1      98  97  99  96       390    97.50
2      91  90  93  92       366    91.50
3      96  95  97  95       383    95.75
4      95  96  99  98       388    97.00
All                        1527    95.44

ANOVA table:

Source of variation   Df   Sum of squares   Mean square   F
Loom                   3    89.188          29.729        15.681 (> 5.953)
Error                 12    22.750           1.896
Total                 15   111.938

Here SSE = SST - SSTr = 111.938 - 89.188 = 22.750, MSE = 22.750/12 = 1.896 and F = 29.729/1.896 = 15.681. Since 15.681 > $F_{0.01,3,12} = 5.953$, the looms differ significantly in tensile strength.
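The loom table can be recomputed from the raw data with the same computational formulas used for the hardwood example. This is an illustrative sketch; from the data it gives SSTr = 89.1875, SST = 111.9375 and therefore SSE = 22.75.

```python
looms = [
    [98, 97, 99, 96],
    [91, 90, 93, 92],
    [96, 95, 97, 95],
    [95, 96, 99, 98],
]

n_total = sum(len(g) for g in looms)
grand_sum = sum(sum(g) for g in looms)
correction = grand_sum ** 2 / n_total  # X^2 / N

sst = sum(x * x for g in looms for x in g) - correction
sstr = sum(sum(g) ** 2 / len(g) for g in looms) - correction
sse = sst - sstr

df_tr, df_e = len(looms) - 1, n_total - len(looms)
mstr, mse = sstr / df_tr, sse / df_e
f_stat = mstr / mse

F_CRIT = 5.953  # critical value quoted on the slide, F_{0.01,3,12}
print(f"SSTr={sstr:.4f}  SSE={sse:.4f}  SST={sst:.4f}")
print(f"MSTr={mstr:.3f}  MSE={mse:.3f}  F={f_stat:.3f}  reject H0: {f_stat > F_CRIT}")
```

The arithmetic of the F test is the same as in the fixed-effect case; what changes in a random-effects model is the interpretation, since the looms are a random sample from a larger population of looms.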


Fixed-Effect vs Random-Effect

Fixed effect: the four hardwood concentrations are the specific levels of interest chosen by the engineers.
H.C.   Tensile strength
 5%     7   8  15  11   9  10
10%    12  17  13  18  19  15
15%    14  18  19  17  16  18
20%    19  25  22  23  18  20

Random effect: the four looms are a random sample from a large number of looms.
Loom   Tensile strength
1      98  97  99  96
2      91  90  93  92
3      96  95  97  95
4      95  96  99  98


Randomized Complete Block Design (RCBD)
An experiment was performed to determine the effect of four different chemicals on the strength of a fabric. These chemicals are used as part of the permanent press finishing process. Five fabric samples were selected, and an RCBD was run by testing each chemical type once in random order on each fabric sample. The data are shown in the following table. We will test for differences in means using an ANOVA with α = 0.01.

                 Fabric Sample
Chemical Type    1     2     3     4     5
1                1.3   1.6   0.5   1.2   1.1
2                2.2   2.4   0.4   2.0   1.8
3                1.8   1.7   0.6   1.5   1.3
4                3.9   4.4   2.0   4.1   3.4


Solution

                 Fabric Sample
Chemical Type    1      2      3      4      5      Sum    Av.
1                1.3    1.6    0.5    1.2    1.1     5.7   1.14
2                2.2    2.4    0.4    2.0    1.8     8.8   1.76
3                1.8    1.7    0.6    1.5    1.3     6.9   1.38
4                3.9    4.4    2.0    4.1    3.4    17.8   3.56
Total            9.2   10.1    3.5    8.8    7.6
Average          2.30   2.53   0.88   2.20   1.90
Solution
Anova table:

Source of variation   Df   Sum of squares   Mean square   F
Chemical types         3   18.04            6.01          75.13 (> 5.95)
Fabric samples         4    6.69            1.67
Error                 12    0.96            0.08
Total                 19   25.69

The F value 75.13 is the ratio of the rounded mean squares, 6.01/0.08. Since 75.13 > $F_{0.01,3,12} = 5.95$, we reject $H_0$: the chemical types differ in their effect on fabric strength.
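The slides give only the finished table for the RCBD. The sketch below recomputes it with the standard RCBD decomposition SST = SS(treatments) + SS(blocks) + SSE, which is assumed here rather than taken from the slides; variable names are illustrative and the critical value 5.95 is copied from the table.

```python
# Rows: chemical types 1-4 (treatments); columns: fabric samples 1-5 (blocks).
strength = [
    [1.3, 1.6, 0.5, 1.2, 1.1],
    [2.2, 2.4, 0.4, 2.0, 1.8],
    [1.8, 1.7, 0.6, 1.5, 1.3],
    [3.9, 4.4, 2.0, 4.1, 3.4],
]
a, b = len(strength), len(strength[0])  # 4 treatments, 5 blocks

grand_sum = sum(sum(row) for row in strength)
correction = grand_sum ** 2 / (a * b)

ss_total = sum(x * x for row in strength for x in row) - correction
# Treatment totals are row sums, each based on b observations.
ss_chem = sum(sum(row) ** 2 for row in strength) / b - correction
# Block totals are column sums, each based on a observations.
ss_block = sum(sum(col) ** 2 for col in zip(*strength)) / a - correction
ss_error = ss_total - ss_chem - ss_block

df_chem, df_block = a - 1, b - 1
df_error = df_chem * df_block
ms_chem = ss_chem / df_chem
ms_error = ss_error / df_error
f_chem = ms_chem / ms_error

F_CRIT = 5.95  # critical value quoted in the table, F_{0.01,3,12}
print(f"SS chem={ss_chem:.3f}  SS block={ss_block:.3f}  SSE={ss_error:.3f}  SST={ss_total:.3f}")
print(f"F={f_chem:.2f}  reject H0: {f_chem > F_CRIT}")
```

With unrounded sums of squares the ratio is about 75.9 rather than 75.13; the difference is rounding only and does not change the conclusion.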


Summary
Anova
Post-hoc analysis
    Confidence interval
    Multiple comparison
Further discussion
    Anova vs t-test
    Random effect
    RCBD